GNU bug report logs - #60697
GNU grep mishandles \b near encoding errors
Message #8 received at 60697 <at> debbugs.gnu.org:
On Mon, Jan 9, 2023 at 10:16 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Here's a shell session illustrating the problem on Fedora 37, which has
> GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.
>
> $ export LC_ALL=en_US.utf8
> $ printf '\300\n' | grep '\b'
> grep: (standard input): binary file matches
> $ printf '\300\n' | grep -P '\b'
> $
>
> Plain grep finds a word boundary in the input even though the input
> contains no words (just an encoding error). 'grep -P' does the right thing.
>
> The underlying issue is in the glibc regex code so the fix should be in
> glibc / Gnulib, but I thought I'd report it here before I forgot it.
Thanks! While this would definitely be nice to fix before the release
(in the next week or so), it's enough of a corner case that I wouldn't
feel bad releasing without a fix.
For the record, this problem first arose in grep-2.19.
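
For anyone who wants to check a particular grep build, the session quoted above can be wrapped in a small script. This is only a sketch: `probe` is a hypothetical helper name, not part of grep, and it assumes the `en_US.utf8` locale is installed (if it is not, grep typically falls back to the C locale and the comparison is not meaningful). It relies on grep's documented exit status: 0 for a match, 1 for no match, anything else for an error such as a build without `-P` support.

```shell
#!/bin/sh
# Sketch only: probe() is a hypothetical helper, not part of grep.
# It feeds the lone byte 0xC0 (an encoding error in UTF-8) to grep
# and reports whether the pattern '\b' matched.
probe() {
  # $1 is an optional extra grep option such as -P; omit it for plain grep.
  printf '\300\n' | LC_ALL=en_US.utf8 grep -q ${1:+"$1"} '\b' 2>/dev/null
  case $? in
    0) echo "match" ;;      # grep found a word boundary
    1) echo "no match" ;;   # grep found none
    *) echo "error" ;;      # e.g. -P not supported by this build
  esac
}

echo "plain grep: $(probe)"
echo "grep -P:    $(probe -P)"
```

Going by the report, an affected build should say "match" for plain grep and "no match" for `grep -P`; a fixed build should say "no match" for both.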
This bug report was last modified 2 years and 212 days ago.