GNU bug report logs - #60697
GNU grep mishandles \b near encoding errors

Package: grep;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 9 Jan 2023 23:01:01 UTC

Severity: normal

Message #8 received at 60697 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 60697 <at> debbugs.gnu.org
Subject: Re: bug#60697: GNU grep mishandles \b near encoding errors
Date: Wed, 11 Jan 2023 22:03:52 -0800
On Mon, Jan 9, 2023 at 10:16 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Here's a shell session illustrating the problem on Fedora 37, which has
> GNU grep 3.7. The same bug is still in bleeding-edge GNU grep.
>
>    $ export LC_ALL=en_US.utf8
>    $ printf '\300\n' | grep '\b'
>    grep: (standard input): binary file matches
>    $ printf '\300\n' | grep -P '\b'
>    $
>
> Plain grep finds a word boundary in the input even though the input
> contains no words (just an encoding error). 'grep -P' does the right thing.
>
> The underlying issue is in the glibc regex code so the fix should be in
> glibc / Gnulib, but I thought I'd report it here before I forgot it.
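The behavior the report expects can be sketched in Python. This is not grep's, glibc's, or Gnulib's code. It rests on two assumptions: word constituents are letters, digits, and the underscore, as the GNU grep manual describes them; and an encoding error is never a word constituent, as the report implies ("the input contains no words"). The helper names `word_boundaries` and `has_word_boundary` are hypothetical. Python's `surrogateescape` error handler is used so that each undecodable byte takes up exactly one non-word slot.

```python
from __future__ import annotations


def is_encoding_error(ch: str) -> bool:
    # surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF.
    return 0xDC80 <= ord(ch) <= 0xDCFF


def is_word_char(ch: str) -> bool:
    # Assumption: an encoding error is never a word constituent.
    if is_encoding_error(ch):
        return False
    # Letters, digits and underscore (isalnum() approximates "letter or digit").
    return ch == "_" or ch.isalnum()


def word_boundaries(line: bytes) -> list[int]:
    """Return the positions where \\b would match in a UTF-8 line.

    Positions index the decoded sequence. Each encoding-error byte
    occupies one slot there.
    """
    chars = line.decode("utf-8", errors="surrogateescape")
    flags = [is_word_char(c) for c in chars]
    positions = []
    for i in range(len(flags) + 1):
        # The edges of the line count as non-word context.
        before = flags[i - 1] if i > 0 else False
        after = flags[i] if i < len(flags) else False
        if before != after:
            positions.append(i)
    return positions


def has_word_boundary(line: bytes) -> bool:
    # Under these assumptions, this decides whether grep '\b' should select the line.
    return bool(word_boundaries(line))
```

With these rules, `has_word_boundary(b"\300")` is false, which agrees with the `grep -P` result in the session above. In `b"a\300b"`, the error byte separates two words, so a boundary appears on each side of it.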

Thanks! While this would definitely be nice to fix before the release
(in the next week or so), it's enough of a corner case that I wouldn't
feel bad releasing without a fix.

For the record, this problem first arose in grep-2.19.




This bug report was last modified 2 years and 212 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.