GNU bug report logs - #15759
regression in grep 2.15 with PCRE searches

Package: grep;

Reported by: Dave Reisner <d <at> falconindy.com>

Date: Wed, 30 Oct 2013 17:40:06 UTC

Severity: normal

Merged with 15758

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15759 in the body.
You can then email your comments to 15759 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Wed, 30 Oct 2013 17:40:07 GMT)

Acknowledgement sent to Dave Reisner <d <at> falconindy.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 30 Oct 2013 17:40:08 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Dave Reisner <d <at> falconindy.com>
To: bug-grep <at> gnu.org
Subject: regression in grep 2.15 with PCRE searches
Date: Wed, 30 Oct 2013 13:23:10 -0400
Hi,

A user reported a regression in grep 2.15 which is easily reproducible
as ``grep -P foo /bin/mount''. The root cause is that pcre_exec returns
PCRE_ERROR_BADUTF8 for input that is not valid UTF-8 (a binary, here)
when the current locale uses UTF-8. grep does not handle this return
value and calls abort().

I bisected the breakage to commit 67436786c110bb which essentially
introduces UTF-8 validation for all searched data. In a large number of
file hierarchies, one may easily hit this via a recursive search.

I crafted the following inline diff which fixes the problem. While I'm
not sure of its correctness, it at least describes one possible fix.

  diff --git a/src/pcresearch.c b/src/pcresearch.c
  index ad5999d..ce55ab3 100644
  --- a/src/pcresearch.c
  +++ b/src/pcresearch.c
  @@ -176,6 +176,9 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
         switch (e)
           {
           case PCRE_ERROR_NOMATCH:
  +#ifdef HAVE_LANGINFO_CODESET
  +        case PCRE_ERROR_BADUTF8:
  +#endif
             return -1;

           case PCRE_ERROR_NOMEMORY:

Cheers,
Dave
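
For illustration, the idea behind the diff above can be sketched as a
standalone C function. This is only a sketch, not grep's code: the
SKETCH_* constants stand in for the ones in <pcre.h> (same values as in
PCRE 8.x, repeated so the example builds without libpcre), and the
"outcome" type and classify_exec_result() are hypothetical names.

```c
/* Stand-ins for constants from <pcre.h>; assumed here for illustration. */
enum
{
  SKETCH_ERROR_NOMATCH = -1,
  SKETCH_ERROR_NOMEMORY = -6,
  SKETCH_ERROR_BADUTF8 = -10
};

/* Hypothetical result type; not part of grep or PCRE. */
enum outcome
{
  OUTCOME_MATCH,      /* pattern matched */
  OUTCOME_NO_MATCH,   /* treat the buffer as non-matching and move on */
  OUTCOME_FATAL,      /* unrecoverable, e.g. out of memory */
  OUTCOME_UNHANDLED   /* the path on which grep 2.15 calls abort() */
};

/* Map a pcre_exec-style return value to an outcome.  Invalid UTF-8 is
   folded into the "no match" case, as in the proposed diff.  */
enum outcome
classify_exec_result (int e)
{
  if (e >= 0)
    return OUTCOME_MATCH;   /* 0 only means the offsets vector was too small */

  switch (e)
    {
    case SKETCH_ERROR_NOMATCH:
    case SKETCH_ERROR_BADUTF8:
      return OUTCOME_NO_MATCH;

    case SKETCH_ERROR_NOMEMORY:
      return OUTCOME_FATAL;

    default:
      return OUTCOME_UNHANDLED;
    }
}
```

With this mapping, a buffer that fails UTF-8 validation is reported the
same way as a buffer that simply does not match, so a recursive search
can keep going.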




Information forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Wed, 30 Oct 2013 21:20:03 GMT)

Message #8 received at 15759 <at> debbugs.gnu.org:

From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: Dave Reisner <dreisner <at> archlinux.org>
Cc: 15759 <at> debbugs.gnu.org, 15758 <at> debbugs.gnu.org
Subject: Re: bug#15758: grep 2.15 calls abort() on larger searches with -P
Date: Wed, 30 Oct 2013 21:19:33 +0000
merge 15758 15759
stop

bug#15758 is the same as bug#15759, so I'm merging them,
to avoid confusion or the risk of dispersing the discussion.

Regards,
  Stefano




Merged 15758 15759. Request was from Stefano Lattarini <stefano.lattarini <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 30 Oct 2013 21:20:05 GMT)

Information forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Thu, 31 Oct 2013 15:27:03 GMT)

Message #13 received at 15759 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 15759 <at> debbugs.gnu.org, 15758 <at> debbugs.gnu.org,
 Dave Reisner <dreisner <at> archlinux.org>
Subject: Re: bug#15758: grep 2.15 calls abort() on larger searches with -P
Date: Thu, 31 Oct 2013 08:26:10 -0700
> bug#15758 is the same as bug#15759, so I'm merging them,
> to avoid confusion or the risk of dispersing the discussion.

Thanks, Stefano and Dave.
With this and the nit about --version output being wrong, I now have
two reasons to make a new release.  I will take a look at the mass of
PCRE_ERROR* cases today.




Information forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Sat, 02 Nov 2013 23:07:02 GMT)

Message #16 received at 15759 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 15759 <at> debbugs.gnu.org, 15758 <at> debbugs.gnu.org,
 Dave Reisner <dreisner <at> archlinux.org>
Subject: Re: bug#15758: grep 2.15 calls abort() on larger searches with -P
Date: Sat, 2 Nov 2013 16:05:52 -0700
On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
...
> With this and the nit about --version output being wrong, I now have
> two reasons to make a new release.

Thanks again for the report, Dave.
Here's the fix I expect to push:
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Mon, 04 Nov 2013 19:39:03 GMT)

Message #19 received at 15759 <at> debbugs.gnu.org:

From: Dave Reisner <d <at> falconindy.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 15759 <at> debbugs.gnu.org, 15758 <at> debbugs.gnu.org,
 Dave Reisner <dreisner <at> archlinux.org>,
 Stefano Lattarini <stefano.lattarini <at> gmail.com>
Subject: Re: bug#15758: grep 2.15 calls abort() on larger searches with -P
Date: Mon, 4 Nov 2013 14:38:40 -0500
On Sat, Nov 02, 2013 at 04:05:52PM -0700, Jim Meyering wrote:
> On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
> ...
> > With this and the nit about --version output being wrong, I now have
> > two reasons to make a new release.
> 
> Thanks again for the report, Dave.
> Here's the fix I expect to push:

Thanks Jim.

Apologies for not responding to this sooner. I tested your patch and can
confirm that the behavior is better, but the new behavior still seems
like a regression. Take, for example, the simple instance of grep'ing
grep's own git repo.

# with grep 2.14
$ grep -rPw GNULIB
gnulib/m4/bison.m4:dnl Declaring YACC & YFLAGS precious will not be necessary after GNULIB
gnulib/lib/glob.c:   HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
gnulib/lib/netdb.in.h:   GNULIB getaddrinfo() replacement, so are not yet needed.
gnulib/lib/argp.h:/* GNULIB makes sure both program_invocation_name and

# with grep built from HEAD
$ ./src/grep -rPw GNULIB
./src/grep: invalid UTF-8 byte sequence in input

I would expect invalid UTF-8 not to stop grep cold; grep should continue
on, ignoring the non-matching data, just as it does without the -P
flag.

Cheers,
Dave
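
The behavior asked for here (skip data that is not valid UTF-8 instead
of giving up) can be sketched as follows. This is a minimal
illustration under stated assumptions, not grep's or PCRE's
implementation: utf8_valid() is a generic well-formedness check,
contains() is a plain substring search standing in for the regex
match, and count_matching_lines() is a hypothetical driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if s[0..n) is well-formed UTF-8: no stray or missing
   continuation bytes, no overlong forms, no surrogates, and nothing
   above U+10FFFF.  A generic check, not PCRE's own validator.  */
bool
utf8_valid (const unsigned char *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t need;              /* continuation bytes expected */
      unsigned long min, cp;    /* smallest legal value, decoded value */

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { need = 1; min = 0x80;    cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { need = 2; min = 0x800;   cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { need = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;           /* stray continuation or invalid lead byte */

      if (n - i - 1 < need)
        return false;           /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;

      i += need + 1;
    }
  return true;
}

/* Return true if needle (NUL-terminated) occurs in s[0..n).  */
bool
contains (const unsigned char *s, size_t n, const char *needle)
{
  size_t m = strlen (needle);
  for (size_t i = 0; i + m <= n; i++)
    if (memcmp (s + i, needle, m) == 0)
      return true;
  return false;
}

/* Count newline-delimited lines of buf[0..len) that contain needle.
   Lines that are not valid UTF-8 are skipped rather than treated as
   a fatal error.  */
size_t
count_matching_lines (const unsigned char *buf, size_t len,
                      const char *needle)
{
  assert (buf != NULL && needle != NULL);
  size_t count = 0, start = 0;
  while (start < len)
    {
      const unsigned char *nl = memchr (buf + start, '\n', len - start);
      size_t end = nl ? (size_t) (nl - buf) : len;
      if (utf8_valid (buf + start, end - start)
          && contains (buf + start, end - start, needle))
        count++;
      start = end + 1;
    }
  return count;
}
```

Calling count_matching_lines() on a buffer that mixes text lines with a
line containing a 0xFF byte counts only the well-formed lines that
contain the needle; the malformed line is passed over.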




Information forwarded to bug-grep <at> gnu.org:
bug#15759; Package grep. (Tue, 05 Nov 2013 16:18:02 GMT)

Message #22 received at 15759 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Dave Reisner <d <at> falconindy.com>
Cc: 15759 <at> debbugs.gnu.org, 15758 <at> debbugs.gnu.org,
 Dave Reisner <dreisner <at> archlinux.org>,
 Stefano Lattarini <stefano.lattarini <at> gmail.com>
Subject: Re: bug#15758: grep 2.15 calls abort() on larger searches with -P
Date: Tue, 5 Nov 2013 08:17:15 -0800
On Mon, Nov 4, 2013 at 11:38 AM, Dave Reisner <d <at> falconindy.com> wrote:
> On Sat, Nov 02, 2013 at 04:05:52PM -0700, Jim Meyering wrote:
>> On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
>> ...
>> > With this and the nit about --version output being wrong, I now have
>> > two reasons to make a new release.
>>
>> Thanks again for the report, Dave.
>> Here's the fix I expect to push:
>
> Thanks Jim.
>
> Apologies for not responding to this sooner. I tested your patch and can
> confirm that the behavior is better, but the new behavior still seems
> like a regression. Take, for example, the simple instance of grep'ing
> grep's own git repo.
>
> # with grep 2.14
> $ grep -rPw GNULIB
> gnulib/m4/bison.m4:dnl Declaring YACC & YFLAGS precious will not be necessary after GNULIB
> gnulib/lib/glob.c:   HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB.  */
> gnulib/lib/netdb.in.h:   GNULIB getaddrinfo() replacement, so are not yet needed.
> gnulib/lib/argp.h:/* GNULIB makes sure both program_invocation_name and
>
> # with grep built from HEAD
> $ ./src/grep -rPw GNULIB
> ./src/grep: invalid UTF-8 byte sequence in input
>
> I would expect invalid UTF-8 not to stop grep cold; grep should continue
> on, ignoring the non-matching data, just as it does without the -P
> flag.

Hi Dave,

I agree, and so does pcregrep.  There are a few other problems with
grep's PCRE driver code: for example, a problem (no matter how serious)
in one file should not cause the entire grep run to exit; grep should
continue processing remaining files. And when grep reports the problem,
it should include at least the file name in the diagnostic.

I will fix those before the upcoming snapshot.

Thanks,
Jim
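
The two points above (keep going after a problem in one file, and name
the file in the diagnostic) might look like this in outline. It is a
sketch under assumptions, not the fix that was pushed: file_result,
search_fn, search_all() and demo_search() are hypothetical names, the
diagnostic format is invented for illustration, and the return value
follows grep's usual exit-status convention (0 = a line was selected,
1 = none, 2 = trouble).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file result; not part of grep.  */
enum file_result { FILE_MATCHED, FILE_NO_MATCH, FILE_ERROR };

/* Hypothetical callback that searches one named file.  */
typedef enum file_result (*search_fn) (const char *name);

/* Search every file in names[0..n).  A failure in one file produces a
   diagnostic that includes the file name and is remembered for the
   exit status, but the loop carries on with the remaining files.  */
int
search_all (const char *const *names, size_t n, search_fn search,
            const char *prog, FILE *err)
{
  int matched = 0, trouble = 0;

  for (size_t i = 0; i < n; i++)
    switch (search (names[i]))
      {
      case FILE_MATCHED:
        matched = 1;
        break;

      case FILE_NO_MATCH:
        break;

      case FILE_ERROR:
        fprintf (err, "%s: %s: invalid UTF-8 byte sequence in input\n",
                 prog, names[i]);
        trouble = 1;
        break;
      }

  return trouble ? 2 : matched ? 0 : 1;
}

/* Stand-in search used only to exercise search_all(): pretends that
   "binary.dat" is malformed and that "glob.c" contains a match.  */
enum file_result
demo_search (const char *name)
{
  if (strcmp (name, "binary.dat") == 0)
    return FILE_ERROR;
  if (strcmp (name, "glob.c") == 0)
    return FILE_MATCHED;
  return FILE_NO_MATCH;
}
```

Passing a list such as {"binary.dat", "glob.c"} with demo_search prints
one diagnostic naming binary.dat and still visits glob.c; the status
returned reflects that trouble occurred.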




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 20 Feb 2014 12:24:03 GMT)

This bug report was last modified 11 years and 123 days ago.
