GNU bug report logs - #15759
regression in grep 2.15 with PCRE searches
Reported by: Dave Reisner <d <at> falconindy.com>
Date: Wed, 30 Oct 2013 17:40:06 UTC
Severity: normal
Merged with 15758
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15759 in the body.
You can then email your comments to 15759 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Wed, 30 Oct 2013 17:40:07 GMT)
Acknowledgement sent to Dave Reisner <d <at> falconindy.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 30 Oct 2013 17:40:08 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi,
A user reported a regression in grep 2.15 which is easily reproducible
as ``grep -P foo /bin/mount''. The root cause is that pcre_exec is
returning PCRE_ERROR_BADUTF8 when the current locale supports UTF-8.
This is unhandled by grep and causes it to call abort().
I bisected the breakage to commit 67436786c110bb which essentially
introduces UTF-8 validation for all searched data. In a large number of
file hierarchies, one may easily hit this via a recursive search.
I crafted the following inline diff which fixes the problem. While I'm
not sure of its correctness, it at least describes one possible fix.
diff --git a/src/pcresearch.c b/src/pcresearch.c
index ad5999d..ce55ab3 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -176,6 +176,9 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
switch (e)
{
case PCRE_ERROR_NOMATCH:
+#ifdef HAVE_LANGINFO_CODESET
+ case PCRE_ERROR_BADUTF8:
+#endif
return -1;
case PCRE_ERROR_NOMEMORY:
Cheers,
Dave
Information forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Wed, 30 Oct 2013 21:20:03 GMT)
Message #8 received at 15759 <at> debbugs.gnu.org:
merge 15758 15759
stop
bug#15758 is the same as bug#15759, so I'm merging them,
to avoid confusion or the risk of dispersing the discussion.
Regards,
Stefano
Merged 15758 15759. Request was from Stefano Lattarini <stefano.lattarini <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 30 Oct 2013 21:20:05 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Thu, 31 Oct 2013 15:27:03 GMT)
Message #13 received at 15759 <at> debbugs.gnu.org:
> bug#15758 is the same as bug#15759, so I'm merging them,
> to avoid confusion or the risk of dispersing the discussion.
Thanks, Stefano and Dave.
With this and the nit about --version output being wrong, I now have
two reasons to make a new release. I will take a look at the mass of
PCRE_ERROR* cases today.
Information forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Sat, 02 Nov 2013 23:07:02 GMT)
Message #16 received at 15759 <at> debbugs.gnu.org:
On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
...
> With this and the nit about --version output being wrong, I now have
> two reasons to make a new release.
Thanks again for the report, Dave.
Here's the fix I expect to push:
[k.txt (text/plain, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Mon, 04 Nov 2013 19:39:03 GMT)
Message #19 received at 15759 <at> debbugs.gnu.org:
On Sat, Nov 02, 2013 at 04:05:52PM -0700, Jim Meyering wrote:
> On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
> ...
> > With this and the nit about --version output being wrong, I now have
> > two reasons to make a new release.
>
> Thanks again for the report, Dave.
> Here's the fix I expect to push:
Thanks Jim.
Apologies for not responding to this sooner. I tested your patch and can
confirm that the behavior is better, but the new behavior still seems
like a regression. Take, for example, the simple instance of grep'ing
grep's own git repo.
# with grep 2.14
$ grep -rPw GNULIB
gnulib/m4/bison.m4:dnl Declaring YACC & YFLAGS precious will not be necessary after GNULIB
gnulib/lib/glob.c: HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB. */
gnulib/lib/netdb.in.h: GNULIB getaddrinfo() replacement, so are not yet needed.
gnulib/lib/argp.h:/* GNULIB makes sure both program_invocation_name and
# with grep built from HEAD
$ ./src/grep -rPw GNULIB
./src/grep: invalid UTF-8 byte sequence in input
I would expect that the invalid UTF-8 wouldn't stop grep cold, but
continue on, ignoring the non-matching data, just as grep without the -P
flag does.
Cheers,
Dave
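One way to get the behavior requested above, sketched in C under stated assumptions: each input line is checked for well-formed UTF-8 before matching, and a line that fails the check is treated as non-matching instead of stopping the search. The helper names valid_utf8 and count_matching_lines are hypothetical, the validity rules follow RFC 3629 rather than PCRE's own checker, and a plain substring test stands in for pcre_exec. This is an illustration only, not grep's actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if BUF[0..SIZE) is well-formed UTF-8 per
   RFC 3629 (no stray continuation bytes, no overlong forms, no UTF-16
   surrogates, nothing above U+10FFFF). */
static bool
valid_utf8 (unsigned char const *buf, size_t size)
{
  size_t i = 0;
  while (i < size)
    {
      unsigned char c = buf[i];
      size_t len;
      unsigned long cp;
      if (c < 0x80)
        {
          i++;                  /* ASCII byte */
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        {
          len = 2;
          cp = c & 0x1F;
        }
      else if (0xE0 <= c && c <= 0xEF)
        {
          len = 3;
          cp = c & 0x0F;
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          len = 4;
          cp = c & 0x07;
        }
      else
        return false;           /* continuation byte, 0xC0/0xC1, 0xF5..0xFF */
      if (size - i < len)
        return false;           /* sequence truncated by end of buffer */
      for (size_t k = 1; k < len; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false;       /* expected a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
      if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)
          || (0xD800 <= cp && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;           /* overlong, surrogate, or out of range */
      i += len;
    }
  return true;
}

/* Hypothetical helper: count the '\n'-separated lines of BUF that contain
   PAT.  A line that is not valid UTF-8 is skipped as non-matching, so the
   scan keeps going instead of failing.  The substring test is a stand-in
   for a real PCRE match. */
static size_t
count_matching_lines (char const *buf, size_t size, char const *pat)
{
  size_t count = 0, patlen = strlen (pat);
  char const *line = buf, *end = buf + size;
  while (line < end)
    {
      char const *nl = memchr (line, '\n', (size_t) (end - line));
      size_t len = nl ? (size_t) (nl - line) : (size_t) (end - line);
      if (valid_utf8 ((unsigned char const *) line, len))
        {
          for (size_t j = 0; j + patlen <= len; j++)
            {
              if (memcmp (line + j, pat, patlen) == 0)
                {
                  count++;
                  break;
                }
            }
        }
      line = nl ? nl + 1 : end;  /* advance to the next line, if any */
    }
  return count;
}
```

With input "GNULIB ok\nbad \xff GNULIB\nno\nGNULIB end" and pattern "GNULIB", the second line contains the invalid byte 0xFF and is skipped, so two lines are counted rather than the whole search aborting.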
Information forwarded to bug-grep <at> gnu.org: bug#15759; Package grep. (Tue, 05 Nov 2013 16:18:02 GMT)
Message #22 received at 15759 <at> debbugs.gnu.org:
On Mon, Nov 4, 2013 at 11:38 AM, Dave Reisner <d <at> falconindy.com> wrote:
> On Sat, Nov 02, 2013 at 04:05:52PM -0700, Jim Meyering wrote:
>> On Thu, Oct 31, 2013 at 8:26 AM, Jim Meyering <jim <at> meyering.net> wrote:
>> ...
>> > With this and the nit about --version output being wrong, I now have
>> > two reasons to make a new release.
>>
>> Thanks again for the report, Dave.
>> Here's the fix I expect to push:
>
> Thanks Jim.
>
> Apologies for not responding to this sooner. I tested your patch and can
> confirm that the behavior is better, but the new behavior still seems
> like a regression. Take, for example, the simple instance of grep'ing
> grep's own git repo.
>
> # with grep 2.14
> $ grep -rPw GNULIB
> gnulib/m4/bison.m4:dnl Declaring YACC & YFLAGS precious will not be necessary after GNULIB
> gnulib/lib/glob.c: HAVE_STRUCT_DIRENT_D_TYPE plays the same role in GNULIB. */
> gnulib/lib/netdb.in.h: GNULIB getaddrinfo() replacement, so are not yet needed.
> gnulib/lib/argp.h:/* GNULIB makes sure both program_invocation_name and
>
> # with grep built from HEAD
> $ ./src/grep -rPw GNULIB
> ./src/grep: invalid UTF-8 byte sequence in input
>
> I would expect that the invalid UTF-8 wouldn't stop grep cold, but
> continue on, ignoring the non-matching data, just as grep without the -P
> flag does.
Hi Dave,
I agree, and so does pcregrep. There are a few other problems with
grep's PCRE driver code: for example, a problem (no matter how serious)
in one file should not cause the entire grep run to exit; grep should
continue processing remaining files. And when grep reports the problem,
it should include at least the file name in the diagnostic.
I will fix those before the upcoming snapshot.
Thanks,
Jim
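The reporting side of the plan above might look like the following minimal C sketch. It assumes each file has already been searched independently and its outcome recorded. A failure in one file produces a diagnostic that names the file, and the loop moves on to the next file instead of exiting. The struct file_result type and the finish_run function are hypothetical, and the "grep:" prefix is hard-coded for simplicity. Only the exit-status convention is grep's documented behavior: 0 if a line was selected, 1 if none was, and 2 if an error occurred. The -q exception is ignored here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one file's search outcome. */
struct file_result
{
  char const *name;   /* file name, used in diagnostics */
  bool matched;       /* at least one line was selected */
  char const *error;  /* NULL on success, otherwise a message */
};

/* Print one diagnostic per failed file, prefixed with its name, without
   stopping at the first failure.  Return 2 if any error occurred,
   otherwise 0 if any line was selected and 1 if none was. */
static int
finish_run (struct file_result const *r, size_t n, FILE *diag)
{
  bool any_match = false, any_error = false;
  for (size_t i = 0; i < n; i++)
    {
      if (r[i].error)
        {
          fprintf (diag, "grep: %s: %s\n", r[i].name, r[i].error);
          any_error = true;     /* remember it, but keep going */
        }
      else if (r[i].matched)
        any_match = true;
    }
  return any_error ? 2 : any_match ? 0 : 1;
}
```

Keeping the per-file outcome separate from the decision to exit is what lets one bad file be reported by name while matches from the remaining files are still printed.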
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 20 Feb 2014 12:24:03 GMT)
This bug report was last modified 11 years and 123 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.