GNU bug report logs - #15758
grep 2.15 calls abort() on larger searches with -P
Message #25 received at 15758 <at> debbugs.gnu.org:
On Tue, Nov 05, 2013 at 08:17:15AM -0800, Jim Meyering wrote:
...
>
> Hi Dave,
>
> I agree, and so does pcregrep. There are a few other problems with
> grep's PCRE driver code: for example, a problem (no matter how serious)
> in one file should not cause the entire grep run to exit; grep should
> continue processing remaining files. And when grep reports the problem,
> it should include at least the file name in the diagnostic.
>
> I will fix those before the upcoming snapshot.
>
> Thanks,
> Jim
>
>
>
Hi there,
This bug was also reported in Debian ( http://bugs.debian.org/730472 ).
Having looked at it, I think the most suitable solution for the moment
is to pass the PCRE_NO_UTF8_CHECK flag instead of PCRE_UTF8, so that
PCRE does not check whether its input is valid UTF-8. The resulting behavior
is similar to that of grep before 2.15 (see 15758-PCRE-no-check-UTF8.patch):
$ grep -Pr "DEFINE" /usr/lib/linux-kbuild-3.2/
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc: if ($prototype =~ m/DEFINE_SINGLE_EVENT\((.*?),/) {
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc: if ($prototype =~ m/DEFINE_EVENT\((.*?),(.*?),/) {
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc:## if ($prototype =~ m/SYSCALL_DEFINE0\s*\(\s*(a-zA-Z0-9_)*\s*\)/) {
...
I have also tested printing a message whenever a file is invalid
(15758-PCRE-no-exit-UTF8.patch), but the results can be annoying, since a
warning is shown even for files that do not match:
$ grep -Pr "DEFINE" /usr/lib/linux-kbuild-3.2/
grep: invalid UTF-8 byte sequence in input
grep: invalid UTF-8 byte sequence in input
grep: invalid UTF-8 byte sequence in input
grep: invalid UTF-8 byte sequence in input
grep: invalid UTF-8 byte sequence in input
grep: invalid UTF-8 byte sequence in input
...
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc: if ($prototype =~ m/DEFINE_SINGLE_EVENT\((.*?),/) {
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc: if ($prototype =~ m/DEFINE_EVENT\((.*?),(.*?),/) {
/usr/lib/linux-kbuild-3.2/scripts/kernel-doc:## if ($prototype =~ m/SYSCALL_DEFINE0\s*\(\s*(a-zA-Z0-9_)*\s*\)/) {
...
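As a rough illustration of the behavior discussed in this thread (one diagnostic per invalid file, naming the file, without stopping the run), here is a minimal self-contained sketch in C. It is not taken from grep or from either attached patch. The names utf8_valid, report_invalid and struct input are hypothetical, and the "grep: FILE: ..." message format is only an assumption about how the file name could be included.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true if buf[0..len) is well-formed UTF-8.  Rejects truncated
   sequences, overlong forms, UTF-16 surrogates and code points above
   U+10FFFF.  (Hypothetical helper, not grep's own code.)  */
static bool
utf8_valid (const unsigned char *buf, size_t len)
{
  /* Smallest code point that legitimately needs n continuation bytes.  */
  static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t n;                 /* number of continuation bytes */
      unsigned long cp;         /* decoded code point */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII */
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
      else
        return false;           /* stray continuation or invalid lead byte */

      if (len - i <= n)
        return false;           /* sequence runs past the end of the buffer */

      for (size_t k = 1; k <= n; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false;       /* not a continuation byte */
          cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

      if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;           /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return true;
}

/* One input to check: a name plus its contents (hypothetical type).  */
struct input
{
  const char *name;
  const unsigned char *data;
  size_t len;
};

/* Check every input, print one diagnostic per invalid one (with its
   name), and keep going.  Returns the number of invalid inputs.  */
static size_t
report_invalid (const struct input *in, size_t count, FILE *err)
{
  size_t bad = 0;
  for (size_t i = 0; i < count; i++)
    if (!utf8_valid (in[i].data, in[i].len))
      {
        fprintf (err, "grep: %s: invalid UTF-8 byte sequence in input\n",
                 in[i].name);
        bad++;
      }
  return bad;
}
```

A caller would fill an array of struct input with each file's contents and pass it to report_invalid together with stderr. Because the loop never exits early, one bad file does not prevent the remaining files from being processed.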
I propose 15758-PCRE-no-check-UTF8.patch as a solution, at least a temporary one.
Regards,
Santiago
[15758-PCRE-no-check-UTF8.patch (text/x-diff, attachment)]
[15758-PCRE-no-exit-UTF8.patch (text/x-diff, attachment)]
This bug report was last modified 11 years and 121 days ago.