GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
Reported by: Santiago <santiago <at> debian.org>
Date: Wed, 29 Jan 2014 09:46:02 UTC
Severity: important
Found in version 2.16
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>
On Mon, Apr 21, 2014 at 11:03 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
>>
>> http://bugs.exim.org/show_bug.cgi?id=1468
>
>
> Thanks. The response there makes it clear that if grep passes arbitrary
> binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined behavior
> will result (maybe infinite loop, core dump, etc.). We can't have undefined
> behavior in grep. A simple fix is to avoid using PCRE_NO_UTF8_CHECK so I
> installed the attached patch to do that. Perhaps we can think of a better
> way at some point. In the meantime I'm taking the liberty of closing
> Bug#17245 and Bug#16586.

Thanks for the patch, but I'm not sure I like the consequences:
that anyone using grep -P to search data that is even a tiny bit
inconsistent with their UTF-8 locale will now get an exit status of
2 rather than the matches they used to get. I would prefer to test for
working PCRE support and disable -P if it is deemed inadequate,
but that may have to wait for the release of a new version of
libpcre.

In any case, I found that this additional change is required,
at least on OS X, to avoid a test failure:
[k.txt (text/plain, attachment)]
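
For illustration only: one direction a "better way" could take is to validate a buffer as UTF-8 before handing it to PCRE, and to skip the library's own check only when the buffer is known to be well-formed. The sketch below is not the patch installed above and not the contents of k.txt; the function name buffer_is_valid_utf8 is hypothetical and belongs to neither grep nor PCRE.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of grep or PCRE.
   Return true if buf[0..len) is well-formed UTF-8 (RFC 3629):
   no stray or missing continuation bytes, no overlong encodings,
   no surrogates, and no code points above U+10FFFF.  */
bool
buffer_is_valid_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      size_t need;        /* continuation bytes expected after C */
      unsigned long min;  /* smallest code point legal at this length */
      unsigned long cp;   /* decoded code point */

      if (c < 0x80)
        {
          i++;            /* plain ASCII byte */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        { need = 1; min = 0x80; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { need = 2; min = 0x800; cp = c & 0x0F; }
      else if (c >= 0xF0 && c <= 0xF4)
        { need = 3; min = 0x10000; cp = c & 0x07; }
      else
        return false;     /* stray continuation or invalid lead byte */

      if (len - i - 1 < need)
        return false;     /* sequence truncated at end of buffer */

      for (size_t k = 1; k <= need; k++)
        {
          unsigned char cc = buf[i + k];
          if ((cc & 0xC0) != 0x80)
            return false; /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;     /* overlong, out of range, or surrogate */

      i += need + 1;
    }

  return true;
}
```

Under this assumption, a caller would pass PCRE_NO_UTF8_CHECK only for buffers where the helper returns true. For other buffers it would have to choose between reporting an error and treating the data as binary, which is the trade-off discussed in the message above.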