GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
Reported by: Santiago <santiago <at> debian.org>
Date: Wed, 29 Jan 2014 09:46:02 UTC
Severity: important
Found in version 2.16
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>
On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> anyone using grep -P to search data that is even a tiny bit
>> inconsistent with their UTF-8 locale will now get an exit status of
>> 2 rather than the matches they used to get.
>
>
> Yes, I don't like that either, but <http://bugs.exim.org/1468> says libpcre
Oh! I had not read that. That is disappointing.
> intends to have undefined behavior here. If so, it wouldn't help to wait
> until the next libpcre release, which may well have a serious bug of this
> form in a different area, a bug that's not easy to test for.
Indeed.
> Perhaps somebody should modify grep -P to discard input lines containing
> non-UTF-8 data instead of presenting them to libpcre. That way, it would be
> safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P should report
> an error and exit with status 2 if it discards input due to encoding errors,
> it can also report matches in lines that do not contain encoding errors, so
> that users can see both the error messages and the matches.
That sounds reasonable, but I don't like the requirement that
one make two passes over each subject text.
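For illustration only, the proposal quoted above might be sketched roughly as follows. This is not grep's actual code: the names line_is_valid_utf8, scan_valid_lines, line_fn and count_line are hypothetical, and the callback merely stands in for the point where a validated line could be handed to libpcre with PCRE_NO_UTF8_CHECK. The sketch validates each line in a separate pass before matching, which is exactly the two-pass cost objected to above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the N bytes at S are well-formed UTF-8: no overlong
   forms, no surrogates, nothing above U+10FFFF.  (Hypothetical helper.)  */
bool
line_is_valid_utf8 (const unsigned char *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;                          /* bytes in this sequence */
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        {
          i++;                             /* ASCII */
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        len = 2;
      else if (0xE0 <= c && c <= 0xEF)
        {
          len = 3;
          if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
          if (c == 0xED) hi = 0x9F;        /* reject surrogates */
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          len = 4;
          if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
          if (c == 0xF4) hi = 0x8F;        /* reject > U+10FFFF */
        }
      else
        return false;                      /* stray continuation or bad lead */

      if (n - i < len)
        return false;                      /* truncated sequence */
      if (s[i + 1] < lo || hi < s[i + 1])
        return false;
      for (size_t k = 2; k < len; k++)
        if (s[i + k] < 0x80 || 0xBF < s[i + k])
          return false;
      i += len;
    }
  return true;
}

/* Callback type for lines that passed validation.  (Hypothetical.)  */
typedef void (*line_fn) (const unsigned char *line, size_t len, void *ctx);

/* First pass: validate each newline-terminated line of BUF.  Valid lines
   go to FN, which is where a matcher could safely skip its own UTF-8
   check; invalid lines are discarded.  Return 2 if any line was
   discarded, else 0.  */
int
scan_valid_lines (const unsigned char *buf, size_t n, line_fn fn, void *ctx)
{
  int status = 0;
  size_t start = 0;
  while (start < n)
    {
      size_t end = start;
      while (end < n && buf[end] != '\n')
        end++;
      if (line_is_valid_utf8 (buf + start, end - start))
        fn (buf + start, end - start, ctx);
      else
        status = 2;                        /* encoding error seen */
      start = end + 1;
    }
  return status;
}

/* Example callback: count the lines that reach the matcher.  */
void
count_line (const unsigned char *line, size_t len, void *ctx)
{
  (void) line;
  (void) len;
  ++*(size_t *) ctx;
}
```

Under these assumptions, lines without encoding errors are still passed on for matching, while the returned status records that some input was discarded.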
This bug report was last modified 11 years and 33 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.