GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Wed, 29 Jan 2014 09:46:02 UTC

Severity: important

Found in version 2.16

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17245 <17245 <at> debbugs.gnu.org>, Santiago <santiago <at> debian.org>, 16586 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#16586: bug#17245: GREP BUG: grep -P and binary files
Date: Thu, 24 Apr 2014 08:29:07 -0700

On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> anyone using grep -P to search data that is even a tiny bit
>> inconsistent with their UTF-8 locale will now get an exit status of
>> 2 rather than the matches they used to get.
>
>
> Yes, I don't like that either, but <http://bugs.exim.org/1468> says libpcre

Oh! I had not read that. That is disappointing.

> intends to have undefined behavior here.  If so, it wouldn't help to wait
> until the next libpcre release, which may well have a serious bug of this
> form in a different area, a bug that's not easy to test for.

Indeed.

> Perhaps somebody should modify grep -P to discard input lines containing
> non-UTF-8 data instead of presenting them to libpcre.  That way, it would be
> safe for grep -P to use PCRE_NO_UTF8_CHECK.  Although grep -P should report
> an error and exit with status 2 if it discards input due to encoding errors,
> it can also report matches in lines that do not contain encoding errors, so
> that users can see both the error messages and the matches.

That sounds reasonable, but I don't like the requirement that
one make two passes over each subject text.
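
To make the proposal above concrete, here is a minimal sketch of the
filtering idea, not grep's actual implementation. It is written in Python
for brevity and uses Python's own `re` module as a stand-in for libpcre;
the function names `is_valid_utf8` and `grep_valid_lines` are hypothetical.
It skips lines that are not valid UTF-8, still reports matches from the
remaining lines, and returns status 2 if anything was skipped.

```python
import re


def is_valid_utf8(line: bytes) -> bool:
    """Return True if the whole line decodes as strict UTF-8."""
    try:
        line.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def grep_valid_lines(pattern: str, data: bytes):
    """Search only the lines of `data` that are valid UTF-8.

    Returns (matches, skipped, status), where status follows grep's
    convention: 0 = match found, 1 = no match, 2 = error.
    Assumption: Python's `re` stands in for libpcre here.
    """
    regex = re.compile(pattern)
    matches = []
    skipped = 0
    for line in data.splitlines():
        # First pass over the line: validate the encoding.
        if not is_valid_utf8(line):
            skipped += 1
            continue
        # Second pass over the line: the actual pattern search.
        if regex.search(line.decode("utf-8")):
            matches.append(line)
    if skipped:
        status = 2  # encoding errors were seen, even if some lines matched
    elif matches:
        status = 0
    else:
        status = 1
    return matches, skipped, status


if __name__ == "__main__":
    sample = b"foo bar\nfoo \xff bad\nbaz\nfoo again\n"
    found, dropped, code = grep_valid_lines("foo", sample)
    for m in found:
        print(m.decode("utf-8"))
    print(f"skipped {dropped} line(s) with invalid UTF-8; exit status {code}")
```

Note that the sketch reads each line twice, once to validate and once to
search, which is exactly the two-pass cost objected to above.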

This bug report was last modified 11 years and 87 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.