GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Wed, 29 Jan 2014 09:46:02 UTC

Severity: important

Found in version 2.16

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>


Message #43 received at 16586 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, 16586 <at> debbugs.gnu.org, 
 Santiago <santiago <at> debian.org>
Cc: 17245 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: Re: bug#16586: bug#17245: GREP BUG: grep -P and binary files
Date: Wed, 23 Apr 2014 22:39:10 -0700

Jim Meyering wrote:
> anyone using grep -P to search data that is even a tiny bit
> inconsistent with their UTF-8 locale will now get an exit status of
> 2 rather than the matches they used to get.

Yes, I don't like that either, but <http://bugs.exim.org/1468> says 
libpcre intends to have undefined behavior here.  If so, it wouldn't 
help to wait until the next libpcre release, which may well have a 
serious bug of this form in a different area, a bug that's not easy to 
test for.

Perhaps somebody should modify grep -P to discard input lines containing 
non-UTF-8 data instead of presenting them to libpcre.  That way, it 
would be safe for grep -P to use PCRE_NO_UTF8_CHECK.  Although grep -P 
should report an error and exit with status 2 if it discards input due 
to encoding errors, it can also report matches in lines that do not 
contain encoding errors, so that users can see both the error messages 
and the matches.
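
The behavior proposed above could be sketched roughly as follows.  This 
is only an illustration in Python, not grep's code: the function name 
grep_valid_utf8_lines is hypothetical, Python's re module stands in for 
libpcre, and strict UTF-8 decoding stands in for whatever validity check 
grep would perform before it could safely pass PCRE_NO_UTF8_CHECK.

```python
import re
import sys


def grep_valid_utf8_lines(pattern, data):
    """Search only the lines of `data` (bytes) that are valid UTF-8.

    Hypothetical helper, not part of grep.  Returns (matches, status):
    the matching lines as str, and a grep-like exit status that is
    2 if any line was discarded for an encoding error, otherwise
    0 if at least one line matched and 1 if none did.
    """
    regex = re.compile(pattern)
    matches = []
    discarded = 0
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            # Strict decoding raises on any invalid UTF-8 sequence, so the
            # regex engine never sees malformed input.
            line = raw.decode("utf-8")
        except UnicodeDecodeError:
            discarded += 1
            print(f"line {lineno}: invalid UTF-8, skipped", file=sys.stderr)
            continue
        if regex.search(line):
            matches.append(line)
    if discarded:
        # Report the error through the status, but still return the matches
        # found in the lines that were valid.
        return matches, 2
    return matches, 0 if matches else 1
```

Given the three input lines b"caf\xc3\xa9", b"bad \xff" and 
b"plain cafe" and the pattern caf\w, this sketch reports the first and 
third lines as matches, prints a diagnostic for the second, and returns 
status 2, so the user sees both the matches and the error.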





This bug report was last modified 11 years and 33 days ago.
