GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

Package: grep

Reported by: Santiago <santiago <at> debian.org>

Date: Wed, 29 Jan 2014 09:46:02 UTC

Severity: important

Found in version 2.16

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Santiago <santiago <at> debian.org>
Subject: bug#16586: closed (Re: bug#17245: GREP BUG: grep -P and binary files)
Date: Mon, 21 Apr 2014 18:04:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16586 <at> debbugs.gnu.org.

-- 
16586: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16586
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17245-done <at> debbugs.gnu.org, 
 16586-done <at> debbugs.gnu.org
Subject: Re: bug#17245: GREP BUG: grep -P and binary files
Date: Mon, 21 Apr 2014 11:03:10 -0700
[Message part 3 (text/plain, inline)]
On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
> http://bugs.exim.org/show_bug.cgi?id=1468 

Thanks.  The response there makes it clear that if grep passes arbitrary 
binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined 
behavior will result (maybe infinite loop, core dump, etc.).  We can't 
have undefined behavior in grep.  A simple fix is to avoid using 
PCRE_NO_UTF8_CHECK so I installed the attached patch to do that.  
Perhaps we can think of a better way at some point.  In the meantime I'm 
taking the liberty of closing Bug#17245 and Bug#16586.
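
The idea described above (validate input as UTF-8 rather than trusting it) can be illustrated with a small sketch. This is not grep's code and does not use the PCRE API; the function name `search_lines` and its return shape are hypothetical, and Python's `re` module stands in for the matcher. It assumes that a line which fails strict UTF-8 decoding should be reported and skipped instead of being passed to the regex engine.

```python
import re


def search_lines(pattern, data):
    """Match `pattern` against each line of `data` (bytes).

    Hypothetical helper: every line is validated as UTF-8 first, so the
    matcher never sees an invalid sequence. Returns a pair
    (matching_lines, rejected_line_numbers).
    """
    regex = re.compile(pattern)
    matches = []
    rejected = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            # bytes.decode is strict by default and raises on invalid UTF-8.
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            rejected.append(lineno)
            continue
        if regex.search(text):
            matches.append(text)
    return matches, rejected


if __name__ == "__main__":
    sample = b"\xe9\x65\n\xab\nhe\n"
    print(search_lines(r".e|.?z", sample))
```

The trade-off in this sketch is the same one the message accepts: invalid lines are rejected outright, which is safe but means a match inside such a line is never reported.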
[0001-grep-P-now-rejects-invalid-input-sequences-in-UTF-8-.patch (text/x-patch, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Santiago <santiago <at> debian.org>
To: submit <at> debbugs.gnu.org
Subject: grep: infinite loop in grep -P on some files with invalid UTF-8
 sequences
Date: Wed, 29 Jan 2014 10:43:46 +0100
Package: grep
Version: 2.16
Severity: important

Hi there,

I am forwarding this bug from Debian's BTS. The latest changes to -P
introduced another problem. I've confirmed this behavior with the latest
Debian package:

----- Forwarded message from Vincent Lefevre <vincent <at> vinc17.net> -----

[snip]


grep -P loops on some files with invalid UTF-8 sequences, e.g.

$ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
�e
�e
�e
�e
�e
�e
�e
�e
�e
�e

(the infinite loop is interrupted here by a broken pipe due to
the "head").
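
To see why the input in the command above counts as invalid UTF-8, the following sketch locates the offending bytes. It is only an illustration: the helper name `invalid_utf8_offsets` is hypothetical, and it relies on Python's strict UTF-8 decoder rather than on anything in grep or PCRE. In UTF-8, 0xE9 starts a three-byte sequence and must be followed by two continuation bytes (0x80 to 0xBF), but here it is followed by 0x65 ("e"); 0xAB is a continuation byte with no lead byte before it.

```python
def invalid_utf8_offsets(data):
    """Return the start offset of each undecodable byte run in `data`.

    Hypothetical helper: it repeatedly decodes the remaining bytes and
    uses the position reported by UnicodeDecodeError to find each error.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            offsets.append(pos + err.start)
            pos += err.end
    return offsets


if __name__ == "__main__":
    # Same bytes as the printf argument in the report.
    print(invalid_utf8_offsets(b"\xe9\x65\n\xab\n"))
```

For the bytes from the report, the helper flags offsets 0 and 3, that is, the 0xE9 and 0xAB bytes.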

It seems that the fix of

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472

didn't solve all the problems.

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.12-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages grep depends on:
ii  dpkg          1.17.6
ii  install-info  5.2.0.dfsg.1-2
ii  libc6         2.17-97
ii  libpcre3      1:8.31-2

grep recommends no packages.

grep suggests no packages.

-- no debconf information


----- End forwarded message -----



This bug report was last modified 11 years and 33 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.