GNU bug report logs - #16586
grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Wed, 29 Jan 2014 09:46:02 UTC

Severity: important

Found in version 2.16

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Forwarded to Philip Hazel <ph10@hermes.cam.ac.uk>


From: Jim Meyering <jim <at> meyering.net>
To: Santiago <santiago <at> debian.org>
Cc: 16586 <at> debbugs.gnu.org
Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences
Date: Mon, 3 Feb 2014 13:34:14 -0800
On Wed, Jan 29, 2014 at 1:43 AM, Santiago <santiago <at> debian.org> wrote:
> Package: grep
> Version: 2.16
> Severity: important
>
> Hi there,
>
> I am forwarding this bug from Debian's BTS. The latest changes to -P
> introduced another problem. I've confirmed this behavior with the latest
> Debian package:
>
> ----- Forwarded message from Vincent Lefevre <vincent <at> vinc17.net> -----
>
> [snip]
>
>
> grep -P loops on some files with invalid UTF-8 sequences, e.g.
>
> $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
>
> (the infinite loop is interrupted here by a broken pipe due to
> the "head").
>
> It seems that the fix of
>
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472

Thanks for the heads-up.  That appears to be a problem with pcre.
I've just built grep (git head) against pcre (git head) with gcc's
address sanitizer enabled, and adjusted your example slightly.
Now, libpcre gets an internal segfault:

$ printf "\xe9\n\xab\n" > k; src/grep -P 'e|.?z' k
ASAN:SIGSEGV
=================================================================
==11821==ERROR: AddressSanitizer: SEGV on unknown address 0x62cfffffffff (pc 0x0000004f0743 sp 0x7fff6b32f4a0 bp 0x7fff6b32f760 T0)
    #0 0x4f0742 in match /w/co/pcre/pcre_exec.c:5943
    #1 0x4f26d5 in pcre_exec /w/co/pcre/pcre_exec.c:6941
    #2 0x46f421 in Pexecute /w/co/grep/src/pcresearch.c:178
    #3 0x4717a3 in do_execute /w/co/grep/src/main.c:1075
    #4 0x4717a3 in grepbuf /w/co/grep/src/main.c:1111
    #5 0x472249 in grep /w/co/grep/src/main.c:1222
    #6 0x472249 in grepdesc /w/co/grep/src/main.c:1476
    #7 0x4073ca in main /w/co/grep/src/main.c:2396
    #8 0x7f6f21a53cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
    #9 0x408a54 (/w/u/w/co/grep/src/grep+0x408a54)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /w/co/pcre/pcre_exec.c:5943 match
==11821==ABORTING

Sorry, but I don't have time to debug further.  A quick glance suggests
that it is backing up too far:

(gdb) b __asan_report_error
Breakpoint 1 at 0x448c40: file ../../.././libsanitizer/asan/asan_report.cc, line 711.
(gdb) r
Starting program: /w/u/w/co/grep/src/grep -P e\|.\?z k
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00000000004f0743 in match (eptr=0x62cfffffffff "", ecode=0x60700000df8a "\035zx",
    mstart=0x62d00000b002 "\253\n", '\276' <repeats 198 times>..., offset_top=2, md=0x7fffffffce30, eptrb=0x0, rdepth=0)
    at pcre_exec.c:5943
5943              BACKCHAR(eptr);
(gdb) l
5938              {
5939              if (eptr == pp) goto TAIL_RECURSE;
5940              RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
5941              if (rrc != MATCH_NOMATCH) RRETURN(rrc);
5942              eptr--;
5943              BACKCHAR(eptr);
5944              if (ctype == OP_ANYNL && eptr > pp  && UCHAR21(eptr) == CHAR_NL &&
5945                  UCHAR21(eptr - 1) == CHAR_CR) eptr--;
5946              }
5947            }
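
To illustrate what "backing up too far" could mean here, the following is a minimal sketch, not PCRE's code. It assumes that a BACKCHAR-style step in UTF-8 mode moves the pointer backward over continuation bytes (bytes of the form 10xxxxxx) until it reaches a lead byte, and that it has no lower bound of its own. Under that assumption, a subject that begins with a stray continuation byte such as \xab gives the scan nothing to stop on. The names is_continuation, back_char_bounded and would_underrun are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* True if b has the form 10xxxxxx, i.e. a UTF-8 continuation byte. */
static int is_continuation(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}

/*
 * Hypothetical bounded variant of a "back up to the start of the
 * character" step: move p backward while it points at a continuation
 * byte, but never before start.  Returns the resulting pointer.
 */
static const unsigned char *
back_char_bounded(const unsigned char *start, const unsigned char *p)
{
    assert(p >= start);
    while (p > start && is_continuation(*p))
        p--;
    return p;
}

/*
 * Returns nonzero if an unbounded backward scan from p would run past
 * start, i.e. every byte in [start, p] is a continuation byte.
 */
static int
would_underrun(const unsigned char *start, const unsigned char *p)
{
    const unsigned char *q = back_char_bounded(start, p);
    return q == start && is_continuation(*q);
}
```

With the second line of the reduced test file, the two bytes \xab and \n, would_underrun reports a problem when called on the first byte, because \xab is a continuation byte with no lead byte before it. For a well-formed sequence such as \xc3 \xa9, the bounded scan stops on the lead byte \xc3. Whether the actual fix belongs in PCRE's matcher or in how grep validates input before calling pcre_exec is not settled by this message.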




This bug report was last modified 11 years and 33 days ago.
