GNU bug report logs - #16499
GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short binary file

Package: grep;

Reported by: Shlomi Fish <shlomif <at> shlomifish.org>

Date: Sun, 19 Jan 2014 18:47:03 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16499 in the body.
You can then email your comments to 16499 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#16499; Package grep. (Sun, 19 Jan 2014 18:47:03 GMT)

Acknowledgement sent to Shlomi Fish <shlomif <at> shlomifish.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 19 Jan 2014 18:47:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: bug-grep <at> gnu.org
Subject: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Sun, 19 Jan 2014 17:10:11 +0200
[Message part 1 (text/plain, inline)]
Hi all,

After saving the attached file as 1.dat, I see that grep -iP searching for '^Subject:'
or '^S' gets stuck in the en_US.UTF-8 locale. It works fine with pcregrep and
with ack.

[SHELL]
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat ^C

real    0m4.199s
user    0m4.195s
sys     0m0.003s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^S' < 1.dat ^C

real    0m3.486s
user    0m3.485s
sys     0m0.001s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iE '^S' < 1.dat

real    0m0.002s
user    0m0.002s
sys     0m0.000s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -P '^S' < 1.dat ^C

real    0m1.887s
user    0m1.885s
sys     0m0.000s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -P '^Subject:' < 1.dat

real    0m0.003s
user    0m0.000s
sys     0m0.002s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -P '^Subject:' < 1.dat time LC_ALL=C ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat

real    0m0.003s
user    0m0.001s
sys     0m0.001s
shlomif@telaviv1:~$ time LC_ALL=C pcregrep -i '^Subject:' < 1.dat

real    0m0.002s
user    0m0.001s
sys     0m0.000s
shlomif@telaviv1:~$ time LC_ALL=C ack -i '^Subject:' 1.dat

real    0m0.066s
user    0m0.059s
sys     0m0.007s
shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ack -i '^Subject:' 1.dat

real    0m0.070s
user    0m0.063s
sys     0m0.006s
[/SHELL]
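Interrupting each run with ^C makes a hang hard to tell apart from a slow search. A minimal sketch, assuming GNU coreutils timeout(1), which exits with status 124 when it has to kill the command; the helper name verdict and the time limits are illustrative, not part of the report, and the 0/1/2 mapping follows grep's documented exit statuses.

```shell
#!/bin/bash
# Hypothetical helper (not part of grep): run a command under a time limit
# and print a one-word verdict based on its exit status.
# Usage: verdict SECONDS COMMAND [ARGS...]
verdict() {
  local limit=$1 status=0
  shift
  # "|| status=$?" keeps a non-zero status from tripping "set -e".
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  case $status in
    0)   echo match ;;     # grep convention: at least one line matched
    1)   echo no-match ;;  # grep convention: no line matched
    124) echo HANG ;;      # timeout(1) had to kill the command
    *)   echo error ;;     # anything else, e.g. grep's status 2 (trouble)
  esac
}
```

For example, verdict 5 env LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^S' 1.dat would be expected to print HANG on an affected system, and no-match where the search finishes without matching.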

The same thing happens with grep-2.16 built from the sources. I'm on Mageia
Linux x86-64 Cauldron (what will be Mageia 4). 

shlomif@telaviv1:~$ ldd ~/apps/TEST-grep-from-git-TO-DEL/bin/grep
        linux-vdso.so.1 (0x00007fff2a7fe000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f19ed302000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f19ecf4d000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f19ecd30000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f19ed568000)
shlomif@telaviv1:~$ rpm -qf /lib64/libpcre.so.1
lib64pcre1-8.33-2.mga4
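The ldd and rpm -qf steps above can be chained so that the package owning whichever libpcre grep actually loads is printed in one go. A small sketch; the helper name is hypothetical, and the awk pattern assumes the "soname => path (address)" line format shown in the ldd output above.

```shell
#!/bin/bash
# Hypothetical helper: read ldd(1) output on stdin and print the path the
# dynamic linker resolved for libpcre.so.* (field 3 of its "=>" line).
pcre_path_from_ldd() {
  # "^libpcre\.so" requires ".so" right after "libpcre", so
  # libpcre16, libpcre32 and libpcreposix lines are skipped.
  awk '$1 ~ /^libpcre\.so/ && $2 == "=>" { print $3 }'
}
```

Then ldd ~/apps/TEST-grep-from-git-TO-DEL/bin/grep | pcre_path_from_ldd | xargs rpm -qf would print lib64pcre1-8.33-2.mga4 on the system above; on Debian-based systems, dpkg -S takes the place of rpm -qf.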

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Humanity - Parody of Modern Life - http://shlom.in/humanity

Linux — Because Software Problems Should not Cost Money.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
[1.dat (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#16499; Package grep. (Mon, 20 Jan 2014 01:56:02 GMT)

Message #8 received at 16499 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Shlomi Fish <shlomif <at> shlomifish.org>, 16499 <at> debbugs.gnu.org
Cc: shlomif <shlomif <at> gmail.com>, Paolo Bonzini <bonzini <at> gnu.org>,
 Tony Abou-Assaleh <taa <at> acm.org>
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Sun, 19 Jan 2014 17:55:18 -0800
[resending also to the correct bug address]
> Subject: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short binary file
> Hi all,
>
> After saving the attached file as 1.dat, I see that grep -iP searching for '^Subject:'
> or '^S' gets stuck in the en_US.UTF-8 locale. It works fine with pcregrep and
> with ack.
>
> [SHELL]
> shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat ^C
>
> real    0m4.199s
> user    0m4.195s
> sys     0m0.003s

Thanks for the report.  I am unable to reproduce that on Debian
unstable using the latest grep:

  $ env LC_ALL=en_US.UTF-8 time -f %e grep -iP '^S' < /t/1.dat
  Command exited with non-zero status 1
  0.00
  [Exit 1]

A good way for you to diagnose it is to run under strace or, better,
via gdb and find out precisely what code is running when it is making
no progress.
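That suggestion can be scripted so that nobody has to race to attach a debugger by hand: start the command, wait, and if it is still running, attach gdb in batch mode and print a backtrace of every thread. A sketch under stated assumptions: the helper name backtrace_if_stuck is hypothetical, the return status 124 is borrowed from timeout(1), and gdb must be allowed to attach (distributions that enable the Yama ptrace restriction may require root or kernel.yama.ptrace_scope=0).

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background; if it is still
# running after SECONDS, print a gdb backtrace of it, then kill it.
# Usage: backtrace_if_stuck SECONDS COMMAND [ARGS...]
backtrace_if_stuck() {
  local secs=$1 pid i=0
  shift
  "$@" &
  pid=$!
  while [ "$i" -lt $((secs * 10)) ]; do
    # kill -0 sends no signal; it only tests whether the process still exists.
    if ! kill -0 "$pid" 2>/dev/null; then
      wait "$pid"    # collect the command's exit status...
      return         # ...and pass it on unchanged
    fi
    sleep 0.1
    i=$((i + 1))
  done
  echo "still running after ${secs}s; backtrace follows:" >&2
  gdb -p "$pid" -batch -ex 'thread apply all bt' >&2 ||
    echo "could not attach gdb to PID $pid" >&2
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  return 124         # the status timeout(1) uses for "timed out"
}
```

For example: backtrace_if_stuck 5 env LC_ALL=en_US.UTF-8 grep -iP '^S' 1.dat. Because env replaces itself with grep via exec, the PID that gdb attaches to is grep's own.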




Information forwarded to bug-grep <at> gnu.org:
bug#16499; Package grep. (Mon, 20 Jan 2014 10:36:02 GMT)

Message #11 received at 16499 <at> debbugs.gnu.org:

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: shlomif <shlomif <at> gmail.com>, Tony Abou-Assaleh <taa <at> acm.org>,
 16499 <at> debbugs.gnu.org, Shlomi Fish <shlomif <at> shlomifish.org>
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Mon, 20 Jan 2014 11:35:21 +0100
On 20/01/2014 02:55, Jim Meyering wrote:
> Thanks for the report.  I am unable to reproduce that on Debian
> unstable using the latest grep:
> 
>   $ env LC_ALL=en_US.UTF-8 time -f %e grep -iP '^S' < /t/1.dat
>   Command exited with non-zero status 1
>   0.00
>   [Exit 1]
> 
> A good way for you to diagnose it is to run under strace or, better,
> via gdb and find out precisely what code is running when it is making
> no progress.

I reproduced it with Fedora 20.

$ rpm -q grep pcre
grep-2.16-1.fc20.x86_64
pcre-8.33-2.fc20.1.x86_64

Paolo




Information forwarded to bug-grep <at> gnu.org:
bug#16499; Package grep. (Mon, 20 Jan 2014 10:55:02 GMT)

Message #14 received at 16499 <at> debbugs.gnu.org:

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, shlomif <shlomif <at> gmail.com>,
 16499 <at> debbugs.gnu.org
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Mon, 20 Jan 2014 12:54:37 +0200
Hi Jim,

On Sun, 19 Jan 2014 17:55:18 -0800
Jim Meyering <jim <at> meyering.net> wrote:

> [resending also to the correct bug address]
> > Subject: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
> > master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
> > binary file
> > Hi all,
> >
> > After saving the attached file as 1.dat, I see that grep -iP searching for '^Subject:'
> > or '^S' gets stuck in the en_US.UTF-8 locale. It works fine with pcregrep and
> > with ack.
> >
> > [SHELL]
> > shlomif@telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat ^C
> >
> > real    0m4.199s
> > user    0m4.195s
> > sys     0m0.003s
>
> Thanks for the report.  I am unable to reproduce that on Debian
> unstable using the latest grep:
>
>   $ env LC_ALL=en_US.UTF-8 time -f %e grep -iP '^S' < /t/1.dat
>   Command exited with non-zero status 1
>   0.00
>   [Exit 1]
>
> A good way for you to diagnose it is to run under strace or, better,
> via gdb and find out precisely what code is running when it is making
> no progress.

After some investigation I discovered that the problem manifests on x86-64
systems only when PCRE 8.x is built with JIT support (and, naturally, with
--enable-utf too). The hang happens inside a JIT-generated function, which has
no debugging symbols.
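PCRE 8.x ships a pcretest program whose -C option prints the library's build configuration, including whether the JIT compiler is available. A sketch for turning that into a yes/no answer; the helper name is hypothetical, the matched phrases are assumed from PCRE 8.x's pcretest output and should be checked against your version, and pcretest describes the libpcre it is itself linked against, which is not necessarily the one grep loads.

```shell
#!/bin/bash
# Hypothetical helper: read "pcretest -C" output on stdin and print
# yes, no or unknown depending on whether JIT support is reported.
# The phrases below are assumed from PCRE 8.x's pcretest; check your version.
pcre_jit_status() {
  local config
  config=$(cat)
  case $config in
    *"No just-in-time compiler support"*) echo no ;;
    *"Just-in-time compiler support"*)    echo yes ;;
    *)                                    echo unknown ;;
  esac
}
```

Usage: pcretest -C | pcre_jit_status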

When I built PCRE and GNU grep 2.16 as follows on a Debian Testing ("jessie")
x86-64 VM, running LC_ALL=en_US.UTF-8 ~/apps/grep/bin/grep -iP '^S' < 1.dat
hung:

BUILD_pcre.bash:

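«
#!/bin/bash
# Configure PCRE with debugging symbols, UTF-8 support and the JIT compiler,
# to be installed under ~/apps/pcre; follow with "make && make install".
CFLAGS="-g" ./configure --prefix="$HOME/apps/pcre" --enable-utf --enable-jit
»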

BUILD_grep.bash:

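«
#!/bin/bash
# Source this file so the exports below stay set in your shell.
# Point the compiler (CPATH, LIBRARY_PATH) and the runtime linker
# (LD_LIBRARY_PATH) at the PCRE installed by BUILD_pcre.bash.
export CPATH="$HOME/apps/pcre/include/"
export LD_LIBRARY_PATH="$HOME/apps/pcre/lib"
export LIBRARY_PATH="$HOME/apps/pcre/lib"
CFLAGS="-g" ./configure --prefix="$HOME/apps/grep"
»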

(Searching for «-iP '^Su'» was fine.)

In any case, here is the output of running this gdb command file (cmds.gdb)
against the grep on my system:

«
set args -iP '^Subject:' 1.dat
b main
r
b 2280
c
b 2398
c
b grepfile
c
b grepdesc
c
b grep
c
b grepbuf
c
b do_execute
c
b 1077
c
s
b 178
c
s
b 6514
c
s
b 9519
c
bt
q

»
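The command file above reaches the JIT call by breaking on source line numbers, which only fit this particular checkout of grep and PCRE. A shorter sketch, assuming both were built with -g as in the build scripts, breaks directly on the PCRE-internal function _pcre_jit_exec seen in the backtrace; that symbol name comes from this PCRE 8.33 session and may differ in other versions, and trace_jit_entry is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: run COMMAND under gdb, stop on entry to PCRE's JIT
# executor and print a backtrace. It does not step into the JIT-generated
# code, so it shows how grep got there, not where the JIT code loops.
# Usage: trace_jit_entry COMMAND [ARGS...]
trace_jit_entry() {
  # "pending on" lets the breakpoint wait until libpcre has been loaded.
  gdb -batch \
    -ex 'set breakpoint pending on' \
    -ex 'break _pcre_jit_exec' \
    -ex 'run' \
    -ex 'bt' \
    --args "$@"
}
```

Export LC_ALL=en_US.UTF-8 first (the program inherits gdb's environment), then run trace_jit_entry ~/apps/grep/bin/grep -iP '^Subject:' 1.dat. When the batch run ends, gdb kills the stopped program.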

Output is:

«
shlomif@telaviv1:~/conf/bugs/gnu-grep$ LC_ALL=en_US.UTF-8 gdb --command=cmds.gdb ~/apps/TEST-grep-from-git-TO-DEL/bin/grep
GNU gdb (GDB) 7.6-6.mga4 (Mageia release 4)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-mageia-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/shlomif/apps/TEST-grep-from-git-TO-DEL/bin/grep...done.
Breakpoint 1 at 0x407fdf: file main.c, line 1960.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main (argc=4, argv=0x7fffffffd3e8) at main.c:1960
1960      exit_failure = EXIT_TROUBLE;
Breakpoint 2 at 0x408984: file main.c, line 2280.

Breakpoint 2, main (argc=4, argv=0x7fffffffd3e8) at main.c:2280
2280      if (color_option == 2)
Breakpoint 3 at 0x408d54: file main.c, line 2398.

Breakpoint 3, main (argc=4, argv=0x7fffffffd3e8) at main.c:2398
2398            status &= grep_command_line_arg (argv[optind]);
Breakpoint 4 at 0x406dd0: file main.c, line 1372.

Breakpoint 4, grepfile (dirdesc=-100, name=0x7fffffffd94e "1.dat", follow=1, 
    command_line=1) at main.c:1372
1372      int desc = openat_safer (dirdesc, name, O_RDONLY | (follow ? 0 : O_NOFOLLOW));
Breakpoint 5 at 0x406e66: file main.c, line 1386.

Breakpoint 5, grepdesc (desc=8, command_line=1) at main.c:1386
1386      int status = 1;
Breakpoint 6 at 0x406579: file main.c, line 1165.

Breakpoint 6, grep (fd=8, st=0x7fffffffd090) at main.c:1165
1165      char eol = eolbyte;
Breakpoint 7 at 0x4063b0: file main.c, line 1111.

Breakpoint 7, grepbuf (beg=0x62d000 "\277P\"\360\276P", lim=0x62d329 "")
    at main.c:1111
1111      nlines = 0;
Breakpoint 8 at 0x406279: file main.c, line 1076.

Breakpoint 8, do_execute (buf=0x62d000 "\277P\"\360\276P", size=809, 
    match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1076
1076      if (MB_CUR_MAX == 1 || !match_icase)
Breakpoint 9 at 0x40628e: file main.c, line 1077.

Breakpoint 9, do_execute (buf=0x62d000 "\277P\"\360\276P", size=809, 
    match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1077
1077        return execute (buf, size, match_size, start_ptr);
Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809, 
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:152
152       int e = PCRE_ERROR_NOMATCH;
Breakpoint 10 at 0x404b04: file pcresearch.c, line 178.

Breakpoint 10, Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809, 
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:178
178           e = pcre_exec (cre, extra, line_buf, line_end - line_buf,
pcre_exec (argument_re=0x62c7f0, extra_data=0x62c930, 
    subject=0x62d000 "\277P\"\360\276P", length=808, start_offset=0, 
    options=8192, offsets=0x7fffffffca40, offsetcount=300) at pcre_exec.c:6392
6392    {
Breakpoint 11 at 0x7ffff7b9d797: file pcre_exec.c, line 6514.

Breakpoint 11, pcre_exec (argument_re=0x62c7f0, extra_data=<optimized out>, 
    subject=0x62d000 "\277P\"\360\276P", length=808, start_offset=0, 
    options=8192, offsets=0x7fffffffca40, offsetcount=300) at pcre_exec.c:6514
6514      rc = PRIV(jit_exec)(extra_data, (const pcre_uchar *)subject, length,
_pcre_jit_exec (extra_data=extra_data@entry=0x62c930, 
    subject=subject@entry=0x62d000 "\277P\"\360\276P", 
    length=length@entry=808, start_offset=start_offset@entry=0, 
    options=options@entry=8192, offsets=offsets@entry=0x7fffffffca40, 
    offset_count=offset_count@entry=300) at pcre_jit_compile.c:9460
9460    {
Breakpoint 12 at 0x7ffff7bc08d4: file pcre_jit_compile.c, line 9519.

Breakpoint 12, _pcre_jit_exec (extra_data=extra_data@entry=0x62c930, 
    subject=subject@entry=0x62d000 "\277P\"\360\276P", 
    length=length@entry=808, start_offset=start_offset@entry=0, 
    options=options@entry=8192, offsets=offsets@entry=0x7fffffffca40, 
    offset_count=2, offset_count@entry=300) at pcre_jit_compile.c:9519
9519      retval = convert_executable_func.call_executable_func(&arguments);
#0  _pcre_jit_exec (extra_data=extra_data@entry=0x62c930, 
    subject=subject@entry=0x62d000 "\277P\"\360\276P", 
    length=length@entry=808, start_offset=start_offset@entry=0, 
    options=options@entry=8192, offsets=offsets@entry=0x7fffffffca40, 
    offset_count=2, offset_count@entry=300) at pcre_jit_compile.c:9519
#1  0x00007ffff7b9d7c0 in pcre_exec (argument_re=0x62c7f0, 
    extra_data=<optimized out>, subject=0x62d000 "\277P\"\360\276P", 
    length=808, start_offset=0, options=8192, offsets=0x7fffffffca40, 
    offsetcount=300) at pcre_exec.c:6514
#2  0x0000000000404b48 in Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809, 
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:178
#3  0x00000000004062a7 in do_execute (buf=0x62d000 "\277P\"\360\276P", 
    size=809, match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1077
#4  0x0000000000406509 in grepbuf (beg=0x62d000 "\277P\"\360\276P", 
    lim=0x62d329 "") at main.c:1113
#5  0x00000000004067fb in grep (fd=8, st=0x7fffffffd090) at main.c:1224
#6  0x0000000000407186 in grepdesc (desc=8, command_line=1) at main.c:1478
#7  0x0000000000406e4c in grepfile (dirdesc=-100, name=0x7fffffffd94e "1.dat", 
    follow=1, command_line=1) at main.c:1379
#8  0x000000000040737a in grep_command_line_arg (arg=0x7fffffffd94e "1.dat")
    at main.c:1530
#9  0x0000000000408d76 in main (argc=4, argv=0x7fffffffd3e8) at main.c:2398
A debugging session is active.

        Inferior 1 [process 14934] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]
shlomif@telaviv1:~/conf/bugs/gnu-grep$

»

So the problem appears to lie in a JIT-enabled PCRE (though I don't know why it
does not manifest with pcregrep).

Regards,

	Shlomi Fish





Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Tue, 21 Jan 2014 23:57:02 GMT)

Notification sent to Shlomi Fish <shlomif <at> shlomifish.org>:
bug acknowledged by developer. (Tue, 21 Jan 2014 23:57:03 GMT)

Message #19 received at 16499-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Shlomi Fish <shlomif <at> shlomifish.org>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, shlomif <shlomif <at> gmail.com>,
 16499-done <at> debbugs.gnu.org
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Tue, 21 Jan 2014 15:56:21 -0800
tags 16499 notabug
close 16499
thanks

Thank you for investigating.  I agree that it sure looks like the bug
is in libpcre, and not in grep itself.  If you haven't already
reported it to libpcre developers, would you please do that?




Information forwarded to bug-grep <at> gnu.org:
bug#16499; Package grep. (Wed, 22 Jan 2014 07:53:02 GMT)

Message #22 received at 16499-done <at> debbugs.gnu.org:

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, shlomif <shlomif <at> gmail.com>,
 16499-done <at> debbugs.gnu.org
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
 master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
 binary file
Date: Wed, 22 Jan 2014 09:52:12 +0200
Hi,

On Tue, 21 Jan 2014 15:56:21 -0800
Jim Meyering <jim <at> meyering.net> wrote:

> Thank you for investigating.  I agree that it sure looks like the bug
> is in libpcre, and not in grep itself.  If you haven't already
> reported it to libpcre developers, would you please do that?

Reported here:

http://bugs.exim.org/show_bug.cgi?id=1437

Thanks!

— Shlomi Fish





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 19 Feb 2014 12:24:04 GMT)
