Package: grep
Reported by: Shlomi Fish <shlomif <at> shlomifish.org>
Date: Sun, 19 Jan 2014 18:47:03 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Message #14 received at 16499 <at> debbugs.gnu.org:
From: Shlomi Fish <shlomif <at> shlomifish.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paolo Bonzini <bonzini <at> gnu.org>, shlomif <shlomif <at> gmail.com>, 16499 <at> debbugs.gnu.org
Subject: Re: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short binary file
Date: Mon, 20 Jan 2014 12:54:37 +0200
Hi Jim,

On Sun, 19 Jan 2014 17:55:18 -0800
Jim Meyering <jim <at> meyering.net> wrote:

> [resending also to the correct bug address]
>
> > Subject: GNU grep-2.16-1.mga4 , grep-2.16 from sources and grep from git
> > master HEAD get stuck during an LC_ALL=en_US.UTF-8 search inside a short
> > binary file
> >
> > Hi all,
> >
> > after I save the attached file as 1.dat , I see that grep -iP on '^Subject:'
> > or on '^S' gets stuck in the en_US.UTF-8 locale. It is fine in pcregrep and
> > in ack.
> >
> > [SHELL]
> > shlomif <at> telaviv1:~$ time LC_ALL=en_US.UTF-8 ~/apps/TEST-grep-from-git-TO-DEL/bin/grep -iP '^Subject:' < 1.dat
> > ^C
> >
> > real 0m4.199s
> > user 0m4.195s
> > sys 0m0.003s
>
> Thanks for the report. I am unable to reproduce that on debian
> unstable using the latest grep:
>
> $ env LC_ALL=en_US.UTF-8 time -f %e grep -iP '^S' < /t/1.dat
> Command exited with non-zero status 1
> 0.00
> [Exit 1]
>
> A good way for you to diagnose it is to run under strace or, better,
> via gdb and find out precisely what code it running when it is making
> no progress.

After some investigation I discovered that the problem manifests on x86-64
systems only with a PCRE-8.x that was built with JIT support (and, naturally,
with --enable-utf too). The hang happens inside a JIT-generated function that
has no debugging symbols.

When I built PCRE and GNU grep-2.16 as follows on a Debian Testing ("jessie")
x86-64 VM, running

LC_ALL=en_US.UTF-8 ~/apps/grep/bin/grep -iP '^S' < 1.dat

caused it to hang:

BUILD_pcre.bash:

«
#!/bin/bash
CFLAGS="-g" ./configure --prefix="$HOME/apps/pcre" --enable-utf --enable-jit
»

BUILD_grep.bash:

«
#!/bin/bash
# Source this file.
export CPATH="/home/shlomif/apps/pcre/include/"
export LD_LIBRARY_PATH="/home/shlomif/apps/pcre/lib"
export LIBRARY_PATH="/home/shlomif/apps/pcre/lib"
CFLAGS="-g" ./configure --prefix="$HOME/apps/grep"
»

(Searching for «-iP '^Su'» was fine.)
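Since the report hinges on "hangs here, exits instantly there", a small timeout wrapper can make the comparison between builds repeatable. The following is only a minimal sketch in Python; the function names `run_with_timeout` and `grep_hangs`, the 5-second limit, and the paths in the usage comment are illustrative assumptions, not part of grep or of the original report.

```python
import os
import subprocess


def run_with_timeout(argv, stdin_bytes=b"", timeout_s=5.0, env=None):
    """Run argv with stdin_bytes on stdin.

    Returns (hung, returncode). hung is True when the process did not
    finish within timeout_s seconds; subprocess.run() then kills it and
    returncode is None.
    """
    try:
        proc = subprocess.run(
            argv,
            input=stdin_bytes,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            env=env,
        )
    except subprocess.TimeoutExpired:
        return True, None
    return False, proc.returncode


def grep_hangs(grep_path, data_path, pattern, timeout_s=5.0):
    """Hypothetical helper: does `grep -iP pattern < data_path` time out
    in the en_US.UTF-8 locale?"""
    with open(data_path, "rb") as handle:
        data = handle.read()
    env = dict(os.environ, LC_ALL="en_US.UTF-8")
    hung, _ = run_with_timeout(
        [grep_path, "-iP", pattern], data, timeout_s, env
    )
    return hung


# Example use (paths are assumptions taken from the build scripts above):
#   grep_hangs(os.path.expanduser("~/apps/grep/bin/grep"), "1.dat", "^S")
```

A timeout only shows that the process made no progress within the limit; it does not say where it is stuck, which is what the gdb session is for.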
In any case here is the output of running this gdb command set on the grep on
my system:

«
set args -iP '^Subject:' 1.dat
b main
r
b 2280
c
b 2398
c
b grepfile
c
b grepdesc
c
b grep
c
b grepbuf
c
b do_execute
c
b 1077
c
s
b 178
c
s
b 6514
c
s
b 9519
c
bt
q
»

Output is:

«
shlomif <at> telaviv1:~/conf/bugs/gnu-grep$ LC_ALL=en_US.UTF-8 gdb --command=cmds.gdb ~/apps/TEST-grep-from-git-TO-DEL/bin/grep
GNU gdb (GDB) 7.6-6.mga4 (Mageia release 4)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-mageia-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/shlomif/apps/TEST-grep-from-git-TO-DEL/bin/grep...done.
Breakpoint 1 at 0x407fdf: file main.c, line 1960.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, main (argc=4, argv=0x7fffffffd3e8) at main.c:1960
1960      exit_failure = EXIT_TROUBLE;
Breakpoint 2 at 0x408984: file main.c, line 2280.

Breakpoint 2, main (argc=4, argv=0x7fffffffd3e8) at main.c:2280
2280      if (color_option == 2)
Breakpoint 3 at 0x408d54: file main.c, line 2398.

Breakpoint 3, main (argc=4, argv=0x7fffffffd3e8) at main.c:2398
2398        status &= grep_command_line_arg (argv[optind]);
Breakpoint 4 at 0x406dd0: file main.c, line 1372.

Breakpoint 4, grepfile (dirdesc=-100, name=0x7fffffffd94e "1.dat", follow=1,
    command_line=1) at main.c:1372
1372      int desc = openat_safer (dirdesc, name, O_RDONLY | (follow ? 0 : O_NOFOLLOW));
Breakpoint 5 at 0x406e66: file main.c, line 1386.

Breakpoint 5, grepdesc (desc=8, command_line=1) at main.c:1386
1386      int status = 1;
Breakpoint 6 at 0x406579: file main.c, line 1165.

Breakpoint 6, grep (fd=8, st=0x7fffffffd090) at main.c:1165
1165      char eol = eolbyte;
Breakpoint 7 at 0x4063b0: file main.c, line 1111.

Breakpoint 7, grepbuf (beg=0x62d000 "\277P\"\360\276P", lim=0x62d329 "")
    at main.c:1111
1111      nlines = 0;
Breakpoint 8 at 0x406279: file main.c, line 1076.

Breakpoint 8, do_execute (buf=0x62d000 "\277P\"\360\276P", size=809,
    match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1076
1076      if (MB_CUR_MAX == 1 || !match_icase)
Breakpoint 9 at 0x40628e: file main.c, line 1077.

Breakpoint 9, do_execute (buf=0x62d000 "\277P\"\360\276P", size=809,
    match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1077
1077        return execute (buf, size, match_size, start_ptr);
Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809,
---Type <return> to continue, or q <return> to quit---
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:152
152       int e = PCRE_ERROR_NOMATCH;
Breakpoint 10 at 0x404b04: file pcresearch.c, line 178.

Breakpoint 10, Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809,
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:178
178           e = pcre_exec (cre, extra, line_buf, line_end - line_buf,
pcre_exec (argument_re=0x62c7f0, extra_data=0x62c930,
    subject=0x62d000 "\277P\"\360\276P", length=808, start_offset=0,
    options=8192, offsets=0x7fffffffca40, offsetcount=300) at pcre_exec.c:6392
6392    {
Breakpoint 11 at 0x7ffff7b9d797: file pcre_exec.c, line 6514.

Breakpoint 11, pcre_exec (argument_re=0x62c7f0, extra_data=<optimized out>,
    subject=0x62d000 "\277P\"\360\276P", length=808, start_offset=0,
    options=8192, offsets=0x7fffffffca40, offsetcount=300) at pcre_exec.c:6514
6514        rc = PRIV(jit_exec)(extra_data, (const pcre_uchar *)subject, length,
_pcre_jit_exec (extra_data=extra_data <at> entry=0x62c930,
    subject=subject <at> entry=0x62d000 "\277P\"\360\276P",
    length=length <at> entry=808, start_offset=start_offset <at> entry=0,
    options=options <at> entry=8192, offsets=offsets <at> entry=0x7fffffffca40,
    offset_count=offset_count <at> entry=300) at pcre_jit_compile.c:9460
9460    {
Breakpoint 12 at 0x7ffff7bc08d4: file pcre_jit_compile.c, line 9519.

Breakpoint 12, _pcre_jit_exec (extra_data=extra_data <at> entry=0x62c930,
    subject=subject <at> entry=0x62d000 "\277P\"\360\276P",
    length=length <at> entry=808, start_offset=start_offset <at> entry=0,
    options=options <at> entry=8192, offsets=offsets <at> entry=0x7fffffffca40,
    offset_count=2, offset_count <at> entry=300) at pcre_jit_compile.c:9519
9519    retval = convert_executable_func.call_executable_func(&arguments);
#0  _pcre_jit_exec (extra_data=extra_data <at> entry=0x62c930,
    subject=subject <at> entry=0x62d000 "\277P\"\360\276P",
    length=length <at> entry=808, start_offset=start_offset <at> entry=0,
    options=options <at> entry=8192, offsets=offsets <at> entry=0x7fffffffca40,
    offset_count=2, offset_count <at> entry=300) at pcre_jit_compile.c:9519
#1  0x00007ffff7b9d7c0 in pcre_exec (argument_re=0x62c7f0,
    extra_data=<optimized out>, subject=0x62d000 "\277P\"\360\276P",
    length=808, start_offset=0, options=8192, offsets=0x7fffffffca40,
    offsetcount=300) at pcre_exec.c:6514
#2  0x0000000000404b48 in Pexecute (buf=0x62d000 "\277P\"\360\276P", size=809,
    match_size=0x7fffffffcfb8, start_ptr=0x0) at pcresearch.c:178
#3  0x00000000004062a7 in do_execute (buf=0x62d000 "\277P\"\360\276P",
    size=809, match_size=0x7fffffffcfb8, start_ptr=0x0) at main.c:1077
#4  0x0000000000406509 in grepbuf (beg=0x62d000 "\277P\"\360\276P",
    lim=0x62d329 "") at main.c:1113
#5  0x00000000004067fb in grep (fd=8, st=0x7fffffffd090) at main.c:1224
#6  0x0000000000407186 in grepdesc (desc=8, command_line=1) at main.c:1478
#7  0x0000000000406e4c in grepfile (dirdesc=-100, name=0x7fffffffd94e "1.dat",
    follow=1, command_line=1) at main.c:1379
#8  0x000000000040737a in grep_command_line_arg (arg=0x7fffffffd94e "1.dat")
---Type <return> to continue, or q <return> to quit---
    at main.c:1530
#9  0x0000000000408d76 in main (argc=4, argv=0x7fffffffd3e8) at main.c:2398
A debugging session is active.

        Inferior 1 [process 14934] will be killed.

Quit anyway? (y or n) [answered Y; input not from terminal]
shlomif <at> telaviv1:~/conf/bugs/gnu-grep$
»

So it seems the problem is with a JIT-enabled PCRE (though I don't know why it
is not manifested with pcregrep).

Regards,

        Shlomi Fish

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Original Riddles - http://www.shlomifish.org/puzzles/

Writing a BitKeeper replacement is probably easier at this point than getting
its license changed.
    — Matt Mackall (who ended up writing a BitKeeper replacement)

Please reply to list if it's a mailing list post - http://shlom.in/reply .
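A note on the buffer shown in the gdb output: the subject passed to pcre_exec starts with the bytes "\277P\"\360\276P" (gdb prints octal escapes and stops at the first NUL byte). Those bytes are not valid UTF-8, which may be relevant given that the hang only appears in a UTF-8 locale. Whether this is what trips the JIT-compiled code is an assumption, not something the message establishes. The sketch below, in Python, only checks where a buffer first stops being valid UTF-8; the function name `first_invalid_utf8_offset` is made up for illustration.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte sequence that is not valid
    UTF-8, or None when the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index where the undecodable sequence begins.
        return exc.start
    return None


# The six bytes gdb displays at the start of the subject buffer,
# rewritten from octal (\277, \360, \276) to hex escapes.
SUBJECT_PREFIX = b'\xbfP"\xf0\xbeP'
```

Running the same check over the whole of 1.dat (read in binary mode) would show whether the rest of the file is also invalid, which the six visible bytes alone cannot tell.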