GNU bug report logs - #22655
grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)

Package: grep

Reported by: Sergei Trofimovich <slyfox <at> gentoo.org>

Date: Sat, 13 Feb 2016 23:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 22655 <at> debbugs.gnu.org
Subject: bug#22655: grep -Pz '^' now fails!
Date: Sat, 19 Nov 2016 23:57:22 -0800
Stephane Chazelas wrote:
> I don't find a x220 factor, more like a x2.5 factor:

I think I found the factor-of-hundreds slowdown, and fixed it in the 2nd 
attached patch.

When I tried your benchmark with pcregrep (pcre 8.39, configured with 
--enable-unicode-properties), and with ./grep0 (which has the PCRE_MULTILINE 
implementation, i.e., commit da94c91a81fc63275371d0580d8688b6abd85346), and with 
./grep (which is grep after the attached patches are installed), I got timings 
like the following:

    user  sys
    1.972 0.072 LC_ALL=en_US.utf8 pcregrep -u "z.*a" k
    0.234 0.076 LC_ALL=en_US.utf8 ./grep0 -P "z.*a" k
    1.280 0.064 LC_ALL=en_US.utf8 ./grep -P "z.*a" k
    1.487 0.077 LC_ALL=C pcregrep "z.*a" k
    0.193 0.067 LC_ALL=C ./grep0 -P "z.*a" k
    0.825 0.096 LC_ALL=C ./grep -P "z.*a" k

All times are CPU seconds. This is Fedora 24 x86-64, AMD Phenom II X4 910e. As 
before, k was created by the shell command:

    yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k

So, on this benchmark using PCRE_MULTILINE gave a speedup of a factor of ~4.3 in 
a multibyte locale, and a speedup of ~3.5 in a unibyte locale.
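
For reference, the two factors can be rechecked from the table. This is a 
minimal sketch that assumes each factor is the ratio of total CPU time 
(user + sys) of ./grep to that of ./grep0; the helper name speedup is 
hypothetical and not part of grep or the patches.

```shell
# speedup SLOW_USER SLOW_SYS FAST_USER FAST_SYS
# Prints (slow user + slow sys) / (fast user + fast sys), to one decimal.
# Assumption: the quoted factors compare total CPU time, not user time alone.
speedup() {
    awk -v su="$1" -v ss="$2" -v fu="$3" -v fs="$4" \
        'BEGIN { printf "%.1f\n", (su + ss) / (fu + fs) }'
}

# Multibyte locale (en_US.utf8): ./grep versus ./grep0
speedup 1.280 0.064 0.234 0.076

# Unibyte locale (C): ./grep versus ./grep0
speedup 0.825 0.096 0.193 0.067
```

With the timings above this prints 4.3 and 3.5. Comparing user time alone 
would give larger ratios, so the quoted factors seem to include system time.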

> On the other hand if you change the pattern to "z[^+]*a",
> pcregrep still takes about one second, but GNU grep a lot longer

Yes, that example makes GNU grep -P look really bad. So I installed the 1st 
attached patch, which mostly just reverts the January multiline patch, i.e., it 
goes back to the slower "./grep -P" lines measured above.
[0001-grep-P-no-longer-uses-PCRE_MULTILINE.patch (text/x-diff, attachment)]
[0002-grep-further-P-performance-fix.patch (text/x-diff, attachment)]
