GNU bug report logs -
#22655
grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)
Message #121 received at 22655 <at> debbugs.gnu.org (full text, mbox):
Stephane Chazelas wrote:
> I don't find a x220 factor, more like a x2.5 factor:
I think I found the factor-of-hundreds slowdown, and fixed it in the 2nd
attached patch.
When I tried your benchmark with pcregrep (pcre 8.39, configured with
--enable-unicode-properties), and with ./grep0 (which has the PCRE_MULTILINE
implementation, i.e., commit da94c91a81fc63275371d0580d8688b6abd85346), and with
./grep (which is grep after the attached patches are installed), I got timings
like the following:
 user    sys   command
1.972  0.072  LC_ALL=en_US.utf8 pcregrep -u "z.*a" k
0.234  0.076  LC_ALL=en_US.utf8 ./grep0 -P "z.*a" k
1.280  0.064  LC_ALL=en_US.utf8 ./grep -P "z.*a" k
1.487  0.077  LC_ALL=C pcregrep "z.*a" k
0.193  0.067  LC_ALL=C ./grep0 -P "z.*a" k
0.825  0.096  LC_ALL=C ./grep -P "z.*a" k
All times are CPU seconds. This is Fedora 24 x86-64, AMD Phenom II X4 910e. As
before, k was created by the shell command:
  yes 'abcdefg hijklmn opqrstu vwxyz' | head -n 10000000 >k
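For anyone wanting to reproduce the input at a smaller scale first, here is a minimal sketch that wraps the command above in a function. The name make_k and its two parameters are hypothetical and not part of the original message; only the yes | head pipeline comes from the report.

```shell
#!/bin/sh
# make_k LINES FILE
# Hypothetical helper (not from the original report): writes LINES copies
# of the benchmark line to FILE. The defaults match the report: 10000000
# lines written to a file named k.
make_k() {
    lines=${1:-10000000}
    file=${2:-k}
    # yes repeats the line forever; head stops the pipeline after LINES lines.
    yes 'abcdefg hijklmn opqrstu vwxyz' | head -n "$lines" >"$file"
}
```

Running make_k with no arguments should produce the same file as the one-line command; something like make_k 1000 k.small gives a quick input for checking that each grep build runs before committing to the full benchmark.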
So, on this benchmark using PCRE_MULTILINE gave a speedup of a factor of ~4.3 in
a multibyte locale, and a speedup of ~3.5 in a unibyte locale, measured by total
(user + sys) CPU time.
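The quoted factors are consistent with dividing total CPU time (user plus sys) of ./grep by that of ./grep0. A small sketch of that arithmetic follows, assuming this is how the figures were derived; the function name speedup is hypothetical.

```shell
#!/bin/sh
# speedup SLOW_USER SLOW_SYS FAST_USER FAST_SYS
# Hypothetical helper: prints (slow user + slow sys) / (fast user + fast sys)
# rounded to one decimal place. LC_ALL=C keeps awk's decimal point a ".".
speedup() {
    LC_ALL=C awk -v su="$1" -v ss="$2" -v fu="$3" -v fs="$4" \
        'BEGIN { printf "%.1f\n", (su + ss) / (fu + fs) }'
}

# Multibyte locale (en_US.utf8): ./grep versus ./grep0
speedup 1.280 0.064 0.234 0.076   # prints 4.3
# Unibyte locale (C): ./grep versus ./grep0
speedup 0.825 0.096 0.193 0.067   # prints 3.5
```

Using user time alone would give larger ratios (about 5.5 and 4.3), so the ~4.3 and ~3.5 figures match the combined totals.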
> On the other hand if you change the pattern to "z[^+]*a",
> pcregrep still takes about one second, but GNU grep a lot longer
Yes, that example makes GNU grep -P look really bad. So I installed the 1st
attached patch, which mostly just reverts the January multiline patch, i.e., it
goes back to the slower "./grep -P" lines measured above.
[0001-grep-P-no-longer-uses-PCRE_MULTILINE.patch (text/x-diff, attachment)]
[0002-grep-further-P-performance-fix.patch (text/x-diff, attachment)]
This bug report was last modified 8 years and 190 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.