GNU bug report logs - #32750
[PATCH 2/2] dfa: optmization of alternation in NFA

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 17 Sep 2018 14:16:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 32750 <at> debbugs.gnu.org
Subject: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA
Date: Wed, 19 Sep 2018 00:48:42 -0700
Jim Meyering wrote:
>>    seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
>>    time -p env LC_ALL=C grep -vf in in
>>
>>    (before) real 63.43 user 61.67 system 1.65
>>    (after)  real  1.64 user  1.61 system 0.03
>>
>>    # If we do not add '.' at last in pattern, not dfa but kwset is used.
>>
>> grep also speeds-up about 3x in following case.
>>
>>    time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words
>>
>>    (before) real  2.48 user  2.09 system 0.38
>>    (after)  real  7.69 user  6.32 system 1.29
> Thank you for the patches.
> However, the before/after numbers you show here suggest that the
> "after" code takes more than triple of the time of "before". >
> Also, when I compared grep compiled at
> 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
> changes) and that compiled at the prior commit,
> 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
> takes over 30 seconds, while the prior one took about 20 seconds.


Is that last pair of times for the second benchmark he gave? I confess to being 
lazy and not trying that benchmark, as I was on an Ubuntu system that didn't 
have the linux.words file.

Did you try the first benchmark? On my Ubuntu 18.04 x86-64 (Xeon E3-1225 V2) 
platform, I got (before) real 55.51 user 51.53 sys 3.98, (after) real 0.64 user 
0.60 sys 0.04, so the change is a big performance win there.

On the other hand, I just now did the 2nd benchmark with a copy of a Fedora 28 
linux.words file, and got (before) real 8.06 user 6.20 sys 1.85, (after) real 
21.69 user 21.21 sys 0.47, so it's about three times slower. Ouch.




This bug report was last modified 6 years and 251 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.