GNU bug report logs -
#32750
[PATCH 2/2] dfa: optmization of alternation in NFA
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Mon, 17 Sep 2018 14:16:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Full log
Message #25 received at 32750 <at> debbugs.gnu.org (full text, mbox):
On Mon, Sep 17, 2018 at 7:16 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Even when similer states exists in alternation, DFA treats them as
> separate items. It may complicate the transition in NFA and cause
> slowdown. This change assembles the states into one.
>
> For example, ab|ac is changed into a(b|c).
>
> This change speeds-up matching for many branched pattern. For
> example, grep speeds-up more than 30x in following case.
>
> seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
> time -p env LC_ALL=C grep -vf in in
>
> (before) real 63.43 user 61.67 system 1.65
> (after) real 1.64 user 1.61 system 0.03
>
> # If we do not add '.' at last in pattern, not dfa but kwset is used.
>
> grep also speeds-up about 3x in following case.
>
> time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words
>
> (before) real 2.48 user 2.09 system 0.38
> (after) real 7.69 user 6.32 system 1.29
Thank you for the patches.
However, the before/after numbers you show here suggest that the
"after" code takes more than triple of the time of "before".
Also, when I compared grep compiled at
123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
changes) and that compiled at the prior commit,
9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
takes over 30 seconds, while the prior one took about 20 seconds.
FTR, I used gcc version 9.0.0 20180912, compiling with -O3.
This bug report was last modified 6 years and 251 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.