GNU bug report logs - #32750
[PATCH 2/2] dfa: optmization of alternation in NFA

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 17 Sep 2018 14:16:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #25 received at 32750 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 32750 <at> debbugs.gnu.org
Subject: Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA
Date: Tue, 18 Sep 2018 22:13:38 -0700
On Mon, Sep 17, 2018 at 7:16 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Even when similer states exists in alternation, DFA treats them as
> separate items.  It may complicate the transition in NFA and cause
> slowdown.  This change assembles the states into one.
>
> For example, ab|ac is changed into a(b|c).
>
> This change speeds-up matching for many branched pattern.  For
> example, grep speeds-up more than 30x in following case.
>
>   seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
>   time -p env LC_ALL=C grep -vf in in
>
>   (before) real 63.43 user 61.67 system 1.65
>   (after)  real  1.64 user  1.61 system 0.03
>
>   # If we do not add '.' at last in pattern, not dfa but kwset is used.
>
> grep also speeds-up about 3x in following case.
>
>   time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words
>
>   (before) real  2.48 user  2.09 system 0.38
>   (after)  real  7.69 user  6.32 system 1.29

Thank you for the patches.
However, the before/after numbers you show here suggest that the
"after" code takes more than triple of the time of "before".

Also, when I compared grep compiled at
123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your
changes) and that compiled at the prior commit,
9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version
takes over 30 seconds, while the prior one took about 20 seconds.

FTR, I used gcc version 9.0.0 20180912, compiling with -O3.




This bug report was last modified 6 years and 251 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.