GNU bug report logs - #32750
[PATCH 2/2] dfa: optmization of alternation in NFA

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 17 Sep 2018 14:16:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#32750: closed (Re: bug#32750: [PATCH 2/2] dfa: optmization of
 alternation in NFA)
Date: Wed, 19 Sep 2018 04:13:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 32750 <at> debbugs.gnu.org.

-- 
32750: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=32750
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 32750-done <at> debbugs.gnu.org
Subject: Re: bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA
Date: Tue, 18 Sep 2018 21:12:18 -0700
[Message part 3 (text/plain, inline)]
Norihiro Tanaka wrote:
> I thought of various things for the name of the function, but I could
> not think of a good name.

Yes, that's a tough one. I eventually came up with maybe_disable_superset_dfa. 
Perhaps we can think of an ever better one.

I installed your patches into Gnulib, along with the attached relatively-minor 
fixups, and then propagated the result into grep by updating the Gnulib version 
that the grep master points to.

As you may notice, nowadays I prefer using signed types like ptrdiff_t to 
unsigned ones like size_t, as the signed types have a significant advantage in 
overflow checking with gcc -fsanitize=undefined. Perhaps at some point we should 
change the other instances of size_t in dfa.c, but one step at a time.

Thanks again for the performance improvement.
[0001-dfa-reorder-enum-for-efficiency.patch (text/x-patch, attachment)]
[0002-dfa-prune-states-as-we-go.patch (text/x-patch, attachment)]
[0003-dfa-tweak-allocation-performance.patch (text/x-patch, attachment)]
[0004-dfa-use-more-informative-function-name.patch (text/x-patch, attachment)]
[Message part 8 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <bug-grep <at> gnu.org>
Subject: [PATCH 2/2] dfa: optmization of alternation in NFA
Date: Mon, 17 Sep 2018 23:14:52 +0900
[Message part 9 (text/plain, inline)]
Hi,

Even when similer states exists in alternation, DFA treats them as
separate items.  It may complicate the transition in NFA and cause
slowdown.  This change assembles the states into one.

For example, ab|ac is changed into a(b|c).

This change speeds-up matching for many branched pattern.  For
example, grep speeds-up more than 30x in following case.

  seq 10000 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in
  time -p env LC_ALL=C grep -vf in in

  (before) real 63.43 user 61.67 system 1.65
  (after)  real  1.64 user  1.61 system 0.03

  # If we do not add '.' at last in pattern, not dfa but kwset is used.

grep also speeds-up about 3x in following case.

  time -p env LC_ALL=C grep -vf /usr/share/dict/linux.words /usr/share/dict/linux.words

  (before) real  2.48 user  2.09 system 0.38
  (after)  real  7.69 user  6.32 system 1.29

Thanks,
Norihiro
[0001-dfa-simplify-initial-state.patch (text/plain, attachment)]
[0002-dfa-optmization-of-alternation-in-NFA.patch (text/plain, attachment)]

This bug report was last modified 6 years and 251 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.