GNU bug report logs -
#17013
[PATCH] grep: optimization by using the Galil rule for Boyer-Moore algorithm in KWSet
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Fri, 14 Mar 2014 12:22:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
[Message part 1 (text/plain, inline)]
Package: grep
Tags: patch
The Boyer-Moore algorithm runs in O(m n) in the worst case,
which perhaps it may be much slower than the DFA.
The Galil rule enables to change O(m n) into O(n) for its case without
overheads and/or slow-down for other cases by avoiding to compare more
than once for a position in the text. This patch implements it.
I prepare following string, which makes a worst case for Boyer-Moore
algorithm, to measure the performance.
yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > ../k
I run the test with the patch (best-of-5 trials):
env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
real 0.70 user 0.32 sys 0.38
Back out that commit (temporarily), recompile, and rerun the experiment:
env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
real 3.97 user 3.56 sys 0.40
Norihiro
[grep.patch (application/octet-stream, attachment)]
This bug report was last modified 11 years and 99 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.