GNU bug report logs - #17013
[PATCH] grep: optimization by using the Galil rule for Boyer-Moore algorithm in KWSet

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Fri, 14 Mar 2014 12:22:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#17013: closed (Re: bug#17013: [PATCH] grep: optimization by
 using the Galil rule for Boyer-Moore algorithm in KWSet)
Date: Tue, 08 Apr 2014 02:57:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#17013: [PATCH] grep: optimization by using the Galil rule for Boyer-Moore algorithm in KWSet

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 17013 <at> debbugs.gnu.org.

-- 
17013: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17013
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17013-done <at> debbugs.gnu.org
Subject: Re: bug#17013: [PATCH] grep: optimization by using the Galil rule
 for Boyer-Moore algorithm in KWSet
Date: Mon, 07 Apr 2014 19:56:29 -0700
[Message part 3 (text/plain, inline)]
Norihiro Tanaka wrote:
> In second patch, I changed so that Boyer-Moore algorithm could be used
> also to case-insensitive matching if MB_CUR_MAX == 1.

Thanks, I merged this patch into the savannah git master (attachment 1), 
applied a fixup patch for clarity and to fix some minor style issues 
(attachment 2), and fixed some longstanding kwset memory-allocation 
infelicities mostly having to do with unecessary code (attachment 3).
[0001-grep-use-the-Galil-rule-for-Boyer-Moore-algorithm-in.patch (text/plain, attachment)]
[0002-grep-minor-cleanups-for-Galil-speedups.patch (text/plain, attachment)]
[0003-grep-simplify-memory-allocation-in-kwset.patch (text/plain, attachment)]
[Message part 7 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] grep: optimization by using the Galil rule for Boyer-Moore
 algorithm in KWSet
Date: Fri, 14 Mar 2014 21:21:21 +0900
[Message part 8 (text/plain, inline)]
Package: grep
Tags: patch

The Boyer-Moore algorithm runs in O(m n) in the worst case,
 which perhaps it may be much slower than the DFA.

The Galil rule enables to change O(m n) into O(n) for its case without
overheads and/or slow-down for other cases by avoiding to compare more
than once for a position in the text.  This patch implements it.

I prepare following string, which makes a worst case for Boyer-Moore
algorithm, to measure the performance.

    yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > ../k

I run the test with the patch (best-of-5 trials):

    env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
        real 0.70       user 0.32       sys 0.38

Back out that commit (temporarily), recompile, and rerun the experiment:

    env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
        real 3.97       user 3.56       sys 0.40

Norihiro
[grep.patch (application/octet-stream, attachment)]

This bug report was last modified 11 years and 100 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.