GNU bug report logs - #17013
[PATCH] grep: optimization by using the Galil rule for Boyer-Moore algorithm in KWSet

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Fri, 14 Mar 2014 12:22:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17013 in the body.
You can then email your comments to 17013 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#17013; Package grep. (Fri, 14 Mar 2014 12:22:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 14 Mar 2014 12:22:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] grep: optimization by using the Galil rule for Boyer-Moore
 algorithm in KWSet
Date: Fri, 14 Mar 2014 21:21:21 +0900
[Message part 1 (text/plain, inline)]
Package: grep
Tags: patch

The Boyer-Moore algorithm runs in O(m n) in the worst case,
 which perhaps it may be much slower than the DFA.

The Galil rule enables to change O(m n) into O(n) for its case without
overheads and/or slow-down for other cases by avoiding to compare more
than once for a position in the text.  This patch implements it.

I prepare following string, which makes a worst case for Boyer-Moore
algorithm, to measure the performance.

    yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > ../k

I run the test with the patch (best-of-5 trials):

    env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
        real 0.70       user 0.32       sys 0.38

Back out that commit (temporarily), recompile, and rerun the experiment:

    env LC_ALL=C time -p src/grep kjjjjjjjjjjjjjjjjjjj k
        real 3.97       user 3.56       sys 0.40

Norihiro
[grep.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17013; Package grep. (Sat, 15 Mar 2014 06:07:02 GMT) Full text and rfc822 format available.

Message #8 received at 17013 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17013 <at> debbugs.gnu.org
Subject: bug#17013: [PATCH] grep: optimization by using the Galil rule for
 Boyer-Moore algorithm in KWSet
Date: Sat, 15 Mar 2014 15:06:23 +0900
[Message part 1 (text/plain, inline)]
I changed the patch so that the delta2 shift is extracted from the trie,
because it's very excellent.

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17013; Package grep. (Sat, 15 Mar 2014 17:14:02 GMT) Full text and rfc822 format available.

Message #11 received at 17013 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17013 <at> debbugs.gnu.org
Subject: Re: bug#17013: [PATCH] grep: optimization by using the Galil rule for
 Boyer-Moore algorithm in KWSet
Date: Sat, 15 Mar 2014 10:12:56 -0700
On Fri, Mar 14, 2014 at 11:06 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> I changed the patch so that the delta2 shift is extracted from the trie,
> because it's very excellent.

Very nice.
I've begun to review these patches, and really like the
improved performance. Your first version (decreasing
worst-case O(M*N) to O(M+N)) gives 30x speed-up in
some cases.  The delta2-from-trie change brings it
closer to 40x.




Information forwarded to bug-grep <at> gnu.org:
bug#17013; Package grep. (Tue, 01 Apr 2014 09:08:01 GMT) Full text and rfc822 format available.

Message #14 received at 17013 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Jim Meyering <jim <at> meyering.net>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17013 <at> debbugs.gnu.org
Subject: Re: bug#17013: [PATCH] grep: optimization by using the Galil rule
 for Boyer-Moore algorithm in KWSet
Date: Tue, 01 Apr 2014 11:07:09 +0200
Il 15/03/2014 18:12, Jim Meyering ha scritto:
> On Fri, Mar 14, 2014 at 11:06 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>> I changed the patch so that the delta2 shift is extracted from the trie,
>> because it's very excellent.
>
> Very nice.
> I've begun to review these patches, and really like the
> improved performance. Your first version (decreasing
> worst-case O(M*N) to O(M+N)) gives 30x speed-up in
> some cases.  The delta2-from-trie change brings it
> closer to 40x.

Thanks Jim, I'll leave the review to you then!

Paolo





Information forwarded to bug-grep <at> gnu.org:
bug#17013; Package grep. (Wed, 02 Apr 2014 23:46:01 GMT) Full text and rfc822 format available.

Message #17 received at 17013 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17013 <at> debbugs.gnu.org
Subject: bug#17013: [PATCH] grep: optimization by using the Galil rule for
 Boyer-Moore algorithm in KWSet
Date: Thu, 03 Apr 2014 08:45:22 +0900
[Message part 1 (text/plain, inline)]
In second patch, I changed so that Boyer-Moore algorithm could be used
also to case-insensitive matching if MB_CUR_MAX == 1.  It works with
patch#17019 and patch#17034.
[patch.txt (text/plain, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 08 Apr 2014 02:57:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Tue, 08 Apr 2014 02:57:03 GMT) Full text and rfc822 format available.

Message #22 received at 17013-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17013-done <at> debbugs.gnu.org
Subject: Re: bug#17013: [PATCH] grep: optimization by using the Galil rule
 for Boyer-Moore algorithm in KWSet
Date: Mon, 07 Apr 2014 19:56:29 -0700
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> In second patch, I changed so that Boyer-Moore algorithm could be used
> also to case-insensitive matching if MB_CUR_MAX == 1.

Thanks, I merged this patch into the savannah git master (attachment 1), 
applied a fixup patch for clarity and to fix some minor style issues 
(attachment 2), and fixed some longstanding kwset memory-allocation 
infelicities mostly having to do with unecessary code (attachment 3).
[0001-grep-use-the-Galil-rule-for-Boyer-Moore-algorithm-in.patch (text/plain, attachment)]
[0002-grep-minor-cleanups-for-Galil-speedups.patch (text/plain, attachment)]
[0003-grep-simplify-memory-allocation-in-kwset.patch (text/plain, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 May 2014 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 98 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.