GNU bug report logs - #17576
[PATCH] dfa: speed-up at initial state

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 24 May 2014 03:33:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17576 in the body.
You can then email your comments to 17576 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 03:33:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 24 May 2014 03:33:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] dfa: speed-up at initial state
Date: Sat, 24 May 2014 12:31:37 +0900
[Message part 1 (text/plain, inline)]
Thanks to Jim and for the new release.  Let's just start with, for next
release I want to add further improvement to it.

In dfaexec, DFA state is always 0 until have found potential match.  So
we can improve matching there by continuing to use the transition table
without replacing it.

I tested the patch, and got about 3x speed-up.

$ yes j | head -10000000 >k
$ env LC_ALL=C time -p src/grep '\(a\|b\)' k
  Before: real 1.12   user 1.06   sys 0.04
  After : real 0.39   user 0.34   sys 0.04

I also tested for non-utf8 multibyte locale.
$ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
  Before: real 1.41   user 1.35   sys 0.05
  After : real 0.38   user 0.32   sys 0.06

By the way, below on grep 2.18 (non-patch). (^_^)

$ env LANG=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
          real 12.00  user 11.86  sys 0.13
[0001-dfa-speed-up-at-initial-state.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 05:02:02 GMT) Full text and rfc822 format available.

Message #8 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Fri, 23 May 2014 22:01:36 -0700
Norihiro Tanaka wrote:
> I tested the patch, and got about 3x speed-up.

That, but unfortunately whan I timed that patch against the current grep 
trunk (879d86e6242eae688f9e72d42b230348d1c93a8f), it slowed things down 
on a similar test case (a test like yours, but 100x longer).

$ yes j | head -1000000000 >k
$ env LC_ALL=C time -p src/grep-old '\(a\|b\)' k
real 8.25
user 7.74
sys 0.50
$ env LC_ALL=C time -p src/grep '\(a\|b\)' k
real 11.11
user 10.63
sys 0.46
$ env LC_ALL=ja_JP.eucJP time -p src/grep-old '\(a\|b\)' k
real 8.74
user 8.27
sys 0.43
$ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
real 8.95
user 8.44
sys 0.49

This is on my usual platform: Fedora 20 x86-64, AMD Phenum II X4 910e, 
GCC 4.9.0, default (-O2) optimization.




Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 05:24:01 GMT) Full text and rfc822 format available.

Message #11 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Fri, 23 May 2014 22:22:54 -0700
On Fri, May 23, 2014 at 8:31 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Thanks to Jim and for the new release.  Let's just start with, for next
> release I want to add further improvement to it.
>
> In dfaexec, DFA state is always 0 until have found potential match.  So
> we can improve matching there by continuing to use the transition table
> without replacing it.
>
> I tested the patch, and got about 3x speed-up.
>
> $ yes j | head -10000000 >k
> $ env LC_ALL=C time -p src/grep '\(a\|b\)' k
>   Before: real 1.12   user 1.06   sys 0.04
>   After : real 0.39   user 0.34   sys 0.04
>
> I also tested for non-utf8 multibyte locale.
> $ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
>   Before: real 1.41   user 1.35   sys 0.05
>   After : real 0.38   user 0.32   sys 0.06
...

Thank you for the patch, but with a 2.30GHz i7-3615QM
using 1600MHz DDR3, I see a 40% decrease in performance
in that first case:

  $ for i in grep src/grep; do env LC_ALL=C time -f %e $i '\(a\|b\)' k; done
  0.40
  0.65

and no change in the latter case.
In each case, I used an input file 10 times larger:

   $ yes j | head -100000000 >k

I've also compared on an i5-4250U w/1600MHz DDR3, where there
is still a performance decrease, but only 19% this time (each time
is the best of 5 trials):

  0.43
  0.53




Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 05:37:02 GMT) Full text and rfc822 format available.

Message #14 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Sat, 24 May 2014 14:36:02 +0900
Wow!  Thanks to Paul and Jim for the review.  I will retry it with newer
OS and gcc.  By the way, Have you already found what is wrong in the patch?





Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 10:20:01 GMT) Full text and rfc822 format available.

Message #17 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>,
 <eggert <at> cs.ucla.edu>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Sat, 24 May 2014 19:19:24 +0900
Sorry, I was wrong.  Correct test case is below.

$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k

> Thank you for the patch, but with a 2.30GHz i7-3615QM
> using 1600MHz DDR3, I see a 40% decrease in performance
> in that first case:

It's a case which has a lot of small lines.
I will fix it in a future patch.





Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sat, 24 May 2014 19:18:02 GMT) Full text and rfc822 format available.

Message #20 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Sat, 24 May 2014 12:17:15 -0700
Norihiro Tanaka wrote:
>Have you already found what is wrong in the patch?

Sorry, no, I just ran the test case you gave in your original report.

I guess that grep needs a performance test suite, so that we can find 
problems like this in a more-disciplined way.




Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Mon, 26 May 2014 14:14:01 GMT) Full text and rfc822 format available.

Message #23 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: bug#17576: [PATCH] dfa: speed-up at initial state
Date: Mon, 26 May 2014 23:12:50 +0900
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> I guess that grep needs a performance test suite, so that we can find
> problems like this in a more-disciplined way.

I agree, but I have no idea to perform it.

BTW, I retried them on Fedora 20, but I couldn't confirm significant
slowdown.

  Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz
  Fedora 20  GCC 4.8.2 20131017 (Red Hat 4.8.2-1) (x86-64)

$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
$ yes j | head -100000000 >l

[ before 0001-dfa-speed-up-at-initial-state.patch ]

$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.96
user 0.85
sys 0.10
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.64
user 0.60
sys 0.03

[ after 0001-dfa-speed-up-at-initial-state.patch ]

$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.58
user 0.52
sys 0.05
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.75
user 0.69
sys 0.05

However, I could confirm slowdown on RHEL6 GCC 4.4.7.  According to the
result, I see that GCC fails in inlination.  In addition to
0001-dfa-speed-up-at-initial-state.patch, I applied an attached patch,
and retried them.  Then I confirmed speed-up.

$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.46
user 0.35
sys 0.10
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.54
user 0.49
sys 0.04

Thanks,
Norihiro
[0001-dfa-separate-dfaexec-function-to-help-optimization-b.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Fri, 30 May 2014 07:32:02 GMT) Full text and rfc822 format available.

Message #26 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 17576 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: Re: [PATCH] dfa: speed-up at initial state
Date: Fri, 30 May 2014 00:31:28 -0700
I don't think we need to complicate the code to improve optimization 
with older compilers.  People who really care about performance will use 
newer compilers anyway.  So let's focus on 
0001-dfa-speed-up-at-initial-state.patch.

I confirmed your performance results on my platform: the change makes 
test case "k" about 60% faster and test case "l" about 14% slower.  At 
this point we're just in bugfix mode so this patch can wait.  It'd be 
nice if it didn't slow down the short-line case so much.  Also, I wonder 
how well this change works for "typical" searches.  (I realize that 
"typical" is hard to define....)




Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Fri, 30 May 2014 11:41:02 GMT) Full text and rfc822 format available.

Message #29 received at 17576 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17576 <at> debbugs.gnu.org
Subject: Re: [PATCH] dfa: speed-up at initial state
Date: Fri, 30 May 2014 20:40:08 +0900
Paul Eggert wrote:
> I don't think we need to complicate the code to improve optimization
> with older compilers.

Thanks for the confirmation.  Two results of last are with not GCC 4.4.7
but GCC 4.8.2 after applying second patch.  The test case "k" is 50%
faster and "l" is also about 16% faster with GCC 4.8.2 on my platform by
two changes.





Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 28 Sep 2014 04:02:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sun, 28 Sep 2014 04:02:03 GMT) Full text and rfc822 format available.

Message #34 received at 17576-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17576-done <at> debbugs.gnu.org
Subject: Re: [PATCH] dfa: speed-up at initial state
Date: Sat, 27 Sep 2014 21:01:14 -0700
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> The test case "k" is 50%
> faster and "l" is also about 16% faster with GCC 4.8.2 on my platform by
> two changes.

Thanks, I finally got around to looking at this and got similar performance 
results to yours.  That __attribute__((noinline)) bothers me, though, as it's 
not portable and is a bit inelegant.  I figured out a different way to avoid the 
inlining, and tweaked the commentary a bit, and so installed the attached 
additional patch after installing your patches.
[0001-dfa-minor-tweaks-mostly-to-remove-__attribute__-noin.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17576; Package grep. (Sun, 28 Sep 2014 07:19:02 GMT) Full text and rfc822 format available.

Message #37 received at 17576-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17576-done <at> debbugs.gnu.org
Subject: Re: [PATCH] dfa: speed-up at initial state
Date: Sun, 28 Sep 2014 16:18:15 +0900
Paul Eggert wrote:
> Thanks, I finally got around to looking at this and got similar 
> performance results to yours.  That __attribute__((noinline)) bothers
> me, though, as it's not portable and is a bit inelegant.  I figured
> out a different way to avoid the inlining, and tweaked the commentary
> a bit, and so installed the attached additional patch after installing
> your patches.

Excellent.

Thanks for the review and additional improvement.  I don't like that my
patch have uses `((noinline)) __attribute __' which isn't portable, but
you fix it.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 26 Oct 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 317 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.