GNU bug report logs -
#17576
[PATCH] dfa: speed-up at initial state
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 24 May 2014 03:33:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17576 in the body.
You can then email your comments to 17576 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 03:33:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Sat, 24 May 2014 03:33:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thanks to Jim and for the new release. Let's just start with, for next
release I want to add further improvement to it.
In dfaexec, DFA state is always 0 until have found potential match. So
we can improve matching there by continuing to use the transition table
without replacing it.
I tested the patch, and got about 3x speed-up.
$ yes j | head -10000000 >k
$ env LC_ALL=C time -p src/grep '\(a\|b\)' k
Before: real 1.12 user 1.06 sys 0.04
After : real 0.39 user 0.34 sys 0.04
I also tested for non-utf8 multibyte locale.
$ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
Before: real 1.41 user 1.35 sys 0.05
After : real 0.38 user 0.32 sys 0.06
By the way, below on grep 2.18 (non-patch). (^_^)
$ env LANG=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
real 12.00 user 11.86 sys 0.13
[0001-dfa-speed-up-at-initial-state.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 05:02:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 17576 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
> I tested the patch, and got about 3x speed-up.
That, but unfortunately whan I timed that patch against the current grep
trunk (879d86e6242eae688f9e72d42b230348d1c93a8f), it slowed things down
on a similar test case (a test like yours, but 100x longer).
$ yes j | head -1000000000 >k
$ env LC_ALL=C time -p src/grep-old '\(a\|b\)' k
real 8.25
user 7.74
sys 0.50
$ env LC_ALL=C time -p src/grep '\(a\|b\)' k
real 11.11
user 10.63
sys 0.46
$ env LC_ALL=ja_JP.eucJP time -p src/grep-old '\(a\|b\)' k
real 8.74
user 8.27
sys 0.43
$ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
real 8.95
user 8.44
sys 0.49
This is on my usual platform: Fedora 20 x86-64, AMD Phenum II X4 910e,
GCC 4.9.0, default (-O2) optimization.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 05:24:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 17576 <at> debbugs.gnu.org (full text, mbox):
On Fri, May 23, 2014 at 8:31 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Thanks to Jim and for the new release. Let's just start with, for next
> release I want to add further improvement to it.
>
> In dfaexec, DFA state is always 0 until have found potential match. So
> we can improve matching there by continuing to use the transition table
> without replacing it.
>
> I tested the patch, and got about 3x speed-up.
>
> $ yes j | head -10000000 >k
> $ env LC_ALL=C time -p src/grep '\(a\|b\)' k
> Before: real 1.12 user 1.06 sys 0.04
> After : real 0.39 user 0.34 sys 0.04
>
> I also tested for non-utf8 multibyte locale.
> $ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k
> Before: real 1.41 user 1.35 sys 0.05
> After : real 0.38 user 0.32 sys 0.06
...
Thank you for the patch, but with a 2.30GHz i7-3615QM
using 1600MHz DDR3, I see a 40% decrease in performance
in that first case:
$ for i in grep src/grep; do env LC_ALL=C time -f %e $i '\(a\|b\)' k; done
0.40
0.65
and no change in the latter case.
In each case, I used an input file 10 times larger:
$ yes j | head -100000000 >k
I've also compared on an i5-4250U w/1600MHz DDR3, where there
is still a performance decrease, but only 19% this time (each time
is the best of 5 trials):
0.43
0.53
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 05:37:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 17576 <at> debbugs.gnu.org (full text, mbox):
Wow! Thanks to Paul and Jim for the review. I will retry it with newer
OS and gcc. By the way, Have you already found what is wrong in the patch?
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 10:20:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 17576 <at> debbugs.gnu.org (full text, mbox):
Sorry, I was wrong. Correct test case is below.
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
> Thank you for the patch, but with a 2.30GHz i7-3615QM
> using 1600MHz DDR3, I see a 40% decrease in performance
> in that first case:
It's a case which has a lot of small lines.
I will fix it in a future patch.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sat, 24 May 2014 19:18:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 17576 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
>Have you already found what is wrong in the patch?
Sorry, no, I just ran the test case you gave in your original report.
I guess that grep needs a performance test suite, so that we can find
problems like this in a more-disciplined way.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Mon, 26 May 2014 14:14:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 17576 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> I guess that grep needs a performance test suite, so that we can find
> problems like this in a more-disciplined way.
I agree, but I have no idea to perform it.
BTW, I retried them on Fedora 20, but I couldn't confirm significant
slowdown.
Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz
Fedora 20 GCC 4.8.2 20131017 (Red Hat 4.8.2-1) (x86-64)
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
$ yes j | head -100000000 >l
[ before 0001-dfa-speed-up-at-initial-state.patch ]
$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.96
user 0.85
sys 0.10
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.64
user 0.60
sys 0.03
[ after 0001-dfa-speed-up-at-initial-state.patch ]
$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.58
user 0.52
sys 0.05
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.75
user 0.69
sys 0.05
However, I could confirm slowdown on RHEL6 GCC 4.4.7. According to the
result, I see that GCC fails in inlination. In addition to
0001-dfa-speed-up-at-initial-state.patch, I applied an attached patch,
and retried them. Then I confirmed speed-up.
$ env LC_ALL=C src/grep '\(a\|b\)' k
real 0.46
user 0.35
sys 0.10
$ env LC_ALL=C src/grep '\(a\|b\)' l
real 0.54
user 0.49
sys 0.04
Thanks,
Norihiro
[0001-dfa-separate-dfaexec-function-to-help-optimization-b.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Fri, 30 May 2014 07:32:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 17576 <at> debbugs.gnu.org (full text, mbox):
I don't think we need to complicate the code to improve optimization
with older compilers. People who really care about performance will use
newer compilers anyway. So let's focus on
0001-dfa-speed-up-at-initial-state.patch.
I confirmed your performance results on my platform: the change makes
test case "k" about 60% faster and test case "l" about 14% slower. At
this point we're just in bugfix mode so this patch can wait. It'd be
nice if it didn't slow down the short-line case so much. Also, I wonder
how well this change works for "typical" searches. (I realize that
"typical" is hard to define....)
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Fri, 30 May 2014 11:41:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 17576 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> I don't think we need to complicate the code to improve optimization
> with older compilers.
Thanks for the confirmation. Two results of last are with not GCC 4.4.7
but GCC 4.8.2 after applying second patch. The test case "k" is 50%
faster and "l" is also about 16% faster with GCC 4.8.2 on my platform by
two changes.
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Sun, 28 Sep 2014 04:02:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Sun, 28 Sep 2014 04:02:03 GMT)
Full text and
rfc822 format available.
Message #34 received at 17576-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> The test case "k" is 50%
> faster and "l" is also about 16% faster with GCC 4.8.2 on my platform by
> two changes.
Thanks, I finally got around to looking at this and got similar performance
results to yours. That __attribute__((noinline)) bothers me, though, as it's
not portable and is a bit inelegant. I figured out a different way to avoid the
inlining, and tweaked the commentary a bit, and so installed the attached
additional patch after installing your patches.
[0001-dfa-minor-tweaks-mostly-to-remove-__attribute__-noin.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17576
; Package
grep
.
(Sun, 28 Sep 2014 07:19:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 17576-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> Thanks, I finally got around to looking at this and got similar
> performance results to yours. That __attribute__((noinline)) bothers
> me, though, as it's not portable and is a bit inelegant. I figured
> out a different way to avoid the inlining, and tweaked the commentary
> a bit, and so installed the attached additional patch after installing
> your patches.
Excellent.
Thanks for the review and additional improvement. I don't like that my
patch have uses `((noinline)) __attribute __' which isn't portable, but
you fix it.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 26 Oct 2014 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 10 years and 317 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.