GNU bug report logs - #17350
[PATCH] grep: speed up for a case to repeat failure in DFA after success in kwset

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 26 Apr 2014 11:27:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17350 in the body.
You can then email your comments to 17350 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Sat, 26 Apr 2014 11:27:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 26 Apr 2014 11:27:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] grep: speed up for a case to repeat failure in DFA after
 success in kwset
Date: Sat, 26 Apr 2014 20:25:43 +0900
[Message part 1 (text/plain, inline)]
Patch#17230 causes slowdown for a case to repeat failure in DFA after
success in kwset, although it improves the performance for many cases.

In fact, the master without this patch is 3.5x slower than grep 2.18 for
below.

    $ yes abcdabc | head -50000000 >k
    $ env LC_ALL=C time -p src/grep -i 'abcd.bd' k

This patch fixes the degradation of the performance for such cases.  See
the commit log for details.

Norihiro
[patch.txt (text/plain, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 27 Apr 2014 01:09:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sun, 27 Apr 2014 01:09:02 GMT) Full text and rfc822 format available.

Message #10 received at 17350-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17350-done <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Sat, 26 Apr 2014 18:08:12 -0700
[Message part 1 (text/plain, inline)]
Thanks, I installed that patch with minor reworking of the ChangeLog 
entry and am marking this as done.  By the way, a minor point -- 
nowadays we prefer single-quoting 'like this' rather than the old style 
`like this' -- the GNU coding standards changed recently in this regard.

I noticed a mostly-theoretical read buffer overrun in the patch, along 
with some other opportunities for simplification and tuning, and 
installed the attached fixup patch as well.
[0001-dfa-fix-index-bug-in-previous-patch-and-simplify.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Sun, 27 Apr 2014 06:16:02 GMT) Full text and rfc822 format available.

Message #13 received at 17350-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350-done <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Sun, 27 Apr 2014 15:15:27 +0900
[Message part 1 (text/plain, inline)]
Paul, Thank you always for everything.  Fix for dfaisfast() is right.  A
superset never has BACKREF.  However, I think that we don't have to check
the end pointer when dfaisfast is true, because run DFA for whole a buffer
(use buflim) and check the end pointer.

By the way, I took into another bug by my previous patch.  If
`kwsm.index < kwset_exact_matches', don't have to run DFA for whole a buffer.

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Sun, 27 Apr 2014 11:55:02 GMT) Full text and rfc822 format available.

Message #16 received at 17350-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350-done <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Sun, 27 Apr 2014 20:54:09 +0900
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> By the way, I took into another bug by my previous patch.  If
> `kwsm.index < kwset_exact_matches', don't have to run DFA for whole a buffer.

I found a issue in the patch.

If failed in DFA after succeed in kwset, doesn't return to kwset until
reaches the end of the buffer or find a match.  By that, thought some
cases speed up, there is also a case slowdown.

Although A is speed-up, B is slowdown.

(A)
  yes abcdabc | head -50000000 >k
  env LC_ALL=C time -p src/grep abcd.bd k

(B)
  yes "abcdabc
    $(yes jjjjjjj | head -99)" | head -50000000 >k
  env LC_ALL=C time -p src/grep abcd.bd k

In A, KWset doesn't work at all, and it's harmful.
OTOH, in B, It works effectively in A.

I considered only the case of A, but it's necessary to consider how B
does not slowdown.

I wrote the patch for the master to return to KWset, after checking with
DFA about 30 line.  `30' is based on results of the tests.

However, I don't so like this patch, since the basis to 30 is weak...
Is there anyone that have any good ideas?

Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Mon, 28 Apr 2014 06:10:02 GMT) Full text and rfc822 format available.

Message #19 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Sun, 27 Apr 2014 23:09:30 -0700
Thanks, but I'm hoping that a simpler patch is possible in that area. 
It is unfortunately that this part of the code calls dfaexec in 4 places 
and dfahint in 3.  Surely we can simplify it so that it calls dfahint in 
one place, and dfaexec in one place -- or, if that's two ambitious, at 
least we should cut the complexity down somewhat.




Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Mon, 28 Apr 2014 12:29:02 GMT) Full text and rfc822 format available.

Message #22 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Mon, 28 Apr 2014 21:27:55 +0900
[Message part 1 (text/plain, inline)]
I simplified a superset of DFA and EGexecute(), and changed the patch for
this bug into the logic better.

Now, Each dfahint and dfaexec is called in one place.

Thanks,
Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Tue, 29 Apr 2014 12:04:01 GMT) Full text and rfc822 format available.

Message #25 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 17350 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Tue, 29 Apr 2014 21:03:05 +0900
[Message part 1 (text/plain, inline)]
The previous version of 2nd patch wasn't so better still, because it was
slow for below in order that it never fills beg == prev_beg.

  $ yes 'abcdabc
jjjjjjj' | head -50000000 >k
  $ env LC_ALL=C time -p src/grep abcd.bd k

Accordingly, I have fixed it. It works below if dfafast is true.

  - If `beg - prev_beg' is small, overhead by calls of KWset and DFA is
    high.  So use constant 64 as minimum shift of the `end' pointer.

  - If ratio of `match - beg' to `beg - prev_beg' is large, KWset isn't
    effective.  So consider it.

As far as I have tested, It improves the performance of the worst cases
with little slowdown to others.

Thanks,
Norihiro
[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 30 Apr 2014 07:11:01 GMT) Full text and rfc822 format available.

Message #28 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Wed, 30 Apr 2014 00:10:26 -0700
[Message part 1 (text/plain, inline)]
Thanks for working on that.  A few comments.  First, the ChangeLog 
entries can be improved; I put a proposed rewrite of them into the first 
two attached patches (the code's the same).  Second, building with 
--enable-gcc-warnings causes a complaint about dfasuperset needing to be 
declared pure.  Third, EGexecute still has quite a bit of duplicate 
code, even with the patch.  I took at shot at simplifying it (and fixing 
the 2nd problem) in the third attached patch.  This passes "make check" 
but I have not benchmarked it.
[0001-grep-simplify-superset.patch (text/plain, attachment)]
[0002-grep-adjust-timing-back-to-kwset-when-dfaisfast-is-t.patch (text/plain, attachment)]
[0003-grep-simplify-EGexecute-further.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 30 Apr 2014 14:43:02 GMT) Full text and rfc822 format available.

Message #31 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Wed, 30 Apr 2014 23:42:44 +0900
Thanks for the investigation.  `narrowed' in my patch means not
`kwset_exact_match' but that it has been narrowed down to one line for
DFA.  If `narrowed' is false, don't have to seek eolbyte.





Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 30 Apr 2014 14:50:03 GMT) Full text and rfc822 format available.

Message #34 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Wed, 30 Apr 2014 07:49:13 -0700
Norihiro Tanaka wrote:
> If `narrowed' is false, don't have to seek eolbyte.

Sure, but how much does that help performance in the typical case?  The 
'narrowed' code is there purely for performance reasons, so if it 
doesn't significantly help performance, it's better to keep things 
simple and remove it.




Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 30 Apr 2014 23:05:02 GMT) Full text and rfc822 format available.

Message #37 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Thu, 01 May 2014 08:04:29 +0900
Paul Eggert wrote:
> Sure, but how much does that help performance in the typical case?

Perhaps, You are right.  I tested it for some cases, but couldn't find
any cases speeded-up by it.

Thanks,





Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Thu, 01 May 2014 01:34:02 GMT) Full text and rfc822 format available.

Message #40 received at 17350-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350-done <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Wed, 30 Apr 2014 18:33:32 -0700
Norihiro Tanaka wrote:
> I tested it for some cases, but couldn't find
> any cases speeded-up by it.

Thanks for checking; I installed that combination of patches and am 
marking this as done.




Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Thu, 01 May 2014 11:52:01 GMT) Full text and rfc822 format available.

Message #43 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Thu, 01 May 2014 20:51:49 +0900
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> Thanks for checking; I installed that combination of patches and am
> marking this as done.

Sorry, I have discovered a bug in your improvement, and fix it.

[patch.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Sat, 03 May 2014 22:41:02 GMT) Full text and rfc822 format available.

Message #46 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Sat, 03 May 2014 15:40:05 -0700
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> Sorry, I have discovered a bug in your improvement, and fix it.

Thanks for reporting and fixing that!  I installed that.  Also, I tried 
to change the code a bit to make similar bugs less likely in the future, 
and installed the attached followup patch.
[0001-grep-clarify-EGexecute-slightly.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Tue, 06 May 2014 12:47:03 GMT) Full text and rfc822 format available.

Message #49 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Tue, 06 May 2014 21:45:54 +0900
[Message part 1 (text/plain, inline)]
It seems that we can't ignore overheads by searching BACKREF in dfaisfast().
First patch fixes it.  Second patch fixes a typo in a comment I have found
in the process.
[0001-dfa-checking-BACKREF-in-advance.patch (text/plain, attachment)]
[0002-dfa-fix-comment.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Tue, 06 May 2014 14:47:02 GMT) Full text and rfc822 format available.

Message #52 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Tue, 06 May 2014 07:46:32 -0700
[Message part 1 (text/plain, inline)]
Thanks, I pushed that with minor changes to the ChangeLog message (first 
attached patch).  It's a tad faster to combine the relevant flags at 
compile-time so I did that (second attached patch).

"Iff" means "if and only if"; see <http://en.wikipedia.org/wiki/Iff>.  I
got the habit of trying to be logically-precise from RMS, who I guess
got it back in the days when he was doing AI.  Anyway, I reworded the 
text in a different way to avoid the misuse of "if" (third attached patch).
[0001-dfa-speed-up-dfaisfast.patch (text/plain, attachment)]
[0002-dfa-minor-performance-improvement-for-previous-chang.patch (text/plain, attachment)]
[0003-dfa-clarify-use-of-if.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Tue, 06 May 2014 22:09:01 GMT) Full text and rfc822 format available.

Message #55 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Wed, 07 May 2014 07:08:34 +0900
Paul Eggert wrote:
> "Iff" means "if and only if"; see <http://en.wikipedia.org/wiki/Iff>.  I
> got the habit of trying to be logically-precise from RMS, who I guess
> got it back in the days when he was doing AI.

Wow, I understand it.

Thanks,
Norihiro





Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Tue, 06 May 2014 22:18:02 GMT) Full text and rfc822 format available.

Message #58 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Wed, 07 May 2014 07:17:39 +0900
Paul Eggert wrote:
> Anyway, I reworded the text in a different way to avoid the misuse of
> "if" (third attached patch).

By the way, do you purposely replace spaces with tabs in third patch?





Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 07 May 2014 01:46:01 GMT) Full text and rfc822 format available.

Message #61 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure
 in DFA after success in kwset
Date: Tue, 06 May 2014 18:45:25 -0700
Norihiro Tanaka wrote:
> Paul Eggert wrote:
>> Anyway, I reworded the text in a different way to avoid the misuse of
>> "if" (third attached patch).
>
> By the way, do you purposely replace spaces with tabs in third patch?

Yes, we're using tabs to line up code that is multicolumn (typically the 
first column is code, the second comments).  Admittedly the code is not 
consistent in this area.

I'd prefer a different style, where there's a comment on one line and 
code on the next, and where there are no tabs, but now's not a good time 
to change styles.






Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Wed, 07 May 2014 11:56:01 GMT) Full text and rfc822 format available.

Message #64 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17350 <at> debbugs.gnu.org
Subject: bug#17350: [PATCH] grep: speed up for a case to repeat failure in DFA
 after success in kwset
Date: Wed, 07 May 2014 20:55:38 +0900
Paul Eggert wrote:
> Yes, we're using tabs to line up code that is multicolumn (typically
>  the first column is code, the second comments).  Admittedly the code
> is not consistent in this area.
> 
> I'd prefer a different style, where there's a comment on one line and
> code on the next, and where there are no tabs, but now's not a good
> time to change styles.

Thanks, I understand it.

Norihiro





Information forwarded to bug-grep <at> gnu.org:
bug#17350; Package grep. (Fri, 09 May 2014 19:40:01 GMT) Full text and rfc822 format available.

Message #67 received at 17350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17350 <at> debbugs.gnu.org
Subject: Re: bug#17350: [PATCH] grep: speed up for a case to repeat failure in
 DFA after success in kwset
Date: Fri, 9 May 2014 12:38:51 -0700
[Message part 1 (text/plain, inline)]
On Tue, May 6, 2014 at 7:46 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Thanks, I pushed that with minor changes to the ChangeLog message (first
> attached patch).  It's a tad faster to combine the relevant flags at
> compile-time so I did that (second attached patch).

Reading your 0002 "dfa: minor performance improvement for previous
change", I noticed that the fall-through "case BACKREF:" was not
marked as such, so have added a few /* fallthrough */ comments, to
mark the places where one (or a static analyzer) might have a doubt:
[0001-maint-mark-some-breakless-cases-with-fallthrough-com.patch (application/octet-stream, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 07 Jun 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 19 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.