GNU bug report logs - #17229
[PATCH 2/2] grep: speed-up by using memchr() in Boyer-Moore searching

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Wed, 9 Apr 2014 13:56:02 UTC

Severity: normal

Tags: moreinfo, patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

Full log


Message #20 received at 17229 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17229 <at> debbugs.gnu.org
Subject: bug#17229: [PATCH 2/2] grep: speed-up by using memchr() in
 Boyer-Moore searching
Date: Fri, 11 Apr 2014 08:25:25 +0900
[Message part 1 (text/plain, inline)]
I believe that memchr() isn't slower on HP-UX and Solaris but more
faster on x86 processors.

Now I wrote the additional patch.

It replaces `tp += d' to the other code in bmexec() and cwexec() for x86
processors.

I see that the code is more complex and slower theoretically, but it's
faster actually when `d' is small.  (We can test it by `grep -i jk k'
after patch#17230 and this patch.)

Therefore, I think that memchr() is faster than `tp += d' because the
increment instruction is more faster than the add instruction on x86
processors.  KWSet may be slower than DFA on them in the worst case,
even if doesn't use delta2.  Try to run and compare below and compare
without this patches.  The former uses only KWSet and the later uses only
DFA.  Later is faster on Linux x86, because DFA uses increment instrument
for shift of text pointer.  It may generate a wrong usage among users.
I think that it should be avoided.

  $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
  $ env LANG=C time -p src/grep jk k
  $ env LANG=C time -p src/grep -i jk k

> sometimes the former is more important than the latter, and this may be
> one of those times.

I see that grep attaches a high value to performance.  It uses not only
regex but also DFA.  Further more, DFA has the specific codes for utf-8
locale, which is used most frequently.  I think that grep may also have
specific codes for x86 processors, which are also used most frequently.

Norihiro
[patch.txt (text/plain, attachment)]

This bug report was last modified 11 years and 105 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.