GNU bug report logs - #17700
[PATCH] dfa: speed-up for a pattern that many atoms are catenated

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Thu, 5 Jun 2014 11:41:01 UTC

Severity: normal

Tags: patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17700 <at> debbugs.gnu.org
Cc: Aharon Robbins <arnold <at> skeeve.com>
Subject: bug#17700: [PATCH] dfa: speed-up for a pattern that many atoms are catenated
Date: Thu, 05 Jun 2014 09:48:41 -0700
[Message part 1 (text/plain, inline)]
On 06/05/2014 04:32 AM, Norihiro Tanaka wrote:
> +      memchr (cp, *lookfor, lenin - (cp - lookin));
> +      if (!cp)

Thanks, but this part can't be right, as memchr's result is discarded.

It seems to me that much of the performance benefit comes from using a 
faster implementation of strstr, and that the DFA code will be better 
off if it simply uses the system strstr rather than rolling its own.  
(The DFA code dates back before strstr was standardized, which is why it 
has its own implementation.)

I installed the attached patch to do that and got a big speedup:

$ printf '%08192d\n' 0 | time -p src/old/grep -f - /dev/null
real 16.14
user 16.13
sys 0.00
$ printf '%08192d\n' 0 | time -p src/grep -f - /dev/null
real 0.79
user 0.79
sys 0.00

Could you please look at the remaining part of your patch and see 
whether it's a win if it's merged to what's now installed? Thanks.

PS.  Aharon, I assume this'll affect Gawk, in that you'll need to 
provide a strstr if you want to be portable to ancient systems that lack 
it.  strstr was standardized in C89 so it'd have to be a pretty ancient 
system, and it may be better just to let this slide.
[diff.patch (text/x-patch, attachment)]

This bug report was last modified 10 years and 355 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.