GNU bug report logs - #17420
[PATCH] grep: always convert fgrep to grep

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Tue, 6 May 2014 13:38:02 UTC

Severity: minor

Tags: patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17420 in the body.
You can then email your comments to 17420 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#17420; Package grep. (Tue, 06 May 2014 13:38:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 06 May 2014 13:38:05 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] grep: always convert fgrep to grep
Date: Tue, 06 May 2014 22:36:57 +0900
[Message part 1 (text/plain, inline)]
fgrep uses different matcher from grep, but there is no reason why it
must be so.  Further more, fgrep matcher is slower than grep matcher in
many cases, because it never uses DFA.  This patch converts fgrep to grep
any time.

By the way, if `FGREP_NO_DFA' env. is set, fgrep still uses fgrep matcher.
It may be required in some cases.  fgrep matcher requires less memory
than grep matcher even if a pattern is too long, and/or is faster than
it in some cases to have multiple patterns.
[0001-grep-always-convert-fgrep-to-grep.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17420; Package grep. (Wed, 07 May 2014 07:01:02 GMT) Full text and rfc822 format available.

Message #8 received at 17420 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17420 <at> debbugs.gnu.org
Subject: Re: bug#17420: [PATCH] grep: always convert fgrep to grep
Date: Wed, 07 May 2014 00:00:18 -0700
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> By the way, if `FGREP_NO_DFA' env. is set, fgrep still uses fgrep matcher.

I'd rather not introduce another environment variable.  GREP_COLOR and 
GREP_COLORS and GREP_OPTIONS are now commonly considered to have been 
mistakes, and I'd rather not compound them.

Instead, how about removing the fgrep matcher entirely?  That'll make 
the code simpler and easier to maintain, which is a real plus.  Perhaps 
it'll be slower in a few cases, but as long as significant slowdowns are 
rare, that's OK.

Something like the attached patch, say.
[0001-grep-simplify-by-removing-Fexecute-kwsearch.c-etc.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#17420; Package grep. (Fri, 09 May 2014 15:17:01 GMT) Full text and rfc822 format available.

Message #11 received at 17420 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17420 <at> debbugs.gnu.org
Subject: bug#17420: [PATCH] grep: always convert fgrep to grep
Date: Sat, 10 May 2014 00:15:59 +0900
Paul Eggert wrote:
> but as long as significant slowdowns are rare, that's OK.

That's graceful.  However, I concern two slower cases.

$ echo a | env LC_ALL=C time -p src/grep -Ff /usr/share/dict/linux.words
    real 1.34       user 1.26       sys 0.07
$ echo a | env LC_ALL=C time -p src/grep -f /usr/share/dict/linux.words
    real 56.79      user 6.33       sys 48.79

$ yes /usr/share/dict/linux.words | head -100 | xargs cat > k
$ printf 'Python\nPerl\nPascall\nProlog\nPHP\nRuby\nHaskell\nLisp\nScheme\n' |
  env LC_ALL=C time -p src/grep -Ff - k >/dev/null
    real 1.84       user 1.78       sys 0.05
$ printf 'Python\nPerl\nPascall\nProlog\nPHP\nRuby\nHaskell\nLisp\nScheme\n'
  env LC_ALL=C time -p src/grep -f - k >/dev/null
    real 2.26       user 2.19       sys 0.06

Now, Beate Commentz-Waltertz Walter algorithm in KWset is used by only
fgrep matcher.  Therefore if it's effective, fgrep matcher is faster
than grep matcher.  In addition, Beate Commentz-Waltertz Walter algorithm
is more smaller memory consumption than the DFA.

However, below is very slow, so that Beate Commentz-Waltertz Walter
algorithm in KWset hasn't impremented Galil rule yet.

  $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
  $ printf 'kjjjjjjjjjjjjjjjjjjj\nq\n' | env LC_ALL=C src/grep -Ff - k
    real 22.67      user 18.31      sys 3.64
  $ printf 'kjjjjjjjjjjjjjjjjjjj\nq\n' | env LC_ALL=C src/grep -f - k
    real 1.09       user 1.03       sys 0.05

Thanks,
Norihiro





Information forwarded to bug-grep <at> gnu.org:
bug#17420; Package grep. (Sat, 10 May 2014 06:45:01 GMT) Full text and rfc822 format available.

Message #14 received at 17420 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17420 <at> debbugs.gnu.org
Subject: Re: bug#17420: [PATCH] grep: always convert fgrep to grep
Date: Fri, 09 May 2014 23:44:01 -0700
Thanks, I see your point, so let's ignore my patch.  Still, I'd rather 
not have the behavior depend on environment variables, and I'd rather 
have grep "just work" without the user having to specify options.  Is 
there a better heuristic for deciding whether to use Commentz-Walter? 
If not, perhaps we should leave the code alone.

> Beate Commentz-Waltertz Walter algorithm in KWset hasn't impremented Galil rule yet.

Can the Galil rule be adapted to Commentz-Walter?  I didn't know this 
was possible.

Another possibility might be to replace Commentz-Walter with some other 
algorithm (Aho-Corasick, modified Wu-Manber, etc.)  I suppose this would 
be a bigger project, though.




Reply sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
You have taken responsibility. (Sat, 10 May 2014 23:40:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sat, 10 May 2014 23:40:02 GMT) Full text and rfc822 format available.

Message #19 received at 17420-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 17420-done <at> debbugs.gnu.org
Subject: bug#17420: [PATCH] grep: always convert fgrep to grep
Date: Sun, 11 May 2014 08:39:09 +0900
> Is there a better heuristic for deciding whether to use Commentz-Walter?

No, I have no idea.

> Can the Galil rule be adapted to Commentz-Walter?

No, I don't know even whether we can do it.

> Another possibility might be to replace Commentz-Walter with some other algorithm (Aho-Corasick, modified Wu-Manber, etc.).

DFA is already like Aho-Corasick.  So even if we imprement them, there
will be no merit much in spite of big changes.

The argument has convinced me that we shouldn't apply the patch now.

Thanks,
Norihiro





Severity set to 'minor' from 'normal' Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 11 May 2014 00:05:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 08 Jun 2014 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 107 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.