GNU bug report logs -
#17420
[PATCH] grep: always convert fgrep to grep
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Tue, 6 May 2014 13:38:02 UTC
Severity: minor
Tags: patch
Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17420 in the body.
You can then email your comments to 17420 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#17420
; Package
grep
.
(Tue, 06 May 2014 13:38:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Tue, 06 May 2014 13:38:05 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
fgrep uses different matcher from grep, but there is no reason why it
must be so. Further more, fgrep matcher is slower than grep matcher in
many cases, because it never uses DFA. This patch converts fgrep to grep
any time.
By the way, if `FGREP_NO_DFA' env. is set, fgrep still uses fgrep matcher.
It may be required in some cases. fgrep matcher requires less memory
than grep matcher even if a pattern is too long, and/or is faster than
it in some cases to have multiple patterns.
[0001-grep-always-convert-fgrep-to-grep.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17420
; Package
grep
.
(Wed, 07 May 2014 07:01:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 17420 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> By the way, if `FGREP_NO_DFA' env. is set, fgrep still uses fgrep matcher.
I'd rather not introduce another environment variable. GREP_COLOR and
GREP_COLORS and GREP_OPTIONS are now commonly considered to have been
mistakes, and I'd rather not compound them.
Instead, how about removing the fgrep matcher entirely? That'll make
the code simpler and easier to maintain, which is a real plus. Perhaps
it'll be slower in a few cases, but as long as significant slowdowns are
rare, that's OK.
Something like the attached patch, say.
[0001-grep-simplify-by-removing-Fexecute-kwsearch.c-etc.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17420
; Package
grep
.
(Fri, 09 May 2014 15:17:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 17420 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> but as long as significant slowdowns are rare, that's OK.
That's graceful. However, I concern two slower cases.
$ echo a | env LC_ALL=C time -p src/grep -Ff /usr/share/dict/linux.words
real 1.34 user 1.26 sys 0.07
$ echo a | env LC_ALL=C time -p src/grep -f /usr/share/dict/linux.words
real 56.79 user 6.33 sys 48.79
$ yes /usr/share/dict/linux.words | head -100 | xargs cat > k
$ printf 'Python\nPerl\nPascall\nProlog\nPHP\nRuby\nHaskell\nLisp\nScheme\n' |
env LC_ALL=C time -p src/grep -Ff - k >/dev/null
real 1.84 user 1.78 sys 0.05
$ printf 'Python\nPerl\nPascall\nProlog\nPHP\nRuby\nHaskell\nLisp\nScheme\n'
env LC_ALL=C time -p src/grep -f - k >/dev/null
real 2.26 user 2.19 sys 0.06
Now, Beate Commentz-Waltertz Walter algorithm in KWset is used by only
fgrep matcher. Therefore if it's effective, fgrep matcher is faster
than grep matcher. In addition, Beate Commentz-Waltertz Walter algorithm
is more smaller memory consumption than the DFA.
However, below is very slow, so that Beate Commentz-Waltertz Walter
algorithm in KWset hasn't impremented Galil rule yet.
$ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k
$ printf 'kjjjjjjjjjjjjjjjjjjj\nq\n' | env LC_ALL=C src/grep -Ff - k
real 22.67 user 18.31 sys 3.64
$ printf 'kjjjjjjjjjjjjjjjjjjj\nq\n' | env LC_ALL=C src/grep -f - k
real 1.09 user 1.03 sys 0.05
Thanks,
Norihiro
Information forwarded
to
bug-grep <at> gnu.org
:
bug#17420
; Package
grep
.
(Sat, 10 May 2014 06:45:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 17420 <at> debbugs.gnu.org (full text, mbox):
Thanks, I see your point, so let's ignore my patch. Still, I'd rather
not have the behavior depend on environment variables, and I'd rather
have grep "just work" without the user having to specify options. Is
there a better heuristic for deciding whether to use Commentz-Walter?
If not, perhaps we should leave the code alone.
> Beate Commentz-Waltertz Walter algorithm in KWset hasn't impremented Galil rule yet.
Can the Galil rule be adapted to Commentz-Walter? I didn't know this
was possible.
Another possibility might be to replace Commentz-Walter with some other
algorithm (Aho-Corasick, modified Wu-Manber, etc.) I suppose this would
be a bigger project, though.
Reply sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
You have taken responsibility.
(Sat, 10 May 2014 23:40:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Sat, 10 May 2014 23:40:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 17420-done <at> debbugs.gnu.org (full text, mbox):
> Is there a better heuristic for deciding whether to use Commentz-Walter?
No, I have no idea.
> Can the Galil rule be adapted to Commentz-Walter?
No, I don't know even whether we can do it.
> Another possibility might be to replace Commentz-Walter with some other algorithm (Aho-Corasick, modified Wu-Manber, etc.).
DFA is already like Aho-Corasick. So even if we imprement them, there
will be no merit much in spite of big changes.
The argument has convinced me that we shouldn't apply the patch now.
Thanks,
Norihiro
Severity set to 'minor' from 'normal'
Request was from
Paul Eggert <eggert <at> cs.ucla.edu>
to
control <at> debbugs.gnu.org
.
(Sun, 11 May 2014 00:05:02 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 08 Jun 2014 11:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 11 years and 107 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.