GNU bug report logs - #16823
Use DFA regex engine on fgrep matcher

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Thu, 20 Feb 2014 13:27:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Report forwarded to bug-grep <at> gnu.org:
bug#16823; Package grep. (Thu, 20 Feb 2014 13:27:03 GMT)

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 20 Feb 2014 13:27:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: Use DFA regex engine on fgrep matcher
Date: Thu, 20 Feb 2014 22:26:35 +0900
[Message part 1 (text/plain, inline)]
Package: grep
Tags: patch

In recent years the grep matcher has become very fast thanks to improvements
to the DFA engine. The fgrep matcher, on the other hand, uses only the kwset
engine, which is generally not very good at case-insensitive matching.

This patch switches case-insensitive matching from the fgrep matcher to the
grep matcher, which can use the DFA engine. That makes --ignore-case (-i)
with the fgrep matcher about 30-40x faster in a UTF-8 locale.
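
One way to picture the switch from the fgrep matcher to the grep matcher: a fixed string can be handed to a regex engine once its regex metacharacters are escaped. The shell sketch below only illustrates that idea and is not code from the attached patch. The helper name fixed_to_bre is hypothetical, and the sketch assumes a single-line pattern in an ASCII-compatible encoding such as UTF-8.

```shell
# Hypothetical helper (not from the patch): backslash-escape the characters
# that are special in a POSIX basic regular expression ( [ \ . * ^ $ ) so
# that the regex matcher treats the result as a literal string.
fixed_to_bre() {
  printf '%s\n' "$1" | sed 's/[[\\.*^$]/\\&/g'
}

pat='A.B*'

# Fixed-string matcher, case-insensitive.
printf '%s\n' 'xa.b*y' 'xaxbbby' | grep -iF -- "$pat"

# Regex matcher on the escaped pattern; it should select the same line.
printf '%s\n' 'xa.b*y' 'xaxbbby' | grep -i -- "$(fixed_to_bre "$pat")"
```

Both grep commands should select only the first input line, since the escaped pattern A\.B\* matches the literal text "a.b*" and nothing else.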


- Before the patch

$ yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in
$ for i in 1 2 3 4 5; do env LC_ALL=en_US.UTF-8 time src/fgrep -i 'n' in; done
Command exited with non-zero status 1
6.06user 2.23system 0:08.59elapsed 96%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps
Command exited with non-zero status 1
6.23user 2.15system 0:08.64elapsed 97%CPU (0avgtext+0avgdata 2608maxresident)k
0inputs+0outputs (0major+187minor)pagefaults 0swaps
Command exited with non-zero status 1
6.83user 1.44system 0:08.47elapsed 97%CPU (0avgtext+0avgdata 2608maxresident)k
0inputs+0outputs (0major+187minor)pagefaults 0swaps
Command exited with non-zero status 1
7.35user 1.25system 0:08.77elapsed 98%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps
Command exited with non-zero status 1
7.60user 0.63system 0:08.48elapsed 97%CPU (0avgtext+0avgdata 2608maxresident)k
0inputs+0outputs (0major+187minor)pagefaults 0swaps


- After the patch

$ yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in
$ for i in 1 2 3 4 5; do env LC_ALL=en_US.UTF-8 time src/fgrep -i 'n' in; done
Command exited with non-zero status 1
0.19user 0.10system 0:00.30elapsed 97%CPU (0avgtext+0avgdata 2976maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
Command exited with non-zero status 1
0.16user 0.06system 0:00.22elapsed 99%CPU (0avgtext+0avgdata 2976maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
Command exited with non-zero status 1
0.18user 0.04system 0:00.23elapsed 95%CPU (0avgtext+0avgdata 2976maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
Command exited with non-zero status 1
0.15user 0.07system 0:00.23elapsed 96%CPU (0avgtext+0avgdata 2976maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
Command exited with non-zero status 1
0.17user 0.04system 0:00.24elapsed 93%CPU (0avgtext+0avgdata 2976maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps

Norihiro
[dfa_fgrep.txt (application/octet-stream, attachment)]
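
As a rough check of the "about 30-40x" figure, the elapsed times reported in message #5 can be averaged and compared. This is a sketch that uses only the numbers quoted above; the helper name mean is made up for the illustration.

```shell
# Hypothetical helper: arithmetic mean of the numbers given as arguments.
mean() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.3f\n", s / NR }'
}

# Elapsed seconds from the five runs before and after the patch.
before=$(mean 8.59 8.64 8.47 8.77 8.48)
after=$(mean 0.30 0.22 0.23 0.23 0.24)

# Ratio of the two means.
awk -v b="$before" -v a="$after" 'BEGIN { printf "speedup: %.1fx\n", b / a }'
```

With these inputs the means are 8.590 s and 0.244 s, a ratio of about 35x on elapsed time. The same calculation on the user-time column gives about 40x.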

Information forwarded to bug-grep <at> gnu.org:
bug#16823; Package grep. (Thu, 20 Feb 2014 14:29:03 GMT)

Message #8 received at 16823 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16823 <at> debbugs.gnu.org
Subject: Re: bug#16823: Use DFA regex engine on fgrep matcher
Date: Thu, 20 Feb 2014 23:28:19 +0900
In the following case it is about 200-400x faster, which equals the
performance of grep. Patch #16232 may also work effectively here.

- Before the patch

$ yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in
$ for i in 1 2 3 4 5; do env LC_ALL=ja_JP.UTF-8 time src/fgrep -i 'a' in; done
Command exited with non-zero status 1
7.46user 0.91system 0:08.62elapsed 97%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps
Command exited with non-zero status 1
7.94user 0.84system 0:09.13elapsed 96%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps
Command exited with non-zero status 1
7.72user 0.76system 0:08.83elapsed 96%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps
Command exited with non-zero status 1
7.77user 0.62system 0:08.74elapsed 96%CPU (0avgtext+0avgdata 2608maxresident)k
0inputs+0outputs (0major+187minor)pagefaults 0swaps
Command exited with non-zero status 1
8.03user 0.71system 0:09.08elapsed 96%CPU (0avgtext+0avgdata 2624maxresident)k
0inputs+0outputs (0major+188minor)pagefaults 0swaps

- After the patch

$ yes $(printf '%078dm' 0)| head -1000000 | tr 0 a > in
$ for i in 1 2 3 4 5; do env LC_ALL=ja_JP.UTF-8 time src/fgrep -i 'a' in; done
Command exited with non-zero status 1
0.04user 0.08system 0:00.14elapsed 90%CPU (0avgtext+0avgdata 3008maxresident)k
0inputs+0outputs (0major+212minor)pagefaults 0swaps
Command exited with non-zero status 1
0.02user 0.04system 0:00.08elapsed 89%CPU (0avgtext+0avgdata 3008maxresident)k
0inputs+0outputs (0major+226minor)pagefaults 0swaps
Command exited with non-zero status 1
0.02user 0.05system 0:00.08elapsed 89%CPU (0avgtext+0avgdata 3024maxresident)k
0inputs+0outputs (0major+213minor)pagefaults 0swaps
Command exited with non-zero status 1
0.02user 0.05system 0:00.09elapsed 83%CPU (0avgtext+0avgdata 3024maxresident)k
0inputs+0outputs (0major+213minor)pagefaults 0swaps
Command exited with non-zero status 1
0.02user 0.04system 0:00.07elapsed 93%CPU (0avgtext+0avgdata 3024maxresident)k
0inputs+0outputs (0major+213minor)pagefaults 0swaps

Information forwarded to bug-grep <at> gnu.org:
bug#16823; Package grep. (Sun, 09 Mar 2014 14:52:02 GMT)

Message #11 received at 16823 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 16823 <at> debbugs.gnu.org
Subject: bug#16823: Use DFA regex engine on fgrep matcher
Date: Sun, 09 Mar 2014 23:51:05 +0900
[Message part 1 (text/plain, inline)]
I have updated the patch and added a draft of the commit log for it.

Norihiro
[patch.txt (application/octet-stream, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Mon, 24 Mar 2014 01:19:02 GMT)

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Mon, 24 Mar 2014 01:19:02 GMT)

Message #16 received at 16823-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 16823-done <at> debbugs.gnu.org
Subject: Re: bug#16823: Use DFA regex engine on fgrep matcher
Date: Sun, 23 Mar 2014 18:18:34 -0700
[Message part 1 (text/plain, inline)]
Thanks, I pushed your patch (with a minor change to make it integrate 
with the latest grep) and then pushed some fixes and one major 
simplification: don't have any special case for "grep -iF PAT" when PAT 
contains no alphabetics.  This is rare enough that I expect it's not 
worth complicating grep to worry about it.

I'm attaching the combined patch, that is the merge of your patch plus 
my changes.
[grep-F.patch (text/plain, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 21 Apr 2014 11:24:03 GMT)

This bug report was last modified 11 years and 63 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.