GNU bug report logs - #15630
grep 2.14 much slower than 2.5.1
Report forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Wed, 16 Oct 2013 14:11:03 GMT) Full text and rfc822 format available.
Acknowledgement sent to "Z. Majeed" <zmajeed <at> sbcglobal.net>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 16 Oct 2013 14:11:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I'm running 64-bit builds of grep 2.14 and 2.5.1 on a Red Hat 5.6 box. On a simple regex, grep 2.14 is significantly slower than 2.5.1. The times are:
grep 2.5.1: 4.39user 3.19system 0:07.60elapsed
grep 2.14: 25.92user 2.84system 0:28.76elapsed
The grep command line is -i "<name>.*russia". The input is a large XML file with 101,766,751 lines, around 3.6 GB in size, and there are 14,772 matching lines. Runs are in the C locale, and both grep builds use the default configuration. Here are the top callgrind Ir (instruction read) counts:
grep 2.5.1:
5,985,715,429 kwset.c:kwsexec
833,138,736 dfa.c:dfaexec
360,061,388 ???:memchr
110,119,157 search.c:EGexecute
34,010,204 grep.c:grepfile
32,198,545 ???:__ctype_get_mb_cur_max
11,459,760 grep.c:fillbuf
7,175,377 ???:memmove
3,623,898 grep.c:grepbuf
grep 2.14:
36,717,431,504 dfa.c:dfaexec
15,709,111,428 ???:memchr
12,363,145,663 kwset.c:kwsexec
6,483,204,386 dfasearch.c:EGexecute
14,650,909 ???:memrchr
10,358,230 main.c:fillbuf
7,172,801 ???:memmove
7,162,667 main.c:grepdesc
4,484,004 main.c:grepbuf
1,250,200 ???:__ctype_get_mb_cur_max
and top function call counts:
grep 2.5.1:
kwsexec 1656108
__ctype_get_mb_cur_max 1547396
memchr 1547383
dfaexec 1547383
__ctype_get_mb_cur_max 1547383
__ctype_get_mb_cur_max 1547383
__ctype_get_mb_cur_max 1532611
EGexecute 124962
__ctype_get_mb_cur_max 124962
read 110191
grepbuf 110190
fillbuf 110190
memmove 110189
__ctype_get_mb_cur_max 108725
prtext 14772
prline 14772
grep 2.14:
memchr 101766751
kwsexec 101766751
dfaexec 101766751
EGexecute 124966
__ctype_get_mb_cur_max 124966
__ctype_get_mb_cur_max 124966
read 110195
memrchr 110194
grepbuf 110194
fillbuf 110194
memmove 110193
prtext 14772
prline 14772
Ratios of Ir counts to function call counts:
grep 2.5.1:
dfaexec: 538.42 = 833138736/1547383
kwsexec: 3614.33 = 5985715429/1656108
memchr: 232.69 = 360061388/1547383
grep 2.14:
dfaexec: 360.80 = 36717431504/101766751
kwsexec: 121.48 = 12363145663/101766751
memchr: 154.36 = 15709111428/101766751
1. grep 2.14 calls kwsexec, dfaexec and memchr once per line while 2.5.1 makes far fewer calls to those functions
2. grep 2.5.1 calls __ctype_get_mb_cur_max many more times than 2.14 but overall spends less time in the function
3. grep 2.14 calls memrchr while grep 2.5.1 does not
4. grep 2.5.1 generally passes longer chunks to memchr thus reducing the overall time it spends in the function
Is there a runtime option or buildtime configuration for grep 2.14 that could give it comparable performance to grep 2.5.1 for the sort of simple regex in my example?
Zartaj
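The Ir-per-call ratios quoted in the message above can be recomputed from the callgrind figures. The following is a minimal sketch, not part of the original report; the names PROFILES, ir_per_call and report are made up for illustration, and the numbers are copied from the listings above.

```python
# Instruction-read (Ir) totals and call counts, copied from the callgrind
# figures in the report. The dictionary layout is an assumption for this sketch.
PROFILES = {
    "grep 2.5.1": {
        "dfaexec": (833_138_736, 1_547_383),
        "kwsexec": (5_985_715_429, 1_656_108),
        "memchr": (360_061_388, 1_547_383),
    },
    "grep 2.14": {
        "dfaexec": (36_717_431_504, 101_766_751),
        "kwsexec": (12_363_145_663, 101_766_751),
        "memchr": (15_709_111_428, 101_766_751),
    },
}


def ir_per_call(ir, calls):
    """Average number of instructions executed per call."""
    return ir / calls


def report(profiles):
    """Format one header line per grep version and one line per function."""
    lines = []
    for version, funcs in profiles.items():
        lines.append(version + ":")
        for name, (ir, calls) in funcs.items():
            lines.append(f"  {name}: {ir_per_call(ir, calls):.2f} = {ir}/{calls}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(PROFILES)))
```

In grep 2.14 the call count of all three functions equals the number of input lines (101,766,751), which is what observation 1 describes. The per-call cost is lower than in 2.5.1, but the number of calls is roughly 60 times higher.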
Information forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Wed, 16 Oct 2013 19:21:02 GMT)
Message #8 received at 15630 <at> debbugs.gnu.org (full text, mbox):
I see the reason is the workaround in do_execute that turns on line-by-line matching for -i across the board - I got runtime confirmation by trying "<name>.*[rR][uU][sS][sS][iI][aA]" - the times were faster than for grep 2.5.1 with -i:
3.59user 2.95system 0:06.55elapsed
I'm not sure if the workaround is for the -i problem in UTF-8 locales discussed in http://savannah.gnu.org/bugs/?29391. This bug report really should be titled "--ignore-case very slow in grep 2.14"
Zartaj
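The bracket-expression form used in the message above can be generated mechanically. This is a minimal sketch, assuming the input is a plain ASCII word with no regex metacharacters; the helper name fold_to_brackets is hypothetical and is not part of grep.

```python
def fold_to_brackets(literal):
    """Expand each ASCII letter of a literal word into a [lower/upper] bracket
    expression, so the word matches case-insensitively without -i.

    Assumption: `literal` contains no regex metacharacters; they would be
    copied through unchanged and keep their special meaning.
    """
    out = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            out.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Only "russia" is expanded, as in the report; "<name>" stays a literal.
    print("<name>.*" + fold_to_brackets("russia"))
```

Note that the pattern from the report expands only "russia". Unlike -i, it still matches "<name>" case-sensitively, so the two commands are not strictly equivalent.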
Information forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Sun, 20 Oct 2013 00:59:02 GMT)
Message #11 received at 15630 <at> debbugs.gnu.org (full text, mbox):
On Wed, Oct 16, 2013 at 12:20 PM, Z. Majeed <zmajeed <at> sbcglobal.net> wrote:
> I see the reason is the workaround in do_execute that turns on line-by-line matching for -i across the board - I got runtime confirmation by trying "<name>.*[rR][uU][sS][sS][iI][aA]" - the times were faster than for grep 2.5.1 with -i:
> 3.59user 2.95system 0:06.55elapsed
>
> I'm not sure if the workaround is for the -i problem in UTF-8 locales discussed in http://savannah.gnu.org/bugs/?29391. This bug report really should be titled "--ignore-case very slow in grep 2.14"
Thanks for the reminder.
I'm about to release grep-2.15, but after that, I will be inclined to
address that problem.
Information forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Sun, 20 Oct 2013 01:40:02 GMT)
Message #14 received at 15630 <at> debbugs.gnu.org (full text, mbox):
Thanks - I nearly always use -i so a fix would be highly appreciated - meantime I dug a bit more into this issue - it's not as straightforward as it first seemed - the crux of the problem is not the workaround for UTF-8 but -i "<name>.*russia" causing dfamust to be just the one-character string "<" because "name" turns into character classes - for XML input that practically makes keyword matching worthless and the main loop in EGexecute degenerates to line-by-line processing - it seems to me dfaparse ought to deal with case foldings a little better so the trans table support in cwexec gets used - there have also been some simple patches submitted to make use of trans in bmexec
Zartaj
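The effect described in the message above, where case folding shrinks the required keyword to "<", can be illustrated with a toy model. This sketch is not grep's dfamust computation. It assumes a pattern made only of literal characters and ".*", and it assumes that under case folding every letter becomes a two-character class that breaks a literal run. The function name longest_literal is hypothetical.

```python
def longest_literal(pattern, ignore_case):
    """Return the longest run of characters that must appear verbatim in any
    match of `pattern` (toy model: literals and ".*" only).

    With ignore_case=True a letter no longer counts as a fixed character,
    because it could match in either case, so it ends the current run.
    """
    best = run = ""
    i = 0
    while i < len(pattern):
        if pattern.startswith(".*", i):
            run = ""          # ".*" matches anything, so the run ends here
            i += 2
            continue
        ch = pattern[i]
        if ignore_case and ch.isalpha():
            run = ""          # folded letter behaves like a character class
        else:
            run += ch
            if len(run) > len(best):
                best = run
        i += 1
    return best


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, repr(longest_literal("<name>.*russia", flag)))
```

In this model the case-sensitive pattern keeps a six-character required string, while the folded pattern keeps only a single "<". In XML input nearly every line contains "<", so a keyword prefilter based on it rejects almost nothing.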
Information forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Sun, 20 Oct 2013 02:55:02 GMT)
Message #17 received at 15630 <at> debbugs.gnu.org (full text, mbox):
On Sat, Oct 19, 2013 at 6:39 PM, Z. Majeed <zmajeed <at> sbcglobal.net> wrote:
> Thanks - I nearly always use -i so a fix would be highly appreciated -
> meantime I dug a bit more into this issue - it's not as straightforward as
> it first seemed - the crux of the problem is not the workaround for UTF-8
> but -i "<name>.*russia" causing dfamust to be just the one-character string
> "<" because "name" turns into character classes - for XML input that
> practically makes keyword matching worthless and the main loop in EGexecute
> degenerates to line-by-line processing - it seems to me dfaparse ought to
> deal with case foldings a little better so the trans table support in cwexec
> gets used - there have also been some simple patches submitted to make use
> of trans in bmexec
If you can point to a seemingly-good patch, please do.
Information forwarded to bug-grep <at> gnu.org: bug#15630; Package grep. (Sun, 06 Apr 2014 10:46:02 GMT)
Message #20 received at 15630 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering <jim <at> meyering.net> wrote:
> If you can point to a seemingly-good patch, please do.
This bug will be fixed by application of patch#17019 and patch#17034.
Norihiro
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Mon, 21 Apr 2014 18:10:02 GMT)
Notification sent to "Z. Majeed" <zmajeed <at> sbcglobal.net>: bug acknowledged by developer. (Mon, 21 Apr 2014 18:10:03 GMT)
Message #25 received at 15630-done <at> debbugs.gnu.org (full text, mbox):
On 04/06/2014 03:45 AM, Norihiro Tanaka wrote:
> This bug will be fixed by application of patch#17019 and patch#17034.
Thanks, I applied those patches a week or two ago, and so am closing this bug.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 May 2014 11:24:04 GMT)
This bug report was last modified 11 years and 38 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.