GNU bug report logs - #15630
grep 2.14 much slower than 2.5.1


Package: grep;

Reported by: "Z. Majeed" <zmajeed <at> sbcglobal.net>

Date: Wed, 16 Oct 2013 14:11:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15630 in the body.
You can then email your comments to 15630 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Wed, 16 Oct 2013 14:11:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Z. Majeed" <zmajeed <at> sbcglobal.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 16 Oct 2013 14:11:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Z. Majeed" <zmajeed <at> sbcglobal.net>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: grep 2.14 much slower than 2.5.1
Date: Wed, 16 Oct 2013 07:09:56 -0700 (PDT)
I'm running 64-bit builds of grep 2.14 and 2.5.1 on a Red Hat 5.6 box - grep 2.14 is significantly slower than 2.5.1 on a simple regex - the times are:
grep 2.5.1: 4.39user 3.19system 0:07.60elapsed
grep 2.14: 25.92user 2.84system 0:28.76elapsed

the grep commandline is -i "<name>.*russia" - the file is a large XML file with 101,766,751 lines around 3.6 GB size - there are 14,772 matched lines - runs are in the C locale - both grep builds have the default configuration - here are callgrind top Ir counts:
grep 2.5.1:
5,985,715,429 kwset.c:kwsexec
  833,138,736 dfa.c:dfaexec
  360,061,388 ???:memchr
  110,119,157 search.c:EGexecute
   34,010,204 grep.c:grepfile
   32,198,545 ???:__ctype_get_mb_cur_max
   11,459,760 grep.c:fillbuf
    7,175,377 ???:memmove
    3,623,898 grep.c:grepbuf

grep 2.14:
36,717,431,504 dfa.c:dfaexec
15,709,111,428 ???:memchr
12,363,145,663 kwset.c:kwsexec
 6,483,204,386 dfasearch.c:EGexecute
    14,650,909 ???:memrchr
    10,358,230 main.c:fillbuf
     7,172,801 ???:memmove
     7,162,667 main.c:grepdesc
     4,484,004 main.c:grepbuf
     1,250,200 ???:__ctype_get_mb_cur_max

and top function call counts:
grep 2.5.1:
kwsexec 1656108
__ctype_get_mb_cur_max 1547396
memchr 1547383
dfaexec 1547383
__ctype_get_mb_cur_max 1547383
__ctype_get_mb_cur_max 1547383
__ctype_get_mb_cur_max 1532611
EGexecute 124962
__ctype_get_mb_cur_max 124962
read 110191
grepbuf 110190
fillbuf 110190
memmove 110189
__ctype_get_mb_cur_max 108725
prtext 14772
prline 14772

grep 2.14:
memchr 101766751
kwsexec 101766751
dfaexec 101766751
EGexecute 124966
__ctype_get_mb_cur_max 124966
__ctype_get_mb_cur_max 124966
read 110195
memrchr 110194
grepbuf 110194
fillbuf 110194
memmove 110193
prtext 14772
prline 14772


Ratios of Ir counts to function call counts:
grep 2.5.1:
dfaexec: 538.42 = 833138736/1547383
kwsexec: 3614.33 = 5985715429/1656108
memchr: 232.69 = 360061388/1547383

grep 2.14:
dfaexec: 360.80 = 36717431504/101766751
kwsexec: 121.48 = 12363145663/101766751
memchr: 154.36 = 15709111428/101766751

1. grep 2.14 calls kwsexec, dfaexec and memchr once per line while 2.5.1 makes far fewer calls to those functions
2. grep 2.5.1 calls __ctype_get_mb_cur_max many more times than 2.14 but overall spends less time in the function
3. grep 2.14 calls memrchr while grep 2.5.1 does not
4. grep 2.5.1 generally passes longer chunks to memchr thus reducing the overall time it spends in the function

Is there a runtime option or buildtime configuration for grep 2.14 that could give it comparable performance to grep 2.5.1 for the sort of simple regex in my example?

Zartaj

Information forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Wed, 16 Oct 2013 19:21:02 GMT) Full text and rfc822 format available.

Message #8 received at 15630 <at> debbugs.gnu.org (full text, mbox):

From: "Z. Majeed" <zmajeed <at> sbcglobal.net>
To: "15630 <at> debbugs.gnu.org" <15630 <at> debbugs.gnu.org>
Subject: Re: bug#15630: Acknowledgement (grep 2.14 much slower than 2.5.1)
Date: Wed, 16 Oct 2013 12:20:15 -0700 (PDT)
I see the reason is the workaround in do_execute that turns on line-by-line matching for -i across the board - I got runtime confirmation by trying "<name>.*[rR][uU][sS][sS][iI][aA]" - the times were faster than for grep 2.5.1 with -i:
3.59user 2.95system 0:06.55elapsed

I'm not sure if the workaround is for the -i problem in UTF-8 locales discussed in http://savannah.gnu.org/bugs/?29391. This bug report really should be titled "--ignore-case very slow in grep 2.14"


Zartaj
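
The bracket-expression workaround described above can be generated mechanically. The following is a minimal sketch, assuming the keyword is a plain ASCII literal with no regex metacharacters; fold_literal is a hypothetical helper written for this illustration, not something grep provides. Python's re module is used only to check the resulting pattern on sample strings.

```python
import re


def fold_literal(word):
    """Rewrite each ASCII letter of a plain literal as a two-character
    bracket expression, e.g. 'r' -> '[rR]'. Other characters pass through
    unchanged, so the input must not contain regex metacharacters."""
    out = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            out.append("[" + ch.lower() + ch.upper() + "]")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Same shape as the pattern tried in the message above: only the
    # keyword after ".*" is folded, "<name>" stays a literal.
    pattern = "<name>.*" + fold_literal("russia")
    print(pattern)
    for line in ("<name>Bank of RUSSIA</name>", "<NAME>Russia</NAME>"):
        print(line, bool(re.search(pattern, line)))
```

Note that this pattern matches the same lines as -i "<name>.*russia" only if the <name> tag always appears in lower case in the input, since "<name>" itself is left unfolded.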




Information forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Sun, 20 Oct 2013 00:59:02 GMT) Full text and rfc822 format available.

Message #11 received at 15630 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Z. Majeed" <zmajeed <at> sbcglobal.net>
Cc: "15630 <at> debbugs.gnu.org" <15630 <at> debbugs.gnu.org>
Subject: Re: bug#15630: Acknowledgement (grep 2.14 much slower than 2.5.1)
Date: Sat, 19 Oct 2013 17:58:01 -0700
On Wed, Oct 16, 2013 at 12:20 PM, Z. Majeed <zmajeed <at> sbcglobal.net> wrote:
> I see the reason is the workaround in do_execute that turns on line-by-line matching for -i across the board - I got runtime confirmation by trying "<name>.*[rR][uU][sS][sS][iI][aA]" - the times were faster than for grep 2.5.1 with -i:
> 3.59user 2.95system 0:06.55elapsed
>
> I'm not sure if the workaround is for the -i problem in UTF-8 locales discussed in http://savannah.gnu.org/bugs/?29391. This bug report really should be titled "--ignore-case very slow in grep 2.14"

Thanks for the reminder.
I'm about to release grep-2.15, but after that, I will be inclined to
address that problem.




Information forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Sun, 20 Oct 2013 01:40:02 GMT) Full text and rfc822 format available.

Message #14 received at 15630 <at> debbugs.gnu.org (full text, mbox):

From: "Z. Majeed" <zmajeed <at> sbcglobal.net>
To: "15630 <at> debbugs.gnu.org" <15630 <at> debbugs.gnu.org>
Cc: Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#15630: Acknowledgement (grep 2.14 much slower than 2.5.1)
Date: Sat, 19 Oct 2013 18:39:45 -0700 (PDT)
Thanks - I nearly always use -i so a fix would be highly appreciated - meantime I dug a bit more into this issue - it's not as straightforward as it first seemed - the crux of the problem is not the workaround for UTF-8 but -i "<name>.*russia" causing dfamust to be just the one-character string "<" because "name" turns into character classes - for XML input that practically makes keyword matching worthless and the main loop in EGexecute degenerates to line-by-line processing - it seems to me dfaparse ought to deal with case foldings a little better so the trans table support in cwexec gets used - there have also been some simple patches submitted to make use of trans in bmexec

Zartaj
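
To illustrate why case folding can shrink the must-have keyword to a single character, here is a deliberately simplified model. It is not the dfamust algorithm from dfa.c: it understands only literal characters, "." and a trailing "*", and it assumes that under -i every letter becomes a character class. The function names literal_runs, must_string and candidate_lines are hypothetical, and the fact that the tie between "<" and ">" resolves to "<" is a property of this sketch, not a claim about grep.

```python
def literal_runs(pattern, ignore_case):
    """Split a tiny regex subset into runs of characters that must
    appear verbatim in any matching line."""
    runs, current = [], []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        # Assumption of this model: with ignore_case, a letter is a class.
        is_class = ch == "." or (ignore_case and ch.isalpha())
        if is_class or starred:
            # A class or an optional element ends the current literal run.
            if current:
                runs.append("".join(current))
                current = []
            i += 2 if starred else 1
        else:
            current.append(ch)
            i += 1
    if current:
        runs.append("".join(current))
    return runs


def must_string(pattern, ignore_case):
    """Longest literal run; the first one wins on ties."""
    return max(literal_runs(pattern, ignore_case), key=len, default="")


def candidate_lines(lines, must):
    """Lines a keyword prefilter would hand on to the full matcher."""
    return [line for line in lines if must in line]


if __name__ == "__main__":
    sample = ["<doc>", "<name>Bank of Russia</name>", "<id>7</id>", "plain text"]
    for flag in (False, True):
        must = must_string("<name>.*russia", flag)
        print(flag, repr(must), len(candidate_lines(sample, must)))
```

In this model the keyword without -i is "<name>", which filters out most lines, while with -i it collapses to "<", which nearly every line of an XML file contains, so the prefilter rejects almost nothing.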




Information forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Sun, 20 Oct 2013 02:55:02 GMT) Full text and rfc822 format available.

Message #17 received at 15630 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Z. Majeed" <zmajeed <at> sbcglobal.net>
Cc: "15630 <at> debbugs.gnu.org" <15630 <at> debbugs.gnu.org>
Subject: Re: bug#15630: Acknowledgement (grep 2.14 much slower than 2.5.1)
Date: Sat, 19 Oct 2013 19:54:23 -0700
On Sat, Oct 19, 2013 at 6:39 PM, Z. Majeed <zmajeed <at> sbcglobal.net> wrote:
> Thanks - I nearly always use -i so a fix would be highly appreciated -
> meantime I dug a bit more into this issue - it's not as straightforward as
> it first seemed - the crux of the problem is not the workaround for UTF-8
> but -i "<name>.*russia" causing dfamust to be just the one-character string
> "<" because "name" turns into character classes - for XML input that
> practically makes keyword matching worthless and the main loop in EGexecute
> degenerates to line-by-line processing - it seems to me dfaparse ought to
> deal with case foldings a little better so the trans table support in cwexec
> gets used - there have also been some simple patches submitted to make use
> of trans in bmexec

If you can point to a seemingly-good patch, please do.




Information forwarded to bug-grep <at> gnu.org:
bug#15630; Package grep. (Sun, 06 Apr 2014 10:46:02 GMT) Full text and rfc822 format available.

Message #20 received at 15630 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: "15630 <at> debbugs.gnu.org" <15630 <at> debbugs.gnu.org>
Cc: Jim Meyering <jim <at> meyering.net>, "Z. Majeed" <zmajeed <at> sbcglobal.net>
Subject: bug#15630: grep 2.14 much slower than 2.5.1
Date: Sun, 06 Apr 2014 19:45:24 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> If you can point to a seemingly-good patch, please do.

This bug will be fixed by application of patch#17019 and patch#17034.

Norihiro





Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Mon, 21 Apr 2014 18:10:02 GMT) Full text and rfc822 format available.

Notification sent to "Z. Majeed" <zmajeed <at> sbcglobal.net>:
bug acknowledged by developer. (Mon, 21 Apr 2014 18:10:03 GMT) Full text and rfc822 format available.

Message #25 received at 15630-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 15630-done <at> debbugs.gnu.org
Cc: "Z. Majeed" <zmajeed <at> sbcglobal.net>
Subject: Re: bug#15630: grep 2.14 much slower than 2.5.1
Date: Mon, 21 Apr 2014 11:09:20 -0700
On 04/06/2014 03:45 AM, Norihiro Tanaka wrote:
> This bug will be fixed by application of patch#17019 and patch#17034.

Thanks, I applied those patches a week or two ago, and so am closing this bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 May 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 38 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.