GNU bug report logs - #19997
Performance differences between Git sources and release tarballs?

Previous Next

Package: grep;

Reported by: sur-behoffski <sur_behoffski <at> grouse.com.au>

Date: Wed, 4 Mar 2015 03:22:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19997 in the body.
You can then email your comments to 19997 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#19997; Package grep. (Wed, 04 Mar 2015 03:22:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to sur-behoffski <sur_behoffski <at> grouse.com.au>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 04 Mar 2015 03:22:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: sur-behoffski <sur_behoffski <at> grouse.com.au>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Performance differences between Git sources and release tarballs?
Date: Wed, 04 Mar 2015 13:51:32 +1030
G'day,

I'm working on comparing the performance of my Boyer-Moore string search
code in the hstbm[1] project, versus the equivalent code in GNU Grep
(src/kwset.c).  My hope is to expand the very narrow corpus of patterns
and file data types that I've used to-date in my testing.  The major risk
in varying a tuned algorithm such as B-M is that pathological data,
and/or important normal cases, may suffer a performance hit as a result
of the variations.  I'm trying to locate, or perhaps build, a corpus,
together with a detailed performance profile, for the limited set of
hardware that I have access to.  Generalising the testing to other
architectures, OSes and/or compiler toolchains is a further target.

(I note that, in the recent past, others have suggested Project
Gutenberg's text of the King James Version of the Bible as one possible
member of such a test corpus.)

I'm looking to use PAPI[2] in order to use hardware counters to help pick
apart where CPU time is spent; but more on that when I have some sentient
results to report.

I decided to start by looking at why GNU Grep was significantly slower than
hstbm when searching for a trivial pattern ("123") in /dev/null.  I knew
that one test (invoking the command 1000 times inside a timed shell script)
that GNU Grep was roughly 20% slower.  I found that "fgrep" -- more
precisely, "grep -F", was a significant factor -- the grep pattern
compilation was more expensive than the fgrep (src/kwsearch.c) compilation.
NLS is another difference; hstbm does not call "bindtextdomain" or
"textdomain"; but again, more on that another time.

When starting to add instrumentation to GNU Grep (grep.c's main), I found
that it was up to 300% slower, not the 20%ish that I'd measured previously.
After vanishing down a number of rabbit-holes, some to do with GCC's
architecture selection, I've found that the 2.21 tarball has high
performance, whereas the Git head is much, much slower.

(I use a source-based (Gentoo) Linux OS, so am able to dissect the stages
of Gentoo's build steps... a little.)

Looking at Grep's "configure --help" output, I see that
"--enable-gcc-warnings" is an option, and, with some experimentation with
invoking compilation in different environments, it seems that a slew of
warning options is enabled in the development tree, that are not enabled in
the release tarball.  This is when I naively invoke ./configure without
looking closely at all the possible configuration options.

So, could you please give me some guidance as to why the release tarball
would build so differently to the development (Git head) set of sources?

Apologies in advance if there's some documentation that I overlooked.

thanks,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software

[1] http://savannah.nongnu.org/projects/hstbm
[2] http://icl.cs.utk.edu/papi/




Information forwarded to bug-grep <at> gnu.org:
bug#19997; Package grep. (Wed, 04 Mar 2015 07:19:02 GMT) Full text and rfc822 format available.

Message #8 received at 19997 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: sur-behoffski <sur_behoffski <at> grouse.com.au>, 19997 <at> debbugs.gnu.org
Subject: Re: bug#19997: Performance differences between Git sources and release
 tarballs?
Date: Tue, 03 Mar 2015 23:18:19 -0800
sur-behoffski wrote:
> Looking at Grep's "configure --help" output, I see that
> "--enable-gcc-warnings" is an option, and, with some experimentation with
> invoking compilation in different environments, it seems that a slew of
> warning options is enabled in the development tree, that are not enabled in
> the release tarball.

These shouldn't affect grep's runtime performance.

I don't know why you're observing the performance you do.  It could be that 
you're building 'grep' differently from the way Gentoo builds it.  You might try 
building both grep 2.21 and grep master yourself, with the same build procedure 
both ways.




Information forwarded to bug-grep <at> gnu.org:
bug#19997; Package grep. (Tue, 10 Mar 2015 06:01:02 GMT) Full text and rfc822 format available.

Message #11 received at 19997 <at> debbugs.gnu.org (full text, mbox):

From: sur-behoffski <sur_behoffski <at> grouse.com.au>
To: 19997 <at> debbugs.gnu.org
Subject: Re: bug#19997: Performance differences between Git sources and release
 tarballs?
Date: Tue, 10 Mar 2015 16:29:27 +1030
On 03/04/15 17:48, Paul Eggert wrote:
> sur-behoffski wrote:
>> Looking at Grep's "configure --help" output, I see that
>> "--enable-gcc-warnings" is an option, and, with some experimentation with
>> invoking compilation in different environments, it seems that a slew of
>> warning options is enabled in the development tree, that are not enabled in
>> the release tarball.
>
> These shouldn't affect grep's runtime performance.
>
> I don't know why you're observing the performance you do.  It could be that you're building 'grep' differently from the way Gentoo builds it.  You might try building both grep 2.21 and grep master yourself, with the same build procedure both ways.
>

Okay, I think I have this issue sorted:  Adding "-lpapi" to the link command
results in more infrastructure effort before main() is called.  If this
change is factored out, then your comment looks to be correct:  There should
be no performance differences between Git/release tarball builds.

The practical upshot is that PAPI's valuable for finding resource-heavy
portions of code, but then has to be factored out, or possibly added in
equally everywhere, when comparisons between programs are being made.
For the factoring-out case, you have to adopt a dual-configuration stance:
   1. PAPI involved in one-off runs, reporting results, highlighting
      places where resources are being used, and giving quick feedback
      for experimental code changes; plus
   2. The program without PAPI (and also without any PAPI-associated
      intrusive monitoring changes whatsoever), built for multiple-repetition
      full-program timing runs.

Here's a quick summary of results, followed by a more detailed look at
program activity provided courtesy of "strace".  The summary uses "repl",
a Lua script that:
    - Uses "time" on a bash script that repeatedly invokes the program
      (e.g. 1000 "inner reps");
    - Does this whole set multiple times (e.g. 50 "outer reps");
    - Uses "nice -12" to reduce likely variation from other system activity;
    - Sleeps for a short while between outer reps to give other processes
      a chance, as otherwise "nice -12" might be unduly disruptive;
    - Crosses off a few of the fastest and slowest times as possibly
      being aberrations;
    - Sorts the remaining times; and
    - Reports the mean time, together with the lowest few (10) runs.

cheers,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software

------------------------------------------------------------

Summary:  "justmain.c" and -lpapi (yeah, I know "-O2 -pipe" doesn't do much;
these are included as they are usual for my Gentoo system):

$ cat justmain.c
int main(void)
{
    return 0;
}
$ gcc -O2 -pipe -o justmain      justmain.c
$ gcc -O2 -pipe -o justmain-papi justmain.c -lpapi
$ repl 50/1000 ./justmain
Cmd:    "./justmain"
[50/1000] Mean: 0.501; Lowest: 0.495 0.496 0.496 0.496 0.497 0.497 0.497 0.498 0.498 0.499
$ repl 50/1000 ./justmain-papi
Cmd:    "./justmain-papi"
[50/1000] Mean: 0.988; Lowest: 0.964 0.965 0.965 0.968 0.970 0.973 0.974 0.975 0.977 0.978

------------------------------------------------------------

Comparing the strace output for "justmain" and "justmain-papi" shows the
extra effort needed for papi and other associated libraries:


$ strace ./justmain
execve("./justmain", ["./justmain"], [/* 37 vars */]) = 0
brk(0)                                  = 0x1d40000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbae97eb000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=144976, ...}) = 0
mmap(NULL, 144976, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbae97c7000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300N\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1716576, ...}) = 0
mmap(NULL, 3828824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fbae9225000
mprotect(0x7fbae93c3000, 2093056, PROT_NONE) = 0
mmap(0x7fbae95c2000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7fbae95c2000
mmap(0x7fbae95c8000, 15448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fbae95c8000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbae97c6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbae97c5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbae97c4000
arch_prctl(ARCH_SET_FS, 0x7fbae97c5700) = 0
mprotect(0x7fbae95c2000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7fbae97ec000, 4096, PROT_READ) = 0
munmap(0x7fbae97c7000, 144976)          = 0
exit_group(0)                           = ?
+++ exited with 0 +++


$ strace ./justmain-papi
execve("./justmain-papi", ["./justmain-papi"], [/* 37 vars */]) = 0
brk(0)                                  = 0x2045000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2112cf4000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=144976, ...}) = 0
mmap(NULL, 144976, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f2112cd0000
close(3)                                = 0
open("/usr/lib64/libpapi.so.5.4.0.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\353\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=356288, ...}) = 0
mmap(NULL, 2456816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f211287d000
mprotect(0x7f21128ca000, 2093056, PROT_NONE) = 0
mmap(0x7f2112ac9000, 45056, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4c000) = 0x7f2112ac9000
mmap(0x7f2112ad4000, 3312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2112ad4000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300N\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1716576, ...}) = 0
mmap(NULL, 3828824, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f21124d6000
mprotect(0x7f2112674000, 2093056, PROT_NONE) = 0
mmap(0x7f2112873000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7f2112873000
mmap(0x7f2112879000, 15448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2112879000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2112ccf000
open("/usr/lib64/tls/x86_64/libpfm.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls/x86_64", 0x7ffd2d32b460) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libpfm.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls", 0x7ffd2d32b460)  = -1 ENOENT (No such file or directory)
open("/usr/lib64/x86_64/libpfm.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/x86_64", 0x7ffd2d32b460) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libpfm.so.4", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \251\5\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1054784, ...}) = 0
mmap(NULL, 3155440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f21121d3000
mprotect(0x7f211226c000, 2097152, PROT_NONE) = 0
mmap(0x7f211246c000, 430080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x99000) = 0x7f211246c000
mmap(0x7f21124d5000, 1520, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f21124d5000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2112ccd000
arch_prctl(ARCH_SET_FS, 0x7f2112ccd740) = 0
mprotect(0x7f2112873000, 16384, PROT_READ) = 0
mprotect(0x7f211246c000, 385024, PROT_READ) = 0
mprotect(0x7f2112ac9000, 4096, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7f2112cf5000, 4096, PROT_READ) = 0
munmap(0x7f2112cd0000, 144976)          = 0
exit_group(0)                           = ?
+++ exited with 0 +++






Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 12 Mar 2015 03:04:01 GMT) Full text and rfc822 format available.

Notification sent to sur-behoffski <sur_behoffski <at> grouse.com.au>:
bug acknowledged by developer. (Thu, 12 Mar 2015 03:04:02 GMT) Full text and rfc822 format available.

Message #16 received at 19997-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 19997-done <at> debbugs.gnu.org
Subject: closing bug report
Date: Wed, 11 Mar 2015 20:02:56 -0700
As the -lpapi problem is understood now to be a side-track issue, I'm closing 
the bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 09 Apr 2015 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 133 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.