#19997 - Performance differences between Git sources and release tarballs?

GNU bug report logs - #19997
Performance differences between Git sources and release tarballs?

Package: grep

Reported by: sur-behoffski <sur_behoffski <at> grouse.com.au>

Date: Wed, 4 Mar 2015 03:22:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: sur-behoffski <sur_behoffski <at> grouse.com.au>
To: 19997 <at> debbugs.gnu.org
Subject: bug#19997: Performance differences between Git sources and release tarballs?
Date: Wed, 04 Mar 2015 13:51:32 +1030

G'day,

I'm working on comparing the performance of my Boyer-Moore string search code in the hstbm[1] project, versus the equivalent code in GNU Grep (src/kwset.c). My hope is to expand the very narrow corpus of patterns and file data types that I've used to date in my testing. The major risk in varying a tuned algorithm such as B-M is that pathological data, and/or important normal cases, may suffer a performance hit as a result of the variations.

I'm trying to locate, or perhaps build, a corpus, together with a detailed performance profile, for the limited set of hardware that I have access to. Generalising the testing to other architectures, OSes and/or compiler toolchains is a further target. (I note that, in the recent past, others have suggested Project Gutenberg's text of the King James Version of the Bible as one possible member of such a test corpus.)

I'm looking to use PAPI[2] in order to use hardware counters to help pick apart where CPU time is spent; but more on that when I have some sentient results to report.

I decided to start by looking at why GNU Grep was significantly slower than hstbm when searching for a trivial pattern ("123") in /dev/null. I knew from one test (invoking the command 1000 times inside a timed shell script) that GNU Grep was roughly 20% slower. I found that "fgrep" (more precisely, "grep -F") was a significant factor: the grep pattern compilation was more expensive than the fgrep (src/kwsearch.c) compilation. NLS is another difference; hstbm does not call "bindtextdomain" or "textdomain"; but again, more on that another time.

When starting to add instrumentation to GNU Grep (grep.c's main), I found that it was up to 300% slower, not the 20%ish that I'd measured previously. After vanishing down a number of rabbit-holes, some to do with GCC's architecture selection, I've found that the 2.21 tarball has high performance, whereas the Git head is much, much slower.
(I use a source-based (Gentoo) Linux OS, so am able to dissect the stages of Gentoo's build steps... a little.)

Looking at Grep's "configure --help" output, I see that "--enable-gcc-warnings" is an option, and, after some experimentation with invoking compilation in different environments, it seems that a slew of warning options is enabled in the development tree that is not enabled in the release tarball. This is when I naively invoke ./configure without looking closely at all the possible configuration options.

So, could you please give me some guidance as to why the release tarball would build so differently to the development (Git head) set of sources? Apologies in advance if there's some documentation that I overlooked.

thanks,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software

[1] http://savannah.nongnu.org/projects/hstbm
[2] http://icl.cs.utk.edu/papi/
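
The timing comparison described in the message (running a trivial search against /dev/null many times and comparing totals) could be sketched roughly as follows. This is a minimal illustration in Python, not the reporter's actual shell script: the function names and the two binary paths in the comment are hypothetical, and the measured time includes the cost of spawning each process from Python, which is the same for both binaries.

```python
import os
import subprocess
import sys
import time

# Arguments matching the test in the message: fixed-string search for "123"
# in the null device.
PATTERN_ARGS = ["-F", "123", os.devnull]


def time_repeated(argv, runs=1000):
    """Return the total wall-clock seconds taken to run argv `runs` times."""
    start = time.perf_counter()
    for _ in range(runs):
        # Output is discarded and the exit status ignored on purpose:
        # grep exits with status 1 when nothing matches.
        subprocess.run(argv, stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def slowdown_percent(baseline_seconds, candidate_seconds):
    """Express how much slower the candidate is, relative to the baseline."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


def compare(baseline_argv, candidate_argv, runs=1000, out=sys.stdout):
    """Time both commands, print a one-line summary, return the slowdown."""
    baseline = time_repeated(baseline_argv, runs)
    candidate = time_repeated(candidate_argv, runs)
    percent = slowdown_percent(baseline, candidate)
    print(f"baseline {baseline:.3f}s, candidate {candidate:.3f}s, "
          f"slowdown {percent:+.1f}%", file=out)
    return percent


# Hypothetical usage; both paths are placeholders for locally built binaries:
# compare(["./grep-2.21/src/grep"] + PATTERN_ARGS,
#         ["./grep-git/src/grep"] + PATTERN_ARGS)
```

With this definition, a candidate that takes four times as long as the baseline is reported as 300% slower. Repeating the whole comparison a few times, and alternating the order of the two binaries, helps to separate a real difference from cache and scheduling noise.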

This bug report was last modified 10 years and 133 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
