GNU bug report logs - #21763
poor performance since grep 2.19 when comparing files with grep

Previous Next

Package: grep;

Reported by: "Bennett, Steve" <S.Bennett <at> lancaster.ac.uk>

Date: Mon, 26 Oct 2015 14:19:03 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #19 received at 21763-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 22357-done <at> debbugs.gnu.org, 21763-done <at> debbugs.gnu.org
Cc: Norihiro Tanaka <noritnk <at> kcn.ne.jp>,
 sur-behoffski <sur_behoffski <at> grouse.com.au>, "L.A. Walsh" <law <at> tlinx.org>,
 Jim Meyering <jim <at> meyering.net>, "Bennett, Steve" <S.Bennett <at> lancaster.ac.uk>,
 JQK <jquan <at> redhat.com>, arnold <at> skeeve.com, 22239 <at> debbugs.gnu.org,
 Trevor Cordes <gnu <at> tecnopolis.ca>, Paul Jackson <pj <at> usa.net>,
 Bruno Haible <bruno <at> clisp.org>,
 Ondřej Cífka <ondra <at> cifka.com>,
 Bruce Dubbs <bruce.dubbs <at> gmail.com>
Subject: Re: bug#22357: grep -f not only huge memory usage, but also huge time
 cost
Date: Tue, 20 Dec 2016 21:17:01 -0800
[Message part 1 (text/plain, inline)]
I installed the attached patches into grep master. These fix the performance 
regressions noted at the start of Bug#22357. I see that the related performance 
problems noted in Bug#21763 seem to be fixed too, I expect because of Norihiro 
Tanaka's recent changes, so I'll boldly close both bug reports.

To some extent the attached patches restore the old behavior for grep -F, when 
grep is given two or more patterns. The patch doesn't change the underlying 
algorithms; it merely uses a different heuristic to decide whether to use the -F 
matcher. Although I wouldn't be surprised if the attached patches hurt 
performance in some cases, I didn't uncover any such cases in my performance 
testing, which I admit mostly consisted of running the examples in the 
abovementioned bug reports.

I'll leave Bug#22239 open, as I get the following performance figures 
(user+system CPU time) for the Bug#22239 benchmark, where list.txt is created by 
"aspell dump master | head -n 100000 >list.txt", and the grep commands all use 
the operands "-F -f list.txt /etc/passwd" in the en_US.utf8 locale on Fedora 24 
x86-64.

  no -i       -i       grep version
   0.25      0.33      2.16
   0.26     10.95      2.21
   0.11      2.90*     current master (including attached patches)

In the C locale, the current grep master is always significantly faster than 
grep 2.16 or 2.21 on the benchmark, so the only significant problem is the 
number marked "*". I ran the benchmarks on an AMD Phenom II X4 910e.
[0001-grep-simplify-line-counting-in-patterns.patch (text/x-diff, attachment)]
[0002-grep-simplify-matcher-configuration.patch (text/x-diff, attachment)]
[0003-grep-fix-performance-with-multiple-patterns.patch (text/x-diff, attachment)]

This bug report was last modified 8 years and 148 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.