GNU bug report logs - #21763
poor performance since grep 2.19 when comparing files with grep

Previous Next

Package: grep;

Reported by: "Bennett, Steve" <S.Bennett <at> lancaster.ac.uk>

Date: Mon, 26 Oct 2015 14:19:03 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #34 received at 21763-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 21763-done <at> debbugs.gnu.org,
 22357-done <at> debbugs.gnu.org, sur-behoffski <sur_behoffski <at> grouse.com.au>,
 "L.A. Walsh" <law <at> tlinx.org>, JQK <jquan <at> redhat.com>, 22239 <at> debbugs.gnu.org,
 Trevor Cordes <gnu <at> tecnopolis.ca>, Bruno Haible <bruno <at> clisp.org>,
 Ond?ej Cifka <ondra <at> cifka.com>
Subject: Re: bug#22239: bug#22357: grep -f not only huge memory usage, but
 also huge time cost
Date: Mon, 26 Dec 2016 14:01:16 +0100
On Mon, Dec 26, 2016 at 1:06 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> On Fri, 23 Dec 2016 17:38:42 -0800
> Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
>> No. Thanks, I hadn't considered that possibility. I looked into the
>> slowdown and installed the attached patches, which cause 'grep' to
>> run about as fast on this test case as grep 2.25 (though not as fast
>> as grep 2.26). The main fix is in patch 5. On my platform:
>
> Hmm, how about the following test cases, although it is extreame?
>
> $ cat pat
> 0
> 00 0
> 00 00 0
> 00 00 00 0
> 00 00 00 00 0
> 00 00 00 00 00 0
> 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 00 0
> 00 00 00 00 00 00 00 00 00 00 00 00 00 0
> $ yes '00 00 00 00 00 00 00 00 00 00 00 00 00' |
>   head -1000000 >inp
> $ time -p env LC_ALL=C src/grep -w -f pat inp

That took 7 seconds for me.

Here is one that is O(N^2) in the length of the literal search string:
(the search strings are sequences of '0's, with lengths from 10k.. 320k):

$ n=10000; for i in $(seq 6); do env time -f "$n %e" src/grep -f
<(printf %0${n}d 0) <<<1; ((n *= 2)); done
10000 0.44
20000 1.44
40000 5.41
80000 21.27
160000 97.27
320000 426.72




This bug report was last modified 8 years and 148 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.