GNU bug report logs - #44754
Extreme performance degradation in GNU grep 3.4 / 3.6

Previous Next

Package: grep;

Reported by: bug-grep <at> gnu.org, f.heckenbach <at> fh-soft.de

Date: Fri, 20 Nov 2020 05:32:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Full log


Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org, f.heckenbach <at> fh-soft.de
Cc: 44754-done <at> debbugs.gnu.org
Subject: Re: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6
Date: Thu, 26 Nov 2020 21:41:20 -1000
On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering <jim <at> meyering.net> wrote:
>
> On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering <jim <at> meyering.net> wrote:
> > Thank you for the fine bug report.
> > The grep-3.6 bug you've exposed is due to the fact that your input
> > triggers excessive hash collisions when using the code modeled after
> > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase
> > take O(N^2) time for N patterns. In the attached, I've switched grep
> > to use the djb2 hash function, and that resolves the problem. I'll
> > also add a NEWS entry and a test before pushing this.
>
> Timings suggest that grep-3.6's preprocessing came closer to O(N^3).
> Here's an example that would take 2-3 days with grep-3.6 and only
> seconds with this fix:
>
>   : | grep -Ff <(seq 6400000 | tr 0-9 A-J)
>
> Here's a complete patch.
> I'll push it later today.

Pushed along with two gnulib-related changes.




This bug report was last modified 4 years and 225 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.