GNU bug report logs - #22239
fgrep -i slow in 2.21


Package: grep;

Reported by: Ondřej Cífka <ondra <at> cifka.com>

Date: Fri, 25 Dec 2015 22:46:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#22239: closed (fgrep -i slow in 2.21)
Date: Tue, 17 Jan 2017 16:19:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 17 Jan 2017 08:18:30 -0800
with message-id <1eaa6f92-1d02-ba65-ef6d-d19f9807e3f7 <at> cs.ucla.edu>
and subject line Re: bug#22239: fgrep -i slow in 2.21
has caused the debbugs.gnu.org bug report #22239,
regarding fgrep -i slow in 2.21
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
22239: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22239
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ondřej Cífka <ondra <at> cifka.com>
To: bug-grep <at> gnu.org
Subject: fgrep -i slow in 2.21
Date: Fri, 25 Dec 2015 23:44:42 +0100
When I run "grep -Fi -f list.txt", where list.txt has thousands of
lines, grep takes orders of magnitude longer to process the input file
than it does without -i. This was not the case in grep 2.16, where
fgrep -i was about as fast as plain fgrep.

--
Ondřej Cífka


[Message part 3 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ondřej Cífka <ondra <at> cifka.com>
Cc: 22239-done <at> debbugs.gnu.org
Subject: Re: bug#22239: fgrep -i slow in 2.21
Date: Tue, 17 Jan 2017 08:18:30 -0800
[Message part 4 (text/plain, inline)]
On 04/11/2016 12:14 AM, Ondřej Cífka wrote:
> You're probably right about the locale. I'm using cs_CZ.UTF-8. With
> LC_ALL=C, both variants run faster and the difference is
> insignificant.
>
> With cs_CZ.UTF-8, on my machine, your test case takes 2.322s with -i
> and 0.464s without -i.
>
> I tested on my Aspell dictionary dump, where the difference is more noticeable:
>
> aspell dump master | head -n 100000 >list.txt
>
> grep 2.21 with -i: 7.336s
> grep 2.21 without -i: 0.312s
> grep 2.16 with -i: 0.372s
> grep 2.16 without -i: 0.431s
>
> With LC_ALL=C, both versions are about as fast.
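One plausible reason the gap disappears with LC_ALL=C (an inference, not something stated in this thread): in the C locale only ASCII letters fold, and both members of each case pair are one byte long. In UTF-8, a character and its case counterpart can have different encoded lengths, so a matcher cannot simply compare folded bytes one for one. The sketch below only inspects a few such characters. It uses Python's str.lower()/str.upper() as a stand-in for the C library's case mappings (an assumption), and `length_changing_variants` is a hypothetical helper, not anything in grep.

```python
def length_changing_variants(ch):
    """Return the case variants of ch (via lower()/upper()) whose UTF-8
    encoding has a different byte length than ch itself."""
    n = len(ch.encode("utf-8"))
    variants = []
    for variant in (ch.lower(), ch.upper()):
        if variant != ch and len(variant.encode("utf-8")) != n:
            variants.append(variant)
    return variants

# ASCII letters (the only ones the C locale folds) never change length.
assert all(length_changing_variants(c) == [] for c in "AZaz")

# KELVIN SIGN, LATIN SMALL LETTER LONG S, LATIN SMALL LETTER DOTLESS I
for ch in ("\u212a", "\u017f", "\u0131"):
    print(f"U+{ord(ch):04X} ({len(ch.encode('utf-8'))} bytes) ->",
          length_changing_variants(ch))
# U+212A (3 bytes) -> ['k']
# U+017F (2 bytes) -> ['S']
# U+0131 (2 bytes) -> ['I']
```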

I got some free time to look into this, and installed the attached set
of patches; the 2nd one is the key one. In the en_EN.utf8 locale on my
platform (Fedora 25 x86-64), I get the following user times for 'grep
-Ff list.txt list.txt' where list.txt was generated as you describe:

   user (s)  version and options
     0.444   grep 2.16
     0.522   grep 2.16 -i
     0.443   grep 2.21
    13.048   grep 2.21 -i
     0.096   grep current
     0.101   grep current -i
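The figures above are user CPU times for 'grep -Ff list.txt list.txt'. A minimal sketch for collecting comparable numbers, assuming a Unix system (Python's `resource` module is POSIX-only), a grep on PATH (or a path to the build under test), and a list.txt generated as described earlier in the thread. `child_user_seconds` and `compare_ignore_case` are hypothetical helper names. If the requested locale is not installed, grep stays in the C locale, so check `locale -a` first.

```python
import os
import resource
import subprocess

def child_user_seconds(argv, env=None):
    """Run argv to completion and return the user CPU seconds it consumed.

    RUSAGE_CHILDREN accumulates over all waited-for children, so take the
    difference around this single run. Output is discarded, and a nonzero
    exit status (grep exits 1 when nothing matches) is not treated as an error.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    subprocess.run(argv, env=env, stdout=subprocess.DEVNULL, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before

def compare_ignore_case(grep="grep", patterns="list.txt", target="list.txt",
                        locale=None):
    """Return (seconds without -i, seconds with -i) for grep -Ff."""
    env = dict(os.environ)
    if locale is not None:
        env["LC_ALL"] = locale  # overrides LANG and other LC_* variables
    plain = child_user_seconds([grep, "-Ff", patterns, target], env)
    folded = child_user_seconds([grep, "-i", "-Ff", patterns, target], env)
    return plain, folded
```

For example, compare_ignore_case(locale="cs_CZ.UTF-8") and compare_ignore_case(locale="C") give the two pairs to compare; in the report above, the -i figure dominated only in the UTF-8 locale.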

Since this patch causes grep to use Aho-Corasick more often, I expect it
to hurt performance in some cases involving multiple patterns, but we
can look into those cases as they turn up. In the meantime, since the
original bug seems to be fixed, I am taking the liberty of closing the
bug report.
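Aho-Corasick builds a single automaton from all the patterns and scans the input once, so the scan does not have to try each of thousands of patterns separately. The Python sketch below shows the technique with case-insensitivity done by applying str.casefold() to both the patterns and the input before building and scanning. This is a simplification chosen for illustration, not the code in grep's src/kwset.c, which works on bytes. Because folding can change length (for example 'ß' folds to 'ss'), the reported offsets are positions in the folded text.

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton over str, with optional case folding."""

    def __init__(self, patterns, ignore_case=False):
        self.fold = str.casefold if ignore_case else (lambda s: s)
        self.goto = [{}]  # goto[state][char] -> next state
        self.fail = [0]   # failure link of each state
        self.out = [[]]   # patterns recognized on entering each state
        for p in patterns:
            if p:  # skip empty patterns to keep the sketch simple
                self._add(self.fold(p), p)
        self._build_failure_links()

    def _add(self, key, original):
        state = 0
        for ch in key:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(original)

    def _build_failure_links(self):
        # Breadth-first, so every shallower state is finished before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (end_index, pattern) for every match in the (folded) text."""
        state = 0
        for i, ch in enumerate(self.fold(text)):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                yield i, p
```

For example, list(AhoCorasick(["he", "She", "his", "hers"], ignore_case=True).search("USHERS")) gives [(3, 'She'), (3, 'he'), (5, 'hers')], while the same call without ignore_case=True finds nothing.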
[0001-build-update-gnulib-submodule-to-latest.txt (text/plain, attachment)]
[0002-Improve-i-performance-in-typical-UTF-8-searches.txt (text/plain, attachment)]
[0003-src-kwset.c-Fix-comment-typo.txt (text/plain, attachment)]
[0004-NEWS-Fix-typo.txt (text/plain, attachment)]

This bug report was last modified 8 years and 185 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.