GNU bug report logs - #24009
[PATCH] grep: use fastmap in regex

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 16 Jul 2016 16:59:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#24009: closed ([PATCH] grep: use fastmap in regex)
Date: Fri, 02 Sep 2016 05:33:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Thu, 1 Sep 2016 22:32:12 -0700
with message-id <1b4e4883-9a2e-b974-1a9c-c4409a84c1c2 <at> cs.ucla.edu>
and subject line Re: bug#24009: [PATCH] grep: use fastmap in regex
has caused the debbugs.gnu.org bug report #24009,
regarding [PATCH] grep: use fastmap in regex
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
24009: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24009
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <bug-grep <at> gnu.org>
Subject: [PATCH] grep: use fastmap in regex
Date: Sun, 17 Jul 2016 01:57:30 +0900
[Message part 3 (text/plain, inline)]
sed and gawk use fastmap in regex, but grep does not.  By using fastmap,
I expect that grep speeds up for patterns as regex is used.

before:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 7.83
user 7.62
sys 0.07

after:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 0.46
user 0.38
sys 0.07

However, if grep uses fastmap, fails in case-fold-titlecase test.  It
means that grep's behavior differ from sed and gawk, as they use fastmap,
although it seems to be a bug in regex.
[0001-grep-use-fastmap-in-regex.patch (text/plain, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 24009-done <at> debbugs.gnu.org, Jens Schleusener <Jens.Schleusener <at> t-online.de>
Subject: Re: bug#24009: [PATCH] grep: use fastmap in regex
Date: Thu, 1 Sep 2016 22:32:12 -0700
[Message part 6 (text/plain, inline)]
Norihiro Tanaka wrote:
> I think this patch should be suspended because of this issue.
> I reported it to glibc developers.  https://sourceware.org/bugzilla/show_bug.cgi?id=20381

After thinking about it a bit, I came up with a variant of the patch that gives 
the performance improvement unless -i is used, so I installed the attached 
patches. The first patch is mostly just refactoring this somewhat-crufty code 
and fixing an O(N**2) reallocation problem. The second is the real improvement.

The second patch just captures the low-hanging fruit. For example, even with -i 
we could use a fastmap if all the pattern's letters (including letters matched 
by ranges) happen to avoid the glibc bug. Something like that might be worth 
pursuing.

Since the attached patch fixes the test case that prompted the bug report I'm 
closing the bug. We can reopen it, or open a new one, if someone wants to fix 
the remaining performance glitches.

Thanks again for all these fixes!
[0001-grep-improve-dfasearch-storage-management.patch (text/x-diff, attachment)]
[0002-grep-use-regex-fastmap-unless-i.patch (text/x-diff, attachment)]

This bug report was last modified 8 years and 323 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.