GNU bug report logs -
#24009
[PATCH] grep: use fastmap in regex
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 16 Jul 2016 16:59:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
[Message part 1 (text/plain, inline)]
Your message dated Thu, 1 Sep 2016 22:32:12 -0700
with message-id <1b4e4883-9a2e-b974-1a9c-c4409a84c1c2 <at> cs.ucla.edu>
and subject line Re: bug#24009: [PATCH] grep: use fastmap in regex
has caused the debbugs.gnu.org bug report #24009,
regarding [PATCH] grep: use fastmap in regex
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
24009: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24009
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
sed and gawk use fastmap in regex, but grep does not. By using fastmap,
I expect that grep speeds up for patterns as regex is used.
before:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 7.83
user 7.62
sys 0.07
after:
$ time -p env LC_ALL=ja_JP.eucjp src/grep '\([a-b]\)\1' k
real 0.46
user 0.38
sys 0.07
However, if grep uses fastmap, fails in case-fold-titlecase test. It
means that grep's behavior differ from sed and gawk, as they use fastmap,
although it seems to be a bug in regex.
[0001-grep-use-fastmap-in-regex.patch (text/plain, attachment)]
[Message part 5 (message/rfc822, inline)]
[Message part 6 (text/plain, inline)]
Norihiro Tanaka wrote:
> I think this patch should be suspended because of this issue.
> I reported it to glibc developers. https://sourceware.org/bugzilla/show_bug.cgi?id=20381
After thinking about it a bit, I came up with a variant of the patch that gives
the performance improvement unless -i is used, so I installed the attached
patches. The first patch is mostly just refactoring this somewhat-crufty code
and fixing an O(N**2) reallocation problem. The second is the real improvement.
The second patch just captures the low-hanging fruit. For example, even with -i
we could use a fastmap if all the pattern's letters (including letters matched
by ranges) happen to avoid the glibc bug. Something like that might be worth
pursuing.
Since the attached patch fixes the test case that prompted the bug report I'm
closing the bug. We can reopen it, or open a new one, if someone wants to fix
the remaining performance glitches.
Thanks again for all these fixes!
[0001-grep-improve-dfasearch-storage-management.patch (text/x-diff, attachment)]
[0002-grep-use-regex-fastmap-unless-i.patch (text/x-diff, attachment)]
This bug report was last modified 8 years and 323 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.