GNU bug report logs - #24161
[PATCH 2/2] sed: speed up matching by regular expression with dfa matcher
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Fri, 5 Aug 2016 14:05:01 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#24161: [PATCH 2/2] sed: speed up matching by regular expression with dfa matcher
which was filed against the sed package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 24161 <at> debbugs.gnu.org.
--
24161: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24161
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
On Mon, Aug 8, 2016 at 4:29 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> On Sat, 6 Aug 2016 19:20:22 -0700
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> Thanks again.
>> I've revised it as follows and expect to push tomorrow:
>> - removed the abort and comment from dfaerror; they should not be
>> necessary, given the _Noreturn attribute.
>> - adjusted commit log and NEWS entry, also moving the "Improvements"
>> section to the top
>> - sorted source file names in local.mk (they were not sorted before, either)
>> - added the "make syntax-check"-required mention of sed/dfa.c in
>> po/POTFILES.in
>
> Thanks for the adjustments. I agree with all the changes.
Pushed.
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
Hi,
We can speed up sed by using the dfa matcher brought over from grep. gawk
uses it, but sed does not use it yet. It will speed up matching in typical
cases.
$ yes $(printf %040d 0) | head -1000000 >k
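As a sanity check on the input above: every line of k is 40 zeros, so the benchmark pattern /000000000k/ can never match, and the timings measure how quickly sed rejects non-matching lines. A minimal sketch, assuming a reduced 1000-line copy in a temporary directory (the k_small name and the use of the system sed rather than sed/sed are assumptions for illustration):

```shell
# Sketch only: a 1000-line copy of the input, written to a temporary
# directory (hypothetical file name) instead of the file "k" used above.
tmpdir=$(mktemp -d)
yes "$(printf %040d 0)" | head -1000 > "$tmpdir/k_small"

# Count lines that are not exactly 40 zeros.
bad_lines=$(awk '!/^0+$/ || length($0) != 40 { n++ } END { print n + 0 }' "$tmpdir/k_small")

# Count lines printed for the benchmark pattern, using the system sed.
matches=$(LC_ALL=C sed -ne /000000000k/p "$tmpdir/k_small" | wc -l | tr -d ' ')

echo "bad_lines=$bad_lines matches=$matches"
rm -rf "$tmpdir"
```

Both counts should be zero, since no line contains the character k.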
Before:
$ time -p env LC_ALL=C sed/sed -ne /000000000k/p k
real 3.04
user 2.99
sys 0.03
$ time -p env LC_ALL=en_US.utf8 sed/sed -ne /000000000k/p k
real 3.04
user 2.90
sys 0.06
$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /000000000k/p k
real 7.09
user 6.77
sys 0.31
After patching:
$ time -p env LC_ALL=C sed/sed -ne /000000000k/p k
real 0.29
user 0.15
sys 0.10
$ time -p env LC_ALL=en_US.utf8 sed/sed -ne /000000000k/p k
real 0.27
user 0.25
sys 0.02
$ time -p env LC_ALL=ja_JP.eucjp sed/sed -ne /000000000k/p k
real 0.33
user 0.29
sys 0.03
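The speedup implied by the "real" times above can be worked out directly. A small sketch, assuming a hypothetical helper named speedup that divides the time before the patch by the time after it:

```shell
# Ratio of "real" time before the patch to "real" time after it,
# rounded to one decimal place. "speedup" is a hypothetical helper
# name, not part of sed or of the patch.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

speedup 3.04 0.29   # LC_ALL=C
speedup 3.04 0.27   # LC_ALL=en_US.utf8
speedup 7.09 0.33   # LC_ALL=ja_JP.eucjp
```

With the figures reported here, that is roughly a 10x improvement in the C locale, 11x in en_US.utf8, and 21x in ja_JP.eucjp.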
I believe this patch can greatly improve sed's matching performance.
However, I worry about maintenance, since updates to dfa are always
made in grep.
Thanks,
Norihiro
[0002-sed-speed-up-matching-by-reguler-expression-with-dfa.patch (text/plain, attachment)]
This bug report was last modified 8 years and 340 days ago.