GNU bug report logs - #23932
dfa: use algorithm for single byte character to any single byte character in input text always
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sun, 10 Jul 2016 09:53:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #10 received at 23932 <at> debbugs.gnu.org (full text, mbox):
On Sun, 10 Jul 2016 18:51:43 +0900
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> In multibyte locales, if a pattern starts with a period expression,
> matching is still slow, because the transition table is built at run time
> even when the next character in the input text is a single byte.
>
> This patch changes dfa to always use the single-byte algorithm for
> any single-byte character in the input text. If the transition table has
> already been built and the next character in the input text is a single
> byte, the matcher moves to the next state by consulting only the pre-built
> transition table, even from a state that includes ANYCHAR.
>
> $ yes "$(printf 'a%038db\n' 0)" | head -1000000 >in
> $ env LC_ALL=C gcc -v
> Reading specs from /usr/local/lib/gcc/x86_64-pc-linux-gnu/4.4.7/specs
> Target: x86_64-pc-linux-gnu
> Configured with: ./configure --with-as=/usr/local/bin/as --with-ld=/usr/local/bin/ld --with-system-zlib --enable-__cxa_atexit
> Thread model: posix
> gcc version 4.4.7 (GCC)
>
> The patch from bug#21486 is required before this patch. This patch
> speeds grep up further on top of it.
I updated the patch to account for the change in bug#21486, and added a
second patch containing a minor change.
[0001-dfa-use-algorithm-for-single-byte-character-to-any-s.patch (text/plain, attachment)]
[0002-dfa-avoid-invalid-character-matches-period.patch (text/plain, attachment)]
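
The idea described in the quoted message can be illustrated with a small
sketch. It is not taken from the attached patches or from grep's dfa.c; the
names (toy_dfa, toy_build, toy_match, utf8_len) are hypothetical, the locale
is assumed to be UTF-8, and the table is filled in up front rather than
lazily. The toy matcher handles only the pattern ".b" anchored at the start
of the buffer. The point is the dispatch: a single-byte input character is
always resolved through the table, even from the state that contains ".",
and only multibyte characters take the slower path.

```c
#include <assert.h>
#include <stddef.h>

/* States of a toy matcher for the pattern ".b" (hypothetical). */
enum { START, WANT_B, ACCEPT, DEAD, NSTATES };

struct toy_dfa
{
  int trans[NSTATES][128];      /* table for single-byte (ASCII) input only */
  long fast_steps;              /* steps resolved by table lookup */
  long slow_steps;              /* steps that needed multibyte handling */
};

static void
toy_build (struct toy_dfa *d)
{
  assert (d != NULL);
  for (int s = 0; s < NSTATES; s++)
    for (int c = 0; c < 128; c++)
      d->trans[s][c] = DEAD;
  for (int c = 0; c < 128; c++)
    if (c != '\n')
      d->trans[START][c] = WANT_B;      /* "." accepts any byte but newline */
  d->trans[WANT_B]['b'] = ACCEPT;
  d->fast_steps = d->slow_steps = 0;
}

/* Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start
   a character.  Overlong and surrogate forms are not checked here.  */
static size_t
utf8_len (unsigned char c)
{
  if (c < 0x80) return 1;
  if (0xC2 <= c && c <= 0xDF) return 2;
  if (0xE0 <= c && c <= 0xEF) return 3;
  if (0xF0 <= c && c <= 0xF4) return 4;
  return 0;
}

/* Return 1 if ".b" matches at the start of S[0..N), else 0.  */
static int
toy_match (struct toy_dfa *d, const unsigned char *s, size_t n)
{
  assert (d != NULL && s != NULL);
  int state = START;
  size_t i = 0;
  while (i < n && state != ACCEPT && state != DEAD)
    {
      if (s[i] < 0x80)
        {
          /* Fast path: table lookup, even though START contains ".".  */
          state = d->trans[state][s[i]];
          i++;
          d->fast_steps++;
        }
      else
        {
          /* Slow path: only "." can consume a multibyte character.  */
          size_t len = utf8_len (s[i]);
          int ok = state == START && len != 0 && len <= n - i;
          for (size_t k = 1; ok && k < len; k++)
            ok = (s[i + k] & 0xC0) == 0x80;
          state = ok ? WANT_B : DEAD;
          i += ok ? len : 1;
          d->slow_steps++;
        }
    }
  return state == ACCEPT;
}
```

With input like the benchmark file above, which is pure ASCII, every step in
this sketch goes through the table and the slow path is never entered.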
This bug report was last modified 8 years and 265 days ago.