GNU bug report logs - #23932
dfa: use algorithm for single byte character to any single byte character in input text always

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sun, 10 Jul 2016 09:53:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #10 received at 23932 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 23932 <at> debbugs.gnu.org
Subject: Re: bug#23932: dfa: use algorithm for single byte character to any
 single byte character in input text always
Date: Tue, 16 Aug 2016 23:35:22 +0900
On Sun, 10 Jul 2016 18:51:43 +0900
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:

> In multibyte locales, if a pattern starts with a period expression,
> matching is still slow because the transition table is built at run
> time, even when the next character in the input text is single-byte.
> 
> This patch changes the DFA to always use the single-byte algorithm for
> any single-byte character in the input text.  If the transition table
> has already been built and the next character in the input text is
> single-byte, the matcher moves to the next state using only the
> pre-built transition table, even from a state that includes ANYCHAR.
> 
> $ yes "$(printf 'a%038db\n' 0)" | head -1000000 >in
> $ env LC_ALL=C gcc -v
> Reading specs from /usr/local/lib/gcc/x86_64-pc-linux-gnu/4.4.7/specs
> Target: x86_64-pc-linux-gnu
> Configured with: ./configure --with-as=/usr/local/bin/as --with-ld=/usr/local/bin/ld --with-system-zlib --enable-__cxa_atexit
> Thread model: posix
> gcc version 4.4.7 (GCC)
> 
> patch#21486 must be applied before this patch.  With this patch
> applied on top of it, grep speeds up further.

I updated the patch to account for the changes in bug#21486, and added
a second patch containing a minor change.
[0001-dfa-use-algorithm-for-single-byte-character-to-any-s.patch (text/plain, attachment)]
[0002-dfa-avoid-invalid-character-matches-period.patch (text/plain, attachment)]
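
The idea described in the quoted text can be illustrated with a small,
self-contained sketch.  This is not the code from the attached patches or
from grep's dfa.c: the names toy_dfa, toy_init, toy_match, compute_single
and utf8_len are hypothetical, the pattern is hard-wired to an anchored
"a.b", the locale is assumed to be UTF-8, and rejecting an invalid byte
in the ANYCHAR state is a choice made for this toy only.  It shows a
lazily built transition table that is consulted for every single-byte
character, including in the state that holds ANYCHAR, with multibyte
decoding reserved for bytes >= 0x80.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy matcher for the anchored pattern "a.b" in a UTF-8
   locale.  None of these names come from grep's dfa.c.  */
enum { DEAD = -1, UNBUILT = -2, NSTATES = 4, ACCEPT = 3 };

struct toy_dfa
{
  int trans[NSTATES][256];  /* lazily filled; UNBUILT until first use */
  size_t table_steps;       /* steps served by an already-built entry */
  size_t slow_steps;        /* steps that built an entry or decoded UTF-8 */
};

void
toy_init (struct toy_dfa *d)
{
  for (int s = 0; s < NSTATES; s++)
    for (int c = 0; c < 256; c++)
      d->trans[s][c] = UNBUILT;
  d->table_steps = d->slow_steps = 0;
}

/* Transition on a single-byte character.  State 1 is the ANYCHAR state.  */
static int
compute_single (int s, unsigned char c)
{
  switch (s)
    {
    case 0: return c == 'a' ? 1 : DEAD;
    case 1: return c != '\n' ? 2 : DEAD;
    case 2: return c == 'b' ? ACCEPT : DEAD;
    default: return ACCEPT;
    }
}

/* Length (2..4) of the UTF-8 sequence at P, or 0 if it is malformed or
   truncated.  Simplified: overlong forms and surrogates are not rejected.  */
static size_t
utf8_len (const unsigned char *p, size_t avail)
{
  size_t n = ((p[0] & 0xE0) == 0xC0 ? 2
              : (p[0] & 0xF0) == 0xE0 ? 3
              : (p[0] & 0xF8) == 0xF0 ? 4 : 0);
  if (n == 0 || n > avail)
    return 0;
  for (size_t i = 1; i < n; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;
  return n;
}

/* Return 1 if TEXT (LEN bytes) starts with a match for "a.b", else 0.  */
int
toy_match (struct toy_dfa *d, const char *text, size_t len)
{
  const unsigned char *p = (const unsigned char *) text;
  int s = 0;
  size_t i = 0;
  while (i < len && s != DEAD && s != ACCEPT)
    {
      assert (0 <= s && s < NSTATES);
      unsigned char c = p[i];
      if (c < 0x80)
        {
          /* Single-byte character: use the table in every state,
             including the ANYCHAR state.  */
          if (d->trans[s][c] == UNBUILT)
            {
              d->trans[s][c] = compute_single (s, c);
              d->slow_steps++;
            }
          else
            d->table_steps++;
          s = d->trans[s][c];
          i++;
        }
      else
        {
          /* Multibyte or invalid byte: only ANYCHAR can consume it.  */
          size_t n = utf8_len (p + i, len - i);
          d->slow_steps++;
          if (s == 1 && n != 0)
            {
              s = 2;
              i += n;
            }
          else
            s = DEAD;
        }
    }
  return s == ACCEPT;
}
```

In this sketch, matching "a0b" twice builds three table entries on the
first pass and answers all three steps from the table on the second,
while input such as "a\xC3\xA9" "b" still takes the decoding path for
the two-byte character only.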

This bug report was last modified 8 years and 265 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.