GNU bug report logs - #18777
[PATCH] dfa: improvement for checking of multibyte character boundary
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Mon, 20 Oct 2014 15:05:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #17 received at 18777 <at> debbugs.gnu.org:
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Eric Blake <eblake <at> redhat.com> wrote:
> > Is it worth extending your optimization to all five of the
> > POSIX-guaranteed single byte characters?
>
> Thanks, but I don't want to do that immediately. DFA already treats
> newline as a single-byte character, but not the others yet. So we
> may need to make many changes to handle invalid locales and sequences
> that do not conform to the rule. If we omitted that, it might mean
> restricting the locales that DFA can be applied to. Therefore, it should
> be done carefully.
I would think adding a check for '\r' would be safe and would help
too; given that on Windows systems '\r' generally occurs just as
frequently as '\n', it should give a nice speedup for gawk on those
systems.
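
To illustrate the idea under discussion, here is a minimal sketch in C of
treating both '\n' and '\r' as known character boundaries when scanning
backwards through a buffer. It is not taken from dfa.c: the helper names
is_sync_byte and last_known_boundary are hypothetical, and the sketch
assumes that neither byte value can occur inside a multibyte sequence in
the current locale's encoding.

```c
#include <stddef.h>

/* Hypothetical helper, not from dfa.c.  Assumption: the bytes '\n'
   and '\r' always stand for themselves and never appear as part of
   a multibyte sequence in the current locale's encoding.  */
static int
is_sync_byte (unsigned char c)
{
  return c == '\n' || c == '\r';
}

/* Scan backwards from P towards BEGIN.  Return a pointer just past
   the nearest preceding sync byte, or BEGIN if there is none.  Under
   the assumption above, the returned position is a character
   boundary, so multibyte decoding can safely restart there instead
   of at the start of the buffer.  */
static const char *
last_known_boundary (const char *begin, const char *p)
{
  while (begin < p)
    {
      if (is_sync_byte ((unsigned char) p[-1]))
        return p;
      p--;
    }
  return begin;
}
```

With input that uses "\r\n" line endings, accepting '\r' in addition to
'\n' only changes the result when the scan starts right after a '\r', so
any real speedup would have to be measured rather than assumed.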
The other characters that Eric cited seem like less of an issue to me.
Thanks,
Arnold
This bug report was last modified 9 years and 74 days ago.