GNU bug report logs - #18777
[PATCH] dfa: improvement for checking of multibyte character boundary
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Mon, 20 Oct 2014 15:05:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
On Mon, 20 Oct 2014 10:07:20 -0600
Eric Blake <eblake <at> redhat.com> wrote:
> POSIX requires that NUL, slash, dot, newline, and carriage return all be
> single bytes that cannot occur inside a multibyte character (because
> they have special meaning to file name resolution and/or terminal
> interaction); it added this requirement fairly recently, but only after
> confirming that common existing locales satisfy this constraint. (The
> same is not true for most any other character; even though POSIX
> requires that a-z, A-Z, and 0-9 be single bytes, it does not forbid
> those characters from also being bytes embedded within multibyte
> characters). Is it worth extending your optimization to all five of the
> POSIX-guaranteed single byte characters?
I rewrote the patch so that NUL, slash, dot, and carriage return, as well
as newline, are now also treated as special characters.
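
To illustrate the idea discussed above, here is a minimal sketch in C. It is
not taken from the attached patch; the names is_safe_boundary_byte and
find_known_boundary are hypothetical. It assumes, as the quoted message
states, that NUL, slash, dot, newline, and carriage return are single bytes
that never occur inside a multibyte character, so the position just after
any of them is a character boundary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): true if C is one of the
   five bytes that the quoted message says are always single-byte
   characters and never part of a multibyte character.  */
static bool
is_safe_boundary_byte (unsigned char c)
{
  return c == '\0' || c == '/' || c == '.' || c == '\n' || c == '\r';
}

/* Hypothetical helper: scan backward from P toward BEGIN and return
   the nearest position at or before P that is known to be a character
   boundary, namely the position just after a safe byte, or BEGIN
   itself if no safe byte is found.  */
static const char *
find_known_boundary (const char *begin, const char *p)
{
  while (begin < p && !is_safe_boundary_byte ((unsigned char) p[-1]))
    p--;
  return p;
}
```

A caller could then start multibyte decoding from the returned position
instead of from the start of the buffer, which might shorten the scan when
such bytes are frequent in the input.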
[0001-dfa-improvement-for-checking-of-multibyte-character-.patch (text/plain, attachment)]
This bug report was last modified 9 years and 75 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.