GNU bug report logs - #18777
[PATCH] dfa: improvement for checking of multibyte character boundary


Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 20 Oct 2014 15:05:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Eric Blake <eblake <at> redhat.com>
Cc: 18777 <at> debbugs.gnu.org
Subject: bug#18777: [PATCH] dfa: improvement for checking of multibyte character boundary
Date: Mon, 15 Dec 2014 23:59:32 +0900
On Mon, 20 Oct 2014 10:07:20 -0600
Eric Blake <eblake <at> redhat.com> wrote:

> POSIX requires that NUL, slash, dot, newline, and carriage return all be
> single bytes that cannot occur inside a multibyte character (because
> they have special meaning to file name resolution and/or terminal
> interaction); it added this requirement fairly recently, but only after
> confirming that common existing locales satisfy this constraint.  (The
> same is not true for most any other character; even though POSIX
> requires that a-z, A-Z, and 0-9 be single bytes, it does not forbid
> those characters from also being bytes embedded within multibyte
> characters).  Is it worth extending your optimization to all five of the
> POSIX-guaranteed single byte characters?

I rewrote the patch so that NUL, slash, dot, and carriage return are
treated as special characters in addition to newline.
[0001-dfa-improvement-for-checking-of-multibyte-character-.patch (text/plain, attachment)]
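
To illustrate the idea discussed above (this is only a minimal sketch, not
the code in the attached patch): if the five bytes named by POSIX can never
occur inside a multibyte character, then the position just after any of them
is a known character boundary, and a backward scan can resynchronize there.
The names always_single_byte and last_known_boundary are hypothetical and
do not come from dfa.c; the sketch also assumes the current locale really
satisfies the POSIX constraint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for the five bytes that, under the POSIX
   rule quoted above, are always complete single-byte characters.  */
static bool
always_single_byte (unsigned char c)
{
  return c == '\0' || c == '/' || c == '.' || c == '\n' || c == '\r';
}

/* Hypothetical helper: scan backward from offset POS in BUF and return
   the offset just past the nearest such byte, or 0 if none is found.
   The returned offset is a known character boundary, assuming the
   locale obeys the POSIX constraint.  */
static size_t
last_known_boundary (const char *buf, size_t pos)
{
  assert (buf != NULL);
  while (pos > 0)
    {
      if (always_single_byte ((unsigned char) buf[pos - 1]))
        return pos;
      pos--;
    }
  return 0;
}
```

For example, with the bytes "ab/" followed by a three-byte UTF-8 sequence,
a scan starting inside that sequence stops right after the slash, so
character decoding can restart there instead of at the start of the buffer.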

This bug report was last modified 9 years and 75 days ago.


