GNU bug report logs - #18777
[PATCH] dfa: improvement for checking of multibyte character boundary

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 20 Oct 2014 15:05:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #53 received at 18777 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eric Blake <eblake <at> redhat.com>, 18777 <at> debbugs.gnu.org
Subject: Re: bug#18777: [PATCH] dfa: improvement for checking of multibyte
 character boundary
Date: Thu, 18 Dec 2014 08:50:19 +0900

On Wed, 17 Dec 2014 09:46:09 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> Yes, and that's the point: we don't want this if-statement to be pruned
> if WCP != NULL.  We want the code to return P right away in the typical
> case where P is at a character boundary.  If MBP is way less than P,
> this will save the work of the following loop.

When we return P, WCP must be set to the wide character for the
previous character, not the next one.

For example, consider the following byte sequence in a Shift_JIS locale,
where the pair 0x95 0x5c is a single multibyte character.  Suppose
skip_remains_mb() is called with MBP = position (a) and P = position (d).

      0x41 0x95 0x5c 0x0a
    (a)  (b)  (c)  (d)

If WCP == NULL, we can return P right away.  On the other hand, if
WCP != NULL, we must store the wide character for 0x95 0x5c in WCP
before returning P.
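
To make the example concrete, here is a rough sketch, not the code in
dfa.c.  The names sjis_char_len and skip_to_boundary are hypothetical,
the Shift_JIS lead-byte test is simplified, and 0 stands in for "no
previous character".  It only illustrates why the walk from MBP is still
needed when WCP != NULL, even though P is already on a boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the character starting at S.  Simplified assumption:
   Shift_JIS lead bytes are 0x81..0x9f and 0xe0..0xfc, and any lead
   byte followed by another byte forms a two-byte character.  */
static size_t
sjis_char_len (unsigned char const *s, unsigned char const *end)
{
  unsigned char c = *s;
  int lead = (0x81 <= c && c <= 0x9f) || (0xe0 <= c && c <= 0xfc);
  return lead && end - s >= 2 ? 2 : 1;
}

/* Hypothetical stand-in for skip_remains_mb.  MBP is a known character
   boundary at or before P.  Step over whole characters until reaching
   or passing P and return that boundary.  If WCP is non-null, store
   the code of the last character stepped over, i.e. the character
   just before the returned position (0 if none was stepped over).  */
static unsigned char const *
skip_to_boundary (unsigned char const *mbp, unsigned char const *p,
                  unsigned char const *end, unsigned int *wcp)
{
  assert (mbp <= p && p <= end);
  unsigned int prev = 0;
  while (mbp < p)
    {
      size_t n = sjis_char_len (mbp, end);
      prev = n == 2 ? ((unsigned int) mbp[0] << 8) | mbp[1] : mbp[0];
      mbp += n;
    }
  if (wcp)
    *wcp = prev;
  return mbp;
}
```

With the four bytes above, MBP at (a) and P at (d), the sketch returns
(d) and stores the code for 0x95 0x5c.  Returning P early without the
loop would leave WCP without that previous character.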

Do you have any ideas for how to use always_character_boundary in this
case?





This bug report was last modified 9 years and 75 days ago.
