GNU bug report logs - #16912
[PATCH] no longer use CSET for non-UTF8 locale in DFA engine


Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 1 Mar 2014 09:49:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #25 received at 16912 <at> debbugs.gnu.org (full text, mbox):

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 16912 <at> debbugs.gnu.org
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in
 DFA engine
Date: Tue, 04 Mar 2014 16:49:59 +0100

On 03/03/2014 07:13, Paul Eggert wrote:
> Norihiro Tanaka wrote:
>> However I don't understand why the optimization isn't completed on
>> non-UTF8 locale only.  Can you explain it?
>
> Sorry, no; there's a lot about that code I don't yet understand.

IIRC it's because a CSET matches any byte, while the corresponding
MBCSET only matches that byte if it is a single-byte character.  So for
example, say "\x83A" is a two-byte character.  The CSET "A" will match
its second byte, but the corresponding MBCSET will not.

This can happen in the Shift-JIS encoding.

Paolo
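
The byte-versus-character distinction described in the message above can be
sketched as follows. This is only an illustration in Python, not grep's DFA
code: the helper names byte_level_match and char_level_match are hypothetical,
and Python's "shift_jis" codec stands in for a Shift-JIS locale. The first
helper behaves like a CSET (it tests bytes with no regard for character
boundaries), the second like an MBCSET (it compares whole decoded characters).

```python
# Illustration only; these helpers are hypothetical and are not part of grep.

def byte_level_match(haystack: bytes, needle: bytes) -> bool:
    """CSET-like: look for the byte anywhere, ignoring character boundaries."""
    return needle in haystack


def char_level_match(haystack: bytes, needle: str,
                     encoding: str = "shift_jis") -> bool:
    """MBCSET-like: decode first, then compare whole characters."""
    return needle in haystack.decode(encoding)


# The two bytes 0x83 0x41 form one two-byte character in Shift-JIS.
data = b"\x83A"
decoded = data.decode("shift_jis")

byte_hit = byte_level_match(data, b"A")   # sees the trailing byte 0x41
char_hit = char_level_match(data, "A")    # sees one character that is not "A"

print(len(data), len(decoded), byte_hit, char_hit)
```

Here the byte-level test reports a match because the trailing byte of the
two-byte character happens to equal "A", while the character-level test does
not, which is the mismatch Paolo describes.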





This bug report was last modified 11 years and 78 days ago.
