GNU bug report logs - #24975
Matching issues with characters whose encoding ends in some other character


Package: grep

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: Sun, 20 Nov 2016 21:51:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 24975 <at> debbugs.gnu.org, Stephane Chazelas <stephane.chazelas <at> gmail.com>
Subject: bug#24975: Matching issues with characters whose encoding ends in some other character
Date: Mon, 28 Nov 2016 09:11:55 -0800

On Mon, Nov 28, 2016 at 5:49 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> I suspect this won't be the last word in this area, because it feels
>> like we should be able to adjust DFA's tables so that people using
>> such locales can retain DFA's efficiency without the bug in the
>> current implementation.
>
> Hi Jim,
>
> It is a bug in dfa's handling of the period expression in non-UTF-8
> locales.  dfa calculates the transitions for single-byte characters and
> for a multibyte character separately, then merges both results.
> However, if the transition for single-byte characters goes back to an
> initial state, we should stop matching single-byte characters.

Nice work. Thank you.
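
For readers unfamiliar with the class of problem named in the subject line, here is a minimal Python sketch. It is an illustration only and not grep's dfa code: the helper names `bytewise_contains` and `charwise_contains` are hypothetical, and Big5 with the character 許 (encoded as the bytes 0xB3 0x5C, so its last byte equals ASCII backslash) is an assumed example of a non-UTF-8 multibyte encoding, not one taken from this thread.

```python
# Illustration only: this is not grep's dfa.c, just a sketch of why a
# multibyte character whose encoding ends in another character's byte
# can confuse a matcher that treats bytes and characters separately.

# Assumed example: in Big5 this character is two bytes, and the second
# byte (0x5C) is the same byte that encodes ASCII backslash.
text = "許"
encoded = text.encode("big5")


def bytewise_contains(haystack: bytes, needle: bytes) -> bool:
    """Hypothetical helper: search raw bytes, ignoring character boundaries."""
    return needle in haystack


def charwise_contains(haystack: bytes, needle: str, encoding: str = "big5") -> bool:
    """Hypothetical helper: decode first, then search whole characters."""
    return needle in haystack.decode(encoding)


if __name__ == "__main__":
    print(encoded)
    # Byte-wise search reports a backslash that is really the tail of 許.
    print(bytewise_contains(encoded, b"\\"))
    # Character-wise search correctly finds no backslash.
    print(charwise_contains(encoded, "\\"))
```

The byte-wise search reports a match because it lands on the trailing byte of the two-byte character, while the character-wise search does not. A matcher that mixes single-byte and multibyte transitions has to avoid that kind of false match.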




This bug report was last modified 8 years and 258 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.