GNU bug report logs - #24975
Matching issues with characters whose encoding ends in some other character


Package: grep;

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: Sun, 20 Nov 2016 21:51:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: jim <at> meyering.net
Cc: 24975 <at> debbugs.gnu.org, stephane.chazelas <at> gmail.com
Subject: bug#24975: Matching issues with characters whose encoding ends in some other character
Date: Mon, 28 Nov 2016 22:49:27 +0900
Jim Meyering <jim <at> meyering.net> wrote:

> I suspect this won't be the last word in this area, because it feels
> like we should be able to adjust DFA's tables so that people using
> such locales can retain DFA's efficiency without the bug in the
> current implementation.

Hi Jim,

This is a bug in dfa's handling of the period expression in non-UTF-8
locales.  dfa calculates the transitions for single-byte characters and
for a multibyte character separately, then merges the two results.
However, if the transition for single-byte characters goes back to the
initial state, we should stop matching single-byte characters.
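
As an illustration of the kind of false match described in the subject
line, here is a minimal sketch.  It is not the dfa code and not the
attached patch.  It assumes Shift_JIS as the example encoding (not named
in this message), where U+30BD is encoded as 0x83 0x5C, so its last byte
equals an ASCII backslash.  The helper names are hypothetical.

```python
# Illustration only: Shift_JIS is an assumed example encoding, and the
# helper names below are hypothetical.  This is not how dfa.c works.
ENCODING = "shift_jis"


def char_starts(data, encoding=ENCODING):
    """Return the set of byte offsets at which a character begins."""
    offsets = set()
    pos = 0
    for ch in data.decode(encoding):
        offsets.add(pos)
        pos += len(ch.encode(encoding))
    return offsets


def find_bytewise(data, needle):
    """Naive search that ignores character boundaries."""
    return data.find(needle)


def find_char_aware(data, needle, encoding=ENCODING):
    """Search that only accepts matches starting on a character boundary."""
    starts = char_starts(data, encoding)
    pos = data.find(needle)
    while pos != -1:
        if pos in starts:
            return pos
        pos = data.find(needle, pos + 1)
    return -1


# U+30BD encodes to two bytes; the second one is 0x5C.
sample = "\u30bd".encode(ENCODING)
backslash = b"\\"

# The byte-wise search matches in the middle of the multibyte character.
bytewise_hit = find_bytewise(sample, backslash)
# The boundary-aware search finds no backslash character at all.
aware_hit = find_char_aware(sample, backslash)
# A real backslash following the multibyte character is still found.
aware_real_hit = find_char_aware(sample + backslash, backslash)
```

The byte-wise search reports a hit at the second byte of the character,
while the boundary-aware search only reports the real backslash that
starts at a character boundary.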

Thanks,
Norihiro
[0001-dfa-avoid-match-middle-in-multibyte-character.patch (text/plain, attachment)]



