GNU bug report logs - #16927
[PATCH] grep: avoid to add same character to a bracket expression

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 3 Mar 2014 13:14:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 16927 <at> debbugs.gnu.org
Subject: bug#16927: [PATCH] grep: avoid to add same character to a bracket expression
Date: Mon, 03 Mar 2014 22:13:00 +0900
Package: grep
Tags: patch

This patch avoids adding the same character to a bracket expression more
than once in trivial_case_ignore.  That may let the DFA generate a shorter
token sequence in multibyte locales.
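
As a rough illustration of the idea (not the code in patch.txt), the
deduplication could be sketched as below.  The helper names add_unique and
case_variants are hypothetical, and the sketch assumes the case counterparts
come from towlower and towupper; trivial_case_ignore itself may compute them
differently.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Append WC to SET[0..N) unless it is already present.
   Return the new length of the set.  */
static size_t
add_unique (wchar_t *set, size_t n, wchar_t wc)
{
  for (size_t i = 0; i < n; i++)
    if (set[i] == wc)
      return n;
  set[n] = wc;
  return n + 1;
}

/* Hypothetical helper: store WC and its lowercase and uppercase
   counterparts in OUT, skipping duplicates.  OUT needs room for
   three elements.  Return how many were stored.  */
static size_t
case_variants (wchar_t wc, wchar_t out[3])
{
  size_t n = 0;
  n = add_unique (out, n, wc);
  n = add_unique (out, n, (wchar_t) towlower ((wint_t) wc));
  n = add_unique (out, n, (wchar_t) towupper ((wint_t) wc));
  return n;
}
```

Without the duplicate check, a lowercase input would be stored twice (once
as itself and once as its own towlower result), which is the kind of
repetition the patch is meant to avoid.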

For example, FULLWIDTH LATIN SMALL LETTER A (ef bd 81) is transformed
as shown below, because multibyte characters in a CSET are expanded into
OR expressions in the DFA.

Before the patch:

[AAa] (where each character is fullwidth)
EF BD CAT 81 CAT EF BD CAT 81 CAT OR EF BC CAT A1 CAT OR

After the patch:

[Aa] (where each character is fullwidth)
EF BD CAT 81 CAT EF BC CAT A1 CAT OR
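
The token listings above can be reproduced with a small standalone sketch.
It is not taken from dfa.c: expand_cset and put_token are hypothetical
names, and the code only mimics the postfix notation used in this message
(each byte after the first in a character is followed by CAT, and each
character after the first is followed by OR).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append TOK to the NUL-terminated string in BUF (capacity SIZE),
   separated from any earlier token by one space.
   Return 0 on success, -1 if BUF is too small.  */
static int
put_token (char *buf, size_t size, const char *tok)
{
  size_t len = strlen (buf);
  int n = snprintf (buf + len, size - len, "%s%s", len ? " " : "", tok);
  return (n < 0 || (size_t) n >= size - len) ? -1 : 0;
}

/* Hypothetical sketch: write the postfix tokens for a set of NCHARS
   UTF-8 encoded characters into BUF.  Bytes within one character are
   joined with CAT, and characters are joined with OR.  */
static int
expand_cset (const char *const *chars, size_t nchars, char *buf, size_t size)
{
  if (size == 0)
    return -1;
  buf[0] = '\0';
  for (size_t i = 0; i < nchars; i++)
    {
      const unsigned char *s = (const unsigned char *) chars[i];
      for (size_t j = 0; s[j] != '\0'; j++)
        {
          char hex[3];
          snprintf (hex, sizeof hex, "%02X", (unsigned int) s[j]);
          if (put_token (buf, size, hex) != 0)
            return -1;
          if (j > 0 && put_token (buf, size, "CAT") != 0)
            return -1;
        }
      if (i > 0 && put_token (buf, size, "OR") != 0)
        return -1;
    }
  return 0;
}
```

Passing the three byte sequences ef bd 81, ef bd 81, ef bc a1 yields the
17-token "before" listing, and passing only ef bd 81, ef bc a1 yields the
11-token "after" listing, so dropping one duplicate three-byte character
saves six tokens in this notation.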
[patch.txt (application/octet-stream, attachment)]

This bug report was last modified 11 years and 178 days ago.
