GNU bug report logs - #16232
[PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales


Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Mon, 23 Dec 2013 22:40:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 16232 <16232 <at> debbugs.gnu.org>
Subject: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Date: Sat, 11 Jan 2014 20:36:31 -0800
On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 01/11/2014 11:33 AM, Pádraig Brady wrote:
...
> This is also a good summary of stuff to consider with case:
> http://www.unicode.org/faq/casemap_charprop.html
>
> So picking another case situation from there:
>   "in the Greek script, capital sigma (U+03A3) is the uppercase form of both
>    the regular (U+03C3) and final (U+03C2) lowercase sigma."
>
> One can see that sed handles this:
>   $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/'
>   ςσΣΣ
>   $ printf '\u03A3\n' | sed 's/.*/&\L&/'
>   Σσ
>
> Though I was surprised that grep (2.14) didn't match any combo of these:
>   $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)"
>   $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)"
>   $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)"

Actually, if you quote the argument to the inner printf (the one inside
$(...)), two of those do match, both with -F and without:

$ printf '\u03C2\u03C3\n' | grep -Fi "$(printf '\u03A3')"
ςσ
$ printf '\u03A3\n' | grep -Fi "$(printf '\u03C2')"
$ printf '\u03A3\n' | grep -Fi "$(printf '\u03C3')"
Σ
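The empty results above come from shell quoting, not from case folding. When `printf \u03A3` is left unquoted, the shell's quote removal strips the backslash, so printf (and then grep) receives the plain ASCII string `u03A3`. The POSIX sh sketch below makes this visible by printing each argument with `%s`, which does not interpret escapes.

Its last command is only an illustrative workaround and does not reflect how grep implements -i. Listing all three sigma forms in a bracket expression matches any of them, regardless of how a given grep version folds case. The script uses literal UTF-8 characters instead of `\u` escapes because not every /bin/sh printf supports `\u`.

```sh
#!/bin/sh
# Unquoted: quote removal turns \u into u before printf runs,
# so the argument printf sees is the ASCII string u03A3.
printf 'unquoted: %s\n' \u03A3      # prints: unquoted: u03A3

# Single-quoted: the backslash survives; %s prints it literally
# (only %b or the format string itself would expand \u escapes).
printf 'quoted:   %s\n' '\u03A3'    # prints: quoted:   \u03A3

# Illustrative workaround (an assumption, not grep's own -i logic):
# spell out capital, regular, and final sigma explicitly.
# Lines 1 and 2 contain a sigma; "abc" does not.
printf 'ςσ\nΣ\nabc\n' | grep -c '[Σσς]'   # prints: 2
```

Quoting the argument (`printf '\u03A3'` in a shell whose printf understands `\u`) is enough to make the original commands test what was intended.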




This bug report was last modified 11 years and 82 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.