GNU bug report logs - #16232
[PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Mon, 23 Dec 2013 22:40:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #47 received at 16232 <at> debbugs.gnu.org (full text, mbox):
On 01/12/2014 04:36 AM, Jim Meyering wrote:
> On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
>> On 01/11/2014 11:33 AM, Pádraig Brady wrote:
> ...
>> This is also a good summary of stuff to consider with case:
>> http://www.unicode.org/faq/casemap_charprop.html
>>
>> So picking another case situation from there:
>> "in the Greek script, capital sigma (U+03A3) is the uppercase form of both
>> the regular (U+03C3) and final (U+03C2) lowercase sigma."
>>
>> One can see that sed handles this:
>> $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/'
>> ςσΣΣ
>> $ printf '\u03A3\n' | sed 's/.*/&\L&/'
>> Σσ
>>
>> Though I was surprised that grep (2.14) didn't match any combo of these:
>> $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)"
>> $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)"
>> $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)"
>
> Actually, if you quote the argument to the latter printf, two of those
> do match, both with -F and without:
>
> $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf '\u03A3')"
> ςσ
> $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C2')"
> $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C3')"
> Σ
Oops, right.
So there is still no regression with the new scheme,
since grep's case mapping is 1:1 here for Σ and σ.
thanks,
Pádraig.
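
For reference, the quoting point above can be reproduced in isolation. This is only a minimal sketch, assuming a shell whose printf understands \u escapes (for example bash 4.2 or later, or GNU coreutils printf) and a UTF-8 locale; the variable names "unquoted" and "quoted" are made up for illustration and do not come from the thread.

```shell
# Unquoted: the shell removes the backslash before printf runs,
# so printf receives the plain string "u03A3" and prints it as-is.
unquoted=$(printf \u03A3)

# Quoted: printf receives the backslash escape itself and can
# interpret it (assumption: this printf supports \u escapes).
quoted=$(printf '\u03A3')

printf 'unquoted pattern: %s\n' "$unquoted"   # prints: unquoted pattern: u03A3
printf 'quoted pattern:   %s\n' "$quoted"     # Σ under the assumptions above
```

With the unquoted form, grep -Fi was searching for the literal text "u03A3", which is why none of the three original commands printed anything.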
This bug report was last modified 11 years and 82 days ago.