GNU bug report logs - #16232
[PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales

Package: grep

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Mon, 23 Dec 2013 22:40:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #47 received at 16232 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 16232 <16232 <at> debbugs.gnu.org>
Subject: Re: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes
 10x) in multibyte locales
Date: Sun, 12 Jan 2014 12:56:23 +0000

On 01/12/2014 04:36 AM, Jim Meyering wrote:
> On Sat, Jan 11, 2014 at 6:15 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
>> On 01/11/2014 11:33 AM, Pádraig Brady wrote:
> ...
>> This is also a good summary of stuff to consider with case:
>> http://www.unicode.org/faq/casemap_charprop.html
>>
>> So picking another case situation from there:
>>   "in the Greek script, capital sigma (U+03A3) is the uppercase form of both
>>    the regular (U+03C3) and final (U+03C2) lowercase sigma."
>>
>> One can see that sed handles this:
>>   $ printf '\u03C2\u03C3\n' | sed 's/.*/&\U&/'
>>   ςσΣΣ
>>   $ printf '\u03A3\n' | sed 's/.*/&\L&/'
>>   Σσ
>>
>> Though I was surprised that grep (2.14) didn't match any combo of these:
>>   $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf \u03A3)"
>>   $ printf '\u03A3\n' | grep -Fi "$(printf \u03C2)"
>>   $ printf '\u03A3\n' | grep -Fi "$(printf \u03C3)"
> 
> Actually, if you quote the argument to the latter printf, two of those
> do match, both with -F and without:
> 
> $ printf '\u03C2\u03C3\n' | grep -Fi "$(printf '\u03A3')"
> ςσ
> $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C2')"
> $ printf '\u03A3\n' | grep -Fi "$(printf '\u03C3')"
> Σ

Oops, right.
So there is still no regression with the new scheme,
since grep maps Σ and σ 1:1 here.

thanks,
Pádraig.
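
The quoting point in the quoted reply can be sketched as follows. This is a minimal illustration, assuming a POSIX-style shell; the variable names are hypothetical and not from the thread. With the argument unquoted, the shell removes the backslash before printf runs, so the grep pattern becomes the literal text u03A3 rather than Σ. What the quoted form prints depends on the shell's printf and on the locale, so only the unquoted result is stated here.

```shell
# Unquoted: the shell strips the backslash during quote removal,
# so printf receives the plain word "u03A3" and prints it as-is.
unquoted=$(printf \u03A3)

# Quoted: printf receives the backslash escape intact. Whether it
# expands to the character depends on printf support and the locale.
quoted=$(printf '\u03A3')

printf 'unquoted pattern: %s\n' "$unquoted"   # prints: unquoted pattern: u03A3
printf 'quoted pattern:   %s\n' "$quoted"
```

Since the unquoted pattern is just ASCII text, none of the three original grep commands had a chance of matching the Greek input, regardless of how grep handles case.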




This bug report was last modified 11 years and 82 days ago.
