GNU bug report logs - #43577
wrong result for grep -io in turkish locale


Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Wed, 23 Sep 2020 13:24:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #19 received at 43577-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 43577-done <at> debbugs.gnu.org
Subject: Re: bug#43577: wrong result for grep -io in turkish locale
Date: Wed, 23 Sep 2020 19:57:36 -0700
[Message part 1 (text/plain, inline)]
On 9/23/20 6:47 PM, Norihiro Tanaka wrote:
> I attach the fix for the bug.  Regex is fixed in Paul, thank you.
> 

Thanks, I had written a similar patch, and your patch helped me find a bug in 
what I wrote. The patch I wrote uses an auxiliary ok_fold table that lets 
fgrep_icase_charlen avoid calling mbrtowc for single-byte characters in the 
pattern; this may help performance for long patterns. More important, 
fgrep_icase_charlen does not return -1 for a character like 'a' in an 
en_US.UTF-8 locale merely because 'a' has a case folded counterpart 'A'; the 
idea is that we should be OK if the case folded counterparts are single-byte.
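
To illustrate the idea, here is a minimal sketch of how such a table could be
built. It is not the code from patch 0003: the names NCHAR,
encodes_as_one_byte, setup_ok_fold and single_byte_icase_len are hypothetical,
and towlower/towupper stand in, as a simplification, for the full set of
case-folded counterparts that grep considers.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

enum { NCHAR = UCHAR_MAX + 1 };

/* ok_fold[b] is true if byte B is a complete character in the current
   locale and both its lowercase and uppercase forms are single bytes.
   (Assumption: upper/lower case approximates "case-folded counterparts".)  */
static bool ok_fold[NCHAR];

/* Return true if WC encodes as exactly one byte in the current locale.  */
static bool
encodes_as_one_byte (wint_t wc)
{
  char buf[MB_LEN_MAX];
  mbstate_t state;
  memset (&state, 0, sizeof state);
  return wcrtomb (buf, (wchar_t) wc, &state) == 1;
}

/* Fill ok_fold once, after the locale has been set.  */
static void
setup_ok_fold (void)
{
  for (int b = 0; b < NCHAR; b++)
    {
      wint_t wc = btowc (b);
      ok_fold[b] = (wc != WEOF
                    && encodes_as_one_byte (towlower (wc))
                    && encodes_as_one_byte (towupper (wc)));
    }
}

/* Hypothetical helper: for a single-byte pattern character, return 1 if
   bytewise case-insensitive matching is safe, -1 otherwise.  This is a
   table lookup, so no multibyte conversion is needed per character.  */
static int
single_byte_icase_len (unsigned char b)
{
  return ok_fold[b] ? 1 : -1;
}
```

A caller would run setlocale (LC_ALL, "") and then setup_ok_fold () once
before scanning the pattern. In a UTF-8 Turkish locale where towupper maps 'i'
to U+0130 (two bytes in UTF-8), ok_fold['i'] would be expected to be false,
whereas 'a' stays true in en_US.UTF-8 because 'A' is also a single byte.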

I had added more-extensive tests than were in your patch, and some of them found 
a crash in kwsinit that indicated a similar change is needed there. I assume 
this was because the patch I wrote had a more-generous fgrep_icase_charlen. As 
this simplifies kwsinit, this patch does that too.

While looking into this I found a performance glitch I recently introduced (I 
double-counted some regular expressions, messing up later heuristics). I also 
checked this on our old Solaris 10 box and fixed a couple of porting glitches. 
I installed the attached patches into the master branch, to make it easier for 
you to compare your changes to mine. Patch 0003 is the enhanced version of the 
patch that you wrote.

Thanks again for working on this.
[0001-grep-fix-recently-introduced-performance-glitch.patch (text/x-patch, attachment)]
[0002-build-update-gnulib-submodule-to-latest.patch (text/x-patch, attachment)]
[0003-grep-fix-more-Turkish-eyes-bugs.patch (text/x-patch, attachment)]
[0004-grep-pacify-Sun-C-5.15.patch (text/x-patch, attachment)]
[0005-grep-don-t-assume-PCRE-in-tests.patch (text/x-patch, attachment)]

This bug report was last modified 4 years and 237 days ago.


