GNU bug report logs - #43225
Grep treats extended Latin characters like whitespace


Package: grep;

Reported by: Mayo Fark <mayofark <at> outlook.com>

Date: Sat, 5 Sep 2020 16:06:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#43225: closed (Grep treats extended Latin characters like
 whitespace)
Date: Wed, 09 Sep 2020 19:46:01 +0000
[Message part 1 (text/plain, inline)]
Your message dated Wed, 9 Sep 2020 12:45:11 -0700
with message-id <87d378cf-2c5b-c0aa-a9c4-1557ecb7c40e <at> cs.ucla.edu>
and subject line Re: bug#43225: Grep treats extended Latin characters like whitespace
has caused the debbugs.gnu.org bug report #43225,
regarding Grep treats extended Latin characters like whitespace
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
43225: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43225
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Mayo Fark <mayofark <at> outlook.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Grep treats extended Latin characters like whitespace
Date: Sat, 5 Sep 2020 14:27:56 +0000
[Message part 3 (text/plain, inline)]
What I did:
```
grep -Riw cone *
```

Expected result: lines with the word "cone" surrounded by whitespace, ignoring case.

What I got instead:
```
data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"
```

Why this is a bug: the word ícone is not the same as cone and should not have been returned in the result set. It appears that grep treats the í character in ícone as whitespace, which affects other extended-Latin characters as well.


[Message part 5 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mayo Fark <mayofark <at> outlook.com>
Cc: 43225-done <at> debbugs.gnu.org
Subject: Re: bug#43225: Grep treats extended Latin characters like whitespace
Date: Wed, 9 Sep 2020 12:45:11 -0700
[Message part 6 (text/plain, inline)]
On 9/5/20 7:27 AM, Mayo Fark wrote:

> grep -Riw cone *
> ...
> data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"

Thanks for the bug report. This bug is due to an overenthusiastic optimization 
that I installed in late 2016. I installed the attached patch to fix the bug.
[0001-grep-fix-w-bug-in-UTF-8-locales.patch (text/x-patch, attachment)]
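
For anyone who wants to check whether an installed grep still shows this behavior, a minimal reproduction sketch follows. It is not part of the attached patch. The sample file and its second line are hypothetical, and `C.UTF-8` is an assumed locale name; substitute any UTF-8 locale that is installed (for example `pt_BR.UTF-8`). On a grep that includes the fix, running in a real UTF-8 locale, only the line with the standalone word "cone" would be expected to print. An affected grep, or one that falls back to the C locale, may print the "ícone" line as well.

```shell
# Hypothetical sample file; the first line is taken from the report,
# the second is made up to provide a genuine whole-word match.
sample=$(mktemp)
printf '%s\n' \
  'msgstr "Pressione o ícone de pódio para iniciar o tutorial"' \
  'msgstr "a traffic cone"' > "$sample"

# C.UTF-8 is an assumed locale name; use any installed UTF-8 locale.
# -i ignores case, -w restricts matches to whole words.
matches=$(env LC_ALL=C.UTF-8 grep -iw cone "$sample")
printf '%s\n' "$matches"

rm -f "$sample"
```

Note that `-w` delimits words by non-word constituents (anything other than letters, digits, and the underscore), not only by whitespace, so in a UTF-8 locale the letter í counts as part of the word "ícone".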

This bug report was last modified 4 years and 311 days ago.


