GNU bug report logs - #43225
Grep treats extended Latin characters like whitespace

Package: grep

Reported by: Mayo Fark <mayofark <at> outlook.com>

Date: Sat, 5 Sep 2020 16:06:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Mayo Fark <mayofark <at> outlook.com>
Subject: bug#43225: closed (Re: bug#43225: Grep treats extended Latin
 characters like whitespace)
Date: Wed, 09 Sep 2020 19:46:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#43225: Grep treats extended Latin characters like whitespace

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 43225 <at> debbugs.gnu.org.

-- 
43225: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43225
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mayo Fark <mayofark <at> outlook.com>
Cc: 43225-done <at> debbugs.gnu.org
Subject: Re: bug#43225: Grep treats extended Latin characters like whitespace
Date: Wed, 9 Sep 2020 12:45:11 -0700
[Message part 3 (text/plain, inline)]
On 9/5/20 7:27 AM, Mayo Fark wrote:

> grep -Riw cone *
> ...
> data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"

Thanks for the bug report. This bug is due to an overenthusiastic optimization 
that I installed in late 2016. I installed the attached patch to fix the bug.
[0001-grep-fix-w-bug-in-UTF-8-locales.patch (text/x-patch, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Mayo Fark <mayofark <at> outlook.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Grep treats extended Latin characters like whitespace
Date: Sat, 5 Sep 2020 14:27:56 +0000
[Message part 6 (text/plain, inline)]
What I did:
```
grep -Riw cone *
```

Expected result: lines with the word "cone" surrounded by whitespace, ignoring case.

What I got instead:
```
data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"
```

Why this is a bug: the word "ícone" is not the same as "cone" and should not have been returned in the result set. It appears that grep treats the í in "ícone" as whitespace, and the same seems to happen with other extended Latin characters.
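
For readers who want to see the failure mode in isolation, here is a minimal Python sketch. It is not grep's implementation and does not reproduce the actual faulty optimization. It only assumes the documented meaning of `-w` (a match must not be adjacent to a letter, digit, or underscore) and contrasts a Unicode-aware notion of "word character" with an ASCII-only one. The helper name `matches_word` and its `unicode_aware` parameter are hypothetical, introduced for illustration.

```python
import re

# The line from the report that grep -Riw cone wrongly printed.
LINE = 'msgstr "Pressione o ícone de pódio para iniciar o tutorial"'


def matches_word(word, line, unicode_aware=True):
    """Return True if `word` occurs in `line` as a whole word, ignoring case.

    "Whole word" means the match is not directly preceded or followed by a
    word character (letter, digit, or underscore).
    """
    flags = re.IGNORECASE
    if not unicode_aware:
        # Assumption for illustration only: treat just ASCII letters, digits
        # and "_" as word characters, so "í" looks like a separator.
        flags |= re.ASCII
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, line, flags) is not None


# Unicode-aware check: "í" is a letter, so "cone" inside "ícone" is rejected.
print(matches_word("cone", LINE))                       # False
# ASCII-only check: "í" is not a word character, so "cone" is accepted.
print(matches_word("cone", LINE, unicode_aware=False))  # True
```

The second call shows the kind of result described in this report: once "í" is not recognized as a word constituent, "cone" appears to start a new word.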



This bug report was last modified 4 years and 311 days ago.

