GNU bug report logs - #21604
grep doesn't match diacritical chars in ISO-8859 files


Package: grep

Reported by: Santiago Ruano Rincón <santiagorr <at> riseup.net>

Date: Fri, 2 Oct 2015 14:45:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #12 received at 21604-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Santiago Ruano Rincón <santiagorr <at> riseup.net>,
 21604-done <at> debbugs.gnu.org
Subject: Re: bug#21604: grep doesn't match diacritical chars in ISO-8859 files
Date: Fri, 2 Oct 2015 13:01:04 -0700

On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical
> marks in ISO-8859 files, inside a Unicode environment

That is normal and expected behavior.  In a UTF-8 locale, "á" is 
represented by the two bytes 0xC3 and 0xA1.  In an ISO-8859 file, the 
same character is represented by the single byte 0xE1.  The UTF-8 
pattern won't match the ISO-8859 representation.
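
The byte-level mismatch described above can be checked with a short
Python sketch.  It assumes the file is ISO-8859-1 specifically (the
report only says "ISO-8859"), and the sample word is made up for
illustration.

```python
# Encode the same character, U+00E1 (a with acute accent), both ways.
char = "\u00e1"
utf8_bytes = char.encode("utf-8")         # two bytes: 0xC3 0xA1
latin1_bytes = char.encode("iso-8859-1")  # one byte: 0xE1

# A hypothetical line as it would be stored in an ISO-8859-1 file.
latin1_line = "est\u00e1".encode("iso-8859-1")

# The UTF-8 byte sequence never occurs in the ISO-8859-1 data,
# so a byte-for-byte search with the UTF-8 pattern cannot match.
utf8_pattern_found = utf8_bytes in latin1_line

# The single ISO-8859-1 byte does occur in it.
latin1_pattern_found = latin1_bytes in latin1_line

print(utf8_pattern_found, latin1_pattern_found)
```

This is only an illustration of the two representations; it does not
model how grep itself processes input in a multibyte locale.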

To avoid this problem, switch to an ISO-8859 locale before using grep to 
read ISO-8859 text files.  This is true for pretty much any standard 
utility, not just grep.  Alternatively, you can translate the text files
from ISO-8859 to UTF-8 before giving the resulting text to grep or to
other utilities.
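
As a rough sketch of the second approach (translate first, then
search), the following Python decodes the raw bytes before looking for
the pattern.  The function name grep_transcoded, the default encoding
of ISO-8859-1, and the sample data are all assumptions made for
illustration; they are not part of grep.

```python
from typing import List


def grep_transcoded(raw: bytes, needle: str,
                    source_encoding: str = "iso-8859-1") -> List[str]:
    """Return the lines of `raw` containing `needle`, after decoding.

    `raw` is the file content as bytes; `source_encoding` is the
    encoding the file is assumed to be in (hypothetical default).
    """
    # Translate the bytes to text first, so that the comparison is
    # character against character rather than byte against byte.
    text = raw.decode(source_encoding)
    return [line for line in text.split("\n") if needle in line]


# Hypothetical file content, stored as ISO-8859-1 bytes.
sample = "ma\u00f1ana\nest\u00e1 bien\nhello\n".encode("iso-8859-1")

# Only the second line contains U+00E1.
matches = grep_transcoded(sample, "\u00e1")
```

On the command line, the same idea is usually expressed by piping the
file through a converter such as iconv (for example with
-f ISO-8859-1 -t UTF-8) and giving the result to grep.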




This bug report was last modified 9 years and 297 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.