GNU bug report logs - #21604: grep doesn't match diacritical chars in ISO-8859 files
[Message part 1 (text/plain, inline)]
Your bug report
#21604: grep doesn't match diacritical chars in ISO-8859 files
which was filed against the grep package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 21604 <at> debbugs.gnu.org.
--
21604: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21604
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical
> marks in ISO-8859 files, inside a Unicode environment
That is normal and expected behavior. In a UTF-8 locale, "á" is
represented by the two bytes 0xC3 and 0xA1. In an ISO-8859 file, the
same character is represented by the single byte 0xE1. The UTF-8
pattern won't match the ISO-8859 representation.
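The byte-level mismatch can be checked directly. What follows is only a minimal sketch, assuming a POSIX shell with printf, od, tr and grep available. The helper names utf8_a, latin1_a and hex are made up for illustration, and the bytes are written as octal escapes so the script does not depend on its own encoding.

```shell
# Hypothetical helpers, not part of grep: emit "á" in each encoding.
utf8_a()   { printf '\303\241'; }   # UTF-8: bytes 0xC3 0xA1
latin1_a() { printf '\341'; }       # ISO-8859-1: byte 0xE1

# Print stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

utf8_a | hex; echo      # prints c3a1
latin1_a | hex; echo    # prints e1

# Count the lines of an ISO-8859-1 input that contain the UTF-8 pattern.
# LC_ALL=C makes grep compare raw bytes; "|| true" absorbs grep's
# exit status 1 when nothing matches.
printf 'habr\341 que usar\n' | LC_ALL=C grep -c -- "$(utf8_a)" || true   # prints 0
```

The count is 0 because the input contains the single byte 0xE1, while the pattern is the two-byte sequence 0xC3 0xA1.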
To avoid this problem, switch to an ISO-8859 locale before using grep to
read ISO-8859 text files. This is true for pretty much any standard
utility, not just grep. Alternatively, you can translate the text files
from ISO-8859 to UTF-8, before giving the resulting text to grep or to
other utilities.
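The second alternative (translating the file before searching) might look roughly like the sketch below. It assumes iconv and mktemp are installed and that the file is specifically ISO-8859-1, which the report does not state (file only says "ISO-8859"). The helper name count_latin1_matches and the sample line are invented for the example. The first alternative depends on which ISO-8859 locales are installed on the system (locale -a lists them), so it is not shown.

```shell
# Hypothetical helper: convert an ISO-8859-1 file to UTF-8 on the fly
# and count the lines that match a UTF-8 pattern.
# $1 = pattern (UTF-8), $2 = file (assumed ISO-8859-1)
count_latin1_matches() {
    iconv -f ISO-8859-1 -t UTF-8 "$2" | grep -c -- "$1" || true
}

# Build a small ISO-8859-1 sample file (0xE1 is "á" in that encoding).
sample=$(mktemp)
printf 'struct cara* lcaras; // habr\341 que usar reserva\n' > "$sample"

pattern=$(printf '\303\241')    # "á" encoded as UTF-8

# Searching the raw file finds nothing: the bytes differ.
grep -c -- "$pattern" "$sample" || true     # prints 0

# Searching the converted text finds the line.
count_latin1_matches "$pattern" "$sample"   # prints 1

rm -f "$sample"
```

The sketch uses grep -c so the result is a plain count in any locale; dropping -c prints the matching lines instead.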
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
Hi,
In addition to http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19230, several
Debian users report that grep doesn't match characters with diacritical
marks in ISO-8859 files inside a Unicode environment:
% file /tmp/q.h
/tmp/q.h: ISO-8859 text
% grep c /tmp/q.h
Coincidencia en el fichero binario /tmp/q.h
% grep -a c /tmp/q.h
struct cara* lcaras; //array de caras, habr� que usar reserva dinamica de memoria.
% grep á /tmp/q.h
% grep -a á /tmp/q.h
grep matches the "á" pattern if the pattern is read from an ISO-8859 file (via -f):
% grep -f a q.h
Coincidencia en el fichero binario q.h
Test files attached
Full report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800670
Regards,
Santiago
-- System Information:
Debian Release: stretch/sid
APT prefers squeeze-lts
APT policy: (500, 'squeeze-lts'), (500, 'oldoldstable'), (500, 'unstable'), (500, 'testing'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=es_CO.utf8, LC_CTYPE=es_CO.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
Versions of packages grep depends on:
ii dpkg 1.18.1
ii install-info 6.0.0.dfsg.1-3
ii libc6 2.19-19
ii libpcre3 2:8.35-7
[q.h (text/x-chdr, attachment)]
This bug report was last modified 9 years and 297 days ago.