From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 02 10:44:21 2015
Date: Fri, 2 Oct 2015 11:43:58 +0200
From: Santiago Ruano Rincón
To: bug-grep@gnu.org
Subject: grep doesn't match diacritical chars in ISO-8859 files
Message-ID: <20151002094358.GD344@nomada>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="QWpDgw58+k1mSFBj"
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.3 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 02 Oct 2015 10:44:20 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.3 (----) --QWpDgw58+k1mSFBj Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, Moreover http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D19230 , several debian users report that grep doesn't match characters with diacritical marks in ISO-8859 files, inside a Unicode enviroment: % file /tmp/q.h=20 /tmp/q.h: ISO-8859 text % grep c /tmp/q.h Coincidencia en el fichero binario /tmp/q.h % grep -a c /tmp/q.h struct cara* lcaras; //array de caras, habr=EF=BF=BD que usar reserva= dinamica de memoria. % grep =C3=A1 /tmp/q.h=20 % grep -a =C3=A1 /tmp/q.h grep matches the "=C3=A1" pattern if it's is input from an ISO-8859 file: % grep -f a q.h=20 Coincidencia en el fichero binario q.h Test files attached Full report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=3D800670 Regards, Santiago -- System Information: Debian Release: stretch/sid APT prefers squeeze-lts APT policy: (500, 'squeeze-lts'), (500, 'oldoldstable'), (500, 'unsta= ble'), (500, 'testing'), (500, 'oldstable'), (1, 'experimental') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores) Locale: LANG=3Des_CO.utf8, LC_CTYPE=3Des_CO.utf8 (charmap=3DUTF-8) Shell: /bin/sh linked to /bin/dash Init: sysvinit (via /sbin/init) Versions of packages grep depends on: ii dpkg 1.18.1 ii install-info 6.0.0.dfsg.1-3 ii libc6 2.19-19 ii libpcre3 2:8.35-7 --QWpDgw58+k1mSFBj Content-Type: text/x-chdr; charset=utf-8 Content-Disposition: attachment; filename="q.h" Content-Transfer-Encoding: quoted-printable struct 
cara* lcaras; //array de caras, habr=E1 que usar reserva dinamica= de memoria. --QWpDgw58+k1mSFBj-- From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 02 16:01:46 2015 Received: (at control) by debbugs.gnu.org; 2 Oct 2015 20:01:46 +0000 Received: from localhost ([127.0.0.1]:52466 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Zi6WY-00007W-4i for submit@debbugs.gnu.org; Fri, 02 Oct 2015 16:01:46 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:47349) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Zi6WW-00007O-20 for control@debbugs.gnu.org; Fri, 02 Oct 2015 16:01:44 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 88990160998 for ; Fri, 2 Oct 2015 13:01:43 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id arg_uN1VEf6n for ; Fri, 2 Oct 2015 13:01:43 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id ECC1E160ECC for ; Fri, 2 Oct 2015 13:01:42 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 3qLn6h9mHZTt for ; Fri, 2 Oct 2015 13:01:42 -0700 (PDT) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id D784B160998 for ; Fri, 2 Oct 2015 13:01:42 -0700 (PDT) To: control@debbugs.gnu.org From: Paul Eggert Subject: 21604 is not a bug Organization: UCLA Computer Science Department Message-ID: <560EE2A6.3060103@cs.ucla.edu> Date: Fri, 2 Oct 2015 13:01:42 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: control 

tags 21604 notabug
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 02 16:01:10 2015
Subject: Re: bug#21604: grep doesn't match diacritical chars in ISO-8859 files
To: Santiago Ruano Rincón, 21604-done@debbugs.gnu.org
References: <20151002094358.GD344@nomada>
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <560EE280.1060408@cs.ucla.edu>
Date: Fri, 2 Oct 2015 13:01:04 -0700
In-Reply-To: <20151002094358.GD344@nomada>

On 10/02/2015 02:43 AM, Santiago Ruano Rincón wrote:
> grep doesn't match characters with diacritical
> marks in ISO-8859 files, inside a Unicode environment

That is normal and expected behavior. In a UTF-8 locale, "á" is
represented by the two bytes 0xC3 and 0xA1. In an ISO-8859 file, the
same character is represented by the single byte 0xE1. The UTF-8
pattern won't match the ISO-8859 representation.

To avoid this problem, switch to an ISO-8859 locale before using grep to
read ISO-8859 text files. This is true for pretty much any standard
utility, not just grep. Alternatively, you can translate the text files
from ISO-8859 to UTF-8, before giving the resulting text to grep or to
other utilities.

From unknown Thu Aug 21 14:54:42 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 31 Oct 2015 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
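
The byte-level explanation in Paul Eggert's reply can be checked with a
short Python sketch. It is an illustration only, not how grep is
implemented. It assumes the attached file is ISO-8859-1 specifically
(the thread says only "ISO-8859"), it uses a plain byte-substring test
as a stand-in for grep's matching, and the helper name `contains` is
hypothetical.

```python
# Assumption: q.h is ISO-8859-1 (Latin-1); the thread only says "ISO-8859".
A_ACUTE = "\u00e1"  # the character "a with acute accent" from the report
LINE = ("struct cara* lcaras; //array de caras, habr" + A_ACUTE +
        " que usar reserva dinamica de memoria.")


def contains(haystack: bytes, needle: bytes) -> bool:
    """Byte-substring test; a simplified stand-in for grep, not its real algorithm."""
    return needle in haystack


# The pattern as a UTF-8 terminal would pass it to grep: two bytes.
pattern_utf8 = A_ACUTE.encode("utf-8")
# The line as stored in the ISO-8859-1 file: the accent is one byte.
file_latin1 = LINE.encode("iso-8859-1")
# The same line after translating the file from ISO-8859-1 to UTF-8.
file_utf8 = file_latin1.decode("iso-8859-1").encode("utf-8")

print(pattern_utf8.hex())                    # c3a1
print(A_ACUTE.encode("iso-8859-1").hex())    # e1
print(contains(file_latin1, pattern_utf8))   # False: UTF-8 pattern, Latin-1 data
print(contains(file_utf8, pattern_utf8))     # True: pattern and data both UTF-8
```

On the command line, the translation step would typically be done with
a tool such as iconv before grep, for example
`iconv -f ISO-8859-1 -t UTF-8 q.h | grep á`, again assuming the file
really is ISO-8859-1.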