From unknown Wed Jun 18 23:04:46 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#21989 <21989@debbugs.gnu.org> To: bug#21989 <21989@debbugs.gnu.org> Subject: Status: grep search by ASCII code unsuccessful Reply-To: bug#21989 <21989@debbugs.gnu.org> Date: Thu, 19 Jun 2025 06:04:46 +0000 retitle 21989 grep search by ASCII code unsuccessful reassign 21989 grep submitter 21989 Shivanshu Goyal severity 21989 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 02:56:13 2015 Received: (at submit) by debbugs.gnu.org; 23 Nov 2015 07:56:14 +0000 Received: from localhost ([127.0.0.1]:48889 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0lyd-00031S-3d for submit@debbugs.gnu.org; Mon, 23 Nov 2015 02:56:13 -0500 Received: from eggs.gnu.org ([208.118.235.92]:41853) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0jc5-0007R0-5V for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:24:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0jc3-0000F8-VS for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:24:28 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_40, FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:38836) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jc3-0000F3-Rp for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:24:27 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51855) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jc2-0004xU-Oq for bug-grep@gnu.org; Mon, 23 Nov 2015 00:24:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 
1a0jc1-0000Er-Uh for bug-grep@gnu.org; Mon, 23 Nov 2015 00:24:26 -0500 Received: from mail-oi0-x22c.google.com ([2607:f8b0:4003:c06::22c]:34018) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jc1-0000Ed-PI for bug-grep@gnu.org; Mon, 23 Nov 2015 00:24:25 -0500 Received: by oies6 with SMTP id s6so112615187oie.1 for ; Sun, 22 Nov 2015 21:24:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=TtA78HbGqw0CtaiAHpabLNtVuztzrPw4qcom2TsWqnc=; b=dGvwjc87J+ykjuS30jxoeeYf6c6cK9yV/Kyy3/v2dvCD2EsgiLDekh2205K7SF9rCB MQBWD62kXgwH401XuBWwm3vKnEb9UuplVk7I0WNDhlJ8hNpjKPgGINLusIZ9On6Jmfcn apdrYMXrNujQhZke1EPKvpCodz3WCTUAJ2TyJs0zdf2RSvvjubpRN5RpILApaKN/faKq kYcas7Rnn0MeCWFdZMzH6UpI+cEkwxhNJEdsvamTyCLTOZxowW5X1nVe6P4309mjJCLn 7X69PIPut9nJ/JO7C1c+/LgkVHnkkpkwzxSQfnvUOxwkNpUdsScgKc2DZEL7uJWIyrK8 Jm5g== X-Received: by 10.60.77.34 with SMTP id p2mr15136070oew.21.1448256264410; Sun, 22 Nov 2015 21:24:24 -0800 (PST) MIME-Version: 1.0 Received: by 10.60.59.193 with HTTP; Sun, 22 Nov 2015 21:24:05 -0800 (PST) From: Shivanshu Goyal Date: Sun, 22 Nov 2015 21:24:05 -0800 Message-ID: Subject: grep search by ASCII code unsuccessful To: bug-grep@gnu.org Content-Type: multipart/alternative; boundary=047d7b33cac22f1f6805252e70a0 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -3.8 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 23 Nov 2015 02:55:54 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.8 (---)

--047d7b33cac22f1f6805252e70a0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Hi,

I think I found a bug which did not exist in version 2.14, but does seem to
exist in versions 2.16 and 2.22. I have not tested any other versions.

Say there is a file with the following contents:

shivanshu@thetis:tmp$ cat temp | xxd
0000000: 68e2 8093 680a  h...h.

The following is the grep 2.14 command and output:

shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
h–h

The following is the grep 2.16/2.22 command and output:

shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
d1y8@thetis:tmp$

Thanks,
Shivanshu Goyal
shivanshu.ca

--047d7b33cac22f1f6805252e70a0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--047d7b33cac22f1f6805252e70a0-- From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 10:05:21 2015 Received: (at 21989) by debbugs.gnu.org; 23 Nov 2015 15:05:22 +0000 Received: from localhost ([127.0.0.1]:49643 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0sgD-0000Tj-5T for submit@debbugs.gnu.org; Mon, 23 Nov 2015 10:05:21 -0500 Received: from mail-wm0-f48.google.com ([74.125.82.48]:38131) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0sg9-0000TZ-58 for 21989@debbugs.gnu.org; Mon, 23 Nov 2015 10:05:17 -0500 Received: by wmec201 with SMTP id c201so109105924wme.1 for <21989@debbugs.gnu.org>; Mon, 23 Nov 2015 07:05:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=hAoS2AGS0T2EHYiBPkYYnYMEgGhWf62xt+UJMOC90cQ=; b=ukoCyzO8jQiCMv+QzrgUQtxFFgYozGAebWwvThXrE8cNrVBrqs/PoWXJFAt7i/y7TL 5mVk8zuYAN2UJrnQzt3iVxVPvVqsUW7LS+FF0lcv9ZcOHqLtBAGdpyNNMeIF9NSjBMVo FAL4u1hniBqInttqxuyKjHnl7MENMOcPj/3zUAl8WBOb/YVdT5nI6AiOAgWB1lbsvNW1 EF4ms555MQW9MdxCvYsw7LWzz46McyniWcDtW1ZI878o5mB8DjbXRZ2FCBSK4nb0I5A8 grloa+OMPqMMc/nG2JQzgdmiWmzZn2CW4zEyjueGzVe+dDbVGNZOyOcm6WuQ+Ej0ZOkf XhaA== X-Received: by 10.194.84.4 with SMTP id u4mr36939331wjy.149.1448291116575; Mon, 23 Nov 2015 07:05:16 -0800 (PST) Received: from chaz.gmail.com ([2.121.21.200]) by smtp.gmail.com with ESMTPSA id bh6sm2691118wjb.0.2015.11.23.07.05.15 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 23 Nov 2015 07:05:15 -0800 (PST) Date: Mon, 23 Nov 2015 15:05:14 +0000 From: Stephane Chazelas To: Shivanshu Goyal Subject: Re: bug#21989: grep search by ASCII code unsuccessful Message-ID: <20151123150514.GB18811@chaz.gmail.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 
(2010-09-15) X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 21989 Cc: 21989@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

2015-11-22 21:24:05 -0800, Shivanshu Goyal:
[...]
> I think I found a bug which did not exist in version 2.14, but does seem to
> exist in versions 2.16 and 2.22. I have not tested any other versions.
>
> Say there is a file with the following contents:
>
> shivanshu@thetis:tmp$ cat temp | xxd
> 0000000: 68e2 8093 680a  h...h.
>
> The following is the grep 2.14 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> h–h
>
> The following is the grep 2.16/2.22 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> d1y8@thetis:tmp$
[...]

If you read the pcrepattern man page, you'll see that \xe2
doesn't match the byte e2, but the character of code e2.

If you're in a UTF-8 locale, \xe2 would match the character of
Unicode code point e2 (LATIN SMALL LETTER A WITH CIRCUMFLEX)
which in UTF-8 is written as the bytes c3 a2.

The sequence e2 80 93 is actually the one character U+2013 (EN DASH).
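
The character-versus-byte distinction described above can be sketched outside grep. The snippet below is only an illustration under stated assumptions: it uses Python's re module as a stand-in for PCRE (it is not what grep runs), and all variable names are made up for the example. A str pattern is treated like grep -P in a UTF-8 locale (escapes name characters), and a bytes pattern like grep -P under LC_ALL=C (escapes name bytes).

```python
import re

# The six bytes shown in the xxd dump: 'h', e2 80 93, 'h', newline.
data = bytes.fromhex("68e28093680a")
text = data.decode("utf-8")  # 'h', U+2013 (EN DASH), 'h', newline

# Character-oriented matching (assumed analogue of a UTF-8 locale):
# \xe2 names the character U+00E2, which does not occur in the text.
char_level = re.search(r"\xe2\x80\x93", text)

# Byte-oriented matching (assumed analogue of LC_ALL=C):
# \xe2 names the byte e2, so the three-byte sequence is found.
byte_level = re.search(rb"\xe2\x80\x93", data)

# Naming the code point itself matches the decoded text.
code_point = re.search(r"\u2013", text)

# U+00E2 is written in UTF-8 as the two bytes c3 a2, not as the byte e2.
a_circumflex = "\u00e2".encode("utf-8")

print(char_level is None, byte_level is not None, code_point is not None)
print(a_circumflex.hex())
```

The same pattern text therefore succeeds or fails depending only on whether the engine is working on characters or on bytes, which is the behaviour change reported between the grep versions.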
So, here, you either want:

LC_ALL=C grep -P '\xe2\x80\x93'

That is use a locale where characters are single-byte and their
code is the byte value, or assuming the current locale is UTF-8,
use:

grep -P '\x{2013}'

Or, regardless of the locale:

grep -P '(*UTF8)\x{2013}'

--
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 11:16:38 2015 Received: (at 21989-done) by debbugs.gnu.org; 23 Nov 2015 16:16:39 +0000 Received: from localhost ([127.0.0.1]:49673 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0tnC-000468-H1 for submit@debbugs.gnu.org; Mon, 23 Nov 2015 11:16:38 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:38541) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0tnA-00045y-U6 for 21989-done@debbugs.gnu.org; Mon, 23 Nov 2015 11:16:37 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CC6C91605AF; Mon, 23 Nov 2015 08:16:35 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id BLnHDqM0zQh1; Mon, 23 Nov 2015 08:16:35 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3725A160779; Mon, 23 Nov 2015 08:16:35 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id x1eY0KpkZowT; Mon, 23 Nov 2015 08:16:35 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1D0571605AF; Mon, 23 Nov 2015 08:16:35 -0800 (PST) Subject: Re: bug#21989: grep search by ASCII code unsuccessful To: Stephane Chazelas , Shivanshu Goyal References: <20151123150514.GB18811@chaz.gmail.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <56533BE0.9070706@cs.ucla.edu> Date: Mon, 23 Nov
2015 08:16:32 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20151123150514.GB18811@chaz.gmail.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.6 (/) X-Debbugs-Envelope-To: 21989-done Cc: 21989-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.6 (/) Thanks, Stephane, for diagnosing the problem. Closing the bug. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 11:44:12 2015 Received: (at submit) by debbugs.gnu.org; 23 Nov 2015 16:44:12 +0000 Received: from localhost ([127.0.0.1]:49704 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0uDs-0004tc-Ae for submit@debbugs.gnu.org; Mon, 23 Nov 2015 11:44:12 -0500 Received: from eggs.gnu.org ([208.118.235.92]:43019) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0jj2-0007dA-DC for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:31:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0jj0-0001i2-U0 for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:31:39 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: * X-Spam-Status: No, score=1.1 required=5.0 tests=BAYES_50, FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:57403) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jj0-0001hy-QQ for submit@debbugs.gnu.org; Mon, 23 Nov 2015 00:31:38 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53024) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jiz-0006mK-PK for bug-grep@gnu.org; Mon, 23 
Nov 2015 00:31:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0jiy-0001hb-OL for bug-grep@gnu.org; Mon, 23 Nov 2015 00:31:37 -0500 Received: from mail-oi0-x22a.google.com ([2607:f8b0:4003:c06::22a]:33591) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0jiy-0001hX-JW for bug-grep@gnu.org; Mon, 23 Nov 2015 00:31:36 -0500 Received: by oixx65 with SMTP id x65so111483736oix.0 for ; Sun, 22 Nov 2015 21:31:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :content-type; bh=bTgkVdf0ZF7j2VY6WI8X3kbtc7/LtXCutSG2Ky8hbio=; b=UJRBmiNWFkVlZc3dNY163fPPeJJVr4yaD+K032qiDs5W6LjzohqyM9CmimA+09pB3e R15dakcr1VTlS6mVQf9O+tLdI0Ftst+LFNUKeWCs8EyRgJUYxxRMgeBA0HBHjlYPhCBm 4wWj+XCW9/lodHw4d1ts64IUX1EbEqQID7YyEC0djcC6Nx5etwkZk6mONafj00YQgiSE SwEoG9ZQVuok4nPC8IUOHLUEqnxCveyr/9Qm4MAlwIxzhPZ2EGObJpGeNfNAaVSmzTL3 7vgIcoFq99xOluZrinhkRG8EZzK+AtqB1Ydd+8hbfxBrmUK8rIOgtG4astvPZKCmPNxV mC4g== X-Received: by 10.60.65.6 with SMTP id t6mr15307894oes.47.1448256696188; Sun, 22 Nov 2015 21:31:36 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Shivanshu Goyal Date: Mon, 23 Nov 2015 05:31:26 +0000 Message-ID: Subject: Re: grep search by ASCII code unsuccessful To: bug-grep@gnu.org Content-Type: multipart/alternative; boundary=001a11c1a328eb899d05252e890c X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -3.8 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 23 Nov 2015 11:44:02 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.8 (---)

--001a11c1a328eb899d05252e890c Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Correction:

The following is the grep 2.16/2.22 command and output:
(It doesn't output anything)

shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
shivanshu@thetis:tmp$

On Sun, Nov 22, 2015 at 9:24 PM Shivanshu Goyal <shivanshu3@gmail.com> wrote:

> Hi,
>
> I think I found a bug which did not exist in version 2.14, but does seem
> to exist in versions 2.16 and 2.22. I have not tested any other versions.
>
> Say there is a file with the following contents:
>
> shivanshu@thetis:tmp$ cat temp | xxd
> 0000000: 68e2 8093 680a  h...h.
>
> The following is the grep 2.14 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> h–h
>
> The following is the grep 2.16/2.22 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> d1y8@thetis:tmp$
>
> Thanks,
> Shivanshu Goyal
> shivanshu.ca
>

--001a11c1a328eb899d05252e890c Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--001a11c1a328eb899d05252e890c--

From unknown Wed Jun 18 23:04:46 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 22 Dec 2015 12:24:04 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator