GNU bug report logs - #21989: grep search by ASCII code unsuccessful
Report forwarded to bug-grep <at> gnu.org: bug#21989; Package grep. (Mon, 23 Nov 2015 07:57:02 GMT)
Acknowledgement sent to Shivanshu Goyal <shivanshu3 <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 23 Nov 2015 07:57:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
I think I found a bug which did not exist in version 2.14, but does seem to
exist in versions 2.16 and 2.22. I have not tested any other versions.
Say there is a file with the following contents:
shivanshu@thetis:tmp$ cat temp | xxd
0000000: 68e2 8093 680a  h...h.
The following is the grep 2.14 command and output:
shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
h–h
The following is the grep 2.16/2.22 command and output:
shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
d1y8@thetis:tmp$
Thanks,
Shivanshu Goyal
shivanshu.ca
Information forwarded to bug-grep <at> gnu.org: bug#21989; Package grep. (Mon, 23 Nov 2015 15:06:02 GMT)
Message #8 received at 21989 <at> debbugs.gnu.org (full text, mbox):
2015-11-22 21:24:05 -0800, Shivanshu Goyal:
[...]
> I think I found a bug which did not exist in version 2.14, but does seem to
> exist in versions 2.16 and 2.22. I have not tested any other versions.
>
> Say there is a file with the following contents:
>
> shivanshu@thetis:tmp$ cat temp | xxd
> 0000000: 68e2 8093 680a  h...h.
>
> The following is the grep 2.14 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> h–h
>
> The following is the grep 2.16/2.22 command and output:
>
> shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
> d1y8@thetis:tmp$
[...]
If you read the pcrepattern man page, you'll see that \xe2
doesn't match the byte e2, but the character with code e2.
If you're in a UTF-8 locale, \xe2 matches the character at
Unicode code point U+00E2 (LATIN SMALL LETTER A WITH CIRCUMFLEX),
which in UTF-8 is written as the bytes c3 a2.
The sequence e2 80 93 is actually the single character U+2013 (EN
DASH). So, here, you either want:
    LC_ALL=C grep -P '\xe2\x80\x93'
That is, use a locale where characters are single-byte and their
code is the byte value. Or, assuming the current locale is UTF-8,
use:
    grep -P '\x{2013}'
Or, regardless of the locale:
    grep -P '(*UTF8)\x{2013}'
--
Stephane
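
The distinction described above, between matching bytes and matching characters, can be illustrated outside of grep. The following is a minimal sketch that uses Python's re module as a stand-in for PCRE. It is not what grep does internally, and the variable names are chosen only for this illustration. A bytes pattern plays the role of the C locale, and a str pattern plays the role of a UTF-8 locale.

```python
import re

# File contents from the report: 'h', EN DASH (U+2013), 'h', newline.
data = bytes.fromhex("68e28093680a")
text = data.decode("utf-8")

# How the two characters discussed in the thread are encoded in UTF-8.
en_dash_bytes = "\u2013".encode("utf-8")       # bytes e2 80 93
a_circumflex_bytes = "\u00e2".encode("utf-8")  # bytes c3 a2

# Byte-oriented matching (comparable to LC_ALL=C): each \xHH is one byte,
# so the three-byte sequence is found in the raw data.
byte_match = re.search(rb"\xe2\x80\x93", data)

# Character-oriented matching (comparable to a UTF-8 locale): each \xHH is
# one character, so the pattern asks for U+00E2 U+0080 U+0093, which the
# decoded text does not contain.
char_match_with_byte_escapes = re.search(r"\xe2\x80\x93", text)

# Character-oriented matching on the code point itself finds the en dash.
char_match_with_code_point = re.search(r"\u2013", text)

print(byte_match is not None)
print(char_match_with_byte_escapes is not None)
print(char_match_with_code_point is not None)
```

The same pattern text therefore succeeds or fails depending on whether the engine treats the input as bytes or as decoded characters, which is consistent with the behaviour change reported between grep 2.14 and 2.16/2.22.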
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Mon, 23 Nov 2015 16:17:02 GMT)
Notification sent to Shivanshu Goyal <shivanshu3 <at> gmail.com>: bug acknowledged by developer. (Mon, 23 Nov 2015 16:17:02 GMT)
Message #13 received at 21989-done <at> debbugs.gnu.org (full text, mbox):
Thanks, Stephane, for diagnosing the problem. Closing the bug.
Information forwarded to bug-grep <at> gnu.org: bug#21989; Package grep. (Mon, 23 Nov 2015 16:45:03 GMT)
Message #16 received at submit <at> debbugs.gnu.org (full text, mbox):
Correction:
The following is the grep 2.16/2.22 command and output:
(It doesn't output anything)
shivanshu@thetis:tmp$ cat temp | grep -P '\xe2\x80\x93'
shivanshu@thetis:tmp$
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 22 Dec 2015 12:24:04 GMT)