GNU bug report logs - #51669
some patterns which should match 0x0 don’t do so


Package: grep;

Reported by: Christoph Anton Mitterer <calestyo <at> scientia.org>

Date: Sun, 7 Nov 2021 17:30:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




Report forwarded to bug-grep <at> gnu.org:
bug#51669; Package grep. (Sun, 07 Nov 2021 17:30:02 GMT)

Acknowledgement sent to Christoph Anton Mitterer <calestyo <at> scientia.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 07 Nov 2021 17:30:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Christoph Anton Mitterer <calestyo <at> scientia.org>
To: bug-grep <at> gnu.org
Subject: some patterns which should match 0x0 don’t do so
Date: Sun, 07 Nov 2021 17:56:56 +0100

Hey.

Maybe this is not a bug at all, since grep is mainly focused on text
files and 0x0 is special anyway, but just for your information:



$ hd test-with-0x00-and-0x02 
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 02 00 0a 62 61 7a  |foo.bar.ze...baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a

If one now does:
$ grep '[^[:alnum:][:space:][:punct:]]' test-with-0x00-and-0x02
grep: test-with-0x00-and-0x02: binary file matches

it matches, presumably only the 0x02, though.


Having only 0x00 in the file:
$ hd test-with-0x00-only
00000000  66 6f 6f 0a 62 61 72 0a  7a 65 72 00 0a 62 61 7a  |foo.bar.zer..baz|
00000010  0a 7a 65 72 00 0a 65 6e  64 0a                    |.zer..end.|
0000001a

doesn’t cause a match:
$ grep '[^[:alnum:][:space:][:punct:]]' test-with-0x00-only
$

while naively I'd have assumed that 0x00 should be matched as well.


Cheers,
Chris.
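
Editorial sketch: the hex dumps in the report are enough to recreate both sample files byte for byte. The following is a minimal shell sketch, assuming a `printf` that accepts octal escapes and the POSIX `od` utility. The scratch directory from `mktemp -d` is an assumption for illustration and is not part of the original report; the file names follow the report.

```shell
# Scratch directory (assumption; the report does not say where the files lived).
dir=$(mktemp -d)

# foo\n bar\n ze<0x02><0x00>\n baz\n zer<0x00>\n end\n
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n' > "$dir/test-with-0x00-and-0x02"

# Same layout, but with "zer<0x00>" in place of "ze<0x02><0x00>".
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n' > "$dir/test-with-0x00-only"

# Dump each file as hex bytes for comparison with the hd output in the report.
od -An -tx1 "$dir/test-with-0x00-and-0x02"
od -An -tx1 "$dir/test-with-0x00-only"
```

Both files should come out at 0x1a (26) bytes, matching the final offset shown by hd.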




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Mon, 08 Nov 2021 00:38:01 GMT)

Notification sent to Christoph Anton Mitterer <calestyo <at> scientia.org>:
bug acknowledged by developer. (Mon, 08 Nov 2021 00:38:01 GMT)

Message #10 received at 51669-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Christoph Anton Mitterer <calestyo <at> scientia.org>
Cc: 51669-done <at> debbugs.gnu.org
Subject: Re: bug#51669: some patterns which should match 0x0 don’t do so
Date: Sun, 7 Nov 2021 16:37:17 -0800

That's a feature not a bug; see:

https://www.gnu.org/software/grep/manual/html_node/File-and-Directory-Selection.html

and look for --binary-files. You can use 'grep -a' to pay more attention 
to binary data.
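
Editorial sketch: one way to see the effect of -a is to count matching lines in the NUL-only sample file with and without it. This is a minimal sketch that assumes GNU grep; the scratch directory and the variable names are assumptions for illustration. The counts are printed rather than stated here, since other grep implementations may behave differently.

```shell
# Recreate the NUL-only sample file from the report in a scratch directory.
dir=$(mktemp -d)
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n' > "$dir/test-with-0x00-only"

pattern='[^[:alnum:][:space:][:punct:]]'

# Default handling: the file is treated as binary.
# "|| true" because grep exits with status 1 when no line matches.
default_count=$(LC_ALL=C grep -c "$pattern" "$dir/test-with-0x00-only" || true)

# With -a (same as --binary-files=text) the file is processed as text.
text_count=$(LC_ALL=C grep -a -c "$pattern" "$dir/test-with-0x00-only" || true)

echo "default: $default_count matching line(s), with -a: $text_count matching line(s)"
```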




Information forwarded to bug-grep <at> gnu.org:
bug#51669; Package grep. (Mon, 08 Nov 2021 00:46:01 GMT)

Message #13 received at 51669 <at> debbugs.gnu.org:

From: Christoph Anton Mitterer <calestyo <at> scientia.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51669 <at> debbugs.gnu.org
Subject: Re: bug#51669: some patterns which should match 0x0 don’t do so
Date: Mon, 08 Nov 2021 01:45:06 +0100

On Sun, 2021-11-07 at 16:37 -0800, Paul Eggert wrote:
> https://www.gnu.org/software/grep/manual/html_node/File-and-Directory-Selection.html
> 
> and look for --binary-files. You can use 'grep -a' to pay more
> attention 
> to binary data.

Well, I had seen that, but why is 0x00 different from 0x02? As shown
in the example above, even *without* -a or similar, it would detect
0x02.

That's what feels a bit strange, IMO.


Cheers,
Chris.




Information forwarded to bug-grep <at> gnu.org:
bug#51669; Package grep. (Mon, 08 Nov 2021 07:21:02 GMT)

Message #16 received at 51669 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Christoph Anton Mitterer <calestyo <at> scientia.org>
Cc: 51669 <at> debbugs.gnu.org
Subject: Re: bug#51669: some patterns which should match 0x0 don’t do so
Date: Sun, 7 Nov 2021 23:20:24 -0800

On 11/7/21 16:45, Christoph Anton Mitterer wrote:

> why is 0x00 different from 0x02?

POSIX says text files cannot contain NUL bytes. They can contain 0x02
bytes, though.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403

More generally, in a POSIX system any method for deciding whether a file
is text vs binary is to some extent a heuristic, and there will always
be corner cases in any such heuristic.
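
Editorial sketch: the NUL part of the POSIX text-file definition can be checked mechanically. The helper name `has_nul` and the two tiny sample files are hypothetical. This is only one possible heuristic and makes no claim about how grep itself decides; it also ignores the LINE_MAX part of the definition.

```shell
# has_nul FILE: succeed if FILE contains at least one NUL byte (hypothetical helper).
has_nul() {
  total=$(wc -c < "$1")
  without_nul=$(tr -d '\000' < "$1" | wc -c)
  # If deleting NUL bytes changes the size, the file contained some.
  [ "$total" -ne "$without_nul" ]
}

dir=$(mktemp -d)
printf 'zer\000\n' > "$dir/with-nul"      # contains 0x00
printf 'ze\002\n'  > "$dir/without-nul"   # contains 0x02 but no 0x00

for f in "$dir/with-nul" "$dir/without-nul"; do
  if has_nul "$f"; then
    echo "$f: contains NUL, so not a POSIX text file"
  else
    echo "$f: no NUL bytes"
  fi
done
```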




Information forwarded to bug-grep <at> gnu.org:
bug#51669; Package grep. (Mon, 08 Nov 2021 15:01:01 GMT)

Message #19 received at 51669 <at> debbugs.gnu.org:

From: Christoph Anton Mitterer <calestyo <at> scientia.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51669 <at> debbugs.gnu.org
Subject: Re: bug#51669: some patterns which should match 0x0 don’t do so
Date: Mon, 08 Nov 2021 15:59:51 +0100

On Sun, 2021-11-07 at 23:20 -0800, Paul Eggert wrote:
> POSIX says text files cannot contain NUL bytes. They can contain 0x02
> bytes, though.

Well yes, that's clear... but at least the console output in the 0x02
case seems to imply that grep already considers it binary (and not a text
file).
So I thought it would make sense to do the same if it encounters 0x00.


Anyway... thanks for your help :-)

Cheers,
Chris.




Information forwarded to bug-grep <at> gnu.org:
bug#51669; Package grep. (Mon, 08 Nov 2021 19:06:01 GMT)

Message #22 received at 51669 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Christoph Anton Mitterer <calestyo <at> scientia.org>
Cc: 51669 <at> debbugs.gnu.org
Subject: Re: bug#51669: some patterns which should match 0x0 don’t do so
Date: Mon, 8 Nov 2021 11:05:40 -0800

On 11/8/21 06:59, Christoph Anton Mitterer wrote:
> the console output in the 0x02
> case seems to imply that grep already considers it binary (and not text
> file).
> So I thought it would make sense to do the same if it encounters 0x00.

No, because grep is documented to treat 0x00 like newline in some cases, 
such as the case you described. It does not treat 0x02 like newline.
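
Editorial sketch: the effect of treating 0x00 like a newline can be simulated outside grep by converting every NUL to a newline with `tr` before matching. This is only a simulation of the behavior described above, not grep's implementation. The helper name `count_after_split`, the short file names, and the scratch directory are assumptions for illustration.

```shell
# Recreate both sample files from the report in a scratch directory.
dir=$(mktemp -d)
printf 'foo\nbar\nze\002\000\nbaz\nzer\000\nend\n' > "$dir/both"
printf 'foo\nbar\nzer\000\nbaz\nzer\000\nend\n'    > "$dir/nul-only"

pattern='[^[:alnum:][:space:][:punct:]]'

# count_after_split FILE: turn each NUL into a newline, then count matching lines.
# "|| true" because grep exits with status 1 when the count is zero.
count_after_split() {
  tr '\000' '\n' < "$1" | LC_ALL=C grep -c "$pattern" || true
}

both_count=$(count_after_split "$dir/both")
nul_only_count=$(count_after_split "$dir/nul-only")

echo "0x00 and 0x02: $both_count matching line(s)"
echo "0x00 only:     $nul_only_count matching line(s)"
```

Once each NUL has become a line terminator, the only byte left inside any line that falls outside the three character classes is the 0x02, which is consistent with the two results in the original report.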




Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 07 Dec 2021 12:24:07 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.