GNU bug report logs - #30326
grep not searching through a text file (thinking it binary)


Package: grep;

Reported by: "L. A. Walsh" <gnu <at> tlinx.org>

Date: Fri, 2 Feb 2018 19:31:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: Paul Jackson <pj <at> usa.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 30326 <at> debbugs.gnu.org
Subject: bug#30326: grep not searching through a text file (thinking it binary)
Date: Mon, 05 Feb 2018 15:27:53 -0600
Paul Eggert wrote, in response to my suggestion to filter grep output,
not input, for "binary junk":

> We've done that already, if memory serves.

I don't think so :).

The installed grep on the system I'm typing on right now is "grep (GNU
grep) 3.0". I've not checked closely, but I believe that should be a fairly
recent grep.

I created a large file ("/tmp/pjbb") by concatenating:
1) a big plain ASCII file of C source code,
2) a small ELF executable, and
3) another big plain ASCII file of C source code.

Then I grep'd in this big file for the string "pj <at> usa.net", which
appeared twice in the first file of C source code, and once
again in the second file of C source code.

Here's what I see:
============================

$ grep --version | head -1
grep (GNU grep) 3.0

$ grep pj <at> usa.net /tmp/pjbb
* pj <at> usa.net
* pj <at> usa.net
Binary file /tmp/pjbb matches

$ grep -a pj <at> usa.net /tmp/pjbb
* pj <at> usa.net
* pj <at> usa.net
* pj <at> usa.net
============================

By default, grep sees the first two "pj <at> usa.net",
then abandons the search before seeing the third
such, when it first encounters the ELF binary.

Using "grep -a" to ask grep to persist, it sees all
three "pj <at> usa.net" strings.
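
For anyone who wants to try this without hunting for suitable source files and
an ELF executable, here is a minimal sketch of the same experiment. It rests on
assumptions: the search string "needle", the temporary file, and the few bytes
standing in for the ELF executable are all made up for illustration, and the
NUL bytes are there because that is what makes grep classify the file as binary
in the C locale. How much the first grep prints depends on the grep version, so
no output is claimed for it.

```shell
#!/bin/sh
# Hypothetical scratch file; the test described above used /tmp/pjbb.
f=$(mktemp)

# Text, then a line containing NUL bytes (standing in for the ELF
# executable), then more text.
{
  printf 'first match: needle\nsecond match: needle\n'
  printf '\177ELF\000\001\002\000\n'
  printf 'third match: needle\n'
} > "$f"

# Default mode: what gets printed here varies between grep versions.
LC_ALL=C grep needle "$f"

# With -a the file is treated as text; -c counts the matching lines.
count=$(LC_ALL=C grep -a -c needle "$f")
echo "matching lines with -a: $count"

rm -f "$f"
```

With -a, all three lines containing "needle" should be counted, including the
one after the binary bytes.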

===

My ancient home-brew hack, which provides ASCII-trimmed output
when scanning binary files for ASCII strings, contains
custom code to buffer the already scanned input, so
that it can scan backwards once it finds a match.

The usual line oriented buffering doesn't work so well when
the input file might have no, or at least infrequent, line breaks.
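
As a rough illustration of trimming the output rather than the input, the
sketch below uses only existing grep options (-a and -o) with a POSIX character
class. It is not the home-brew tool described above, which does its own
buffering and backward scan. The file contents and the search string "needle"
are invented for the example. grep still has to hold the whole newline-free
"line" in memory here, which is the very limitation of line-oriented buffering
just mentioned.

```shell
#!/bin/sh
# Hypothetical scratch file: binary bytes around some text, no newline anywhere.
f=$(mktemp)
printf '\000\001\002contact: needle here\000\377\376' > "$f"

# -a: treat the file as text; -o: print only the part that matched.
# In the C locale [[:print:]] matches printable ASCII (space included),
# so the match extends left and right from "needle" until it reaches
# a non-printable byte.
trimmed=$(LC_ALL=C grep -a -o '[[:print:]]*needle[[:print:]]*' "$f")
echo "$trimmed"

rm -f "$f"
```

The printed result is the run of printable characters surrounding the match,
with the binary bytes on either side left out.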

--
                Paul Jackson
                pj <at> usa.net

This bug report was last modified 7 years and 83 days ago.
