GNU bug report logs - #30326
grep not searching through a text file (thinking it binary)
Reported by: "L. A. Walsh" <gnu <at> tlinx.org>
Date: Fri, 2 Feb 2018 19:31:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Paul Eggert wrote, in response to my suggestion to filter grep output,
not input, for "binary junk":

>> We've done that already, if memory serves.

I don't think so :).
The installed grep on the system I'm typing on right now is "grep (GNU
grep) 3.0". I've not checked closely, but I believe that should be a fairly
recent grep.
I created a large file ("/tmp/pjbb") by concatenating:
1) a big plain ASCII file of C source code,
2) a small ELF executable, and
3) another big plain ASCII file of C source code.
Then I grep'd in this big file for the string "pj <at> usa.net", which
appeared twice in the first file of C source code, and once
again in the second file of C source code.
Here's what I see:
============================
$ grep --version | head -1
grep (GNU grep) 3.0
$ grep pj <at> usa.net /tmp/pjbb
* pj <at> usa.net
* pj <at> usa.net
Binary file /tmp/pjbb matches
$ grep -a pj <at> usa.net /tmp/pjbb
* pj <at> usa.net
* pj <at> usa.net
* pj <at> usa.net
============================
By default, grep sees the first two "pj <at> usa.net",
then abandons the search before seeing the third
such, when it first encounters the ELF binary.
Using "grep -a" to ask grep to persist, it sees all
three "pj <at> usa.net" strings.
===
My ancient home-brew hack, which provides ASCII-trimmed
output when scanning binary files for ASCII strings, contains
custom code to buffer the already-scanned input so
that it can scan backwards once it finds a match.
The usual line-oriented buffering doesn't work so well when
the input file might have no, or at least infrequent, line breaks.
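
For illustration only, here is a rough Python sketch of that general idea.
It is not the hack itself: the names trimmed_matches and is_printable_ascii,
the sample data, and the choice to treat tab as printable are all
assumptions made for this example. It holds the whole input in memory,
finds each match, then scans backwards and forwards from the match over the
surrounding run of printable ASCII, so the output is trimmed without
relying on line breaks.

```python
# Hypothetical sketch: every name here is illustrative, not taken from
# the original tool or from grep.

def is_printable_ascii(byte):
    """True for space through tilde, plus tab (an assumption of this sketch)."""
    return 0x20 <= byte <= 0x7E or byte == 0x09


def trimmed_matches(data, needle):
    """Return each run of printable ASCII in `data` that contains `needle`.

    `data` and `needle` are bytes; `needle` is assumed to be printable ASCII.
    """
    results = []
    pos = data.find(needle)
    while pos != -1:
        # Scan backwards through the buffered input to the start of the run.
        start = pos
        while start > 0 and is_printable_ascii(data[start - 1]):
            start -= 1
        # Scan forwards to the end of the run.
        end = pos + len(needle)
        while end < len(data) and is_printable_ascii(data[end]):
            end += 1
        results.append(data[start:end].decode("ascii", errors="replace"))
        # Resume after this run so that each run is reported only once.
        pos = data.find(needle, end)
    return results


if __name__ == "__main__":
    # Made-up sample: binary bytes around one readable fragment.
    sample = b"\x7fELF\x00\x01\x02contact: user@example.org\x00\xff\x00junk"
    for hit in trimmed_matches(sample, b"user@example.org"):
        print(hit)
```

Because a newline is not counted as printable here, a run also ends at a
line break when one exists, so plain text still yields roughly one line
per match.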
--
Paul Jackson
pj <at> usa.net
This bug report was last modified 7 years and 83 days ago.