Paul Eggert wrote, in response to my suggestion to filter grep output, not input, for "binary junk":
>> We've done that already, if memory serves.
I don't think so :).
The installed grep on the system I'm typing on right now is "grep (GNU grep) 3.0".
I've not checked closely, but I believe that should be a fairly recent grep.
I created a large file ("/tmp/pjbb") by concatenating:
1) a big plain ASCII file of C source code,
2) a small ELF executable, and
3) another big plain ASCII file of C source code.
Then I grep'd in this big file for the string "pj@usa.net", which
appeared twice in the first file of C source code, and once
again in the second file of C source code.
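For anyone who wants to reproduce this, here is a minimal Python sketch that builds a similar file. The contents of `FIRST_C`, `FAKE_ELF` and `SECOND_C` are hypothetical stand-ins for the two C source files and the ELF executable, which were much larger. `build_test_file` and `count_matches` are illustrative helper names. Counting raw byte occurrences, with no notion of "binary" data, gives the three matches one would expect.

```python
import os
import tempfile

PATTERN = b"pj@usa.net"

# Hypothetical stand-ins for the three pieces: two small "C source" chunks
# and a chunk of binary data with NUL bytes, loosely imitating an ELF file.
FIRST_C = b"/* maintainer: pj@usa.net */\nint a;\n/* bugs to pj@usa.net */\n"
FAKE_ELF = b"\x7fELF\x02\x01\x01\x00" + bytes(range(256))
SECOND_C = b"int b;\n/* also pj@usa.net */\n"


def build_test_file(path: str) -> None:
    """Concatenate the pieces, much like `cat first.c prog second.c > path`."""
    with open(path, "wb") as out:
        for part in (FIRST_C, FAKE_ELF, SECOND_C):
            out.write(part)


def count_matches(path: str, pattern: bytes) -> int:
    """Count raw byte occurrences of pattern, with no notion of 'binary'."""
    with open(path, "rb") as f:
        return f.read().count(pattern)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "pjbb")
    build_test_file(path)
    print(count_matches(path, PATTERN))  # prints 3
```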
Here's what I see:
============================
$ grep --version | head -1
grep (GNU grep) 3.0
Binary file /tmp/pjbb matches
============================
So grep finds the first two matches, then abandons the search
when it first encounters the ELF binary, before ever seeing the third.
When "grep -a" is used to ask grep to persist, it sees all three.
===
My ancient home-brew hack, which provides ASCII-trimmed
output when scanning binary files for ASCII strings, contains
custom code to buffer the already-scanned input, so
that it can scan backwards once it finds a match.
The usual line-oriented buffering doesn't work so well when
the input file might have no line breaks, or only infrequent ones.
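The backward-scanning idea can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual home-brew hack. For simplicity it holds the whole file in memory rather than buffering only the already-scanned input. It treats printable ASCII plus tab as "text" and caps the context with a hypothetical `max_context` parameter. `ascii_context` is an illustrative name. Around each match it walks backwards and forwards until it hits a non-printable byte, so it needs no line breaks at all.

```python
import sys

# Assumption: "text" means printable ASCII (0x20..0x7E) plus tab.
PRINTABLE = frozenset(range(0x20, 0x7F)) | {0x09}


def ascii_context(data: bytes, pattern: bytes, max_context: int = 80):
    """Yield the printable-ASCII run around each match of `pattern` in `data`.

    No line breaks are needed: the context stops at the first non-printable
    byte (newline, NUL, ELF header bytes, ...) in either direction, or after
    max_context bytes on that side, whichever comes first.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    pos = data.find(pattern)
    while pos != -1:
        end_of_match = pos + len(pattern)
        # Scan backwards from the match through the already-scanned bytes.
        start = pos
        while start > 0 and pos - start < max_context and data[start - 1] in PRINTABLE:
            start -= 1
        # Scan forwards from the end of the match.
        end = end_of_match
        while end < len(data) and end - end_of_match < max_context and data[end] in PRINTABLE:
            end += 1
        yield data[start:end].decode("ascii", errors="replace")
        # Skip matches lying wholly inside the context just printed, but still
        # find one that straddles a context cut off by max_context.
        pos = data.find(pattern, max(end_of_match, end - len(pattern) + 1))


def main(argv):
    pattern = argv[1].encode("ascii")
    for path in argv[2:]:
        with open(path, "rb") as f:
            data = f.read()  # simplification: whole file held in memory
        for context in ascii_context(data, pattern):
            print(f"{path}: {context}")


if __name__ == "__main__":
    main(sys.argv)
```

A streaming variant would keep only the last `max_context` bytes of already-scanned input (plus enough to catch a match straddling a read boundary), which is closer to the buffering described above.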
--
Paul Jackson
pj@usa.net