GNU bug report logs - #20526
BUG: text file is detected as binary

Package: grep;

Reported by: Sebastian Poehn <sebastian.poehn <at> gmail.com>

Date: Thu, 7 May 2015 15:41:03 UTC

Severity: normal

Merged with 19230, 19985, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #35 received at 20526 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kamil Dudka <kdudka <at> redhat.com>, Eric Blake <eblake <at> redhat.com>
Cc: 20526 <at> debbugs.gnu.org,
 Sebastian Pöhn <sebastian.poehn <at> gmail.com>,
 debbugs-submit <at> debbugs.gnu.org
Subject: Re: bug#20526: BUG: text file is detected as binary
Date: Mon, 11 May 2015 21:27:35 -0700

Kamil Dudka wrote:
> Which bug does it fix?

I don't recall a bug report being filed for it, but the old grep behavior had
real problems: as I remember, it sometimes dumped core, and at other times it
spat out improperly encoded data to the terminal.  We've fixed the core dumps I
know about, though I think grep still outputs improperly encoded data at times
(and this should get fixed too -- see below for a suggestion).

At any rate, applications could never assume a particular behavior for 
improperly encoded files, so the current behavior is clearly not a bug.  Users 
may be able to scrape along by setting LC_ALL=C before running 'grep' -- the 
problems LC_ALL=C runs into are about the same as the problems with using old 
'grep' (except that the new grep doesn't dump core :-).
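As a rough illustration of the LC_ALL=C workaround mentioned above, here is a
minimal Python sketch that runs grep with LC_ALL overridden for that one
invocation only.  The helper names c_locale_env and grep_c_locale are
hypothetical (they are not part of grep or of this report), and the sketch
assumes a 'grep' executable is on PATH.

```python
import os
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment with LC_ALL forced to "C".

    `base` defaults to the current process environment; the mapping that
    is passed in is copied, not modified.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env


def grep_c_locale(pattern, path):
    """Run `grep -e PATTERN -- PATH` in the C locale (hypothetical helper).

    Output is captured as raw bytes, since in the C locale the matched
    lines are not guaranteed to be valid text in any other encoding.
    """
    return subprocess.run(
        ["grep", "-e", pattern, "--", path],
        env=c_locale_env(),
        capture_output=True,
        check=False,  # grep exits with status 1 when nothing matches
    )
```

Only the child process sees the changed locale, so the caller's own locale
settings are left alone.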


Perhaps we can improve the behavior of grep by changing its heuristic slightly.
Currently grep reports "Binary file FOO matches" if it finds binary data in
FOO before it finds the first match.  Instead, perhaps we could change grep to 
report "Binary file FOO matches" when it sees that it's about to generate binary 
*output* copied from FOO, regardless of whether this output represents the first 
match.  That is, when grep sees that it's about to output binary data, grep 
instead outputs "Binary file FOO matches" and then stops output for FOO (even if 
it already output some lines for ordinary matches in FOO).

This approach would fix the problem of grep trashing the output stream, and it 
should be less drastic than grep's current approach, in that it would make grep 
more likely to do what Kamil Dudka is asking for (assuming grep is given mostly 
valid input interspersed with small amounts of binary data).
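
The difference between the two heuristics can be sketched in a few lines of
Python.  This is a simplified model of the behavior described above, not
grep's actual implementation: the function names are hypothetical, "binary"
is assumed to mean "contains a NUL byte or is not valid UTF-8", and matching
is a plain substring test on already-split lines.

```python
def is_binary(line: bytes) -> bool:
    """Assumed definition: a NUL byte or a UTF-8 decoding error."""
    if b"\0" in line:
        return True
    try:
        line.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def grep_current(lines, pattern: bytes, name: str):
    """Model of the current heuristic: the decision is made once, at the
    first match, based on whether binary data has been seen so far."""
    notice = f"Binary file {name} matches".encode()
    out = []
    seen_binary = False
    for line in lines:
        if not out and is_binary(line):
            seen_binary = True
        if pattern in line:
            if seen_binary:
                return [notice]
            out.append(line)  # later matches are copied even if binary
    return out


def grep_proposed(lines, pattern: bytes, name: str):
    """Model of the proposal: check each line just before it is output,
    and stop output for the file at the first binary one."""
    notice = f"Binary file {name} matches".encode()
    out = []
    for line in lines:
        if pattern in line:
            if is_binary(line):
                out.append(notice)
                break
            out.append(line)
    return out


mostly_text = [b"alpha foo", b"beta \xff junk", b"gamma foo", b"delta \xff foo"]
junk_first = [b"\xff junk", b"alpha foo"]
```

In this model, grep_current(mostly_text, b"foo", "FOO") copies the badly
encoded fourth line to the output, whereas grep_proposed replaces it with the
"Binary file FOO matches" notice after printing the two clean matches.  For
junk_first the current heuristic prints only the notice, while the proposed
one prints the clean matching line.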





This bug report was last modified 9 years and 138 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.