GNU bug report logs - #22838
New 'Binary file' detection considered harmful


Package: grep;

Reported by: Marcello Perathoner <marcello <at> perathoner.de>

Date: Sun, 28 Feb 2016 18:13:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #41 received at 22838 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Marcello Perathoner <marcello <at> perathoner.de>, 22838 <at> debbugs.gnu.org
Subject: Re: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 15:35:19 -0800
On 02/29/2016 12:34 PM, Marcello Perathoner wrote:
> On 02/29/2016 08:29 PM, Paul Eggert wrote:
>
> They would 'blast' their terminals without grep too.

Sure, but in practice it's common for users to do something like this:

grep -r getaddrinfo_a *

I just now did this in my working copy of the GNU Emacs source code. If 
-a were the default, I would see 13874778 bytes on my screen, the vast 
majority of which would be useless or even harmful. As grep stands now, 
I see just 5480 bytes and they're mostly useful.
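To make the trade-off concrete, here is a hypothetical Python model of how a grep-like tool might report matches. By default, a matching file that looks binary produces one summary line. A text-mode flag, standing in for grep's -a, prints every matching line instead. The binary test is an assumed simplification: a NUL byte, or bytes that do not decode as UTF-8. It is not grep's actual detection logic. Real grep may also print earlier text lines before it hits binary data.

```python
import re


def looks_binary(data: bytes, encoding: str = "utf-8") -> bool:
    """Assumed heuristic: a NUL byte or an encoding error means 'binary'."""
    if b"\0" in data:
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


def search(name: str, data: bytes, pattern: bytes, text_mode: bool = False) -> list[bytes]:
    """Return the output lines a grep-like tool would print for one file.

    text_mode=True plays the role of grep's -a option (hypothetical model).
    """
    regex = re.compile(pattern)
    matches = [line for line in data.split(b"\n") if regex.search(line)]
    if not matches:
        return []
    if not text_mode and looks_binary(data):
        # Default policy: summarize instead of dumping raw bytes to the terminal.
        return [b"Binary file " + name.encode() + b" matches"]
    return [name.encode() + b":" + line for line in matches]


# Hypothetical binary blob that happens to contain the search string.
blob = b"getaddrinfo_a\0" + bytes(range(256)) * 10
print(search("blob.bin", blob, b"getaddrinfo_a"))
# [b'Binary file blob.bin matches']
```

With text_mode=True the same call returns the raw matching line, NUL and control bytes included. That is the output the default is designed to keep off the screen.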

> I was lucky in that I noticed that a 17GB tar file could not be a 
> complete backup of a 500GB drive.

Yes, you were lucky there. But you were unlucky in that your backup 
software invoked grep without worrying about file name validity. Suppose 
a file name contained a newline? Your backups could be toast.
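The newline hazard is easy to demonstrate. The sketch below uses plain Python and made-up file names. A newline-terminated list of names splits one legitimate name into two bogus entries. A NUL-terminated list round-trips correctly, because NUL cannot appear in a POSIX file name. GNU grep produces that kind of list with -l combined with -Z (--null).

```python
def join_names(names: list[str], sep: str) -> str:
    """Serialize file names the way a tool prints them: each followed by sep."""
    return "".join(name + sep for name in names)


def split_names(stream: str, sep: str) -> list[str]:
    """Parse a separator-terminated list of file names back into a list."""
    parts = stream.split(sep)
    # A terminated list ends with sep, which leaves an empty final element.
    return parts[:-1] if parts and parts[-1] == "" else parts


# Hypothetical names; the second one legitimately contains a newline.
names = ["notes.txt", "report\nfinal.txt", "photo.jpg"]

by_newline = split_names(join_names(names, "\n"), "\n")
by_nul = split_names(join_names(names, "\0"), "\0")

print(by_newline)       # ['notes.txt', 'report', 'final.txt', 'photo.jpg']
print(by_nul == names)  # True
```

A backup script that consumes newline-separated names would try to archive "report" and "final.txt", neither of which exists, and would silently skip the real file.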

> At least ... make the new behaviour optional.

It is optional; we merely disagree about the option's default value.

> Since 2.21 I will now have to always specify -a or LC_ALL=C when
> grepping my files.

I suggest using -a. LC_ALL=C won't work the way that you want on 
platforms where the C locale is UTF-8, or is pure ASCII. For example, on 
Fedora 23 or RHEL 7 with grep 2.23 we have:

$ printf '\200\n' | LC_ALL=C grep .
Binary file (standard input) matches

This is because the C locale is pure ASCII on these platforms, i.e., 
'\200' is not a valid character the way it is with traditional Unix.  I 
don't know why Red Hat made that change.
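Whether '\200' (byte 0x80) counts as a character depends on the locale's encoding. The following is only a rough model. It assumes a pure-ASCII C locale behaves like Python's "ascii" codec and a UTF-8 locale like "utf-8". It also uses "latin-1" as a stand-in for the traditional Unix C locale, where every byte is a character. Under those assumptions, the bytes from the printf example are an encoding error in the first two cases only.

```python
def is_valid_text(data: bytes, codec: str) -> bool:
    """Return True if data decodes without error under the given codec."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


sample = b"\200\n"  # same bytes as printf '\200\n' (octal 200 == 0x80)

# Assumed Python codecs standing in for the locale encodings discussed above.
for label, codec in [
    ("pure-ASCII C locale", "ascii"),
    ("UTF-8 locale", "utf-8"),
    ("traditional Unix C locale", "latin-1"),
]:
    verdict = "text" if is_valid_text(sample, codec) else "binary"
    print(f"{label}: {verdict}")

# pure-ASCII C locale: binary
# UTF-8 locale: binary
# traditional Unix C locale: text
```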







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.