GNU bug report logs - #22838
New 'Binary file' detection considered harmful

Package: grep;

Reported by: Marcello Perathoner <marcello <at> perathoner.de>

Date: Sun, 28 Feb 2016 18:13:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Marcello Perathoner <marcello <at> perathoner.de>
To: Eric Blake <eblake <at> redhat.com>, Paul Eggert <eggert <at> cs.ucla.edu>, 22838 <at> debbugs.gnu.org
Subject: bug#22838: New 'Binary file' detection considered harmful
Date: Tue, 1 Mar 2016 11:05:21 +0100

On 02/29/2016 11:37 PM, Eric Blake wrote:
> On 02/29/2016 01:11 PM, Marcello Perathoner wrote:
>
>>> Yes, locale dependencies on standard behavior can be annoying.
>>>
>>
>> You assume that a user will only ever want to grep text files encoded in
>> the machine's locale. That is not so.
>
> You've been relying on undefined behavior, and it caught up with you.

(The backup2l author has been relying on it. I'm just a user of that
package, and I have already filed a bug against backup2l too.)

You confuse 'undefined' with 'undocumented'.  The old behaviour was very 
well defined, even if it could turn out nasty.  It was defined by 
implementation: it was a de-facto standard.

OTOH, nowhere was it documented that grepping files not encoded in the
machine's locale was considered marginal or illegal.

The old documentation explicitly stated:

"""
If the first few bytes of a file indicate that the file contains
binary data, assume that the file is of type TYPE. By default, TYPE is
binary, and grep normally outputs either a one-line message saying
that a binary file matches, or no message if there is no match.
"""
--- from an old man page

The new behaviour changes documented old behaviour.
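
As a rough illustration of the quoted passage, the documented behaviour
can be sketched as below. This is not grep's actual code: the function
names, the 32768-byte probe size, and the "contains a NUL byte" test for
binary data are all assumptions made for the example.

```python
def looks_binary(prefix: bytes) -> bool:
    # Assumption for this sketch: a NUL byte marks the data as binary.
    return b"\0" in prefix


def old_style_grep(data: bytes, needle: bytes, name: str,
                   probe: int = 32768) -> list[str]:
    """Return the output lines for one file, old man page style."""
    if looks_binary(data[:probe]):
        # The decision is made once, from the first few bytes only:
        # a one-line message if anything matches, otherwise nothing.
        return [f"Binary file {name} matches"] if needle in data else []
    # Text file: print every matching line.
    return [line.decode(errors="replace")
            for line in data.splitlines() if needle in line]
```

The point of the sketch is that the outcome for a file is either only
the one-line message or only regular matching lines, never a mixture.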



Furthermore, there's no need to fix the old bug in such a heavy-handed
way. Less disruptive alternatives:

1) Make the new behaviour opt-in.  Print a deprecation warning that
gives people a chance to fix their scripts.  After a while, make the new
behaviour the default.

2) If you just output

   binary line 42 in file x matches

and continue regular output after the next newline, the breakage would 
be much more confined.

3) Fail in the old documented way of printing only the error message 
instead of introducing a new mode of failure that looks like success and 
loses the error message in the noise.

4) Don't implement this change between minor releases. A breaking change 
deserves a major release.
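
Alternative 2 above could look roughly like the following sketch. It is
only an illustration of the proposal, not a patch for grep: the function
name is made up, and "binary line" is assumed here to mean a matching
line that cannot be decoded in the expected encoding.

```python
def grep_lines(data: bytes, needle: bytes, name: str,
               encoding: str = "utf-8") -> list[str]:
    """Return output lines, flagging undecodable matches one at a time."""
    out = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        if needle not in raw:
            continue
        try:
            out.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Only this line is replaced by a message; regular output
            # resumes with the next line.
            out.append(f"binary line {lineno} in file {name} matches")
    return out
```

With this shape, one undecodable line costs one line of output, and the
matches after it are still printed.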



Regards

-- 
Marcello Perathoner





This bug report was last modified 8 years and 256 days ago.
