GNU bug report logs - #22838
New 'Binary file' detection considered harmful


Package: grep;

Reported by: Marcello Perathoner <marcello <at> perathoner.de>

Date: Sun, 28 Feb 2016 18:13:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #29 received at 22838 <at> debbugs.gnu.org:

From: Marcello Perathoner <marcello <at> perathoner.de>
To: Eric Blake <eblake <at> redhat.com>, Paul Eggert <eggert <at> cs.ucla.edu>,
 22838 <at> debbugs.gnu.org
Subject: Re: bug#22838: New 'Binary file' detection considered harmful
Date: Mon, 29 Feb 2016 21:11:02 +0100
On 02/29/2016 06:56 PM, Eric Blake wrote:
> On 02/29/2016 10:54 AM, Eric Blake wrote:
>> Encoding errors are not characters, but bytes.  A line cannot contain
>> encoding errors.  Therefore, a file with encoding errors is not a text file.
>
> Corollary - there exist files which are text files in some locales, but
> binary files in others (based on whether the locale interprets the bytes
> as an encoding error or as valid characters).
>
> Yes, locale dependencies on standard behavior can be annoying.
>
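
To make Eric's corollary concrete, here is a minimal Python sketch. The helper `looks_like_text` is hypothetical and only roughly models the rule quoted above: it treats data as binary if it contains a NUL byte or fails to decode in the given locale encoding. It is not grep's actual detection code. The same bytes count as text under an ISO-8859-1 locale and as binary under a UTF-8 locale.

```python
# Hypothetical helper: a rough model of "a file with encoding errors is
# not a text file". This is NOT grep's implementation.
def looks_like_text(data: bytes, locale_encoding: str) -> bool:
    """Return True if `data` has no NUL bytes and decodes cleanly."""
    if b"\x00" in data:
        return False
    try:
        data.decode(locale_encoding)
    except UnicodeDecodeError:
        return False
    return True


# "Grüße" saved as ISO-8859-1: ü is byte 0xFC, ß is byte 0xDF.
sample = b"Gr\xfc\xdfe\n"

print(looks_like_text(sample, "iso-8859-1"))  # True: every byte is a Latin-1 character
print(looks_like_text(sample, "utf-8"))       # False: 0xFC is never valid in UTF-8
```

Decoding with ISO-8859-1 cannot fail, because that encoding maps all 256 byte values to characters; UTF-8 rejects 0xFC outright, so the verdict flips purely with the locale.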

You assume that a user will only ever want to grep text files encoded in 
the machine's locale. That is not so.

As a German user I have files on my disk in many encodings: utf-8,
iso-8859-1, win-1252, iso-8859-15, now-defunct encodings such as CP850
and CP847, the "German 7-bit ASCII" that replaced braces with umlauts,
and old WordStar files that embedded control characters.
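
As an illustration of why such a collection runs into the new detection, this Python sketch encodes one German word with several of the encodings listed above and checks whether each byte sequence is valid UTF-8. It uses Python's codec names (`cp1252` for win-1252, `cp850` for CP850). The "German 7-bit ASCII", CP847 and WordStar formats are omitted because this sketch assumes no standard Python codec for them. The function name `utf8_validity` is hypothetical.

```python
# Hypothetical helper: encode `word` with each codec and report whether
# the resulting bytes would also decode as UTF-8.
def utf8_validity(word: str, codecs: tuple[str, ...]) -> dict[str, bool]:
    results = {}
    for codec in codecs:
        raw = word.encode(codec)
        try:
            raw.decode("utf-8")
            results[codec] = True
        except UnicodeDecodeError:
            results[codec] = False
    return results


codecs = ("utf-8", "iso-8859-1", "cp1252", "iso-8859-15", "cp850")
for codec, ok in utf8_validity("Grüße", codecs).items():
    print(f"{codec:12} valid UTF-8: {ok}")
```

Only the UTF-8 copy decodes cleanly; the Latin-1, win-1252 and ISO-8859-15 copies contain bytes 0xFC and 0xDF, and the CP850 copy contains 0x81 and 0xE1, none of which can start a UTF-8 sequence here. Under a UTF-8 locale, files in any of these legacy encodings that contain umlauts would therefore be classed as binary.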

Since grep 2.21, I have to specify -a or LC_ALL=C every time I grep
my files.

Regards

-- 
Marcello Perathoner
