GNU bug report logs - #30326
grep not searching through a text file (thinking it binary)


Package: grep

Reported by: "L. A. Walsh" <gnu <at> tlinx.org>

Date: Fri, 2 Feb 2018 19:31:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #30 received at 30326-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: L A Walsh <gnu <at> tlinx.org>
Cc: 30326-done <at> debbugs.gnu.org
Subject: Re: bug#30326: grep not searching through a text file (thinking it binary)
Date: Fri, 2 Feb 2018 15:44:45 -0800

On 02/02/2018 03:30 PM, L A Walsh wrote:
> most computer files (vs. user-files) are still single-byte. 

That's because so many of them are ASCII. But ASCII files are not the 
issue here. grep's behavior hasn't changed when operating on ASCII files 
in typical locales. The issue is text using a non-ASCII encoding that is 
not compatible with your locale; e.g., if your text file uses ISO 8859-1 
but your locale specifies UTF-8.
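
To make the mismatch concrete, here is a minimal sketch in Python (not grep's own code; the helper name is_valid_utf8 is hypothetical and used only for illustration). It shows that pure ASCII bytes are valid UTF-8, while the ISO 8859-1 encoding of a non-ASCII character is not, which is the kind of data a UTF-8 locale cannot interpret as text:

```python
# Illustration only: this is not how grep is implemented.
# It checks whether a byte string can be decoded as UTF-8.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_bytes = "cafe".encode("ascii")          # b'cafe'
latin1_bytes = "café".encode("iso-8859-1")    # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")           # b'caf\xc3\xa9'

print(is_valid_utf8(ascii_bytes))    # True: ASCII is a subset of UTF-8
print(is_valid_utf8(latin1_bytes))   # False: a lone 0xE9 is an incomplete UTF-8 sequence
print(is_valid_utf8(utf8_bytes))     # True
```

The same word is therefore fine in one encoding and undecodable in the other, even though both files would look like ordinary text to their authors.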

In my experience, UTF-8 has long been winning this battle, in the sense 
that UTF-8 is by far the dominant encoding for the non-ASCII files I 
regularly use. So I use a UTF-8 locale, and suggest this as a good 
default for most users nowadays.

It's not possible to get direct statistics about encoding for all user 
files. However, we can see what's being published on the web. Currently 
UTF-8 is being used by about 90% of public websites whose character 
encoding can be determined, according to the latest W3Techs survey. ISO 
8859-1 is in second place, at about 4%. See:

https://w3techs.com/technologies/overview/character_encoding/all





This bug report was last modified 7 years and 34 days ago.
