GNU bug report logs -
#30326
grep not searching through a text file (thinking it binary)
Reported by: "L. A. Walsh" <gnu <at> tlinx.org>
Date: Fri, 2 Feb 2018 19:31:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #30 received at 30326-done <at> debbugs.gnu.org:
On 02/02/2018 03:30 PM, L A Walsh wrote:
> most computer files (vs. user-files) are still single-byte.
That's because so many of them are ASCII. But ASCII files are not the
issue here. grep's behavior hasn't changed when operating on ASCII files
in typical locales. The issue is text using a non-ASCII encoding that is
not compatible with your locale; e.g., if your text file uses ISO 8859-1
but your locale specifies UTF-8.
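To make the mismatch concrete, here is a minimal sketch of the kind of check involved. It is a simplified approximation, not grep's actual implementation: the function name `looks_binary` is hypothetical, the `encoding` parameter stands in for the locale's character set, and options such as -z are ignored. The GNU grep manual describes binary data as null input bytes or bytes that are improperly encoded for the current locale. The same ISO 8859-1 bytes for "café" decode fine as Latin-1, but the lone 0xE9 byte is not valid UTF-8.

```python
def looks_binary(data: bytes, encoding: str = "utf-8") -> bool:
    """Simplified approximation of grep's binary-data heuristic.

    Hypothetical helper: treats data as binary if it contains a NUL byte
    or is not validly encoded in `encoding` (a stand-in for the locale's
    charset). Real grep examines data as it reads and differs in detail.
    """
    if b"\x00" in data:
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# "café" in ISO 8859-1: the é is the single byte 0xE9.
latin1_text = "café\n".encode("iso-8859-1")
utf8_text = "café\n".encode("utf-8")
ascii_text = b"plain ASCII\n"

print(looks_binary(ascii_text))                 # False: ASCII is valid UTF-8
print(looks_binary(utf8_text))                  # False
print(looks_binary(latin1_text))                # True: 0xE9 followed by '\n' is invalid UTF-8
print(looks_binary(latin1_text, "iso-8859-1"))  # False: fine once the "locale" matches
```

The ASCII case illustrates the point above: ASCII bytes are valid in UTF-8 and in typical single-byte locales alike, so such files are unaffected.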
In my experience, UTF-8 has long been winning this battle, in the sense
that UTF-8 is by far the dominant encoding for the non-ASCII files I
regularly use. So I use a UTF-8 locale, and suggest this as a good
default for most users nowadays.
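For a file that really is ISO 8859-1, one option in a UTF-8 locale is to re-encode it to UTF-8; on the command line this is commonly done with `iconv -f ISO-8859-1 -t UTF-8`. The sketch below shows the same conversion in Python, assuming the input is genuinely ISO 8859-1. The helper name `latin1_to_utf8` is hypothetical. Running it on data that is already UTF-8 (or on Windows-1252 text) would produce wrong results.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8 (hypothetical helper).

    Every byte value 0x00-0xFF maps to a character in ISO 8859-1, so the
    decode step never fails; the encoded result is valid UTF-8.
    Assumes the input really is ISO 8859-1.
    """
    return data.decode("iso-8859-1").encode("utf-8")


original = b"caf\xe9\n"           # "café" in ISO 8859-1
converted = latin1_to_utf8(original)
print(converted)                  # b'caf\xc3\xa9\n'
```

After conversion, the é occupies the two bytes 0xC3 0xA9, which a UTF-8 locale accepts as text.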
It's not possible to get direct statistics about encoding for all user
files. However, we can see what's being published on the web. Currently
UTF-8 is being used by about 90% of public websites whose character
encoding can be determined, according to the latest W3Techs survey. ISO
8859-1 is in second place, at about 4%. See:
https://w3techs.com/technologies/overview/character_encoding/all
This bug report was last modified 7 years and 34 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.