GNU bug report logs -
#22838
New 'Binary file' detection considered harmful
Message #44 received at 22838 <at> debbugs.gnu.org:
On 02/29/2016 04:35 PM, Paul Eggert wrote:
> I suggest using -a. LC_ALL=C won't work the way that you want on
> platforms where the C locale is UTF-8, or is pure ASCII. For example, on
> Fedora 23 or RHEL 7 with grep 2.23 we have:
>
> $ printf '\200\n' | LC_ALL=C grep .
> Binary file (standard input) matches
>
> This is because the C locale is pure ASCII on these platforms, i.e.,
> '\200' is not a valid character the way it is with traditional Unix. I
> don't know why Red Hat made that change.

I _think_ the Austin Group is leaning towards requiring the "C" locale
to always be a unibyte locale with all 256 bytes as valid characters, so
neither strict 7-bit ASCII nor UTF-8 would be usable as the "C" locale;
but for that to happen, POSIX would also need to allow a way to get a
UTF-8 locale easily accessible and describe how it differs from the "C"
locale under such a ruling. But it's still all conjecture on what the
final results will be - even in the standards committee, gracefully
documenting how locale corner cases must behave vs. leaving
implementations some latitude is tricky business; and any such change is
at least 3 or 4 years down the road before it could be standardized in
Issue 8 (right now, the focus is on Technical Corrigendum 2 for Issue 7).
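
For illustration only, the -a suggestion quoted above can be sketched as follows. This is a minimal example, not part of the original exchange: the `sample` helper and its `abc` payload are hypothetical, and the only grep options assumed are `-a` (`--text`) and `-c` (`--count`). Because the pattern is plain ASCII, the result should not depend on whether the platform's "C" locale is 7-bit ASCII, UTF-8, or a unibyte locale where all 256 bytes are valid.

```shell
#!/bin/sh
# Hypothetical sample input: an ASCII word followed by byte 0x80 (octal 200),
# which is an encoding error in a UTF-8 or strict 7-bit ASCII locale.
sample() { printf 'abc\200\n'; }

# -a (--text) tells grep to treat the input as text, so it does not fall back
# to the "Binary file ... matches" summary when it meets the stray byte.
# -c prints the number of matching lines instead of the lines themselves.
count=$(sample | grep -a -c 'abc')

echo "lines matched with -a: $count"
```

Whether a pattern such as `.` matches the 0x80 byte itself still depends on the locale and the grep version, which is the part of the behavior the message above describes as unsettled.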
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
This bug report was last modified 8 years and 257 days ago.