GNU bug report logs - #8871
Bug with "sort -i" ?
Message #11 received at 8871 <at> debbugs.gnu.org (full text, mbox):
[re-adding the list]
On 06/15/2011 03:28 PM, Al Bogner wrote:
>> When all of the bytes are ignored as non-printable, then all three
>> lines are identical, hence -u prints only one line.
>
> Ok and thanks. I had a different understanding of non-printable.
Non-printable means that isprint(3) returns 0 for a given byte
(single-byte locale, like C), or that iswprint(3) returns 0 for a given
wide character (a Unicode character decoded from UTF-8 bytes, in a
multi-byte locale like de_DE.UTF-8). These functions are locale-specific
(a byte value may be deemed printable in one locale but not another).
Furthermore, isprint(0xa0) and iswprint(0xa0) may give different results
within the same locale. This happens when the implementation rejects
incomplete UTF-8 sequences and treats only a complete wchar_t as a
character. In that case, any code that calls isprint() on the individual
bytes of a UTF-8 sequence, rather than iswprint() on the wchar_t of each
decoded Unicode character, gets the (unfortunate) result that no
multi-byte characters are recognized as printable.
Factor into this mess the fact that upstream coreutils still lacks
decent multi-byte handling in a lot of utilities. Various distros have
add-on patches for better wchar_t handling, but as of yet they have not
been consolidated into something that is easily maintainable and adds no
overhead to the single-byte C locale situation.
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 14 years and 36 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.