GNU bug report logs - #24924
multibyte: pr has no concept of wide characters

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Fri, 11 Nov 2016 16:12:01 UTC

Severity: wishlist

Message #20 received at 24924 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 24924 <at> debbugs.gnu.org
Subject: Re: bug#24924: GNU pr only working with singlebyte 1-width characters
Date: Thu, 1 Dec 2016 06:32:22 +0000

2016-11-30 18:37:05 -0800, Paul Eggert:
> On 11/30/2016 03:30 AM, Stephane Chazelas wrote:
> >That can also be seen as a POSIX conformance bug
> 
> Not really, as POSIX does not require support for UTF-8 (except in
> the pax utility, which is not part of coreutils).
[...]

POSIX does not require support for any charset. It specifies only
one locale (C/POSIX), and says nothing about the charset in that
locale other than that it should be a single-byte charset that
covers the portable character set. Examples of such charsets are
ASCII, ISO 8859-x or EBCDIC. In practice, that tends to be ASCII
(except for some rare EBCDIC based IBM systems) as tha

But it does specify a localisation API and allows systems to
support other locales with other charsets. That API does support
multi-byte encodings, including stateful ones (though how they
are /defined/ is implementation-defined for locking-shift ones, and
in practice those are unworkable, so I'd expect they will
eventually be removed from the standard). It doesn't require
compliant systems to have locales with multi-byte character sets,
but if they have any (if they show up in the output of locale -a),
then those have to be supported throughout (as specified, for all
the utilities for instance).

Basically, on systems that have locales with multi-byte
encodings --UTF-8 or other-- (most Unix-like ones including GNU
systems like Debian), GNU pr (and many other GNU utilities) is
not POSIX compliant.
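
To illustrate why this matters for a column-oriented tool like pr,
here is a rough Python sketch (not how pr or any POSIX utility is
implemented) contrasting byte count, character count and display
width. The helper names display_width and pad_to_column are
hypothetical, and the width rule (two columns for East Asian Wide or
Fullwidth characters, zero for combining marks, one otherwise) is a
simplifying assumption that ignores control characters and
locale-specific behaviour.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns occupied by text.

    Assumption: East Asian Wide (W) and Fullwidth (F) characters take
    two columns, combining marks take none, everything else takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks add no column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to_column(text: str, columns: int) -> str:
    """Pad text with spaces up to the given display width (hypothetical helper)."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    for sample in ("abc", "積丹尼"):
        # bytes in UTF-8, characters, approximate display columns
        print(sample, len(sample.encode("utf-8")), len(sample), display_width(sample))
```

A tool that pads by bytes would treat "積丹尼" as 9 units in UTF-8 and
one that pads by characters as 3, while under the assumption above it
occupies 6 columns, so either approach misaligns the output.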

See
http://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap06.html

for details.

-- 
Stephane

This bug report was last modified 6 years and 231 days ago.
