GNU bug report logs - #24924
multibyte: pr has no concept of wide characters
Message #26 received at 24924 <at> debbugs.gnu.org (full text, mbox):
2016-12-01 07:04:05 +0000, Stephane Chazelas:
> 2016-11-30 18:37:05 -0800, Paul Eggert:
> [...]
> > In the meantime if you could submit a patch for the
> > documentation that should fix the immediate documentation
> > problem.
> [...]
>
> What about:
[...]
> +Please note that @command{pr} currently doesn't support multi-byte characters
> +or non-ASCII characters that have a null or double width. If such characters
> +occur in the input or column separators, column alignment may be off or lines
> +may exceed the page width. There is also no provision to support bidirectional
> +text.
[...]
Actually, it seems it can also truncate lines in the middle of
a character, though that appears to be confined to multibyte
characters that contain bytes with values <= 127, like:
$ locale charmap
BIG5-HKSCS
$ printf '\ue9\ue9\ue9\n' | pr -w5 -t2 | hd
00000000 88 6d 88 6d 88 0a |.m.m..|
00000006
See how the third é (0x88 0x6d in BIG5-HKSCS) was truncated in
the middle: only its first byte, 0x88, made it to the output.
It's as if pr were treating all byte values >= 128 as having
zero width in multi-byte locales (and only in multi-byte
locales; this doesn't seem to occur in single-byte ones).
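To make that guess concrete, here is a minimal Python sketch of the
suspected behaviour. It is only a model of the hypothesis above, not
pr's actual code: the function name is made up for illustration, and
the 2-column field width is an assumption chosen to match the
-w5 / two-column invocation.

```python
def truncate_zero_width_high_bytes(data: bytes, width: int) -> bytes:
    """Copy bytes while the counted width stays within `width`.

    Assumed model (hypothetical, not taken from pr's source):
    bytes >= 128 count as zero columns, bytes < 128 as one column.
    """
    out = bytearray()
    used = 0
    for b in data:
        w = 0 if b >= 128 else 1
        if used + w > width:
            break  # the next byte would overflow the field
        out.append(b)
        used += w
    return bytes(out)


# Three copies of 0x88 0x6d (the bytes from the report), cut to an
# assumed 2-column field.
line = b"\x88\x6d" * 3
result = truncate_zero_width_high_bytes(line, 2)
print(" ".join(f"{b:02x}" for b in result))  # prints: 88 6d 88 6d 88
```

Under this model the lone trailing 0x88 is kept because it costs no
width, which reproduces the byte sequence seen in the hd output.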
So maybe:
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index cc85f22..15088ce 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -1838,6 +1838,13 @@ For single
 column output no line truncation occurs by default. Use @option{-W} option to
 truncate lines in that case.
+Please note that @command{pr} currently doesn't support multi-byte characters
+or non-ASCII characters that have a null or double width. If such characters
+occur in the input or column separators, column alignment may be off or lines
+may exceed the page width, or truncation may occur in the middle of some
+characters producing invalid text output. There is also no provision to support
+bidirectional text.
+
 The following changes were made in version 1.22i and apply to later
 versions of @command{pr}:
 @c FIXME: this whole section here sounds very awkward to me. I
--
Stephane
This bug report was last modified 6 years and 231 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.