GNU bug report logs - #24924
multibyte: pr has no concept of wide characters

Package: coreutils

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Fri, 11 Nov 2016 16:12:01 UTC

Severity: wishlist

Message #8 received at 24924 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 24924 <at> debbugs.gnu.org
Subject: Re: bug#24924: pr has no concept of wide characters
Date: Fri, 11 Nov 2016 11:36:20 -0500
severity 24924 wishlist
tags 24924 wishlist notabug
thanks

Hello Dan,

On 11/11/2016 11:10 AM, 積丹尼 Dan Jacobson wrote:
> The pr documentation (man, info) doesn't mention how it has no concept
> of wide characters.
> $ pr -m --sep-string='^^^'  file file

Indeed, most of the current coreutils programs do not handle wide or multibyte characters correctly.
The official upstream implementation does not support them (which is why I marked this item as 'wishlist' and not a bug).
On Red Hat systems, there is the 'i18n' patch, which adds some support but also introduces some problems of its own:
  https://github.com/pixelb/coreutils/tree/i18n
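
To make the byte / character / column distinction behind this report concrete, here is a minimal Python sketch. It is only an illustration, not how pr or the i18n patch works. The helper names `display_width` and `pad_to_column` are hypothetical, and the width rule (East Asian Wide/Fullwidth characters take 2 columns, combining marks take 0) is an assumed simplification of what a wcwidth()-style lookup would report.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns `text` occupies.

    Assumption: East Asian Wide/Fullwidth characters take 2 columns,
    combining marks take 0, everything else takes 1. Tabs and control
    characters are not handled.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_to_column(text: str, columns: int) -> str:
    """Right-pad `text` with spaces so it fills `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    for sample in ("abc", "積丹尼"):
        print(
            repr(sample),
            "bytes:", len(sample.encode("utf-8")),
            "chars:", len(sample),
            "columns:", display_width(sample),
        )
```

A tool that aligns columns by counting bytes would treat "積丹尼" as 9 units wide, one that counts characters would say 3, while a terminal typically renders it in 6 columns. That mismatch is the kind of misalignment a multi-column layout runs into.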

However, there is an active effort to make all of them multibyte-aware.
The latest updates are listed below, in reverse chronological order (these are somewhat long threads):
  http://lists.gnu.org/archive/html/coreutils/2016-09/msg00026.html
  http://lists.gnu.org/archive/html/coreutils/2016-09/msg00011.html
  http://lists.gnu.org/archive/html/coreutils/2016-07/msg00013.html

'cut' and 'expand' were the first two programs I worked on.
'pr' is definitely on the list. Once I have a proof-of-concept working, I would very much appreciate it if you could help me test it, as there are many edge cases with multibyte support and wide characters.

Out of curiosity, are you using UTF-8 locales exclusively, or do you have experience with Shift-JIS or EUC-JP locales?
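
The locale question matters because the same text has a different byte layout under each encoding. A small Python sketch, assuming the sample string "日本語" and a hypothetical helper `byte_lengths`; the codec names are Python's standard ones and say nothing about how coreutils itself would detect the locale:

```python
def byte_lengths(text: str, encodings=("utf-8", "shift_jis", "euc_jp")) -> dict:
    """Return the encoded byte length of `text` under each encoding."""
    return {name: len(text.encode(name)) for name in encodings}


if __name__ == "__main__":
    sample = "日本語"  # assumed sample; three characters
    for name, size in byte_lengths(sample).items():
        print(f"{name:10} {size} bytes  {sample.encode(name).hex()}")
```

For this sample, the UTF-8 form is 9 bytes while the Shift-JIS and EUC-JP forms are 6 bytes each, so code that is only tested against UTF-8 input can miss problems that appear in the other encodings.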


I'm leaving this ticket open, and welcome discussion and comments.
regards,
 - assaf


P.S.
The usual disclaimer applies: there is currently no ETA for multibyte support in coreutils.

This bug report was last modified 6 years and 231 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.