GNU bug report logs - #33775
multibyte: fold: multi-byte sequences as separate columns


Package: coreutils;

Reported by: Michael Siegel <msi <at> malbolge.net>

Date: Mon, 17 Dec 2018 02:15:01 UTC

Severity: wishlist

To reply to this bug, email your comments to 33775 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#33775; Package coreutils. (Mon, 17 Dec 2018 02:15:01 GMT)

Acknowledgement sent to Michael Siegel <msi <at> malbolge.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 17 Dec 2018 02:15:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Michael Siegel <msi <at> malbolge.net>
To: bug-coreutils <at> gnu.org
Subject: fold: counting multi-byte utf-8 sequences as separate columns
Date: Mon, 17 Dec 2018 02:32:55 +0100

Hello,

I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.

Take the following line, for example:

Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,

It is 71 characters long. Still, running

echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s

produces

Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
high-level,

I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.

Further investigation by developers of Adélie Linux revealed that GNU's
`fold' is counting multi-byte utf-8 sequences (in this case, the
phonetic characters) as separate columns:

awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
yep.
awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
chars ^
yep.
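
The difference between the two behaviours can be reproduced with a small sketch. The code below is a simplified model of `fold -w WIDTH -s`, not GNU's or FreeBSD's actual implementation; the helper names `utf8_len` and `fold_s` are hypothetical and exist only for this illustration. The same greedy wrapper is run twice, once measuring width in UTF-8 bytes and once in code points.

```python
# Simplified model of `fold -w WIDTH -s`; illustrative only,
# not the code used by GNU coreutils or FreeBSD.

def utf8_len(s):
    """Width measured in UTF-8 bytes (the behaviour described in this report)."""
    return len(s.encode("utf-8"))

def fold_s(line, width, length):
    """Greedily split `line` so each piece measures at most `width`
    according to `length`, preferring to break after the last space."""
    pieces = []
    while length(line) > width:
        # Longest prefix that still fits; always keep at least one
        # character so the loop makes progress.
        cut = len(line)
        while cut > 1 and length(line[:cut]) > width:
            cut -= 1
        # Break after the last space inside that prefix, if there is one.
        space = line.rfind(" ", 0, cut)
        if space != -1:
            cut = space + 1
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)
    return pieces

line = "/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^"
print(len(line), utf8_len(line))     # code points vs. bytes
print(fold_s(line, 72, utf8_len))    # width counted in bytes
print(fold_s(line, 72, len))         # width counted in code points
```

Counting code points is still only an approximation of display columns: combining marks and wide East Asian characters would need a real column-width lookup such as the C library's `wcwidth`.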



msi




Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 23 Dec 2018 06:04:01 GMT)

Changed bug title to 'multibyte: fold: multi-byte sequences as separate columns' from 'fold: counting multi-byte utf-8 sequences as separate columns'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 23 Dec 2018 06:04:02 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#33775; Package coreutils. (Sun, 23 Dec 2018 06:05:01 GMT)

Message #12 received at 33775 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Michael Siegel <msi <at> malbolge.net>, 33775 <at> debbugs.gnu.org
Subject: Re: bug#33775: fold: counting multi-byte utf-8 sequences as separate columns
Date: Sat, 22 Dec 2018 23:03:50 -0700

severity 33775 wishlist
retitle 33775 multibyte: fold: multi-byte sequences as separate columns
stop

Hello,

On 2018-12-16 6:32 p.m., Michael Siegel wrote:
> I've just discovered an odd behavior of `fold' while trying to wrap a
> piece of text containing phonetic characters.
> 
> Take the following line, for example:

Thank you for reporting this issue and
providing clear, reproducible examples.

Adding complete multibyte/UTF-8 support to all coreutils
programs is an ongoing effort.

I'm marking this as a "wishlist" item, which will remain
open until we complete the implementation.

Related multibyte items are listed here (with "multibyte" prefix):
https://debbugs.gnu.org/cgi/pkgreport.cgi?which=pkg&data=coreutils



regards,
 - assaf






