GNU bug report logs - #33775
multibyte: fold: multi-byte sequences as separate columns

Package: coreutils;

Reported by: Michael Siegel <msi <at> malbolge.net>

Date: Mon, 17 Dec 2018 02:15:01 UTC

Severity: wishlist


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Michael Siegel <msi <at> malbolge.net>
To: bug-coreutils <at> gnu.org
Subject: fold: counting multi-byte utf-8 sequences as separate columns
Date: Mon, 17 Dec 2018 02:32:55 +0100

Hello,

I've just discovered an odd behavior of `fold' while trying to wrap a
piece of text containing phonetic characters.

Take the following line, for example:

Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level,

It is 71 characters long. Still, running

echo "Tcl (pronounced "tickle" or tee cee ell /ˈtiː siː ɛl/) is a high-level," | fold -w 72 -s

produces

Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a
high-level,

I've had someone test this with FreeBSD's `fold', which didn't behave
that way. Instead, it filled out the line as expected.

Further investigation by developers of Adélie Linux revealed that GNU's
`fold' counts each byte of a multi-byte UTF-8 sequence (in this case,
the phonetic characters) as a separate column:

awilcox on gwyn [pts/11 Sun 16 19:01] ~: cat testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70 chars ^
yep.
awilcox on gwyn [pts/11 Sun 16 19:01] ~: fold -w 72 -s testing.txt
1234567890 234567890 234567890 234567890 234567890 234567890 234567890
/ˈtiː siː ɛl/ Adélie en français español ¿que? ¡ay! here is 70
chars ^
yep.
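
The byte-versus-character discrepancy described above can be checked with a small Python sketch. It is not how `fold' is implemented; it only assumes UTF-8 input and uses the sample line as the shell passes it to `fold' (the inner double quotes are consumed by the shell, as the output above shows). The helper fold_spaces and its `measure' parameter are hypothetical names introduced for illustration, and code points are used as a stand-in for display columns, which is only an approximation (wide and combining characters would differ).

```python
# Sample line as received by fold: the shell has already removed the
# inner double quotes around "tickle".
line = "Tcl (pronounced tickle or tee cee ell /ˈtiː siː ɛl/) is a high-level,"

n_chars = len(line)                  # Unicode code points
n_bytes = len(line.encode("utf-8"))  # what a byte-oriented fold would count

print(n_chars, n_bytes)


def fold_spaces(text, width, measure):
    """Hypothetical helper: wrap `text` roughly like `fold -s`, breaking
    after the last blank that fits, where `measure` gives the cost of a
    string in columns (bytes or characters)."""
    out = []
    while measure(text) > width:
        # Longest prefix that fits; always keep at least one character
        # so the loop makes progress.
        cut = len(text)
        while cut > 1 and measure(text[:cut]) > width:
            cut -= 1
        blank = text.rfind(" ", 0, cut)
        if blank >= 0:
            cut = blank + 1  # the blank stays at the end of the line
        out.append(text[:cut])
        text = text[cut:]
    out.append(text)
    return out


def utf8_len(s):
    return len(s.encode("utf-8"))


by_bytes = fold_spaces(line, 72, utf8_len)  # byte-counting behaviour
by_chars = fold_spaces(line, 72, len)       # character-counting behaviour
```

With a width of 72, measuring in bytes splits the line before "high-level,", while measuring in characters leaves it on one line, which matches the difference reported between the two `fold' implementations.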



msi




This bug report was last modified 6 years and 178 days ago.
