GNU bug report logs - #54124
fmt inserts garbage in certain cases?
Message #8 received at 54124 <at> debbugs.gnu.org:
On 23/02/2022 10:58, JD wrote:
> Hi!
>
> I have fmt from coreutils 8.32.1 installed via MacPorts.
>
> If I run the following command: `echo х х х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10` (which is just echoing 26 Cyrillic 'х' ('kha') letters), I get the following results:
>
> https://i.imgur.com/yRx7uuz.png (iTerm2)
> https://i.imgur.com/7oQ0UPz.png (iTerm2 if passed via `more`)
> https://i.imgur.com/UlLrEMy.png (Alacritty)
>
> And if I delete just two 'х' letters, like this: `echo х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10`, everything shows just fine: https://i.imgur.com/DwuWxyx.png
>
> Would be grateful for any advice :)
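The issue here is that (on macOS 10.15.7 at least),
isspace(0x85) returns true for UTF-8 locales
(but not for the "C" or "iso8859-1" locales).
That byte matters because Cyrillic 'х' (U+0445) is encoded
in UTF-8 as 0xD1 0x85, and fmt classifies its input one byte
at a time, so the second byte of every 'х' is treated as
whitespace, leaving a lone 0xD1 byte that is not valid UTF-8.
BTW iscntrl() returns true for 0x85 in all non-C locales
on both Linux and macOS.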
Now gnulib says wrt isspace() that:
"This function's behaviour depends on the locale, but does not support
the multibyte characters that occur in strings in locales with
@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales)."
I think isspace(0x85) returning true on macOS is a bug,
but we should probably avoid isspace() in fmt altogether
given its inconsistency with multibyte locales.
The attached patch uses c_isspace() instead, which is locale-independent
and matches only the ASCII whitespace characters.
cheers,
Pádraig
[fmt-utf8-macOS.patch (text/x-patch, attachment)]