GNU bug report logs -
#54124
fmt inserts garbage in certain cases?
To reply to this bug, email your comments to 54124 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#54124; Package coreutils. (Wed, 23 Feb 2022 11:28:01 GMT)
Acknowledgement sent to "JD" <john1doe <at> ya.ru>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 23 Feb 2022 11:28:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi!
I have fmt from coreutils 8.32.1 installed via MacPorts.
If I run the following command: `echo х х х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10` (which is just echoing 26 Cyrillic 'х' ('kha') letters), I get the following results:
https://i.imgur.com/yRx7uuz.png (iTerm2)
https://i.imgur.com/7oQ0UPz.png (iTerm2 if passed via `more`)
https://i.imgur.com/UlLrEMy.png (Alacritty)
And if I delete just two 'х' letters, like this: `echo х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10`, everything shows just fine: https://i.imgur.com/DwuWxyx.png
Would be grateful for any advice :)
--
JD
Information forwarded to bug-coreutils <at> gnu.org: bug#54124; Package coreutils. (Wed, 23 Feb 2022 17:57:02 GMT)
Message #8 received at 54124 <at> debbugs.gnu.org (full text, mbox):
On 23/02/2022 10:58, JD wrote:
> Hi!
>
> I have fmt from coreutils 8.32.1 installed via MacPorts.
>
> If I run the following command: `echo х х х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10` (which is just echoing 26 Cyrillic 'х' ('kha') letters), I get the following results:
>
> https://i.imgur.com/yRx7uuz.png (iTerm2)
> https://i.imgur.com/7oQ0UPz.png (iTerm2 if passed via `more`)
> https://i.imgur.com/UlLrEMy.png (Alacritty)
>
> And if I delete just two 'х' letters, like this: `echo х х х х х х х х х х х х х х х х х х х х х х х х | gfmt -sw 10`, everything shows just fine: https://i.imgur.com/DwuWxyx.png
>
> Would be grateful for any advice :)
The issue here is that (on macOS 10.15.7 at least),
isspace(0x85) returns true for UTF-8 locales
(but not for "C" or "iso8859-1" locales).
BTW iscntrl() returns true for 0x85 on all non-C locales
on both Linux and macOS.
Now gnulib says wrt isspace() that:
"This function's behaviour depends on the locale, but does not support
the multibyte characters that occur in strings in locales with
@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales)."
I think isspace(0x85) returning true on macOS is a bug,
but we should probably avoid isspace() in fmt altogether
given its inconsistency with multibyte locales.
The attached uses c_isspace() instead.
cheers,
Pádraig
[fmt-utf8-macOS.patch (text/x-patch, attachment)]
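The patch itself is not reproduced in this log. As a rough illustration of the idea only, the sketch below splits words with a locale-independent whitespace test. `ascii_isspace` and `count_words` are hypothetical names invented here; `ascii_isspace` merely imitates what gnulib's c_isspace() is documented to do (true for the six "C" locale whitespace characters), and none of this is the code in fmt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for gnulib's c_isspace(): true only for the six
   whitespace characters of the "C" locale, whatever setlocale() says. */
static bool ascii_isspace(int c)
{
    switch (c) {
    case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
        return true;
    default:
        return false;
    }
}

/* Count whitespace-separated words in a NUL-terminated byte string.
   Bytes >= 0x80 (such as the 0x85 in UTF-8 "х") never end a word. */
static size_t count_words(const char *s)
{
    size_t words = 0;
    bool in_word = false;

    for (; *s != '\0'; s++) {
        if (ascii_isspace((unsigned char) *s)) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            words++;
        }
    }
    return words;
}
```

With this predicate the two bytes d1 85 of each 'х' stay together as one word, so the input from the report is split only at the real spaces.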
Information forwarded to bug-coreutils <at> gnu.org: bug#54124; Package coreutils. (Thu, 24 Feb 2022 01:31:01 GMT)
Message #11 received at 54124 <at> debbugs.gnu.org (full text, mbox):
On 23/02/2022 17:55, Pádraig Brady wrote:
> I think isspace(0x85) returning true on macOS is a bug,
Bug is a bit of a strong word here.
A digression into why 0x85 is being treated specially here.
Note Cyrillic kha "х" is encoded in UTF-8 as:
$ printf '\u0445' | od -tx1
0000000 d1 85
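For readers who want to see where the 0x85 byte comes from, here is a minimal sketch of the standard UTF-8 encoding rules in C. The function name `utf8_encode` is made up for this illustration and is not part of coreutils or gnulib.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point `cp` as UTF-8 into `out` (room for 4 bytes).
   Returns the number of bytes written, or 0 if `cp` is not encodable. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not valid scalar values */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+0445 the two-byte branch applies: 0x445 >> 6 is 0x11, giving the lead byte 0xD1, and the low six bits 0x05 give the continuation byte 0x85, matching the od output above.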
What I think is happening is that \u0085 represents "Next Line" in Unicode.
This is present in Unicode to support mapping to/from the corresponding char in EBCDIC,
which had a distinct char for this in addition to CR and LF.
Given isspace('\n') returns true, it makes some sense that isspace("Next Line")
would return true, and I guess that through implementation details
isspace(int) is operating on UTF-32 on macOS in UTF-8 locales
and is returning true for this value.
BTW 0xA0 is the only other value that isspace() returns true for
(other than the standard c_isspace() values of course).
This is the non-breaking space, so it's best we don't split on it anyway.
I.e. this is another benefit to the change.
I still think using c_isspace() to avoid this issue is best,
and intend to push the change tomorrow.
cheers,
Pádraig
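The message does not say how the extra values 0x85 and 0xA0 were found. One plausible way, sketched here under that assumption, is to scan all 256 byte values and report those a predicate accepts beyond the "C" locale set. All names below are hypothetical, and `macos_like_isspace` only imitates the behaviour described in this thread; it is not Apple's implementation.

```c
#include <stddef.h>

/* The six whitespace characters of the "C" locale: space and \t..\r. */
static int is_c_locale_space(int c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

/* Store in `out` every byte value 0..255 that `pred` treats as whitespace
   but the "C" locale does not. Returns how many were stored. */
static size_t extra_space_bytes(int (*pred)(int), unsigned char out[256])
{
    size_t n = 0;

    for (int c = 0; c <= 255; c++)
        if (pred(c) && !is_c_locale_space(c))
            out[n++] = (unsigned char) c;
    return n;
}

/* Hypothetical predicate imitating the macOS UTF-8 locale behaviour
   reported above: also true for 0x85 (Next Line) and 0xA0 (NBSP). */
static int macos_like_isspace(int c)
{
    return is_c_locale_space(c) || c == 0x85 || c == 0xA0;
}
```

Passing the real isspace from <ctype.h> as `pred`, after calling setlocale(LC_ALL, ""), would run the same scan against the C library in use; the result then depends on the platform and locale.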
Information forwarded to bug-coreutils <at> gnu.org: bug#54124; Package coreutils. (Thu, 24 Feb 2022 03:07:01 GMT)
Message #14 received at 54124 <at> debbugs.gnu.org (full text, mbox):
On 2/23/22 17:29, Pádraig Brady wrote:
> Given isspace('\n') returns true, then it makes some sense that
> isspace("Next Line")
> would return true,
POSIX says that the application must ensure that the argument to isspace is
either EOF or "a character representable as an unsigned char", and
arguably since 0x85 is not either one of those things the behavior of
isspace(0x85) is undefined.
However, the C standard does not have this wording, and since POSIX is
supposed to defer to the C standard here, this appears to be a bug in
POSIX (as well as a bug in macOS). It's understandable if the Apple C
library's developers got confused by the POSIX wording.
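As a side note on the argument rule discussed above: in C, the value passed to isspace() must be EOF or representable as an unsigned char, so code that reads bytes through plain char usually converts first. The wrapper below is a small sketch of that common idiom; `is_space_byte` is a name invented here. It only avoids passing a negative value and does not make the result locale-independent.

```c
#include <ctype.h>
#include <stdbool.h>

/* Classify one byte taken from a char buffer. Where plain char is signed,
   a byte such as 0x85 is negative, so convert to unsigned char before
   calling isspace(); the answer still depends on the current locale. */
static bool is_space_byte(char ch)
{
    return isspace((unsigned char) ch) != 0;
}
```

In the default "C" locale this returns false for the 0x85 byte; in a UTF-8 locale on the macOS version mentioned earlier in the thread it would presumably return true, which is the behaviour the switch to c_isspace() sidesteps.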
This bug report was last modified 3 years and 119 days ago.