GNU bug report logs - #54124
fmt inserts garbage in certain cases?


Package: coreutils

Reported by: "JD" <john1doe <at> ya.ru>

Date: Wed, 23 Feb 2022 11:28:01 UTC

Severity: normal



Message #11 received at 54124 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: JD <john1doe <at> ya.ru>, 54124 <at> debbugs.gnu.org
Subject: Re: bug#54124: fmt inserts garbage in certain cases?
Date: Thu, 24 Feb 2022 01:29:56 +0000

On 23/02/2022 17:55, Pádraig Brady wrote:

> I think isspace(x85) returning true on macOS is a bug,

Bug is a bit of a strong word here.

A digression into why 0x85 is being treated specially here.
Note that Cyrillic kha "х" (U+0445) is encoded in UTF-8 as:
 $ printf '\u0445' | od -tx1
 0000000 d1 85
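
To make the byte values above concrete, here is a minimal sketch of UTF-8
encoding in C. The function name utf8_encode is hypothetical (it is not part
of coreutils or gnulib); it only illustrates why U+0445 ends in the byte 0x85,
and that U+0085 itself encodes as c2 85.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: encode the code point CP
   as UTF-8 into OUT.  Return the number of bytes written, or 0 if CP is
   a surrogate or lies beyond U+10FFFF.  */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not encodable */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x0445, buf) fills buf with d1 85, matching the od output
above; the second byte is a continuation byte that happens to have the same
numeric value as the code point U+0085.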

What I think is happening is that \u0085 represents "Next Line" (NEL) in Unicode.
It is present in Unicode to support mapping to/from the corresponding char in EBCDIC,
which had a distinct char for this in addition to CR and LF.
Given isspace('\n') returns true, it makes some sense that isspace("Next Line")
would return true too, and I guess that, through implementation details,
isspace(int) is operating on UTF-32 values on macOS in UTF-8 locales
and so returns true for this value, even when the 0x85 passed in
is only the trailing byte of a multibyte character like "х".

BTW 0xA0 is the only other value that isspace() returns true for there
(other than the standard c_isspace() values of course).
This is the non-breaking space, so it's best we don't split on it anyway.
I.e., this is another benefit of the change.

I still think using c_isspace() to avoid this issue is best,
and intend to push the change tomorrow.
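
As a rough illustration of the idea (not the actual fmt patch), the sketch
below uses a locale-independent whitespace test when scanning for a word
boundary. The names ascii_isspace and first_word_len are hypothetical;
ascii_isspace is only a stand-in for gnulib's c_isspace(), assuming it accepts
exactly the six "C" locale whitespace characters.

```c
#include <stddef.h>

/* Hypothetical stand-in for gnulib's c_isspace(): true only for the six
   whitespace characters of the "C" locale, whatever the current locale.  */
int ascii_isspace(int c)
{
    return c == ' ' || c == '\t' || c == '\n'
        || c == '\v' || c == '\f' || c == '\r';
}

/* Hypothetical helper: length in bytes of the leading run of
   non-whitespace bytes in S.  Bytes >= 0x80, such as 0x85 or 0xA0,
   never end a word here, so multibyte characters stay intact.  */
size_t first_word_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !ascii_isspace((unsigned char) s[n]))
        n++;
    return n;
}
```

With the input bytes d1 85 20 61 62 63 ("х abc"), first_word_len returns 2,
so the word ends at the real space. A scanner calling isspace() on each byte
could instead stop after d1 on a platform where isspace(0x85) is true.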

cheers,
Pádraig




This bug report was last modified 3 years and 119 days ago.


