GNU bug report logs - #79300
fold-nbsp test failure
To reply to this bug, email your comments to 79300 AT debbugs.gnu.org.
There is no need to reopen the bug first.
Report forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Sun, 24 Aug 2025 07:51:02 GMT)
Acknowledgement sent to Bruno Haible <bruno <at> clisp.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 24 Aug 2025 07:51:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Today's CI run reports
FAIL: tests/fold/fold-nbsp
on NetBSD 10 and Solaris 11.4.
The log output in both cases is:
FAIL: tests/fold/fold-nbsp
==========================
--- exp1 2025-08-24 06:57:52.605590760 +0000
+++ out1 2025-08-24 06:57:52.607333160 +0000
@@ -1,3 +1,3 @@
abcdefghij
-klmnop qrs
-tuvwxyz
+klmnop
+qrstuvwxyz
--- exp2 2025-08-24 06:57:52.613841250 +0000
+++ out2 2025-08-24 06:57:52.615577504 +0000
@@ -1,3 +1,3 @@
abcdefghij
-klmnop qr
-stuvwxyz
+klmnop
+qrstuvwxyz
FAIL tests/fold/fold-nbsp.sh (exit status: 1)
It looks like the non-breaking space character has been treated like an ordinary space.
If you need a correction at the Gnulib level (in the functions iswblank,
iswspace, c32isblank, or c32isspace), please report this to bug-gnulib.
Alternatively, you may declare this a "quality of implementation" issue
and simply disable the test on NetBSD and Solaris.
Bruno
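A minimal sketch of the failure mode, not the actual fold implementation. It assumes every character is one column wide, that the input has no newlines, and that the test input was the alphabet with a no-break space after 'p' (inferred from the log above). The names strict_isblank, permissive_isblank and fold_spaces are hypothetical. The only thing that differs between the two outcomes is the blank predicate handed to the folding loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Blank test of a libc that counts only space and tab as blank.  */
static bool
strict_isblank (char32_t c)
{
  return c == U' ' || c == U'\t';
}

/* Hypothetical model of a libc that also classifies U+00A0 and U+2007
   as blank, which is what the failing log suggests.  */
static bool
permissive_isblank (char32_t c)
{
  return strict_isblank (c) || c == 0x00A0 || c == 0x2007;
}

/* Simplified 'fold -s'.  WIDTH must be nonzero and OUT needs room for
   2 * N elements.  Returns the number of elements written to OUT.  */
static size_t
fold_spaces (char32_t const *in, size_t n, size_t width,
             bool (*isblank_fn) (char32_t), char32_t *out)
{
  size_t o = 0;          /* elements written so far */
  size_t line_start = 0; /* index in OUT where the current line begins */

  for (size_t i = 0; i < n; i++)
    {
      if (o - line_start == width)
        {
          /* The line is full: search backwards for its last blank.  */
          size_t brk = o;
          while (brk > line_start && ! isblank_fn (out[brk - 1]))
            brk--;
          if (brk == line_start)
            brk = o;  /* no blank on this line: hard break at WIDTH */

          /* Insert a newline at BRK, shifting the tail right by one.  */
          memmove (out + brk + 1, out + brk, (o - brk) * sizeof *out);
          out[brk] = U'\n';
          o++;
          line_start = brk + 1;
        }
      out[o++] = in[i];
    }
  return o;
}
```

With width 10, strict_isblank gives the lines expected by the test (the break falls after "qrs"), while permissive_isblank breaks right after the no-break space, matching the "+" lines in the log.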
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Tue, 26 Aug 2025 02:26:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
Bruno Haible via GNU coreutils Bug Reports <bug-coreutils <at> gnu.org>
writes:
> Today's CI run reports
> FAIL: tests/fold/fold-nbsp
> on NetBSD 10 and Solaris 11.4.
>
> The log output in both cases is:
>
> FAIL: tests/fold/fold-nbsp
> ==========================
>
> --- exp1 2025-08-24 06:57:52.605590760 +0000
> +++ out1 2025-08-24 06:57:52.607333160 +0000
> @@ -1,3 +1,3 @@
> abcdefghij
> -klmnop qrs
> -tuvwxyz
> +klmnop
> +qrstuvwxyz
> --- exp2 2025-08-24 06:57:52.613841250 +0000
> +++ out2 2025-08-24 06:57:52.615577504 +0000
> @@ -1,3 +1,3 @@
> abcdefghij
> -klmnop qr
> -stuvwxyz
> +klmnop
> +qrstuvwxyz
> FAIL tests/fold/fold-nbsp.sh (exit status: 1)
>
>
> It looks like the character has been treated like a space.
> If you need a correction at the Gnulib level (in the functions iswblank,
> iswspace, c32isblank, or c32isspace), please report this to bug-gnulib.
> Alternatively, you may declare this a "quality of implementation" issue
> and simply disable the test on NetBSD and Solaris.
Thanks.
My initial idea was to check if U+2007 FIGURE SPACE and U+00A0 NO-BREAK
SPACE are blank using grep. But apparently Solaris grep does not handle
multibyte characters. Therefore, FIGURE SPACE cannot be checked. :(
Collin
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Tue, 26 Aug 2025 09:48:02 GMT)
Message #14 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Collin Funk wrote:
> My initial idea was to check if U+2007 FIGURE SPACE and U+00A0 NO-BREAK
> SPACE are blank using grep. But apparently Solaris grep does not handle
> multibyte characters. Therefore, FIGURE SPACE cannot be checked. :(
I'm not sure we are talking about the same thing. I reported the test
failure from the fold-nbsp test, as committed in git. It does not use
'grep'. It uses 'compare' (from tests/init.sh).
Bruno
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Tue, 26 Aug 2025 09:48:03 GMT)
Message #17 received at 79300 <at> debbugs.gnu.org (full text, mbox):
On 26/08/2025 03:24, Collin Funk wrote:
> Bruno Haible via GNU coreutils Bug Reports <bug-coreutils <at> gnu.org>
> writes:
>
>> Today's CI run reports
>> FAIL: tests/fold/fold-nbsp
>> on NetBSD 10 and Solaris 11.4.
>>
>> The log output in both cases is:
>>
>> FAIL: tests/fold/fold-nbsp
>> ==========================
>>
>> --- exp1 2025-08-24 06:57:52.605590760 +0000
>> +++ out1 2025-08-24 06:57:52.607333160 +0000
>> @@ -1,3 +1,3 @@
>> abcdefghij
>> -klmnop qrs
>> -tuvwxyz
>> +klmnop
>> +qrstuvwxyz
>> --- exp2 2025-08-24 06:57:52.613841250 +0000
>> +++ out2 2025-08-24 06:57:52.615577504 +0000
>> @@ -1,3 +1,3 @@
>> abcdefghij
>> -klmnop qr
>> -stuvwxyz
>> +klmnop
>> +qrstuvwxyz
>> FAIL tests/fold/fold-nbsp.sh (exit status: 1)
>>
>>
>> It looks like the character has been treated like a space.
>> If you need a correction at the Gnulib level (in the functions iswblank,
>> iswspace, c32isblank, or c32isspace), please report this to bug-gnulib.
>> Alternatively, you may declare this a "quality of implementation" issue
>> and simply disable the test on NetBSD and Solaris.
>
> Thanks.
>
> My initial idea was to check if U+2007 FIGURE SPACE and U+00A0 NO-BREAK
> SPACE are blank using grep. But apparently Solaris grep does not handle
> multibyte characters. Therefore, FIGURE SPACE cannot be checked. :(
Perhaps the techniques from tests/wc/wc-nbsp.sh could be used?
Maybe something like:
  check_space() {
    char="$1"
    # Use -L to determine whether NBSP is printable.
    # FreeBSD 11 and OS X treat NBSP as non printable ?
    test "$(env printf "=$char=" | wc -L)" = 3 &&
    test $(env printf "=$char=" | wc -w) = 2
  }

  if check_space '\u2007'; then
    ...
  fi
cheers,
Padraig
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Fri, 29 Aug 2025 03:42:02 GMT)
Message #23 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Bruno Haible <bruno <at> clisp.org> writes:
> Collin Funk wrote:
>> My initial idea was to check if U+2007 FIGURE SPACE and U+00A0 NO-BREAK
>> SPACE are blank using grep. But apparently Solaris grep does not handle
>> multibyte characters. Therefore, FIGURE SPACE cannot be checked. :(
>
> I'm not sure we are talking about the same thing. I reported the test
> failure from the fold-nbsp test, as committed in git. It does not use
> 'grep'. It uses 'compare' (from tests/init.sh).
Sorry, I should have been clearer.
I was trying to think of a workaround, i.e. check if U+2007 is
considered blank or not. And then decide if the test should be skipped.
Collin
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Fri, 29 Aug 2025 03:43:02 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Fri, 29 Aug 2025 04:24:05 GMT)
Message #29 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady <P <at> draigBrady.com> writes:
> Perhaps the techniques from tests/wc/wc-nbsp.sh could be used?
> Maybe something like:
>
> check_space() {
> char="$1"
> # Use -L to determine whether NBSP is printable.
> # FreeBSD 11 and OS X treat NBSP as non printable ?
> test "$(env printf "=$char=" | wc -L)" = 3 &&
> test $(env printf "=$char=" | wc -w) = 2
> }
>
> if check_space '\u2007'; then
> ...
> fi
Thanks for the suggestion, but that doesn't work. Any issue with
skipping based on $host_os for this test and for fold-spaces.sh?
I was thinking of testing "printf '\u00A0' | ./src/tr -d '[:blank:]'"
but that won't work since 'tr' operates on bytes and U+00A0 is
represented as 0xc2 0xa0 in UTF-8.
Collin
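The byte-level point above can be checked with a small UTF-8 encoder. This is an illustrative sketch (utf8_encode is a hypothetical helper, not a coreutils or gnulib function): it shows that U+00A0 becomes the two bytes 0xC2 0xA0 and U+2007 becomes three bytes, so a filter that classifies one byte at a time never sees the character as a unit.

```c
#include <stddef.h>
#include <uchar.h>

/* Encode code point C as UTF-8 into BUF, which must hold 4 bytes.
   Returns the number of bytes written, or 0 if C is a surrogate or
   lies beyond U+10FFFF.  */
static size_t
utf8_encode (char32_t c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      if (0xD800 <= c && c <= 0xDFFF)
        return 0;  /* surrogates are not encodable */
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c < 0x110000)
    {
      buf[0] = (unsigned char) (0xF0 | (c >> 18));
      buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;
}
```

Neither 0xC2 nor 0xA0 is a blank byte on its own in a UTF-8 locale, which is why a byte-oriented '[:blank:]' match cannot answer the question for U+00A0.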
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Fri, 29 Aug 2025 12:48:02 GMT)
Message #32 received at 79300 <at> debbugs.gnu.org (full text, mbox):
On 29/08/2025 05:23, Collin Funk wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
>
>> Perhaps the techniques from tests/wc/wc-nbsp.sh could be used?
>> Maybe something like:
>>
>> check_space() {
>> char="$1"
>> # Use -L to determine whether NBSP is printable.
>> # FreeBSD 11 and OS X treat NBSP as non printable ?
>> test "$(env printf "=$char=" | wc -L)" = 3 &&
>> test $(env printf "=$char=" | wc -w) = 2
>> }
>>
>> if check_space '\u2007'; then
>> ...
>> fi
>
> Thanks for the suggestion, but that doesn't work. Any issue with
> skipping based on $host_os for this test and for fold-spaces.sh?
>
> I was thinking of testing "printf '\u00A0' | ./src/tr -d '[:blank:]'"
> but that won't work since 'tr' operates on bytes and U+00A0 is
> represented as 0xc2 0xa0 in UTF-8.
Oh right, sorry. wc has its own iswnbspace,
whereas fold essentially relies on the system iswblank.
That means you could correlate with uniq though. Something like:
  isblank() { test $(printf "a$1a\nb$1b\n" | uniq -f1 | wc -l) = 2; }
  if ! isblank '\u2007'; then
    # can test '\u2007' is treated as non breaking space
  fi
That would be a preferable way to gate the test.
Though I'm thinking now we should adjust fold(1) a little
to ensure that we consistently avoid breaking at nbsp across systems.
I.e. move/rename iswnbspace() from wc.c to src/system.h
and use it in fold (and wc) to give consistent behavior.
I.e. fold would use: c32isblank() && ! c32isnbspace(),
and the test would stay as is.
cheers,
Padraig
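The proposed predicate can be sketched in isolation as follows. All names here are hypothetical stand-ins: sketch_isnbspace plays the role of c32isnbspace() and its list of code points is an assumption for illustration, and permissive_isblank models a system whose blank class includes the no-break spaces. The point is only that the extra "not nbsp" term makes the result independent of what the system says about those characters.

```c
#include <stdbool.h>
#include <uchar.h>

/* Stand-in for c32isnbspace().  The code points listed are an
   assumption made for this sketch.  */
static bool
sketch_isnbspace (char32_t c)
{
  return (c == 0x00A0      /* NO-BREAK SPACE */
          || c == 0x2007   /* FIGURE SPACE */
          || c == 0x202F   /* NARROW NO-BREAK SPACE */
          || c == 0x2060); /* WORD JOINER */
}

/* Models a libc whose blank class includes the no-break spaces.  */
static bool
permissive_isblank (char32_t c)
{
  return c == U' ' || c == U'\t' || sketch_isnbspace (c);
}

/* The combined test: C is a break candidate for 'fold -s' only if
   the system calls it blank and it is not a no-break space.  */
static bool
fold_can_break_at (char32_t c, bool (*isblank_fn) (char32_t))
{
  return isblank_fn (c) && ! sketch_isnbspace (c);
}
```

Even with the permissive blank class, fold_can_break_at rejects U+00A0 and U+2007 while still accepting space and tab, so the existing test expectations would hold on every system.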
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Sat, 30 Aug 2025 03:28:03 GMT)
Message #35 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady <P <at> draigBrady.com> writes:
>> Thanks for the suggestion, but that doesn't work. Any issue with
>> skipping based on $host_os for this test and for fold-spaces.sh?
>> I was thinking of testing "printf '\u00A0' | ./src/tr -d
>> '[:blank:]'"
>> but that won't work since 'tr' operates on bytes and U+00A0 is
>> represented as 0xc2 0xa0 in UTF-8.
>
> Oh right, sorry. wc has its own iswnbspace,
> whereas fold essentially relies on the system iswblank.
>
> That means you could correlate with uniq though. Something like:
>
> isblank() { test $(printf "a$1a\nb$1b\n" | uniq -f1 | wc -l) = 2; }
> if ! isblank '\u2007'; then
> # can test '\u2007' is treated as non breaking space
> fi
>
> That would be a preferable way to gate the test.
>
> Though I'm thinking now we should adjust fold(1) a little
> to ensure we don't break with nbsp consistently across systems.
> I.e. move/rename iswnbspace() from wc.c to src/system.h
> and use it in fold (and wc) to give consistent behavior.
> I.e. fold would use: c32isblank() && ! c32isnbspace(),
> and the test would stay as is.
Thanks, I forgot about that function. That sounds like a good idea to
me. We can be nice to people who do not use glibc.
We will have to hoist the 'posixly_correct' check out of it first,
though. Technically POSIX says that 'fold -s' should only break at
<blank> characters. But I would rather avoid adding more
getenv ("POSIXLY_CORRECT") to programs that do not yet have them.
Collin
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Sat, 30 Aug 2025 10:25:02 GMT)
Message #38 received at 79300 <at> debbugs.gnu.org (full text, mbox):
On 30/08/2025 04:27, Collin Funk wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
>
>>> Thanks for the suggestion, but that doesn't work. Any issue with
>>> skipping based on $host_os for this test and for fold-spaces.sh?
>>> I was thinking of testing "printf '\u00A0' | ./src/tr -d
>>> '[:blank:]'"
>>> but that won't work since 'tr' operates on bytes and U+00A0 is
>>> represented as 0xc2 0xa0 in UTF-8.
>>
>> Oh right, sorry. wc has its own iswnbspace,
>> whereas fold essentially relies on the system iswblank.
>>
>> That means you could correlate with uniq though. Something like:
>>
>> isblank() { test $(printf "a$1a\nb$1b\n" | uniq -f1 | wc -l) = 2; }
>> if ! isblank '\u2007'; then
>> # can test '\u2007' is treated as non breaking space
>> fi
>>
>> That would be a preferable way to gate the test.
>>
>> Though I'm thinking now we should adjust fold(1) a little
>> to ensure we don't break with nbsp consistently across systems.
>> I.e. move/rename iswnbspace() from wc.c to src/system.h
>> and use it in fold (and wc) to give consistent behavior.
>> I.e. fold would use: c32isblank() && ! c32isnbspace(),
>> and the test would stay as is.
>
> Thanks, I forgot about that function. That sounds like a good idea to
> me. We can be nice to people who do not use glibc.
>
> We will have to hoist the 'posixly_correct' check out of it before
> though. Technically POSIX says that 'fold -s' should only break at
> <blank> characters. But I rather avoid adding more
> getenv ("POSIXLY_CORRECT") to programs that do not yet have them.
Yes I agree that fold should not depend on POSIXLY_CORRECT,
so c32isnbspace() should only look at the passed char.
cheers,
Padraig
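One way to picture the hoisting discussed above: the shared predicate depends only on its argument, and the environment check lives in a caller-side wrapper. This is a sketch under assumptions, not the pushed patch. The names sketch_c32isnbspace and wc_counts_as_nbspace are hypothetical, as is the list of code points.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <uchar.h>

/* Shared predicate: looks only at the character passed in.  */
static bool
sketch_c32isnbspace (char32_t c)
{
  return c == 0x00A0 || c == 0x2007 || c == 0x202F || c == 0x2060;
}

/* Caller-side state, meant to be set once at program startup.  */
static bool posixly_correct;

static void
init_posixly_correct (void)
{
  posixly_correct = getenv ("POSIXLY_CORRECT") != NULL;
}

/* Hypothetical wrapper for a program that does honor POSIXLY_CORRECT:
   no-break spaces get special treatment only when the variable is unset.  */
static bool
wc_counts_as_nbspace (char32_t c)
{
  return ! posixly_correct && sketch_c32isnbspace (c);
}
```

A program like fold would call the shared predicate directly and never consult the environment, while a program that already checks POSIXLY_CORRECT keeps that policy in its own wrapper.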
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Wed, 03 Sep 2025 04:05:02 GMT)
Message #41 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady <P <at> draigBrady.com> writes:
>> Thanks, I forgot about that function. That sounds like a good idea
>> to
>> me. We can be nice to people who do not use glibc.
>> We will have to hoist the 'posixly_correct' check out of it before
>> though. Technically POSIX says that 'fold -s' should only break at
>> <blank> characters. But I rather avoid adding more
>> getenv ("POSIXLY_CORRECT") to programs that do not yet have them.
>
> Yes I agree that fold should not depend on POSIXLY_CORRECT,
> so c32isnbspace() should only look at the passed char.
This patch should do the trick. It fixes it on Solaris 11.4 (cfarm215).
I couldn't reproduce the failure seen on the CI machines in my NetBSD 10
VM. But I see no reason why this fix wouldn't work there too.
Will push it tomorrow.
Collin
[0001-fold-check-that-characters-are-not-non-breaking-spac.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Wed, 03 Sep 2025 07:24:01 GMT)
Message #44 received at 79300 <at> debbugs.gnu.org (full text, mbox):
On 03/09/2025 05:04, Collin Funk wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
>
>>> Thanks, I forgot about that function. That sounds like a good idea
>>> to
>>> me. We can be nice to people who do not use glibc.
>>> We will have to hoist the 'posixly_correct' check out of it before
>>> though. Technically POSIX says that 'fold -s' should only break at
>>> <blank> characters. But I rather avoid adding more
>>> getenv ("POSIXLY_CORRECT") to programs that do not yet have them.
>>
>> Yes I agree that fold should not depend on POSIXLY_CORRECT,
>> so c32isnbspace() should only look at the passed char.
>
> This patch should do the trick. It fixes it on Solaris 11.4 (cfarm215).
> I couldn't reproduce the failure seen on the CI machines in my NetBSD 10
> VM. But I see no reason why this fix wouldn't work there too.
>
> Will push it tomorrow.
I would have left iswnbspace() in wc.c, calling into c32isnbspace(),
otherwise the double negative with posixly_correct is awkward.
Anyway the logic looks good.
thanks!
Padraig
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Thu, 04 Sep 2025 02:27:02 GMT)
Message #47 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady <P <at> draigBrady.com> writes:
>> This patch should do the trick. It fixes it on Solaris 11.4
>> (cfarm215).
>> I couldn't reproduce the failure seen on the CI machines in my NetBSD 10
>> VM. But I see no reason why this fix wouldn't work there too.
>> Will push it tomorrow.
>
> I would have left iswnbspace() in wc.c, calling into c32isnbspace(),
> otherwise the double negative with posixly_correct is awkward.
> Anyway the logic looks good.
I was about 50/50 whether the double negation was too ugly to use. :)
I'll leave the function there, but name it maybe_c32isnbspace(), since I
don't want the function to be misunderstood as a wchar_t function.
Pushed the attached two patches. The second fixes a 'make syntax-check'
failure. Will close this bug now.
Collin
P.S. I actually just noticed this unchanged hunk in my diff:
$ git ls-files | grep -E '\.[ch]' | xargs grep -F 'isw'
src/wc.c: in_word2 = (! iswspace (wide_char)
Okay to change this one to the c32 variant?
Collin
[0001-fold-check-that-characters-are-not-non-breaking-spac.patch (text/x-patch, attachment)]
[0002-maint-avoid-syntax-check-failure-from-previous-commi.patch (text/x-patch, attachment)]
Added tag(s) fixed. Request was from Collin Funk <collin.funk1 <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 04 Sep 2025 02:29:02 GMT)
bug closed, send any further explanations to 79300 <at> debbugs.gnu.org and Bruno Haible <bruno <at> clisp.org>. Request was from Collin Funk <collin.funk1 <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 04 Sep 2025 02:29:02 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Thu, 04 Sep 2025 17:05:02 GMT)
Message #54 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Hi Collin,
> Pushed the attached two patches. The second fixes a 'make syntax-check'
> failure. Will close this bug now.
Thanks. I confirm (via today's CI run) that bug #79300 is fixed.
> P.S. I actually just noticed this unchanged hunk in my diff:
>
> $ git ls-files | grep -E '\.[ch]' | xargs grep -F 'isw'
> src/wc.c: in_word2 = (! iswspace (wide_char)
>
> Okay to change this one to the c32 variant?
Yes. Since 'wide_char' is of type char32_t and produced by mbrtoc32,
this line should call c32isspace, not iswspace. [1]
Bruno
[1] https://www.gnu.org/software/gnulib/manual/html_node/Comparison-of-character-APIs.html
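To illustrate the pairing Bruno describes, here is a minimal word counter that decodes with mbrtoc32() and classifies the resulting char32_t with a char32_t-based predicate. It is a sketch only: sketch_c32isspace is a hypothetical stand-in for gnulib's c32isspace(), restricted to ASCII white space so that it does not depend on locale tables, and count_words is not the wc implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Stand-in for c32isspace(), limited to ASCII white space.  */
static bool
sketch_c32isspace (char32_t c)
{
  return c == U' ' || (U'\t' <= c && c <= U'\r');
}

/* Count white-space-separated words in the LEN bytes at S, decoding
   with mbrtoc32() in the current locale.  Stops at the first invalid
   or incomplete sequence.  */
static size_t
count_words (char const *s, size_t len)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t words = 0;
  bool in_word = false;

  while (len > 0)
    {
      char32_t c;
      size_t n = mbrtoc32 (&c, s, len, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        break;
      if (n == (size_t) -3)
        n = 0;  /* C came from earlier input; no bytes consumed */
      else if (n == 0)
        n = 1;  /* a NUL byte was decoded */

      /* C is a char32_t, so it goes to a char32_t classifier,
         not to a wchar_t one such as iswspace().  */
      bool space = sketch_c32isspace (c);
      if (! space && ! in_word)
        words++;
      in_word = ! space;

      s += n;
      len -= n;
    }
  return words;
}
```

On platforms where wchar_t is not a 32-bit Unicode type, passing a char32_t value to iswspace() can classify the wrong character, which is the reason for keeping the decoder and the classifier in the same API family.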
Information forwarded to bug-coreutils <at> gnu.org: bug#79300; Package coreutils. (Fri, 05 Sep 2025 04:26:02 GMT)
Message #57 received at 79300 <at> debbugs.gnu.org (full text, mbox):
Bruno Haible <bruno <at> clisp.org> writes:
>> P.S. I actually just noticed this unchanged hunk in my diff:
>>
>> $ git ls-files | grep -E '\.[ch]' | xargs grep -F 'isw'
>> src/wc.c: in_word2 = (! iswspace (wide_char)
>>
>> Okay to change this one to the c32 variant?
>
> Yes. Since 'wide_char' is of type char32_t and produced by mbrtoc32,
> this line should call c32isspace, not iswspace. [1]
Pushed the attached, thanks.
Collin
[0001-maint-prefer-c32isspace-to-iswspace.patch (text/x-patch, attachment)]
This bug report was last modified 9 days ago.