GNU bug report logs - #32236
df header corrupted with LANG=zh_TW.UTF-8 on macOS


Package: coreutils

Reported by: Chih-Hsuan Yen <yan12125 <at> gmail.com>

Date: Sat, 21 Jul 2018 16:10:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Bruno Haible <bruno <at> clisp.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Chih-Hsuan Yen <yan12125 <at> gmail.com>, bug-gnulib <bug-gnulib <at> gnu.org>, Pádraig Brady <P <at> draigbrady.com>, 32236-done <at> debbugs.gnu.org
Subject: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Fri, 27 Jul 2018 11:38:22 +0200

Paul Eggert wrote:
> my earlier patch 
> neglected the possibility that mbrtowc can return 0

I wouldn't see this as a bug: You can assume that mbrtowc returns
0 if and only if the multibyte sequence is a NUL byte - but you had
chosen srcend in such a way that this would not happen in the loop.
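
To illustrate the point about mbrtowc returning 0, here is a minimal sketch
of a bounded conversion loop. It is not the code from the patch under
discussion: the function name count_chars and the policy of treating an
invalid or incomplete sequence as a single byte are assumptions made for
this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from the patch): count the characters in
   the byte range [src, src + len) using the current locale.  */
size_t
count_chars (const char *src, size_t len)
{
  const char *srcend = src + len;
  mbstate_t state;
  size_t count = 0;

  memset (&state, 0, sizeof state);
  while (src < srcend)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, src, srcend - src, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence.  Assumption for this
             sketch: skip one byte and reset the shift state.  */
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else if (n == 0)
        {
          /* mbrtowc converted a NUL byte and stored L'\0'.  The NUL
             still occupies one byte of input.  */
          n = 1;
        }

      /* Never step past the end of the range.  */
      assert (n <= (size_t) (srcend - src));
      src += n;
      count++;
    }
  return count;
}
```

If the caller picks srcend so that the range holds no NUL byte, the
n == 0 branch is never taken. It only matters for ranges such as
count_chars ("a\0b", 3), which contain an embedded NUL.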

> and it incorrectly assumed 
> wide control characters always have a single-byte representation.

Oops, you're right. My mistake as well.

The new patch looks good.

This will catch (and replace with '?') U+2028 and U+2029 on glibc systems.
On macOS, it will not do this, because iswcntrl(0x2028) and iswcntrl(0x2029)
both return 0 on this system; this is consistent with the fact that the
'Terminal' program displays these characters as simple spaces. So, no need
to override iswcntrl on macOS.
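
A minimal sketch of the replacement behaviour described above, for anyone
who wants to compare platforms. The function name replace_wide_controls is
hypothetical and this is not the code from the patch. Whether U+2028 and
U+2029 get replaced depends on what iswcntrl returns for them in the
current locale, so no result is claimed here for those two characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not from the patch): replace every wide
   character that iswcntrl classifies as a control character with
   L'?'.  Returns the number of characters replaced.  */
size_t
replace_wide_controls (wchar_t *ws)
{
  size_t replaced = 0;

  assert (ws != NULL);
  for (; *ws != L'\0'; ws++)
    {
      /* The classification is locale- and platform-dependent for
         characters such as U+2028 and U+2029.  */
      if (iswcntrl ((wint_t) *ws))
        {
          *ws = L'?';
          replaced++;
        }
    }
  return replaced;
}
```

To check a given system, call setlocale (LC_ALL, "") under a UTF-8 locale,
then pass a buffer containing (wchar_t) 0x2028 and see whether it comes
back as L'?'.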

Bruno


2018-07-27  Bruno Haible  <bruno <at> clisp.org>

	iswcntrl: Mention minor problem on macOS.
	* doc/posix-functions/iswcntrl.texi: Mention oddity on macOS.

diff --git a/doc/posix-functions/iswcntrl.texi b/doc/posix-functions/iswcntrl.texi
index 99eaa0e..44dd034 100644
--- a/doc/posix-functions/iswcntrl.texi
+++ b/doc/posix-functions/iswcntrl.texi
@@ -25,4 +25,8 @@ Portability problems not fixed by Gnulib:
 @item
 On AIX and Windows platforms, @code{wchar_t} is a 16-bit type and therefore cannot
 accommodate all Unicode characters.
+@item
+This function returns 0 for U+2028 (LINE SEPARATOR) and
+U+2029 (PARAGRAPH SEPARATOR) on some platforms:
+Mac OS X 10.13.
 @end itemize





This bug report was last modified 6 years and 160 days ago.
