GNU bug report logs - #32236
df header corrupted with LANG=zh_TW.UTF-8 on macOS

Package: coreutils;

Reported by: Chih-Hsuan Yen <yan12125 <at> gmail.com>

Date: Sat, 21 Jul 2018 16:10:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Chih-Hsuan Yen <yan12125 <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-gnulib <bug-gnulib <at> gnu.org>, Pádraig Brady <P <at> draigbrady.com>, 32236 <at> debbugs.gnu.org
Subject: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Mon, 23 Jul 2018 00:09:45 +0800
2018-07-22 23:12 GMT+08:00 Paul Eggert <eggert <at> cs.ucla.edu>:
> Pádraig Brady wrote:
>>
>> I've also attached an alternative patch for df (in your name).
>
>
> That still has problems, since it can generate improperly-encoded strings in
> UTF-8 locales (if the inputs are improperly encoded), and can replace parts
> of multibyte characters with '?' in non-UTF-8 locales. Please try the
> attached patch instead, which attempts to address these issues. This is more
> along the lines that Bruno suggested, except it doesn't use mbsiter as I
> figured it was simpler overall just to use mbrtowc directly for this one
> thing.
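The approach Paul describes, walking the string one character at a time with mbrtowc, can be illustrated with a small sketch. The function below is hypothetical (sanitize_cell is an illustrative name, and this is not the attached patch). It copies each valid character whole and writes a single '?' for each byte that does not begin a valid character in the current LC_CTYPE locale, so malformed input cannot produce malformed output and no multibyte character is ever partly replaced. As an extra assumption for illustration, it also replaces control characters (as reported by iswcntrl) with '?'.

```c
#include <assert.h>
#include <locale.h>   /* callers pick the LC_CTYPE locale via setlocale */
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not the coreutils implementation: rewrite CELL
   in place so it holds only valid, non-control characters of the
   current LC_CTYPE locale.  Valid characters are copied whole; each
   byte that does not start a valid character becomes one '?'; each
   control character becomes one '?'.  */
static void
sanitize_cell (char *cell)
{
  assert (cell != NULL);
  char const *src = cell;
  char const *srcend = cell + strlen (cell);
  char *dst = cell;
  mbstate_t state;
  memset (&state, 0, sizeof state);

  while (src < srcend)
    {
      wchar_t wc;
      size_t avail = srcend - src;
      size_t n = mbrtowc (&wc, src, avail, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or truncated sequence: replace one byte, then reset
             the conversion state so decoding resynchronizes.  */
          *dst++ = '?';
          src++;
          memset (&state, 0, sizeof state);
        }
      else if (iswcntrl (wc))
        {
          /* Control character (never NUL, since SRC < SRCEND):
             replace the whole character with a single '?'.  */
          *dst++ = '?';
          src += n;
        }
      else
        {
          /* Valid printable character: copy all N bytes unchanged.
             DST never overtakes SRC, so an in-place memmove is safe.  */
          memmove (dst, src, n);
          dst += n;
          src += n;
        }
    }
  *dst = '\0';
}
```

Because replacement decisions are made per decoded character rather than per byte, the lead and continuation bytes of a character such as 檔 are never inspected in isolation. Whether a character like U+2028 counts as a control character depends on the platform's locale data, so this sketch may or may not pass it through unchanged.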

Here's the result of df:

$ df
檔案系統        容量  已用  可用 已用 掛載點
/dev/disk1s1    234G  137G   95G  60% /
/dev/disk1s4    234G  2.1G   95G   3% /private/var/vm
chyen.cc:        25G   12G   12G  51% /private/tmp/abc def ghi

$ df | xxd
00000000: e6aa 94e6 a188 e7b3 bbe7 b5b1 2020 2020  ............
00000010: 2020 2020 e5ae b9e9 878f 2020 e5b7 b2e7      ......  ....
00000020: 94a8 2020 e58f afe7 94a8 20e5 b7b2 e794  ..  ...... .....
00000030: a820 e68e 9be8 bc89 e9bb 9e0a 2f64 6576  . ........../dev
00000040: 2f64 6973 6b31 7331 2020 2020 3233 3447  /disk1s1    234G
00000050: 2020 3133 3747 2020 2039 3547 2020 3630    137G   95G  60
00000060: 2520 2f0a 2f64 6576 2f64 6973 6b31 7334  % /./dev/disk1s4
00000070: 2020 2020 3233 3447 2020 322e 3147 2020      234G  2.1G
00000080: 2039 3547 2020 2033 2520 2f70 7269 7661   95G   3% /priva
00000090: 7465 2f76 6172 2f76 6d0a 6368 7965 6e2e  te/var/vm.chyen.
000000a0: 6363 3a20 2020 2020 2020 2032 3547 2020  cc:        25G
000000b0: 2031 3247 2020 2031 3247 2020 3531 2520   12G   12G  51%
000000c0: 2f70 7269 7661 7465 2f74 6d70 2f61 6263  /private/tmp/abc
000000d0: e280 a864 6566 e280 a967 6869 0a         ...def...ghi.

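The Chinese header names (檔案系統, 容量, 已用, 可用, 已用, 掛載點; in English:
Filesystem, Size, Used, Avail, Use%, Mounted on) are now correct, and
U+2028 and U+2029 in the last mount point are written as-is (bytes
e2 80 a8 and e2 80 a9 in the dump). All tests were run with
LANG=zh_TW.UTF-8, LC_COLLATE=C and LC_CTYPE=zh_TW.UTF-8.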
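Reading the dump by hand relies on UTF-8's three-byte layout, 1110xxxx 10xxxxxx 10xxxxxx, which covers U+0800 through U+FFFF. The helper below is a hypothetical illustration (decode_utf8_3 is not part of coreutils or gnulib). It joins the 4 + 6 + 6 payload bits, showing that e2 80 a8 and e2 80 a9 decode to U+2028 and U+2029, and that the header's first three bytes, e6 aa 94, decode to U+6A94 (檔).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode a three-byte UTF-8 sequence into its
   code point, or return UINT32_MAX if the bytes lack the
   1110xxxx 10xxxxxx 10xxxxxx shape.  Overlong forms and surrogates
   are not rejected; this only checks bytes copied from a dump.  */
static uint32_t
decode_utf8_3 (unsigned char b0, unsigned char b1, unsigned char b2)
{
  if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
    return UINT32_MAX;
  /* 4 payload bits from the lead byte, 6 from each continuation byte.  */
  return ((uint32_t) (b0 & 0x0F) << 12)
         | ((uint32_t) (b1 & 0x3F) << 6)
         | (uint32_t) (b2 & 0x3F);
}

/* Usage: the byte triples taken from the xxd output above.  */
static void
check_dump_bytes (void)
{
  assert (decode_utf8_3 (0xe2, 0x80, 0xa8) == 0x2028);  /* LINE SEPARATOR */
  assert (decode_utf8_3 (0xe2, 0x80, 0xa9) == 0x2029);  /* PARAGRAPH SEPARATOR */
  assert (decode_utf8_3 (0xe6, 0xaa, 0x94) == 0x6A94);  /* first header char */
}
```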




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.