From unknown Sun Sep 07 03:08:26 2025
From: bug#32236 <32236@debbugs.gnu.org>
To: bug#32236 <32236@debbugs.gnu.org>
Subject: Status: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Reply-To: bug#32236 <32236@debbugs.gnu.org>
Date: Sun, 07 Sep 2025 10:08:26 +0000

retitle 32236 df header corrupted with LANG=zh_TW.UTF-8 on macOS
reassign 32236 coreutils
submitter 32236 Chih-Hsuan Yen
severity 32236 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 21 12:09:53 2018
From: Chih-Hsuan Yen
Date: Sat, 21 Jul 2018 22:20:04 +0800
Subject: df header corrupted with LANG=zh_TW.UTF-8 on macOS
To: bug-coreutils@gnu.org

Hi coreutils developers,

I'm using coreutils on macOS High Sierra (10.13). I noticed that with
`LANG=zh_TW.UTF-8`, `df` output is corrupted.

�?�?系統 容�?? 已�?� �?��?� 已�?�% �??�?�?
/dev/disk1s1 234G 151G 81G 65% /
/dev/disk1s4 234G 2.1G 81G 3% /private/var/vm

(I'm not sure if other mail agents can display those characters
correctly or not. See my blog post [1] for the exact output.)

Seems it's similar to bug#25630 [2], which is not resolved. I guess
the reason of my issue is that iscntrl() is broken on macOS High
Sierra, so in hide_problematic_chars(), some bytes in the Chinese
header is replaced with a question mark. I managed to patch coreutils
[3] to make `df` work. Could you have a look? Thanks!

Best,

Chih-Hsuan Yen

[1] https://blog.chyen.cc/posts/2018/06/23/mac-df-chinese.html
[2] http://lists.gnu.org/archive/html/bug-coreutils/2017-02/msg00008.html
[3] https://github.com/yan12125/macports-ports/blob/fix-coreutils-df-chinese/sysutils/coreutils/files/patch-df.diff

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 21 16:30:31 2018
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
To: Chih-Hsuan Yen, 32236@debbugs.gnu.org, bug-gnulib
From: Pádraig Brady
Message-ID: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com>
Date: Sat, 21 Jul 2018 13:30:25 -0700
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------FB6C10D6334910DB3F5038B1"

This is a multi-part message in MIME format.
--------------FB6C10D6334910DB3F5038B1
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 21/07/18 07:20, Chih-Hsuan Yen wrote:
> Hi coreutils developers,
>
> I'm using coreutils on macOS High Sierra (10.13). I noticed that with
> `LANG=zh_TW.UTF-8`, `df` output is corrupted.
>
> �?�?系統 容�?? 已�?� �?��?� 已�?�% �??�?�?
> /dev/disk1s1 234G 151G 81G 65% /
> /dev/disk1s4 234G 2.1G 81G 3% /private/var/vm
>
> (I'm not sure if other mail agents can display those characters
> correctly or not. See my blog post [1] for the exact output.)
>
> Seems it's similar to bug#25630 [2], which is not resolved. I guess
> the reason of my issue is that iscntrl() is broken on macOS High
> Sierra, so in hide_problematic_chars(), some bytes in the Chinese
> header is replaced with a question mark. I managed to patch coreutils
> [3] to make `df` work. Could you have a look? Thanks!
>
> Best,
>
> Chih-Hsuan Yen
>
> [1] https://blog.chyen.cc/posts/2018/06/23/mac-df-chinese.html
> [2] http://lists.gnu.org/archive/html/bug-coreutils/2017-02/msg00008.html
> [3] https://github.com/yan12125/macports-ports/blob/fix-coreutils-df-chinese/sysutils/coreutils/files/patch-df.diff

Wow. That's surprising.
I do see the FreeBSD man pages say:

  "The 4.4BSD extension of accepting arguments outside of the range of the
  unsigned char type in locales with large character sets is considered
  obsolete and may not be supported in future releases."

Now I think that might have been referring to >= 0xFF, but fair enough.

I've attached a gnulib patch to document for iscntrl at least.
It would be great if someone could test the other is*() classification
functions on macOS so that I might have a more complete documentation patch.

I've also attached an alternative patch for df (in your name).
Can you try that one?

thanks!
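
A rough way to run the check requested above might look like the following sketch. It is not part of the original thread: `count_high_bytes` and `report_high_bytes` are hypothetical helper names, and only standard `ctype.h` and `stdio.h` calls are used. Whether byte values >= 0x80 are valid arguments at all in a UTF-8 locale is open to interpretation, so the counts describe one platform's behaviour rather than anything guaranteed.

```c
#include <ctype.h>
#include <stdio.h>

/* Count the byte values in 0x80..0xFF that CLASSIFY reports as members
   of its class in the current LC_CTYPE locale.  */
int
count_high_bytes (int (*classify) (int))
{
  int count = 0;
  for (int c = 0x80; c <= 0xFF; c++)
    if (classify (c))
      count++;
  return count;
}

/* Print one summary line per <ctype.h> classification function.  */
void
report_high_bytes (FILE *out)
{
  static const struct { const char *name; int (*fn) (int); } tests[] = {
    { "isalnum", isalnum }, { "isalpha", isalpha }, { "iscntrl", iscntrl },
    { "isdigit", isdigit }, { "isgraph", isgraph }, { "islower", islower },
    { "isprint", isprint }, { "ispunct", ispunct }, { "isspace", isspace },
    { "isupper", isupper }, { "isxdigit", isxdigit },
  };

  for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++)
    fprintf (out, "%-8s %3d of 128 byte values >= 0x80 accepted\n",
             tests[i].name, count_high_bytes (tests[i].fn));
}
```

To try it, call setlocale (LC_ALL, "") from a small main() run with LANG=zh_TW.UTF-8 and then report_high_bytes (stdout). A non-zero count for iscntrl would be consistent with the corruption reported here on macOS.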

Pádraig

--------------FB6C10D6334910DB3F5038B1
Content-Type: text/x-patch; name="df-utf8-osx.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="df-utf8-osx.patch"

>From 6b7434fb222144af3ae9e2d4fd6b4c72eec25f5b Mon Sep 17 00:00:00 2001
From: Chih-Hsuan Yen
Date: Sat, 21 Jul 2018 13:19:23 -0700
Subject: [PATCH] df: avoid multibyte character corruption on macOS

* src/df.c (hide_problematic_chars): Use c_iscntrl() as passing 8 bit
characters to iscntrl() is not supported on macOS.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/32236
---
 NEWS     | 4 ++++
 src/df.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index af1a990..aa3b4f9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  df no longer corrupts displayed multibyte characters on macOS.
+
 
 * Noteworthy changes in release 8.30 (2018-07-01) [stable]
 
diff --git a/src/df.c b/src/df.c
index 1178865..c851fcc 100644
--- a/src/df.c
+++ b/src/df.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 #include "system.h"
 #include "canonicalize.h"
@@ -281,7 +282,7 @@ hide_problematic_chars (char *cell)
   char *p = cell;
   while (*p)
     {
-      if (iscntrl (to_uchar (*p)))
+      if (c_iscntrl (to_uchar (*p)))
         *p = '?';
       p++;
     }
-- 
2.9.3

--------------FB6C10D6334910DB3F5038B1
Content-Type: text/x-patch; name="osx-iscntrl-doc.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="osx-iscntrl-doc.patch"

>From 816cc0d5fb92552a551c523f49c829261731dfe8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?=
Date: Sat, 21 Jul 2018 13:15:13 -0700
Subject: [PATCH] iscntrl: document that macOS returns true for >= 0x80

* doc/posix-functions/iscntrl.texi: Mention that support for
chars >= 0x80 is not standarized, and not supported on OS X >= 10.5 at least
---
 doc/posix-functions/iscntrl.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/posix-functions/iscntrl.texi b/doc/posix-functions/iscntrl.texi
index 7e6813f..3c15708 100644
--- a/doc/posix-functions/iscntrl.texi
+++ b/doc/posix-functions/iscntrl.texi
@@ -16,6 +16,9 @@ OS X 10.8.
 
 Portability problems not fixed by Gnulib:
 @itemize
+This function does not support arguments outside of the range of the
+unsigned char type in locales with large character sets, on some platforms.
+OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales.
 @end itemize
 
 Note: This function's behaviour depends on the locale, but does not support
-- 
2.9.3

--------------FB6C10D6334910DB3F5038B1--

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 21 18:43:46 2018
From: Bruno Haible
To: bug-gnulib@gnu.org
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Sun, 22 Jul 2018 00:43:42 +0200
Message-ID: <1599384.GXLMD97vOh@omega>
In-Reply-To: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com>
References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com>
Cc: Chih-Hsuan Yen, Pádraig Brady, 32236@debbugs.gnu.org

Hi Pádraig,

> I've attached a gnulib patch to document for iscntrl at least.

> +This function does not support arguments outside of the range of the
> +unsigned char type in locales with large character sets, on some platforms.
> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales.

In UTF-8 locales, arguments >= 0x80 are invalid arguments for iscntrl().

POSIX [1] says
  "The c argument is a type int, the value of which the application shall
   ensure is a character representable as an unsigned char or equal to the
   value of the macro EOF. If the argument has any other value, the behavior
   is undefined."

The term "character" is defined here [2]:
  "A sequence of one or more bytes representing a single graphic symbol or
   control code."

So, in a UTF-8 locale, a "character representable as an unsigned char"
is a byte sequence of length 1, where the single byte has a value in the
range 0x00..0x7F.

For invalid values "the behavior is undefined." You were expecting a value 0.

Now, in the gnulib documentations, what we mention as portability problems
are the cases where
  - the behaviour for valid arguments is different on different platforms, or
  - the boundary between valid and invalid arguments is fuzzy and depends on
    the platform.
IMO there's no point in documenting that a function _really_ has undefined
behaviour when POSIX says that it has undefined behaviour.

> I've also attached an alternative patch for df (in your name).

This patch is correct (because the characters that you test for in c_iscntrl
are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte
character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).

But it does not catch control characters outside of the ASCII range. It would
make sense to catch these as well. If you want to do that,
'hide_problematic_chars' needs to be rewritten as a loop that iterates across
the multibyte characters. For example with the 'mbiter' module, in
combination with the mb_iscntrl function from the 'mbchar' module. Or
directly with mbrtowc() and iswcntrl().
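
As an illustration of the mbrtowc()/iswcntrl() approach mentioned above, here is a minimal sketch. It is not code from this thread or from coreutils: `hide_problematic_chars_mb` is a hypothetical name, and the choice to overwrite every byte of a control character with '?' (so the string length never changes) is an assumption made to keep the example short.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical multibyte-aware variant of hide_problematic_chars().
   Decode CELL one multibyte character at a time using the current
   LC_CTYPE locale and overwrite every byte of each control character
   with '?'.  Bytes that cannot be decoded are masked as well.  The
   string length is left unchanged.  */
void
hide_problematic_chars_mb (char *cell)
{
  assert (cell != NULL);

  char *p = cell;
  size_t left = strlen (cell);
  mbstate_t state;
  memset (&state, 0, sizeof state);

  while (left > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, p, left, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: mask one byte, reset the
             conversion state and resynchronize on the next byte.  */
          *p = '?';
          memset (&state, 0, sizeof state);
          n = 1;
        }
      else if (n == 0)
        {
          /* A decoded null character is not expected within strlen()
             bytes, but never advance by zero bytes.  */
          n = 1;
        }
      else if (iswcntrl ((wint_t) wc))
        memset (p, '?', n);

      p += n;
      left -= n;
    }
}
```

The caller is expected to have selected the locale with setlocale (LC_ALL, "") beforehand. Because each byte is masked separately, a three-byte UTF-8 control character would become three question marks if the platform's iswcntrl() classifies it as a control character; collapsing it to a single '?' would require shifting the rest of the cell.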

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/iscntrl.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_87

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 02:44:00 2018
In-Reply-To: <1599384.GXLMD97vOh@omega>
References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> <1599384.GXLMD97vOh@omega>
From: Chih-Hsuan Yen
Date: Sun, 22 Jul 2018 14:07:25 +0800
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
To: Bruno Haible
Cc: Pádraig Brady, bug-gnulib@gnu.org, 32236@debbugs.gnu.org

2018-07-22 6:43 GMT+08:00 Bruno Haible:
> Hi Pádraig,
>
>> I've attached a gnulib patch to document for iscntrl at least.
>
>> +This function does not support arguments outside of the range of the
>> +unsigned char type in locales with large character sets, on some platforms.
>> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales.
>
> In UTF-8 locales, arguments >= 0x80 are invalid arguments for iscntrl().
>
> POSIX [1] says
>   "The c argument is a type int, the value of which the application shall
>    ensure is a character representable as an unsigned char or equal to the
>    value of the macro EOF. If the argument has any other value, the behavior
>    is undefined."
>
> The term "character" is defined here [2]:
>   "A sequence of one or more bytes representing a single graphic symbol or
>    control code."
>
> So, in a UTF-8 locale, a "character representable as an unsigned char"
> is a byte sequence of length 1, where the single byte has a value in the
> range 0x00..0x7F.
>
> For invalid values "the behavior is undefined." You were expecting a value 0.
>
> Now, in the gnulib documentations, what we mention as portability problems
> are the cases where
>   - the behaviour for valid arguments is different on different platforms, or
>   - the boundary between valid and invalid arguments is fuzzy and depends on
>     the platform.
> IMO there's no point in documenting that a function _really_ has undefined
> behaviour when POSIX says that it has undefined behaviour.
>
>> I've also attached an alternative patch for df (in your name).
>
> This patch is correct (because the characters that you test for in c_iscntrl
> are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte
> character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings).
>
> But it does not catch control characters outside of the ASCII range. It would
> make sense to catch these as well. If you want to do that,
> 'hide_problematic_chars' needs to be rewritten as a loop that iterates across
> the multibyte characters. For example with the 'mbiter' module, in
> combination with the mb_iscntrl function from the 'mbchar' module. Or
> directly with mbrtowc() and iswcntrl().
>
> Bruno
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/iscntrl.html
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_87

The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me
if you want me to test other patches, thanks!

Cheers,

Chih-Hsuan Yen

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 06:47:09 2018
From: Bruno Haible
To: Chih-Hsuan Yen
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
Date: Sun, 22 Jul 2018 12:46:59 +0200
Message-ID: <2137390.TxkQqJaxDP@omega>
References: <1599384.GXLMD97vOh@omega>
Cc: Pádraig Brady, bug-gnulib@gnu.org, 32236@debbugs.gnu.org

Chih-Hsuan Yen wrote:
> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me
> if you want me to test other patches, thanks!

You could test how it behaves with mount points that contain U+2028 or
U+2029 characters. On Linux, I'd test it like this. Hope it's similar
on macOS:

$ mkdir /tmp/`printf 'abc\u2028def\u2029ghi'`
$ sudo mount -r -t iso9660 -o loop /some/iso/image.iso /tmp/abc*
$ df
...
/dev/loop0 1986048 1986048 0 100% /tmp/abc�def�ghi
$ ls -ld /tmp/abc*
dr-xr-xr-x 4 root root 2048 Nov 19 2014 /tmp/abc?def?ghi

Bruno

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 10:07:17 2018
In-Reply-To: <2137390.TxkQqJaxDP@omega>
References: <1599384.GXLMD97vOh@omega> <2137390.TxkQqJaxDP@omega>
From: Chih-Hsuan Yen
Date: Sun, 22 Jul 2018 20:51:16 +0800
Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS
To: Bruno Haible
Cc: Pádraig Brady, bug-gnulib@gnu.org, 32236@debbugs.gnu.org

2018-07-22 18:46 GMT+08:00 Bruno Haible:
> Chih-Hsuan Yen wrote:
>> The `c_iscntrl()` patch also fixes the issue on macOS. Please tell me
>> if you want me to test other patches, thanks!
>
> You could test how it behaves with mount points that contain U+2028 or
> U+2029 characters. On Linux, I'd test it like this. Hope it's similar
> on macOS:
> $ mkdir /tmp/`printf 'abc\u2028def\u2029ghi'`
> $ sudo mount -r -t iso9660 -o loop /some/iso/image.iso /tmp/abc*
> $ df
> ...
> /dev/loop0 1986048 1986048 0 100% /tmp/abc�def�ghi
> $ ls -ld /tmp/abc*
> dr-xr-xr-x 4 root root 2048 Nov 19 2014 /tmp/abc?def?ghi
>
> Bruno
>

Hi Bruno,

With the c_iscntrl() patch, the result of ls and df are: (I use xxd as
GMail seems unable to handle U+2028 and U+2029 correctly)

$ ls -ld /tmp/abc=E2=80=A8def=E2=80=A9ghi | xxd
00000000: 6472 7778 722d 7872 2d78 2031 2079 656e  drwxr-xr-x 1 yen
00000010: 2073 7461 6666 2034 3039 3620 3230 3138   staff 4096 2018
00000020: 2f30 372f 3232 2030 323a 3136 3a34 3320  /07/22 02:16:43 
00000030: 2f74 6d70 2f61 6263 e280 a864 6566 e280  /tmp/abc...def..
00000040: a967 6869 0a                             .ghi.

$ df | xxd
00000000: e6aa 94e6 a188 e7b3 bbe7 b5b1 2020 2020  ............    
00000010: 2020 2020 e5ae b9e9 878f 2020 e5b7 b2e7      ......  ....
00000020: 94a8 2020 e58f afe7 94a8 20e5 b7b2 e794  ..  ...... .....
00000030: a825 20e6 8e9b e8bc 89e9 bb9e 0a2f 6465  .% ........../de
00000040: 762f 6469 736b 3173 3120 2020 2032 3334  v/disk1s1    234
00000050: 4720 2031 3337 4720 2020 3935 4720 2020  G  137G   95G   
00000060: 3630 2520 2f0a 2f64 6576 2f64 6973 6b31  60% /./dev/disk1
00000070: 7334 2020 2020 3233 3447 2020 322e 3147  s4    234G  2.1G
00000080: 2020 2039 3547 2020 2020 3325 202f 7072     95G    3% /pr
00000090: 6976 6174 652f 7661 722f 766d 0a63 6879  ivate/var/vm.chy
000000a0: 656e 2e63 633a 2020 2020 2020 2020 3235  en.cc:        25
000000b0: 4720 2020 3132 4720 2020 3132 4720 2020  G   12G   12G   
000000c0: 3532 2520 2f70 7269 7661 7465 2f74 6d70  52% /private/tmp
000000d0: 2f61 6263 e280 a864 6566 e280 a967 6869  /abc...def...ghi
000000e0: 0a                                       .

Without the c_iscntrl() patch (unmodified 8.30), ls behaves the same,
and the result of df is:

$ df | xxd
00000000: e6aa 3fe6 a13f e7b3 bbe7 b5b1 20e5 aeb9  ..?..?...... ...
00000010: e93f 3f20 e5b7 b2e7 3fa8 20e5 3faf e73f  .?? ....?. .?..?
00000020: a820 e5b7 b2e7 3fa8 2520 e63f 3fe8 bc3f  . ....?.% .??..?
00000030: e9bb 3f0a 2f64 6576 2f64 6973 6b31 7331  ..?./dev/disk1s1
00000040: 2020 2020 3233 3447 2020 3133 3747 2020      234G  137G  
00000050: 2020 3935 4720 2020 2036 3025 202f 0a2f    95G    60% /./
00000060: 6465 762f 6469 736b 3173 3420 2020 2032  dev/disk1s4    2
00000070: 3334 4720 2032 2e31 4720 2020 2039 3547  34G  2.1G    95G
00000080: 2020 2020 2033 2520 2f70 7269 7661 7465       3% /private
00000090: 2f76 6172 2f76 6d0a 6368 7965 6e2e 6363  /var/vm.chyen.cc
000000a0: 3a20 2020 2020 2020 2032 3547 2020 2031  :        25G   1
000000b0: 3247 2020 2020 3132 4720 2020 2035 3125  2G    12G    51%
000000c0: 202f 7072 6976 6174 652f 746d 702f 6162   /private/tmp/ab
000000d0: 63e2 3fa8 6465 66e2 3fa9 6768 690a       c.?.def.?.ghi.

Hope those results are helpful!

Chih-Hsuan Yen

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 11:12:17 2018
Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id A8D9116081C; Sun, 22 Jul 2018 08:12:05 -0700 (PDT) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: =?UTF-8?Q?P=c3=a1draig_Brady?= , Chih-Hsuan Yen , 32236@debbugs.gnu.org, bug-gnulib References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> From: Paul Eggert Openpgp: preference=signencrypt Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= xsFNBEyAcmQBEADAAyH2xoTu7ppG5D3a8FMZEon74dCvc4+q1XA2J2tBy2pwaTqfhpxxdGA9 Jj50UJ3PD4bSUEgN8tLZ0san47l5XTAFLi2456ciSl5m8sKaHlGdt9XmAAtmXqeZVIYX/UFS 96fDzf4xhEmm/y7LbYEPQdUdxu47xA5KhTYp5bltF3WYDz1Ygd7gx07Auwp7iw7eNvnoDTAl KAl8KYDZzbDNCQGEbpY3efZIvPdeI+FWQN4W+kghy+P6au6PrIIhYraeua7XDdb2LS1en3Ss mE3QjqfRqI/A2ue8JMwsvXe/WK38Ezs6x74iTaqI3AFH6ilAhDqpMnd/msSESNFt76DiO1ZK QMr9amVPknjfPmJISqdhgB1DlEdw34sROf6V8mZw0xfqT6PKE46LcFefzs0kbg4GORf8vjG2 Sf1tk5eU8MBiyN/bZ03bKNjNYMpODDQQwuP84kYLkX2wBxxMAhBxwbDVZudzxDZJ1C2VXujC OJVxq2kljBM9ETYuUGqd75AW2LXrLw6+MuIsHFAYAgRr7+KcwDgBAfwhPBYX34nSSiHlmLC+ KaHLeCLF5ZI2vKm3HEeCTtlOg7xZEONgwzL+fdKo+D6SoC8RRxJKs8a3sVfI4t6CnrQzvJbB n6gxdgCu5i29J1QCYrCYvql2UyFPAK+do99/1jOXT4m2836j1wARAQABzSBQYXVsIEVnZ2Vy dCA8ZWdnZXJ0QGNzLnVjbGEuZWR1PsLBfgQTAQIAKAUCTIByZAIbAwUJEswDAAYLCQgHAwIG FQgCCQoLBBYCAwECHgECF4AACgkQ7ZfpDmKqfjRRGw/+Ij03dhYfYl/gXVRiuzV1gGrbHk+t nfrI/C7fAeoFzQ5tVgVinShaPkZo0HTPf18x6IDEdAiO8Mqo1yp0CtHmzGMCJ50o4Grgfjlr 6g/+vtEOKbhleszN2XpJvpwM2QgGvn/laTLUu8PH9aRWTs7qJJZKKKAb4sxYc92FehPu6FOD 0dDiyhlDAq4lOV2mdBpzQbiojoZzQLMQwjpgCTK2572eK9EOEQySUThXrSIz6ASenp4NYTFH s9tuJQvXk9gZDdPSl3bp+47dGxlxEWLpBIM7zIONw4ks4azgT8nvDZxA5IZHtvqBlJLBObYY 0Le61Wp0y3TlBDh2qdK8eYL426W4scEMSuig5gb8OAtQiBW6k2sGUxxeiv8ovWu8YAZgKJfu oWI+uRnMEddruY8JsoM54KaKvZikkKs2bg1ndtLVzHpJ6qFZC7QVjeHUh6/BmgvdjWPZYFTt N+KA9CWX3GQKKgN3uu988yznD7LnB98T4EUH1HA/GnfBqMV1gpzTvPc4qVQinCmIkEFp83zl +G5fCjJJ3W7ivzCnYo4KhKLpFUm97okTKR2LW3xZzEW4cLSWO387MTK3CzDOx5qe6s4a91Zu 
ZM/j/TQdTLDaqNn83kA4Hq48UHXYxcIh+Nd8k/3w6lFuoK0wrOFiywjLx+0ur5jmmbecBGHc 1xdhAFHOwU0ETIByZAEQAKaF678T9wyH4wjTrV1Pz3cDEoSnV/0ZUrOT37p1dcGyj/IXq1x6 70HRVahAmk0sZpYc25PF9D5GPYHFWlNjuPU96rDndXB3hedmBRhLdC4bAXjI4DV+bmdVe+q/ IMnlZRaVlm9EiMCVAR6w13sReu7qXkW9r3RwY2AzXskp/tAe4BRKr1Zmbvi2nbnQ6epEC42r Rbx0B1EhjbIQZ5JHGk24iPT7LdBgnNmos5wYjzwNlkMQD5T0Ydzhk7J+UxwA5m46mOhRDC2r FV/A0gm5TLy8DXjv/Esc4gYnYai6SQqnUEVh5LuV8YCJBnijs+Tiw71x1icmn6xGI45EugJO gec+rLypYgpVp4x0HI5T88qBRYCkxH3Kg8Qo+EWNA9A4LRQ9DX8njona0gf0s03tocK8kBN6 6UoqqPtHBnc4eMgBymCflK12eKfd2YYxnyg9cZazWA5VslvTxpm76hbg5oiAEH/Vg/8MxHyA nPhfrgwyPrmJEcVBafdspJnYQxBYNco2LFPIhlOvWh8r4at+s+M3Lb26oUTczlgdW1Sf3SDA 77BMRnF0FQyE+7AzV79MBN4ykiqaezQxtaF1Fy/tvkhffSo8u+dwG0EgJh+te38gTcISVr0G IPplLz6YhjrbHrPRF1CN5UuL9DBGjxuN35RLNVEfta6RUFlR6NctTjvrABEBAAHCwWUEGAEC AA8FAkyAcmQCGwwFCRLMAwAACgkQ7ZfpDmKqfjSrHA/+KzAKvTxRhA9MWNLxIyJ7S5uJ16gs T3oCjZrBKGEhKMOGX4O0GA6VOEryO7QRCCYah3oxSG38IAnNeiwJXgU9Bzkk85UGbPEd7HGF /VSeHCQwWou6jqUDTSDvn9YhNTdG0KXPM74aC+xr2Zow1O2mhXihgWKD0Dw+0LYPnUOsQ0KO FxHXXYHmRrS1OZPU59BLvc+TRhIhafSHKLwbXK+6ckkxBx6h8z5ccpG0Qs4bFhdFYnFrEieD LoGmnE2YLhdV6swJ9VNCS6pLiEohT3fm7aXm15tZOIyzMZhHRSAPblXxQ0ZSWjq8oRrcYNFx c4W1URpAkBCOYJoXvQfD5L3lqAl8TCqDUzYxhH/tJhbDdHrqHH767jaDaTB1+Talp/2AMKwc XNOdiklGxbmHVG6YGl6g8Lrbsu9NZEI4yLlHzuikthJWgz+3vZhVGyNlt+HNIoF6CjDL2omu 5cEq4RDHM44QqPk6l7O0pUvN1mT4B+S1b08RKpqm/ff015E37HNV/piIvJlxGAYz8PSfuGCB 1thMYqlmgdhd9/BabGFbGGYHA6U4/T5zqU+f6xHy1SsAQZ1MSKlLwekBIT+4/cLRGqCHjnV0 q5H/T6a7t5mPkbzSrOLSo4puj+IToNjYyYIDBWzhlA19avOa+rvUjmHtD3sFN7cXWtkGoi8b uNcby4U= Organization: UCLA Computer Science Department Message-ID: <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> Date: Sun, 22 Jul 2018 08:12:04 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> Content-Type: multipart/mixed; boundary="------------319BB69583027A4006235300" Content-Language: en-US X-Spam-Score: -2.3 (--) 
X-Debbugs-Envelope-To: 32236 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

This is a multi-part message in MIME format.
--------------319BB69583027A4006235300
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable

Pádraig Brady wrote:
> I've also attached an alternative patch for df (in your name).

That still has problems, since it can generate improperly-encoded strings in
UTF-8 locales (if the inputs are improperly encoded), and can replace parts of
multibyte characters with '?' in non-UTF-8 locales. Please try the attached
patch instead, which attempts to address these issues. This is more along the
lines that Bruno suggested, except it doesn't use mbsiter as I figured it was
simpler overall just to use mbrtowc directly for this one thing.

--------------319BB69583027A4006235300
Content-Type: text/x-patch; name="0001-df-avoid-multibyte-character-corruption-on-macOS.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename*0="0001-df-avoid-multibyte-character-corruption-on-macOS.patch"

From 17a1a37549344cdfd95cc84b1848dafa256be5a0 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 22 Jul 2018 08:09:01 -0700
Subject: [PATCH] df: avoid multibyte character corruption on macOS

Problem reported by Chih-Hsuan Yen (Bug#32236).
* NEWS: Mention the bug fix.
* src/df.c: Include wchar.h and wctype.h instead of mbswidth.h.
(hide_problematic_chars): Return number of screen columns.
All callers changed.  Use iswcntrl, not iscntrl.
(get_header, get_dev): Rely on hide_problematic_chars width,
not mbswidth.  Scan the cell once, instead of two or three times.
---
 NEWS     |  4 ++++
 src/df.c | 46 +++++++++++++++++++++++++++++++---------------
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/NEWS b/NEWS
index af1a990..aa3b4f9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  df no longer corrupts displayed multibyte characters on macOS.
+
 
 * Noteworthy changes in release 8.30 (2018-07-01) [stable]
 
diff --git a/src/df.c b/src/df.c
index 1178865..664b88b 100644
--- a/src/df.c
+++ b/src/df.c
@@ -23,6 +23,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include "system.h"
 #include "canonicalize.h"
@@ -31,7 +33,6 @@
 #include "fsusage.h"
 #include "human.h"
 #include "mbsalign.h"
-#include "mbswidth.h"
 #include "mountlist.h"
 #include "quote.h"
 #include "find-mount-point.h"
@@ -272,20 +273,36 @@ static struct option const long_options[] =
 };
 
 /* Replace problematic chars with '?'.
-   Since only control characters are currently considered,
-   this should work in all encodings.  */
+   Return the number of screen columns.  */
 
-static char*
+static size_t
 hide_problematic_chars (char *cell)
 {
-  char *p = cell;
-  while (*p)
+  char *srcend = cell + strlen (cell);
+  char *dst = cell;
+  mbstate_t mbstate = { 0, };
+  size_t n;
+  size_t width = 0;
+
+  for (char *src = cell; src != srcend; src += n)
     {
-      if (iscntrl (to_uchar (*p)))
-        *p = '?';
-      p++;
+      wchar_t wc;
+      n = mbrtowc (&wc, src, srcend - src, &mbstate);
+      if (n < (size_t) -2 && !iswcntrl (wc))
+        {
+          memcpy (dst, src, n);
+          dst += n;
+        }
+      else
+        {
+          *dst++ = '?';
+          memset (&mbstate, 0, sizeof mbstate);
+        }
+      width++;
     }
-  return cell;
+
+  *dst = '\0';
+  return width;
 }
 
 /* Dynamically allocate a row of pointers in TABLE, which
@@ -569,11 +586,10 @@ get_header (void)
       if (!cell)
         xalloc_die ();
 
-      hide_problematic_chars (cell);
-
       table[nrows - 1][col] = cell;
 
-      columns[col]->width = MAX (columns[col]->width, mbswidth (cell, 0));
+      size_t cell_width = hide_problematic_chars (cell);
+      columns[col]->width = MAX (columns[col]->width, cell_width);
     }
 }
 
@@ -1182,8 +1198,8 @@ get_dev (char const *disk, char const *mount_point, char const* file,
       if (!cell)
         assert (!"empty cell");
 
-      hide_problematic_chars (cell);
-      columns[col]->width = MAX (columns[col]->width, mbswidth (cell, 0));
+      size_t cell_width = hide_problematic_chars (cell);
+      columns[col]->width = MAX (columns[col]->width, cell_width);
       table[nrows - 1][col] = cell;
     }
   free (dev_name);
-- 
2.7.4

--------------319BB69583027A4006235300--

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 12:09:53 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 16:09:53 +0000 Received: from localhost ([127.0.0.1]:52635 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhGvh-0007IW-9V for submit@debbugs.gnu.org; Sun, 22 Jul 2018 12:09:53 -0400 Received: from mail-lj1-f176.google.com ([209.85.208.176]:41725) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhGvg-0007IK-2f
for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 12:09:52 -0400 Received: by mail-lj1-f176.google.com with SMTP id y17-v6so14862853ljy.8 for <32236@debbugs.gnu.org>; Sun, 22 Jul 2018 09:09:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=es4hlSMLpc+/g2DhyHQ550pqIfCICLRe/gpb0f2qWjk=; b=RXqXuZsmRWEu7RGxSZB1IYcaCfkISZYBLIdWMTLH3iMVLlb/H9IYm8oyhaSREwVPJb c+xLohAEeblkGZAhFRkZbi7ZCthOQ9YmniD9aoNGWUCdjQumVByTBe6TnqvXpAujCzKj qGGL3SL/j63zfIfDSzecuex5/EGfFuiqZLMOmT+C009eIhf/goxR5rsYs+DWKXDzmLbv Q17sLG/ElH6v8EWmxO4lU32nRG7x/O4ute16UFHBIcA9Vx6gYXsfR2Hbj7OJqwW6r6Np R4jgBXPbd/NNAACW4Zcts+i4Dm6egZhP2dNYubMMMT9X4Tx0wmAy8GfwINqkakXl8Io4 wtQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=es4hlSMLpc+/g2DhyHQ550pqIfCICLRe/gpb0f2qWjk=; b=j4JbRbXhj1/chOV6OipAS6uLfBjslZR1lDJUud5BNq6Gi2l2sfGWcyFeAgJF0xVxpa UFXjoco3k1KC8wRZIDNV1hHktRumLoyddECRl/xTsGWAKpuQi/wMdTLIZviU1yc+EWrs cANABbHq8mHy7gkXyWyEXYk4KWssqk4OKgT724Q7XF41+EkTXw8SvtB0WNbKwzZKYLdL l9mnf1BsXRRaT39CGX+DE2JonR5awkwtSohiU2IKjRPSbxJmr0oM72KgPzzXjxZL88Wk kJV7GTZncwTjCw/7Yw1qrRJY/rkX91EVmLXg+Ng3pFlm5WklTLCfzJZITV8Nd+kLRiit 40pA== X-Gm-Message-State: AOUpUlGhuSxzuAAVmzDfLvEjcZBjztjbgDxOVt825NbU4FkQRjBOISSk tHkNkK9F0gw44lFMpEU4o8SgoQfvUtpTrYCYzhw= X-Google-Smtp-Source: AAOMgpfvdLr5U2DBQ2AYLBn97hJcVZlJzOCe3JWlJ8uVFEPim0/W7Q5vlkBC88KxvkIFsVzBuJNE7MmUyaSgHUhEHlw= X-Received: by 2002:a2e:1301:: with SMTP id 1-v6mr6397949ljt.56.1532275786069; Sun, 22 Jul 2018 09:09:46 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a2e:6590:0:0:0:0:0 with HTTP; Sun, 22 Jul 2018 09:09:45 -0700 (PDT) In-Reply-To: <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> 
<66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> From: Chih-Hsuan Yen Date: Mon, 23 Jul 2018 00:09:45 +0800 Message-ID: Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Paul Eggert Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 32236 Cc: bug-gnulib , =?UTF-8?Q?P=C3=A1draig_Brady?= , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

2018-07-22 23:12 GMT+08:00 Paul Eggert :
> Pádraig Brady wrote:
>>
>> I've also attached an alternative patch for df (in your name).
>
>
> That still has problems, since it can generate improperly-encoded strings in
> UTF-8 locales (if the inputs are improperly encoded), and can replace parts
> of multibyte characters with '?' in non-UTF-8 locales. Please try the
> attached patch instead, which attempts to address these issues. This is more
> along the lines that Bruno suggested, except it doesn't use mbsiter as I
> figured it was simpler overall just to use mbrtowc directly for this one
> thing.

Here's the result of df:

$ df
檔案系統        容量  已用  可用 已用 掛載點
/dev/disk1s1    234G  137G   95G  60% /
/dev/disk1s4    234G  2.1G   95G   3% /private/var/vm
chyen.cc:        25G   12G   12G  51% /private/tmp/abc def ghi

$ df | xxd
00000000: e6aa 94e6 a188 e7b3 bbe7 b5b1 2020 2020  ............    
00000010: 2020 2020 e5ae b9e9 878f 2020 e5b7 b2e7      ......  ....
00000020: 94a8 2020 e58f afe7 94a8 20e5 b7b2 e794  ..  ...... .....
00000030: a820 e68e 9be8 bc89 e9bb 9e0a 2f64 6576  . ........../dev
00000040: 2f64 6973 6b31 7331 2020 2020 3233 3447  /disk1s1    234G
00000050: 2020 3133 3747 2020 2039 3547 2020 3630    137G   95G  60
00000060: 2520 2f0a 2f64 6576 2f64 6973 6b31 7334  % /./dev/disk1s4
00000070: 2020 2020 3233 3447 2020 322e 3147 2020      234G  2.1G  
00000080: 2039 3547 2020 2033 2520 2f70 7269 7661   95G   3% /priva
00000090: 7465 2f76 6172 2f76 6d0a 6368 7965 6e2e  te/var/vm.chyen.
000000a0: 6363 3a20 2020 2020 2020 2032 3547 2020  cc:        25G  
000000b0: 2031 3247 2020 2031 3247 2020 3531 2520   12G   12G  51% 
000000c0: 2f70 7269 7661 7465 2f74 6d70 2f61 6263  /private/tmp/abc
000000d0: e280 a864 6566 e280 a967 6869 0a         ...def...ghi.

Chinese header names are correct, and U+2028 and U+2029 are written as-is.

All tested with LANG=zh_TW.UTF-8 LC_COLLATE=C LC_CTYPE=zh_TW.UTF-8.

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 12:17:13 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 16:17:13 +0000 Received: from localhost ([127.0.0.1]:52646 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhH2n-0007U2-Ba for submit@debbugs.gnu.org; Sun, 22 Jul 2018 12:17:13 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:42760) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhH2l-0007Tu-Bk for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 12:17:11 -0400 Received: from localhost.localdomain (unknown [76.21.115.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 6E21F9C41; Sun, 22 Jul 2018 17:17:09 +0100 (IST) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Paul Eggert , Chih-Hsuan Yen , 32236@debbugs.gnu.org, bug-gnulib References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: Date: Sun, 22 Jul 2018 09:17:07 -0700 User-Agent: Mozilla/5.0 (X11; Linux
x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 22/07/18 08:12, Paul Eggert wrote: > Pádraig Brady wrote: >> I've also attached an alternative patch for df (in your name). > > That still has problems, since it can generate improperly-encoded strings in > UTF-8 locales (if the inputs are improperly encoded), and can replace parts of > multibyte characters with '?' in non-UTF-8 locales. Please try the attached > patch instead, which attempts to address these issues. This is more along the > lines that Bruno suggested, except it doesn't use mbsiter as I figured it was > simpler overall just to use mbrtowc directly for this one thing. I haven't time to review this now, but I did want to only avoid \n etc. that might cause issues for programs that parsed output from df on a line by line basis. This subset of control characters is safe to identify It seems problematic to start eliding improperly encoded mount points for example, rather than just outputting what's there. Also just incrementing width++ per each wide character doesn't seem right, though again I've not tested it. 
cheers, Pádraig From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 12:25:24 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 16:25:24 +0000 Received: from localhost ([127.0.0.1]:52650 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhHAi-0007gH-6B for submit@debbugs.gnu.org; Sun, 22 Jul 2018 12:25:24 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:42776) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhHAg-0007g9-5b for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 12:25:22 -0400 Received: from localhost.localdomain (unknown [76.21.115.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id C7D4C9CC9; Sun, 22 Jul 2018 17:25:20 +0100 (IST) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Bruno Haible , bug-gnulib@gnu.org References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> <1599384.GXLMD97vOh@omega> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> Date: Sun, 22 Jul 2018 09:25:18 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1599384.GXLMD97vOh@omega> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236 Cc: Chih-Hsuan Yen , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 21/07/18 15:43, Bruno Haible wrote: > Hi Pádraig, > >> I've attached a gnulib patch to document for iscntrl at least. 
> >> +This function does not support arguments outside of the range of the >> +unsigned char type in locales with large character sets, on some platforms. >> +OS X 10.5 will return non zero for characters >= 0x80 in UTF-8 locales. > > In UTF-8 locales, arguments >= 0x80 are invalid arguments for iscntrl(). > > POSIX [1] says > "The c argument is a type int, the value of which the application shall > ensure is a character representable as an unsigned char or equal to the > value of the macro EOF. If the argument has any other value, the behavior > is undefined." > > The term "character" is defined here [2]: > "A sequence of one or more bytes representing a single graphic symbol or > control code." > > So, in a UTF-8 locale, a "character representable as an unsigned char" > is a byte sequence of length 1, where the single byte has a value in the > range 0x00..0x7F. > > For invalid values "the behavior is undefined." You were expecting a value 0. > > Now, in the gnulib documentations, what we mention as portability problems > are the cases where > - the behaviour for valid arguments is different on different platforms, or > - the boundary between valid and invalid arguments is fuzzy and depends on > the platform. > IMO there's no point in documenting that a function _really_ has undefined > behaviour when POSIX says that it has undefined behaviour. Thanks for all that info. I agree iscntrl() behavior on macOS is within spec, though is still surprising, and different from other systems. I agree docs should be as succinct as possible, though... >> I've also attached an alternative patch for df (in your name). > > This patch is correct (because the characters that you test for in c_iscntrl > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings). ... It might be worth mentioning this subtle point in the c_iscntrl() docs? 
"Note this identifies all single byte control chars even in multibyte encodings". > But it does not catch control characters outside of the ASCII range. It would > make sense to catch these as well. If you want to do that, > 'hide_problematic_chars' needs to be rewritten as a loop that iterates across > the multibyte characters. For example with the 'mbiter' module, in > combination with the mb_iscntrl function from the 'mbchar' module. Or > directly with mbrtowc() and iswcntrl(). I was mainly worried here about \n for scripts to robustly parse df output. cheers, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 13:01:20 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 17:01:20 +0000 Received: from localhost ([127.0.0.1]:52664 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhHjT-0000Af-TU for submit@debbugs.gnu.org; Sun, 22 Jul 2018 13:01:20 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49976) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhHjS-0000AS-3N for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 13:01:18 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id BCF4116081C; Sun, 22 Jul 2018 10:01:11 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id T0w5auy9v3SC; Sun, 22 Jul 2018 10:01:11 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id F38E6160825; Sun, 22 Jul 2018 10:01:10 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id VyqY90mLckQx; Sun, 22 Jul 2018 10:01:10 -0700 (PDT) Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 91AC216081C; Sun, 22 Jul 2018 10:01:10 -0700 (PDT) Subject: 
Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: =?UTF-8?Q?P=c3=a1draig_Brady?= , Chih-Hsuan Yen , 32236@debbugs.gnu.org, bug-gnulib References: <36659f47-2bab-bfb2-e197-882d02cd1169@draigBrady.com> <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> From: Paul Eggert Openpgp: preference=signencrypt Organization: UCLA Computer Science Department Message-ID: <986b7ded-d58c-a385-7e59-d1006786419c@cs.ucla.edu> Date: Sun, 22 Jul 2018 10:01:09 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 32236 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help:
List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

Pádraig Brady wrote:

> I did want to only avoid \n etc. that might cause issues for
> programs that parsed output from df on a line by line basis.
> This subset of control characters is safe to identify
> It seems problematic to start eliding improperly encoded
> mount points for example, rather than just outputting
> what's there.

Yes, I suppose you're right, it's not df's job to police encodings.

> Also just incrementing width++ per each wide character
> doesn't seem right, though again I've not tested it.

True as well. OK, please ignore my patch.

I was prompted by worries about multibyte encodings that use bytes that could be
misinterpreted as ASCII control characters, such as a locale that uses EBCDIC
encoding. However, that's probably just a theoretical concern; no coreutils
users use EBCDIC any more, right? Plus there are doubtless lots of other places
in coreutils that assume '\n' is a newline in encoded text.
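The byte-wise approach that the thread settles on here (test only for ASCII control bytes, as c_iscntrl does, and leave everything else alone) can be sketched roughly as follows. This is a minimal illustration, not the code that was committed to coreutils: the names is_ascii_control and hide_ascii_controls are hypothetical, and the range 0x00..0x1F plus 0x7F is taken from Bruno's description of c_iscntrl elsewhere in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a locale-independent control test:
   true only for the ASCII control bytes 0x00..0x1F and 0x7F.  */
bool
is_ascii_control (unsigned char c)
{
  return c < 0x20 || c == 0x7f;
}

/* Replace each ASCII control byte in the NUL-terminated CELL with '?'.
   Bytes >= 0x80 are never touched, so UTF-8 sequences pass through
   unchanged.  Return the number of bytes replaced.  */
size_t
hide_ascii_controls (char *cell)
{
  size_t replaced = 0;
  for (char *p = cell; *p; p++)
    if (is_ascii_control ((unsigned char) *p))
      {
        *p = '?';
        replaced++;
      }
  return replaced;
}
```

With this sketch a cell containing U+2028 (bytes e2 80 a8) followed by a newline would keep the three UTF-8 bytes and have only the newline replaced, which matches the c_iscntrl results reported earlier in the thread. It does not compute display width; that would still need a separate step such as mbswidth.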
From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 17:35:28 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 21:35:28 +0000 Received: from localhost ([127.0.0.1]:52758 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhM0l-0002JI-QG for submit@debbugs.gnu.org; Sun, 22 Jul 2018 17:35:27 -0400 Received: from mo4-p00-ob.smtp.rzone.de ([85.215.255.23]:16417) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhM0i-0002J8-Vj for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 17:35:26 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1532295323; s=strato-dkim-0002; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: X-RZG-CLASS-ID:X-RZG-AUTH:From:Subject:Sender; bh=oKwbgwLvxghZKNinlnPqL+E5BExHwvCkVTsh6IOf4MA=; b=Uzu65On0nhArICFnnfci9dKtOqU8KEX4bANgS5QMzPFy0Dkz15BbeQe8Al7QP7c65o +4mxwaBkg35e+eiyq4o+kiSQGo71XIX4p8E5RYg5BZmXa70Yj7XcpzoYWvxsJPcxgx1C j9RYhTnCSowYYx3YJM1YSrf/NvGUbTl+TNl7ECi11SsOc4fzxQPGxRAb/Ue3eF2PViia FwZK4TodzmBC2ADszrIfY7JHrmLx3u93QSlAyORtpQ2r2YLNZ2HV9I9UIgSfhiRjX29b oj8j97JhtVD75OiaXgskfFJkEshPzBIg+Q8v9BNoBTa90xJ34jCLZPCn0NZUG9Fx2g2/ L2Uw== X-RZG-AUTH: ":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlIWs+iCP5vnk6shH+AHjwLuWOGKf9zfs=" X-RZG-CLASS-ID: mo00 Received: from bruno.haible.de by smtp.strato.de (RZmta 43.13 DYNA|AUTH) with ESMTPSA id g03ba1u6MLZMikQ (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (curve secp521r1 with 521 ECDH bits, eq. 
15360 bits RSA)) (Client did not present a certificate); Sun, 22 Jul 2018 23:35:22 +0200 (CEST) From: Bruno Haible To: bug-gnulib@gnu.org Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS Date: Sun, 22 Jul 2018 23:35:21 +0200 Message-ID: <7441655.aCqMusjKWc@omega> User-Agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; ) In-Reply-To: References: <66ec8c11-a2af-a7cb-0d07-4dabe4d232f8@cs.ucla.edu> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236 Cc: Chih-Hsuan Yen , Paul Eggert , =?ISO-8859-1?Q?P=E1draig?= Brady , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) P=C3=A1draig Brady wrote: > but I did want to only avoid \n etc. that might cause issues for > programs that parsed output from df on a line by line basis. The current code (which uses iscntrl) also catches escape sequences that can cause weird output on the screen, in a terminal emulator. This is good (because it can confuse a human reader as much as a '\n' would confuse a line-by-line parser). Now, this feature currently only works for escape sequence that start with an ASCII escape U+001B. It would be useful also for other control characters to be caught, at least: * escape characters U+009B. * other characters that cause a newline in a terminal emulator: U+2028 and U+2029. =46or example, in konsole, the escape sequence '\u009bf' repositions the cursor. So the effects of $ mkdir /tmp/`printf 'abc\u009bf'` $ sudo mount -r -t iso9660 -o loop /some/iso/image.iso /tmp/abc* $ df =2E.. /dev/loop0 1986048 1986048 0 100% /tmp/abc=EF= =BF=BDf is that 'df' produces an U+FFFD. 
This is less useful than what it produces for an ASCII escape: $ mkdir /tmp/`printf 'abc\u001b[2J'` $ sudo mount -r -t iso9660 -o loop /some/iso/image.iso /tmp/abc* $ df =2E.. /dev/loop0 692828 692828 0 100% /tmp/abc?[2J Bruno From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 22 17:40:39 2018 Received: (at 32236) by debbugs.gnu.org; 22 Jul 2018 21:40:39 +0000 Received: from localhost ([127.0.0.1]:52763 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhM5n-0002QN-DV for submit@debbugs.gnu.org; Sun, 22 Jul 2018 17:40:39 -0400 Received: from mo4-p01-ob.smtp.rzone.de ([85.215.255.54]:28589) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fhM5l-0002QF-TQ for 32236@debbugs.gnu.org; Sun, 22 Jul 2018 17:40:38 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1532295636; s=strato-dkim-0002; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: X-RZG-CLASS-ID:X-RZG-AUTH:From:Subject:Sender; bh=GW0rn690G2O+0YxUo+nH7mBFizXSURrGfy/aAF7066Q=; b=IcKyuSYXy5g+hzxxZgoFRCnqrCqx/5feFWP7vOYkTokjkce51hA3EUZI0UpfjN0oK7 eqRnvl6YStUj1R7OMuQPRgyMX6CiIrIp7qL/hiQf49Rn20peWUkvmAURuAPz1d76DB1c qlHDUbcUpcau9YWqtkssFYjHBHu8gypLFR7sCVHbzPIwypKEZjFeVZrx458LQNZ0ZuVL BQMOL2JQeI9KvNeA5xfcbBKw+hXZqQcs89S6ar+x/UbLNw+E4atxXYHBbPfEctZxf4nT KDzuXMDOlpyoIGPY7c5hCtxepGd3g3EMA1xWS1iqJSufut3hSlIj2v7ScmJCxtRsSCBK OY0g== X-RZG-AUTH: ":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlIWs+iCP5vnk6shH+AHjwLuWOGKf9zfs=" X-RZG-CLASS-ID: mo00 Received: from bruno.haible.de by smtp.strato.de (RZmta 43.13 DYNA|AUTH) with ESMTPSA id g03ba1u6MLeail3 (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (curve secp521r1 with 521 ECDH bits, eq. 
15360 bits RSA)) (Client did not present a certificate); Sun, 22 Jul 2018 23:40:36 +0200 (CEST) From: Bruno Haible To: =?ISO-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS Date: Sun, 22 Jul 2018 23:40:35 +0200 Message-ID: <4004100.9zfVm5Hql4@omega> User-Agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; ) In-Reply-To: <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="iso-8859-1" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236 Cc: Chih-Hsuan Yen , bug-gnulib@gnu.org, 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Pádraig Brady wrote: > > This patch is correct (because the characters that you test for in c_iscntrl > > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte > > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings). > > ... It might be worth mentioning this subtle point in the c_iscntrl() docs? > "Note this identifies all single byte control chars even in multibyte encodings". Only in the multibyte encodings that are currently in use. We never know what kinds of features or misfeatures new multibyte encodings will come up with: Before GB18030 was introduced, it was a common feature of all multibyte encodings (including SJIS) that ASCII characters in the range 0x00..0x3F never occur as second or later byte in a multibyte character. Well, GB18030 broke this assumption. So, it is dangerous to rely on this property. Therefore I wouldn't like to document it in the c_iscntrl() documentation. 
Bruno From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 25 11:51:21 2018 Received: (at 32236) by debbugs.gnu.org; 25 Jul 2018 15:51:22 +0000 Received: from localhost ([127.0.0.1]:57384 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fiM4P-0000to-NW for submit@debbugs.gnu.org; Wed, 25 Jul 2018 11:51:21 -0400 Received: from mail-lf1-f51.google.com ([209.85.167.51]:35306) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fiM4O-0000tb-Bg for 32236@debbugs.gnu.org; Wed, 25 Jul 2018 11:51:20 -0400 Received: by mail-lf1-f51.google.com with SMTP id f18-v6so5782422lfc.2 for <32236@debbugs.gnu.org>; Wed, 25 Jul 2018 08:51:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=oRQbEEoRifzF/YsIslyfkdLeZSSKCVrvkeBLdtXj4Os=; b=Sq/YdOqWqjeuRQ79MvjTL1jDx98fXgX/2ayOe2cuM57ykaJgyZVuoBliclb5CUrHqk hNuDuMlZsjSXrXblYMzFEerIZ7wagXPK8tJVCMVzZ8++dO7xatL9K/ROFNL4FD0Gm+Rx nW0xjv1lpXONN5VUl3TteOnUBehm+EAFIS6wVyEbLwUpStO8rGE51ZlJ58REpJsh0840 hyVzk62/4hNOMBTgYSN1fAi3Yu8lPw6gWiPI1rZ/DQSa2kaq9hNCGm4HwLhEDitzm8ie DG9jBRLnsa3bt4jJYQwTk5bFM15eNrwiNflAzqmF7RJmmMiBgZjb9emnKk1sK64k/CKK 5TSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=oRQbEEoRifzF/YsIslyfkdLeZSSKCVrvkeBLdtXj4Os=; b=Is80dDic0acvh5eX24v3/jPklVVX8vIwCKHuCo/D77VzsUS8mf5cGi73Z6ETwMrB28 dnKYKXco0S/M1RcoKDs/uRNqTLb0nVlVGS7NyYrnHhHvclGntpJ/6A8yqJa6vI1oKBRN o32kZjK6/dopDXD97lQL8UWWFbz4p+eg2WFS4I874mufBhYT2E4Mg1BiyVavjkCD4Zl0 jsWvvfddE4ti/cKhkKLLP43us94xczeWS5ACYivjn5JPlTzHYXbMWM4FwTlppV94ywJS p0MU1iM9n/h/MmJmDtjNyuLleH8H4OuFZWsPhlZ5LAmPvSD9twC2q2SBblwSFhxVjLcD bBcQ== X-Gm-Message-State: AOUpUlG/SBe9FvOXAZFxPm9buFSg84CeEx6qWcnC5ORxI+6aImiQrWJn 9k1ivZOaolspWtKmqEH7usrKhx3PZORma9c54r8= 
X-Google-Smtp-Source: AAOMgpfJPOx8RbNanprM7itflGut8L7Z5uZVI+BZNqRvqGHuDusatIqMidk19Pp1z7z7XEfxFShUj6g26tbO2/xwMck= X-Received: by 2002:a19:95c9:: with SMTP id x192-v6mr12884158lfd.37.1532533874288; Wed, 25 Jul 2018 08:51:14 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a2e:6590:0:0:0:0:0 with HTTP; Wed, 25 Jul 2018 08:51:13 -0700 (PDT) In-Reply-To: <4004100.9zfVm5Hql4@omega> References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> <4004100.9zfVm5Hql4@omega> From: Chih-Hsuan Yen Date: Wed, 25 Jul 2018 23:51:13 +0800 Message-ID: Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Bruno Haible Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 32236 Cc: bug-gnulib , =?UTF-8?Q?P=C3=A1draig_Brady?= , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2018-07-23 5:40 GMT+08:00 Bruno Haible : > Pádraig Brady wrote: >> > This patch is correct (because the characters that you test for in c_iscntrl >> > are 0x00..0x1F, 0x7F, which don't occur as second or later byte in a multibyte >> > character in the EUC-JP, EUC-KR, GB2312, EUC-TW, GB18030, SJIS encodings). >> >> ... It might be worth mentioning this subtle point in the c_iscntrl() docs? >> "Note this identifies all single byte control chars even in multibyte encodings". > > Only in the multibyte encodings that are currently in use. 
We never know what > kinds of features or misfeatures new multibyte encodings will come up with: > Before GB18030 was introduced, it was a common feature of all multibyte encodings > (including SJIS) that ASCII characters in the range 0x00..0x3F never occur as > second or later byte in a multibyte character. Well, GB18030 broke this assumption. > > So, it is dangerous to rely on this property. Therefore I wouldn't like to > document it in the c_iscntrl() documentation. > > Bruno > Hello any update on this? Discussions about encodings are beyond my knowledge, yet I can feel that it's difficult to correctly filter control characters. How about following the idea from Pádraig Brady and filter \n only? From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 26 05:02:06 2018 Received: (at 32236) by debbugs.gnu.org; 26 Jul 2018 09:02:06 +0000 Received: from localhost ([127.0.0.1]:58280 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fic9t-0004Vq-WB for submit@debbugs.gnu.org; Thu, 26 Jul 2018 05:02:06 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:54406) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fic9p-0004VF-7x for 32236@debbugs.gnu.org; Thu, 26 Jul 2018 05:02:04 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5664E160656; Thu, 26 Jul 2018 02:01:55 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 5zOtWZK0H15e; Thu, 26 Jul 2018 02:01:54 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 45D6E1606C2; Thu, 26 Jul 2018 02:01:54 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id W1GP80IlQLYp; Thu, 26 Jul 2018 02:01:54 -0700 (PDT) Received: from 
[192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id D3DC0160656; Thu, 26 Jul 2018 02:01:53 -0700 (PDT) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Chih-Hsuan Yen , Bruno Haible References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> <4004100.9zfVm5Hql4@omega> From: Paul Eggert Openpgp: preference=signencrypt Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= xsFNBEyAcmQBEADAAyH2xoTu7ppG5D3a8FMZEon74dCvc4+q1XA2J2tBy2pwaTqfhpxxdGA9 Jj50UJ3PD4bSUEgN8tLZ0san47l5XTAFLi2456ciSl5m8sKaHlGdt9XmAAtmXqeZVIYX/UFS 96fDzf4xhEmm/y7LbYEPQdUdxu47xA5KhTYp5bltF3WYDz1Ygd7gx07Auwp7iw7eNvnoDTAl KAl8KYDZzbDNCQGEbpY3efZIvPdeI+FWQN4W+kghy+P6au6PrIIhYraeua7XDdb2LS1en3Ss mE3QjqfRqI/A2ue8JMwsvXe/WK38Ezs6x74iTaqI3AFH6ilAhDqpMnd/msSESNFt76DiO1ZK QMr9amVPknjfPmJISqdhgB1DlEdw34sROf6V8mZw0xfqT6PKE46LcFefzs0kbg4GORf8vjG2 Sf1tk5eU8MBiyN/bZ03bKNjNYMpODDQQwuP84kYLkX2wBxxMAhBxwbDVZudzxDZJ1C2VXujC OJVxq2kljBM9ETYuUGqd75AW2LXrLw6+MuIsHFAYAgRr7+KcwDgBAfwhPBYX34nSSiHlmLC+ KaHLeCLF5ZI2vKm3HEeCTtlOg7xZEONgwzL+fdKo+D6SoC8RRxJKs8a3sVfI4t6CnrQzvJbB n6gxdgCu5i29J1QCYrCYvql2UyFPAK+do99/1jOXT4m2836j1wARAQABzSBQYXVsIEVnZ2Vy dCA8ZWdnZXJ0QGNzLnVjbGEuZWR1PsLBfgQTAQIAKAUCTIByZAIbAwUJEswDAAYLCQgHAwIG FQgCCQoLBBYCAwECHgECF4AACgkQ7ZfpDmKqfjRRGw/+Ij03dhYfYl/gXVRiuzV1gGrbHk+t nfrI/C7fAeoFzQ5tVgVinShaPkZo0HTPf18x6IDEdAiO8Mqo1yp0CtHmzGMCJ50o4Grgfjlr 6g/+vtEOKbhleszN2XpJvpwM2QgGvn/laTLUu8PH9aRWTs7qJJZKKKAb4sxYc92FehPu6FOD 0dDiyhlDAq4lOV2mdBpzQbiojoZzQLMQwjpgCTK2572eK9EOEQySUThXrSIz6ASenp4NYTFH s9tuJQvXk9gZDdPSl3bp+47dGxlxEWLpBIM7zIONw4ks4azgT8nvDZxA5IZHtvqBlJLBObYY 0Le61Wp0y3TlBDh2qdK8eYL426W4scEMSuig5gb8OAtQiBW6k2sGUxxeiv8ovWu8YAZgKJfu oWI+uRnMEddruY8JsoM54KaKvZikkKs2bg1ndtLVzHpJ6qFZC7QVjeHUh6/BmgvdjWPZYFTt N+KA9CWX3GQKKgN3uu988yznD7LnB98T4EUH1HA/GnfBqMV1gpzTvPc4qVQinCmIkEFp83zl +G5fCjJJ3W7ivzCnYo4KhKLpFUm97okTKR2LW3xZzEW4cLSWO387MTK3CzDOx5qe6s4a91Zu 
ZM/j/TQdTLDaqNn83kA4Hq48UHXYxcIh+Nd8k/3w6lFuoK0wrOFiywjLx+0ur5jmmbecBGHc 1xdhAFHOwU0ETIByZAEQAKaF678T9wyH4wjTrV1Pz3cDEoSnV/0ZUrOT37p1dcGyj/IXq1x6 70HRVahAmk0sZpYc25PF9D5GPYHFWlNjuPU96rDndXB3hedmBRhLdC4bAXjI4DV+bmdVe+q/ IMnlZRaVlm9EiMCVAR6w13sReu7qXkW9r3RwY2AzXskp/tAe4BRKr1Zmbvi2nbnQ6epEC42r Rbx0B1EhjbIQZ5JHGk24iPT7LdBgnNmos5wYjzwNlkMQD5T0Ydzhk7J+UxwA5m46mOhRDC2r FV/A0gm5TLy8DXjv/Esc4gYnYai6SQqnUEVh5LuV8YCJBnijs+Tiw71x1icmn6xGI45EugJO gec+rLypYgpVp4x0HI5T88qBRYCkxH3Kg8Qo+EWNA9A4LRQ9DX8njona0gf0s03tocK8kBN6 6UoqqPtHBnc4eMgBymCflK12eKfd2YYxnyg9cZazWA5VslvTxpm76hbg5oiAEH/Vg/8MxHyA nPhfrgwyPrmJEcVBafdspJnYQxBYNco2LFPIhlOvWh8r4at+s+M3Lb26oUTczlgdW1Sf3SDA 77BMRnF0FQyE+7AzV79MBN4ykiqaezQxtaF1Fy/tvkhffSo8u+dwG0EgJh+te38gTcISVr0G IPplLz6YhjrbHrPRF1CN5UuL9DBGjxuN35RLNVEfta6RUFlR6NctTjvrABEBAAHCwWUEGAEC AA8FAkyAcmQCGwwFCRLMAwAACgkQ7ZfpDmKqfjSrHA/+KzAKvTxRhA9MWNLxIyJ7S5uJ16gs T3oCjZrBKGEhKMOGX4O0GA6VOEryO7QRCCYah3oxSG38IAnNeiwJXgU9Bzkk85UGbPEd7HGF /VSeHCQwWou6jqUDTSDvn9YhNTdG0KXPM74aC+xr2Zow1O2mhXihgWKD0Dw+0LYPnUOsQ0KO FxHXXYHmRrS1OZPU59BLvc+TRhIhafSHKLwbXK+6ckkxBx6h8z5ccpG0Qs4bFhdFYnFrEieD LoGmnE2YLhdV6swJ9VNCS6pLiEohT3fm7aXm15tZOIyzMZhHRSAPblXxQ0ZSWjq8oRrcYNFx c4W1URpAkBCOYJoXvQfD5L3lqAl8TCqDUzYxhH/tJhbDdHrqHH767jaDaTB1+Talp/2AMKwc XNOdiklGxbmHVG6YGl6g8Lrbsu9NZEI4yLlHzuikthJWgz+3vZhVGyNlt+HNIoF6CjDL2omu 5cEq4RDHM44QqPk6l7O0pUvN1mT4B+S1b08RKpqm/ff015E37HNV/piIvJlxGAYz8PSfuGCB 1thMYqlmgdhd9/BabGFbGGYHA6U4/T5zqU+f6xHy1SsAQZ1MSKlLwekBIT+4/cLRGqCHjnV0 q5H/T6a7t5mPkbzSrOLSo4puj+IToNjYyYIDBWzhlA19avOa+rvUjmHtD3sFN7cXWtkGoi8b uNcby4U= Organization: UCLA Computer Science Department Message-ID: <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> Date: Thu, 26 Jul 2018 02:01:53 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------F7542E4399DB5B2F9BBA777C" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 32236 Cc: bug-gnulib , 
32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------F7542E4399DB5B2F9BBA777C Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Chih-Hsuan Yen wrote: > How about following the idea from Pádraig Brady > and filter \n only? Given the later comments it seems better to filter out encoding errors and control characters. Programs that parse the output already cannot trust the strings to be exactly right, since newlines are gonna get replaced no matter what. So there seems little benefit to copying the other garbage faithfully. Revised proposed patch(es) attached. --------------F7542E4399DB5B2F9BBA777C Content-Type: text/x-patch; name="0001-df-avoid-multibyte-character-corruption-on-macOS.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-df-avoid-multibyte-character-corruption-on-macOS.patch" >From e4f2a5f97771c4a74e0bdef1c1e4a0d2735cef15 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 22 Jul 2018 09:50:20 -0700 Subject: [PATCH 1/2] df: avoid multibyte character corruption on macOS Problem reported by Chih-Hsuan Yen (Bug#32236). * NEWS: Mention the bug fix. * src/df.c: Include wchar.h and wctype.h. (hide_problematic_chars): Respect multibyte encodings when replacing problematic characters or bytes with '?'. --- NEWS | 4 ++++ src/df.c | 35 +++++++++++++++++++++++++---------- 2 files changed, 29 insertions(+), 10 deletions(-) diff --git a/NEWS b/NEWS index af1a990..aa3b4f9 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,10 @@ GNU coreutils NEWS -*- outline -*- * Noteworthy changes in release ?.? (????-??-??) [?] 
+** Bug fixes + + df no longer corrupts displayed multibyte characters on macOS. + * Noteworthy changes in release 8.30 (2018-07-01) [stable] diff --git a/src/df.c b/src/df.c index 1178865..52be414b 100644 --- a/src/df.c +++ b/src/df.c @@ -23,6 +23,8 @@ #include #include #include +#include <wchar.h> +#include <wctype.h> #include "system.h" #include "canonicalize.h" @@ -271,21 +273,34 @@ static struct option const long_options[] = {NULL, 0, NULL, 0} }; -/* Replace problematic chars with '?'. - Since only control characters are currently considered, - this should work in all encodings. */ +/* Replace problematic chars with '?'. */ -static char* +static void hide_problematic_chars (char *cell) { - char *p = cell; - while (*p) + char *srcend = cell + strlen (cell); + char *dst = cell; + mbstate_t mbstate = { 0, }; + size_t n; + + for (char *src = cell; src != srcend; src += n) { - if (iscntrl (to_uchar (*p))) - *p = '?'; - p++; + wchar_t wc; + size_t srcbytes = srcend - src; + n = mbrtowc (&wc, src, srcbytes, &mbstate); + if (n <= srcbytes && !iswcntrl (wc)) + { + memcpy (dst, src, n); + dst += n; + } + else + { + *dst++ = '?'; + memset (&mbstate, 0, sizeof mbstate); + } } - return cell; + + *dst = '\0'; } /* Dynamically allocate a row of pointers in TABLE, which -- 2.7.4 --------------F7542E4399DB5B2F9BBA777C Content-Type: text/x-patch; name="0002-df-tune-slightly.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-df-tune-slightly.patch" >From e40ae8fb26fd0c8c0cb7e42598a2b8d9bd8551bb Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 26 Jul 2018 01:56:28 -0700 Subject: [PATCH 2/2] df: tune slightly * src/df.c (get_header, get_dev): Avoid calling mbswidth twice when once will do. 
--- src/df.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/df.c b/src/df.c index 52be414b..f5bd26d 100644 --- a/src/df.c +++ b/src/df.c @@ -588,7 +588,8 @@ get_header (void) table[nrows - 1][col] = cell; - columns[col]->width = MAX (columns[col]->width, mbswidth (cell, 0)); + size_t cell_width = mbswidth (cell, 0); + columns[col]->width = MAX (columns[col]->width, cell_width); } } @@ -1198,7 +1199,8 @@ get_dev (char const *disk, char const *mount_point, char const* file, assert (!"empty cell"); hide_problematic_chars (cell); - columns[col]->width = MAX (columns[col]->width, mbswidth (cell, 0)); + size_t cell_width = mbswidth (cell, 0); + columns[col]->width = MAX (columns[col]->width, cell_width); table[nrows - 1][col] = cell; } free (dev_name); -- 2.7.4 --------------F7542E4399DB5B2F9BBA777C-- From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 26 06:09:56 2018 Received: (at 32236) by debbugs.gnu.org; 26 Jul 2018 10:09:57 +0000 Received: from localhost ([127.0.0.1]:58365 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fidDY-00068P-Of for submit@debbugs.gnu.org; Thu, 26 Jul 2018 06:09:56 -0400 Received: from mo4-p00-ob.smtp.rzone.de ([81.169.146.220]:26561) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fidDW-00068G-NZ for 32236@debbugs.gnu.org; Thu, 26 Jul 2018 06:09:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1532599793; s=strato-dkim-0002; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: X-RZG-CLASS-ID:X-RZG-AUTH:From:Subject:Sender; bh=rWz93TIFNilncKwZxUMl4EHS+9DWNY3QjcQ79s88wx8=; b=IK63tbiaGZso02s4nx4LNHl/KlFUrJGfUIF5S1M0VbZ1yK5Ds4EWsLXGBU9TS1BaUK RxOpETwRGzIecjgbqz/R1HCzWKYvHLVIuimKkNqE+7M1bVbqI4YWvOrKzgGkyMiswDyr vMzE9FIu4mZOAn4AWs6vVe0MXnAAgsPIi1ZNUM2tqoUNMm6aJ465vjevCWxjQRN5q9ld 9XCBnZweJKBVG2tTiHMVmB8rfh0rMJOeeNGP+rYvwgTprvX21hpfKABIIkZDbVyXFULM 0+FBkNrgjvwTw9zt7eD0HEmTRW5Ehrp6giYtEPnjZUQIzpWIA8T5W7eiOubARsNGl74t gLDw== 
X-RZG-AUTH: ":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlIWs+iCP5vnk6shH+AHjwLuWOGKf9zfs=" X-RZG-CLASS-ID: mo00 Received: from bruno.haible.de by smtp.strato.de (RZmta 43.13 DYNA|AUTH) with ESMTPSA id g03ba1u6QA9qTvL (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (curve secp521r1 with 521 ECDH bits, eq. 15360 bits RSA)) (Client did not present a certificate); Thu, 26 Jul 2018 12:09:52 +0200 (CEST) From: Bruno Haible To: Paul Eggert Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS Date: Thu, 26 Jul 2018 12:09:51 +0200 Message-ID: <1631759.zXW7oniGUb@omega> User-Agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; ) In-Reply-To: <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> References: <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 32236 Cc: Chih-Hsuan Yen , bug-gnulib , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) Paul Eggert wrote: > Revised proposed patch(es) attached. Looks good to me, except for one little thing: > memcpy (dst, src, n); src and dst may overlap. Therefore memmove should be used instead of memcpy. 
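The overlap concern can be shown outside df.c. What follows is a minimal sketch, not code from the patch: squeeze_byte is a hypothetical helper, introduced only for illustration, that compacts a string in place in the same style as hide_problematic_chars, with dst trailing src through one buffer. Once a byte has been dropped, the source and destination ranges of the next copy can overlap, which memcpy does not permit and memmove does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of df.c): remove every occurrence of
   the byte DROP from the NUL-terminated string BUF, compacting the
   string in place.  Return the new length.  */
static size_t
squeeze_byte (char *buf, char drop)
{
  assert (buf != NULL);
  char *srcend = buf + strlen (buf);
  char *dst = buf;

  for (char *src = buf; src != srcend; )
    {
      /* Measure the run of bytes to keep, starting at SRC.  */
      size_t n = 0;
      while (src + n != srcend && src[n] != drop)
        n++;

      /* DST <= SRC always holds.  After a byte has been dropped,
         [dst, dst+n) and [src, src+n) can overlap.  memcpy has
         undefined behavior for overlapping ranges; memmove is
         specified to copy them correctly.  */
      memmove (dst, src, n);
      dst += n;
      src += n;

      /* Skip the dropped byte, if the run stopped on one.  */
      if (src != srcend)
        src++;
    }

  *dst = '\0';
  return (size_t) (dst - buf);
}
```

For instance, applied to a writable copy of "a-bcdef" with drop set to '-', the second copy moves five bytes one position to the left, so its source and destination share four bytes.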
Bruno From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 26 13:34:54 2018 Received: (at 32236) by debbugs.gnu.org; 26 Jul 2018 17:34:55 +0000 Received: from localhost ([127.0.0.1]:59298 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fikAA-0006SX-Mi for submit@debbugs.gnu.org; Thu, 26 Jul 2018 13:34:54 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:40712) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fikA8-0006SN-AF for 32236@debbugs.gnu.org; Thu, 26 Jul 2018 13:34:52 -0400 Received: from localhost.localdomain (unknown [76.21.115.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id E7ED716F; Thu, 26 Jul 2018 18:34:49 +0100 (IST) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Paul Eggert , Chih-Hsuan Yen , Bruno Haible References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> <4004100.9zfVm5Hql4@omega> <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: Date: Thu, 26 Jul 2018 10:34:47 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236 Cc: bug-gnulib , 32236@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 26/07/18 02:01, Paul Eggert wrote: > Chih-Hsuan Yen wrote: >> How about following the idea from Pádraig Brady >> and filter \n only? 
> > Given the later comments it seems better to filter out encoding errors and > control characters. Programs that parse the output already cannot trust the > strings to be exactly right, since newlines are gonna get replaced no matter > what. So there seems little benefit to copying the other garbage faithfully. > > Revised proposed patch(es) attached. This is better, though this means that mount points now need to match the locale of df or they won't be displayed. Theoretically that was the case previously, but only for control chars and so wouldn't have had a practical impact for mounts encoded in another locale, only for security/robustness reasons where one might have \n etc. I've pushed the c_iscntrl patch since it's simplest and probably most appropriate patch for an existing release. If you consider the matching encoding issue as a non-issue, then I'm OK with this. cheers, Pádraig From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 26 21:23:13 2018 Received: (at 32236-done) by debbugs.gnu.org; 27 Jul 2018 01:23:13 +0000 Received: from localhost ([127.0.0.1]:59411 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1firTN-0000lr-H2 for submit@debbugs.gnu.org; Thu, 26 Jul 2018 21:23:13 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:60930) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1firTK-0000lc-TS for 32236-done@debbugs.gnu.org; Thu, 26 Jul 2018 21:23:12 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 077021600AE; Thu, 26 Jul 2018 18:23:05 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id MwHREEwuH4qP; Thu, 26 Jul 2018 18:23:04 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1A378160FD7; Thu, 26 Jul 2018 18:23:04 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: 
from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id TBZXfU7frcw7; Thu, 26 Jul 2018 18:23:03 -0700 (PDT) Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 7E5AA1600AE; Thu, 26 Jul 2018 18:23:03 -0700 (PDT) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: =?UTF-8?Q?P=c3=a1draig_Brady?= , Chih-Hsuan Yen , Bruno Haible References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> <4004100.9zfVm5Hql4@omega> <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> From: Paul Eggert Openpgp: preference=signencrypt Organization: UCLA Computer Science Department Message-ID: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> Date: Thu, 26 Jul 2018 18:23:02 -0700 User-Agent: 
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------B536B9D0B15661D5A94DBF20" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 32236-done Cc: 32236-done@debbugs.gnu.org, bug-gnulib X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------B536B9D0B15661D5A94DBF20 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Pádraig Brady wrote: > I've pushed the c_iscntrl patch since it's simplest > and probably most appropriate patch for an existing release. Yes, that makes sense for a quick patch. However, for the next release I think it'd be better to catch encoding errors and multibyte control characters, given the problems noted. I installed the attached further patch to try to do this. This fixes the problem that Bruno noted, along with two others; my earlier patch neglected the possibility that mbrtowc can return 0, and it incorrectly assumed wide control characters always have a single-byte representation. Either way the original bug appears to be fixed so I'm boldly closing the bug report. 
--------------B536B9D0B15661D5A94DBF20 Content-Type: text/x-patch; name="0001-df-avoid-multibyte-character-corruption-on-macOS.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename*0="0001-df-avoid-multibyte-character-corruption-on-macOS.patch" From 2cf5d730690dad600f8b6d74d0b5fde522804e43 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 22 Jul 2018 09:50:20 -0700 Subject: [PATCH] df: avoid multibyte character corruption on macOS This improves on the earlier fix for the problem reported by Chih-Hsuan Yen (Bug#32236), by also looking for other control characters and for encoding errors. * src/df.c: Include wchar.h and wctype.h instead of c-ctype.h. (hide_problematic_chars): Process the string as multibyte. Use iswcntrl, not c_iscntrl. --- src/df.c | 43 ++++++++++++++++++++++++++++++++----------- 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/src/df.c b/src/df.c index c851fcc..d27ba02 100644 --- a/src/df.c +++ b/src/df.c @@ -23,7 +23,8 @@ #include #include #include -#include <c-ctype.h> +#include <wchar.h> +#include <wctype.h> #include "system.h" #include "canonicalize.h" @@ -272,21 +273,41 @@ static struct option const long_options[] = {NULL, 0, NULL, 0} }; -/* Replace problematic chars with '?'. - Since only control characters are currently considered, - this should work in all encodings. */ +/* Replace problematic chars with '?'. 
*/ -static char* +static void hide_problematic_chars (char *cell) { - char *p = cell; - while (*p) + char *srcend = cell + strlen (cell); + char *dst = cell; + mbstate_t mbstate = { 0, }; + size_t n; + + for (char *src = cell; src != srcend; src += n) { - if (c_iscntrl (to_uchar (*p))) - *p = '?'; - p++; + wchar_t wc; + size_t srcbytes = srcend - src; + n = mbrtowc (&wc, src, srcbytes, &mbstate); + bool ok = 0 < n && n <= srcbytes; + + if (ok) + ok = !iswcntrl (wc); + else + n = 1; + + if (ok) + { + memmove (dst, src, n); + dst += n; + } + else + { + *dst++ = '?'; + memset (&mbstate, 0, sizeof mbstate); + } } - return cell; + + *dst = '\0'; } /* Dynamically allocate a row of pointers in TABLE, which -- 2.7.4 --------------B536B9D0B15661D5A94DBF20-- From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 27 05:38:27 2018 Received: (at 32236-done) by debbugs.gnu.org; 27 Jul 2018 09:38:27 +0000 Received: from localhost ([127.0.0.1]:59570 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fizCd-0004Dd-Ap for submit@debbugs.gnu.org; Fri, 27 Jul 2018 05:38:27 -0400 Received: from mo4-p01-ob.smtp.rzone.de ([85.215.255.52]:24184) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fizCa-0004DU-OO for 32236-done@debbugs.gnu.org; Fri, 27 Jul 2018 05:38:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; t=1532684303; s=strato-dkim-0002; d=clisp.org; h=References:In-Reply-To:Message-ID:Date:Subject:Cc:To:From: X-RZG-CLASS-ID:X-RZG-AUTH:From:Subject:Sender; bh=aGj9YQYXGHJhVlBHlY6DaUwfLylA7rnd2WJS8SNVll8=; b=DPQ4i0AJ14aCkxPnQXpbsqDLxvMHnEHPUSk78qzI4znz7dNXd3iNFHaZviRE9qZuzL +OSWerfwYhbf1XE+BcyGyKLnP5u7REL1i6LO/5r0opyDwvBWMt2i3oXk9dZiJw7L/1lQ 60FrB5X8PJckCI3+bpFDYLSUb/8u/+c2sWApoagyBXqLRNmXpdIIDXifcBtDJL5hwn9b q1FE3w64cWqylm0vBKoPYHuv8Ky5CXnxraHVIWQHgFibFlLBHWti3pQLFsb8M3XStCn/ 2zYJbwLfYoftRU/RjbM488cdHkskEfAONLkmHyFYDfhB4vxTwN1bbopxU/LMTJvEv0Ah BlqA== X-RZG-AUTH: 
":Ln4Re0+Ic/6oZXR1YgKryK8brlshOcZlIWs+iCP5vnk6shH+AHjwLuWOGKf9zfs=" X-RZG-CLASS-ID: mo00 Received: from bruno.haible.de by smtp.strato.de (RZmta 43.13 DYNA|AUTH) with ESMTPSA id g03ba1u6R9cNZGt (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (curve secp521r1 with 521 ECDH bits, eq. 15360 bits RSA)) (Client did not present a certificate); Fri, 27 Jul 2018 11:38:23 +0200 (CEST) From: Bruno Haible To: Paul Eggert Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS Date: Fri, 27 Jul 2018 11:38:22 +0200 Message-ID: <2027749.UlHNbUF2ei@omega> User-Agent: KMail/5.1.3 (Linux/4.4.0-130-generic; KDE/5.18.0; x86_64; ; ) In-Reply-To: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> References: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 32236-done Cc: Chih-Hsuan Yen , bug-gnulib , =?ISO-8859-1?Q?P=E1draig?= Brady , 32236-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Paul Eggert wrote: > my earlier patch > neglected the possibility that mbrtowc can return 0 I wouldn't see this as a bug: You can assume that mbrtowc returns 0 if and only if the multibyte sequence is a NUL byte - but you had chosen srcend in such a way that this would not happen in the loop. > and it incorrectly assumed > wide control characters always have a single-byte representation. Oops, you're right. My mistake as well. The new patch looks good. This will catch (and replace with '?') U+2028 and U+2029 on glibc systems. 
On macOS, it will not do this, because iswcntrl(0x2028) and iswcntrl(0x2029) is 0 on this system; this is consistent with the fact that the 'Terminal' program displays these characters as simple spaces. So, no need to override iswcntrl on macOS. Bruno 2018-07-27 Bruno Haible iswcntrl: Mention minor problem on macOS. * doc/posix-functions/iswcntrl.texi: Mention oddity on macOS. diff --git a/doc/posix-functions/iswcntrl.texi b/doc/posix-functions/iswcntrl.texi index 99eaa0e..44dd034 100644 --- a/doc/posix-functions/iswcntrl.texi +++ b/doc/posix-functions/iswcntrl.texi @@ -25,4 +25,8 @@ Portability problems not fixed by Gnulib: @item On AIX and Windows platforms, @code{wchar_t} is a 16-bit type and therefore cannot accommodate all Unicode characters. +@item +This function returns 0 for U+2028 (LINE SEPARATOR) and +U+2029 (PARAGRAPH SEPARATOR) on some platforms: +Mac OS X 10.13. @end itemize From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 27 15:05:28 2018 Received: (at 32236-done) by debbugs.gnu.org; 27 Jul 2018 19:05:28 +0000 Received: from localhost ([127.0.0.1]:60589 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fj83L-0005O2-RE for submit@debbugs.gnu.org; Fri, 27 Jul 2018 15:05:28 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:47060) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fj83K-0005Np-0w for 32236-done@debbugs.gnu.org; Fri, 27 Jul 2018 15:05:26 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E7E3F161050; Fri, 27 Jul 2018 12:05:19 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id beLE5K3DHiHh; Fri, 27 Jul 2018 12:05:18 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 928BB1610C6; Fri, 27 Jul 2018 12:05:18 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from 
zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id o11p1Bkgwpsm; Fri, 27 Jul 2018 12:05:18 -0700 (PDT) Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 25229161050; Fri, 27 Jul 2018 12:05:18 -0700 (PDT) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Bruno Haible References: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> <2027749.UlHNbUF2ei@omega> From: Paul Eggert Openpgp: preference=signencrypt Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= xsFNBEyAcmQBEADAAyH2xoTu7ppG5D3a8FMZEon74dCvc4+q1XA2J2tBy2pwaTqfhpxxdGA9 Jj50UJ3PD4bSUEgN8tLZ0san47l5XTAFLi2456ciSl5m8sKaHlGdt9XmAAtmXqeZVIYX/UFS 96fDzf4xhEmm/y7LbYEPQdUdxu47xA5KhTYp5bltF3WYDz1Ygd7gx07Auwp7iw7eNvnoDTAl KAl8KYDZzbDNCQGEbpY3efZIvPdeI+FWQN4W+kghy+P6au6PrIIhYraeua7XDdb2LS1en3Ss mE3QjqfRqI/A2ue8JMwsvXe/WK38Ezs6x74iTaqI3AFH6ilAhDqpMnd/msSESNFt76DiO1ZK QMr9amVPknjfPmJISqdhgB1DlEdw34sROf6V8mZw0xfqT6PKE46LcFefzs0kbg4GORf8vjG2 Sf1tk5eU8MBiyN/bZ03bKNjNYMpODDQQwuP84kYLkX2wBxxMAhBxwbDVZudzxDZJ1C2VXujC OJVxq2kljBM9ETYuUGqd75AW2LXrLw6+MuIsHFAYAgRr7+KcwDgBAfwhPBYX34nSSiHlmLC+ KaHLeCLF5ZI2vKm3HEeCTtlOg7xZEONgwzL+fdKo+D6SoC8RRxJKs8a3sVfI4t6CnrQzvJbB n6gxdgCu5i29J1QCYrCYvql2UyFPAK+do99/1jOXT4m2836j1wARAQABzSBQYXVsIEVnZ2Vy dCA8ZWdnZXJ0QGNzLnVjbGEuZWR1PsLBfgQTAQIAKAUCTIByZAIbAwUJEswDAAYLCQgHAwIG FQgCCQoLBBYCAwECHgECF4AACgkQ7ZfpDmKqfjRRGw/+Ij03dhYfYl/gXVRiuzV1gGrbHk+t nfrI/C7fAeoFzQ5tVgVinShaPkZo0HTPf18x6IDEdAiO8Mqo1yp0CtHmzGMCJ50o4Grgfjlr 6g/+vtEOKbhleszN2XpJvpwM2QgGvn/laTLUu8PH9aRWTs7qJJZKKKAb4sxYc92FehPu6FOD 0dDiyhlDAq4lOV2mdBpzQbiojoZzQLMQwjpgCTK2572eK9EOEQySUThXrSIz6ASenp4NYTFH s9tuJQvXk9gZDdPSl3bp+47dGxlxEWLpBIM7zIONw4ks4azgT8nvDZxA5IZHtvqBlJLBObYY 0Le61Wp0y3TlBDh2qdK8eYL426W4scEMSuig5gb8OAtQiBW6k2sGUxxeiv8ovWu8YAZgKJfu oWI+uRnMEddruY8JsoM54KaKvZikkKs2bg1ndtLVzHpJ6qFZC7QVjeHUh6/BmgvdjWPZYFTt 
N+KA9CWX3GQKKgN3uu988yznD7LnB98T4EUH1HA/GnfBqMV1gpzTvPc4qVQinCmIkEFp83zl +G5fCjJJ3W7ivzCnYo4KhKLpFUm97okTKR2LW3xZzEW4cLSWO387MTK3CzDOx5qe6s4a91Zu ZM/j/TQdTLDaqNn83kA4Hq48UHXYxcIh+Nd8k/3w6lFuoK0wrOFiywjLx+0ur5jmmbecBGHc 1xdhAFHOwU0ETIByZAEQAKaF678T9wyH4wjTrV1Pz3cDEoSnV/0ZUrOT37p1dcGyj/IXq1x6 70HRVahAmk0sZpYc25PF9D5GPYHFWlNjuPU96rDndXB3hedmBRhLdC4bAXjI4DV+bmdVe+q/ IMnlZRaVlm9EiMCVAR6w13sReu7qXkW9r3RwY2AzXskp/tAe4BRKr1Zmbvi2nbnQ6epEC42r Rbx0B1EhjbIQZ5JHGk24iPT7LdBgnNmos5wYjzwNlkMQD5T0Ydzhk7J+UxwA5m46mOhRDC2r FV/A0gm5TLy8DXjv/Esc4gYnYai6SQqnUEVh5LuV8YCJBnijs+Tiw71x1icmn6xGI45EugJO gec+rLypYgpVp4x0HI5T88qBRYCkxH3Kg8Qo+EWNA9A4LRQ9DX8njona0gf0s03tocK8kBN6 6UoqqPtHBnc4eMgBymCflK12eKfd2YYxnyg9cZazWA5VslvTxpm76hbg5oiAEH/Vg/8MxHyA nPhfrgwyPrmJEcVBafdspJnYQxBYNco2LFPIhlOvWh8r4at+s+M3Lb26oUTczlgdW1Sf3SDA 77BMRnF0FQyE+7AzV79MBN4ykiqaezQxtaF1Fy/tvkhffSo8u+dwG0EgJh+te38gTcISVr0G IPplLz6YhjrbHrPRF1CN5UuL9DBGjxuN35RLNVEfta6RUFlR6NctTjvrABEBAAHCwWUEGAEC AA8FAkyAcmQCGwwFCRLMAwAACgkQ7ZfpDmKqfjSrHA/+KzAKvTxRhA9MWNLxIyJ7S5uJ16gs T3oCjZrBKGEhKMOGX4O0GA6VOEryO7QRCCYah3oxSG38IAnNeiwJXgU9Bzkk85UGbPEd7HGF /VSeHCQwWou6jqUDTSDvn9YhNTdG0KXPM74aC+xr2Zow1O2mhXihgWKD0Dw+0LYPnUOsQ0KO FxHXXYHmRrS1OZPU59BLvc+TRhIhafSHKLwbXK+6ckkxBx6h8z5ccpG0Qs4bFhdFYnFrEieD LoGmnE2YLhdV6swJ9VNCS6pLiEohT3fm7aXm15tZOIyzMZhHRSAPblXxQ0ZSWjq8oRrcYNFx c4W1URpAkBCOYJoXvQfD5L3lqAl8TCqDUzYxhH/tJhbDdHrqHH767jaDaTB1+Talp/2AMKwc XNOdiklGxbmHVG6YGl6g8Lrbsu9NZEI4yLlHzuikthJWgz+3vZhVGyNlt+HNIoF6CjDL2omu 5cEq4RDHM44QqPk6l7O0pUvN1mT4B+S1b08RKpqm/ff015E37HNV/piIvJlxGAYz8PSfuGCB 1thMYqlmgdhd9/BabGFbGGYHA6U4/T5zqU+f6xHy1SsAQZ1MSKlLwekBIT+4/cLRGqCHjnV0 q5H/T6a7t5mPkbzSrOLSo4puj+IToNjYyYIDBWzhlA19avOa+rvUjmHtD3sFN7cXWtkGoi8b uNcby4U= Organization: UCLA Computer Science Department Message-ID: Date: Fri, 27 Jul 2018 12:05:17 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <2027749.UlHNbUF2ei@omega> Content-Type: multipart/mixed; 
 boundary="------------DE67227910A8A07457A581D2"
Content-Language: en-US
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 32236-done
Cc: Chih-Hsuan Yen , bug-gnulib , =?UTF-8?Q?P=c3=a1draig_Brady?= , 32236-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

This is a multi-part message in MIME format.
--------------DE67227910A8A07457A581D2
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable

Bruno Haible wrote:
> You can assume that mbrtowc returns
> 0 if and only if the multibyte sequence is a NUL byte - but you had
> chosen srcend in such a way that this would not happen in the loop.

Thanks for the correction. I mistakenly thought that C allows multibyte
encodings in which a null wide character's multibyte representation contains an
all-bits-zero byte. I installed the attached to omit the unnecessary test.

--------------DE67227910A8A07457A581D2
Content-Type: text/x-patch; name="0001-df-omit-redundant-comparison.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="0001-df-omit-redundant-comparison.patch"

From 34e261fa2768533f34cf35158489ea6a22115c17 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 27 Jul 2018 12:00:02 -0700
Subject: [PATCH] df: omit redundant comparison

Trivial inefficiency reported by Bruno Haible in:
http://lists.gnu.org/r/bug-gnulib/2018-07/msg00109.html
* src/df.c (hide_problematic_chars): Omit redundant test.
---
 src/df.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/df.c b/src/df.c
index 9b65872..5553221 100644
--- a/src/df.c
+++ b/src/df.c
@@ -288,7 +288,7 @@ hide_problematic_chars (char *cell)
       wchar_t wc;
       size_t srcbytes = srcend - src;
       n = mbrtowc (&wc, src, srcbytes, &mbstate);
-      bool ok = 0 < n && n <= srcbytes;
+      bool ok = n <= srcbytes;
 
       if (ok)
         ok = !iswcntrl (wc);
-- 
2.7.4

--------------DE67227910A8A07457A581D2--

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 29 01:54:02 2018
Received: (at 32236-done) by debbugs.gnu.org; 29 Jul 2018 05:54:02 +0000
Received: from localhost ([127.0.0.1]:33199 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fjeeX-0006o8-Nv for submit@debbugs.gnu.org; Sun, 29 Jul 2018 01:54:01 -0400
Received: from mail-lf1-f65.google.com ([209.85.167.65]:38244) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fjeeV-0006nj-P0 for 32236-done@debbugs.gnu.org; Sun, 29 Jul 2018 01:54:00 -0400
Received: by mail-lf1-f65.google.com with SMTP id a4-v6so6036295lff.5 for <32236-done@debbugs.gnu.org>; Sat, 28 Jul 2018 22:53:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=PFagaEkbH5kykYr14w/iq3Fb8DMPpr92MzZZrSWgPTU=; b=a6EAJyDRqeZDap8m3t1zUBF+T7X/arC45yPOBFuNzF8CzSvI082rOTKu74FUCUqzth pLUSfaBcOOiQT9dwbvx9kWCqbBqIw2yIHtEEfnM1oCUss538zMGsTI6s5e5pEKszo2gA qGlBf5/VI4LuLpwgkDDP0kntYtM6zTI2hT/5bZ16JwFiWkTZy1H7bGXcC9XTZCPCHIxC lkbPQpomx1GY+Y6wzSHauyhGiiwqLHW4AL4UJXTyw++HvyvoR9YaEC15zAseN/gC3M/7 dHXrTfaP7hoFoobcWMS0dwiSd4Dn6bEtUmhDzGN/k5bnCqeIOP1qdRbM4rPmUcDNUvVX X5tQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=PFagaEkbH5kykYr14w/iq3Fb8DMPpr92MzZZrSWgPTU=;
b=b1RtMvw/q/TLWc69odmeek7mUG989EFvyXVvGkj0UIuASaIrNwtgb5PA4jVrRZ7+F7 wil65XSt3dmWZzeR9yTIukLBySmJp0OPxrOIaMG/w2AucZlIytxcZC0YIZb1hYlEOIRv E7hH4XJWV6v6YowoikyTv+qqyQAELyh4vCo7Hf0nTk7/yC0UH92Qt/4qKnzjkJdT6upo aRm1Fzh+STp2eVTwAmvAjYr/nUI+B0lT6aD60pQbWEc172RlGBF0KMOVR4qcBmnDNlJ5 P+m5PmyuPyAmHaIhDTqTP+V6UFKpRpMSdVAk7+PIEKMr64YoL6OMe8yQZ/nVS4sAUzfs 7T6g== X-Gm-Message-State: AOUpUlEXgotk2yoYMU92TMijmESe+ax+TWqJzINai+OJtSaLfvyvxLKl M04TCBCPb8zx+p19URclsumgv4CbxFwXN3c04cQ= X-Google-Smtp-Source: AAOMgpdqkPD1gUBrLdglxW6wUfp33GV3aS9X6QTYaTPCTGTb1GmdOlhH9xWJykREhjmL5Cvp5RnjacvG/3RImxCFH6M= X-Received: by 2002:a19:c403:: with SMTP id u3-v6mr7204585lff.87.1532843633699; Sat, 28 Jul 2018 22:53:53 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a2e:6590:0:0:0:0:0 with HTTP; Sat, 28 Jul 2018 22:53:53 -0700 (PDT) In-Reply-To: References: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> <2027749.UlHNbUF2ei@omega> From: Chih-Hsuan Yen Date: Sun, 29 Jul 2018 13:53:53 +0800 Message-ID: Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: Paul Eggert Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 32236-done Cc: =?UTF-8?Q?P=C3=A1draig_Brady?= , bug-gnulib , Bruno Haible , 32236-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2018-07-28 3:05 GMT+08:00 Paul Eggert : > Bruno Haible wrote: >> >> You can assume that mbrtowc returns >> 0 if and only if the multibyte sequence is a NUL byte - but you had >> chosen srcend in such a way that this would not happen in the loop. > > > Thanks for the correction. I mistakenly thought that C allows multibyte > encodings in which a null wide character's multibyte representation contains > an all-bits-zero byte. 
I installed the attached to omit the unnecessary
> test.

Thank you all for the efforts! I've installed commit
e5dae2c6b0bcd0e4ac6e5b212688d223e2e62f79 of coreutils, and `df` works
like a charm!

Cheers!

Chih-Hsuan Yen

From unknown Sun Sep 07 03:08:26 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 26 Aug 2018 11:24:06 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator

From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 03 17:49:43 2019
Received: (at control) by debbugs.gnu.org; 3 Mar 2019 22:49:43 +0000
Received: from localhost ([127.0.0.1]:58761 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h0ZvT-00078E-Db for submit@debbugs.gnu.org; Sun, 03 Mar 2019 17:49:43 -0500
Received: from mail.magicbluesmoke.com ([82.195.144.49]:38684) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h0ZvR-000785-BI for control@debbugs.gnu.org; Sun, 03 Mar 2019 17:49:41 -0500
Received: from localhost.localdomain (unknown [76.21.115.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 676679A73 for ; Sun, 3 Mar 2019 22:49:39 +0000 (GMT)
To: GNU bug tracker automated control server
From: =?UTF-8?Q?P=c3=a1draig_Brady?=
Message-ID:
Date: Sun, 3 Mar 2019 14:49:37 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Spam-Score: 2.0 (++)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam.
The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: unarchive 32236 Content analysis details: (2.0 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 1.8 MISSING_SUBJECT Missing Subject: header 0.2 NO_SUBJECT Extra score for no subject X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) unarchive 32236 From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 03 17:54:02 2019 Received: (at 32236) by debbugs.gnu.org; 3 Mar 2019 22:54:02 +0000 Received: from localhost ([127.0.0.1]:58766 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h0Zzd-0007Ew-W7 for submit@debbugs.gnu.org; Sun, 03 Mar 2019 17:54:02 -0500 Received: from mail.magicbluesmoke.com ([82.195.144.49]:38738) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1h0Zzb-0007Eh-Dl for 32236@debbugs.gnu.org; Sun, 03 Mar 2019 17:54:00 -0500 Received: from localhost.localdomain (unknown [76.21.115.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 2C6169ADA; Sun, 3 Mar 2019 22:53:57 +0000 (GMT) Subject: Re: bug#32236: df header corrupted with LANG=zh_TW.UTF-8 on macOS To: 32236@debbugs.gnu.org, eggert@cs.ucla.edu, yan12125@gmail.com References: <1599384.GXLMD97vOh@omega> <1552df41-3c80-f1fd-8749-bb664de43f29@draigBrady.com> <4004100.9zfVm5Hql4@omega> <4839bde7-af25-f5b2-302b-305655a774da@cs.ucla.edu> <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: Date: 
Sun, 3 Mar 2019 14:53:56 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <61bb0915-497b-b32c-9252-73e1406e0154@cs.ucla.edu>
Content-Type: multipart/mixed; boundary="------------69D660C76604A0A5FC38BB3D"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32236
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

This is a multi-part message in MIME format.
--------------69D660C76604A0A5FC38BB3D
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 26/07/18 18:23, Paul Eggert wrote:
> Pádraig Brady wrote:
>> I've pushed the c_iscntrl patch since it's simplest
>> and probably most appropriate patch for an existing release.
>
> Yes, that makes sense for a quick patch. However, for the next release I think
> it'd be better to catch encoding errors and multibyte control characters, given
> the problems noted. I installed the attached further patch to try to do this.
> This fixes the problem that Bruno noted, along with two others; my earlier patch
> neglected the possibility that mbrtowc can return 0, and it incorrectly assumed
> wide control characters always have a single-byte representation.
>
> Either way the original bug appears to be fixed so I'm boldly closing the bug report.

Reviewing this, I dislike the way that we're now enforcing
that the file system locale needs to match the current user's locale
or otherwise df will not output all original characters.
That has the potential to break scripts, as mismatched encodings are a common issue.
In the attached I've taken the original less aggressive replacement policy
when not outputting to a tty, leaving more sanitizing to the tty case.

cheers,
Pádraig

--------------69D660C76604A0A5FC38BB3D
Content-Type: text/x-patch; name="df-relax-encoding.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="df-relax-encoding.patch"

>From 97bc0e17065950f96a6e1350d1ed8db65ebfee96 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?=
Date: Sun, 3 Mar 2019 14:35:18 -0800
Subject: [PATCH] df: don't require file system and display encodings to match

* src/df.c (replace_problematic_chars): A new wrapper to be more
conservative in our replacement when not connected to a tty.
* tests/df/problematic-chars.sh: Add a test case.
---
 src/df.c                      | 34 +++++++++++++++++++++++++++++++---
 tests/df/problematic-chars.sh | 29 +++++++++++++++++++++++------
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/src/df.c b/src/df.c
index 1eb7bcd..041f282 100644
--- a/src/df.c
+++ b/src/df.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -273,10 +274,26 @@ static struct option const long_options[] =
   {NULL, 0, NULL, 0}
 };
 
+/* Replace problematic chars with '?'.
+   Since only control characters are currently considered,
+   this should work in all encodings.  */
+
+static void
+replace_control_chars (char *cell)
+{
+  char *p = cell;
+  while (*p)
+    {
+      if (c_iscntrl (to_uchar (*p)))
+        *p = '?';
+      p++;
+    }
+}
+
 /* Replace problematic chars with '?'.  */
 
 static void
-hide_problematic_chars (char *cell)
+replace_invalid_chars (char *cell)
 {
   char *srcend = cell + strlen (cell);
   char *dst = cell;
@@ -310,6 +327,17 @@ hide_problematic_chars (char *cell)
   *dst = '\0';
 }
 
+static void
+replace_problematic_chars (char *cell)
+{
+  static int tty_out = -1;
+  if (tty_out < 0)
+    tty_out = isatty (STDOUT_FILENO);
+
+  (tty_out ? replace_invalid_chars : replace_control_chars) (cell) ;
+}
+
+
 /* Dynamically allocate a row of pointers in TABLE, which
    can then be accessed with standard 2D array notation.  */
 
@@ -591,7 +619,7 @@ get_header (void)
       if (!cell)
         xalloc_die ();
 
-      hide_problematic_chars (cell);
+      replace_problematic_chars (cell);
 
       table[nrows - 1][col] = cell;
 
@@ -1205,7 +1233,7 @@ get_dev (char const *disk, char const *mount_point, char const* file,
       if (!cell)
         assert (!"empty cell");
 
-      hide_problematic_chars (cell);
+      replace_problematic_chars (cell);
       size_t cell_width = mbswidth (cell, 0);
       columns[col]->width = MAX (columns[col]->width, cell_width);
       table[nrows - 1][col] = cell;
diff --git a/tests/df/problematic-chars.sh b/tests/df/problematic-chars.sh
index 34e743b..aa4c131 100755
--- a/tests/df/problematic-chars.sh
+++ b/tests/df/problematic-chars.sh
@@ -17,14 +17,17 @@
 # along with this program. If not, see .
 
 . "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
-print_ver_ df
+print_ver_ df printf
 require_root_
 
+
+# Ensure a new line in a mount point only outputs a single line
+
 mnt='mount
 point'
 
 cwd=$(pwd)
-cleanup_() { cd /; umount "$cwd/$mnt"; }
+cleanup_() { umount "$cwd/$mnt"; }
 
 skip=0
 # Create a file system, then mount it.
@@ -33,14 +36,28 @@ dd if=/dev/zero of=blob bs=8192 count=200 > /dev/null 2>&1 \
 mkdir "$mnt" || skip=1
 mkfs -t ext2 -F blob \
   || skip_ "failed to create ext2 file system"
-
 mount -oloop blob "$mnt" || skip=1
-
 test $skip = 1 \
   && skip_ "insufficient mount/ext2 support"
-
 test $(df "$mnt" | wc -l) = 2 || fail=1
-
 test "$fail" = 1 && dump_mount_list_
 
+
+# Ensure mount points not matching the current user encoding are output
+
+unset LC_ALL
+f=$LOCALE_FR_UTF8
+: ${LOCALE_FR_UTF8=none}
+if test "$LOCALE_FR_UTF8" != "none"; then
+
+  cleanup_ || framework_failure_
+
+  mnt="$(env printf 'm\xf3unt p\xf3int')"
+  mkdir "$mnt" || framework_failure_
+  mount -oloop blob "$mnt" || skip_ "unable to mount $mnt"
+
+  LC_ALL=$f df --output=target "$mnt" > df.out || fail=1
+  test "$(basename "$(tail -n1 df.out)")" = "$mnt" || fail=1
+fi
+
 Exit $fail
-- 
2.9.3

--------------69D660C76604A0A5FC38BB3D--

From unknown Sun Sep 07 03:08:26 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 01 Apr 2019 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
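
For readers following the thread, the multibyte-aware replacement discussed above can be sketched as a standalone function. This is only an illustration in the spirit of the patches, not the code installed in coreutils: the name sanitize_cell is hypothetical, and the "0 < n" test is kept purely as a defensive measure even though, as Bruno notes, mbrtowc cannot return 0 here because the scan stops before the terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not the coreutils function): rewrite CELL in
   place, replacing each control character and each undecodable byte
   with '?'.  Decoding follows the current LC_CTYPE locale.  */
static void
sanitize_cell (char *cell)
{
  assert (cell != NULL);

  char *srcend = cell + strlen (cell);
  char *dst = cell;
  mbstate_t mbstate;
  memset (&mbstate, 0, sizeof mbstate);
  size_t n;

  for (char *src = cell; src != srcend; src += n)
    {
      wchar_t wc;
      size_t srcbytes = srcend - src;
      n = mbrtowc (&wc, src, srcbytes, &mbstate);

      /* (size_t) -1 (invalid) and (size_t) -2 (incomplete) both exceed
         SRCBYTES, so one comparison rejects them.  The "0 < n" part is
         defensive only; it guarantees forward progress.  */
      bool ok = 0 < n && n <= srcbytes;

      if (ok)
        ok = !iswcntrl ((wint_t) wc);   /* May be a multibyte control char.  */
      else
        n = 1;                          /* Skip one byte and resynchronize.  */

      if (ok)
        {
          /* DST never overtakes SRC, so copying in place is safe.  */
          memmove (dst, src, n);
          dst += n;
        }
      else
        {
          *dst++ = '?';
          memset (&mbstate, 0, sizeof mbstate);
        }
    }

  *dst = '\0';
}
```

A caller would normally run setlocale (LC_ALL, "") first so that decoding follows the user's locale; without it the "C" locale applies. A control character that occupies several bytes collapses to a single '?', which is why the function writes through a separate dst pointer instead of patching bytes one at a time.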
@@ -33,14 +36,28 @@ dd if=/dev/zero of=blob bs=8192 count=200 > /dev/null 2>&1 \ mkdir "$mnt" || skip=1 mkfs -t ext2 -F blob \ || skip_ "failed to create ext2 file system" - mount -oloop blob "$mnt" || skip=1 - test $skip = 1 \ && skip_ "insufficient mount/ext2 support" - test $(df "$mnt" | wc -l) = 2 || fail=1 - test "$fail" = 1 && dump_mount_list_ + +# Ensure mount points not matching the current user encoding are output + +unset LC_ALL +f=$LOCALE_FR_UTF8 +: ${LOCALE_FR_UTF8=none} +if test "$LOCALE_FR_UTF8" != "none"; then + + cleanup_ || framework_failure_ + + mnt="$(env printf 'm\xf3unt p\xf3int')" + mkdir "$mnt" || framework_failure_ + mount -oloop blob "$mnt" || skip_ "unable to mount $mnt" + + LC_ALL=$f df --output=target "$mnt" > df.out || fail=1 + test "$(basename "$(tail -n1 df.out)")" = "$mnt" || fail=1 +fi + Exit $fail -- 2.9.3 --------------69D660C76604A0A5FC38BB3D-- From unknown Sun Sep 07 03:08:26 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 01 Apr 2019 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator