From unknown Fri Jun 20 07:13:02 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#30814 <30814@debbugs.gnu.org> To: bug#30814 <30814@debbugs.gnu.org> Subject: Status: Please increase the value of MAX_MON_WIDTH in ls.c Reply-To: bug#30814 <30814@debbugs.gnu.org> Date: Fri, 20 Jun 2025 14:13:02 +0000 retitle 30814 Please increase the value of MAX_MON_WIDTH in ls.c reassign 30814 coreutils submitter 30814 Rafal Luzynski severity 30814 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 13 20:07:02 2018 Received: (at submit) by debbugs.gnu.org; 14 Mar 2018 00:07:03 +0000 Received: from localhost ([127.0.0.1]:60157 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1evtwc-0005mC-LR for submit@debbugs.gnu.org; Tue, 13 Mar 2018 20:07:02 -0400 Received: from eggs.gnu.org ([208.118.235.92]:58033) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1evtwZ-0005ld-R4 for submit@debbugs.gnu.org; Tue, 13 Mar 2018 20:07:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1evtwT-0000lE-Pk for submit@debbugs.gnu.org; Tue, 13 Mar 2018 20:06:54 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:45991) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1evtwT-0000l9-MP for submit@debbugs.gnu.org; Tue, 13 Mar 2018 20:06:53 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49002) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1evtwS-0007N6-Fc for bug-coreutils@gnu.org; Tue, 13 Mar 2018 20:06:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 
1evtwP-0000hx-96 for bug-coreutils@gnu.org; Tue, 13 Mar 2018 20:06:52 -0400 Received: from ano163.rev.netart.pl ([85.128.223.163]:37192) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1evtwP-0000hI-23 for bug-coreutils@gnu.org; Tue, 13 Mar 2018 20:06:49 -0400 X-Virus-Scanned: by amavisd-new using ClamAV (9) Received: from poczta.nazwa.pl (ox7.netart.com.pl [10.252.0.17]) by id16c608407a.nazwa.pl (Postfix) with ESMTP id 730111C9199 for ; Wed, 14 Mar 2018 01:06:46 +0100 (CET) Date: Wed, 14 Mar 2018 01:06:46 +0100 (CET) From: Rafal Luzynski To: bug-coreutils@gnu.org Message-ID: <1501208979.82403.1520986006382@poczta.nazwa.pl> Subject: Please increase the value of MAX_MON_WIDTH in ls.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Priority: 3 Importance: Medium X-Mailer: Open-Xchange Mailer v7.8.4-Rev22 X-Originating-Client: com.openexchange.ox.gui.dhtml X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Rafal Luzynski Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) As we have introduced the support of nominative and genitive month names in glibc [1] and we are going to provide the updated locale data for Catalan language [2] it has been discovered [3] that the current limit of the maximum length of the abbreviated month name as displayed by "ls -l" will not work with the new data for Catalan. 
It is obligatory to precede the month name with "de " (note: the space) so the abbreviated month names limited to 5 characters will be ambiguous and therefore unreadable:

de ma (should be "de mar" at least)
d’abr (correct)
de ma (should be "de mai" at least)
de ju (should be "de jun" at least)
de ju (should be "de jul" at least)

Increasing the value of MAX_MON_WIDTH to 6 characters will fix the problem. The location of the constant is here: [4]

Although it has been also suggested in the same bug report that there should be no additional limit for the month length.

This bug may be related with the coreutils bug #29377. [5]

Regards,

Rafal Luzynski

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=22848
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=22848#c6
[4] http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n1099
[5] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29377

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 14 14:40:35 2018 Received: (at 30814) by debbugs.gnu.org; 14 Mar 2018 18:40:35 +0000 Received: from localhost ([127.0.0.1]:33607 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewBKF-00010w-47 for submit@debbugs.gnu.org; Wed, 14 Mar 2018 14:40:35 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:36032) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewBKD-00010n-Fh for 30814@debbugs.gnu.org; Wed, 14 Mar 2018 14:40:34 -0400 Received: from localhost.localdomain (unknown [109.78.204.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id F28909F2C; Wed, 14 Mar 2018 18:40:31 +0000 (GMT) Subject: Re: bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c To: Rafal Luzynski , 30814@debbugs.gnu.org References: <1501208979.82403.1520986006382@poczta.nazwa.pl> From:
=?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: Date: Wed, 14 Mar 2018 11:40:31 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1501208979.82403.1520986006382@poczta.nazwa.pl> Content-Type: multipart/mixed; boundary="------------719FD9970EA2A568967A70EE" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 30814 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

This is a multi-part message in MIME format.
--------------719FD9970EA2A568967A70EE
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 13/03/18 17:06, Rafal Luzynski wrote:
> As we have introduced the support of nominative and genitive
> month names in glibc [1] and we are going to provide the updated
> locale data for Catalan language [2] it has been discovered [3]
> that the current limit of the maximum length of the abbreviated
> month name as displayed by "ls -l" will not work with the new
> data for Catalan. It is obligatory to precede the month name
> with "de " (note: the space) so the abbreviated month names limited
> to 5 characters will be ambiguous and therefore unreadable:

It's a bit surprising that _abbreviations_ all need the "de " prefix,
but fair enough.

> de ma (should be "de mar" at least)
> d’abr (correct)
> de ma (should be "de mai" at least)
> de ju (should be "de jun" at least)
> de ju (should be "de jul" at least)
>
> Increasing the value of MAX_MON_WIDTH to 6 characters will fix
> the problem. The location of the constant is here: [4]
>
> Although it has been also suggested in the same bug report that
> there should be no additional limit for the month length.
>
> This bug may be related with the coreutils bug #29377.
[5]
>
> Regards,
>
> Rafal Luzynski
>
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22848
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=22848#c6
> [4] http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n1099
> [5] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29377
>
>
>
>

Thanks for the careful analysis.

5 was chosen as a max width for abmon
as that was seen to be unambiguous and
also truncate overly long abbreviations.

One can browse the abbreviations by length using:

  locale -a | grep utf8 |
  while read l; do LC_ALL=$l locale abmon; done |
  tr ';' '\n' | sort -u | grep '.\{5,\}' |
  while read mon; do
    printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon"
  done |
  sort -n | less

That shows a couple of existing issues with the limit of 5.
ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7 to be unambiguous,
while Arabic needs 12!
I don't remember arabic being so long at the time I implemented
the alignment/truncation in ls (9 years ago), but we should probably
expand to account for that.

$ LC_ALL=ln_CD.utf8 locale abmon
sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá9.;sánz10.;sánzá11.;sánzá12.

$ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
كانون الثاني
شباط
آذار
نيسان
نوار
حزيران
تموز
آب
أيلول
تشرين الأول
تشرين الثاني
كانون الأول

Given the increase in supported size should only impact relatively few languages
it probably makes sense to increase to 12. The attached does that
and also augments the test to find ambiguous cases.
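
For readers who want to try the ambiguity check outside the test suite, here is a minimal sketch of the idea behind the augmented test: truncate each name to a candidate width and count collisions with "sort | uniq -d | wc -l", the same idiom the attached patch uses for n_dupes. The function name count_ambiguous is hypothetical and is not part of coreutils. The sketch assumes ASCII names and counts characters with cut -c, whereas ls limits display cells, so it is an illustration only.

```shell
#!/bin/sh
# Hypothetical helper, not part of coreutils or its test suite.
# Usage: count_ambiguous WIDTH NAMES
#   WIDTH  candidate maximum width, in characters
#   NAMES  ';'-separated list, in the format printed by `locale abmon`
# Prints the number of distinct truncated names that occur more than once.
# Assumption: NAMES is ASCII; cut -c may split multibyte characters, and
# characters are not the same as the display cells that ls limits.
count_ambiguous() {
  printf '%s\n' "$2" |
    tr ';' '\n' |        # one name per line
    cut -c "1-$1" |      # truncate each name to WIDTH characters
    sort | uniq -d |     # keep one copy of each repeated name
    wc -l | tr -d ' '    # count them, dropping any padding from wc
}

# Four of the Catalan forms quoted in the original report:
count_ambiguous 5 'de mar;de mai;de jun;de jul'   # prints 2 ("de ma" and "de ju" collide)
count_ambiguous 6 'de mar;de mai;de jun;de jul'   # prints 0
```

With real locale data one would pass "$(LC_ALL=ca_ES.utf8 locale abmon)" as NAMES, provided that locale is installed and the names are ASCII.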
cheers,
Pádraig

--------------719FD9970EA2A568967A70EE
Content-Type: text/x-patch; name="ls-abmon-width.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="ls-abmon-width.patch"

From d383dfd223c5d24ec22556d5707151f8c5ca18cf Mon Sep 17 00:00:00 2001
From: Pádraig Brady
Date: Wed, 14 Mar 2018 11:31:43 -0700
Subject: [PATCH] ls: increase the allowed abmon width from 5 to 12

This will impact relatively few languages,
and will make Arabic, Catalan, Lingala etc.
output unambiguous abbreviated month names.

* src/ls.c (MAX_MON_WIDTH): Increase from 5 to 12.
* NEWS: Mention the bug fix.
* tests/ls/abmon-align.sh: Augment to check for ambiguous output.
Fixes https://bugs.gnu.org/30814
---
 NEWS                    | 4 ++++
 src/ls.c                | 7 +++++--
 tests/ls/abmon-align.sh | 9 ++++++---
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index e5569eb..351a082 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   Previously it would have set executable bits on created special files.
   [bug introduced with coreutils-8.20]
 
+  ls no longer truncates the abbreviated month names that have a
+  display width between 6 and 12 inclusive.  Previously this would have
+  output ambiguous months for Arabic or Catalan locales.
+
 ** Improvements
 
   stat and tail now know about the "exfs" file system, which is a
diff --git a/src/ls.c b/src/ls.c
index cd6b09c..c89a22f 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -1095,8 +1095,11 @@ file_escape_init (void)
    variable width abbreviated months and also precomputing/caching
    the names was seen to increase the performance of ls significantly.  */
 
-/* max number of display cells to use */
-enum { MAX_MON_WIDTH = 5 };
+/* max number of display cells to use.
+   As of 2018 the abmon for Arabic has entries with width 12.
+   It doesn't make much sense to support wider than this
+   and locales should aim for abmon entries of width <= 5.
*/
+enum { MAX_MON_WIDTH = 12 };
 /* abformat[RECENT][MON] is the format to use for timestamps with
    recentness RECENT and month MON.  */
 enum { ABFORMAT_SIZE = 128 };
diff --git a/tests/ls/abmon-align.sh b/tests/ls/abmon-align.sh
index b639ca9..d4ff708 100755
--- a/tests/ls/abmon-align.sh
+++ b/tests/ls/abmon-align.sh
@@ -32,17 +32,20 @@ for format in "%b" "[%b" "%b]" "[%b]"; do
     # The sed usage here is slightly different from the original,
     # removing the \(.*\), to avoid triggering misbehavior in at least
     # GNU sed 4.2 (possibly miscompiled) on Mac OS X (Darwin 9.8.0).
-    n_widths=$(
+    months="$(
       LC_ALL=$LOC TIME_STYLE=+"$format" ls -lgG *.ts |
-      LC_ALL=C sed 's/.\{15\}//;s/ ..\.ts$//;s/ /./g' |
+      LC_ALL=C sed 's/.\{15\}//;s/ ..\.ts$//;s/ /./g')"
+    n_widths=$(echo "$months" |
       while read mon; do echo "$mon" | LC_ALL=$LOC wc -L; done |
       uniq | wc -l
     )
+    n_dupes=$(echo "$months" | sort | uniq -d | wc -l)
     test "$n_widths" = "1" || { fail=1; break 2; }
+    test "$n_dupes" = "0" || { fail=1; break 2; }
   done
 done
 if test "$fail" = "1"; then
-    echo "misalignment detected in $LOC locale:"
+    echo "misalignment or ambiguous output in $LOC locale:"
     LC_ALL=$LOC TIME_STYLE=+%b ls -lgG *.ts
 fi
 
-- 
2.9.3

--------------719FD9970EA2A568967A70EE--

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 14 18:53:29 2018 Received: (at 30814) by debbugs.gnu.org; 14 Mar 2018 22:53:29 +0000 Received: from localhost ([127.0.0.1]:33746 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewFGy-0007cU-R7 for submit@debbugs.gnu.org; Wed, 14 Mar 2018 18:53:29 -0400 Received: from ano163.rev.netart.pl ([85.128.223.163]:34768) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewFGx-0007cE-1u for 30814@debbugs.gnu.org; Wed, 14 Mar 2018 18:53:27 -0400 X-Virus-Scanned: by amavisd-new using ClamAV (15) X-Spam-Flag: NO X-Spam-Score: 1 X-Spam-Level: * X-Spam-Status: No, score=1 tagged_above=-10 tests=[NA_REMOVAL=1]
autolearn=disabled Received: from poczta.nazwa.pl (unknown [10.252.0.27]) by id16c608407a.nazwa.pl (Postfix) with ESMTP id 7192E1A610A; Wed, 14 Mar 2018 23:53:20 +0100 (CET) Date: Wed, 14 Mar 2018 23:53:20 +0100 (CET) From: Rafal Luzynski To: 30814@debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig_Brady?= Message-ID: <1976408187.134668.1521068000360@poczta.nazwa.pl> In-Reply-To: References: <1501208979.82403.1520986006382@poczta.nazwa.pl> Subject: Re: bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Priority: 3 Importance: Medium X-Mailer: Open-Xchange Mailer v7.8.4-Rev22 X-Originating-Client: com.openexchange.ox.gui.dhtml X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 30814 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Rafal Luzynski Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

14.03.2018 19:40 Pádraig Brady wrote:
> [...]
> One can browse the abbreviations by length using:
>
>   locale -a | grep utf8 |
>   while read l; do LC_ALL=$l locale abmon; done |
>   tr ';' '\n' | sort -u | grep '.\{5,\}' |
>   while read mon; do
>     printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon"
>   done |
>   sort -n | less
>
> That shows a couple of existing issues with the limit of 5.
> ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7 to be
> unambiguous,
> while Arabic needs 12!
> [...]
>
> $ LC_ALL=ln_CD.utf8 locale abmon
> sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá9.;sánz10.;sánzá11.;sánzá12.

Nice, script, thank you. :-) The issue with ln_CD is no longer
true, it has been fixed in June/July 2017.
Please see the output on Fedora 28 (beta) with glibc 2.27:

$ LC_ALL=ln_CD.utf8 locale abmon
yan;fbl;msi;apl;mai;yun;yul;agt;stb;ɔtb;nvb;dsb

but it does not help because some Arabic languages still need 12.
Even worse, your script ran at the same machine gives the following
output (only the final lines):

...
11 siakwa kati
11 yahbra kati
11 تشرين الأول
11 كانون الأول
12 kakamuk kati
12 pastara kati
12 waupasa kati
12 تشرين الثاني
12 كانون الثاني
15 lî wainhka kati
15 lih mairin kati
(END)

Those with 15 characters come from miq_NI language which has been
introduced in September 2017 (glibc 2.27, released Feb 1, 2018):

$ LC_ALL=miq_NI.utf8 locale abmon
siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati
$ LC_ALL=miq_NI.utf8 locale mon
siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati

But, as you can see, this locale data should be fixed because abmon
and mon are the same; at least " kati" which appears everywhere may
be probably removed. Also truncating the string to 12 characters
probably still makes it unambiguous.

While at this, I have not checked but does your tests/ls/abmon-align.sh
script check for the length required to make all abbreviated month
names unambiguous (i.e., how many letters can we truncate to ensure
that the month names are still unambiguous) or just the longest
abbreviated month name?

> $ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
> [...]

This is still true although again, mon and abmon seem to be the same
in ar_SY which is probably not the best we can have. I wish I could
fix it if I only knew how.
:) (BTW, other Arabic variants seem to have
the abbreviated month names shorter.)

> [...]
> Given the increase in supported size should only impact relatively few
> languages
> it probably makes sense to increase to 12. The attached does that
> and also augments the test to find ambiguous cases.

12 is more than I asked for but that's definitely not destructive.
My only remark is: please remove "Lingala" from the commit comment
because it is no longer true. Otherwise the patch seems to be OK.

Thank you and best regards,

Rafal

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 16 06:15:08 2018 Received: (at 30814) by debbugs.gnu.org; 16 Mar 2018 10:15:08 +0000 Received: from localhost ([127.0.0.1]:35737 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewmOC-0003XL-8u for submit@debbugs.gnu.org; Fri, 16 Mar 2018 06:15:08 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:56764) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewmOA-0003X9-Cl for 30814@debbugs.gnu.org; Fri, 16 Mar 2018 06:15:07 -0400 Received: from localhost.localdomain (unknown [109.76.132.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id C9D1DA259; Fri, 16 Mar 2018 10:15:04 +0000 (GMT) Subject: Re: bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c To: Rafal Luzynski , 30814@debbugs.gnu.org References: <1501208979.82403.1520986006382@poczta.nazwa.pl> <1976408187.134668.1521068000360@poczta.nazwa.pl> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: Date: Fri, 16 Mar 2018 03:15:04 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1976408187.134668.1521068000360@poczta.nazwa.pl> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 30814 X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 14/03/18 15:53, Rafal Luzynski wrote: > 14.03.2018 19:40 Pádraig Brady wrote: >> [...] >> One can browse the abbreviations by length using: >> >> locale -a | grep utf8 | >> while read l; do LC_ALL=$l locale abmon; done | >> tr ';' '\n' | sort -u | grep '.\{5,\}' | >> while read mon; do >> printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon" >> done | >> sort -n | less >> >> That shows a couple of existing issues with the limit of 5. >> ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7 to be >> unambiguous, >> while Arabic needs 12! >> [...] >> >> $ LC_ALL=ln_CD.utf8 locale abmon >> sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá9.;sánz10.;sánzá11.;sánzá12. > > Nice, script, thank you. :-) The issue with ln_CD is no longer > true, it has been fixed in June/July 2017. Please see the output > on Fedora 28 (beta) with glibc 2.27: > > $ LC_ALL=ln_CD.utf8 locale abmon > yan;fbl;msi;apl;mai;yun;yul;agt;stb;ɔtb;nvb;dsb > > but it does not help because some Arabic languages still need 12. > Even worse, your script ran at the same machine gives the following > output (only the final lines): > > ... 
> 11 siakwa kati
> 11 yahbra kati
> 11 تشرين الأول
> 11 كانون الأول
> 12 kakamuk kati
> 12 pastara kati
> 12 waupasa kati
> 12 تشرين الثاني
> 12 كانون الثاني
> 15 lî wainhka kati
> 15 lih mairin kati
> (END)
>
> Those with 15 characters come from miq_NI language which has been
> introduced in September 2017 (glibc 2.27, released Feb 1, 2018):
>
> $ LC_ALL=miq_NI.utf8 locale abmon
> siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî
> kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati
> $ LC_ALL=miq_NI.utf8 locale mon
> siakwa kati;kuswa kati;kakamuk kati;lî wainhka kati;lih mairin kati;lî
> kati;pastara kati;sikla kati;wîs kati;waupasa kati;yahbra kati;trisu kati
>
> But, as you can see, this locale data should be fixed because abmon
> and mon are the same;
> at least " kati" which appears everywhere may
> be probably removed. Also truncating the string to 12 characters
> probably still makes it unambiguous.
>
> While at this, I have not checked but does your tests/ls/abmon-align.sh
> script check for the length required to make all abbreviated month
> names unambiguous (i.e., how many letters can we truncate to ensure
> that the month names are still unambiguous) or just the longest
> abbreviated month name?

It checks that 12 months for a few sample languages are unambiguous

>
>> $ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
>> [...]
>
> This is still true although again, mon and abmon seem to be the same
> in ar_SY which is probably not the best we can have. I wish I could
> fix it if I only knew how. :)

A patch to glibc would be most appreciated,
but as for content I don't know.
I see ICU has narrow, short, long variants,
but for ar_SY the narrow are ambiguous,
and the short are copies of the long ones:
http://demo.icu-project.org/icu-bin/locexp?d_=en&_=ar_SY

> (BTW, other Arabic variants seem to have
> the abbreviated month names shorter.)
Right, I see the long Arabic names are derived from Aramaic:
https://en.wikipedia.org/wiki/Arabic_names_of_calendar_months

>> [...]
>> Given the increase in supported size should only impact relatively few
>> languages
>> it probably makes sense to increase to 12. The attached does that
>> and also augments the test to find ambiguous cases.
>
> 12 is more than I asked for but that's definitely not destructive.
> My only remark is: please remove "Lingala" from the commit comment
> because it is no longer true. Otherwise the patch seems to be OK.

Given this is usually a deficiency in the locale
rather than inherent in the language,
I'm definitely not going above 12.
I'd even drop it to 8 if there were apparent short abmons
for all languages, but will leave at 12
as this isn't the case for ar_SY at least.

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 16 06:18:57 2018 Received: (at control) by debbugs.gnu.org; 16 Mar 2018 10:18:57 +0000 Received: from localhost ([127.0.0.1]:35742 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewmRs-0003cn-V7 for submit@debbugs.gnu.org; Fri, 16 Mar 2018 06:18:57 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:59662) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewmRq-0003cf-TH for control@debbugs.gnu.org; Fri, 16 Mar 2018 06:18:55 -0400 Received: from localhost.localdomain (unknown [109.76.132.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 5DCC0A26E for ; Fri, 16 Mar 2018 10:18:54 +0000 (GMT) Subject: Re: bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c References: <1501208979.82403.1520986006382@poczta.nazwa.pl> <1976408187.134668.1521068000360@poczta.nazwa.pl> To: GNU bug tracker automated control server From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: <3319dfa9-8c03-1004-4143-b008b066fd0e@draigBrady.com> Date:
Fri, 16 Mar 2018 03:18:53 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1976408187.134668.1521068000360@poczta.nazwa.pl> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) close 30814 From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 16 08:31:29 2018 Received: (at submit) by debbugs.gnu.org; 16 Mar 2018 12:31:29 +0000 Received: from localhost ([127.0.0.1]:35834 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewoW9-0002Ey-Fd for submit@debbugs.gnu.org; Fri, 16 Mar 2018 08:31:29 -0400 Received: from eggs.gnu.org ([208.118.235.92]:48192) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ewoW7-0002El-JX for submit@debbugs.gnu.org; Fri, 16 Mar 2018 08:31:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ewoVx-0002OU-8l for submit@debbugs.gnu.org; Fri, 16 Mar 2018 08:31:22 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:47617) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1ewoVx-0002OO-4C for submit@debbugs.gnu.org; Fri, 16 Mar 2018 08:31:17 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39154) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ewoVs-0001P0-Go for bug-coreutils@gnu.org; Fri, 16 Mar 2018 08:31:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) 
(envelope-from ) id 1ewoVp-0002K1-Rm for bug-coreutils@gnu.org; Fri, 16 Mar 2018 08:31:12 -0400 Received: from mout.gmx.net ([212.227.15.15]:35731) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1ewoVp-0002I9-Hr for bug-coreutils@gnu.org; Fri, 16 Mar 2018 08:31:09 -0400 Received: from adsiz.fritz.box ([146.52.144.151]) by mail.gmx.com (mrgmx002 [212.227.17.190]) with ESMTPSA (Nemesis) id 0LZz01-1eDGXT1wuz-00ljE9; Fri, 16 Mar 2018 13:30:54 +0100 From: Ruediger Meier To: bug-coreutils@gnu.org Subject: Re: bug#30814: Please increase the value of MAX_MON_WIDTH in ls.c Date: Fri, 16 Mar 2018 13:30:53 +0100 User-Agent: KMail/1.9.10 References: <1501208979.82403.1520986006382@poczta.nazwa.pl> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <201803161330.53416.sweet_f_a@gmx.de> X-Provags-ID: V03:K0:Dwc02NsMS0/fDGggThFU6vy4JEyZSHrPvmCF6dN32sxcvzKIK26 bli0+BxuM1Atkpc97I18JhjJwla7QM2CE8J3gMrkwdlFeTjEfodaWL2/OD1lZOekVN411qi o6ew2553WMYoGFiPDYMUc2OvJGqTUNIsAly6G/oGuACViAoIfgxWQQIMOFl/MydsHZ79L4U ykgk4vY/piOb5ca7gV2Tw== X-UI-Out-Filterresults: notjunk:1;V01:K0:hVIo6w7oyGU=:anwO006E/aAAXLOxeN9894 JE6pkltZjwHpPkKllnnKvhiHAldoyRFPNnrG3NewqPkA1WucL+6eqARZkWkaOnVlvMxATNPmI Ju73B/ymU0NrNJQb0mA1gUSlbtyjwAyNXEz/LVsEtVYuKhpLfkoOBA425z5aybiVUgavz6n0Q rhWTpXe3QpSP79CuW++Xj7EvK9jYLsmL3xv9j18KX8i6nohGx0Vt1oF4GYU0UU/GoLcvC+gQO ogn/+DdeFCUOx65KZArwgZJa8DPFM1W50GfgVcysQqo+DrmKTzccvT7TmbWeSkRV7qvyZfa1m SARbga3dc9o9Kapbc5nMc71wwcnTQRFVGTf7rmBwNMEIyBTXTn1L99grW6YMumuEm4oZ29mzZ 1C23YxsAZE+xhXy5SAREqwEf1vwhEv7wSVwiWRCT2t92WO24oo8Utqt3zc4gcgzFnrvveH4Ab i0r7pLjDOVJ/t1eF2V2TPn8OzNRJjCHizQWdbEbvJyWqXXX3WZF5e5vKRWLgKdXf+13Dg5tkF B5khksueHJbQSJ+uCWLoDOO9SCsOz0BD19tVPUekEQfDzuezSDKJrF7MPKUkGbJiyBBrNvg9p cCpZuCCoTaUNEoNXYzdG7mTRyM3g3GyPK1VzbSIXRv/vGlQ5mmMbajsPIICpU7P/Em2m2Gx6J DxWPBbrMb4psuDawN06vs4xWsaQGKuab4GkTwEnAFGb3FyOUKthlt1q1lynp4RGFOQs3cG8Mo 
OrtrG/Ye+RX7hRZM4+w2jBNtfpq6sRPQ5T6s7eP/z2IW0rzfQ/TvP4+ub9vcEhz7Z3oB8KEIE jVXFZoDb2kMyJSJ4WhMouiyQZP9gw== X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit Cc: 30814@debbugs.gnu.org, =?utf-8?q?P=C3=A1draig_Brady?= , Rafal Luzynski X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----)

On Wednesday 14 March 2018, Pádraig Brady wrote:
> On 13/03/18 17:06, Rafal Luzynski wrote:
> > As we have introduced the support of nominative and genitive
> > month names in glibc [1] and we are going to provide the updated
> > locale data for Catalan language [2] it has been discovered [3]
> > that the current limit of the maximum length of the abbreviated
> > month name as displayed by "ls -l" will not work with the new
> > data for Catalan. It is obligatory to precede the month name
> > with "de " (note: the space) so the abbreviated month names limited
> > to 5 characters will be ambiguous and therefore unreadable:
>
> It's a bit surprising that _abbreviations_ all need the "de " prefix,
> but fair enough.

Most used "abbreviations" in our locales do not follow the language
rules anyways. Even in English we would need to add dots and some month
abbreviations just do not exist.

Below are 3 examples of the correct abbreviations for English, Spanish, and
German:

Jan.    enero   Jan.
Feb.    feb.    Feb.
Mar.    marzo   März
Apr.    abr.    Apr.
May     mayo    Mai
June    jun.    Jun.
July    jul.    Jul.
Aug.    agosto  Aug.
Sept.   set.    Sept.
Oct.    oct.    Okt.
Nov.    nov.    Nov.
Dec.    dic.    Dez.

Thankfully all 3 locales just use the first three letters.
Note that in Spanish you would also need to add such genitive "de" but of course
nobody wants to see it when printing short dates to a terminal.

While I see a benefit of having the correct abbreviations *somewhere* in
the locale, I don't think they should be used in tools like ls by
default. The output should IMHO not be longer than --time-style=long-iso
or --full-time.

> > de ma (should be "de mar" at least)
> > d’abr (correct)
> > de ma (should be "de mai" at least)
> > de ju (should be "de jun" at least)
> > de ju (should be "de jul" at least)

I don't speak Catalan, but I can't believe that "de jun" is a correct
abbreviation following the language rules.

> > Increasing the value of MAX_MON_WIDTH to 6 characters will fix
> > the problem. The location of the constant is here: [4]
> >
> > Although it has been also suggested in the same bug report that
> > there should be no additional limit for the month length.
> >
> > This bug may be related with the coreutils bug #29377. [5]
> >
> > Regards,
> >
> > Rafal Luzynski
> >
> >
> > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
> > [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22848
> > [3] https://sourceware.org/bugzilla/show_bug.cgi?id=22848#c6
> > [4]
> > http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n1099
> > [5] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29377
>
> Thanks for the careful analysis.
>
> 5 was chosen as a max width for abmon
> as that was seen to be unambiguous and
> also truncate overly long abbreviations.
>
> One can browse the abbreviations by length using:
>
>   locale -a | grep utf8 |
>   while read l; do LC_ALL=$l locale abmon; done |
>   tr ';' '\n' | sort -u | grep '.\{5,\}' |
>   while read mon; do
>     printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon"
>   done |
>   sort -n | less
>
> That shows a couple of existing issues with the limit of 5.
> ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7 to
> be unambiguous, while Arabic needs 12!
> I don't remember arabic being so long at the time I implemented
> the alignment/truncation in ls (9 years ago), but we should probably
> expand to account for that.
>
> $ LC_ALL=ln_CD.utf8 locale abmon
> sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá
>9.;sánz10.;sánzá11.;sánzá12.
>
> $ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
> كانون الثاني
> شباط
> آذار
> نيسان
> نوار
> حزيران
> تموز
> آب
> أيلول
> تشرين الأول
> تشرين الثاني
> كانون الأول
>
> Given the increase in supported size should only impact relatively
> few languages it probably makes sense to increase to 12. The attached
> does that and also augments the test to find ambiguous cases.
>
> cheers,
> Pádraig

From unknown Fri Jun 20 07:13:02 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 14 Apr 2018 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator