From unknown Fri Aug 15 14:15:41 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#24015 <24015@debbugs.gnu.org> To: bug#24015 <24015@debbugs.gnu.org> Subject: Status: [PATCH] sort: make -h work with -k and blank used as thousands separator Reply-To: bug#24015 <24015@debbugs.gnu.org> Date: Fri, 15 Aug 2025 21:15:41 +0000

retitle 24015 [PATCH] sort: make -h work with -k and blank used as thousands separator
reassign 24015 coreutils
submitter 24015 Kamil Dudka
severity 24015 normal
tag 24015 patch
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 17 12:02:37 2016 Received: (at submit) by debbugs.gnu.org; 17 Jul 2016 16:02:37 +0000 Received: from localhost ([127.0.0.1]:54681 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOoWR-0008SX-67 for submit@debbugs.gnu.org; Sun, 17 Jul 2016 12:02:37 -0400 Received: from eggs.gnu.org ([208.118.235.92]:35025) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOoWO-0008SJ-4D for submit@debbugs.gnu.org; Sun, 17 Jul 2016 12:02:26 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bOoWH-00040h-Mw for submit@debbugs.gnu.org; Sun, 17 Jul 2016 12:02:18 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:56017) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bOoWH-00040P-JH for submit@debbugs.gnu.org; Sun, 17 Jul 2016 12:02:17 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:54187) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bOoWF-0000Wn-1D for bug-coreutils@gnu.org; Sun, 17 Jul 2016 12:02:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from ) id 1bOoWA-0003zW-Jm for bug-coreutils@gnu.org; Sun, 17 Jul 2016 12:02:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44378) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bOoWA-0003zA-CP for bug-coreutils@gnu.org; Sun, 17 Jul 2016 12:02:10 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A8B5085364 for ; Sun, 17 Jul 2016 16:02:08 +0000 (UTC) Received: from f23.localdomain (ovpn-204-23.brq.redhat.com [10.40.204.23]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u6HG26tP002412; Sun, 17 Jul 2016 12:02:07 -0400 From: Kamil Dudka To: bug-coreutils@gnu.org Subject: [PATCH] sort: make -h work with -k and blank used as thousands separator Date: Sun, 17 Jul 2016 18:02:06 +0200 Message-Id: <1468771326-2519-1-git-send-email-kdudka@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Sun, 17 Jul 2016 16:02:08 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) * src/sort.c (find_unit_order): Allow to skip only one occurrence of thousands_sep to avoid finding the unit in the next column in case thousands_sep matches as blank and is used as column delimiter. * tests/misc/sort-h-thousands-sep.sh: Add regression test for this bug. 
* tests/local.mk: Reference the test. Reported at https://bugzilla.redhat.com/1355780 --- src/sort.c | 12 ++++++---- tests/local.mk | 1 + tests/misc/sort-h-thousands-sep.sh | 45 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+), 4 deletions(-) create mode 100755 tests/misc/sort-h-thousands-sep.sh diff --git a/src/sort.c b/src/sort.c index f717604..a2cadda 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1904,12 +1904,16 @@ find_unit_order (char const *number) to be lacking in units. FIXME: add support for multibyte thousands_sep and decimal_point. */ - do + while (ISDIGIT (ch = *p++)) { - while (ISDIGIT (ch = *p++)) - nonzero |= ch - '0'; + nonzero |= ch - '0'; + + /* Allow to skip only one occurrence of thousands_sep to avoid finding + the unit in the next column in case thousands_sep matches as blank + and is used as column delimiter. */ + if (*p == thousands_sep) + ++p; } - while (ch == thousands_sep); if (ch == decimal_point) while (ISDIGIT (ch = *p++)) diff --git a/tests/local.mk b/tests/local.mk index 27cbf6e..889142a 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -348,6 +348,7 @@ all_tests = \ tests/misc/sort-discrim.sh \ tests/misc/sort-files0-from.pl \ tests/misc/sort-float.sh \ + tests/misc/sort-h-thousands-sep.sh \ tests/misc/sort-merge.pl \ tests/misc/sort-merge-fdlimit.sh \ tests/misc/sort-month.sh \ diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh new file mode 100755 index 0000000..a1e02de --- /dev/null +++ b/tests/misc/sort-h-thousands-sep.sh @@ -0,0 +1,45 @@ +#!/bin/sh +# exercise 'sort -h' in locales where thousands separator is blank + +# Copyright (C) 2016 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. 
+ +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src +print_ver_ sort + +tee in > exp1 << _EOF_ +1 1k 4 003 1M +2k 2M 4 002 2 +3M 3 4 001 3k +_EOF_ + +cat > exp2 << _EOF_ +3M 3 4 001 3k +1 1k 4 003 1M +2k 2M 4 002 2 +_EOF_ + +cat > exp3 << _EOF_ +3M 3 4 001 3k +2k 2M 4 002 2 +1 1k 4 003 1M +_EOF_ + +for i in 1 2 3; do + LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1 + compare "exp${i}" "out${i}" || fail=1 +done + +Exit $fail -- 2.5.5 From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 17 15:51:54 2016 Received: (at 24015) by debbugs.gnu.org; 17 Jul 2016 19:51:54 +0000 Received: from localhost ([127.0.0.1]:54805 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOs6U-0005Ei-9d for submit@debbugs.gnu.org; Sun, 17 Jul 2016 15:51:54 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:49318) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOs6P-0005EX-TG for 24015@debbugs.gnu.org; Sun, 17 Jul 2016 15:51:53 -0400 Received: from [192.168.1.80] (unknown [109.77.17.63]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 974D7245; Sun, 17 Jul 2016 20:51:48 +0100 (IST) Subject: Re: bug#24015: [PATCH] sort: make -h work with -k and blank used as thousands separator To: Kamil Dudka , 24015@debbugs.gnu.org References: <1468771326-2519-1-git-send-email-kdudka@redhat.com> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: <578BE1D4.3020601@draigBrady.com> Date: Sun, 17 Jul 2016 20:51:48 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) 
Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <1468771326-2519-1-git-send-email-kdudka@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24015 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 17/07/16 17:02, Kamil Dudka wrote: > * src/sort.c (find_unit_order): Allow to skip only one occurrence > of thousands_sep to avoid finding the unit in the next column in case > thousands_sep matches as blank and is used as column delimiter. > * tests/misc/sort-h-thousands-sep.sh: Add regression test for this bug. > * tests/local.mk: Reference the test. > Reported at https://bugzilla.redhat.com/1355780 > --- > src/sort.c | 12 ++++++---- > tests/local.mk | 1 + > tests/misc/sort-h-thousands-sep.sh | 45 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 54 insertions(+), 4 deletions(-) > create mode 100755 tests/misc/sort-h-thousands-sep.sh > > diff --git a/src/sort.c b/src/sort.c > index f717604..a2cadda 100644 > --- a/src/sort.c > +++ b/src/sort.c > @@ -1904,12 +1904,16 @@ find_unit_order (char const *number) > to be lacking in units. > FIXME: add support for multibyte thousands_sep and decimal_point. */ > > - do > + while (ISDIGIT (ch = *p++)) > { > - while (ISDIGIT (ch = *p++)) > - nonzero |= ch - '0'; > + nonzero |= ch - '0'; > + > + /* Allow to skip only one occurrence of thousands_sep to avoid finding > + the unit in the next column in case thousands_sep matches as blank > + and is used as column delimiter. */ > + if (*p == thousands_sep) > + ++p; > } > - while (ch == thousands_sep); This is an improvement. Though I now also see an existing inconsistency where we treat trailing blanks in this case. I.E. 
this inconsistency with: $ printf '%s\n' '1 M' '2 K' | LANG=en_US git/coreutils/src/sort -h 1 M 2 K $ printf '%s\n' '1 M' '2 K' | LANG=sv_SE git/coreutils/src/sort -h 2 K 1 M We should probably not allow/consider a blank after the last digit as part of the number here. I.E. the first output is correct, treating the input as 2 separate fields. > diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh > new file mode 100755 > index 0000000..a1e02de > --- /dev/null > +++ b/tests/misc/sort-h-thousands-sep.sh > @@ -0,0 +1,45 @@ > +#!/bin/sh > +# exercise 'sort -h' in locales where thousands separator is blank > + > +# Copyright (C) 2016 Free Software Foundation, Inc. > + > +# This program is free software: you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation, either version 3 of the License, or > +# (at your option) any later version. > + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src > +print_ver_ sort > + > +tee in > exp1 << _EOF_ > +1 1k 4 003 1M > +2k 2M 4 002 2 > +3M 3 4 001 3k > +_EOF_ > + > +cat > exp2 << _EOF_ > +3M 3 4 001 3k > +1 1k 4 003 1M > +2k 2M 4 002 2 > +_EOF_ > + > +cat > exp3 << _EOF_ > +3M 3 4 001 3k > +2k 2M 4 002 2 > +1 1k 4 003 1M > +_EOF_ > + A testing for the case I highlighted would be good. > +for i in 1 2 3; do > + LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1 > + compare "exp${i}" "out${i}" || fail=1 > +done We'd have to skip_ the test if sv_SE wasn't available. 
Maybe something like: test "$(LC_ALL=sv_SE locale thousands_sep)" = ' ' || skip_ 'The swedish locale with blank thousands separator is unavailable' This deserves an entry in NEWS also. thanks! Pádraig From debbugs-submit-bounces@debbugs.gnu.org Sun Jul 17 16:05:16 2016 Received: (at 24015) by debbugs.gnu.org; 17 Jul 2016 20:05:16 +0000 Received: from localhost ([127.0.0.1]:54810 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOsJQ-0005Xs-KB for submit@debbugs.gnu.org; Sun, 17 Jul 2016 16:05:16 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:49332) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bOsJO-0005Xk-Va for 24015@debbugs.gnu.org; Sun, 17 Jul 2016 16:05:15 -0400 Received: from [192.168.1.80] (unknown [109.77.17.63]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 6096B245; Sun, 17 Jul 2016 21:05:14 +0100 (IST) Subject: Re: bug#24015: [PATCH] sort: make -h work with -k and blank used as thousands separator To: Kamil Dudka , 24015@debbugs.gnu.org References: <1468771326-2519-1-git-send-email-kdudka@redhat.com> <578BE1D4.3020601@draigBrady.com> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: <578BE4F9.1070903@draigBrady.com> Date: Sun, 17 Jul 2016 21:05:13 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <578BE1D4.3020601@draigBrady.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24015 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) > On 17/07/16 17:02, Kamil Dudka wrote: >> diff --git a/src/sort.c b/src/sort.c >> 
index f717604..a2cadda 100644 >> --- a/src/sort.c >> +++ b/src/sort.c >> @@ -1904,12 +1904,16 @@ find_unit_order (char const *number) >> to be lacking in units. >> FIXME: add support for multibyte thousands_sep and decimal_point. */ >> >> - do >> + while (ISDIGIT (ch = *p++)) >> { >> - while (ISDIGIT (ch = *p++)) >> - nonzero |= ch - '0'; >> + nonzero |= ch - '0'; >> + >> + /* Allow to skip only one occurrence of thousands_sep to avoid finding >> + the unit in the next column in case thousands_sep matches as blank >> + and is used as column delimiter. */ >> + if (*p == thousands_sep) >> + ++p; >> } >> - while (ch == thousands_sep); > There is also similar logic in debug_key that would need the same adjustments. cheers, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 18 13:04:50 2016 Received: (at 24015) by debbugs.gnu.org; 18 Jul 2016 17:04:51 +0000 Received: from localhost ([127.0.0.1]:55934 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByM-00039d-LL for submit@debbugs.gnu.org; Mon, 18 Jul 2016 13:04:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:45104) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByK-00039S-Tm for 24015@debbugs.gnu.org; Mon, 18 Jul 2016 13:04:49 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D64FC883DA; Mon, 18 Jul 2016 17:04:47 +0000 (UTC) Received: from f23.localdomain (unused-4-111.brq.redhat.com [10.34.4.111]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u6IH4kOV015089; Mon, 18 Jul 2016 13:04:46 -0400 From: Kamil Dudka To: =?UTF-8?q?P=C3=A1draig=20Brady?= Subject: [PATCH v2 1/3] sort: deduplicate code for traversing numbers Date: Mon, 18 Jul 2016 19:04:43 +0200 Message-Id: 
<1468861485-28231-1-git-send-email-kdudka@redhat.com> In-Reply-To: <578BE1D4.3020601@draigBrady.com> References: <578BE1D4.3020601@draigBrady.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Mon, 18 Jul 2016 17:04:47 +0000 (UTC) X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24015 Cc: 24015@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) * src/sort.c (traverse_raw_number): New function for traversing numbers. (find_unit_order): Use traverse_raw_number () instead of open-coding it. (debug_key): Use traverse_raw_number () instead of open-coding it. --- src/sort.c | 63 ++++++++++++++++++++++++++++++++++---------------------------- 1 file changed, 35 insertions(+), 28 deletions(-) diff --git a/src/sort.c b/src/sort.c index f717604..58c1167 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1885,18 +1885,16 @@ static char const unit_order[UCHAR_LIM] = #endif }; -/* Return an integer that represents the order of magnitude of the - unit following the number. The number may contain thousands - separators and a decimal point, but it may not contain leading blanks. - Negative numbers get negative orders; zero numbers have a zero order. */ - -static int _GL_ATTRIBUTE_PURE -find_unit_order (char const *number) +/* Traverse number given as *number consisting of digits, thousands_sep, and + decimal_point chars only. Returns the highest digit found in the number, + or '\0' if no digit has been found. Upon return *number points at the + character that immediately follows after the given number. 
*/ +static unsigned char +traverse_raw_number (char const **number) { - bool minus_sign = (*number == '-'); - char const *p = number + minus_sign; - int nonzero = 0; + char const *p = *number; unsigned char ch; + unsigned char max_digit = '\0'; /* Scan to end of number. Decimals or separators not followed by digits stop the scan. @@ -1907,16 +1905,34 @@ find_unit_order (char const *number) do { while (ISDIGIT (ch = *p++)) - nonzero |= ch - '0'; + if (max_digit < ch) + max_digit = ch; } while (ch == thousands_sep); if (ch == decimal_point) while (ISDIGIT (ch = *p++)) - nonzero |= ch - '0'; + if (max_digit < ch) + max_digit = ch; + + *number = p - 1; + return max_digit; +} + +/* Return an integer that represents the order of magnitude of the + unit following the number. The number may contain thousands + separators and a decimal point, but it may not contain leading blanks. + Negative numbers get negative orders; zero numbers have a zero order. */ - if (nonzero) +static int _GL_ATTRIBUTE_PURE +find_unit_order (char const *number) +{ + bool minus_sign = (*number == '-'); + char const *p = number + minus_sign; + unsigned char max_digit = traverse_raw_number (&p); + if ('0' < max_digit) { + unsigned char ch = *p; int order = unit_order[ch]; return (minus_sign ? 
-order : order); } @@ -2293,23 +2309,14 @@ debug_key (struct line const *line, struct keyfield const *key) ignore_value (strtold (beg, &tighter_lim)); else if (key->numeric || key->human_numeric) { - char *p = beg + (beg < lim && *beg == '-'); - bool found_digit = false; - unsigned char ch; - - do + char const *p = beg + (beg < lim && *beg == '-'); + unsigned char max_digit = traverse_raw_number (&p); + if ('0' <= max_digit) { - while (ISDIGIT (ch = *p++)) - found_digit = true; + unsigned char ch = *p; + tighter_lim = (char *) p + + (key->human_numeric && unit_order[ch]); } - while (ch == thousands_sep); - - if (ch == decimal_point) - while (ISDIGIT (ch = *p++)) - found_digit = true; - - if (found_digit) - tighter_lim = p - ! (key->human_numeric && unit_order[ch]); } else tighter_lim = lim; -- 2.5.5 From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 18 13:04:57 2016 Received: (at 24015) by debbugs.gnu.org; 18 Jul 2016 17:04:57 +0000 Received: from localhost ([127.0.0.1]:55938 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByS-0003A5-Vm for submit@debbugs.gnu.org; Mon, 18 Jul 2016 13:04:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46510) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByR-00039b-G9 for 24015@debbugs.gnu.org; Mon, 18 Jul 2016 13:04:55 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 24ED372645; Mon, 18 Jul 2016 17:04:50 +0000 (UTC) Received: from f23.localdomain (unused-4-111.brq.redhat.com [10.34.4.111]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u6IH4kOX015089; Mon, 18 Jul 2016 13:04:49 -0400 From: Kamil Dudka To: =?UTF-8?q?P=C3=A1draig=20Brady?= Subject: [PATCH v2 3/3] sort: with -h, disallow thousands separator between number 
and unit Date: Mon, 18 Jul 2016 19:04:45 +0200 Message-Id: <1468861485-28231-3-git-send-email-kdudka@redhat.com> In-Reply-To: <1468861485-28231-1-git-send-email-kdudka@redhat.com> References: <578BE1D4.3020601@draigBrady.com> <1468861485-28231-1-git-send-email-kdudka@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 18 Jul 2016 17:04:50 +0000 (UTC) X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24015 Cc: 24015@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) * src/sort.c (traverse_raw_number): Accept thousands separator only if it is immediately followed by a digit. * tests/misc/sort-h-thousands-sep.sh: Cover the fix for this bug. Suggested by Pádraig Brady in http://bugs.gnu.org/24015 --- src/sort.c | 11 ++++++++++- tests/misc/sort-h-thousands-sep.sh | 24 ++++++++++++------------ 2 files changed, 22 insertions(+), 13 deletions(-) diff --git a/src/sort.c b/src/sort.c index 038f6ae..6b2dc84 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1895,6 +1895,7 @@ traverse_raw_number (char const **number) char const *p = *number; unsigned char ch; unsigned char max_digit = '\0'; + bool ends_with_thousands_sep = false; /* Scan to end of number. Decimals or separators not followed by digits stop the scan. @@ -1910,10 +1911,18 @@ traverse_raw_number (char const **number) /* Allow to skip only one occurrence of thousands_sep to avoid finding the unit in the next column in case thousands_sep matches as blank and is used as column delimiter. 
*/ - if (*p == thousands_sep) + ends_with_thousands_sep = (*p == thousands_sep); + if (ends_with_thousands_sep) ++p; } + if (ends_with_thousands_sep) + { + /* thousands_sep not followed by digit is not allowed. */ + *number = p - /* already incremented twice */ 2; + return max_digit; + } + if (ch == decimal_point) while (ISDIGIT (ch = *p++)) if (max_digit < ch) diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh index 17f1b6c..1168268 100755 --- a/tests/misc/sort-h-thousands-sep.sh +++ b/tests/misc/sort-h-thousands-sep.sh @@ -21,25 +21,25 @@ print_ver_ sort test "$(LC_ALL=sv_SE locale thousands_sep)" = ' ' \ || skip_ 'The Swedish locale with blank thousands separator is unavailable.' -tee exp1 > in << _EOF_ -1 1k 4 003 1M -2k 2M 4 002 2 -3M 3 4 001 3k +tee exp{1,3} > in << _EOF_ +1 1k 1 M 4 003 1M +2k 2M 2 k 4 002 2 +3M 3 3 G 4 001 3k _EOF_ cat > exp2 << _EOF_ -3M 3 4 001 3k -1 1k 4 003 1M -2k 2M 4 002 2 +3M 3 3 G 4 001 3k +1 1k 1 M 4 003 1M +2k 2M 2 k 4 002 2 _EOF_ -cat > exp3 << _EOF_ -3M 3 4 001 3k -2k 2M 4 002 2 -1 1k 4 003 1M +cat > exp5 << _EOF_ +3M 3 3 G 4 001 3k +2k 2M 2 k 4 002 2 +1 1k 1 M 4 003 1M _EOF_ -for i in 1 2 3; do +for i in 1 2 3 5; do LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1 compare "exp${i}" "out${i}" || fail=1 done -- 2.5.5 From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 18 13:05:07 2016 Received: (at 24015) by debbugs.gnu.org; 18 Jul 2016 17:05:07 +0000 Received: from localhost ([127.0.0.1]:55940 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByW-0003AG-97 for submit@debbugs.gnu.org; Mon, 18 Jul 2016 13:05:07 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40735) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPByQ-00039U-Cx for 24015@debbugs.gnu.org; Mon, 18 Jul 2016 13:04:56 -0400 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0465610789; Mon, 18 Jul 2016 17:04:49 +0000 (UTC) Received: from f23.localdomain (unused-4-111.brq.redhat.com [10.34.4.111]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u6IH4kOW015089; Mon, 18 Jul 2016 13:04:48 -0400 From: Kamil Dudka To: =?UTF-8?q?P=C3=A1draig=20Brady?= Subject: [PATCH v2 2/3] sort: make -h work with -k and blank used as thousands separator Date: Mon, 18 Jul 2016 19:04:44 +0200 Message-Id: <1468861485-28231-2-git-send-email-kdudka@redhat.com> In-Reply-To: <1468861485-28231-1-git-send-email-kdudka@redhat.com> References: <578BE1D4.3020601@draigBrady.com> <1468861485-28231-1-git-send-email-kdudka@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 18 Jul 2016 17:04:49 +0000 (UTC) X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24015 Cc: 24015@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.3 (-) * src/sort.c (traverse_raw_number): Allow to skip only one occurrence of thousands_sep to avoid finding the unit in the next column in case thousands_sep matches as blank and is used as column delimiter. * tests/misc/sort-h-thousands-sep.sh: Add regression test for this bug. * tests/local.mk: Reference the test. * NEWS: Mention the bug fix. 
Reported at https://bugzilla.redhat.com/1355780 --- NEWS | 2 ++ src/sort.c | 14 ++++++++---- tests/local.mk | 1 + tests/misc/sort-h-thousands-sep.sh | 47 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 59 insertions(+), 5 deletions(-) create mode 100755 tests/misc/sort-h-thousands-sep.sh diff --git a/NEWS b/NEWS index 4d8fb45..736b95e 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,8 @@ GNU coreutils NEWS -*- outline -*- nl now resets numbering for each page section rather than just for each page. [This bug was present in "the beginning".] + sort -h -k now works even in locales that use blank as thousands separator. + stty --help no longer outputs extraneous gettext header lines for translated languages. [bug introduced in coreutils-8.24] diff --git a/src/sort.c b/src/sort.c index 58c1167..038f6ae 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1902,13 +1902,17 @@ traverse_raw_number (char const **number) to be lacking in units. FIXME: add support for multibyte thousands_sep and decimal_point. */ - do + while (ISDIGIT (ch = *p++)) { - while (ISDIGIT (ch = *p++)) - if (max_digit < ch) - max_digit = ch; + if (max_digit < ch) + max_digit = ch; + + /* Allow to skip only one occurrence of thousands_sep to avoid finding + the unit in the next column in case thousands_sep matches as blank + and is used as column delimiter. 
*/ + if (*p == thousands_sep) + ++p; } - while (ch == thousands_sep); if (ch == decimal_point) while (ISDIGIT (ch = *p++)) diff --git a/tests/local.mk b/tests/local.mk index 27cbf6e..889142a 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -348,6 +348,7 @@ all_tests = \ tests/misc/sort-discrim.sh \ tests/misc/sort-files0-from.pl \ tests/misc/sort-float.sh \ + tests/misc/sort-h-thousands-sep.sh \ tests/misc/sort-merge.pl \ tests/misc/sort-merge-fdlimit.sh \ tests/misc/sort-month.sh \ diff --git a/tests/misc/sort-h-thousands-sep.sh b/tests/misc/sort-h-thousands-sep.sh new file mode 100755 index 0000000..17f1b6c --- /dev/null +++ b/tests/misc/sort-h-thousands-sep.sh @@ -0,0 +1,47 @@ +#!/bin/sh +# exercise 'sort -h' in locales where thousands separator is blank + +# Copyright (C) 2016 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src +print_ver_ sort +test "$(LC_ALL=sv_SE locale thousands_sep)" = ' ' \ + || skip_ 'The Swedish locale with blank thousands separator is unavailable.' 
+ +tee exp1 > in << _EOF_ +1 1k 4 003 1M +2k 2M 4 002 2 +3M 3 4 001 3k +_EOF_ + +cat > exp2 << _EOF_ +3M 3 4 001 3k +1 1k 4 003 1M +2k 2M 4 002 2 +_EOF_ + +cat > exp3 << _EOF_ +3M 3 4 001 3k +2k 2M 4 002 2 +1 1k 4 003 1M +_EOF_ + +for i in 1 2 3; do + LC_ALL="sv_SE.utf8" sort -h -k $i "in" > "out${i}" || fail=1 + compare "exp${i}" "out${i}" || fail=1 +done + +Exit $fail -- 2.5.5 From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 18 17:07:39 2016 Received: (at 24015-done) by debbugs.gnu.org; 18 Jul 2016 21:07:39 +0000 Received: from localhost ([127.0.0.1]:56051 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPFlL-0000hO-Hb for submit@debbugs.gnu.org; Mon, 18 Jul 2016 17:07:39 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:52954) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPFlK-0000hG-97 for 24015-done@debbugs.gnu.org; Mon, 18 Jul 2016 17:07:38 -0400 Received: from [192.168.1.80] (unknown [109.76.200.219]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id C414E224; Mon, 18 Jul 2016 22:07:34 +0100 (IST) Subject: Re: [PATCH v2 1/3] sort: deduplicate code for traversing numbers To: Kamil Dudka References: <578BE1D4.3020601@draigBrady.com> <1468861485-28231-1-git-send-email-kdudka@redhat.com> From: =?UTF-8?Q?P=c3=a1draig_Brady?= Message-ID: <578D4516.2070100@draigBrady.com> Date: Mon, 18 Jul 2016 22:07:34 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <1468861485-28231-1-git-send-email-kdudka@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24015-done Cc: 24015-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

Excellent. I'll push all three patches.

I'll adjust the first summary like s/sort:/maint: sort.c:/
since there is no functionality change

I'll also add the check for the sv_SE locale in the test.

thanks!
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 18 17:29:00 2016 Received: (at 24015) by debbugs.gnu.org; 18 Jul 2016 21:29:00 +0000 Received: from localhost ([127.0.0.1]:56060 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPG60-0001CN-Br for submit@debbugs.gnu.org; Mon, 18 Jul 2016 17:29:00 -0400 Received: from mail.magicbluesmoke.com ([82.195.144.49]:52980) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bPG5y-0001CG-Mr for 24015@debbugs.gnu.org; Mon, 18 Jul 2016 17:28:58 -0400 Received: from [192.168.1.80] (unknown [109.76.200.219]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.magicbluesmoke.com (Postfix) with ESMTPSA id 96A4A224; Mon, 18 Jul 2016 22:28:56 +0100 (IST) Subject: Re: bug#24015: [PATCH v2 1/3] sort: deduplicate code for traversing numbers To: Kamil Dudka References: <578BE1D4.3020601@draigBrady.com> <1468861485-28231-1-git-send-email-kdudka@redhat.com> <578D4516.2070100@draigBrady.com> From: Pádraig Brady Message-ID: <578D4A18.9020807@draigBrady.com> Date: Mon, 18 Jul 2016 22:28:56 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <578D4516.2070100@draigBrady.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24015 Cc: 24015@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To:

On 18/07/16 22:07, Pádraig Brady wrote:
> Excellent. I'll push all three patches.
>
> I'll adjust the first summary like s/sort:/maint: sort.c:/
> since there is no functionality change
>
> I'll also add the check for the sv_SE locale in the test.

Oops you'd done that already.

One change I did make to the test was to change
the exp{1,3} bashism to exp1 exp3, as the tests need
to run under any POSIX compliant shell.
I tested that with:

  make SHELL=/bin/dash TESTS=tests/misc/sort-h-thousands-sep.sh SUBDIRS=. check

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Tue Jul 19 02:09:50 2016
From: Kamil Dudka
To: Pádraig Brady
Subject: Re: bug#24015: [PATCH v2 1/3] sort: deduplicate code for traversing numbers
Date: Tue, 19 Jul 2016 08:09:38 +0200
Message-ID: <1612000.F8mov5s0YU@kdudka-nb>
In-Reply-To: <578D4A18.9020807@draigBrady.com>
References: <578BE1D4.3020601@draigBrady.com> <578D4516.2070100@draigBrady.com> <578D4A18.9020807@draigBrady.com>
Cc: 24015@debbugs.gnu.org

On Monday, July 18, 2016 22:28:56 Pádraig Brady wrote:
> On 18/07/16 22:07, Pádraig Brady wrote:
> > Excellent. I'll push all three patches.
> >
> > I'll adjust the first summary like s/sort:/maint: sort.c:/
> > since there is no functionality change
> >
> > I'll also add the check for the sv_SE locale in the test.
>
> Oops you'd done that already.
>
> One change I did make to the test was to change
> the exp{1,3} bashism to exp1 exp3, as the tests need
> to run under any POSIX compliant shell.
> I tested that with:
>
>   make SHELL=/bin/dash TESTS=tests/misc/sort-h-thousands-sep.sh SUBDIRS=. check

Perfect. Thanks for the improvements!

Kamil

> cheers,
> Pádraig

From unknown Fri Aug 15 14:15:41 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 16 Aug 2016 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
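
The portability change discussed in the thread, replacing the exp{1,3} brace expansion with the explicit names exp1 exp3, can be sketched as follows. This is a minimal illustration and not the committed test: the scratch directory and the two-line sample data are assumptions made for the example. Brace expansion is a bash extension, so a POSIX shell such as dash would pass the literal word exp{1,3} to tee instead of two file names.

```shell
# Work in a scratch directory (an assumption for this sketch) so that
# no existing files are overwritten.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Portable form: list each file explicitly.  tee writes its input to
# exp1 and exp3, and the stdout redirection writes a third copy to "in".
# The sample rows below are hypothetical, not the test's real data.
tee exp1 exp3 > in << _EOF_
1 1k
2k 2M
_EOF_

# Confirm that all three files hold the same content.
if cmp -s in exp1 && cmp -s in exp3; then
  same=yes
else
  same=no
fi
echo "copies identical: $same"
```

Running the sketch under dash exercises the same constraint as the make SHELL=/bin/dash invocation quoted above.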