Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it
From: "Nelson H. F. Beebe"
Date: Fri, 23 Apr 2010 19:30:17 -0600 (MDT)
To: bug-coreutils@gnu.org
Cc: beebe@math.utah.edu

In 1981, 29 years ago, Intel introduced the 8087 floating-point
coprocessor that implemented an early draft of the 1985 IEEE 754
standard for binary floating-point arithmetic.
That chip, and all subsequent Intel IA-32, IA-64, EM64T, and AMD AMD64
(aka x86_64) architectures provide three floating-point formats in
hardware:

        32-bit  24-bit significand, number range ~= 1.4e-45 .. 3.40e38,
                roughly 7 decimal digits
                C type float

        64-bit  53-bit significand, number range ~= 4.94e-324 .. 1.80e308,
                roughly 16 decimal digits
                C type double

        80-bit  (variously stored in 10, 12, or 16-byte memory blocks)
                64-bit significand, number range ~= 3.64e-4951 .. 1.19e+4932,
                roughly 19 decimal digits
                C type long double

Several other CPU platforms provide a 128-bit format instead of the
80-bit format, with these properties:

        128-bit 113-bit significand, number range ~= 6.48e-4966 .. 1.19e+4932,
                roughly 34 decimal digits
                C type long double

In 2009, the IEEE 754 Standard was revised to include the above, plus
decimal arithmetic, the latter with these properties:

        32-bit  7 digits, number range 1e-101 .. 9.999_999e+96
        64-bit  16 digits, number range 1e-398 .. 9.999_999_999_999_999e+384
        128-bit 34 digits, number range 1e-6176 ..
                9.999_999_999_999_999_999_999_999_999_999_999e+6144

(The lower bound of each range above is the smallest positive
subnormal value.)

At present, up to version 8.5, coreutils uses only type double in its
implementation of the -g sort-ordering option.  The result is that it
is unable to correctly sort files that use the entire number range of
IEEE 754 binary arithmetic; indeed, the double format covers only
about 6% of the possible binary range, and 5% of the decimal range.

Please extend the next version of coreutils to use "long double"
instead of "double" in this operation.  Here is a patch that worked
for one recent coreutils release:

*** src/sort.c.~1~	Sun Jan  3 10:06:20 2010
--- src/sort.c	Mon Jan 18 08:24:18 2010
***************
*** 1792,1799 ****
--- 1792,1805 ----
  
    char *ea;
    char *eb;
+ 
+ #if 0
    double a = strtod (sa, &ea);
    double b = strtod (sb, &eb);
+ #else
+   long double a = strtold (sa, &ea);
+   long double b = strtold (sb, &eb);
+ #endif
  
    /* Put conversion errors at the start of the collating sequence.  */
    if (sa == ea)

The "long double" type is required by both C89 and C99, but the
strtold() function appeared first in C99 (although many vendors
supplied it before then).  If strtold() is absent, then

        long double x; if (sscanf(s, "%Lg", &x) == 1) {...}

is often a reasonable replacement.

However, note that some aberrant systems implement "long double" as
"double" (e.g., DEC Alpha OSF/1 4.x, Minix, and most *BSD
distributions), and some implement it in doubled-double format, which
increases the precision, but leaves the range at that of double.
Examples of the latter include Apple Mac OS X on PowerPC, IBM AIX on
PowerPC, and SGI IRIX MIPS.

I suggest a configure-time check for strtold(), and if that works,
then use "long double" in sort.c.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                      beebe@acm.org  beebe@computer.org  -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
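The strtold()/sscanf() fallback suggested above could be wrapped roughly as follows. This is a minimal sketch, not code from coreutils: the helper name parse_long_double and the HAVE_STRTOLD macro (standing in for whatever a configure-time check would define) are both hypothetical. The sscanf branch uses the %n conversion to learn how many characters were consumed, so both branches report an end pointer the way strtold() does.

```c
#include <stdio.h>
#include <stdlib.h>

/* HAVE_STRTOLD is hypothetical: a configure-time check would define it
   when a working C99 strtold() exists.  Otherwise fall back to sscanf.  */

/* Convert the leading number in S to long double.  On return *END points
   just past the converted text, or equals S if nothing was converted.  */
static long double
parse_long_double (const char *s, char **end)
{
#ifdef HAVE_STRTOLD
  return strtold (s, end);
#else
  long double x;
  int consumed = 0;

  /* %n stores the count of characters read so far; it does not count
     toward the value that sscanf returns.  */
  if (sscanf (s, "%Lg%n", &x, &consumed) == 1)
    {
      *end = (char *) s + consumed;
      return x;
    }
  *end = (char *) s;
  return 0;
#endif
}
```

One caveat on the design: C leaves the behavior of the scanf family undefined when a converted value is out of range, whereas strtold() is specified to return +/-HUGE_VALL on overflow and set errno to ERANGE, so the sscanf branch is only a best-effort substitute.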
From: Pádraig Brady
Date: Wed, 28 Apr 2010 12:09:51 +0100
To: "Nelson H. F. Beebe"
Cc: 6020@debbugs.gnu.org

On 24/04/10 02:30, Nelson H. F. Beebe wrote:

[snipped very useful floating point info]

> At present, up to version 8.5, coreutils uses only type double in its
> implementation of the -g sort-ordering option.  The result is that it
> is unable to correctly sort files that use the entire number range of
> IEEE 754 binary arithmetic; indeed, the double format covers only
> about 6% of the possible binary range, and 5% of the decimal range.

This should do it.
Using long double has no impact on performance on my Pentium M Linux laptop.
Note I tried converting to use xstrtold(), but that added about 25% overhead :(

diff --git a/src/sort.c b/src/sort.c
index 6d47b79..a815244 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1855,10 +1855,16 @@ general_numcompare (const char *sa, const char *sb)
   /* FIXME: maybe add option to try expensive FP conversion
      only if A and B can't be compared more cheaply/accurately.  */
 
+#if HAVE_C99_STRTOLD /* provided by c-strtold module.  */
+# define STRTOD strtold
+#else
+# define STRTOD strtod
+#endif
+
   char *ea;
   char *eb;
-  double a = strtod (sa, &ea);
-  double b = strtod (sb, &eb);
+  long double a = STRTOD (sa, &ea);
+  long double b = STRTOD (sb, &eb);
 
   /* Put conversion errors at the start of the collating sequence.  */
   if (sa == ea)

> However, note that some aberrant systems implement "long double" as
> "double" (e.g., DEC Alpha OSF/1 4.x, Minix, and most *BSD
> distributions), and some implement it in doubled-double format, which
> increases the precision, but leaves the range at that of double.
> Examples of the latter include Apple Mac OS X on PowerPC, IBM AIX on
> PowerPC, and SGI IRIX MIPS.

I was wondering about a test for this:

$ printf "3.64e-4951\n3.63e-4950\n" | ./sort -g
3.64e-4951
3.63e-4950

However I'm worried that will fail because of what you mention above.
I probably need to add LDBL_{MIN,MAX} to getlimits.

cheers,
Pádraig.
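The worry about "aberrant" long double formats can be checked without going through sort at all. A rough sketch, assuming only <float.h> and a C99 strtold(): where long double is merely double, or a double-double pair, LDBL_MAX_EXP equals DBL_MAX_EXP, so the wider range requested in this bug is unavailable. The function names are illustrative and are not part of coreutils or getlimits.

```c
#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when long double has a wider exponent range than double, as with
   the x87 80-bit format or IEEE 754 binary128.  False when long double is
   plain double, or a double-double pair whose range equals double's.  */
static bool
long_double_range_exceeds_double (void)
{
  return LDBL_MAX_EXP > DBL_MAX_EXP;
}

/* Empirical cross-check through the C library: 1e-400 lies below the
   smallest positive double (about 4.94e-324), so only a wider long double
   keeps it from underflowing to zero.  */
static bool
strtold_keeps_tiny_values (void)
{
  return strtold ("1e-400", NULL) > 0.0L;
}
```

A test like the one proposed above could then expect values such as 3.64e-4951 to survive only when the first check is true.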
From: Pádraig Brady
Date: Thu, 29 Apr 2010 00:39:46 +0100
To: "Nelson H. F. Beebe"
Cc: 6020@debbugs.gnu.org

On 28/04/10 21:01, Nelson H. F. Beebe wrote:
>>> ...
>>> I was wondering about a test for this:
>>>
>>> $ printf "3.64e-4951\n3.63e-4950\n" | ./sort -g
>>> 3.64e-4951
>>> 3.63e-4950
>>>
>>> However I'm worried that will fail because of what you mention above.
>>> I probably need to add LDBL_{MIN,MAX} to getlimits.
>>> ...
>
> Here is what I see with the version that I patched some time ago
> according to the proposal posted last week:
>
> % printf "3.64e-4951\n3.63e-4950\n" | sort-8.4 -g
> 3.64e-4951
> 3.63e-4950
>
> Why should getlimits() even be used?  Surely it is enough to ask
> strtold() to just return its best answer for the conversion of a
> human-readable number string to (we hope the nearest) machine number.

getlimits is just used in our tests.
Because of the implicit rounding in strtold I'd need something
independent of `sort` to output LDBL_MIN and LDBL_MAX to verify
that sort is actually using long double if available on the platform.

> You should not worry about execution time; there is a current huge
> hole in the coverage of floating-point numbers with coreutils' "sort
> -g" option that badly needs repair.  Getting the right answer a bit
> more slowly is much more important than getting the wrong answer fast.

I'm always wary of performance.
I was just pointing out that there is no slowdown on my system.

I'll push the attached sometime tomorrow.

cheers,
Pádraig

[Attachment: sort-long-double.diff]

From c1a4e4d3778323e68aadc6671c5e3db49b378761 Mon Sep 17 00:00:00 2001
From: Pádraig Brady
Date: Wed, 28 Apr 2010 23:54:33 +0100
Subject: [PATCH] sort: use long doubles for general numeric mode

* src/sort.c (general_numcompare): Use long doubles unconditionally,
and strtold when available, to convert numbers with greater range
and precision.  Performance was seen to be on par with standard doubles.
* src/getlimits.c (main): Output floating point limits for use in tests.
* tests/misc/sort-float: A new test to ensure sort is using long doubles
when possible.
* tests/Makefile.am: Reference the new test.
* NEWS: Mention the new behaviour.
Reported by Nelson H. F. Beebe
---
 NEWS                  |    4 +++
 src/getlimits.c       |   16 ++++++++++++--
 src/sort.c            |   10 +++++++-
 tests/Makefile.am     |    1 +
 tests/misc/sort-float |   51 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 77 insertions(+), 5 deletions(-)
 create mode 100755 tests/misc/sort-float

diff --git a/NEWS b/NEWS
index fdb03fd..070f338 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Changes in behavior
+
+  sort -g now uses long doubles for greater range and precision.
+
 
 * Noteworthy changes in release 8.5 (2010-04-23) [stable]
 
diff --git a/src/getlimits.c b/src/getlimits.c
index 48d07b5..93d4035 100644
--- a/src/getlimits.c
+++ b/src/getlimits.c
@@ -19,6 +19,7 @@
 #include <config.h>  /* sets _FILE_OFFSET_BITS=64 etc. */
 #include
 #include
+#include <float.h>
 
 #include "system.h"
 #include "c-ctype.h"
@@ -123,7 +124,7 @@ decimal_ascii_add (const char *str1, const char *str2)
 int
 main (int argc, char **argv)
 {
-  char limit[64];  /* big enough for 128 bit at least */
+  char limit[64];  /* big enough for 128 bit integers at least */
   char *oflow;
 
   initialize_main (&argc, &argv);
@@ -139,20 +140,24 @@ main (int argc, char **argv)
                       usage, AUTHORS, (char const *) NULL);
 
 #define print_int(TYPE) \
-  snprintf (limit, sizeof limit, "%"PRIuMAX, (uintmax_t)TYPE##_MAX); \
+  snprintf (limit, sizeof limit, "%"PRIuMAX, (uintmax_t)TYPE##_MAX);  \
   printf (#TYPE"_MAX=%s\n", limit); \
   oflow = decimal_ascii_add (limit, "1"); \
   printf (#TYPE"_OFLOW=%s\n", oflow); \
   free (oflow); \
   if (TYPE##_MIN) \
     { \
-      snprintf (limit, sizeof limit, "%"PRIdMAX, (intmax_t)TYPE##_MIN); \
+      snprintf (limit, sizeof limit, "%"PRIdMAX, (intmax_t)TYPE##_MIN);  \
       printf (#TYPE"_MIN=%s\n", limit); \
       oflow = decimal_ascii_add (limit, "-1"); \
       printf (#TYPE"_UFLOW=%s\n", oflow); \
       free (oflow); \
     }
 
+#define print_float(TYPE) \
+  printf (#TYPE"_MIN=%Le\n", (long double)TYPE##_MIN); \
+  printf (#TYPE"_MAX=%Le\n", (long double)TYPE##_MAX);
+
   /* Variable sized ints */
   print_int (CHAR);
   print_int (SCHAR);
@@ -171,4 +176,9 @@ main (int argc, char **argv)
   print_int (OFF_T);
   print_int (INTMAX);
   print_int (UINTMAX);
+
+  /* Variable sized floats */
+  print_float (FLT);
+  print_float (DBL);
+  print_float (LDBL);
 }
diff --git a/src/sort.c b/src/sort.c
index 6d47b79..a815244 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1855,10 +1855,16 @@ general_numcompare (const char *sa, const char *sb)
   /* FIXME: maybe add option to try expensive FP conversion
      only if A and B can't be compared more cheaply/accurately.  */
 
+#if HAVE_C99_STRTOLD /* provided by c-strtold module.  */
+# define STRTOD strtold
+#else
+# define STRTOD strtod
+#endif
+
   char *ea;
   char *eb;
-  double a = strtod (sa, &ea);
-  double b = strtod (sb, &eb);
+  long double a = STRTOD (sa, &ea);
+  long double b = STRTOD (sb, &eb);
 
   /* Put conversion errors at the start of the collating sequence.  */
   if (sa == ea)
diff --git a/tests/Makefile.am b/tests/Makefile.am
index a943ff3..b78b75d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -224,6 +224,7 @@ TESTS =						\
   misc/sort-compress				\
   misc/sort-continue				\
   misc/sort-files0-from				\
+  misc/sort-float				\
   misc/sort-merge				\
   misc/sort-merge-fdlimit			\
   misc/sort-month				\
diff --git a/tests/misc/sort-float b/tests/misc/sort-float
new file mode 100755
index 0000000..2854625
--- /dev/null
+++ b/tests/misc/sort-float
@@ -0,0 +1,51 @@
+#!/bin/sh
+# Ensure sort -g sorts floating point limits correctly
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  mv --version
+fi
+
+. $srcdir/test-lib.sh
+getlimits_
+
+# See if sort should be using long doubles
+grep '^#define HAVE_C99_STRTOLD 1' $CONFIG_HEADER > /dev/null ||
+  { LDBL_MAX="$DBL_MAX"; LDBL_MIN="$DBL_MIN"; }
+
+printf -- "\
+-$LDBL_MAX
+-$DBL_MAX
+-$FLT_MAX
+-$FLT_MIN
+-$DBL_MIN
+-$LDBL_MIN
+0
+$LDBL_MIN
+$DBL_MIN
+$FLT_MIN
+$FLT_MAX
+$DBL_MAX
+$LDBL_MAX
+" > exp
+
+tac exp | sort -sg > out || fail=1
+
+compare out exp || fail=1
+
+Exit $fail
-- 
1.6.2.5
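For experimenting outside the coreutils tree, the comparison this patch changes can be approximated in a standalone program. This is a simplified sketch, not the actual general_numcompare(): the names ld_numcompare and compare_lines are hypothetical. The source excerpt above confirms that conversion errors go first; placing NaNs next, and treating all NaNs as equal to each other, is an assumption of this sketch.

```c
#include <stdlib.h>

/* Compare the leading numbers of SA and SB as long doubles.
   Order: unparsable strings first, then NaNs, then numbers in
   increasing order (-0 compares equal to +0).  */
static int
ld_numcompare (const char *sa, const char *sb)
{
  char *ea;
  char *eb;
  long double a = strtold (sa, &ea);
  long double b = strtold (sb, &eb);

  /* Conversion errors sort before everything else.  */
  if (sa == ea)
    return sb == eb ? 0 : -1;
  if (sb == eb)
    return 1;

  if (a < b)
    return -1;
  if (a > b)
    return 1;
  if (a == b)
    return 0;

  /* At least one operand is a NaN; NaNs sort before numbers.  */
  if (b == b)   /* only A is a NaN */
    return -1;
  if (a == a)   /* only B is a NaN */
    return 1;
  return 0;     /* both are NaNs */
}

/* qsort adapter for an array of pointers to lines.  */
static int
compare_lines (const void *x, const void *y)
{
  char const *const *px = x;
  char const *const *py = y;
  return ld_numcompare (*px, *py);
}
```

On x86 hardware, where long double is the 80-bit format, ld_numcompare ("1e-400", "0") returns 1; a double-based version would convert both strings to zero and report them equal.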
From: Erik Auerswald
Date: Thu, 29 Apr 2010 08:26:49 +0200

Hi,

two nit-picks regarding the test script below:

On Thu, Apr 29, 2010 at 12:39:46AM +0100, Pádraig Brady wrote:
> [...]
> @@ -0,0 +1,51 @@
> +#!/bin/sh
> +# Ensure sort -g sorts floating point limits correctly
> [...]
> +if test "$VERBOSE" = yes; then
> +  set -x
> +  mv --version
     ^^
     sort
would be nicer.
> +fi
> +
> +. $srcdir/test-lib.sh
> +getlimits_
> +
> +# See if sort should be using long doubles
> +grep '^#define HAVE_C99_STRTOLD 1' $CONFIG_HEADER > /dev/null ||
                                                     ^^^^^^^^^^^
                                                     -q
would be more concise.

Regards,
Erik
From: Pádraig Brady
Date: Thu, 29 Apr 2010 09:33:59 +0100

On 29/04/10 07:26, Erik Auerswald wrote:
> Hi,
>
> two nit-picks regarding the test script below:
>
> On Thu, Apr 29, 2010 at 12:39:46AM +0100, Pádraig Brady wrote:
>> [...]
>> @@ -0,0 +1,51 @@
>> +#!/bin/sh
>> +# Ensure sort -g sorts floating point limits correctly
>> [...]
>> +if test "$VERBOSE" = yes; then
>> +  set -x
>> +  mv --version
>     ^^
>     sort
> would be nicer.

Heh, I noticed that :)

>> +# See if sort should be using long doubles
>> +grep '^#define HAVE_C99_STRTOLD 1' $CONFIG_HEADER > /dev/null ||
>                                                     ^^^^^^^^^^^
>                                                     -q
> would be more concise.

and efficient (it exits on first match).
However, even though POSIX specifies -q, it's not portable:
Solaris' grep, for example, does not support -q.
We'll start using it at some stage though.

My latest patch is attached, which corrects the info docs to mention
strtold() not strtod().
Also the test is updated to exclude floats in non standard formats
just in case, and also checks the fr_FR locale where the RADIXCHAR is ','

cheers,
Pádraig.

[Attachment: sort-long-double.diff]

From a703e0d074d5f57bf0f32550264b72634b9e9df0 Mon Sep 17 00:00:00 2001
From: Pádraig Brady
Date: Wed, 28 Apr 2010 23:54:33 +0100
Subject: [PATCH] sort: use long doubles for general numeric mode

* src/sort.c (general_numcompare): Use long doubles unconditionally,
and strtold when available, to convert numbers with greater range
and precision.  Performance was seen to be on par with standard doubles.
* doc/coreutils.texi (sort invocation): Amend the -g description to
mention long double rather than double, and strtold rather than strtod.
* src/getlimits.c (main): Output floating point limits for use in tests.
* tests/misc/sort-float: A new test to ensure sort is using long doubles
when possible, and that locale specific floats are handled.
* tests/Makefile.am: Reference the new test.
* tests/test-lib.sh (getlimits_): Normalize indenting.
* NEWS: Mention the new behaviour.
Reported by Nelson H. F. Beebe
---
 NEWS                  |    4 +++
 doc/coreutils.texi    |    4 +-
 src/getlimits.c       |   16 ++++++++++++--
 src/sort.c            |   10 +++++++-
 tests/Makefile.am     |    1 +
 tests/misc/sort-float |   56 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/test-lib.sh     |    4 +-
 7 files changed, 86 insertions(+), 9 deletions(-)
 create mode 100755 tests/misc/sort-float

diff --git a/NEWS b/NEWS
index fdb03fd..070f338 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,10 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Changes in behavior
+
+  sort -g now uses long doubles for greater range and precision.
+
 
 * Noteworthy changes in release 8.5 (2010-04-23) [stable]
 
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 73971c6..c8ba53c 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3767,8 +3767,8 @@ the final result, after the throwing away.))
 @opindex --sort
 @cindex general numeric sort
 @vindex LC_NUMERIC
-Sort numerically, using the standard C function @code{strtod} to convert
-a prefix of each line to a double-precision floating point number.
+Sort numerically, using the standard C function @code{strtold} to convert
+a prefix of each line to a long double-precision floating point number.
 This allows floating point numbers to be specified in scientific notation,
 like @code{1.0e-34} and @code{10e100}.
 The @env{LC_NUMERIC} locale determines the decimal-point character.
diff --git a/src/getlimits.c b/src/getlimits.c
index 48d07b5..93d4035 100644
--- a/src/getlimits.c
+++ b/src/getlimits.c
@@ -19,6 +19,7 @@
 #include <config.h>  /* sets _FILE_OFFSET_BITS=64 etc. */
 #include
 #include
+#include <float.h>
 
 #include "system.h"
 #include "c-ctype.h"
@@ -123,7 +124,7 @@ decimal_ascii_add (const char *str1, const char *str2)
 int
 main (int argc, char **argv)
 {
-  char limit[64];  /* big enough for 128 bit at least */
+  char limit[64];  /* big enough for 128 bit integers at least */
   char *oflow;
 
   initialize_main (&argc, &argv);
@@ -139,20 +140,24 @@ main (int argc, char **argv)
                       usage, AUTHORS, (char const *) NULL);
 
 #define print_int(TYPE) \
-  snprintf (limit, sizeof limit, "%"PRIuMAX, (uintmax_t)TYPE##_MAX); \
+  snprintf (limit, sizeof limit, "%"PRIuMAX, (uintmax_t)TYPE##_MAX);  \
   printf (#TYPE"_MAX=%s\n", limit); \
   oflow = decimal_ascii_add (limit, "1"); \
   printf (#TYPE"_OFLOW=%s\n", oflow); \
   free (oflow); \
   if (TYPE##_MIN) \
     { \
-      snprintf (limit, sizeof limit, "%"PRIdMAX, (intmax_t)TYPE##_MIN); \
+      snprintf (limit, sizeof limit, "%"PRIdMAX, (intmax_t)TYPE##_MIN);  \
       printf (#TYPE"_MIN=%s\n", limit); \
       oflow = decimal_ascii_add (limit, "-1"); \
       printf (#TYPE"_UFLOW=%s\n", oflow); \
       free (oflow); \
     }
 
+#define print_float(TYPE) \
+  printf (#TYPE"_MIN=%Le\n", (long double)TYPE##_MIN); \
+  printf (#TYPE"_MAX=%Le\n", (long double)TYPE##_MAX);
+
   /* Variable sized ints */
   print_int (CHAR);
   print_int (SCHAR);
@@ -171,4 +176,9 @@ main (int argc, char **argv)
   print_int (OFF_T);
   print_int (INTMAX);
   print_int (UINTMAX);
+
+  /* Variable sized floats */
+  print_float (FLT);
+  print_float (DBL);
+  print_float (LDBL);
 }
diff --git a/src/sort.c b/src/sort.c
index 6d47b79..a815244 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1855,10 +1855,16 @@ general_numcompare (const char *sa, const char *sb)
   /* FIXME: maybe add option to try expensive FP conversion
      only if A and B can't be compared more cheaply/accurately.  */
 
+#if HAVE_C99_STRTOLD /* provided by c-strtold module.  */
+# define STRTOD strtold
+#else
+# define STRTOD strtod
+#endif
+
   char *ea;
   char *eb;
-  double a = strtod (sa, &ea);
-  double b = strtod (sb, &eb);
+  long double a = STRTOD (sa, &ea);
+  long double b = STRTOD (sb, &eb);
 
   /* Put conversion errors at the start of the collating sequence.  */
   if (sa == ea)
diff --git a/tests/Makefile.am b/tests/Makefile.am
index a943ff3..b78b75d 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -224,6 +224,7 @@ TESTS =						\
   misc/sort-compress				\
   misc/sort-continue				\
   misc/sort-files0-from				\
+  misc/sort-float				\
   misc/sort-merge				\
   misc/sort-merge-fdlimit			\
   misc/sort-month				\
diff --git a/tests/misc/sort-float b/tests/misc/sort-float
new file mode 100755
index 0000000..639cd7e
--- /dev/null
+++ b/tests/misc/sort-float
@@ -0,0 +1,56 @@
+#!/bin/sh
+# Ensure sort -g sorts floating point limits correctly
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+ +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +if test "$VERBOSE" = yes; then + set -x + sort --version +fi + +. $srcdir/test-lib.sh + +for LOC in C $LOCALE_FR; do + + LC_ALL=$LOC getlimits_ + + # See if sort should be using long doubles + grep '^#define HAVE_C99_STRTOLD 1' $CONFIG_HEADER > /dev/null || + { LDBL_MAX="$DBL_MAX"; LDBL_MIN="$DBL_MIN"; } + + printf -- "\ +-$LDBL_MAX +-$DBL_MAX +-$FLT_MAX +-$FLT_MIN +-$DBL_MIN +-$LDBL_MIN +0 +$LDBL_MIN +$DBL_MIN +$FLT_MIN +$FLT_MAX +$DBL_MAX +$LDBL_MAX +" | + grep '^[0-9.,e+-]*$' > exp # restrict to numeric just in case + + tac exp | LC_ALL=$LOC sort -sg > out || fail=1 + + compare out exp || fail=1 +done + +Exit $fail diff --git a/tests/test-lib.sh b/tests/test-lib.sh index a62857b..ac2f8bf 100644 --- a/tests/test-lib.sh +++ b/tests/test-lib.sh @@ -57,8 +57,8 @@ skip_test_() getlimits_() { - eval $(getlimits) - test "$INT_MAX" || + eval $(getlimits) + test "$INT_MAX" || error_ "Error running getlimits" } -- 1.6.2.5 --------------000205040809060902040207-- From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 29 Apr 2010 09:05:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: =?UTF-8?Q?P=C3=A1draig?= Brady Cc: Erik Auerswald , 6020@debbugs.gnu.org, "Nelson H. F. 
Beebe" Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127253190119546 (code B ref 6020); Thu, 29 Apr 2010 09:05:02 +0000 Received: (at 6020) by debbugs.gnu.org; 29 Apr 2010 09:05:01 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7Pg4-00055C-Km for submit@debbugs.gnu.org; Thu, 29 Apr 2010 05:05:01 -0400 Received: from smtp1-g21.free.fr ([212.27.42.1]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7Pg0-000554-MR for 6020@debbugs.gnu.org; Thu, 29 Apr 2010 05:04:58 -0400 Received: from smtp1-g21.free.fr (localhost [127.0.0.1]) by smtp1-g21.free.fr (Postfix) with ESMTP id 710DC940130 for <6020@debbugs.gnu.org>; Thu, 29 Apr 2010 11:04:52 +0200 (CEST) Received: from mx.meyering.net (mx.meyering.net [82.230.74.64]) by smtp1-g21.free.fr (Postfix) with ESMTP id 8CCF8940179 for <6020@debbugs.gnu.org>; Thu, 29 Apr 2010 11:04:50 +0200 (CEST) Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 6309A10B31; Thu, 29 Apr 2010 11:04:50 +0200 (CEST) From: Jim Meyering In-Reply-To: <4BD94477.7090403@draigBrady.com> =?UTF-8?Q?("P=C3=A1draig?= Brady"'s message of "Thu, 29 Apr 2010 09:33:59 +0100") References: <4BD8C742.8050701@draigBrady.com> <20100429062649.GA16414@sushi.unix-ag.uni-kl.de> <4BD94477.7090403@draigBrady.com> Date: Thu, 29 Apr 2010 11:04:50 +0200 Message-ID: <871vdyvdcd.fsf@meyering.net> Lines: 43 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.1 (---) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.1 (---) Pádraig Brady wrote: > On 29/04/10 07:26, Erik Auerswald wrote: >> Hi, >> >> two nit-picks regarding the test script below: >> >> On
Thu, Apr 29, 2010 at 12:39:46AM +0100, Pádraig Brady wrote: >>> [...] >>> @@ -0,0 +1,51 @@ >>> +#!/bin/sh >>> +# Ensure sort -g sorts floating point limits correctly >>> [...] >>> +if test "$VERBOSE" = yes; then >>> + set -x >>> + mv --version >> ^^ >> sort >> would be nicer. > > Heh, I noticed that :) > >>> +# See if sort should be using long doubles >>> +grep '^#define HAVE_C99_STRTOLD 1' $CONFIG_HEADER > /dev/null || >> ^^^^^^^^^^^ >> -q >> would be more concise. > > and efficient (it exits on first match). When efficiency matters, I've sometimes used -l instead (e.g., in maint.mk). It is portable and also makes grep stop searching at the first match, though you'd probably still want to redirect its stdout. It helps when grep would otherwise search through, or print, much more. > However, even though POSIX specifies -q, it's not portable. > Solaris' grep, for example, does not support -q. > We'll start using it at some stage though. > > My latest patch is attached which corrects the info docs > to mention strtold() not strtod(). Thanks for writing that. From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 29 19:50:45 2010 Received: (at control) by debbugs.gnu.org; 29 Apr 2010 23:50:45 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7dVE-0004F4-Ff for submit@debbugs.gnu.org; Thu, 29 Apr 2010 19:50:44 -0400 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1O7dVC-0004Ey-Hd for control@debbugs.gnu.org; Thu, 29 Apr 2010 19:50:43 -0400 Received: (qmail 68347 invoked from network); 29 Apr 2010 23:50:42 -0000 Received: from unknown (HELO ?192.168.2.25?)
(84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 29 Apr 2010 23:50:42 -0000 Message-ID: <4BDA1B29.3080807@draigBrady.com> Date: Fri, 30 Apr 2010 00:50:01 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: control@debbugs.gnu.org X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.0 (--) close 6020 8.6 From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 30 Apr 2010 18:23:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: =?UTF-8?Q?P=C3=A1draig?= Brady Cc: 6020@debbugs.gnu.org, "Nelson H. F. 
Beebe" Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127265173524149 (code B ref 6020); Fri, 30 Apr 2010 18:23:01 +0000 Received: (at 6020) by debbugs.gnu.org; 30 Apr 2010 18:22:15 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7uqs-0006HS-DF for submit@debbugs.gnu.org; Fri, 30 Apr 2010 14:22:14 -0400 Received: from kiwi.cs.ucla.edu ([131.179.128.19]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7ulr-0006EI-5m for 6020@debbugs.gnu.org; Fri, 30 Apr 2010 14:17:04 -0400 Received: from penguin (Penguin.CS.UCLA.EDU [131.179.64.200]) by kiwi.cs.ucla.edu (8.13.8+Sun/8.13.8/UCLACS-6.0) with ESMTP id o3UIGv1L008065; Fri, 30 Apr 2010 11:16:57 -0700 (PDT) From: Paul Eggert References: <4BD8177F.70608@draigBrady.com> Date: Fri, 30 Apr 2010 11:16:56 -0700 In-Reply-To: <4BD8177F.70608@draigBrady.com> =?UTF-8?Q?("P=C3=A1draig?= Brady"'s message of "Wed, 28 Apr 2010 12:09:51 +0100") Message-ID: <87sk6cg607.fsf@cs.ucla.edu> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.6 (--) X-Mailman-Approved-At: Fri, 30 Apr 2010 14:22:14 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) Pádraig Brady writes: > +#if HAVE_C99_STRTOLD /* provided by c-strtold module.
*/
> +# define STRTOD strtold
> +#else
> +# define STRTOD strtod
> +#endif
> +
>   char *ea;
>   char *eb;
> -  double a = strtod (sa, &ea);
> -  double b = strtod (sb, &eb);
> +  long double a = STRTOD (sa, &ea);
> +  long double b = STRTOD (sb, &eb);

This could cause performance problems on machines that have slow
long-double operations (implemented via traps, say) and that lack
strtold.

How about doing something like this instead?  It tries to move as
much of the mess as possible to the #if part.

#if HAVE_C99_STRTOLD
# define long_double long double
#else
# define long_double double
# undef strtold
# define strtold strtod
#endif
...
  long_double a = strtold (sa, &ea);
  long_double b = strtold (sb, &eb);

From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it References: Resent-From: "Nelson H. F. Beebe" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 30 Apr 2010 18:37:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: 6020@debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig?= Brady , "Nelson H. F.
Beebe" Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127265260624560 (code B ref 6020); Fri, 30 Apr 2010 18:37:02 +0000 Received: (at 6020) by debbugs.gnu.org; 30 Apr 2010 18:36:46 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7v4v-0006O5-N6 for submit@debbugs.gnu.org; Fri, 30 Apr 2010 14:36:46 -0400 Received: from mail.math.utah.edu ([155.101.98.135]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7v4t-0006O0-2k for 6020@debbugs.gnu.org; Fri, 30 Apr 2010 14:36:44 -0400 Received: from psi.math.utah.edu (psi.math.utah.edu [155.101.96.19]) by mail.math.utah.edu (8.14.3/8.14.3) with ESMTP id o3UIa8FU004713; Fri, 30 Apr 2010 12:36:08 -0600 (MDT) Received: from psi.math.utah.edu (localhost [127.0.0.1]) by psi.math.utah.edu (8.14.3/8.14.3) with ESMTP id o3UIa8fN028845; Fri, 30 Apr 2010 12:36:08 -0600 (MDT) Received: (from beebe@localhost) by psi.math.utah.edu (8.14.3/8.14.3/Submit) id o3UIa8QD028844; Fri, 30 Apr 2010 12:36:08 -0600 (MDT) Date: Fri, 30 Apr 2010 12:36:08 -0600 (MDT) From: "Nelson H. F. Beebe" X-US-Mail: "Department of Mathematics, 110 LCB, University of Utah, 155 S 1400 E RM 233, Salt Lake City, UT 84112-0090, USA" X-Telephone: +1 801 581 5254 X-FAX: +1 801 585 1640, +1 801 581 4148 X-URL: http://www.math.utah.edu/~beebe In-Reply-To: <87sk6cg607.fsf@cs.ucla.edu> Message-ID: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0 (mail.math.utah.edu [155.101.98.135]); Fri, 30 Apr 2010 12:36:08 -0600 (MDT) X-Spam-Score: -2.6 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.4 (---) Paul Eggert writes about the proposed use of long double instead of double in sort's -g option: >> ... 
>> This could cause performance problems on machines that have slow >> long-double operations (implemented via traps, say) and that lack >> strtold. >> ... Because using double instead of long double cripples sorting of numerical data by drastically reducing the representable number range, I would much rather pay a premium in run time to get the right answer than get the useless wrong answer that GNU sort currently produces. If you folks want to consider alternatives, we could have something like this:

-g, --general-numeric-sort   compare according to general numerical value in type double
-gg                          same as before, but with type long double
-ggg                         same as before, but with general multiple-precision floating arithmetic using the gmp library

However, I'd much prefer a single option, and correct output. Most people don't need floating-point comparisons in their sorts, but for those of us who do, correctness trumps speed every time. There should not be a problem in using -lgmp on modern systems, because (a) it is very portable, and (b) it is required by all gcc-4.x installations (and we reached the first release of the gcc-4.6 family on 16-Apr-2010). Indeed, coreutils has tested for -lgmp since at least coreutils-7.0. ------------------------------------------------------------------------------- - Nelson H. F.
Beebe Tel: +1 801 581 5254 - - University of Utah FAX: +1 801 581 4148 - - Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu - - 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org - - Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ - ------------------------------------------------------------------------------- From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 30 Apr 2010 18:58:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: "Nelson H. F. Beebe" Cc: 6020@debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig?= Brady Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127265385825150 (code B ref 6020); Fri, 30 Apr 2010 18:58:02 +0000 Received: (at 6020) by debbugs.gnu.org; 30 Apr 2010 18:57:38 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7vP8-0006Xb-Bl for submit@debbugs.gnu.org; Fri, 30 Apr 2010 14:57:38 -0400 Received: from kiwi.cs.ucla.edu ([131.179.128.19]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7vP6-0006XV-AJ for 6020@debbugs.gnu.org; Fri, 30 Apr 2010 14:57:37 -0400 Received: from penguin (Penguin.CS.UCLA.EDU [131.179.64.200]) by kiwi.cs.ucla.edu (8.13.8+Sun/8.13.8/UCLACS-6.0) with ESMTP id o3UIvU6g009343; Fri, 30 Apr 2010 11:57:31 -0700 (PDT) From: Paul Eggert References: Date: Fri, 30 Apr 2010 11:57:30 -0700 In-Reply-To: (Nelson H. F. 
Beebe's message of "Fri, 30 Apr 2010 12:36:08 -0600 (MDT)") Message-ID: <87fx2cg44l.fsf@cs.ucla.edu> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Score: -2.6 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) "Nelson H. F. Beebe" writes: > Because using double instead of long double cripples sorting of > numerical data by drastically reducing the number range, I would much > rather pay a premium in run time to get the right answer, rather than > a useless wrong answer as GNU sort currently does. Yes, that's right. But my note was not about that. It was about improving performance while not affecting behavior. If all we have is strtod, there's no point doing a long-double compare. > -g, --general-numeric-sort compare according to general numerical > value in type double > -gg same as before, but with type long double > -ggg same as before, but with general multiple-precision floating > arithmetic using the gmp library Even better, just compare the numbers as text, without converting them to internal format. This is already done for -n, and would be much faster than any of the above approaches. If done right it would be just as accurate as gmp. (It wouldn't be trivial, though.) From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it References: Resent-From: "Nelson H. F. 
Beebe" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 30 Apr 2010 22:13:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert , =?UTF-8?Q?P=C3=A1draig?= Brady , 6020@debbugs.gnu.org Cc: beebe@math.utah.edu Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127266552430530 (code B ref 6020); Fri, 30 Apr 2010 22:13:01 +0000 Received: (at 6020) by debbugs.gnu.org; 30 Apr 2010 22:12:04 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7yRG-0007wN-Qy for submit@debbugs.gnu.org; Fri, 30 Apr 2010 18:12:03 -0400 Received: from mail.math.utah.edu ([155.101.98.135]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7yRE-0007vz-Cz for 6020@debbugs.gnu.org; Fri, 30 Apr 2010 18:12:01 -0400 Received: from psi.math.utah.edu (psi.math.utah.edu [155.101.96.19]) by mail.math.utah.edu (8.14.3/8.14.3) with ESMTP id o3UMBsgm020678; Fri, 30 Apr 2010 16:11:54 -0600 (MDT) Received: from psi.math.utah.edu (localhost [127.0.0.1]) by psi.math.utah.edu (8.14.3/8.14.3) with ESMTP id o3UMBsE0029171; Fri, 30 Apr 2010 16:11:54 -0600 (MDT) Received: (from beebe@localhost) by psi.math.utah.edu (8.14.3/8.14.3/Submit) id o3UMBrRK029170; Fri, 30 Apr 2010 16:11:53 -0600 (MDT) Date: Fri, 30 Apr 2010 16:11:53 -0600 (MDT) From: "Nelson H. F.
Beebe" X-US-Mail: "Department of Mathematics, 110 LCB, University of Utah, 155 S 1400 E RM 233, Salt Lake City, UT 84112-0090, USA" X-Telephone: +1 801 581 5254 X-FAX: +1 801 585 1640, +1 801 581 4148 X-URL: http://www.math.utah.edu/~beebe In-Reply-To: <87fx2cg44l.fsf@cs.ucla.edu> Message-ID: X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0 (mail.math.utah.edu [155.101.98.135]); Fri, 30 Apr 2010 16:11:54 -0600 (MDT) X-Spam-Score: -3.3 (---) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.2 (---) >> If all we have is strtod, there's no point doing a long-double compare. True, UNLESS sscanf(s, "%Lg", &x) works: I've had to use that alternative in the past. >> ... >> Even better, just compare the numbers as text, without converting them >> to internal format. This is already done for -n, and would be much >> faster than any of the above approaches. If done right it would be just >> as accurate as gmp. (It wouldn't be trivial, though.) >> ... Yes, that would work, but care needs to be taken to handle the formats that strtold() supports (decimal 1.2345e+678 and C99 hexadecimal 0x1.f3dp-2047), which essentially means converting hexadecimal to decimal and dealing with many of the same problems that strtold() handles. The code must also handle leading and trailing zeros in both the significand and the exponent. Quick: how are these ordered? 1.00000000000000000000000000000000000000000000000000e-50 0.00000000000000000000000000000000000000000000000001e+00 Answer: they are equal.
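Paul's text-comparison idea and Nelson's equal-values example can be made concrete with a small C sketch. This is not coreutils code, and the names decnum, parse_decimal and text_numcompare are hypothetical. Each number is reduced to a sign, its significant digits with leading and trailing zeros stripped, and a decimal exponent, so equal values get identical representations and can then be compared exponent first, digits second. It deliberately accepts only plain decimal input with a '.' decimal point: no C99 hexadecimal floats, infinities, NaNs or locale-specific separators, which are exactly the harder cases Nelson lists. Like sort -g, it treats -0 and +0 as equal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical normalized form: value = sign * 0.DIGITS * 10^exp10,
   with leading and trailing zeros stripped from DIGITS so that equal
   values get identical representations.  ndigits == 0 means zero.  */
struct decnum
{
  bool negative;
  char digits[256];
  size_t ndigits;
  long exp10;
};

/* Parse a whole decimal string such as "-1.25e+3" or "0.00001".  Returns
   false for anything else (hex, inf, nan, junk, > 256 significant digits).  */
static bool
parse_decimal (const char *p, struct decnum *d)
{
  bool seen_point = false, any_digit = false;
  long int_digits = 0;  /* significant digits before the point */
  long adj = 0;         /* zeros between the point and the first digit */
  long e = 0;           /* explicit exponent */

  assert (p != NULL && d != NULL);
  d->negative = false, d->ndigits = 0;
  if (*p == '+' || *p == '-')
    d->negative = (*p++ == '-');
  for (;; p++)
    {
      if (*p == '.' && !seen_point)
        seen_point = true;
      else if (isdigit ((unsigned char) *p))
        {
          any_digit = true;
          if (d->ndigits == 0 && *p == '0')
            adj -= seen_point;  /* leading zero: only shifts the scale */
          else if (d->ndigits == sizeof d->digits)
            return false;
          else
            {
              d->digits[d->ndigits++] = *p;
              int_digits += !seen_point;
            }
        }
      else
        break;
    }
  if (*p == 'e' || *p == 'E')
    {
      bool eneg = false;
      p++;
      if (*p == '+' || *p == '-')
        eneg = (*p++ == '-');
      if (!isdigit ((unsigned char) *p))
        return false;
      for (; isdigit ((unsigned char) *p); p++)
        if (e < 100000000)  /* saturate absurdly large exponents */
          e = e * 10 + (*p - '0');
      if (eneg)
        e = -e;
    }
  if (*p != '\0' || !any_digit)
    return false;
  while (d->ndigits > 0 && d->digits[d->ndigits - 1] == '0')
    d->ndigits--;  /* trailing zeros do not change the value */
  d->exp10 = int_digits + adj + e;
  return true;
}

/* Compare two numeric strings without converting them to binary.
   Unparsable strings sort first; -0 and +0 compare equal.  */
static int
text_numcompare (const char *sa, const char *sb)
{
  struct decnum a, b;
  bool ok_a = parse_decimal (sa, &a), ok_b = parse_decimal (sb, &b);
  if (!ok_a || !ok_b)
    return ok_a - ok_b;

  int sign_a = a.ndigits == 0 ? 0 : a.negative ? -1 : 1;
  int sign_b = b.ndigits == 0 ? 0 : b.negative ? -1 : 1;
  if (sign_a != sign_b)
    return sign_a < sign_b ? -1 : 1;
  if (sign_a == 0)
    return 0;

  /* Same nonzero sign: compare magnitudes, then flip for negatives.  */
  int mag;
  if (a.exp10 != b.exp10)
    mag = a.exp10 < b.exp10 ? -1 : 1;
  else
    {
      size_t n = a.ndigits < b.ndigits ? a.ndigits : b.ndigits;
      int c = memcmp (a.digits, b.digits, n);
      mag = c ? (c < 0 ? -1 : 1)
              : (a.ndigits > b.ndigits) - (a.ndigits < b.ndigits);
    }
  return sign_a < 0 ? -mag : mag;
}
```

Because nothing is converted to binary, text_numcompare ("1e400", "1e500") returns -1 even though both values overflow a double, and "1.000e-5" and "0.00001e+00" compare equal for the same reason Nelson's two spellings of 1e-50 do.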
And these: +0.00000000000000000000000000000000000000000000000000e-50 -0.00000000000000000000000000000000000000000000000000e+00 Answer: both are zero, but the negative one should precede the positive one in sort (on the grounds that negative zero really means tiny negative that was too small to represent). So, indeed, a string-based comparison is not trivial. ------------------------------------------------------------------------------- - Nelson H. F. Beebe Tel: +1 801 581 5254 - - University of Utah FAX: +1 801 581 4148 - - Department of Mathematics, 110 LCB Internet e-mail: beebe@math.utah.edu - - 155 S 1400 E RM 233 beebe@acm.org beebe@computer.org - - Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ - ------------------------------------------------------------------------------- From unknown Sat Aug 16 16:20:30 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6020: coreutils-8.x: a simple feature enhancement, and how to do it Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 30 Apr 2010 22:35:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6020 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: 6020@debbugs.gnu.org, "Nelson H. F. 
Beebe" Received: via spool by 6020-submit@debbugs.gnu.org id=B6020.127266685531054 (code B ref 6020); Fri, 30 Apr 2010 22:35:01 +0000 Received: (at 6020) by debbugs.gnu.org; 30 Apr 2010 22:34:15 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O7ymk-00084p-6w for submit@debbugs.gnu.org; Fri, 30 Apr 2010 18:34:14 -0400 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1O7ymi-00084h-7y for 6020@debbugs.gnu.org; Fri, 30 Apr 2010 18:34:12 -0400 Received: (qmail 51621 invoked from network); 30 Apr 2010 22:34:07 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 30 Apr 2010 22:34:07 -0000 Message-ID: <4BDB5AB1.3060303@draigBrady.com> Date: Fri, 30 Apr 2010 23:33:21 +0100 From: =?UTF-8?Q?P=C3=A1draig?= Brady User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 References: <4BD8177F.70608@draigBrady.com> <87sk6cg607.fsf@cs.ucla.edu> In-Reply-To: <87sk6cg607.fsf@cs.ucla.edu> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.8 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.8 (--) On 30/04/10 19:16, Paul Eggert wrote: > Pádraig Brady writes: > >> +#if HAVE_C99_STRTOLD /* provided by c-strtold module. 
*/ >> +# define STRTOD strtold >> +#else >> +# define STRTOD strtod >> +#endif >> + >> char *ea; >> char *eb; >> - double a = strtod (sa, &ea); >> - double b = strtod (sb, &eb); >> + long double a = STRTOD (sa, &ea); >> + long double b = STRTOD (sb, &eb); > > This could cause performance problems on machines that have slow > long-double operations (implemented via traps, say) and that lack > strtold. An unusual combination, but you're right. I'll push this soon in your name. thanks for the review, Pádraig. commit 1a2afe2adde1f4e864ac098e5d0ede6fcc6b46db Author: Paul Eggert Date: Fri Apr 30 23:23:38 2010 +0100 sort: use long doubles only when effective * src/sort.c (general_numcompare): Don't use long double if strtold is not available, as it may introduce needless overhead. diff --git a/src/sort.c b/src/sort.c index a815244..54b97e2 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1856,15 +1856,17 @@ general_numcompare (const char *sa, const char *sb) only if A and B can't be compared more cheaply/accurately. */ #if HAVE_C99_STRTOLD /* provided by c-strtold module. */ -# define STRTOD strtold +# define long_double long double #else -# define STRTOD strtod +# define long_double double +# undef strtold +# define strtold strtod #endif char *ea; char *eb; - long double a = STRTOD (sa, &ea); - long double b = STRTOD (sb, &eb); + long_double a = strtold (sa, &ea); + long_double b = strtold (sb, &eb); /* Put conversion errors at the start of the collating sequence. */ if (sa == ea)
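To see how the committed change fits into a complete comparison function, here is a self-contained sketch. The #if block, the two strtold calls and the conversion-error checks follow the patch above; the rest of general_numcompare is not shown in this thread, so the final NaN-aware comparison is an assumption modelled on the ordering the coreutils manual describes for -g (non-numbers first, then NaNs, then numbers, with -0 equal to +0). HAVE_C99_STRTOLD is simply forced to 1 here instead of coming from configure.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: configure normally decides this (coreutils gets it via the
   c-strtold module); this sketch just assumes a C99 library.  */
#ifndef HAVE_C99_STRTOLD
# define HAVE_C99_STRTOLD 1
#endif

#if HAVE_C99_STRTOLD
# define long_double long double
#else
/* No strtold: stay with plain double, so no slow long-double arithmetic
   is paid for without any gain in range or precision.  */
# define long_double double
# undef strtold
# define strtold strtod
#endif

/* Compare the numeric prefixes of SA and SB in the spirit of sort -g:
   strings with no numeric prefix first, then NaNs, then numbers in
   ascending order with -0 == +0.  */
static int
general_numcompare (const char *sa, const char *sb)
{
  char *ea;
  char *eb;
  assert (sa != NULL && sb != NULL);
  long_double a = strtold (sa, &ea);
  long_double b = strtold (sb, &eb);

  /* Put conversion errors at the start of the collating sequence.  */
  if (sa == ea)
    return sb == eb ? 0 : -1;
  if (sb == eb)
    return 1;

  /* A NaN compares unequal to everything, itself included, so only the
     last three tests can see one.  Simplification: two NaNs are equal.  */
  return (a < b ? -1
          : a > b ? 1
          : a == b ? 0
          : b == b ? -1
          : a == a ? 1
          : 0);
}
```

Setting HAVE_C99_STRTOLD to 0 switches the same code to double. Inputs such as 1e400 and 1e500 then both overflow to HUGE_VAL and compare equal, which is the range limitation Nelson objected to earlier in the thread.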