GNU bug report logs - #66256
sorting NAN values with "general-numeric’

Reported by: Jorge Stolfi <stolfi <at> ic.unicamp.br>

Date: Thu, 28 Sep 2023 11:16:02 UTC

Severity: normal

To reply to this bug, email your comments to 66256 AT debbugs.gnu.org.

Report forwarded to bug-coreutils <at> gnu.org:
bug#66256; Package coreutils. (Thu, 28 Sep 2023 11:16:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jorge Stolfi <stolfi <at> ic.unicamp.br>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 28 Sep 2023 11:16:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jorge Stolfi <stolfi <at> ic.unicamp.br>
To: bug-coreutils <at> gnu.org
Subject: sorting NAN values with "general-numeric’
Date: Thu, 28 Sep 2023 07:43:52 -0300

The full documentation of the "--general-numeric-sort" option of  
{sort} says that NaN values are sorted "in a consistent but  
machine-dependent order".

This is not good. The point of the IEEE floating-point standard was to  
make the results of floating-point computations be independent of the  
platform or implementation.

Please consider extending that goal to the handling of NaNs by {sort}.  
That is, all flavors of NaN (determined by their char tails, as  
parsed by {strtod}) should be treated as equal.

The fact that different flavors of NaN have distinct binary  
representation is not an excuse to sort them as distinct, since the  
same is true of +0 and -0, which "general-numeric" sort already treats  
as equal.
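
As an illustration of the request above, a comparator along the following lines would treat every NaN as equal to every other NaN, just as +0 and -0 already compare equal. This is only a sketch and not the code used by {sort}: the names `compare_nan_equal` and `check_compare_nan_equal` are hypothetical, and placing NaNs before all numbers is an assumption made for the example.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical comparator (not taken from coreutils): all NaNs compare
   equal to each other and, by assumption, sort before every number.
   +0 and -0 compare equal because the ordinary < and > operators
   already treat them that way. */
int compare_nan_equal(double a, double b)
{
    int a_nan = isnan(a) != 0;
    int b_nan = isnan(b) != 0;

    if (a_nan || b_nan)
        return b_nan - a_nan;   /* NaN vs NaN: 0; NaN vs number: -1 */

    return (a > b) - (a < b);
}

/* Small self-check; -NAN has a different bit pattern from NAN. */
void check_compare_nan_equal(void)
{
    assert(compare_nan_equal(NAN, -NAN) == 0);
    assert(compare_nan_equal(0.0, -0.0) == 0);
    assert(compare_nan_equal(NAN, -INFINITY) < 0);
    assert(compare_nan_equal(1.0, NAN) > 0);
}
```

With such a comparator the result no longer depends on how a given machine represents a particular NaN, since the payload and sign bits are never inspected.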

As a separate suggestion, please consider having {sort} abort with an  
error message if any field that is supposed to be sorted with  
"general-numeric" is not a valid {double} value, or has some leftover  
chars that are not parsed by {strtod}.
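
The strict check suggested above could look roughly like the following sketch, which uses the end pointer returned by {strtod} to detect leftover chars. The helper name `parse_field_strict` is hypothetical, tolerating trailing blanks is an assumption, and range errors (overflow or underflow) are deliberately not treated as invalid here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not part of sort): returns 1 and stores the
   parsed value in *out if the whole field is consumed by strtod,
   otherwise returns 0 and leaves *out untouched. */
int parse_field_strict(const char *field, double *out)
{
    char *end;
    double value = strtod(field, &end);

    if (end == field)
        return 0;               /* nothing was converted */

    /* Assumption: trailing whitespace is acceptable. */
    while (isspace((unsigned char) *end))
        end++;

    if (*end != '\0')
        return 0;               /* leftover chars after the number */

    *out = value;
    return 1;
}

/* Small self-check of the accepted and rejected cases. */
void check_parse_field_strict(void)
{
    double v = 0.0;
    assert(parse_field_strict("1.5", &v) == 1 && v == 1.5);
    assert(parse_field_strict("1.5abc", &v) == 0);
    assert(parse_field_strict("abc", &v) == 0);
    assert(parse_field_strict("", &v) == 0);
}
```

A caller that wanted the behavior proposed here would print an error and exit when the helper returns 0, instead of silently sorting the field as a non-number.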

Whether these solutions are accepted or not, please change the manpage  
explanation of "-g"/"--general-numeric-sort" to say, at least, "the  
field is parsed as a double-precision (64-bit) floating-point number  
and sorted by its numeric value".

Thanks, and all the best,

--jorge
-- 
Jorge Stolfi - Professor Titular/Full Professor
Instituto de Computação/Computer Science Dept
Universidade Estadual de Campinas/State University of Campinas
Campinas, SP - Brazil

Information forwarded to bug-coreutils <at> gnu.org:
bug#66256; Package coreutils. (Thu, 28 Sep 2023 11:54:01 GMT) Full text and rfc822 format available.

Message #8 received at 66256 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jorge Stolfi <stolfi <at> ic.unicamp.br>, 66256 <at> debbugs.gnu.org
Subject: Re: bug#66256: sorting NAN values with "general-numeric’
Date: Thu, 28 Sep 2023 12:52:47 +0100

On 28/09/2023 11:43, Jorge Stolfi wrote:
> 
> The full documentation of the "--general-numeric-sort" option of
> {sort} says that NaN values are sorted "in a consistent but
> machine-dependent order".
> 
> This is not good. The point of the IEEE floating-point standard was to
> make the results of floating-point computations be independent of the
> platform or implementation.
> 
> Please consider extending that goal to the handling of NaNs by {sort}.
> That is, all flavors of NaN (determined by their char tails, as
> parsed by {strtod}) should be treated as equal.
> 
> The fact that different flavors of NaN have distinct binary
> representation is not an excuse to sort them as distinct, since the
> same is true of +0 and -0, which "general-numeric" sort already treats
> as equal.
> 
> As a separate suggestion, please consider having {sort} abort with an
> error message if any field that is supposed to be sorted with
> "general-numeric" is not a valid {double} value, or has some leftover
> chars that are not parsed by {strtod}.
> 
> Whether these solutions are accepted or not, please change the manpage
> explanation of "-g"/"--general-numeric-sort" to say, at least, "the
> field is parsed as a double-precision (64-bit) floating-point number
> and sorted by its numeric value".
> 
> Thanks, and all the best,

No comment on the actual ordering of NaNs, but
note NaN ordering changed recently in coreutils 9.2,
as discussed at https://bugs.gnu.org/55212

cheers,
Pádraig

Information forwarded to bug-coreutils <at> gnu.org:
bug#66256; Package coreutils. (Thu, 28 Sep 2023 20:46:01 GMT) Full text and rfc822 format available.

Message #11 received at 66256 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 Jorge Stolfi <stolfi <at> ic.unicamp.br>, 66256 <at> debbugs.gnu.org
Subject: Re: bug#66256: sorting NAN values with "general-numeric’
Date: Thu, 28 Sep 2023 13:44:50 -0700

On my long list of things to do is to have sort -g sort more 
deterministically with NaNs. This could be done with the new totalorder 
and totalorderl functions in C23 Annex F.10.12.1, if available. The fix 
would not be portable (as these functions are newly sort-of-standardized 
and are often not available) but it should be better than nothing.

Of course the other problem is that there's no standard textual 
representation of NaN payloads (i.e., their fractions).
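
For readers unfamiliar with a total order on doubles, the idea can be sketched without the C23 functions by comparing bit patterns. This is a hand-rolled stand-in, not the totalorder function itself and not what sort -g does; it assumes IEEE 754 binary64 doubles with the usual quiet-NaN encoding, and the names `total_order_key` and `compare_total_order` are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer key whose unsigned order follows the
   IEEE 754 total order on typical platforms (assumption: binary64).
   Negative values have all bits flipped; non-negative values only
   get the sign bit set, so they land above every negative value. */
static uint64_t total_order_key(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits >> 63) ? ~bits : (bits | UINT64_C(0x8000000000000000));
}

/* Hypothetical comparator: deterministic for every bit pattern,
   including NaNs with different signs or payloads. */
int compare_total_order(double a, double b)
{
    uint64_t ka = total_order_key(a);
    uint64_t kb = total_order_key(b);
    return (ka > kb) - (ka < kb);
}

/* Small self-check of a few orderings. */
void check_compare_total_order(void)
{
    assert(compare_total_order(-0.0, 0.0) < 0);
    assert(compare_total_order(-INFINITY, -1.0) < 0);
    assert(compare_total_order(NAN, -NAN) != 0);
}
```

Note that this kind of ordering is deterministic but distinguishes -0 from +0 and NaNs with different payloads, which differs from the "all NaNs equal" behavior requested in the original report.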

This bug report was last modified 1 year and 316 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.