GNU bug report logs - #30661
sort: add sort-by-hex-number feature


Package: coreutils;

Reported by: James Bunke <james_a_bunke <at> yahoo.com>

Date: Thu, 1 Mar 2018 00:02:01 UTC

Severity: wishlist

To reply to this bug, email your comments to 30661 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#30661; Package coreutils. (Thu, 01 Mar 2018 00:02:02 GMT)

Acknowledgement sent to James Bunke <james_a_bunke <at> yahoo.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 01 Mar 2018 00:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: James Bunke <james_a_bunke <at> yahoo.com>
To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org>
Subject: sort
Date: Wed, 28 Feb 2018 23:42:59 +0000 (UTC)
[Message part 1 (text/plain, inline)]
To: bug-coreutils <at> gnu.org

This seems more an oversight than an actual bug:

    'sort -n' thinks "B" is a larger value than "AA" -- yep! Someone
forgot about hexadecimal, but binary, octal, and decimal work fine.

Suggestion: Don't revert to alphanumeric sorting until the rules are
broken by the sort field:

    1) There is an optional leading Plus (+) or Minus (-), but only one.
    2) There is an optional single Point (.) that may occur anywhere
       within the field except before an optional Plus or Minus.
    3) Numerals are limited to "0123456789ABCDEFabcdef".
    4) No whitespace, other letters, or other punctuation is allowed;
       otherwise, revert to alphanumeric sort.
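A rough sketch of how a field could be tested against the four proposed rules, written as a single POSIX extended regular expression. The helper name is_hex_field is hypothetical, and the sketch assumes a valid field must contain at least one hex digit, which the rules leave unstated:

```shell
# Hypothetical helper: succeed if "$1" follows the four proposed rules:
# an optional single leading sign, hex digits, at most one point (after
# any sign), and nothing else. Assumes at least one digit is required.
is_hex_field() {
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[+-]?([0-9A-Fa-f]+\.?[0-9A-Fa-f]*|\.[0-9A-Fa-f]+)$'
}

is_hex_field AA && echo "AA: hex"
is_hex_field 0xAA || echo "0xAA: falls back to alphanumeric"
```

Note that a purely decimal field such as 100 also satisfies these rules but means 256 in hexadecimal, so a field's base cannot be inferred from its characters alone.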

Thank You,
J.B.

P.S.: It shouldn't be necessary to transform data to sort it.
      Use '-nx' or '-gx' if you must, but it shouldn't be needed.


Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Thu, 01 Mar 2018 16:16:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Thu, 01 Mar 2018 16:16:02 GMT)

Notification sent to James Bunke <james_a_bunke <at> yahoo.com>:
bug acknowledged by developer. (Thu, 01 Mar 2018 16:16:02 GMT)

Message #12 received at 30661-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: James Bunke <james_a_bunke <at> yahoo.com>, 30661-done <at> debbugs.gnu.org,
 GNU bug control <control <at> debbugs.gnu.org>
Subject: Re: bug#30661: sort
Date: Thu, 1 Mar 2018 10:15:36 -0600
tag 30661 notabug
thanks

On 02/28/2018 05:42 PM, James Bunke wrote:
> To: bug-coreutils <at> gnu.org
> 
> This seems more an oversight than an actual bug:
> 
>      'sort -n' thinks "B" is a larger value than "AA" -- yep! Someone
> forgot about hexadecimal, but binary, octal, and decimal work fine.

Please demonstrate an actual command line that you typed and output you 
got.  Here's what I tried in reproducing your claim:

$ printf 'AA\nB\n' | LC_ALL=C sort --debug -n
sort: using simple byte comparison
AA
^ no match for key
__
B
^ no match for key
_

As I typed it, 'sort -n' outputs the line AA before the line B because 
of fallback sorting rules (the entire line is used when none of the keys 
produced a difference, and since neither line was numeric, they were 
equivalently treated as '0' by -n), contrary to your claim that sort 
takes 'B' first.  Therefore, I don't know if my attempt matches what you 
actually saw, as you did not give very many details other than a vague 
verbal description of your issue.
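The fallback described above can be seen by turning it off. Assuming GNU coreutils sort, the -s (--stable) option disables the last-resort whole-line comparison, so lines whose keys tie stay in input order:

```shell
# Neither line has a leading number, so -n gives both the key 0 and
# they tie; the last-resort whole-line byte comparison puts AA first.
printf 'B\nAA\n' | LC_ALL=C sort -n
# GNU sort's -s (--stable) skips that last-resort comparison, so the
# tied lines keep their input order: B, then AA.
printf 'B\nAA\n' | LC_ALL=C sort -s -n
```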

> 
> Suggestion: Don't revert to alphanumeric sorting until the rules are
> broken by the sort field:

Sorry, but 'sort -n' behavior is specified by POSIX, and we can't change 
it, as that would break scripts that expect POSIX behavior.  Most 
likely, sort can already do what you want with additional command line 
options, but I don't even know what data you want sorted, or what output 
you actually want, to tell you what command line would give the output 
you want.  The --debug option can be great at learning what sort is 
actually doing (and how it is more likely that your request is 
incomplete, rather than sort misbehaving).

As such, I'm closing this as not a bug, as you have not demonstrated an 
actual POSIX compliance issue; but do feel free to provide us with more 
information, and we can reopen this if you actually do come up with 
something that needs addressing beyond what sort can already do when 
invoked correctly.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Information forwarded to bug-coreutils <at> gnu.org:
bug#30661; Package coreutils. (Thu, 01 Mar 2018 23:41:01 GMT)

Message #15 received at 30661 <at> debbugs.gnu.org:

From: James Bunke <james_a_bunke <at> yahoo.com>
To: "30661 <at> debbugs.gnu.org" <30661 <at> debbugs.gnu.org>
Subject: Re: bug#30661: closed (Re: bug#30661: sort)
Date: Thu, 1 Mar 2018 23:29:59 +0000 (UTC)
[Message part 1 (text/plain, inline)]
$ echo -e "170\n11" | sort -n
11
170
$ echo -e "AA\nB" | sort -n
AA
B
$ echo -e "0xAA\n0xB" | sort -n
0xAA
0xB

Perhaps it's the documentation that is lacking, as I find no reference to
hexadecimal in either the "man" or "info" page for sort. Can it sort
hexadecimal? There is no information on what sort considers to be a
"numeral" or how it expects hexadecimal to be represented. I was just
attempting to skip extra processes to convert the data or to write my own
sort process.

Thank you for your efforts on my behalf. Do you know who handles the
documentation? Maybe there is a newer man/info page than the one on this
old machine.

Information forwarded to bug-coreutils <at> gnu.org:
bug#30661; Package coreutils. (Fri, 02 Mar 2018 01:53:01 GMT)

Message #18 received at 30661 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: James Bunke <james_a_bunke <at> yahoo.com>,
 "30661 <at> debbugs.gnu.org" <30661 <at> debbugs.gnu.org>
Subject: Re: bug#30661: closed (Re: bug#30661: sort)
Date: Thu, 1 Mar 2018 19:52:24 -0600
reopen 30661
retitle 30661 RFE: Add way for sort to handle hex numbers
tag 30661 -notabug
thanks

On 03/01/2018 05:29 PM, James Bunke wrote:
> $ echo -e "170\n11" | sort -n

echo -e is not portable; printf is better.
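For instance, the reporter's three experiments can be written portably with printf, which interprets the \n escapes in its format string the same way in every POSIX shell (dash's echo, for one, prints the -e literally):

```shell
# Same three experiments using printf instead of echo -e.
printf '170\n11\n' | sort -n
printf 'AA\nB\n' | sort -n
printf '0xAA\n0xB\n' | sort -n
```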

> 11
> 170
> $ echo -e "AA\nB" | sort -n
> AA
> B
> $ echo -e "0xAA\n0xB" | sort -n
> 0xAA
> 0xB

Again, 'sort --debug' is your friend:

$ printf '0xAA\n0xB\n' | LC_ALL=C sort -n --debug
sort: using simple byte comparison
0xAA
_
____
0xB
_
___

The numeric sort key parses '0' and stops at 'x', because it does NOT 
parse hexadecimal.

Here's what POSIX has to say about -n:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

"-n
    Restrict the sort key to an initial numeric string, consisting of 
optional <blank> characters, optional <hyphen-minus> character, and zero 
or more digits with an optional radix character and thousands separators 
(as defined in the current locale), which shall be sorted by arithmetic 
value. An empty digit string shall be treated as zero. Leading zeros and 
signs on zeros shall not affect ordering."

Which does not directly mention "decimal", but the mention of a radix 
character (as in '1.2' or '1,2', depending on locale) pretty much 
implies decimal, as radix characters are only output by printf when 
printing floating point values in a decimal format.
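As a rough illustration of that definition, the sketch below prints the initial numeric string that -n would use as the key, assuming the C locale (radix character '.', no thousands separator). The helper name numeric_prefix is hypothetical, and this only approximates the POSIX wording; it is not how GNU sort is implemented:

```shell
# Hypothetical helper: print the initial numeric string that POSIX -n
# would sort on, assuming the C locale (radix '.', no thousands
# separator). An empty result is treated as zero.
numeric_prefix() {
  LC_ALL=C awk '{
    # Optional blanks, optional minus sign, digits, optional radix + digits.
    match($0, /^[ \t]*-?[0-9]*(\.[0-9]*)?/)
    print substr($0, 1, RLENGTH)
  }'
}

printf '0xAA\n0xB\n' | numeric_prefix   # prints 0 twice
```

Both 0xAA and 0xB yield the key 0, which is why they tie under -n and fall back to a byte comparison of the whole line.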

> 
> Perhaps it's the documentation that is lacking, as I find no reference to hexadecimal in either the "man" or "info" page for sort. Can it sort hexadecimal?

-n cannot.  You are correct that we could improve the info page to make 
it explicit that -n sorts based on decimal values.  You also raise a 
good point that it may be worth adding a new sorting option that sorts 
by hexadecimal.  Although the existing practice of 
decorate/sort/undecorate to [temporarily] convert hex into decimal 
before sorting is going to be more portable, being able to directly sort 
hex does seem like something that may be worthwhile.
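A minimal sketch of that decorate/sort/undecorate approach, assuming each line holds a single unsigned hexadecimal integer, with or without a 0x prefix, small enough for the shell's printf integer range. The helper name hex_sort is hypothetical; it relies on POSIX printf evaluating a 0x-prefixed numeric argument as a hexadecimal C integer constant:

```shell
# Hypothetical helper: sort lines that each hold one unsigned hex
# integer, with or without a 0x/0X prefix (e.g. AA, 0xB).
hex_sort() {
  # Decorate: prefix each line with its decimal value and a tab.
  while IFS= read -r line; do
    digits=${line#0[xX]}   # drop an optional 0x/0X prefix
    # POSIX printf evaluates "0x..." as a hexadecimal C integer constant.
    printf '%d\t%s\n' "0x$digits" "$line"
  done | LC_ALL=C sort -n -k1,1 | cut -f2-   # sort, then undecorate
}

printf 'AA\nB\n0x10\n' | hex_sort   # prints B, 0x10, AA
```

Lines that are not valid hex will make printf complain, so input may need validating first.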

> There is no information on what sort considers to be a "numeral" or how it expects hexadecimal to be represented. I was just attempting to skip extra processes to convert the data or to write my own sort process.
> Thank you for your efforts on my behalf. Do you know who handles the documentation? Maybe there is a newer man/info page than the one on this old machine.

The info documentation is part of coreutils.git, so you've reached the 
right place.  I'm going to reopen and retitle this bug to request the 
ability to do hex sorting.






Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 02 Mar 2018 01:53:02 GMT)

Changed bug title to 'RFE: Add way for sort to handle hex numbers' from 'sort'. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 02 Mar 2018 01:53:02 GMT)

Removed tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 02 Mar 2018 02:06:02 GMT)

Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 02:51:02 GMT)

Changed bug title to 'sort: add sort-by-hex-number feature' from 'RFE: Add way for sort to handle hex numbers'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 02:51:02 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.