GNU bug report logs - #23556
sort(1): misleading description of option -n


Package: coreutils;

Reported by: Carsten Hey <carsten <at> debian.org>

Date: Mon, 16 May 2016 18:29:01 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23556 in the body.
You can then email your comments to 23556 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-coreutils <at> gnu.org:
bug#23556; Package coreutils. (Mon, 16 May 2016 18:29:01 GMT)

Acknowledgement sent to Carsten Hey <carsten <at> debian.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 16 May 2016 18:29:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Carsten Hey <carsten <at> debian.org>
To: bug-coreutils <at> gnu.org
Subject: sort(1): misleading description of option -n
Date: Sat, 14 May 2016 16:17:35 +0200
Hi,

the man page sort(1) contains a misleading description of the option -n:

    $ lsb_release -ic
    Distributor ID: Debian
    Codename:       jessie

    $ sort --version | sed -n 1p
    sort (GNU coreutils) 8.23

    $ man sort | grep -A1 -- --numeric-sort | sed -n -e 's/^ *//' -e '1!p'
    compare according to string numerical value

According to Ubuntu's web page, this string is also in their package
coreutils_8.25-2ubuntu2_i386.


This description reads as if this command:

    $ printf '%s\n' 'x 9' 'x 10' | sort -n
    x 10
    x 9

… would produce the output of this command:

    $ printf '%s\n' 'x 9' 'x 10' | sort -V
    x 9
    x 10

…, but instead, -n stops doing its magic after finding the first
non-numeric, non-whitespace character.  There is a short and simple
way to summarize this behaviour.
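A minimal sketch of that behaviour, assuming GNU sort in a POSIX shell (the input lines are made up for illustration, and LC_ALL=C is set only to keep the output locale-independent):

```shell
# Only the leading digits ("10" and "9") form the numeric key;
# everything from the first non-numeric character on is ignored.
printf '%s\n' '10x5' '9y99' | LC_ALL=C sort -n
# 9y99
# 10x5
```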


If you use CyanogenMod on your mobile phone, you carry a minor
programming error in your pocket that is very likely caused by this
misleading description of -n.


Regards
Carsten




Information forwarded to bug-coreutils <at> gnu.org:
bug#23556; Package coreutils. (Mon, 16 May 2016 19:09:01 GMT)

Message #8 received at 23556 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Carsten Hey <carsten <at> debian.org>, 23556 <at> debbugs.gnu.org
Subject: Re: bug#23556: sort(1): misleading description of option -n
Date: Mon, 16 May 2016 15:07:59 -0400
Hello Carsten,

On 05/14/2016 10:17 AM, Carsten Hey wrote:
> the man page sort(1) contains a misleading description of the option -n:
[...]
>
>      $ man sort | grep -A1 -- --numeric-sort | sed -n -e 's/^ *//' -e '1!p'
>      compare according to string numerical value
[...]
> This description reads as if this command:
>
> $ printf '%s\n' 'x 9' 'x 10' | sort -n
> x 10
> x 9
[...]
> but instead, -n stops doing its magic after finding the first
> non-numeric, non-whitespace character. There is a short and simple
> way to summarize this behaviour.

IIUC, you are disputing the accuracy (or clarity) of the term "string numerical value" on the manual page,
and not the actual behavior of "sort -n" (which is mandated by POSIX and has been this way for many, many years,
as opposed to "sort -V", which was only introduced as a GNU extension in coreutils version 7.0 in 2008).

The description says "string numeric value" - which (to me) does not mean anything other than numeric value
(implying letters will not be sorted properly), but opinions clearly differ.
Using the "--debug" option would immediately reveal the error:

    $ printf '%s\n' 'x 9' 'x 10' | sort --debug -n
    sort: using ‘en_US.UTF-8’ sorting rules
    x 10
    ^ no match for key
    ____
    x 9
    ^ no match for key
    ___


If you have a suggestion for improved wording, I'm sure they can be considered for inclusion.
A patch against the usage() function in sort.c would go even further.
Note that, unlike FreeBSD/OpenBSD, the description in the man page is derived from "sort --help",
and thus kept brief.

For completeness, here are similar descriptions of "sort -n" from other sources:

POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html):
   -n    Restrict the sort key to an initial numeric string, consisting of optional
         <blank> characters, optional minus-sign, and zero or more digits with an
         optional radix character and thousands separators (as defined in the current
         locale), which shall be sorted by arithmetic value. An empty digit string
         shall be treated as zero. Leading zeros and signs on zeros shall not affect ordering.
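A short sketch of these rules with GNU sort, assuming a POSIX shell; the inputs are hypothetical and LC_ALL=C is used so the result does not depend on the locale:

```shell
# Numeric keys under the rules quoted above:
#   " 3"  -> 3   (leading blank skipped)
#   "-2"  -> -2  (optional minus sign)
#   "abc" -> 0   (empty digit string treated as zero)
#   "10"  -> 10
printf '%s\n' ' 3' '-2' 'abc' '10' | LC_ALL=C sort -n
# -2
# abc
#  3
# 10
```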

The GNU Coreutils manual (which is the official documentation, not the man page) says:
(http://www.gnu.org/software/coreutils/manual/coreutils.html#sort-invocation)
  -n
  --numeric-sort
  --sort=numeric
      Sort numerically. The number begins each line and consists of optional blanks,
      an optional ‘-’ sign, and zero or more digits possibly separated by thousands
      separators, optionally followed by a decimal-point character and zero or more digits.
      An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point
      character and thousands separator. By default a blank is a space or a tab, but
      the LC_CTYPE locale can change this.
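To illustrate the locale dependence, here is a sketch assuming GNU sort in the C locale, where the decimal-point character is "." and no thousands separator is defined (the input values are made up):

```shell
# Under LC_ALL=C the comma is not a thousands separator, so it ends the
# number: "1,000" gets the key 1, while "1.25" and "1.5" compare as decimals.
printf '%s\n' '1.5' '1,000' '1.25' '20' | LC_ALL=C sort -n
# 1,000
# 1.25
# 1.5
# 20
```

In a locale whose LC_NUMERIC defines "," as the thousands separator, "1,000" would instead be read as 1000.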


OpenBSD's man page has:
     -n, --numeric-sort, --sort=numeric
             An initial numeric string, consisting of optional blank space,
             optional minus sign, and zero or more digits (including decimal
             point) is sorted by arithmetic value.  Leading blank characters
             are ignored.

FreeBSD's man page has:
     -n, --numeric-sort, --sort=numeric
             Sort fields numerically by arithmetic value.  Fields are supposed
             to have optional blanks in the beginning, an optional minus sign,
             zero or more digits (including decimal point and possible
             thousand separators).



I'm leaving the bug open; other comments and feedback are welcome.

regards,
 - assaf






Information forwarded to bug-coreutils <at> gnu.org:
bug#23556; Package coreutils. (Mon, 16 May 2016 19:36:02 GMT)

Message #11 received at 23556 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Carsten Hey <carsten <at> debian.org>, 23556 <at> debbugs.gnu.org
Subject: Re: bug#23556: sort(1): misleading description of option -n
Date: Mon, 16 May 2016 13:35:48 -0600
On 05/14/2016 08:17 AM, Carsten Hey wrote:
> Hi,
> 
> the man page sort(1) contains a misleading description of the option -n:

>     compare according to string numerical value

That sounds accurate to me, although as Assaf pointed out, suggested
wording improvements are welcome.

>     $ printf '%s\n' 'x 9' 'x 10' | sort -n
>     x 10
>     x 9

The numerical value of "x 10" (that is, the equivalent of
atoi("x 10")) is "0".  To check:

$ printf '%s\n' 'x 9' 'x 10' '1' '-1' | sort -n
-1
x 10
x 9
1

If you want to sort by the second column only, then use:

$ printf '%s\n' 'x 9' 'x 10' | sort -n -k 2,2
x 9
x 10

That is, by adding the -k option, you can limit the text being sorted to
the portion of the line containing the numerical value, rather than the
entire line, so as to avoid a numerical value of 0 when hitting a
non-numeric portion of the line.

> …, but instead, -n stops doing its magic after finding the first
> non-numeric, non-whitespace character.

Because that is how it has always behaved, and how POSIX requires it to
behave.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


Information forwarded to bug-coreutils <at> gnu.org:
bug#23556; Package coreutils. (Mon, 16 May 2016 22:24:02 GMT)

Message #14 received at 23556 <at> debbugs.gnu.org:

From: Carsten Hey <carsten <at> debian.org>
To: Assaf Gordon <assafgordon <at> gmail.com>, 23556 <at> debbugs.gnu.org
Subject: Re: bug#23556: sort(1): misleading description of option -n
Date: Tue, 17 May 2016 00:23:45 +0200
Hi,

looks like this weekend's (routing?) problems related to mail delivery
to @gnu.org addresses, which were also noticed by at least three other
people, are resolved now :)

I forwarded my mail to the Debian BTS (since it didn't get delivered at
first), but the people who read it there did not grasp its actual
intention (and if several people misread a mail, the sender likely did
something wrong).  Reading this Debian bug would not provide any
additional useful information.

* Assaf Gordon [2016-05-16 15:07 -0400]:
> On 05/14/2016 10:17 AM, Carsten Hey wrote:
> >the man page sort(1) contains a misleading description of the option -n:
> [...]
> >
> >     $ man sort | grep -A1 -- --numeric-sort | sed -n -e 's/^ *//' -e '1!p'
> >     compare according to string numerical value
> [...]
> >This description reads as if this command:
> >
> >$ printf '%s\n' 'x 9' 'x 10' | sort -n
> >x 10
> >x 9
> [...]
> >but instead, -n stops doing its magic after finding the first
> >non-numeric, non-whitespace character. There is a short and simple
> >way to summarize this behaviour.
>
> IIUC, you are disputing the accuracy (or clarity) of the term "string
> numerical value" on the manual page, and not the actual behavior of
> "sort -n" …

Exactly.  It seems like many people have problems to understand mails
that contain code, but are neither a patch nor complain about the
behaviour of a program - maybe my wording was suboptimal too.  I'll
consider this in future when I write similar mails.

> The description says "string numeric value" - which (to me) does not
> mean anything other than numeric value (implying letters will not be
> sorted properly), but opinions clearly differ.

We all know that '1st', '2nd', '3rd', …, '9th', '10th' and so on are
sorted in this order if -n is used.  We also know that piping lines that
match ./[0-9][0-9]* to sort -n (which happens every time I upgrade my
mobile's operating system) is a useless use of -n.
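Both observations can be reproduced with a small sketch, assuming GNU sort in a POSIX shell; the ./NUMBER lines are hypothetical stand-ins for the file names mentioned above:

```shell
# Ordinal suffixes: the leading digits are the key, so 10th follows 9th.
printf '%s\n' 10th 2nd 1st 9th 3rd | sort -n
# 1st
# 2nd
# 3rd
# 9th
# 10th

# Lines starting with "./" have no leading digits, so every key is an
# empty number (0); the order then comes only from the last-resort
# comparison of whole lines (bytewise under LC_ALL=C).
printf '%s\n' ./10 ./9 ./2 | LC_ALL=C sort -n
# ./10
# ./2
# ./9

# Using "/" as field separator and keying on field 2 gives a numeric order.
printf '%s\n' ./10 ./9 ./2 | sort -t/ -k2,2n
# ./2
# ./9
# ./10
```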

Neither the description of -n in the man page nor your explanation of
how you read these words explains this difference in any way, at least
not without an additional definition of "string numeric value".

> If you have a suggestion for improved wording, I'm sure they can be
> considered for inclusion.

If I would not know that "string numeric value" is proper English,
I wouldn't consider it to be correct, hence I'm likely the wrong person
to suggest a concrete wording, …

> the description in the man page is derived from "sort --help",
> and thus kept brief.

…, especially, when it needs to be that short.

I assume that a brief and proper description would either contain the
word "beginning" or the word "initial".  Suggestions I have at the
moment are "sort a text's initial numeric parts", "sort initial numeric
parts of a text" and "sort numbers at the beginning of a text", but
a native speaker is likely able to find something better and more
correct (actually, the initial numeric parts aren't sorted, they are
used as sorting key).
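The distinction between "sorting the numbers" and "using them as the sort key" shows up when keys tie. A sketch, assuming GNU sort in a POSIX shell, with made-up inputs that share the key 5:

```shell
# Equal keys: by default GNU sort breaks the tie with a last-resort
# comparison of the whole line.
printf '%s\n' '5b' '5a' | LC_ALL=C sort -n
# 5a
# 5b

# -s (--stable) disables that last-resort comparison, so lines with
# equal keys keep their input order.
printf '%s\n' '5b' '5a' | LC_ALL=C sort -s -n
# 5b
# 5a
```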

Due to the need to translate those strings, a change after a release
might be better than a change before a release.


Thanks,
Carsten




Information forwarded to bug-coreutils <at> gnu.org:
bug#23556; Package coreutils. (Sat, 27 Oct 2018 22:42:01 GMT)

Message #17 received at 23556 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 23556 <at> debbugs.gnu.org
Subject: Re: bug#23556: sort(1): misleading description of option -n
Date: Sat, 27 Oct 2018 16:41:23 -0600
close 23556
stop

(triaging old bugs)


On 2016-05-16 4:23 p.m., Carsten Hey wrote:
> * Assaf Gordon [2016-05-16 15:07 -0400]:
>>
>> IIUC, you are disputing the accuracy (or clarity) of the term "string
>> numerical value" on the manual page, and not the actual behavior of
>> "sort -n" …
> 
> Exactly.  It seems like many people have problems to understand mails
> that contain code, but are neither a patch nor complain about the
> behaviour of a program - maybe my wording was suboptimal too.  I'll
> consider this in future when I write similar mails.
> 
[...]


>> If you have a suggestion for improved wording, I'm sure they can be
>> considered for inclusion.
> 
> If I would not know that "string numeric value" is proper English,
> I wouldn't consider it to be correct, hence I'm likely the wrong person
> to suggest a concrete wording, …

With no further follow-ups in 2 years, I'm closing this bug.
Discussion can continue by replying to this thread,
and patches are always welcomed.

-assaf





bug closed, send any further explanations to 23556 <at> debbugs.gnu.org and Carsten Hey <carsten <at> debian.org> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 27 Oct 2018 22:42:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 25 Nov 2018 12:24:09 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.