GNU bug report logs - #18273
sort seems to misbehave if both -u and -n or -k are used

Package: coreutils

Reported by: "Lennart Sorensen" <lsorense <at> csclub.uwaterloo.ca>

Date: Fri, 15 Aug 2014 19:32:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

Full log


Message #18 received at 18273 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Lennart Sorensen <lsorense <at> csclub.uwaterloo.ca>, 18273 <at> debbugs.gnu.org
Subject: Re: bug#18273: closed (Re: bug#18273: sort seems to misbehave if
 both -u and -n or -k are used)
Date: Fri, 15 Aug 2014 14:32:14 -0600
On 08/15/2014 02:22 PM, Lennart Sorensen wrote:

> OK I accept that it is correct behaviour.
> 
> The documentation on the other hand is awful in that case.  I went and
> checked the documentation to try and make sense of what it was doing
> before sending the report, and there was nothing there that gave any
> hint that this was expected behaviour.

'info sort' says:

  The '--stable' ('-s') option disables this "last-resort comparison"
  so that lines in which all fields compare equal are left in their
  original relative order.  The '--unique' ('-u') option also disables
  the last-resort comparison.

and later on:

'-u'
'--unique'

     Normally, output only the first of a sequence of lines that compare
     equal.  For the '--check' ('-c' or '-C') option, check that no pair
     of consecutive lines compares equal.

     This option also disables the default last-resort comparison.

     The commands 'sort -u' and 'sort | uniq' are equivalent, but this
     equivalence does not extend to arbitrary 'sort' options.  For
     example, 'sort -n -u' inspects only the value of the initial
     numeric string when checking for uniqueness, whereas 'sort -n |
     uniq' inspects the entire line.  *Note uniq invocation::.
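
To make the difference described in that last paragraph concrete, here is a
minimal sketch. The sample input and the helper name 'sample' are made up for
illustration and are not taken from the original report; LC_ALL=C is set only
to keep the result locale-independent.

```shell
# Hypothetical sample input: two lines share the leading number 1.
sample() { printf '1 apple\n1 banana\n2 cherry\n'; }

# -n -u: only the leading numeric value decides uniqueness, so the two
# lines starting with 1 compare equal and only one of them is printed
# (two output lines in total).
sample | LC_ALL=C sort -n -u

# -n | uniq: uniq compares entire lines, so all three lines survive.
sample | LC_ALL=C sort -n | uniq
```

The first pipeline drops a line that the second one keeps, which is exactly
the non-equivalence the info page warns about.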


> 
> Why does it have a blob talking about which options implicitly enable -s,
> rather than mention that in the documentation for the options that do it?

-u is the only option that implicitly enables -s.
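
For reference, the effect of the last-resort comparison that -s turns off can
be seen with a small sketch like the following. The two-line input and the
helper name 'pairs' are hypothetical, chosen only so that both lines have the
same numeric key.

```shell
# Hypothetical input: both lines have the numeric key 1.
pairs() { printf '1 b\n1 a\n'; }

# Plain -n: the keys tie, so the last-resort comparison orders the
# lines by their full text and "1 a" comes out first.
pairs | LC_ALL=C sort -n

# -n -s: the last-resort comparison is disabled, so the tied lines
# keep their input order and "1 b" comes out first.
pairs | LC_ALL=C sort -n -s
```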

You are welcome to propose a patch to the documentation that would
clarify the situation; we can reopen this bug if a patch materializes.
Maybe even a change to 'sort --help' output to mention that -u implies
-s (which would also feed the 'man sort' page).

> 
> Why does it not mention for -n that anything that isn't a number is
> ignored and treated as if it didn't exist when it comes to deciding
> things like uniqueness?  Are people expected to go read the posix
> standard instead?

The info page DOES mention this:

'-n'
'--numeric-sort'
'--sort=numeric'
     Sort numerically.  The number begins each line and consists of
     optional blanks, an optional '-' sign, and zero or more digits
     possibly separated by thousands separators, optionally followed by
     a decimal-point character and zero or more digits.  An empty number
     is treated as '0'.  The 'LC_NUMERIC' locale specifies the
     decimal-point character and thousands separator.  By default a
     blank is a space or a tab, but the 'LC_CTYPE' locale can change
     this.
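
The consequence of "An empty number is treated as '0'" for -u can be sketched
as below. The input is invented for illustration (helper name 'words' is
hypothetical): every line either has no leading number or is literally 0, so
they all compare equal under -n.

```shell
# Hypothetical input: two lines without a leading number, plus a 0.
words() { printf 'foo\nbar\n0\n'; }

# All three lines have the numeric value 0, so -n -u treats them as
# duplicates of one another and prints a single line.
words | LC_ALL=C sort -n -u

# Without -u, all three lines are still printed.
words | LC_ALL=C sort -n
```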

The --help output is intentionally terse, so I don't know what we could
do there to make it more obvious without exploding the size of what is
supposed to be brief.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

This bug report was last modified 10 years and 278 days ago.
