GNU bug report logs - #18273
sort seems to misbehave if both -u and -n or -k are used
Message #21 received at 18273 <at> debbugs.gnu.org (full text, mbox):
On Fri, Aug 15, 2014 at 02:32:14PM -0600, Eric Blake wrote:
> 'info sort' says:
>
> The '--stable' ('-s') option
> disables this "last-resort comparison" so that lines in which all fields
> compare equal are left in their original relative order. The '--unique'
> ('-u') option also disables the last-resort comparison.
>
> and later on:
>
> '-u'
> '--unique'
>
> Normally, output only the first of a sequence of lines that compare
> equal. For the '--check' ('-c' or '-C') option, check that no pair
> of consecutive lines compares equal.
>
> This option also disables the default last-resort comparison.
>
> The commands 'sort -u' and 'sort | uniq' are equivalent, but this
> equivalence does not extend to arbitrary 'sort' options. For
> example, 'sort -n -u' inspects only the value of the initial
> numeric string when checking for uniqueness, whereas 'sort -n |
> uniq' inspects the entire line. *Note uniq invocation::.
OK, I guess that does somewhat point out the behaviour.
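The difference the quoted documentation describes can be reproduced with a small test. This is only a sketch, assuming GNU coreutils sort and uniq; the three input lines and the variable names are made up for illustration, and LC_ALL=C is set just to pin the collation order.

```shell
# Hypothetical sample input: two lines share the leading number 1.
input='1 apple
1 banana
2 cherry'

# -n -u: uniqueness is judged on the leading numeric string only,
# so the two "1 ..." lines count as duplicates of each other.
unique_by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)

# -n | uniq: uniq compares whole lines, so both "1 ..." lines survive.
unique_by_line=$(printf '%s\n' "$input" | LC_ALL=C sort -n | LC_ALL=C uniq)

printf 'sort -n -u:\n%s\n' "$unique_by_key"
printf 'sort -n | uniq:\n%s\n' "$unique_by_line"
```

With this input, 'sort -n -u' should print two lines and 'sort -n | uniq' three.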
> -u is the only option that implicitly enables -s.
>
> You are welcome to propose a patch to the documentation that would
> clarify the situation; we can reopen this bug if a patch materializes.
> Maybe even a change to 'sort --help' output to mention that -u implies
> -s (which would also feed the 'man sort' page).
I do wonder why there isn't an option to undo that implied -s,
but perhaps it would not actually make sense.
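The effect of the last-resort comparison, and of turning it off, can be seen on two lines whose numeric keys are equal. Again only a sketch, assuming GNU coreutils sort; the input and variable names are invented for the example.

```shell
# Hypothetical input: equal numeric keys, deliberately in reverse byte order.
input='1 zebra
1 apple'

# Default: keys tie, so the last-resort whole-line comparison reorders them.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# -s: the last-resort comparison is disabled, so input order is kept.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -n -s)

# -u: also disables the last-resort comparison, then drops all but one
# of the lines that compare equal.
unique_only=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)

printf 'default:\n%s\n' "$default_order"
printf 'stable:\n%s\n' "$stable_order"
printf 'unique:\n%s\n' "$unique_only"
```

Here the default run should list "1 apple" first, the -s run should keep "1 zebra" first, and the -u run should print a single line.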
> The info page DOES mention this:
>
> '-n'
> '--numeric-sort'
> '--sort=numeric'
> Sort numerically. The number begins each line and consists of
> optional blanks, an optional '-' sign, and zero or more digits
> possibly separated by thousands separators, optionally followed by
> a decimal-point character and zero or more digits. An empty number
> is treated as '0'. The 'LC_NUMERIC' locale specifies the
> decimal-point character and thousands separator. By default a
> blank is a space or a tab, but the 'LC_CTYPE' locale can change
> this.
>
> The --help output is intentionally terse, so I don't know what we could
> do there to make it more obvious without exploding the size of what is
> supposed to be brief.
Well, I always thought info was meant to be the complete documentation.
I see nothing in the -n description above that makes me think it would
ignore the part of the line that isn't a number. The text under -u does
seem to point out that this is the behaviour.
I think this might be the first time I have ever used -n when the input
was not pure numbers, so I never hit this before.
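For input that is not pure numbers, the "empty number is treated as '0'" rule means lines with no leading number all share the key 0. A sketch of that case, assuming GNU coreutils sort, with made-up input:

```shell
# Hypothetical input: two lines with no leading number, plus a literal 0.
input='pear
apple
0'

# All three keys evaluate to 0, so -n -u treats them as one group.
collapsed=$(printf '%s\n' "$input" | LC_ALL=C sort -n -u)

# Without -u the ties are broken by the last-resort byte comparison.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

printf 'sort -n -u:\n%s\n' "$collapsed"
printf 'sort -n:\n%s\n' "$plain"
```

The -u run should print only one line, while the plain -n run keeps all three.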
--
Len Sorensen
This bug report was last modified 10 years and 278 days ago.