GNU bug report logs - #9995: problem about sort -u -k
Reported by: 夏凯 <walkerxk <at> gmail.com>
Date: Tue, 8 Nov 2011 17:25:15 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #15 received at 9995 <at> debbugs.gnu.org (full text, mbox):
On 11/08/2011 11:54 AM, Eric Blake wrote:
>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>> 1 a q
>> 1 a w
>> 3 a w
>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>> 1 a q
>> 1 a w
> Since you didn't tell us what output you were hoping to get, I can't
> tell you the proper command line that would match your expected output.
> Feel free to reply, even while this bug is closed, if you need more help
> in getting the output you want.
I'll also make a preemptive guess at what you meant:
If you wanted to sort on just the third and subsequent fields, but then
strip duplicate lines only if the entire line is a duplicate, then you
have to use two processes:
sort [-s] -k3 a | uniq
If you don't mind a two-key sort, where the primary key is the third and
subsequent fields, but where the secondary key is the entire line so as
to force sort -u to consider the entire line when determining
uniqueness, then one process will do:
sort -u -k3 -k1 a
To see the difference, and remembering that sort -u implies sort -s,
consider these contents for a:
$ cat a
1 a q
2 a q
1 a q
1 a w
3 a w

$ sort -u -k3 -k1 a
1 a q
2 a q
1 a w
3 a w

$ sort -s -k3 a | uniq
1 a q
2 a q
1 a q
1 a w
3 a w

$ sort -k3 a | uniq
1 a q
2 a q
1 a w
3 a w
That is, if the stable sort on just -k3 leaves identical lines that are
not adjacent ("1 a q" in my example), then the separate uniq process
won't filter them. Using sort -u with -k1 to force the entire line as a
secondary sort key, by contrast, can never leave identical lines
separated by a distinct line. Likewise, omitting both -s and -u lets
sort fall back to its last-resort whole-line comparison (an implied
-k1), at which point uniq sees the same line order as sort -u -k3 -k1
sees.
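If you want to reproduce the comparison without typing the file by hand, here is a minimal sketch. It assumes a POSIX shell with GNU coreutils; the variable names, the scratch directory, and the LC_ALL=C setting (to keep key comparison byte-wise) are my own choices for illustration and are not part of the original report.

```shell
#!/bin/sh
# Assumption: pin the locale so that keys are compared byte by byte.
export LC_ALL=C

# Recreate the five-line sample file "a" in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w' > "$tmpdir/a"

# One process: -k3 is the primary key, -k1 (the whole line) the secondary key.
one_process=$(sort -u -k3 -k1 "$tmpdir/a")

# Two processes, stable sort: the two "1 a q" lines stay separated by
# "2 a q", so uniq does not merge them.
stable_uniq=$(sort -s -k3 "$tmpdir/a" | uniq)

# Two processes, default last-resort comparison: the duplicates end up
# adjacent, so uniq removes one of them.
default_uniq=$(sort -k3 "$tmpdir/a" | uniq)

printf '%s\n' '--- sort -u -k3 -k1' "$one_process"
printf '%s\n' '--- sort -s -k3 | uniq' "$stable_uniq"
printf '%s\n' '--- sort -k3 | uniq' "$default_uniq"

rm -rf "$tmpdir"
```

Only the stable variant should keep all five lines; the other two should agree with each other.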
>> i read http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>> but got nothing about this.
Actually, it does - under the option -u, I see:
    The commands sort -u and sort | uniq are equivalent, but this
    equivalence does not extend to arbitrary sort options. For example,
    sort -n -u inspects only the value of the initial numeric string
    when checking for uniqueness, whereas sort -n | uniq inspects the
    entire line. See uniq invocation.
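The manual's sort -n example can be tried the same way. This is only a sketch: the three input lines and the variable names are made up for illustration, and LC_ALL=C is again my assumption to keep the comparison locale-independent.

```shell
#!/bin/sh
# Assumption: pin the locale so results do not depend on the environment.
export LC_ALL=C

# Hypothetical input: two lines share the leading number 1 but differ after it.
input=$(printf '%s\n' '1 apple' '1 banana' '2 cherry')

# sort -n -u: uniqueness is judged on the initial numeric string only,
# so the two lines that start with 1 count as duplicates of each other.
numeric_unique=$(printf '%s\n' "$input" | sort -n -u)

# sort -n | uniq: uniq compares entire lines, so nothing is dropped.
numeric_then_uniq=$(printf '%s\n' "$input" | sort -n | uniq)

printf '%s\n' '--- sort -n -u' "$numeric_unique"
printf '%s\n' '--- sort -n | uniq' "$numeric_then_uniq"
```

With this input, sort -n -u should print two lines while sort -n | uniq prints all three.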
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 13 years and 255 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.