GNU bug report logs - #9995
problem about sort -u -k


Package: coreutils;

Reported by: 夏凯 <walkerxk <at> gmail.com>

Date: Tue, 8 Nov 2011 17:25:15 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #15 received at 9995 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: 9995 <at> debbugs.gnu.org, 夏凯 <walkerxk <at> gmail.com>
Subject: Re: bug#9995: problem about sort -u -k
Date: Tue, 08 Nov 2011 12:45:11 -0700
On 11/08/2011 11:54 AM, Eric Blake wrote:
>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>> 1 a q
>> 1 a w
>> 3 a w
>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>> 1 a q
>> 1 a w

> Since you didn't tell us what output you were hoping to get, I can't
> tell you the proper command line that would match your expected output.
> Feel free to reply, even while this bug is closed, if you need more help
> in getting the output you want.

I'll also take a preemptive guess at what you meant:

If you wanted to sort on just the third and subsequent fields, but then 
strip duplicate lines only if the entire line is duplicate, then you 
have to use two processes:

sort [-s] -k3 a | uniq

If you don't mind a two-key sort, where the primary key is the third and 
subsequent fields, but where the secondary key is the entire line so as 
to force sort -u to consider the entire line when determining 
uniqueness, then one process will do:

sort -u -k3 -k1 a

To see the difference, and remembering that sort -u implies sort -s, 
consider these contents for a:

$ cat a
1 a q
2 a q
1 a q
1 a w
3 a w
$ sort -u -k3 -k1 a
1 a q
2 a q
1 a w
3 a w
$ sort -s -k3 a | uniq
1 a q
2 a q
1 a q
1 a w
3 a w
$ sort -k3 a | uniq
1 a q
2 a q
1 a w
3 a w

That is, a stable sort on just -k3 can leave identical lines that are
not adjacent ("1 a q" in my example), and a separate uniq process won't
filter them, because uniq only drops adjacent duplicates.  Using sort -u
with -k1 to force the entire line as a secondary sort key brings
identical lines together, so you lose the ability to keep identical
lines that are separated by a distinct line.  Likewise, omitting both
-s and -u lets sort apply its implied last-resort comparison of the
entire line, at which point uniq sees the same line order as sort -u
sees.
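
If you want to reproduce the comparison without touching your own file a, here is a minimal sketch. The temporary file, the variable names, and pinning the locale to C are my assumptions for the sake of a repeatable run; they are not part of the original report.

```shell
#!/bin/sh
# Assumption: pin the locale so the ordering is byte-wise and reproducible.
export LC_ALL=C

# Recreate the sample contents of "a" in a temporary file (hypothetical path).
tmp=$(mktemp)
printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w' > "$tmp"

# One process: field 3 onward is the primary key, the whole line the secondary key.
one_process=$(sort -u -k3 -k1 "$tmp")

# Two processes, stable sort: lines with equal -k3 keys keep their input order,
# so the two "1 a q" lines stay separated and uniq cannot merge them.
stable_uniq=$(sort -s -k3 "$tmp" | uniq)

# Two processes, default sort: the last-resort whole-line comparison breaks ties,
# so the duplicates end up adjacent and uniq drops one of them.
default_uniq=$(sort -k3 "$tmp" | uniq)

printf '%s\n' '--- sort -u -k3 -k1' "$one_process"
printf '%s\n' '--- sort -s -k3 | uniq' "$stable_uniq"
printf '%s\n' '--- sort -k3 | uniq' "$default_uniq"

rm -f "$tmp"
```

Only the stable two-process pipeline should keep all five lines; the other two should print the same four lines.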

>> i read
>> http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>> but got nothing about this.

Actually, it does - under the option -u, I see:

The commands sort -u and sort | uniq are equivalent, but this 
equivalence does not extend to arbitrary sort options. For example, sort 
-n -u inspects only the value of the initial numeric string when 
checking for uniqueness, whereas sort -n | uniq inspects the entire 
line. See uniq invocation.
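
A small sketch of that manual paragraph in action. The three input lines and the variable names are made up for illustration, and the locale is pinned to C as an assumption to keep the run repeatable.

```shell
#!/bin/sh
# Assumption: pin the locale so the ordering is reproducible.
export LC_ALL=C

# Hypothetical input: two lines share the leading number 1 but differ afterwards.
input=$(printf '%s\n' '1 apple' '1 banana' '2 cherry')

# sort -n -u compares only the initial numeric string when checking uniqueness,
# so the two lines starting with 1 count as duplicates of each other.
numeric_unique=$(printf '%s\n' "$input" | sort -n -u)

# sort -n | uniq compares entire lines, so all three lines survive.
numeric_then_uniq=$(printf '%s\n' "$input" | sort -n | uniq)

printf '%s\n' '--- sort -n -u' "$numeric_unique"
printf '%s\n' '--- sort -n | uniq' "$numeric_then_uniq"
```

The first command should print two lines and the second three, which is the same kind of mismatch you hit with -u -k3.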

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




This bug report was last modified 13 years and 255 days ago.
