GNU bug report logs - #9995
problem about sort -u -k

Package: coreutils;

Reported by: 夏凯 <walkerxk <at> gmail.com>

Date: Tue, 8 Nov 2011 17:25:15 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: 夏凯 <walkerxk <at> gmail.com>
To: 9995 <at> debbugs.gnu.org
Subject: bug#9995: problem about sort -u -k
Date: Wed, 9 Nov 2011 22:02:26 +0800
Thanks for your reply.
If I want to use the entire line as the key, but sort by the third
field, should I use sort -u -k3 -k1 -k2 a to do that?
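
One way to check whether the trailing -k2 changes anything is to compare the output with and without it. The sketch below is only an illustration: it assumes the sample lines from the example file "a" quoted later in this thread, feeds them on standard input instead of reading a file, and sets LC_ALL=C so comparisons are byte-wise. Because -k1 has no end field, it already spans the whole line, so on this data the extra key should never be consulted.

```shell
# Assumption: LC_ALL=C for byte-wise, reproducible comparisons.
export LC_ALL=C

# Sample lines taken from the example file "a" quoted later in this thread.
sample=$(printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w')

# -k1 has no end field, so it spans field 1 through the end of the line.
with_k2=$(printf '%s\n' "$sample" | sort -u -k3 -k1 -k2)
without_k2=$(printf '%s\n' "$sample" | sort -u -k3 -k1)

if [ "$with_k2" = "$without_k2" ]; then
    echo "same output with or without -k2"
else
    echo "outputs differ"
fi
```

If the two outputs match on your own data as well, sort -u -k3 -k1 a is enough.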

On Wed, Nov 9, 2011 at 03:45, Eric Blake <eblake <at> redhat.com> wrote:
> On 11/08/2011 11:54 AM, Eric Blake wrote:
>>>
>>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>>> 1 a q
>>> 1 a w
>>> 3 a w
>>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>>> 1 a q
>>> 1 a w
>
>> Since you didn't tell us what output you were hoping to get, I can't
>> tell you the proper command line that would match your expected output.
>> Feel free to reply, even while this bug is closed, if you need more help
>> in getting the output you want.
>
> I'll give a preemptive attempt at guessing what you meant, as well:
>
> If you wanted to sort on just the third and subsequent fields, but then
> strip duplicate lines only if the entire line is duplicate, then you have to
> use two processes:
>
> sort [-s] -k3 a | uniq
>
> If you don't mind a two-key sort, where the primary key is the third and
> subsequent fields, but where the secondary key is the entire line so as to
> force sort -u to consider the entire line when determining uniqueness, then
> one process will do:
>
> sort -u -k3 -k1 a
>
> To see the difference, and remembering that sort -u implies sort -s,
> consider these contents for a:
>
> $ cat a
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -u -k3 -k1 a
> 1 a q
> 2 a q
> 1 a w
> 3 a w
> $ sort -s -k3 a | uniq
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -k3 a | uniq
> 1 a q
> 2 a q
> 1 a w
> 3 a w
>
> That is, if the stable sort of just -k3 leaves identical lines that are not
> adjacent ("1 a q" in my example), then the separate uniq process won't
> filter them; while using sort -u with -k1 as the means to force the entire
> line as a secondary sort key loses the ability to leave identical lines
> separated by a distinct line.  Likewise, omitting both -s and -u lets sort
> imply a last-resort -k1, at which point uniq sees the same line order as
> sort -u sees.
>
>>> i read
>>> http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>>> but got nothing about this.
>
> Actually, it does - under the option -u, I see:
>
> The commands sort -u and sort | uniq are equivalent, but this equivalence
> does not extend to arbitrary sort options. For example, sort -n -u inspects
> only the value of the initial numeric string when checking for uniqueness,
> whereas sort -n | uniq inspects the entire line. See uniq invocation.
>
> --
> Eric Blake   eblake <at> redhat.com    +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>
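
The three command lines compared in the quoted reply can be reproduced side by side. This is a minimal sketch, assuming the same five sample lines, supplied on standard input instead of from the file "a", and LC_ALL=C so the ordering does not depend on the locale. The variable names are only illustrative.

```shell
# Assumption: LC_ALL=C for byte-wise, reproducible comparisons.
export LC_ALL=C

# The five sample lines from the quoted example.
sample=$(printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w')

# One process: primary key is field 3 onward, secondary key is the whole line.
one_process=$(printf '%s\n' "$sample" | sort -u -k3 -k1)

# Two processes, stable sort: lines with equal keys keep their input order,
# so the two "1 a q" lines stay separated by "2 a q" and uniq keeps both.
stable_uniq=$(printf '%s\n' "$sample" | sort -s -k3 | uniq)

# Two processes, default sort: the last-resort whole-line comparison puts
# identical lines next to each other before uniq sees them.
plain_uniq=$(printf '%s\n' "$sample" | sort -k3 | uniq)

printf 'sort -u -k3 -k1:\n%s\n\n' "$one_process"
printf 'sort -s -k3 | uniq:\n%s\n\n' "$stable_uniq"
printf 'sort -k3 | uniq:\n%s\n' "$plain_uniq"
```

Only the stable variant keeps the duplicate "1 a q", because uniq removes adjacent repeats only.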

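The manual excerpt quoted above about sort -n -u can be seen on a small input. The sketch below is an illustration only: the three input lines are made up for the purpose, and LC_ALL=C is assumed for reproducibility. Two of the lines share the leading number 1 but differ in the rest of the line.

```shell
# Assumption: LC_ALL=C for byte-wise, reproducible comparisons.
export LC_ALL=C

# Hypothetical input: two lines share the numeric prefix 1.
nums=$(printf '%s\n' '1 apple' '1 banana' '2 cherry')

# sort -n -u compares only the leading numeric value when checking for
# uniqueness, so it keeps one line per distinct number.
numeric_unique=$(printf '%s\n' "$nums" | sort -n -u)

# sort -n | uniq compares whole lines, so all three lines survive.
numeric_then_uniq=$(printf '%s\n' "$nums" | sort -n | uniq)

printf 'sort -n -u:\n%s\n\n' "$numeric_unique"
printf 'sort -n | uniq:\n%s\n' "$numeric_then_uniq"
```

The same reasoning applies to -k: with -u, lines count as duplicates when the keys compare equal, not when the whole lines are identical.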


-- 
contact me:
MSN: walkerxk <at> gmail.com
GTALK: walkerxk <at> gmail.com




This bug report was last modified 13 years and 255 days ago.

