#9995 - problem about sort -u -k - GNU bug report logs

GNU bug report logs - #9995
problem about sort -u -k

Reported by: 夏凯 <walkerxk <at> gmail.com>

Date: Tue, 8 Nov 2011 17:25:15 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: 夏凯 <walkerxk <at> gmail.com>
To: 9995 <at> debbugs.gnu.org
Subject: bug#9995: problem about sort -u -k
Date: Wed, 9 Nov 2011 22:02:26 +0800

Thanks for your reply.
If I want to use the entire line as a key and sort by the third field,
should I use sort -u -k3 -k1 -k2 a to do that?

On Wed, Nov 9, 2011 at 03:45, Eric Blake <eblake <at> redhat.com> wrote:
> On 11/08/2011 11:54 AM, Eric Blake wrote:
>>>
>>> 22:41:39#tp#~> /usr/local/bin/sort -u -k1,3 a
>>> 1 a q
>>> 1 a w
>>> 3 a w
>>> 22:41:48#tp#~> /usr/local/bin/sort -u -k3 a
>>> 1 a q
>>> 1 a w
>
>> Since you didn't tell us what output you were hoping to get, I can't
>> tell you the proper command line that would match your expected output.
>> Feel free to reply, even while this bug is closed, if you need more help
>> in getting the output you want.
>
> I'll give a preemptive attempt at guessing what you meant, as well:
>
> If you wanted to sort on just the third and subsequent fields, but then
> strip duplicate lines only if the entire line is duplicate, then you have to
> use two processes:
>
>   sort [-s] -k3 a | uniq
>
> If you don't mind a two-key sort, where the primary key is the third and
> subsequent fields, but where the secondary key is the entire line so as to
> force sort -u to consider the entire line when determining uniqueness, then
> one process will do:
>
>   sort -u -k3 -k1 a
>
> To see the difference, and remembering that sort -u implies sort -s,
> consider these contents for a:
>
> $ cat a
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -u -k3 -k1 a
> 1 a q
> 2 a q
> 1 a w
> 3 a w
> $ sort -s -k3 a | uniq
> 1 a q
> 2 a q
> 1 a q
> 1 a w
> 3 a w
> $ sort -k3 a | uniq
> 1 a q
> 2 a q
> 1 a w
> 3 a w
>
> That is, if the stable sort of just -k3 leaves identical lines that are not
> adjacent ("1 a q" in my example), then the separate uniq process won't
> filter them; while using sort -u with -k1 as the means to force the entire
> line as a secondary sort key loses the ability to leave identical lines
> separated by a distinct line. Likewise, omitting both -s and -u lets sort
> imply a last-resort -k1, at which point uniq sees the same line order as
> sort -u sees.
>
>>> i read
>>> http://www.gnu.org/s/coreutils/manual/html_node/sort-invocation.html,
>>> but got nothing about this.
>
> Actually, it does - under the option -u, I see:
>
> The commands sort -u and sort | uniq are equivalent, but this equivalence
> does not extend to arbitrary sort options. For example, sort -n -u inspects
> only the value of the initial numeric string when checking for uniqueness,
> whereas sort -n | uniq inspects the entire line. See uniq invocation.
>
> --
> Eric Blake   eblake <at> redhat.com    +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
>

--
contact me:
MSN: walkerxk <at> gmail.com
GTALK: walkerxk <at> gmail.com
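
The comparison in the quoted reply can be reproduced with a short script. What follows is a minimal sketch, not part of the original thread: it assumes GNU sort and uniq on PATH, and the temporary directory, the variable names, and the LC_ALL=C setting are choices made here for illustration (LC_ALL=C keeps every comparison a plain byte comparison). It also runs the -k3 -k1 -k2 variant from the question above. Under that assumption, -k1 with no end position already spans the whole line, so the trailing -k2 should never be consulted.

```shell
#!/bin/sh
# Sketch only: reproduces the sample from the quoted reply.
# Assumptions (not from the thread): GNU sort/uniq on PATH, LC_ALL=C.
export LC_ALL=C

tmpdir=$(mktemp -d)
a="$tmpdir/a"
printf '%s\n' '1 a q' '2 a q' '1 a q' '1 a w' '3 a w' > "$a"

# One process: -k3 is the primary key and -k1 (field 1 to end of line,
# i.e. the whole line) is the secondary key, so -u only drops a line
# when the entire line is a duplicate.
two_key=$(sort -u -k3 -k1 "$a")

# Two processes, stable sort: lines with equal -k3 keys keep their input
# order, so the two "1 a q" lines stay separated by "2 a q" and uniq,
# which only merges adjacent lines, keeps both.
stable_uniq=$(sort -s -k3 "$a" | uniq)

# Two processes, default sort: the last-resort whole-line comparison
# makes the duplicates adjacent, so uniq removes one of them.
plain_uniq=$(sort -k3 "$a" | uniq)

# Variant from the question: -k2 follows -k1, which already covers the line.
three_key=$(sort -u -k3 -k1 -k2 "$a")

rm -rf "$tmpdir"

printf '== sort -u -k3 -k1 a ==\n%s\n' "$two_key"
printf '== sort -s -k3 a | uniq ==\n%s\n' "$stable_uniq"
printf '== sort -k3 a | uniq ==\n%s\n' "$plain_uniq"
printf '== sort -u -k3 -k1 -k2 a ==\n%s\n' "$three_key"
```

With the sample data, the first, third, and fourth results should be the same four lines, while the stable-sort pipeline should keep all five, matching the outputs shown in the quoted reply.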

This bug report was last modified 13 years and 255 days ago.

