GNU bug report logs - #18073
defect with sort multiple arguments


Package: coreutils;

Reported by: n buckner <bucknerns <at> gmail.com>

Date: Mon, 21 Jul 2014 20:52:04 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #7 received at control <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: n buckner <bucknerns <at> gmail.com>, 18073-done <at> debbugs.gnu.org
Subject: Re: bug#18073: defect with sort multiple arguments
Date: Mon, 21 Jul 2014 15:13:05 -0600
tag 18073 notabug
thanks

On 07/21/2014 01:57 PM, n buckner wrote:
> I was seeing some odd behaviour with sort -n -u.  I ran sort -n -u dataset
> and expected the same output as sort -n dataset| uniq but instead got
> something different.  sortbug is a script file showing the usage described
> above, dataset is the dataset.
> here is the version I am running.
> 
> sort (GNU coreutils) 8.21

Thanks for the report.  However, the problem is not in sort, but in your
usage of the command line parameters to sort.  Let's use the --debug
flag to see what is REALLY going on:

$ sort -n -u dataset --debug
sort: using ‘en_US.UTF-8’ sorting rules
2012-09-07 (Srikrishna Bodanapu
____
2013-06-15 (Chetana Nair
____
2014-02-24 (Subba Juturi
____

Aha - sort's -u says to treat lines as duplicates whenever they compare
equal on the sort keys you specified, disregarding any portion of the
line outside those keys.  With plain -n, the key is the whole line, but
the numeric comparison stops as soon as it hits a non-numeric character
(the '-' after the year), so every line from the same year compares
equal.  If you WANT uniqueness to depend on the entire line, then you
need to do something like:

sort -k1,1n -k1 -u dataset

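which says to sort _first_ numerically on the first field (the numeric
value stops at the first non-digit character of each line), and _second_
by the entire line; -u then outputs only the first line of each run that
is equal on both keys.  Without -u, sort already breaks ties with a
last-resort comparison of the whole line, which is what your
sort -n | uniq pipeline relied on; -u disables that last-resort
comparison, so the second key has to be spelled out.  With it, the
output matches what you were seeing with uniq: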

$ diff -u <(sort -k1,1n -k1 dataset -u) <(sort -n dataset | uniq)
$
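To see the difference without the original dataset, here is a minimal sketch on a few hypothetical lines. The `sample_data` helper and the "Hypothetical Name" line are made up for illustration; the other lines are borrowed from the --debug output above. It forces LC_ALL=C so the byte order is reproducible, which is an assumption, since the report itself ran under en_US.UTF-8 rules.

```shell
#!/bin/sh
# Byte-wise collation for reproducible output (an assumption; the
# report above ran under en_US.UTF-8 sorting rules).
LC_ALL=C
export LC_ALL

# Hypothetical sample: three 2012 lines (one an exact duplicate)
# and one 2013 line.
sample_data() {
  printf '%s\n' \
    '2013-06-15 (Chetana Nair' \
    '2012-09-07 (Srikrishna Bodanapu' \
    '2012-01-03 (Hypothetical Name' \
    '2012-09-07 (Srikrishna Bodanapu'
}

# All 2012 lines tie on the numeric key, so only one of them survives.
printf '%s\n' '== sort -n -u =='
sample_data | sort -n -u

# Ties are ordered by the last-resort whole-line compare; uniq then
# drops only exact adjacent duplicates.
printf '%s\n' '== sort -n | uniq =='
sample_data | sort -n | uniq

# The whole line as an explicit second key makes -u behave like uniq.
printf '%s\n' '== sort -k1,1n -k1 -u =='
sample_data | sort -k1,1n -k1 -u
```

The first pipeline prints two lines (one per year), while the last two print the same three lines.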

Oh, and if you wanted to sort by all three fields of the date, instead
of just the year, you probably want:

sort -t - -k1,1n -k2,2n -k3,3n -k1 -u dataset

although for the particular dataset you posted, it makes no difference.
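That remark holds because the dates in the report are zero-padded, so a text comparison of the whole line already orders months and days correctly. A minimal sketch with hypothetical unpadded months (the `unpadded_dates` helper and its contents are invented for illustration, again assuming LC_ALL=C) shows where the two commands part ways:

```shell
#!/bin/sh
# Byte-wise collation for reproducible output (an assumption).
LC_ALL=C
export LC_ALL

# Hypothetical dates with an unpadded month; the last line is a duplicate.
unpadded_dates() {
  printf '%s\n' '2012-10-01 (B' '2012-9-07 (A' '2012-9-07 (A'
}

# Year compared numerically, then the whole line as text:
# "2012-10..." sorts before "2012-9..." because '1' < '9'.
unpadded_dates | sort -k1,1n -k1 -u

# Year, month and day each compared numerically on '-'-separated
# fields: month 9 sorts before month 10.
unpadded_dates | sort -t - -k1,1n -k2,2n -k3,3n -k1 -u
```

Both commands drop the duplicate line; only the order of the two distinct dates differs.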

I'm closing this as not a bug, but please feel free to reply if you have
further questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


This bug report was last modified 10 years and 305 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.