tag 18073 notabug
thanks

On 07/21/2014 01:57 PM, n buckner wrote:
> I was seeing some odd behaviour with sort -n -u. I ran sort -n -u dataset
> and expected the same output as sort -n dataset| uniq but instead got
> something different. sortbug is a script file showing the usage described
> above, dataset is the dataset.
> here is the version I am running.
>
> sort (GNU coreutils) 8.21

Thanks for the report. However, the problem is not in sort, but in your
usage of the command line parameters to sort. Let's use the --debug flag
to see what is REALLY going on:

$ sort -n -u dataset --debug
sort: using ‘en_US.UTF-8’ sorting rules
2012-09-07 (Srikrishna Bodanapu
____
2013-06-15 (Chetana Nair
____
2014-02-24 (Subba Juturi
____

Aha - sort's -u says to declare lines unique ONLY if they differ on the
sort keys you specified, and disregarding any portion of the line that
didn't match your specified sort keys. But the sort key you specified,
-n, ends as soon as it hits a non-numeric character.

If you WANT to sort the entire line, then you need to do something like:

sort -k1,1n -k1 -u dataset

which says to sort _first_ by numeric (which ends on the first non-digit
character of each line), and _second_ by the entire line; and then
filter out for unique lines. Adding the second key over the entire line
makes the difference that matches what you were seeing with uniq:

$ diff -u <(sort -k1,1n -k1 dataset -u) <(sort -n dataset | uniq)
$

Oh, and if you wanted to sort by all three fields of the date, instead
of just the year, you probably want:

sort -t - -k1,1n -k2,2n -k3,3n -k1 -u dataset

although for the particular dataset you posted, it makes no difference.

I'm closing this as not a bug, but please feel free to reply if you have
further questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
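
For anyone who wants to reproduce the effect described above without the reporter's dataset, here is a minimal sketch. The three sample lines and the variable names are made up for illustration; only the sort and uniq invocations come from the message. LC_ALL=C is set just to keep the result independent of the local locale.

```shell
#!/bin/sh
# Hypothetical sample data (NOT the reporter's dataset): two lines share
# the leading number 2012 but differ in the rest of the line.
tmp=$(mktemp)
printf '%s\n' \
    '2012-09-07 (alpha' \
    '2012-09-07 (beta' \
    '2013-06-15 (gamma' > "$tmp"

# -n -u: the only key is the leading number, so both 2012 lines compare
# equal on the key and -u keeps just one of them.
short=$(LC_ALL=C sort -n -u "$tmp" | wc -l | tr -d ' ')

# Numeric key first, then the whole line as a second key: lines now
# count as duplicates only if the entire line matches.
full=$(LC_ALL=C sort -k1,1n -k1 -u "$tmp" | wc -l | tr -d ' ')

# Reference pipeline: uniq compares whole lines.
piped=$(LC_ALL=C sort -n "$tmp" | uniq | wc -l | tr -d ' ')

rm -f "$tmp"
echo "sort -n -u: $short lines"
echo "sort -k1,1n -k1 -u: $full lines"
echo "sort -n | uniq: $piped lines"
```

With this sample, the first command should keep fewer lines than the other two, which agree with each other.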