GNU bug report logs - #18073: defect with sort multiple arguments
Reported by: n buckner <bucknerns <at> gmail.com>
Date: Mon, 21 Jul 2014 20:52:04 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18073 in the body.
You can then email your comments to 18073 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#18073; Package coreutils. (Mon, 21 Jul 2014 20:52:04 GMT)
Acknowledgement sent to n buckner <bucknerns <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 21 Jul 2014 20:52:05 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I was seeing some odd behaviour with sort -n -u. I ran sort -n -u dataset
and expected the same output as sort -n dataset | uniq, but instead got
something different. sortbug is a script file showing the usage described
above; dataset is the dataset.
Here is the version I am running:
sort (GNU coreutils) 8.21
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
Thanks,
Nathan
[dataset (application/octet-stream, attachment)]
[sortbug (application/octet-stream, attachment)]
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 21 Jul 2014 21:14:02 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Mon, 21 Jul 2014 21:14:03 GMT)
Notification sent to n buckner <bucknerns <at> gmail.com>: bug acknowledged by developer. (Mon, 21 Jul 2014 21:14:04 GMT)
Message #12 received at 18073-done <at> debbugs.gnu.org (full text, mbox):
tag 18073 notabug
thanks
On 07/21/2014 01:57 PM, n buckner wrote:
> I was seeing some odd behaviour with sort -n -u. I ran sort -n -u dataset
> and expected the same output as sort -n dataset| uniq but instead got
> something different. sortbug is a script file showing the usage described
> above, dataset is the dataset.
> here is the version I am running.
>
> sort (GNU coreutils) 8.21
Thanks for the report. However, the problem is not in sort, but in your
usage of the command line parameters to sort. Let's use the --debug
flag to see what is REALLY going on:
$ sort -n -u dataset --debug
sort: using ‘en_US.UTF-8’ sorting rules
2012-09-07 (Srikrishna Bodanapu
____
2013-06-15 (Chetana Nair
____
2014-02-24 (Subba Juturi
____
Aha - sort's -u treats two lines as distinct ONLY if they differ on the
sort keys you specified; any portion of a line outside those keys is
disregarded. But the sort key you specified, -n, ends as soon as it hits
a non-numeric character, so here every line with the same leading year
compares equal and only one line per year is output. If you WANT to
sort the entire line, then you need to do something like:
sort -k1,1n -k1 -u dataset
which says to sort _first_ numerically on the first field (the numeric
comparison ends at the first non-digit character), and _second_ by the
entire line; and then to filter out duplicate lines. Adding the second
key over the entire line is what makes the output match what you were
seeing with uniq:
$ diff -u <(sort -k1,1n -k1 dataset -u) <(sort -n dataset | uniq)
$
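The difference can be reproduced on a small made-up input. The following is a minimal sketch, not the reporter's attachment: the sample lines are hypothetical, the variable names are illustrative, and LC_ALL=C is an assumption made only so that the ordering is reproducible.

```shell
# Hypothetical sample data; NOT the reporter's attached dataset.
dataset='2013-06-15 (gamma
2012-09-07 (alpha
2012-11-02 (beta
2013-06-15 (gamma
2014-02-24 (delta'

# -n -u: the key is only the leading number (the year), so lines that share
# a year compare equal and all but one of them are dropped.
numeric_unique=$(printf '%s\n' "$dataset" | LC_ALL=C sort -n -u)

# Second key over the whole line: only fully identical lines are dropped.
full_unique=$(printf '%s\n' "$dataset" | LC_ALL=C sort -k1,1n -k1 -u)

# Reference behaviour: sort numerically, then let uniq compare whole lines.
piped_uniq=$(printf '%s\n' "$dataset" | LC_ALL=C sort -n | uniq)

printf '%s\n' "$numeric_unique" | wc -l   # 3 lines, one per year
printf '%s\n' "$full_unique" | wc -l      # 4 lines, one per distinct line
[ "$full_unique" = "$piped_uniq" ] && echo "same as sort -n | uniq"
```

In this sketch the two 2012 lines survive only when the whole line takes part in the comparison.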
Oh, and if you wanted to sort by all three fields of the date, instead
of just the year, you probably want:
sort -t - -k1,1n -k2,2n -k3,3n -k1 -u dataset
although for the particular dataset you posted, it makes no difference.
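One case where the per-field keys would matter is a date without zero padding. A minimal sketch on hypothetical lines (not taken from the posted dataset), again assuming LC_ALL=C so that ties are broken by plain byte comparison:

```shell
# Hypothetical lines whose month is not zero-padded.
dates='2012-11-02 (beta
2012-9-07 (alpha'

# Year-only numeric key, then whole line: "2012-11" sorts before "2012-9"
# because the tie is broken by comparing characters, and "1" < "9".
by_year=$(printf '%s\n' "$dates" | LC_ALL=C sort -k1,1n -k1 -u)

# Split on "-" and compare year, month and day numerically: 9 sorts before 11.
by_date=$(printf '%s\n' "$dates" | LC_ALL=C sort -t - -k1,1n -k2,2n -k3,3n -k1 -u)

printf '%s\n' "$by_year" | head -n 1   # 2012-11-02 (beta
printf '%s\n' "$by_date" | head -n 1   # 2012-9-07 (alpha
```

With zero-padded dates, character order and numeric order of the fields coincide, which is consistent with the two commands giving the same result on the posted dataset.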
I'm closing this as not a bug, but please feel free to reply if you have
further questions.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#18073; Package coreutils. (Tue, 22 Jul 2014 16:45:03 GMT)
Message #15 received at 18073 <at> debbugs.gnu.org (full text, mbox):
[re-adding the bug, with permission]
On 07/22/2014 10:14 AM, n buckner wrote:
> Sorry didn't see this at the bottom of the manpage. info coreutils 'sort
> invocation'
>
> The manpage is kind of misleading because it does not convey that at all.
>
> -n, --numeric-sort
> compare according to string numerical value
>
> -u, --unique
> with -c, check for strict ordering; without -c, output only
> the first of an equal run
>
>
If you think we can improve the documentation, to make it more obvious
that -u only covers uniqueness between keys, and that -n stops a key at
the first non-numeric character, suggestions are welcome. Remember that
the man page is generated from the --help text, and that those are
supposed to be concise; but the info page should definitely go into more
detail. And in fact, I see this in the info page:
The commands 'sort -u' and 'sort | uniq' are equivalent, but this
equivalence does not extend to arbitrary 'sort' options. For
example, 'sort -n -u' inspects only the value of the initial
numeric string when checking for uniqueness, whereas 'sort -n |
uniq' inspects the entire line. *Note uniq invocation::.
which is _exactly_ what you filed this bug report about.
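The same caveat applies to any option that makes distinct lines compare equal. As a further illustration, here is a minimal sketch using -f (fold case) instead of -n. The choice of -f is an assumption made for illustration and does not come from the info page excerpt; the input is hypothetical and LC_ALL=C is assumed for reproducibility.

```shell
# Hypothetical input: two spellings that differ only in case.
words='apple
Apple
banana'

# Without extra options, the two forms agree.
plain_u=$(printf '%s\n' "$words" | LC_ALL=C sort -u)
plain_uniq=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq)

# With -f, sort -u treats "apple" and "Apple" as equal and keeps one of them,
# while uniq still compares the lines exactly and keeps both.
fold_u=$(printf '%s\n' "$words" | LC_ALL=C sort -f -u)
fold_uniq=$(printf '%s\n' "$words" | LC_ALL=C sort -f | uniq)

[ "$plain_u" = "$plain_uniq" ] && echo "plain: equivalent"
[ "$fold_u" != "$fold_uniq" ] && echo "-f: not equivalent"
```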
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 20 Aug 2014 11:24:03 GMT)