GNU bug report logs -
#35636
bug report sort command
Reported by: Michele Liberi <mliberi <at> gmail.com>
Date: Wed, 8 May 2019 14:29:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Wed, 8 May 2019 09:41:58 -0500
with message-id <42626a2d-9179-32b4-b2ef-e68d931463e3 <at> redhat.com>
and subject line Re: bug#35636: bug report sort command
has caused the debbugs.gnu.org bug report #35636,
regarding bug report sort command
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
35636: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=35636
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
I verified the following bug is there in:
- sort (GNU coreutils) 8.21
- sort (GNU coreutils) 8.22
- sort (GNU coreutils) 8.23
*Input file:*
# cat sort.in
1|a|x
2|b|x
3|aa|x
4|bb|x
5|c|x
*shell command and output:*
# sort -t'|' -k2 <sort.in
3|aa|x
1|a|x
4|bb|x
2|b|x
5|c|x
*I expected key "a" to come before key "aa" and key "b" to come before
key "bb".*
[Message part 5 (message/rfc822, inline)]
[Message part 6 (text/plain, inline)]
tag 35636 notabug
thanks
On 5/8/19 3:35 AM, Michele Liberi wrote:
> I verified the following bug is there in:
>
> - sort (GNU coreutils) 8.21
> - sort (GNU coreutils) 8.22
> - sort (GNU coreutils) 8.23
>
> *Input file:*
> # cat sort.in
> 1|a|x
> 2|b|x
> 3|aa|x
> 4|bb|x
> 5|c|x
>
>
> *shell command and output:*
> # sort -t'|' -k2 <sort.in
> 3|aa|x
> 1|a|x
> 4|bb|x
> 2|b|x
> 5|c|x
Let's use --debug to see what sort really did:
$ sort --debug -t'|' -k2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____
Since you did not specify an ending field, each key runs from the start
of field 2 to the end of the line, so you are comparing the string
"aa|x" with "a|x", and the string "a|x" with "bb|x". In the en_US.UTF-8
locale, punctuation is ignored on the first-order pass through
strcoll(), which means you are effectively comparing "aax", "ax", and
"bbx", and the sort is correct. But even in a locale that does not
ignore punctuation:
$ LC_ALL=C sort --debug -t'|' -k2 <sort.in
sort: using simple byte comparison
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____
the sort is still correct, since ASCII '|' (0x7C) sorts after ASCII 'a'
(0x61), which places "aa|x" before "a|x". Your real problem is that you
are sorting on too much data; you need to try again with the key limited
to exactly the second field:
$ sort --debug -t'|' -k2,2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
1|a|x
  _
_____
3|aa|x
  __
______
2|b|x
  _
_____
4|bb|x
  __
______
5|c|x
  _
_____
where sort can now see that "a" is a prefix of "aa", because the key
no longer bleeds onto the rest of the line.
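For a reproduction that does not depend on the machine's locale, here is a minimal sketch, assuming a POSIX shell and GNU coreutils sort. It forces LC_ALL=C (plain byte comparison), prints the byte values of 'a' and '|', and runs both key specifications on the sample lines fed through standard input. The variable name `input` is purely illustrative.

```shell
#!/bin/sh
# Minimal reproduction, assuming a POSIX shell and GNU sort.
# LC_ALL=C forces plain byte comparison, so the output is the same
# on every machine regardless of its configured locale.

# Sample lines from the report ('input' is an illustrative name).
input='1|a|x
2|b|x
3|aa|x
4|bb|x
5|c|x'

# Byte values: 'a' is 97 and '|' is 124, so "aa|x" < "a|x" byte-wise.
printf "a=%d |=%d\n" "'a" "'|"

echo '--- -k2: key runs from field 2 to end of line ---'
printf '%s\n' "$input" | LC_ALL=C sort -t'|' -k2
# prints 3|aa|x, 1|a|x, 4|bb|x, 2|b|x, 5|c|x (one per line)

echo '--- -k2,2: key is field 2 only ---'
printf '%s\n' "$input" | LC_ALL=C sort -t'|' -k2,2
# prints 1|a|x, 3|aa|x, 2|b|x, 4|bb|x, 5|c|x (one per line)
```

Under LC_ALL=C the first command reproduces the order from the report, and the second gives the order the reporter expected.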
>
> *I expected key "a" to come before key "aa" and key "b" to come before
> key "bb".*
Your expectations are at odds with your incomplete command line. sort
is behaving as required; therefore, I'm closing this as not a bug. But
feel free to reply if you have further questions.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.