GNU bug report logs - #35636: bug report sort command
Reported by: Michele Liberi <mliberi <at> gmail.com>
Date: Wed, 8 May 2019 14:29:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Your bug report
#35636: bug report sort command
which was filed against the coreutils package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 35636 <at> debbugs.gnu.org.
--
35636: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=35636
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
tag 35636 notabug
thanks
On 5/8/19 3:35 AM, Michele Liberi wrote:
> I verified the following bug is there in:
>
> - sort (GNU coreutils) 8.21
> - sort (GNU coreutils) 8.22
> - sort (GNU coreutils) 8.23
>
> *Input file:*
> # cat sort.in
> 1|a|x
> 2|b|x
> 3|aa|x
> 4|bb|x
> 5|c|x
>
>
> *shell command and output:*
> # sort -t'|' -k2 <sort.in
> 3|aa|x
> 1|a|x
> 4|bb|x
> 2|b|x
> 5|c|x
Let's use --debug to see what sort really did:
$ sort --debug -t'|' -k2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____
Since you did not specify an ending field, the key runs from field 2 to
the end of the line: you are comparing the string "aa|x" with "a|x", and
the string "a|x" with "bb|x". In the en_US.UTF-8 locale, punctuation is
ignored on the first-order pass through strcoll(), which means you are
effectively comparing "aax" with "ax" with "bbx", and the sort is
correct. Even in a locale that does not ignore punctuation:
$ LC_ALL=C sort --debug -t'|' -k2 <sort.in
sort: using simple byte comparison
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____
the sort is still correct, since ASCII '|' (0x7C) sorts after ASCII 'a'
(0x61), so "aa|x" sorts before "a|x". Your real problem is that you are
sorting on too much data; you need to try again with the key limited to
exactly the second field:
$ sort --debug -t'|' -k2,2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
1|a|x
  _
_____
3|aa|x
  __
______
2|b|x
  _
_____
4|bb|x
  __
______
5|c|x
  _
_____
where sort can now see that "a" is a prefix of "aa", because the key no
longer bleeds into the rest of the line.
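
The two behaviours can be checked side by side. The following is only a
minimal sketch, assuming a POSIX shell with GNU sort and mktemp available.
The variable names (tmpdir, a_byte, bar_byte, open_key, closed_key) are
hypothetical and chosen for illustration. It recreates the input from the
report in a temporary directory and pins LC_ALL=C so that the result does
not depend on the locale.

```shell
# Recreate the input file from the report in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1|a|x' '2|b|x' '3|aa|x' '4|bb|x' '5|c|x' > "$tmpdir/sort.in"

# Byte values of the two characters that decide the C-locale comparison:
# a leading single quote makes printf print the character's numeric value.
a_byte=$(printf '%d' "'a")
bar_byte=$(printf '%d' "'|")

# -k2: the key runs from field 2 to the end of the line ("a|x", "aa|x", ...).
open_key=$(LC_ALL=C sort -t'|' -k2 "$tmpdir/sort.in")

# -k2,2: the key is exactly field 2 ("a", "aa", ...).
closed_key=$(LC_ALL=C sort -t'|' -k2,2 "$tmpdir/sort.in")

printf 'a=%s |=%s\n' "$a_byte" "$bar_byte"
printf -- '-k2:\n%s\n-k2,2:\n%s\n' "$open_key" "$closed_key"

rm -rf "$tmpdir"
```

With the open-ended key, the second byte of each key decides the order
('a' against '|'), so "aa|x" comes first. With the key closed at field 2,
"a" is a proper prefix of "aa" and comes first.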
>
> *I expected that key "a" to come before key "aa" and key "b" to come before
> key "bb".*
Your expectations are at odds with your incomplete command line. sort
is behaving as required; therefore, I'm closing this as not a bug. But
feel free to reply if you have further questions.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
This bug report was last modified 6 years and 52 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.