GNU bug report logs - #9562: unexpected sort behaviour
Message #10 received at 9562-done <at> debbugs.gnu.org (full text, mbox):
force-merge 9562 9561
tag 9562 notabug
thanks
On 09/20/2011 05:51 AM, vijay krishna wrote:
> Hello Team,
>
> May I please know the reason for the following behaviour of the sort
> command...
>
Thanks for the report; however, this is not a bug. As mentioned in the
FAQ, you are encountering this behavior because of your choice of locale:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
> 0 $ sort -k 1 bug2_file1
> b101 512
> b1 512
> ------------------
> sort (GNU coreutils) 5.97
Newer sort also comes with a --debug option that would help explain your
predicament (5.97 is YEARS old; the latest is 8.13, with numerous bug
fixes, although none of the behavior you show is affected by any of
those bug fixes).
$ printf 'b101 512\nb1 512\n' | LC_ALL=C sort -k1 --debug
sort: using simple byte comparison
b1 512
______
______
b101 512
________
________
$ printf 'b101 512\nb1 512\n' | sort -k1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b101 512
________
________
b1 512
______
______
$ printf 'b101 512\nb1 512\n' | sort -k1,1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b1 512
__
______
b101 512
____
________
In the en_US.UTF-8 locale, collation is done by dictionary ordering,
where whitespace is insignificant to the collation; and specification of
-k1 instead of the more precise -k1,1 means that the sort key runs from
the first field to the end of the line, rather than covering only the
first field. Since "b1512" collates greater than "b101512" in en_US
collation rules, the same applies to "b1 512" and "b101 512". Notice how
use of -k1,1 changed the output by comparing only "b1" and "b101", or how
use of LC_ALL=C changed the output by switching to bytewise collation
with no dictionary sorting, where space becomes significant.
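
The two fixes above can be combined into a small sketch. The function
name sort_first_field is hypothetical (it is not part of coreutils), and
the second command only approximates the locale's behavior by deleting
spaces before a bytewise sort; it does not run the real en_US collation
tables.

```shell
# Hypothetical helper, not part of coreutils: sort stdin on the first
# field only, with bytewise comparison regardless of the caller's locale.
sort_first_field() {
    LC_ALL=C sort -k1,1
}

# Compares only "b1" and "b101", so "b1 512" is printed first.
printf 'b101 512\nb1 512\n' | sort_first_field

# Rough emulation (an assumption, for illustration only) of a collation
# that ignores whitespace: delete the spaces, then sort bytewise.
# "b101512" is printed before "b1512" because '0' sorts before '5'.
printf 'b101 512\nb1 512\n' | tr -d ' ' | LC_ALL=C sort
```

Setting LC_ALL=C inside the function keeps the result the same no matter
which locale the calling shell uses, and -k1,1 stops the key at the end
of the first field.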
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 13 years and 325 days ago.