GNU bug report logs - #9562
unexpected sort behaviour

Package: coreutils

Reported by: vijay krishna <krishna.vijay4444 <at> gmail.com>

Date: Tue, 20 Sep 2011 16:23:02 UTC

Severity: normal

Tags: notabug

Merged with 9561

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

Message #10 received at 9562-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: vijay krishna <krishna.vijay4444 <at> gmail.com>
Cc: 9562-done <at> debbugs.gnu.org
Subject: Re: bug#9562: unexpected sort behaviour
Date: Tue, 20 Sep 2011 10:26:58 -0600

force-merge 9562 9561
tag 9562 notabug
thanks

On 09/20/2011 05:51 AM, vijay krishna wrote:
> Hello Team,
>
>    May I please know the reason for the following behaviour of the sort
> command...
>

Thanks for the report; however, this is not a bug.  As mentioned in the 
FAQ, you are encountering this behavior because of your choice of locale:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> 0 $ sort -k 1 bug2_file1
> b101 512
> b1 512
> ------------------

> sort (GNU coreutils) 5.97

Newer sort also comes with a --debug option that would help explain your 
predicament (5.97 is YEARS old; the latest is 8.13, with numerous bug 
fixes, although none of the behavior you show is affected by any of 
those bug fixes).

$ printf 'b101 512\nb1 512\n' | LC_ALL=C sort -k1 --debug
sort: using simple byte comparison
b1 512
______
______
b101 512
________
________

$ printf 'b101 512\nb1 512\n' | sort -k1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b101 512
________
________
b1 512
______
______

$ printf 'b101 512\nb1 512\n' | sort -k1,1 --debug
sort: using `en_US.UTF-8' sorting rules
sort: leading blanks are significant in key 1; consider also specifying `b'
b1 512
__
______
b101 512
____
________


In the en_US.UTF-8 locale, collation follows dictionary ordering, in
which whitespace is insignificant; and specifying -k1 instead of the
more precise -k1,1 means that you are sorting on the entire line rather
than on just the first field.  Since "b1512" collates greater than
"b101512" under en_US collation rules, the same applies to "b1 512" and
"b101 512".  Notice how -k1,1 changed the output by comparing only "b1"
and "b101", and how LC_ALL=C changed the output by switching to bytewise
collation with no dictionary sorting, where space becomes significant.
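
The two workarounds above can be reproduced without --debug. This is a minimal sketch, assuming a POSIX shell and a sort that supports -k; the variable names (input, c_whole, c_field) are illustrative and not part of the original report. It deliberately avoids running under en_US.UTF-8, since that locale may not be installed everywhere.

```shell
#!/bin/sh
# The two sample lines from the report.
input='b101 512
b1 512'

# Workaround 1: force bytewise collation. With -k1 the key runs to the end
# of the line, and in the C locale space (0x20) sorts before '0' (0x30),
# so "b1 512" comes before "b101 512".
c_whole=$(printf '%s\n' "$input" | LC_ALL=C sort -k1)

# Workaround 2: end the key at field 1, so only "b1" and "b101" are compared.
# LC_ALL=C is kept here only to make the sketch reproducible on any system.
c_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1)

printf '%s\n' "$c_whole"
printf '%s\n' "$c_field"
```

Both commands place "b1 512" first, although for different reasons: the first because the space byte is significant and sorts low, the second because the key no longer includes the space or the second field at all.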

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




This bug report was last modified 13 years and 325 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.