GNU bug report logs - #8124
sort utility?

Package: coreutils;

Reported by: Betty J Barr <barr <at> EGR.UH.EDU>

Date: Sat, 26 Feb 2011 19:29:03 UTC

Severity: normal

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 8124 in the body.
You can then email your comments to 8124 AT debbugs.gnu.org in the normal way.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8124; Package coreutils. (Sat, 26 Feb 2011 19:29:03 GMT)

Acknowledgement sent to Betty J Barr <barr <at> EGR.UH.EDU>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 26 Feb 2011 19:29:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Betty J Barr <barr <at> EGR.UH.EDU>
To: bug-coreutils <at> gnu.org
Cc: BABales <at> uh.edu
Subject: sort utilty?
Date: Sat, 26 Feb 2011 12:28:08 -0600 (CST)
[Message part 1 (text/plain, inline)]
I have used the "unix" sort for years without trouble, but suddenly 
something weird is happening. I have a student whose last name is Khan
and a student whose last name is Khanal; in each case a comma immediately
follows the last name. Obviously Khan should be sorted before Khanal, both
from the English standpoint and based on the ASCII code for a comma. But it
is not: Khanal comes first. Why?
(The two files I have attached are the unsorted version, khan.txt, and the
output, khan.out, produced by the following command.)
sort +0.0  -0.14 khan.txt >khan.out

                         Dr. Betty Barr,
                         University of Houston
[khan.txt (text/plain, attachment)]
[khan.out (text/plain, attachment)]

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8124; Package coreutils. (Sat, 26 Feb 2011 23:35:02 GMT)

Message #8 received at 8124 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Betty J Barr <barr <at> EGR.UH.EDU>
Cc: 8124 <at> debbugs.gnu.org, BABales <at> uh.edu
Subject: Re: bug#8124: sort utilty?
Date: Sat, 26 Feb 2011 16:34:23 -0700
[Message part 1 (text/plain, inline)]
On 02/26/2011 11:28 AM, Betty J Barr wrote:
> I have used the "unix" sort for years without trouble, but suddenly
> something weird is happening. I have a student whose last name is Khan,
> and a student whose last name is Khanal,. The comma immediately follows
> the last name. Obviously Khan should be sorted before Khanal, both from
> the English standpoint and based on the ASCII code for a comma. But it is
> not. Khanal comes first. Why?

Because you are using a locale that regards punctuation as insignificant
in collation sequences.

See this FAQ, then try 'LC_ALL=C sort ...' to see the difference.
 http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Also, newer coreutils (the latest is 8.10) includes 'sort --debug' to
help diagnose problems like this.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]
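A minimal sketch of the difference described in the reply above, assuming a POSIX shell and a `sort` that honors `LC_ALL`. The sample lines and the helper names `sort_bytewise` and `sort_locale` are hypothetical; the report does not show the contents of khan.txt. First names after the comma are assumed here because, if punctuation were the only thing ignored, a bare "Khan," would still sort before "Khanal," (it is a prefix), so something after the comma must be taking part in the comparison.

```shell
#!/bin/sh
# Hypothetical sample lines: the real contents of khan.txt are not shown in the report.
sample='Khanal, Ram
Khan, Sara'

# Byte-wise comparison in the C locale: ',' (0x2C) is lower than 'a' (0x61),
# so "Khan," sorts before "Khanal,".
sort_bytewise() {
  LC_ALL=C sort "$@"
}

# Comparison under whatever locale the environment currently provides.
# If that locale ignores punctuation and spaces at the first comparison
# level, the lines compare roughly as "KhanSara" vs "KhanalRam", and
# "Khanal" can come first.
sort_locale() {
  sort "$@"
}

echo '--- C locale ---'
printf '%s\n' "$sample" | sort_bytewise
echo '--- current locale ---'
printf '%s\n' "$sample" | sort_locale
```

Only the C-locale result is predictable; the second result depends on which locale is installed and selected. Where `sort --debug` is available, as suggested in the reply, running `printf '%s\n' "$sample" | sort --debug` reports which sorting rules are in effect and marks the part of each line used as the key.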

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 28 Feb 2011 16:22:02 GMT)

Notification sent to Betty J Barr <barr <at> EGR.UH.EDU>:
bug acknowledged by developer. (Mon, 28 Feb 2011 16:22:02 GMT)

Message #13 received at 8124-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: "Betty J. Barr" <barr <at> uh.edu>
Cc: 8124-done <at> debbugs.gnu.org, Bryan Bales <BABales <at> uh.edu>
Subject: Re: bug#8124: sort utilty?
Date: Mon, 28 Feb 2011 09:21:10 -0700
[Message part 1 (text/plain, inline)]
[re-adding the list, for closure on this report]

On 02/28/2011 09:05 AM, Betty J. Barr wrote:
>> Because you are using a locale that regards punctuation as insignificant
>> in collation sequences.
>>
>> See this FAQ, then try 'LC_ALL=C sort ...' to see the difference.
>>
>> http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> Thank you. Setting the environment did work. Since I am at the mercy of
> our IT people in terms of versions and updates, it was just a shock when
> something I had used for years did not work.

Glad to hear it.  Yes, it is rather a shock when upgrading a system
changes the default locale to something different than it was before,
with all sorts of knock-on effects that many people are not expecting;
but it's nothing that coreutils can change other than to help teach
people about the effect of locale settings.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]
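Because a system upgrade can change the default locale, one option is to pin the locale inside the script that does the sorting. This is a minimal sketch assuming a POSIX shell. `sort_roster` is a hypothetical helper name. `-k1.1,1.14` is assumed to be the POSIX `-k` spelling of the report's obsolete `+0.0 -0.14` key. That spelling follows the conversion rule in the GNU coreutils manual, where `+a.x -b.y` with nonzero `y` becomes `-k a+1.x+1,b+1.y`.

```shell
#!/bin/sh
# Pin byte-wise collation for everything this script runs, so the result
# does not depend on the system's default locale (which an upgrade may change).
LC_ALL=C
export LC_ALL

# Hypothetical helper: sort on characters 1-14 of the first field,
# assumed equivalent to the obsolete 'sort +0.0 -0.14' used in the report.
sort_roster() {
  sort -k1.1,1.14 "$@"
}

# Example: sort_roster khan.txt > khan.out
```

The exported `LC_ALL` affects only this script and the commands it starts, not the user's login session. Setting only `LC_COLLATE=C` would narrow the change to collation. However, it has no effect if `LC_ALL` is already set in the environment, because `LC_ALL` takes precedence.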

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 29 Mar 2011 11:24:04 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.