GNU bug report logs - #6791
locale sort ordering confusion


Package: coreutils;

Reported by: "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>

Date: Tue, 3 Aug 2010 20:33:01 UTC

Severity: normal

Tags: moreinfo, notabug

Merged with 6790

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.




Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6791; Package coreutils. (Tue, 03 Aug 2010 20:33:01 GMT)

Acknowledgement sent to "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 03 Aug 2010 20:33:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>
To: <bug-coreutils <at> gnu.org>
Subject: RE: Problem(bug?) with basic sort command in Linux
Date: Wed, 4 Aug 2010 01:56:55 +0530

Hi,
Setting LC_ALL to C fixed the issue. 
But before setting this to C, with same .cshrc file, Unix didn't give me
a problem.

Does it mean my shell script which has this sort command will not work
for others unless they set their LC_ALL?
Should I set this to C in my script itself? 

Any other suggestion?

Thanks, George

-----Original Message-----
From: George Thomas Irimben (georgeti) 
Sent: Wednesday, August 04, 2010 1:33 AM
To: 'bug-coreutils <at> gnu.org'
Subject: Problem(bug?) with basic sort command in Linux

Hi,
I would like to report a problem (bug?) I am facing with the sort command in
Linux.

Sorting a simple text file with a plain sort command is giving me an
incorrect result.

Here is the problem:

The text file to sort has 3 lines:
my-lnx7% cat y
abc/d,ABC
abc/,XYZ
abc/o,MNO

The sort command on Linux gives me the result below (which, as far as I can
tell, is incorrect):

my-lnx7% sort y
abc/d,ABC
abc/o,MNO
abc/,XYZ

But the expected result is as below, because "," is ahead of "d" in the
ASCII table.
The same input file and the same command line give this result on Unix.

abc/,XYZ
abc/d,ABC
abc/o,MNO


Please let me know whether this is a problem in Linux or I am missing something.


Thanks, George
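
The expectation in the report follows from raw byte values: in ASCII, "," is 44 and "d" is 100, so a byte-wise comparison places abc/,XYZ first. A minimal sketch of that check, assuming a POSIX shell and sort; the function names byte_value and sort_in_c_locale are hypothetical, chosen only for illustration:

```shell
#!/bin/sh
# Print the numeric code of a single character. POSIX printf treats a
# numeric argument that starts with a quote as "the value of the next
# character".
byte_value() {
    printf '%d\n' "'$1"
}

# Sort the three lines from the report with the C locale forced for this
# one command only, so lines are compared by byte value.
sort_in_c_locale() {
    printf '%s\n' 'abc/d,ABC' 'abc/,XYZ' 'abc/o,MNO' | LC_ALL=C sort
}

byte_value ','
byte_value 'd'
sort_in_c_locale
```

With the C locale forced, the output order should match the "expected" listing in the report, whatever the caller's default locale is.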




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6791; Package coreutils. (Tue, 03 Aug 2010 21:09:02 GMT)

Message #8 received at 6791 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>
Cc: 6791 <at> debbugs.gnu.org
Subject: Re: bug#6791: Problem(bug?) with basic sort command in Linux
Date: Tue, 03 Aug 2010 14:08:58 -0700

On 08/03/10 13:26, George Thomas Irimben (georgeti) wrote:
> Should I set this to C in my script itself? 

Yes.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6791; Package coreutils. (Tue, 03 Aug 2010 21:12:02 GMT)

Message #11 received at 6791 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>
Cc: 6791 <at> debbugs.gnu.org
Subject: Re: bug#6791: Problem(bug?) with basic sort command in Linux
Date: Tue, 03 Aug 2010 15:10:25 -0600
merge 6790 6791
tag 6790 + notabug
close 6790
thanks

On 08/03/2010 02:26 PM, George Thomas Irimben (georgeti) wrote:
> Hi,
> Setting LC_ALL to C fixed the issue. 
> But before setting this to C, with same .cshrc file, Unix didn't give me
> a problem.

That just means that you have a different default locale on your Unix
box than you do on the box where you encountered the difference.

> 
> Does it mean my shell script which has this sort command will not work
> for others unless they set their LC_ALL?
> Should I set this to C in my script itself? 

In general, it is a good idea, if you don't want locale differences to
impact the behavior of your script.

This is not a bug in sort, and it is a FAQ:
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


Merged 6790 6791. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:12:02 GMT)

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:12:02 GMT)

bug closed, send any further explanations to "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com> Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:12:02 GMT)

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6791; Package coreutils. (Tue, 03 Aug 2010 21:39:02 GMT)

Message #20 received at 6791 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: "George Thomas Irimben (georgeti)" <georgeti <at> cisco.com>
Cc: 6791 <at> debbugs.gnu.org
Subject: Re: bug#6791: Problem(bug?) with basic sort command in Linux
Date: Tue, 3 Aug 2010 15:39:10 -0600

forcemerge 6790 6791
tags 6790 + moreinfo
retitle 6790 locale sort ordering confusion
thanks

George Thomas Irimben (georgeti) wrote:
> Setting LC_ALL to C fixed the issue. 

Yes.  This is well-known behavior.  Thank you for the report anyway.
However, what you are seeing is intended behavior.  It isn't something
sort has control over.  The character collation sequence is chosen by
your specified locale.  You can see what locale you have configured
with the 'locale' command.

  $ locale

> But before setting this to C, with same .cshrc file, Unix didn't give me
> a problem.

Your system probably didn't set the locale before and now it does.  Or
you were using a different system.  Or some such.  Definitely this is
a locale change.  Now that you know what to look for I am sure you
will locate the specific thing that changed.

> Does it mean my shell script which has this sort command will not work
> for others unless they set their LC_ALL?
> Should I set this to C in my script itself? 

Correct.  If your script requires a standard sort order then you will
need to ensure it yourself, because the environment it runs in may
otherwise default to a different locale sort ordering.
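
One way to ensure this, sketched under the assumption of a POSIX shell script (the input lines are the ones from the original report, and the variable name sorted is arbitrary):

```shell
#!/bin/sh
# Pin the locale for everything this script runs, regardless of what the
# caller's environment says. LC_ALL overrides LANG and all other LC_*
# variables.
LC_ALL=C
export LC_ALL

# Every later command, including sort, now compares by byte value.
sorted=$(printf '%s\n' 'abc/d,ABC' 'abc/,XYZ' 'abc/o,MNO' | sort)
printf '%s\n' "$sorted"
```

Setting the variable once at the top affects every later command in the script. Prefixing a single command instead (LC_ALL=C sort ...) limits the change to that one invocation.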

<rant> You don't like it and I don't like it but the-powers-that-be
have confused working with data on a computer with talking about
working with data on a computer.  They have decided that the collation
ordering (sort ordering) for data should be dictionary ordering.  In
dictionary ordering case is folded together and punctuation is
ignored.  For example by having LANG set to any of the "en_*" locales
the system is instructed to use dictionary sort ordering.  This
affects almost everything on the system that sorts.  This includes
commands such as 'ls' and also commands built into your shell
(e.g. 'echo *'). </rant>
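
The effect described in the rant can be approximated even where no en_* locale is installed. This is only a rough sketch and not what libc actually does. In the C locale, sort's -d option compares only blanks and alphanumerics and -f folds lowercase to uppercase, which together mimic "punctuation ignored, case folded" for the input from the report. The function name dictionary_like_sort is hypothetical.

```shell
#!/bin/sh
# Approximate dictionary ordering in the C locale:
#   -d  consider only blanks and alphanumeric characters
#   -f  fold lowercase to uppercase before comparing
# This is an illustration only; real locale collation is table-driven.
dictionary_like_sort() {
    printf '%s\n' 'abc/d,ABC' 'abc/,XYZ' 'abc/o,MNO' | LC_ALL=C sort -d -f
}

dictionary_like_sort
```

With "/" and "," skipped, the effective keys become abcdABC, abcXYZ, and abcoMNO, so the line with "d" sorts first and the line with "X" last, the same order the reporter saw on Linux.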

> Any other suggestion?

Your sort order depends upon your locale.  You didn't say what your
locale was and therefore I assume that you were not aware that it
had an effect.

The documentation says:

     Unless otherwise specified, all comparisons use the character
  collating sequence specified by the `LC_COLLATE' locale.(1)
  ...
     (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
  `en_US'), then `sort' may produce output that is sorted differently
  than you're accustomed to.  In that case, set the `LC_ALL'
  environment variable to `C'.  Note that setting only `LC_COLLATE'
  has two problems.  First, it is ineffective if `LC_ALL' is also set.
  Second, it has undefined behavior if `LC_CTYPE' (or `LANG', if
  `LC_CTYPE' is unset) is set to an incompatible value.  For example,
  you get undefined behavior if `LC_CTYPE' is `ja_JP.PCK' but
  `LC_COLLATE' is `en_US.UTF-8'.
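
The precedence mentioned in the footnote can be checked directly. A small sketch, assuming a POSIX shell and sort; the variable names are arbitrary, and en_US.UTF-8 need not be installed because LC_ALL overrides it:

```shell
#!/bin/sh
# LC_ALL is set, so the LC_COLLATE value on the same command line is
# ignored and sort compares bytes (uppercase sorts before lowercase).
with_lc_all=$(printf '%s\n' b A a B | LC_ALL=C LC_COLLATE=en_US.UTF-8 sort)

# With LC_ALL empty (treated the same as unset), LC_COLLATE=C takes
# effect. The C collation is assumed compatible with the inherited LC_CTYPE.
with_lc_collate=$(printf '%s\n' b A a B | LC_ALL= LC_COLLATE=C sort)

printf '%s\n' "$with_lc_all"
printf '%s\n' "$with_lc_collate"
```

Both commands should produce plain byte order, the first because LC_ALL wins and the second because nothing overrides LC_COLLATE.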

As far as I know, which isn't as much as I would like especially in
this case, the collation is implemented in libc.  Therefore any change
would need to be taken up with the libc folks.

  http://www.gnu.org/software/libc/

But very likely the chain continues well beyond that point.

Personally I have the following in my $HOME/.bashrc file.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

That sets most of my locale to a UTF-8 one but forces sorting to be
standard C/POSIX.  This probably won't work in the general case since
I have no idea how that would interact with all character sets.

You may want to look at the FAQ.

  http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Bob




Forcibly Merged 6790 6791. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:39:02 GMT)

Added tag(s) moreinfo. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:39:02 GMT)

Changed bug title to 'locale sort ordering confusion' from 'Problem(bug?) with basic sort command in Linux' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Tue, 03 Aug 2010 21:39:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 01 Sep 2010 11:24:04 GMT)

This bug report was last modified 15 years and 8 days ago.


