GNU bug report logs - #9580
sort 8.5 bug?

Package: coreutils;

Reported by: Sean Sun <sean.x.sun <at> gmail.com>

Date: Thu, 22 Sep 2011 21:46:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9580 in the body.
You can then email your comments to 9580 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#9580; Package coreutils. (Thu, 22 Sep 2011 21:46:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Sean Sun <sean.x.sun <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 22 Sep 2011 21:46:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sean Sun <sean.x.sun <at> gmail.com>
To: Bug-coreutils <at> gnu.org
Subject: sort 8.5 bug?
Date: Thu, 22 Sep 2011 13:55:05 -0700 (PDT)
#########################################################
Ubuntu 11.04
2.6.38-11-generic-pae

sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

##########################################################
I created two testing files: File_A and File_B.

cat File_A
.
BAD.

sort File_A
.
BAD.

cat File_B
.s
BAD.s

sort File_B
BAD.s
.s

So basically, appending a letter after '.' reverses the sort order.
That doesn't look quite right. Is there an explanation for this behavior?
I've tried the same on a Mac, and their sort (5.93) works just fine.

I've also tried set LC_ALL='C', just in case it's a funky locale problem,
but it didn't make a difference.

-- 
View this message in context: http://old.nabble.com/sort-8.5-bug--tp32503840p32503840.html
Sent from the Gnu - Coreutils - Discuss mailing list archive at Nabble.com.





Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Thu, 22 Sep 2011 22:03:01 GMT) Full text and rfc822 format available.

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Thu, 22 Sep 2011 22:03:02 GMT) Full text and rfc822 format available.

Notification sent to Sean Sun <sean.x.sun <at> gmail.com>:
bug acknowledged by developer. (Thu, 22 Sep 2011 22:03:02 GMT) Full text and rfc822 format available.

Message #12 received at 9580-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Sean Sun <sean.x.sun <at> gmail.com>
Cc: 9580-done <at> debbugs.gnu.org
Subject: Re: bug#9580: sort 8.5 bug?
Date: Thu, 22 Sep 2011 16:01:44 -0600
tag 9580 notabug
thanks

On 09/22/2011 02:55 PM, Sean Sun wrote:
> So basically, appending a letter after '.' reverses the sort order.
> That doesn't look quite right. Is there an explanation for this behavior?
> I've tried the same on a Mac, and their sort (5.93) works just fine.

Thanks for the report, but this is not a bug in sort.  Both versions
that you tried (8.5 and 5.93) sort in the same way; the difference is
in your choice of locale, and you are hitting this FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Newer coreutils added a --debug option to help you learn why the bug is 
in your expectations and not in sort (8.13 is current, but --debug has 
been present since 8.6).  So let's use it:

$ printf '.\nBAD.\n.s\nBAD.s\n' | sort --debug
sort: using `en_US.UTF-8' sorting rules
.
_
BAD.
____
BAD.s
_____
.s
__

$ printf '.\nBAD.\n.s\nBAD.s\n' | LC_ALL=C sort --debug
sort: using simple byte comparison
.
_
.s
__
BAD.
____
BAD.s
_____


Remember, the en_US.UTF-8 locale uses dictionary order collation, which 
treats punctuation as insignificant, and blends case.  That is, 's' and 
'.s' collate as the same string, and '.s' is larger than 'BAD.' since 
's' comes later in the alphabet than 'B'.
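
As a rough illustration of that rule, here is a minimal sketch that imitates
it by hand.  It assumes a crude model of dictionary order: lowercase each
line, drop everything that is not an ASCII letter or digit, and sort on what
is left.  This is not how the C library implements en_US.UTF-8 collation;
the variable name approx and the awk key construction are made up for the
example.

```shell
# Crude stand-in for dictionary-order collation (an assumption, not glibc's algorithm):
# build a key per line by folding case and dropping non-alphanumerics,
# sort by that key with plain byte comparison, then discard the key.
approx=$(
  printf '.\nBAD.\n.s\nBAD.s\n' |
    awk '{ key = tolower($0); gsub(/[^a-z0-9]/, "", key); print key "\t" $0 }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
    cut -f2-
)
printf '%s\n' "$approx"
```

The keys come out as the empty string, "bad", "s" and "bads", so the lines
are printed as ., BAD., BAD.s, .s, which is the same order the
en_US.UTF-8 run above produced.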

On the other hand, the C locale uses ASCII ordering, where every byte is 
significant, and '.' sorts before 'B'.
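
A short sketch of the byte-order case, assuming a POSIX shell.  The
assignment LC_ALL=C applies only to the sort process, and the variable name
c_sorted is just for the example.

```shell
# Sort the four sample lines from the report with plain byte comparison.
# LC_ALL=C is placed in the environment of this one sort command only.
c_sorted=$(printf '.\nBAD.\n.s\nBAD.s\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# The same input under whatever locale the session currently uses.
# The order here depends on the active locale, so none is assumed.
printf '.\nBAD.\n.s\nBAD.s\n' | sort
```

Because '.' is byte 0x2E and 'B' is byte 0x42, the first command lists both
lines that start with '.' ahead of both lines that start with 'B'.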

>
> I've also tried set LC_ALL='C', just in case it's a funky locale problem,
> but it didn't make a difference.

Are you sure you used the correct syntax?  The way you wrote it, it 
looks like you tried:

$ set LC_ALL='C'

But that is neither sh (export LC_ALL=C) nor csh (setenv LC_ALL C)
syntax.  Your problem is entirely explained by locale, and would
indeed be "solved" if you had actually set LC_ALL=C as you meant to do.
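
To make the difference concrete, here is a small sketch for a POSIX sh.  It
runs the problematic form in a subshell so the surrounding session is not
changed; the variable names after_set and child_sees are made up for the
example.

```shell
# In sh, `set` followed by an operand replaces the positional parameters;
# it does not assign or export a variable.  Run it in a subshell with
# LC_ALL cleared first, and report what LC_ALL and $1 look like afterwards.
after_set=$(unset LC_ALL; set LC_ALL='C'; printf '%s|%s' "${LC_ALL-unset}" "$1")
printf '%s\n' "$after_set"

# A per-command assignment does reach the child process's environment.
child_sees=$(LC_ALL=C sh -c 'printf %s "$LC_ALL"')
printf '%s\n' "$child_sees"

# Either of these forms affects sort:
#   LC_ALL=C sort File_B        (this one command only)
#   export LC_ALL=C             (every later command in this shell)
```

The first command shows LC_ALL still unset while $1 holds the literal text
LC_ALL=C, so a later sort would keep using the original locale.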

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 21 Oct 2011 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 245 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.