GNU bug report logs - #32603
sort bug?


Package: coreutils

Reported by: Michael Bartman <michael.bartman <at> sparkpost.com>

Date: Fri, 31 Aug 2018 16:36:01 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Report forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Fri, 31 Aug 2018 16:36:02 GMT)

Acknowledgement sent to Michael Bartman <michael.bartman <at> sparkpost.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 31 Aug 2018 16:36:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Michael Bartman <michael.bartman <at> sparkpost.com>
To: bug-coreutils <at> gnu.org
Subject: sort bug?
Date: Fri, 31 Aug 2018 11:34:26 -0400
My version of sort seems to have unpredictable behavior, based on the data
being sorted:

$ sort <foo
t
te
tec

$ sort <foo
t.co
tec.co
te.co

$ sort <foo
t.c
te.c
tec.c

$ sort <foo
t.co
tec.co
te.co

$ sort <foo
tec.o
te.o
t.o


$ sort --version
sort (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
-- 

Using flags -d, -n, -R, -r, and -i had no effect on this behavior.

*Mike Bartman*
*senior software engineer - platform*

*tel* (415)-578-5222 x492
*email *michael.bartman <at> sparkpost.com

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 31 Aug 2018 16:45:02 GMT)

Notification sent to Michael Bartman <michael.bartman <at> sparkpost.com>:
bug acknowledged by developer. (Fri, 31 Aug 2018 16:45:02 GMT)

Message #10 received at 32603-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Michael Bartman <michael.bartman <at> sparkpost.com>, 32603-done <at> debbugs.gnu.org
Subject: Re: bug#32603: sort bug?
Date: Fri, 31 Aug 2018 09:44:35 -0700
"sort --help" says:

*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.

and that's what you have run into.
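
A minimal sketch of that workaround, applied to the three lines from the report. It assumes an ASCII-based system; `sort_c` is a hypothetical helper name, not part of coreutils. Putting LC_ALL=C on the command line scopes the override to that single sort invocation rather than to the whole shell session.

```shell
# Hypothetical helper (not part of coreutils): run sort with the locale
# forced to C for this one command only.
sort_c() {
    LC_ALL=C sort "$@"
}

# The three lines from the report, in the order sort printed them.
printf 't.co\ntec.co\nte.co\n' | sort_c
# prints:
# t.co
# te.co
# tec.co
```

On such a system the C locale compares bytes, and '.' (0x2E) sorts before every lowercase letter, so "t.co" comes before "te.co", which comes before "tec.co".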




Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 31 Aug 2018 17:01:02 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Fri, 31 Aug 2018 17:01:03 GMT)

Message #15 received at 32603-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 32603-done <at> debbugs.gnu.org, eggert <at> cs.ucla.edu,
 michael.bartman <at> sparkpost.com
Subject: Re: bug#32603: sort bug?
Date: Fri, 31 Aug 2018 11:59:49 -0500
tag 32603 notabug
thanks

On 08/31/2018 11:44 AM, Paul Eggert wrote:
> "sort --help" says:
> 
> *** WARNING ***
> The locale specified by the environment affects sort order.
> Set LC_ALL=C to get the traditional sort order that uses
> native byte values.
> 
> and that's what you have run into.

To expound on Paul's answer:

> $ sort <foo
> t.co
> tec.co
> te.co

Let's run that with --debug to make it obvious:

$ printf 't.co\ntec.co\nte.co\n' | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
t.co
____
tec.co
______
te.co
_____

and realize that en_US.UTF-8 is a locale where punctuation is ignored 
when determining collation order (thus, 'tco' < 'tecco' < 'teco' once 
you strip out the ignored '.').
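
The effect can be approximated without the en_US.UTF-8 locale installed. The sketch below is a rough model, not how locale collation actually works (real collation is multi-level and locale-defined): it builds a key by deleting every '.', sorts on that key bytewise, and then prints the original lines. `sort_ignoring_dots` is a hypothetical helper introduced only for illustration.

```shell
# Hypothetical helper: sort lines as if '.' were not there.
# 1. awk prefixes each line with its key (the line minus every '.') and a tab.
# 2. sort compares only the key field, byte by byte.
# 3. cut drops the key again and keeps the original line.
sort_ignoring_dots() {
    awk '{ key = $0; gsub(/\./, "", key); print key "\t" $0 }' |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
        cut -f2-
}

printf 't.co\ntec.co\nte.co\n' | sort_ignoring_dots
# prints:
# t.co
# tec.co
# te.co
```

The keys are "tco", "tecco" and "teco", which is why "tec.co" lands between the other two lines, matching the order in the report.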

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Information forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Fri, 31 Aug 2018 17:10:02 GMT)

Message #18 received at 32603-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: R0b0t1 <r030t1 <at> gmail.com>, Eric Blake <eblake <at> redhat.com>
Cc: 32603-done <at> debbugs.gnu.org, michael.bartman <at> sparkpost.com
Subject: Re: bug#32603: sort bug?
Date: Fri, 31 Aug 2018 10:08:52 -0700
R0b0t1 wrote:
> I keep seeing these sort "bugs" pop up; they seem to be very popular. At
> any point would the default behavior be seen as needing change?

No matter what the default behavior is, it won't work for some applications, and 
"bugs" will pop up.




Information forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Fri, 31 Aug 2018 18:01:02 GMT)

Message #21 received at 32603-done <at> debbugs.gnu.org:

From: R0b0t1 <r030t1 <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 32603-done <at> debbugs.gnu.org,
 michael.bartman <at> sparkpost.com
Subject: Re: bug#32603: sort bug?
Date: Fri, 31 Aug 2018 12:05:56 -0500
On Fri, Aug 31, 2018 at 11:59 AM, Eric Blake <eblake <at> redhat.com> wrote:

> Let's run that with --debug to make it obvious:
>
> $ printf 't.co\ntec.co\nte.co\n' | sort --debug
> sort: using ‘en_US.UTF-8’ sorting rules
> t.co
> ____
> tec.co
> ______
> te.co
> _____
>
> and realize that en_US.UTF-8 is a locale where punctuation is ignored when
> determining collation order (thus, 'tco' < 'tecco' < 'teco' once you strip
> out the ignored '.').
>
>
I keep seeing these sort "bugs" pop up; they seem to be very popular. At
any point would the default behavior be seen as needing change?

I'm not sure why I'd want to ignore special characters by default, for
example...

Cheers,
    R0b0t1

Information forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Fri, 31 Aug 2018 18:43:01 GMT)

Message #24 received at 32603 <at> debbugs.gnu.org:

From: Michael Bartman <michael.bartman <at> sparkpost.com>
To: 32603 <at> debbugs.gnu.org
Subject: Thank you for the quick response and answer!
Date: Fri, 31 Aug 2018 14:41:51 -0400
While the behavior of ignoring parts of the data is unexpected and
confusing, the explanation is clear and useful, and the LC_ALL=C setting
does produce the expected results.  Thank you to all respondents.

The explanation of LC_ALL use in the "sort --help" output could perhaps be
clearer, however, to reduce the number of future "bug" reports.  Perhaps
something like this:

"The locale specified by the environment affects sort order, and some
locale specifications or defaults may ignore certain characters, such as
punctuation.  If you see unexpected sort output orderings, try setting
LC_ALL=C to get the traditional sort order that uses native byte values."

-- 

*Mike Bartman*
*senior software engineer - platform*

*tel* (415)-578-5222 x492
*email *michael.bartman <at> sparkpost.com

Information forwarded to bug-coreutils <at> gnu.org:
bug#32603; Package coreutils. (Sat, 01 Sep 2018 07:53:01 GMT)

Message #27 received at 32603 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Michael Bartman <michael.bartman <at> sparkpost.com>, 32603 <at> debbugs.gnu.org
Subject: Re: bug#32603: Thank you for the quick response and answer!
Date: Sat, 1 Sep 2018 00:52:01 -0700
Michael Bartman wrote:
> try setting
> LC_ALL=C to get the traditional sort order that uses native byte values."

LC_ALL=C is not guaranteed to do that. There is no requirement that it use 
native byte values; on the contrary, it is required to not use native byte 
values in some circumstances (e.g., z/OS EBCDIC environments).

This is a complicated area, unfortunately, and it's not something that can 
easily be condensed into a single line in a help message.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 29 Sep 2018 11:24:04 GMT)
