GNU bug report logs - #32603
sort bug?
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32603 in the body.
You can then email your comments to 32603 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Fri, 31 Aug 2018 16:36:02 GMT)
Acknowledgement sent to Michael Bartman <michael.bartman <at> sparkpost.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 31 Aug 2018 16:36:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
My version of sort seems to have unpredictable behavior, based on the data
being sorted:
$ sort <foo
t
te
tec
$ sort <foo
t.co
tec.co
te.co
$ sort <foo
t.c
te.c
tec.c
$ sort <foo
t.co
tec.co
te.co
$ sort <foo
tec.o
te.o
t.o
$ sort --version
sort (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and Paul Eggert.
--
Using flags -d, -n, -R, -r, and -i had no effect on this behavior.
*Mike Bartman*
*senior software engineer - platform*
*tel* (415)-578-5222 x492
*email *michael.bartman <at> sparkpost.com
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Fri, 31 Aug 2018 16:45:02 GMT)
Notification sent to Michael Bartman <michael.bartman <at> sparkpost.com>: bug acknowledged by developer. (Fri, 31 Aug 2018 16:45:02 GMT)
Message #10 received at 32603-done <at> debbugs.gnu.org (full text, mbox):
"sort --help" says:
*** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values.
and that's what you have run into.
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 31 Aug 2018 17:01:02 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Fri, 31 Aug 2018 17:01:03 GMT)
Message #15 received at 32603-done <at> debbugs.gnu.org (full text, mbox):
tag 32603 notabug
thanks
On 08/31/2018 11:44 AM, Paul Eggert wrote:
> "sort --help" says:
>
> *** WARNING ***
> The locale specified by the environment affects sort order.
> Set LC_ALL=C to get the traditional sort order that uses
> native byte values.
>
> and that's what you have run into.
To expound on Paul's answer:
> $ sort <foo
> t.co
> tec.co
> te.co
Let's run that with --debug to make it obvious:
$ printf 't.co\ntec.co\nte.co\n' | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
t.co
____
tec.co
______
te.co
_____
and realize that en_US.UTF-8 is a locale where punctuation is ignored
when determining collation order (thus, 'tco' < 'tecco' < 'teco' once
you strip out the ignored '.').
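
As a rough illustration of the comparison described above (an editorial sketch, not how the locale's collation is actually implemented), one can delete the '.' characters to build a key, sort on that key by byte value, and print the original lines in that order. The awk key-building step, the tab separator, and the variable names are assumptions made for this example.

```shell
# Approximate "punctuation is ignored" by sorting on a key with '.' removed.
# This only mimics the effect for this input; real locale collation is more involved.
tab=$(printf '\t')
approx=$(printf 't.co\ntec.co\nte.co\n' |
  awk '{ key = $0; gsub(/\./, "", key); print key "\t" $0 }' |
  LC_ALL=C sort -t "$tab" -k1,1 |
  cut -f2)
printf '%s\n' "$approx"
```

The keys compare as 'tco' < 'tecco' < 'teco', so the original lines come out in the same order shown in the report.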
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Fri, 31 Aug 2018 17:10:02 GMT)
Message #18 received at 32603-done <at> debbugs.gnu.org (full text, mbox):
R0b0t1 wrote:
> I keep seeing these sort "bugs" pop up, they seem to be very popular. At
> any point would the default behavior be seen as needing change?
No matter what the default behavior is, it won't work for some applications, and
"bugs" will pop up.
Information forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Fri, 31 Aug 2018 18:01:02 GMT)
Message #21 received at 32603-done <at> debbugs.gnu.org (full text, mbox):
On Fri, Aug 31, 2018 at 11:59 AM, Eric Blake <eblake <at> redhat.com> wrote:
> tag 32603 notabug
> thanks
>
>
> On 08/31/2018 11:44 AM, Paul Eggert wrote:
>
>> "sort --help" says:
>>
>> *** WARNING ***
>> The locale specified by the environment affects sort order.
>> Set LC_ALL=C to get the traditional sort order that uses
>> native byte values.
>>
>> and that's what you have run into.
>>
>
> To expound on Paul's answer:
>
> > $ sort <foo
> > t.co
> > tec.co
> > te.co
>
> Let's run that with --debug to make it obvious:
>
> $ printf 't.co\ntec.co\nte.co\n' | sort --debug
> sort: using ‘en_US.UTF-8’ sorting rules
> t.co
> ____
> tec.co
> ______
> te.co
> _____
>
> and realize that en_US.UTF-8 is a locale where punctuation is ignored when
> determining collation order (thus, 'tco' < 'tecco' < 'teco' once you strip
> out the ignored '.').
>
>
I keep seeing these sort "bugs" pop up, they seem to be very popular. At
any point would the default behavior be seen as needing change?
I'm not sure why I'd want to ignore special characters by default, for
example...
Cheers,
R0b0t1
Information forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Fri, 31 Aug 2018 18:43:01 GMT)
Message #24 received at 32603 <at> debbugs.gnu.org (full text, mbox):
While the behavior of ignoring parts of the data is unexpected and
confusing, the explanation is clear and useful, and the LC_ALL=C setting
does result in the expected results. Thank you to all respondents.
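
For reference, a minimal sketch of the workaround mentioned above, assuming an ASCII-based system. Placing LC_ALL=C in front of the command changes the locale for that one sort invocation only; the variable name is chosen for this example.

```shell
# Setting LC_ALL=C on the command line affects only this one sort invocation.
c_sorted=$(printf 't.co\ntec.co\nte.co\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# On an ASCII-based system '.' (0x2E) sorts before the letters, giving:
# t.co
# te.co
# tec.co
```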
The explanation of LC_ALL use in the "sort --help" output could perhaps be
clearer, however, to reduce the number of future "bug" reports. Perhaps
something like this:
"The locale specified by the environment affects sort order, and some
locale specifications or defaults may ignore certain characters, such as
punctuation. If you see unexpected sort output orderings, try setting
LC_ALL=C to get the traditional sort order that uses native byte values."
--
*Mike Bartman*
*senior software engineer - platform*
*tel* (415)-578-5222 x492
*email *michael.bartman <at> sparkpost.com
Information forwarded to bug-coreutils <at> gnu.org: bug#32603; Package coreutils. (Sat, 01 Sep 2018 07:53:01 GMT)
Message #27 received at 32603 <at> debbugs.gnu.org (full text, mbox):
Michael Bartman wrote:
> try setting
> LC_ALL=C to get the traditional sort order that uses native byte values."
LC_ALL=C is not guaranteed to do that. There is no requirement that it use
native byte values; on the contrary, it is required to not use native byte
values in some circumstances (e.g., z/OS EBCDIC environments).
This is a complicated area, unfortunately, and it's not something that can
easily be condensed into a single line in a help message.
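
A small sketch of what the C locale ordering looks like in practice, under the assumption of an ASCII-based system. As noted above, this is not guaranteed everywhere, so the result on other platforms (such as EBCDIC environments) may differ.

```shell
# In the C locale on an ASCII-based system, uppercase letters (0x41-0x5A)
# sort before lowercase letters (0x61-0x7A).
mixed=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$mixed"
```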
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 29 Sep 2018 11:24:04 GMT)
This bug report was last modified 6 years and 349 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.