GNU bug report logs - #17189
Sort bug #2


Package: coreutils

Reported by: Nikos Balkanas <nbalkanas <at> gmail.com>

Date: Sat, 5 Apr 2014 04:39:01 UTC

Severity: normal

Tags: notabug

Merged with 17188

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #15 received at 17189-done <at> debbugs.gnu.org (full text, mbox):

From: Nikos Balkanas <nbalkanas <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 17189-done <at> debbugs.gnu.org
Subject: Re: bug#17189: Sort bug #2
Date: Sat, 5 Apr 2014 22:19:47 +0300
On Sat, Apr 5, 2014 at 3:23 PM, Eric Blake <eblake <at> redhat.com> wrote:

> tag 17189 notabug
> forcemerge 17188 17189
> thanks
>
> On 04/04/2014 10:38 PM, Nikos Balkanas wrote:
> > What about this output?
>
> What about it?
>
> >
> > sort -k1 input > out
> >
> > 009    2919
> > 009    3107
> > 0.0     9312
> > 00a    3294
> > 00A    3389
> > 00a    3484
> > 00A    3578
> > 00a     3670
> > 00A    4142
> > 00b    4236
> > 00B    4332
> > 00b    4801
> >
> > This is no sorting. It is random garbage. Since when 00a < 00B? This
>
> Ever since the en_US.UTF-8 locale defined strcoll() to sort in
> case-insensitive dictionary order by default.
>
> > utility used to work fine in earlier distributions, until you broke it
> > down.
>
> No, earlier distributions merely defaulted to LC_ALL=C instead of
> LC_ALL=en_US.UTF-8.  This complaint is the same as your previous one,
> and the solution is the same - if you want sorting by bytes, then ensure
> that your locale is set to C rather than en_US.UTF-8.
>
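
As an illustration of the fix suggested in the quoted reply, here is a minimal sketch that forces byte-order comparison by running sort under the C locale. The helper name sort_bytes and the five sample lines (a subset of the reported output, written with single spaces) are assumptions made for this example; they are not part of the original report.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin on the first key by raw byte values,
# regardless of the locale the caller is running in.
sort_bytes() {
    LC_ALL=C sort -k1
}

# Sample lines adapted from the report (assumed subset, single-space separated).
printf '%s\n' '00b 4236' '00B 4332' '00a 3294' '00A 3389' '009 2919' | sort_bytes
# In the C locale, digits sort before uppercase letters, and uppercase
# letters sort before lowercase ones, so this prints:
#   009 2919
#   00A 3389
#   00B 4332
#   00a 3294
#   00b 4236
```

Running the same pipeline with LC_ALL set to en_US.UTF-8 instead (if that locale is installed) should interleave upper and lower case as in the reported output, since the comparison is then delegated to the locale's collation rules.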
Thank you all. As I explained in my previous mail, an update of the man
pages is essential. A change in the UI would also be desirable,
if the standards allow it. Sorry about my attitude, but I was getting
pretty desperate. Thanks for not flaming.

To make up for it, I will look into updating the man pages ;-)

A suggestion: I think that sort should sort text based on the locale of
the file, not the system. Couldn't it detect automatically from the text
whether it is dealing with UTF-8 or ISO?
If dealing with ISO, it should employ the C locale.


> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>

This bug report was last modified 11 years and 49 days ago.


