On Mon, Apr 7, 2014 at 3:49 PM, Eric Blake <eblake@redhat.com> wrote:
On 04/05/2014 01:19 PM, Nikos Balkanas wrote:

>>
>> No, earlier distributions merely defaulted to LC_ALL=C instead of
>> LC_ALL=en_US.UTF-8.  This complaint is the same as your previous one,
>> and the solution is the same - if you want sorting by bytes, then ensure
>> that your locale is set to C rather than en_US.UTF-8.
>>
> Thank you all. As I explained in my previous mail, an update of the man
> pages is essential. A change in the UI would also be desirable,
> if the standards allow it. Sorry about my attitude, but I was getting
> pretty desperate. Thanks for not flaming.
>
> To make it up I will look into updating the man pages ;-)

But the man page ALREADY says this:

       ***  WARNING  ***  The locale specified by the environment affects sort
       order.  Set LC_ALL=C to get the traditional sort order that uses native
       byte values.
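
To illustrate the difference the warning describes, here is a minimal Python sketch (not part of coreutils). The helper names sort_bytewise and sort_in_locale are hypothetical, and the en_US.UTF-8 locale is assumed to be installed; if it is not, the second call simply returns None.

```python
import locale

def sort_bytewise(lines):
    """Compare raw byte values, as `LC_ALL=C sort` does."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))

def sort_in_locale(lines, name):
    """Sort by the collation rules of locale `name`.

    Returns None when that locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting

words = ["b", "A", "a", "B"]
print(sort_bytewise(words))                  # ['A', 'B', 'a', 'b']
print(sort_in_locale(words, "en_US.UTF-8"))  # result depends on the system
```

Where en_US.UTF-8 is available, the second result usually interleaves upper and lower case instead of placing all capitals first, which is the behaviour the complaint was about.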

What more are you proposing?

I have already written a patch. It uses the available "-a" command-line option to
"force" traditional (ASCII) sorting. I have updated the man pages accordingly.

What is the best way to upload it?

>
> A suggestion. I think that sort should sort text based on the LOCALE of
> the file, not the system. Couldn't it detect automatically from the text,
> whether it is dealing with UTF-8 or ISO?

Unfortunately, no, this is not possible.  You're welcome to try and
write a patch to prove me wrong, but people have already had years of
experience of using environment variables as the way to tell a program
what encoding an input file uses, precisely because there is no other
obvious way of determining a file's locale.

It is possible. It has been some time since I last parsed Unicode, but if I remember correctly,
a multi-byte UTF-8 character sets bits in its bytes to specify continuation. This calls for adaptive sorting based on the input.
I think Bob already mentioned that doing a second pass over the input (the worst-case scenario)
to determine the input locale is not acceptable; adaptive sorting, however, should not need one. Unfortunately, it would take considerable
effort, and I would need to know your sorting algorithm. Since I don't, and I have a lot of work to do at the moment,
I wrote the much easier UI patch I mentioned before.
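
For reference, the continuation-bit idea can be sketched in Python as below. This is only a rough structural check under my own assumptions, not code from sort or from the patch, and the function name looks_like_utf8 is hypothetical. It does not reject overlong forms or surrogates, it cannot distinguish pure ASCII input from either encoding, and it says nothing about which collation rules should apply.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Check that every lead byte is followed by the right number of
    10xxxxxx continuation bytes (structural check only)."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: plain ASCII byte
            need = 0
        elif lead >> 5 == 0b110:      # 110xxxxx: one continuation byte
            need = 1
        elif lead >> 4 == 0b1110:     # 1110xxxx: two continuation bytes
            need = 2
        elif lead >> 3 == 0b11110:    # 11110xxx: three continuation bytes
            need = 3
        else:                         # stray continuation or invalid lead
            return False
        tail = data[i + 1 : i + 1 + need]
        if len(tail) != need or any(b >> 6 != 0b10 for b in tail):
            return False
        i += 1 + need
    return True

print(looks_like_utf8("café".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))  # False
print(looks_like_utf8(b"cafe"))                   # True (ASCII is ambiguous)
```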

I find it more elegant and easier than changing the environment. If it is acceptable, let me know how to upload it.
 
--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org