#17189 - Sort bug #2 - GNU bug report logs

GNU bug report logs - #17189
Sort bug #2

Reported by: Nikos Balkanas <nbalkanas <at> gmail.com>

Date: Sat, 5 Apr 2014 04:39:01 UTC

Severity: normal

Tags: notabug

Merged with 17188

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

Message #27 received at 17189-done <at> debbugs.gnu.org:

From: Nikos Balkanas <nbalkanas <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 17189-done <at> debbugs.gnu.org
Subject: Re: bug#17189: Sort bug #2
Date: Mon, 7 Apr 2014 21:11:13 +0300


On Mon, Apr 7, 2014 at 3:49 PM, Eric Blake <eblake <at> redhat.com> wrote:
> On 04/05/2014 01:19 PM, Nikos Balkanas wrote:
> >>
> >> No, earlier distributions merely defaulted to LC_ALL=C instead of
> >> LC_ALL=en_US.UTF-8. This complaint is the same as your previous one,
> >> and the solution is the same - if you want sorting by bytes, then ensure
> >> that your locale is set to C rather than en_US.UTF-8.
> >>
> > Thank you all. As I explained in my previous mail, an update of the man
> > pages is essential. A change in the UI would also be desirable,
> > if the standards allow it. Sorry about my attitude, but I was getting
> > pretty desperate. Thanks for not flaming.
> >
> > To make it up I will look into updating the man pages ;-)
>
> But the man page ALREADY says this:
>
>     *** WARNING *** The locale specified by the environment affects sort
>     order. Set LC_ALL=C to get the traditional sort order that uses native
>     byte values.
>
> What more are you proposing?
>

I have already written a patch. It uses the available "-a" command-line option to "force" traditional (ASCII) sorting. I have updated the man pages accordingly. What is the best way to upload it?

> > A suggestion. I think that sort should sort text based on the LOCALE of
> > the file, not the system. Couldn't it detect automatically from the text
> > whether it is dealing with UTF-8 or ISO?
>
> Unfortunately, no, this is not possible. You're welcome to try and
> write a patch to prove me wrong, but people have already had years of
> experience of using environment variables as the way to tell a program
> what encoding an input file uses, precisely because there is no other
> obvious way of determining a file's locale.
>

It is possible. It's been some time since I was parsing Unicode, but if I remember correctly, a UTF-8 character sets bits in its bytes to mark continuation. This calls for adaptive sorting based on input.
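The continuation bits Nikos recalls are part of how UTF-8 is defined: the lead byte of a multi-byte sequence starts with 110, 1110 or 11110, and every continuation byte starts with 10. A minimal Python sketch of that check follows; the function name looks_like_utf8 is hypothetical, and it is only a structural test, not a full validator (overlong forms and surrogates are accepted). It also shows the limits Eric points to: a pass only means the bytes could be UTF-8, a pure-ASCII or unlucky ISO-8859-1 file passes as well, and nothing in the check says which language's collation rules should apply.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Structural UTF-8 check based on the high bits of each byte.

    Hypothetical helper, a sketch only: each lead byte announces how many
    continuation bytes (10xxxxxx) follow, and all of them must be present.
    Overlong encodings and surrogates are NOT rejected.
    """
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                  # 0xxxxxxx: plain ASCII
            extra = 0
        elif (b & 0xE0) == 0xC0:      # 110xxxxx: lead of a 2-byte sequence
            extra = 1
        elif (b & 0xF0) == 0xE0:      # 1110xxxx: lead of a 3-byte sequence
            extra = 2
        elif (b & 0xF8) == 0xF0:      # 11110xxx: lead of a 4-byte sequence
            extra = 3
        else:                         # stray continuation byte or invalid lead
            return False
        if i + extra >= n:            # sequence truncated at end of input
            return False
        for k in range(1, extra + 1):
            if (data[i + k] & 0xC0) != 0x80:  # continuation must be 10xxxxxx
                return False
        i += extra + 1
    return True


utf8 = "café".encode("utf-8")           # b'caf\xc3\xa9'
latin1 = "café".encode("iso-8859-1")    # b'caf\xe9'
print(looks_like_utf8(utf8))
print(looks_like_utf8(latin1))
```

Running it prints True for the UTF-8 bytes and False for the ISO-8859-1 bytes of the same word.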
I think Bob already mentioned that it is not acceptable to do a second pass over the input (the worst-case scenario) to determine the input locale; however, adaptive sorting should not need one. Unfortunately, it would take considerable effort, and I would need to know your sorting algorithm. Since I don't, and I have a lot of work at the moment, I wrote the much easier UI patch I mentioned before. I find it more elegant and easier than changing the environment. If it is acceptable, let me know how to upload it.

> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
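The "traditional sort order that uses native byte values" quoted from the man page, and the "traditional (ascii) sorting" the proposed -a patch would force, both come down to comparing raw bytes with no collation rules. The sketch below, in Python for brevity, mimics that ordering; the helper c_locale_sort is hypothetical, and it is not how GNU sort is implemented (it ignores keys, options and everything else sort does).

```python
def c_locale_sort(data: bytes) -> bytes:
    """Sort newline-terminated lines by raw byte value.

    Hypothetical helper approximating the ordering sort(1) uses when
    LC_ALL=C: bytes objects compare lexicographically by unsigned byte
    value, and a line that is a prefix of another sorts first.
    No locale or collation rules are consulted.
    """
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after a final newline
    # Like sort(1), terminate every output line with a newline.
    return b"".join(line + b"\n" for line in sorted(lines))


text = "banana\nApple\néclair\napple\nBanana\n"
print(c_locale_sort(text.encode("utf-8")).decode("utf-8"), end="")
```

Upper-case words come before all lower-case ones, and the accented word lands last because 0xC3 (the first byte of "é" in UTF-8) is larger than any ASCII byte. Under a locale such as en_US.UTF-8, sort applies that locale's collation rules instead, which typically interleave upper- and lower-case words; that difference is the behaviour discussed in this thread.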



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
