Package: coreutils;
Reported by: Nikos Balkanas <nbalkanas <at> gmail.com>
Date: Sat, 5 Apr 2014 02:27:02 UTC
Severity: normal
Tags: notabug
Merged with 17189
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
From: Bob Proulx <bob <at> proulx.com>
To: Nikos Balkanas <nbalkanas <at> gmail.com>
Cc: 17188 <at> debbugs.gnu.org
Subject: bug#17188: Sort bugs
Date: Sat, 5 Apr 2014 14:23:29 -0600

Nikos Balkanas wrote:
> Eric Blake wrote:
> > See the FAQ:
> >
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
> >
> From that link:
> "So far there is still no fully satisfactory solution to this problem. If
> you find one then please contact me so that this information can be listed."
>
> If you are "me", then I would like to suggest that you make default
> the legacy sort behaviour, and add with -c the locale support that
> standards and non-English users ask for.

When I wrote that I did mean within the confines of continuing to
conform to the standards. :-)

> UI is still a bug, though not a code bug. And legacy UI compatibility is
> broken.

Actually no. If you were really using the legacy UI then you would be
using the legacy locale setting LC_ALL=C too. If you aren't then you
aren't using the legacy UI.

> However, I am perfectly satisfied with your fast and long
> explanation of what the status is.
> You will, however, go crazy if you respond like that to every user with a
> locale sorting issue.

I usually rant:

You don't like it and I don't like it but the-powers-that-be have
confused working with data on a computer with talking about working
with data on a computer. They have decided that the collation
ordering (sort ordering) for data should be dictionary ordering. In
dictionary ordering case is folded together and punctuation is
ignored. By having LANG set to any of the "en" locales the system is
instructed to use dictionary sort ordering. This affects almost
everything on the system that sorts. This includes commands such as
'ls' and also your shell (e.g. 'echo *') too.

> Can't you make default LOCALE=C for sorting and allow users to
> change that to the system settings using -c when they need it?

Actually no we can't. That would break the opposite side of things
where people rely upon dictionary sorting based upon their chosen
locale setting.
After all of these years that would be equally bad in the opposite way.

I am going to say "you" here but please don't take this as hostility.
It is a bad word in text email. But I am really just trying to put
down the facts of the case.

Originally the locale was C. If you go back to the C locale things
will work for you as you wish them to work. It will work as it worked
before. Agreed?

Then you changed something. You changed the locale. You in your
environment set LANG=en_US.UTF-8 (or similar equivalent). That is
when you noticed that sort doesn't work as you want it to work.

Now you might say that you personally didn't make that choice but your
system vendor did. It happened when you switched to a new machine
running a newer system or something. Okay. But you chose that system
vendor. You could choose a different system vendor. Or choose to go
back to the previous system with the previous LANG=C locale. Or
choose to configure the new system as you wish it. You are in control
of it. As pilots we have a saying, "Fly the airplane. Don't let the
airplane fly you." :-)

You could file bugs with your system vendor that they defaulted you to
LANG=en_US.UTF-8 and ask them to allow users to choose LANG=C at
install time instead. I have done this and unfortunately the response
from one vendor was "That was intentional." with the bug closed and
locked against further comment. The door slammed in my face. I am
now using a different software distribution.

> Nowadays users use other graphical tools to do sorting, sort is used
> mostly by scripts.

For you perhaps. Not for me. Not for many people. I have no idea
what the survey count would be either way but it doesn't matter. We
can't make the mistake of assuming that any one environment is more
important to the exclusion of all others.

But you see the problem isn't a change in sort. The problem is a
change in locale. Sort is behaving as it has for years and years.
What changed was the locale that most people get by default.
It used to be that users would get LANG=C. But these days most users
get LANG=en_US.UTF-8. With a dictionary collating sort order locale,
sort behaves undesirably to many of us. But to others that is exactly
what they want. And so they wrote it into the locale. Two opposing
viewpoints that, being in opposition, cannot be reconciled.

Note that this is bigger than just sort. This affects everything on
your system. It affects the shell. Try "echo *" and look at the sort
ordering. Same thing there. The shell will sort by locale sort
order. The only way to fix it is to fix it at the source of the
problem. The source is the locale collation sequence. Which is why I
always set this in my environment.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

But while that works for most western locales I have no idea how that
would interact with Chinese Big5 for example. Probably badly. So it
can't really be offered as a general solution to the problem. But if
you are using one of the set of western locales that it works for then
it does solve the problem for you.

I keep thinking that one of these days I should dig into it and create
my own locale. Something like LANG=en_US.C.UTF-8 that would define a
sane sort ordering that wouldn't require LC_COLLATE=C to fix. But
there isn't much itch to scratch there since LC_COLLATE=C does
effectively the same thing to fix the problem. For western locales
anyway and we don't usually hear from anyone else with this problem.

Bob
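
The collation override described in the message can also be applied to a single command instead of the whole environment. The following is a minimal sketch, not part of the original message: the sample data and the variable names `data` and `sorted_c` are illustrative assumptions. It uses LC_ALL rather than LC_COLLATE because LC_ALL takes precedence over both LC_COLLATE and LANG, so the result does not depend on what the caller's environment already sets.

```shell
# Hypothetical sample data mixing upper and lower case.
data='b
A
a
B'

# Force C-locale (byte value) collation for this one sort invocation only.
# The prefix assignment does not change the locale of the calling shell.
sorted_c=$(printf '%s\n' "$data" | LC_ALL=C sort)

# In the C locale all uppercase ASCII letters sort before lowercase ones,
# so this prints A, B, a, b on separate lines.
printf '%s\n' "$sorted_c"
```

Note that if LC_ALL is already set in the environment, it overrides an exported LC_COLLATE=C, so the two-line export setup above only takes effect when LC_ALL is unset.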