Nikos Balkanas wrote:
> Thank you all. As I explained in my previous mail, an update of the man
> pages is essential. A change in the UI would also be desirable,
> if the standards allow it. Sorry, about my attitude, but I was getting
> pretty desperate. Thanks for not flaming.
>
> To make it up I will look into updating the man pages ;-)
Hopefully you will then see the WARNING section in the man page.

    *** WARNING *** The locale specified by the environment affects
    sort order.  Set LC_ALL=C to get the traditional sort order that
    uses native byte values.
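
To illustrate what "native byte values" means, here is a minimal Python sketch. It is not how sort is implemented; the helper name c_locale_sort is made up for this mail, and it only mimics the byte-by-byte comparison that LC_ALL=C selects.

```python
def c_locale_sort(lines):
    """Order strings by their encoded byte values, as LC_ALL=C would."""
    # Hypothetical helper: compare the raw UTF-8 bytes, ignoring any locale.
    return sorted(lines, key=lambda s: s.encode("utf-8"))

words = ["banana", "Apple", "apple", "Banana"]
print(c_locale_sort(words))  # ['Apple', 'Banana', 'apple', 'banana']
```

All uppercase ASCII letters have smaller byte values than the lowercase ones, so they sort first. A non-C locale typically interleaves the two cases instead, which is the difference the warning is about.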
US-ASCII is a subset of UTF-8. Every ASCII file is also a valid UTF-8
file. That is by design. But it also means sort cannot look at the file
content and assume which sort order was intended.
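
The subset relationship is easy to check. The following is only a sketch in Python; is_valid_utf8 is a name invented here, not something from coreutils.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

ascii_text = "Now is the time.\n"
as_ascii = ascii_text.encode("ascii")
as_utf8 = ascii_text.encode("utf-8")

print(as_ascii == as_utf8)      # True: the two encodings are byte-identical
print(is_valid_utf8(as_ascii))  # True: an ASCII file is a valid UTF-8 file
```

Since the bytes are identical, nothing in a pure ASCII file says whether its author thinks of it as ASCII or as UTF-8.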
For example, one would start out with:

    Lorem ipsum dolor sit amet
    Now is the time.
    Don't look Ethel!
That file would sort one way. Then someone would change the
apostrophe to the Unicode one.

    Lorem ipsum dolor sit amet
    Now is the time.
    Don’t look Ethel!
If sort tried to detect the desired behavior automatically from the file
content, then that one change would switch the file over to dictionary
sort ordering. I think this would cause a large number of complaints. It
would be data-dependent behavior and would break a lot of things. Plus it
would require sort to make an extra pass over the input to determine this
before sorting it. Please no.
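
To make the problem concrete, here is a sketch of the kind of content sniffing being argued against. The function guess_mode and its "bytes"/"locale" labels are purely hypothetical; sort does nothing like this.

```python
def guess_mode(data: bytes) -> str:
    """Hypothetical heuristic: pure ASCII means byte order, else locale order."""
    # For pure ASCII input this scans every byte before it can answer,
    # which is the extra pass over the data.
    return "locale" if any(b >= 0x80 for b in data) else "bytes"

before = "Lorem ipsum dolor sit amet\nNow is the time.\nDon't look Ethel!\n".encode("utf-8")
# Swap the ASCII apostrophe for U+2019 RIGHT SINGLE QUOTATION MARK.
after = before.replace(b"'", "\u2019".encode("utf-8"))

print(guess_mode(before))  # bytes
print(guess_mode(after))   # locale
```

One edited character flips the mode for the whole file, and the heuristic cannot give an answer for an ASCII file until it has read all of it.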
Besides... One person's file of human language is another person's
file of raw bytes. Can't make assumptions like this.
Bob