GNU bug report logs - #23012
sort: add option to set specific locale
Reported by: John Heidemann <johnh <at> isi.edu>
Date: Mon, 14 Mar 2016 18:24:02 UTC
Severity: wishlist
Tags: wontfix
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Locale-specific sorting produces surprising results.
While locale-specific sorting is all as per POSIX, the details are
obscure and can be confusing.
(See for example this comment in the code:
/* Always output the locale in debug mode, since this
is such a common source of confusion. */
and "Sort does not sort in normal order!" at
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html
)
Locale-specific results can only be controlled by setting the LC_ALL
or LC_COLLATE environment variables. However, this approach results in
"spooky action at a distance": it is not obvious to users, and it can
be hard to control when sort is used from other programs.
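For reference, a minimal sketch of the per-command control that exists today, assuming a POSIX shell. A variable assignment placed directly before a command applies to that one invocation only, and LC_ALL takes precedence over LC_COLLATE and LANG. The sample input is illustrative and not taken from the patch.

```shell
# Illustrative input (an assumption for this sketch, not from the report).
input='b
A
a
B'

# The assignment is scoped to this single sort invocation; the calling
# shell's own locale settings are left untouched.
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)

# In the C locale, bytes are compared directly, so uppercase sorts first.
printf '%s\n' "$c_sorted"
```

This works from an interactive shell, but a program that spawns sort indirectly has to arrange the environment itself, which is the difficulty described above.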
Suggested enhancement: it should be possible to specify the locale on
the command-line, making control of this feature more accessible.
A patch at
http://www.isi.edu/~johnh/SOFTWARE/sort_locale_option_160314.patch
adds --locale=WHATEVER and -L
to accomplish this goal.
The patch is against coreutils-8.24.
Please consider it for submission to coreutils.
A test case that exhibits locale-specific oddness, with current sort:
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | LC_COLLATE=C sort
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } |LC_COLLATE=en_US.utf8 sort
And the happiness that ensues from control without environment variables:
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | ./sort --locale=C
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } | ./sort --locale=en_US.utf8
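Until such an option exists, similar behavior can be approximated with a small shell wrapper. This is a sketch only: `sort_locale` is a hypothetical helper name, not part of coreutils or of the patch, and it simply sets LC_ALL for one sort invocation. The named locale must be installed on the system (see `locale -a`); only the C locale is assumed here.

```shell
# sort_locale LOCALE [SORT-ARGS...]
# Hypothetical helper: run sort with LC_ALL set to LOCALE for this
# invocation only. Remaining arguments are passed through to sort.
sort_locale() {
    sort_locale_name=$1
    shift
    LC_ALL=$sort_locale_name sort "$@"
}

# Usage: byte-order sort regardless of the caller's environment.
printf '100.0.2\n1.0.2\n1x0.2\n' | sort_locale C
# prints:
# 1.0.2
# 100.0.2
# 1x0.2
```

Unlike a built-in option, the wrapper affects everything in sort that depends on the locale, since it overrides LC_ALL rather than collation alone.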
If the coreutils maintainers consider this patch acceptable, I will also
write a patch that updates the documentation.
(You may also want to add this option across other tools. That part is
left as an exercise to the reader. :-)
-John Heidemann
This bug report was last modified 6 years and 261 days ago.