GNU bug report logs - #23012
sort: add option to set specific locale

Package: coreutils

Reported by: John Heidemann <johnh <at> isi.edu>

Date: Mon, 14 Mar 2016 18:24:02 UTC

Severity: wishlist

Tags: wontfix

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

From: John Heidemann <johnh <at> isi.edu>
To: 23012 <at> debbugs.gnu.org
Subject: bug#23012: add option to specific locale to sort
Date: Mon, 14 Mar 2016 11:02:55 -0700
Locale-specific sorting produces surprising results.
While locale-specific sorting follows POSIX, the details are
obscure and can be confusing.

(See for example this comment in the code:
      /* Always output the locale in debug mode, since this
         is such a common source of confusion.  */
and "Sort does not sort in normal order!" at
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html
)

Locale-specific results can only be controlled by setting the LC_ALL
or LC_COLLATE environment variables.  However, this approach results in
"spooky action at a distance": it is not obvious to users, and it can
be hard to control when sort is used from other programs.
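
To make the "action at a distance" concrete, here is a minimal sketch using
only the existing sort and the standard POSIX locale variables.  It assumes
nothing beyond the C locale being available; the variable names (input,
c_sorted, overridden) are made up for illustration.  It shows that LC_ALL
takes precedence over LC_COLLATE, so a value inherited from a calling
program can silently defeat a per-command LC_COLLATE setting.

```shell
# Input lines borrowed from the test case in this report.
input=$(printf '%s\n' '100.0.2' '1.0.2' '1x0.2')

# A variable assignment placed before the command applies to that one
# sort invocation only.  In the C locale, sort compares raw bytes.
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# prints:
#   1.0.2
#   100.0.2
#   1x0.2

# LC_ALL overrides LC_COLLATE.  Here LC_ALL=C stands in for a value
# inherited from a parent process: the requested en_US.utf8 collation
# is ignored and the result is still the C ordering.
overridden=$(printf '%s\n' "$input" | LC_ALL=C LC_COLLATE=en_US.utf8 sort)
printf '%s\n' "$overridden"
```

Both commands produce the same byte-order output, which is the kind of
hard-to-see interaction described above.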


Suggested enhancement: it should be possible to specify the locale on
the command-line, making control of this feature more accessible.


A patch at
http://www.isi.edu/~johnh/SOFTWARE/sort_locale_option_160314.patch
adds --locale=WHATEVER and -L
to accomplish this goal.

The patch is against coreutils-8.24.

Please consider it for submission to coreutils.


A test case that exhibits locale-specific oddness, with current sort:

{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | LC_COLLATE=C sort

{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } | LC_COLLATE=en_US.utf8 sort


And the happiness that ensues from control without environment variables:

{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | ./sort --locale=C

{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } | ./sort --locale=en_US.utf8
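
For comparison, a similar effect can be approximated with an unpatched sort
by wrapping it in a small shell function.  This is only a sketch, not part
of the patch: the name sort_in_locale is hypothetical, and it relies on the
fact that LC_ALL overrides the other locale variables for the one command
it is set on.

```shell
# Hypothetical helper: sort_in_locale LOCALE [SORT-ARGS...]
# Runs the stock sort with the collation locale given as first argument.
sort_in_locale() {
    wanted_locale=$1
    shift
    # LC_ALL is set for this single sort invocation only; the caller's
    # environment is left untouched.
    LC_ALL=$wanted_locale sort "$@"
}

# Usage, with the input from the test case above:
wrapped=$(printf '%s\n' '100.0.2' '1.0.2' '1x0.2' | sort_in_locale C)
printf '%s\n' "$wrapped"
# prints:
#   1.0.2
#   100.0.2
#   1x0.2
```

Unlike a real command-line option, this wrapper only helps where the caller
controls how sort is invoked, which is the limitation the proposed option
is meant to address.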


If the coreutils maintainers consider this patch acceptable, I will also
write a patch that updates the documentation.

(You may also want to add this option across other tools.  That part is
left as an exercise to the reader. :-)

   -John Heidemann




This bug report was last modified 6 years and 261 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.