GNU bug report logs - #19142
sort not working with LANG set to language_country.encoding

Package: coreutils;

Reported by: Roland Sieker <ospalh <at> gmail.com>

Date: Fri, 21 Nov 2014 16:49:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


Message #15 received at 19142 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Roland Sieker <ospalh <at> gmail.com>
Cc: 19142 <at> debbugs.gnu.org
Subject: Re: bug#19142: sort not working with LANG set to
 language_country.encoding
Date: Fri, 21 Nov 2014 22:49:41 -0700
tag 19142 notabug
close 19142
thanks

Roland Sieker wrote:
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.

Sort is definitely affected by LANG because LANG sets LC_COLLATE which
controls the collation sequence.  Different locales have different
collating sequences.  I don't like that the English locales such as my
own country's en_US.UTF-8 and others like en_GB.UTF-8 don't sort
"correctly" as far as I am concerned but I can only accept it.  Sort
order is actually a libc function and affects much more than sort.  It
also affects ls and the shell and basically everything on the system
that sorts.

> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
> a
> a
> b

Are you sure "en.UTF-8" is a valid locale?  It doesn't look like it to
me.  I think that is an invalid locale and therefore libc is falling
back to the C/POSIX locale.
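
One way to check is to look for the name in the output of "locale -a".
The following is only a sketch: the helper name locale_is_installed is
made up for illustration, and it assumes a system whose "locale -a"
spells the encoding differently from LANG (for example en_GB.utf8 for
en_GB.UTF-8), so both sides are lower-cased and stripped of hyphens
before comparing.

```shell
# Hypothetical helper: succeed if the named locale is listed by `locale -a`.
# Both the argument and the listing are normalized (lower case, no hyphens)
# so that "en_GB.UTF-8" matches a listing spelled "en_GB.utf8".
locale_is_installed() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d '-')
    locale -a 2>/dev/null | tr 'A-Z' 'a-z' | tr -d '-' | grep -qxF "$want"
}

# Report on the locale names discussed in this thread.
for name in en.UTF-8 en_GB.UTF-8; do
    if locale_is_installed "$name"; then
        echo "$name: installed"
    else
        echo "$name: not installed"
    fi
done
```

What it reports depends on which locales are generated on the machine,
so no particular output is implied here.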

> But not with LANG as language_country.encoding:
> ( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )

Here "en_GB.UTF-8" is a valid locale, and en_GB.UTF-8 uses dictionary
sort ordering.  Dictionary order folds case and ignores punctuation.

Try using the newish sort --debug option.  It will help debug problems
such as this.

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en_US.UTF-8 sort --debug
  sort: using ‘en_US.UTF-8’ sorting rules
  ...

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en.UTF-8 sort --debug
  sort: using simple byte comparison
  ...
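
If byte comparison is what you want regardless of how LANG is set, the
usual approach is to force the C locale for that one command.  A
minimal sketch, where the wrapper name bytesort is made up for
illustration:

```shell
# Hypothetical wrapper: sort by raw byte values whatever LANG/LC_* say.
# LC_ALL overrides LANG and every other LC_* variable for this command only.
bytesort() {
    LC_ALL=C sort "$@"
}

# Upper case sorts before lower case in byte order.
printf 'b\nA\na\nB\n' | bytesort
# A
# B
# a
# b
```

Setting LC_ALL on the command line affects only that invocation, so ls,
the shell, and everything else keep using the locale from LANG.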

See also the FAQ entry:

  https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Bob




This bug report was last modified 10 years and 218 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.