From unknown Fri Jul 25 06:44:14 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#19142 <19142@debbugs.gnu.org> To: bug#19142 <19142@debbugs.gnu.org> Subject: Status: sort not working with LANG set to language_country.encoding Reply-To: bug#19142 <19142@debbugs.gnu.org> Date: Fri, 25 Jul 2025 13:44:14 +0000 retitle 19142 sort not working with LANG set to language_country.encoding reassign 19142 coreutils submitter 19142 Roland Sieker severity 19142 normal tag 19142 notabug thanks From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 21 11:48:14 2014 Received: (at submit) by debbugs.gnu.org; 21 Nov 2014 16:48:14 +0000 Received: from localhost ([127.0.0.1]:41078 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XrrNV-0007yj-SG for submit@debbugs.gnu.org; Fri, 21 Nov 2014 11:48:14 -0500 Received: from eggs.gnu.org ([208.118.235.92]:44366) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XrmKo-0007Ft-KX for submit@debbugs.gnu.org; Fri, 21 Nov 2014 06:25:08 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XrmKn-00012X-Mr for submit@debbugs.gnu.org; Fri, 21 Nov 2014 06:25:06 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_40,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:38615) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XrmKn-000122-Kg for submit@debbugs.gnu.org; Fri, 21 Nov 2014 06:25:05 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46081) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XrmKj-00013X-NY for bug-coreutils@gnu.org; Fri, 21 Nov 2014 06:25:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned 
(Exim 4.71) (envelope-from ) id 1XrmKg-0000tA-At for bug-coreutils@gnu.org; Fri, 21 Nov 2014 06:25:01 -0500 Received: from mail-la0-x22d.google.com ([2a00:1450:4010:c03::22d]:61531) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XrmKg-0000sw-3J for bug-coreutils@gnu.org; Fri, 21 Nov 2014 06:24:58 -0500 Received: by mail-la0-f45.google.com with SMTP id gq15so4018421lab.4 for ; Fri, 21 Nov 2014 03:24:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Mb9PGLJX0+s48/fZI9Uusl5uaj//ZIp0jjPZOuymgPE=; b=Vx6eyJvLJHmNxNFsQE8vebq4zyhc/euaW6IcGtyMBPiR8csjGB2CpZx4YPyyt3zm8a ENbEmUYdSu68pNuySxDx5DABWprRREp4uF4rHSaDVCrak6yPSmfEXX2rSOJcrD1w75DA TZ+sYFPlA2fobar1mqGwWUrEihs3GGEprS6AITUCCNEHEeBB77uEV2wGTV9zBit+GMHN Ebv0eT0R+Yz98GTGCsu1B0zYmb5eYNpIj1LjenMrGprDzHsYnKYspIRNQ3OzOegQrvH5 0mQJ3ZUF/EyegKwWg+VzXs19LoGB4VIXRBFbAO7TlQ9l38J9wllnO827Pm3ccKdvb4gY G9ew== MIME-Version: 1.0 X-Received: by 10.152.30.70 with SMTP id q6mr3886180lah.6.1416569096416; Fri, 21 Nov 2014 03:24:56 -0800 (PST) Received: by 10.152.242.129 with HTTP; Fri, 21 Nov 2014 03:24:56 -0800 (PST) Date: Fri, 21 Nov 2014 12:24:56 +0100 Message-ID: Subject: sort not working with LANG set to language_country.encoding From: Roland Sieker To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary=089e0160b46acab62005085cb13b X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 21 Nov 2014 11:48:12 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----)

--089e0160b46acab62005085cb13b
Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Hi.

I have noticed that sort seems to have problems when the LANG environment
variable is set with language and country.

As a test case, I tried to sort

a
b
a
⺌
⺕
⺌

It sorts OK like this, with LANG just the language.encoding:
( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
a
a
b
⺌
⺌
⺕

But not with LANG as language_country.encoding:
( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
⺌
⺕
⺌
a
a
b

sort: sort (GNU coreutils) 8.21
Shell: tcsh 6.18.01 (Astron) 2012-02-14 (x86_64-unknown-linux) options
wide,nls,dl,al,kan,rh,color,filec
Fedora Linux 20

Regards,
ospalh

--089e0160b46acab62005085cb13b
Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi.

I have noticed that sort seems to have problems when the LANG environment variable is set with language and country.

As a test case, I tried to sort

a
b
a
⺌
⺕
⺌

It sorts OK like this, with LANG just the language.encoding:
( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
a
a
b
⺌
⺌
⺕

But not with LANG as language_country.encoding:
( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
⺌
⺕
⺌
a
a
b




sort: sort (GNU coreutils) 8.21
Shell: tcsh 6.18.01 (Astron) 2012-02-14 (x86_64-unknown-linux) options wide,nls,dl,al,kan,rh,color,filec
Fedora Linux 20

Regards, ospalh
--089e0160b46acab62005085cb13b-- From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 21 11:59:27 2014 Received: (at control) by debbugs.gnu.org; 21 Nov 2014 16:59:27 +0000 Received: from localhost ([127.0.0.1]:41088 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XrrYM-0008F8-Fs for submit@debbugs.gnu.org; Fri, 21 Nov 2014 11:59:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:45668) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XrrYJ-0008Ew-WE; Fri, 21 Nov 2014 11:59:24 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sALGxMaq002331 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Fri, 21 Nov 2014 11:59:22 -0500 Received: from [10.3.113.145] (ovpn-113-145.phx2.redhat.com [10.3.113.145]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sALGxL5I019695; Fri, 21 Nov 2014 11:59:21 -0500 Message-ID: <546F6F68.1040004@redhat.com> Date: Fri, 21 Nov 2014 09:59:20 -0700 From: Eric Blake Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Roland Sieker , 19142-done@debbugs.gnu.org Subject: Re: bug#19142: sort not working with LANG set to language_country.encoding References: In-Reply-To: OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="4AaMS4qbqM5m3hvqSKTPnMPRkjJILO6gG" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--4AaMS4qbqM5m3hvqSKTPnMPRkjJILO6gG
Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

tag 19142 notabug
thanks

On 11/21/2014 04:24 AM, Roland Sieker wrote:
> Hi.
>
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.
>

Thanks for the report. The whole point of locales is that each locale
is free to choose the collation sequences that make the most sense for
that locale.

> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )

[I'm translating your csh syntax into more-reliable sh syntax]
Try turning on sort debugging:

$ printf 'a\nb\na\n⺌\n⺕\n⺌' | LC_ALL=en.UTF-8 sort --debug
sort: using simple byte comparison
a
_
a
_
b
_
⺌
___
⺌
___
⺕
___

> But not with LANG as language_country.encoding:

$ printf 'a\nb\na\n⺌\n⺕\n⺌' | LC_ALL=en_GB.UTF-8 sort --debug
sort: using ‘en_GB.UTF-8’ sorting rules
⺌
__
⺕
__
⺌
__
a
_
a
_
b
_

That just means that whoever wrote the en_GB.UTF-8 locale picked a
different collation sequence for non-ascii characters than the person
that wrote the generic en.UTF-8 locale. That's not a bug in sort, so
I'm closing this as not a bug from coreutils' perspective. Feel free to
raise it as a glibc bug (the owner of locale definitions on GNU/Linux
systems) if you have a strong reason why different locales should be
more consistent on their choice of collation sequences. And feel free
to reply further to this bug with more questions or comments, even
though it has been closed.

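Since the order comes from the locale rather than from sort itself, the usual way to get the same result on every system is to pin the locale for that one command. The following is only a minimal sketch; the helper name sort_bytes is hypothetical and not part of coreutils. It assumes UTF-8 input, for which byte order and code point order coincide.

```shell
#!/bin/sh
# sort_bytes: hypothetical helper (not a coreutils command).
# Sorts stdin by raw byte value, whatever LANG/LC_* the caller has set.
# LC_ALL is set for this one sort invocation only; the caller's
# environment is left untouched.
sort_bytes() {
    LC_ALL=C sort "$@"
}

# Same test data as in the report.
printf 'a\nb\na\n⺌\n⺕\n⺌\n' | sort_bytes
```

Extra arguments are passed through to sort, so `sort_bytes -u` or `sort_bytes -r` behave as expected.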
--=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --4AaMS4qbqM5m3hvqSKTPnMPRkjJILO6gG Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUb29oAAoJEKeha0olJ0NqfbEIAJaIiY/Ek6iWaScwlCfb52J1 nJGBDeE8jRJS7Oo1Lql8/PkSjt1+VxoFNoKzgGhKnJxMmKEW6e9YtnOp2mHzsjt0 8vSZbMm8i2xVN2Ctmp3ifvQ3mFoiduldzUUvF33A0qEvO5pV/VHQShEd3QlG/hmu D9e/nIfR6LUKKBVQFhOlsaaINc5x4Ofnzimbed87D/Ed89MnIKAD01UhcVC+GGO+ x7hqAskiBGb259MZRgp+Ahy8ibYmPqcCw9EuJ4IRcprJzDxJuYUWZCzn9tOmKloP kg2Og0c7tz54RpyPUgvHBlRXu/GAnfZ1Ob037koZgP+CAJEZrMJ1QIy0RY7+mSo= =scZB -----END PGP SIGNATURE----- --4AaMS4qbqM5m3hvqSKTPnMPRkjJILO6gG-- From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 22 00:49:46 2014 Received: (at 19142) by debbugs.gnu.org; 22 Nov 2014 05:49:46 +0000 Received: from localhost ([127.0.0.1]:41381 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xs3Zq-0006fh-1T for submit@debbugs.gnu.org; Sat, 22 Nov 2014 00:49:46 -0500 Received: from joseki.proulx.com ([216.17.153.58]:46096) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Xs3Zn-0006fV-9b; Sat, 22 Nov 2014 00:49:44 -0500 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 32B1421839; Fri, 21 Nov 2014 22:49:42 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) id D82222DC35; Fri, 21 Nov 2014 22:49:41 -0700 (MST) Date: Fri, 21 Nov 2014 22:49:41 -0700 From: Bob Proulx To: Roland Sieker Subject: Re: bug#19142: sort not working with LANG set to language_country.encoding Message-ID: <20141121223838645014669@bob.proulx.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline 
Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 19142 Cc: 19142@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

tag 19142 notabug
close 19142
thanks

Roland Sieker wrote:
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.

Sort is definitely affected by LANG because LANG supplies the default
for LC_COLLATE (unless LC_COLLATE or LC_ALL overrides it), and
LC_COLLATE controls the collation sequence. Different locales have
different collating sequences. I don't like that the English locales,
such as my own country's en_US.UTF-8 and others like en_GB.UTF-8,
don't sort "correctly" as far as I am concerned, but I can only accept
it.

Sort order is actually a libc function and affects much more than
sort. It also affects ls and the shell and basically everything on
the system that sorts.

> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
> a
> a
> b

Are you sure "en.UTF-8" is a valid locale? It doesn't look like it to
me. I think that is an invalid locale and therefore libc is falling
back to the C/POSIX locale.

> But not with LANG as language_country.encoding:
> ( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )

Here "en_GB.UTF-8" is a valid locale, and en_GB.UTF-8 uses dictionary
sort ordering. Dictionary order folds case and ignores punctuation.

Try using the newish sort --debug option. It will help debug problems
such as this.

$ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en_US.UTF-8 sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
...

$ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en.UTF-8 sort --debug
sort: using simple byte comparison
...

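The two checks discussed above (which variable decides the collation locale, and whether that locale name exists at all) can be sketched in sh. This is an illustration under assumptions, not something from coreutils or glibc: both function names are hypothetical, the "C" default for a fully unset environment is assumed (POSIX leaves it implementation-defined), and the name comparison is loosened because glibc's `locale -a` commonly prints "en_GB.utf8" for "en_GB.UTF-8".

```shell
#!/bin/sh
# Both function names below are hypothetical, for illustration only.

# Print the locale name that governs collation, following the POSIX
# precedence LC_ALL > LC_COLLATE > LANG. With none of them set the
# default is implementation-defined; "C" is assumed here.
collate_locale() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Succeed if NAME is listed by `locale -a`. Both sides are lowercased
# and stripped of hyphens so that "UTF-8" and "utf8" compare equal.
locale_available() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '-')
    locale -a 2>/dev/null | tr '[:upper:]' '[:lower:]' | tr -d '-' |
        grep -xF -- "$want" >/dev/null
}

name=$(collate_locale)
if locale_available "$name"; then
    echo "collation locale: $name"
else
    echo "collation locale '$name' is not listed by locale -a"
fi
```

On a system where "en.UTF-8" is not installed, `locale_available en.UTF-8` would fail, which is consistent with sort reporting simple byte comparison for it.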
See also the FAQ entry: https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021 Bob From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 23 07:58:50 2014 Received: (at 19142) by debbugs.gnu.org; 23 Nov 2014 12:58:50 +0000 Received: from localhost ([127.0.0.1]:42248 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XsWkc-00057K-1A for submit@debbugs.gnu.org; Sun, 23 Nov 2014 07:58:50 -0500 Received: from mout.kundenserver.de ([212.227.17.24]:57403) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XsWkZ-00057A-2r for 19142@debbugs.gnu.org; Sun, 23 Nov 2014 07:58:48 -0500 Received: from [192.168.1.10] (pD957CF52.dip0.t-ipconnect.de [217.87.207.82]) by mrelayeu.kundenserver.de (node=mreue103) with ESMTP (Nemesis) id 0MSYbs-1XTCPR0kdw-00Rcu8; Sun, 23 Nov 2014 13:58:45 +0100 Message-ID: <5471DA04.3070709@bernhard-voelker.de> Date: Sun, 23 Nov 2014 13:58:44 +0100 From: Bernhard Voelker User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Roland Sieker , 19142@debbugs.gnu.org Subject: Re: bug#19142: sort not working with LANG set to language_country.encoding References: In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:iFbtoDJVvAtmnrJZji/nWuUBX/3nmaOcYAxlH7JOPCY OBrB5IJ5N1yN8C0sQZd+nu4ubjPhb15DrWJYBP/QT9gmaX3utz ZFyQbEk3DU4KCVJG8+xa7DaykuPRHU+KR0Zkrl1ZrY7Dq7dYqH 6YS71Yxx8MYBUqlYK4n5p64JR6kF2O/nrZlvt/iQZ0HcG29MIb xjGQQlQE0slE6psAuKugrwX8ZBY2y9BQErghEdhKieaXBs9dFY CQ+6ZUwguXBA9KwpB8w0kMBaSAbkzTFwB2h5iWA24KnOHdvK6G ntmiJaCFPawEg+Lboz3XjsynPF7AVeqZlWRnTusshIk3HMd9Rq IveeWIKlGV+zjk28jJQXEUt1hDSLuLsucJLchfEbJ X-UI-Out-Filterresults: notjunk:1; X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 19142 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: 
debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

On 11/21/2014 12:24 PM, Roland Sieker wrote:
> sort: sort (GNU coreutils) 8.21
> Shell: tcsh 6.18.01 (Astron) 2012-02-14 (x86_64-unknown-linux) options
> wide,nls,dl,al,kan,rh,color,filec
> Fedora Linux 20

In addition to what Bob wrote, I want to mention that the multi-byte
support is not part of the upstream sort, but is added by the
distribution, Fedora in your case.

Have a nice day,
Berny

From unknown Fri Jul 25 06:44:14 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 22 Dec 2014 12:24:05 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator