From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: sort command in Fedora10 Resent-From: "Vito Di Blas" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 22 Apr 2010 21:45:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 6007@debbugs.gnu.org X-Debbugs-Original-To: Received: via spool by submit@debbugs.gnu.org id=B.12719726719944 (code B ref -1); Thu, 22 Apr 2010 21:45:03 +0000 Received: (at submit) by debbugs.gnu.org; 22 Apr 2010 21:44:31 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O54CF-0002aL-0E for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:44:31 -0400 Received: from mail.gnu.org ([199.232.76.166] helo=mx10.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O542P-0002W7-Tg for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:34:22 -0400 Received: from lists.gnu.org ([199.232.76.165]:52326) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1O542L-0001XV-TV for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:34:17 -0400 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1O542K-0007XM-Rt for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:17 -0400 Received: from [140.186.70.92] (port=53684 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1O542J-0007WY-Fa for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:16 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.0 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1O542H-0002Cx-SA 
for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:15 -0400 Received: from cp-out10.libero.it ([212.52.84.110]:55679) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O542H-0002BT-IN for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:13 -0400 Received: from dell7 (151.59.244.33) by cp-out10.libero.it (8.5.119) (authenticated as vito.diblas@libero.it) id 4BB3222A034A047F for bug-coreutils@gnu.org; Thu, 22 Apr 2010 23:34:06 +0200 Message-ID: From: "Vito Di Blas" Date: Thu, 22 Apr 2010 23:34:07 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0007_01CAE274.4D89A330" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5843 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4) X-Spam-Score: -3.3 (---) X-Mailman-Approved-At: Thu, 22 Apr 2010 17:44:29 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.6 (----) This is a multi-part message in MIME format. ------=_NextPart_000_0007_01CAE274.4D89A330 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

Dear Friends, in Linux Fedora10, I sort the file aaa.txt :

Cari figliozzi
Cari figlipucci
Cari figli, oggi
Cari figli, ieri
Cari figli, domani
Cari figli, pregate
Cari figlioli

with the command:

<...> sort < aaa.txt > bbb.txt

and I obtain the file bbb.txt

Cari figli, domani
Cari figli, ieri
Cari figli, oggi
Cari figlioli
Cari figliozzi
Cari figli, pregate
Cari figlipucci

which doesn't look sorted according to my expectation.
Then, in WindowsXP, I sort again the file aaa.txt with the command:

<...> sort aaa.txt > ccc.txt

and I get the file ccc.txt :

Cari figli, domani
Cari figli, ieri
Cari figli, oggi
Cari figli, pregate
Cari figlioli
Cari figliozzi
Cari figlipucci

which looks sorted as expected.
Should I use in Fedora some sort option or I met a bug?
Thanks for your attention and best regards

Vito Di Blas Ivrea Italy
vito.diblas@libero.it
------=_NextPart_000_0007_01CAE274.4D89A330 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0007_01CAE274.4D89A330-- From unknown Fri Jun 20 07:13:06 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.427 (Entity 5.427) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: "Vito Di Blas" Subject: bug#6007: closed (Re: bug#6007: sort command in Fedora10) Message-ID: References: <4BD0CD78.2050304@redhat.com> X-Gnu-PR-Message: they-closed 6007 X-Gnu-PR-Package: coreutils Reply-To: 6007@debbugs.gnu.org Date: Thu, 22 Apr 2010 22:29:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1271975342-11024-1" This is a multi-part message in MIME format... ------------=_1271975342-11024-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #6007: sort command in Fedora10 which was filed against the coreutils package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 6007@debbugs.gnu.org. 
-- 6007: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6007 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1271975342-11024-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 6007-done) by debbugs.gnu.org; 22 Apr 2010 22:28:14 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O54sY-0002rZ-E3 for submit@debbugs.gnu.org; Thu, 22 Apr 2010 18:28:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O54sW-0002rU-0j for 6007-done@debbugs.gnu.org; Thu, 22 Apr 2010 18:28:13 -0400 Received: from int-mx03.intmail.prod.int.phx2.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.16]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o3MMS6nX001552 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 22 Apr 2010 18:28:07 -0400 Received: from [10.3.252.197] (vpn-252-197.phx2.redhat.com [10.3.252.197]) by int-mx03.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id o3MMS5Ji014039; Thu, 22 Apr 2010 18:28:05 -0400 Message-ID: <4BD0CD78.2050304@redhat.com> Date: Thu, 22 Apr 2010 16:28:08 -0600 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Lightning/1.0b1 Thunderbird/3.0.4 MIME-Version: 1.0 To: Vito Di Blas Subject: Re: bug#6007: sort command in Fedora10 References: In-Reply-To: X-Enigmail-Version: 1.0.1 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enigF6570850AD7A84A8397058A3" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.16 X-Spam-Score: -10.2 (----------) X-Debbugs-Envelope-To: 6007-done Cc: 6007-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -10.2 (----------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigF6570850AD7A84A8397058A3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

On 04/22/2010 03:34 PM, Vito Di Blas wrote:
> and I obtain the file bbb.txt
>
> Cari figli, domani
> Cari figli, ieri
> Cari figli, oggi
> Cari figlioli
> Cari figliozzi
> Cari figli, pregate
> Cari figlipucci
>
>
> which doesn't look sorted according to my expectation.

Not a bug, if you are in a locale where the collating order discards
punctuation and whitespace as insignificant.

> Then, in WindowsXP, I sort again the file aaa.txt with the command:
>
> <...> sort aaa.txt > ccc.txt
>
> and I get the file ccc.txt :
>
> Cari figli, domani
> Cari figli, ieri
> Cari figli, oggi
> Cari figli, pregate
> Cari figlioli
> Cari figliozzi
> Cari figlipucci

This is due to a difference in the default locales of your two systems.

http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Try again with 'LC_ALL=C sort aaa.txt' to see the difference.
Personally, I have 'export LC_COLLATE=C' in my ~/.bashrc in order to
guarantee traditional sorting, while everything else continues to follow
my default locale.
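
The comparison suggested above can be tried with a short script. This is only a minimal sketch, assuming a POSIX shell and a 'sort' that honours the locale environment variables; the input lines are copied from the report, and the variable names (input, locale_sorted, c_sorted) are purely illustrative. The first result depends on whatever locale the environment happens to have, so no particular order is claimed for it.

```shell
#!/bin/sh
# Input lines taken from the reporter's aaa.txt.
input='Cari figliozzi
Cari figlipucci
Cari figli, oggi
Cari figli, ieri
Cari figli, domani
Cari figli, pregate
Cari figlioli'

# Sort with the collation rules of the current environment
# (the result varies with LANG / LC_COLLATE / LC_ALL).
locale_sorted=$(printf '%s\n' "$input" | sort)

# Sort with LC_ALL=C, which compares raw byte values, so the
# comma (0x2C) sorts before any lowercase letter.
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)

printf 'Current locale order:\n%s\n\n' "$locale_sorted"
printf 'C locale (byte) order:\n%s\n' "$c_sorted"
```

With LC_ALL=C the four "Cari figli," lines come out together ahead of "Cari figlioli", which is the order the reporter expected.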
--=20 Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org --------------enigF6570850AD7A84A8397058A3 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJL0M14AAoJEKeha0olJ0NqB5gH/iUMwdvONdEG2VjdpNEuHecf ShDTbLLHcorG3ivygVQw467OxrLZJlAkS8Qe1QH8wIb1wm2W08VA0HBdDSp6D2sA keJhpEAjrWaYVa7h+ZUOYH0iHCVqYhpKodE2Vpgh+/VbZSGrcNGIr7J8IHJUZUQv PX5Ou0TgVO1zqkSiTQRj6gnQyPYX2bRm+SbXhywY4LNZoK7pHdxb6hP+UPJcGUvx m9cFhHRhV7Ad4/UqWLG23QkrfO1WTWrjQEchKB91u8gnHehYvsXiu0yEAx9uHBEm CPEUXxIxoz51Nrticywl8F7q2ZGy55NIO+H3hAb7tsiwylZm+rZqzxQJvT/P2rg= =5WJI -----END PGP SIGNATURE----- --------------enigF6570850AD7A84A8397058A3-- ------------=_1271975342-11024-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 22 Apr 2010 21:44:31 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O54CF-0002aL-0E for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:44:31 -0400 Received: from mail.gnu.org ([199.232.76.166] helo=mx10.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O542P-0002W7-Tg for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:34:22 -0400 Received: from lists.gnu.org ([199.232.76.165]:52326) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1O542L-0001XV-TV for submit@debbugs.gnu.org; Thu, 22 Apr 2010 17:34:17 -0400 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1O542K-0007XM-Rt for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:17 -0400 Received: from [140.186.70.92] (port=53684 
helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1O542J-0007WY-Fa for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:16 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.0 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1O542H-0002Cx-SA for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:15 -0400 Received: from cp-out10.libero.it ([212.52.84.110]:55679) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O542H-0002BT-IN for bug-coreutils@gnu.org; Thu, 22 Apr 2010 17:34:13 -0400 Received: from dell7 (151.59.244.33) by cp-out10.libero.it (8.5.119) (authenticated as vito.diblas@libero.it) id 4BB3222A034A047F for bug-coreutils@gnu.org; Thu, 22 Apr 2010 23:34:06 +0200 Message-ID: From: "Vito Di Blas" To: Subject: sort command in Fedora10 Date: Thu, 22 Apr 2010 23:34:07 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0007_01CAE274.4D89A330" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5843 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4) X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Thu, 22 Apr 2010 17:44:29 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.6 (----) This is a multi-part message in MIME format. 
------=_NextPart_000_0007_01CAE274.4D89A330 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

Dear Friends, in Linux Fedora10, I sort the file aaa.txt :

Cari figliozzi
Cari figlipucci
Cari figli, oggi
Cari figli, ieri
Cari figli, domani
Cari figli, pregate
Cari figlioli

with the command:

<...> sort < aaa.txt > bbb.txt

and I obtain the file bbb.txt

Cari figli, domani
Cari figli, ieri
Cari figli, oggi
Cari figlioli
Cari figliozzi
Cari figli, pregate
Cari figlipucci

which doesn't look sorted according to my expectation.
Then, in WindowsXP, I sort again the file aaa.txt with the command:

<...> sort aaa.txt > ccc.txt

and I get the file ccc.txt :

Cari figli, domani
Cari figli, ieri
Cari figli, oggi
Cari figli, pregate
Cari figlioli
Cari figliozzi
Cari figlipucci

which looks sorted as expected.
Should I use in Fedora some sort option or I met a bug?
Thanks for your attention and best regards

Vito Di Blas Ivrea Italy
vito.diblas@libero.it
------=_NextPart_000_0007_01CAE274.4D89A330 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0007_01CAE274.4D89A330-- ------------=_1271975342-11024-1-- From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: sort command in Fedora10 Resent-From: Bob Proulx Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 22 Apr 2010 22:42:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Vito Di Blas Cc: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127197610811424 (code B ref 6007); Thu, 22 Apr 2010 22:42:02 +0000 Received: (at 6007) by debbugs.gnu.org; 22 Apr 2010 22:41:48 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O555e-0002yC-Lm for submit@debbugs.gnu.org; Thu, 22 Apr 2010 18:41:46 -0400 Received: from joseki.proulx.com ([216.17.153.58]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O555c-0002y3-Pq; Thu, 22 Apr 2010 18:41:46 -0400 Received: from dementia.proulx.com (dementia.proulx.com [192.168.230.115]) by joseki.proulx.com (Postfix) with ESMTP id BE88221363; Thu, 22 Apr 2010 16:41:39 -0600 (MDT) Received: by dementia.proulx.com (Postfix, from userid 1000) id 8AD703CC39D; Thu, 22 Apr 2010 16:41:39 -0600 (MDT) Date: Thu, 22 Apr 2010 16:41:39 -0600 From: Bob Proulx Message-ID: <20100422224139.GA8200@dementia.proulx.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) X-Spam-Score: -1.3 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) tags 6007 + moreinfo 
retitle 6007 locale sort ordering confusion
thanks

Vito Di Blas wrote:
> <...> sort < aaa.txt > bbb.txt
> Cari figli, domani
> Cari figli, ieri
> Cari figli, oggi
> Cari figlioli
> Cari figliozzi
> Cari figli, pregate
> Cari figlipucci

Thank you for the bug report. However what you are seeing is intended
behavior. It isn't something sort has control over. The character
collation sequence is chosen by your specified locale. You can see
what locale you have configured with the 'locale' command.

  $ locale

> which doesn't look sorted according to my expectation.

You don't like it and I don't like it but the-powers-that-be have
confused working with data on a computer with talking about working
with data on a computer. They have decided that the collation
ordering (sort ordering) for data should be dictionary ordering. In
dictionary ordering case is folded together and punctuation is
ignored. By having LANG set to any of the "en" locales the system is
instructed to use dictionary sort ordering.

This affects almost everything on the system that sorts. This
includes commands such as 'ls' and also your shell (e.g. 'echo *')
too.

> Should I use in Fedora some sort option or I met a bug?

Your sort order depends upon your locale. You didn't say what your
locale was and therefore I assume that you were not aware that it had
an effect.

The documentation says:

    Unless otherwise specified, all comparisons use the character
    collating sequence specified by the `LC_COLLATE' locale.(1)
    ...
    (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
    `en_US'), then `sort' may produce output that is sorted differently
    than you're accustomed to. In that case, set the `LC_ALL'
    environment variable to `C'. Note that setting only `LC_COLLATE'
    has two problems. First, it is ineffective if `LC_ALL' is also set.
    Second, it has undefined behavior if `LC_CTYPE' (or `LANG', if
    `LC_CTYPE' is unset) is set to an incompatible value. For example,
    you get undefined behavior if `LC_CTYPE' is `ja_JP.PCK' but
    `LC_COLLATE' is `en_US.UTF-8'.

Personally I have the following in my $HOME/.bashrc file.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

That sets most of my locale to a UTF-8 one but forces sorting to be
standard C/POSIX. This probably won't work in the general case since
I have no idea how that would interact with all character sets.

You may want to look at the FAQ.

  http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

> Then, in WindowsXP, I sort again the file aaa.txt with the command:
> ...
> which looks sorted as expected.

Probably that platform does not support, or is not configured for, the
same locale sets as the other host.

Bob
From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: en_US sorting is completely stupid. References: Resent-From: "Alan Curry" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 22 Apr 2010 23:09:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: moreinfo To: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127197774014094 (code B ref 6007); Thu, 22 Apr 2010 23:09:02 +0000 Received: (at 6007) by debbugs.gnu.org; 22 Apr 2010 23:09:00 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O55Vz-0003fH-Fz for submit@debbugs.gnu.org; Thu, 22 Apr 2010 19:08:59 -0400 Received: from c-98-226-122-10.hsd1.in.comcast.net ([98.226.122.10] helo=kosh.dhis.org) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1O55Vy-0003fC-Lo for 6007@debbugs.gnu.org; Thu, 22 Apr 2010 19:08:59 -0400 Received: (qmail 3237 invoked by uid 1000); 22 Apr 2010 23:08:54 -0000 Message-ID: <20100422230854.3236.qmail@kosh.dhis.org> From: "Alan Curry" 
Date: Thu, 22 Apr 2010 18:08:54 -0500 (GMT+5) In-Reply-To: <20100422224139.GA8200@dementia.proulx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 1.4 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Bob Proulx writes: > > You don't like it and I don't like it but the-powers-that-be have Who's the "power" here anyway? Who do we have to impeach? Seriously. The "en_US" locale is an unmitigated disaster. It's officially called "not a bug" every time it comes up, which seems to be once a week on this list alone, so what volume of complaints is required to tip the balance to "all right it's a damn bug let's fix it"? [...] Content analysis details: (1.4 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address [98.226.122.10 listed in dnsbl.sorbs.net] 0.9 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL [98.226.122.10 listed in zen.spamhaus.org] 0.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.5000] 0.1 RDNS_DYNAMIC Delivered to trusted network by host with dynamic-looking rDNS -0.5 AWL AWL: From: address is in the auto white-list X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 1.5 (+)
Bob Proulx writes: > > You don't like it and I don't like it but the-powers-that-be have Who's the "power" here anyway? Who do we have to impeach? Seriously. The "en_US" locale is an unmitigated disaster. It's officially called "not a bug" every time it comes up, which seems to be once a week on this list alone, so what volume of complaints is required to tip the balance to "all right it's a damn bug let's fix it"? From the name "en_US" one might guess that it represents the behavior expected by English-speaking users in or from the US. But those users have lived with computers for a generation or two. What they expect is ASCIIbetical. The only people who actually expect phone-book-style sorting are old geezers who remember what a phone book was. 
Most of them have never used a computer and never will, so why do we (and by "we" I mean whoever makes the locale rules) bend the default to accommodate them? -- Alan Curry From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: en_US sorting is completely stupid. Resent-From: Bob Proulx Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 22 Apr 2010 23:43:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: moreinfo To: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127197976614933 (code B ref 6007); Thu, 22 Apr 2010 23:43:02 +0000 Received: (at 6007) by debbugs.gnu.org; 22 Apr 2010 23:42:46 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O562f-0003so-LQ for submit@debbugs.gnu.org; Thu, 22 Apr 2010 19:42:45 -0400 Received: from joseki.proulx.com ([216.17.153.58]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O562e-0003sj-3O for 6007@debbugs.gnu.org; Thu, 22 Apr 2010 19:42:44 -0400 Received: from dementia.proulx.com (dementia.proulx.com [192.168.230.115]) by joseki.proulx.com (Postfix) with ESMTP id 96DEB21379 for <6007@debbugs.gnu.org>; Thu, 22 Apr 2010 17:42:39 -0600 (MDT) Received: by dementia.proulx.com (Postfix, from userid 1000) id 762123CC39D; Thu, 22 Apr 2010 17:42:39 -0600 (MDT) Date: Thu, 22 Apr 2010 17:42:39 -0600 From: Bob Proulx Message-ID: <20100422234239.GA16205@dementia.proulx.com> References: <20100422230854.3236.qmail@kosh.dhis.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100422230854.3236.qmail@kosh.dhis.org> User-Agent: Mutt/1.5.18 (2008-05-17) X-Spam-Score: -1.3 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: 
list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--)

Alan Curry wrote:
> Bob Proulx writes:
> > You don't like it and I don't like it but the-powers-that-be have
>
> Who's the "power" here anyway? Who do we have to impeach? Seriously. The
> "en_US" locale is an unmitigated disaster. It's officially called "not a bug"
> every time it comes up, which seems to be once a week on this list alone, so
> what volume of complaints is required to tip the balance to "all right it's a
> damn bug let's fix it"?

As far as I know, which isn't as much as I would like especially in
this case, it is implemented in libc. Therefore it would need to be
addressed with libc folks.

  http://www.gnu.org/software/libc/

But very likely the chain continues well beyond that point. If you
find out, please educate me.

> From the name "en_US" one might guess that it represents the behavior
> expected by English-speaking users in or from the US. But those users have
> lived with computers for a generation or two. What they expect is
> ASCIIbetical. The only people who actually expect phone-book-style sorting
> are old geezers who remember what a phone book was. Most of them have never
> used a computer and never will, so why do we (and by "we" I mean whoever
> makes the locale rules) bend the default to accommodate them?

It would be nice to be able to set my locale to en_US@C.UTF-8 or
en_US@POSIX.UTF-8 and get a better behaved collation sequence.

Bob
From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: en_US sorting is completely stupid. 
Resent-From: Andreas Schwab Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 23 Apr 2010 08:17:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: moreinfo To: "Alan Curry" Cc: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127201057428413 (code B ref 6007); Fri, 23 Apr 2010 08:17:02 +0000 Received: (at 6007) by debbugs.gnu.org; 23 Apr 2010 08:16:14 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O5E3a-0007OE-GH for submit@debbugs.gnu.org; Fri, 23 Apr 2010 04:16:14 -0400 Received: from mail-out.m-online.net ([212.18.0.10]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O5E3Y-0007O9-FI for 6007@debbugs.gnu.org; Fri, 23 Apr 2010 04:16:13 -0400 Received: from mail01.m-online.net (mail.m-online.net [192.168.3.149]) by mail-out.m-online.net (Postfix) with ESMTP id 69B321C005C6; Fri, 23 Apr 2010 10:16:08 +0200 (CEST) Received: from localhost (dynscan1.mnet-online.de [192.168.8.164]) by mail.m-online.net (Postfix) with ESMTP id 0CDB9902FA; Fri, 23 Apr 2010 10:16:08 +0200 (CEST) X-Virus-Scanned: amavisd-new at mnet-online.de Received: from mail.mnet-online.de ([192.168.3.149]) by localhost (dynscan1.mnet-online.de [192.168.8.164]) (amavisd-new, port 10024) with ESMTP id SthGfcArtNPj; Fri, 23 Apr 2010 10:16:05 +0200 (CEST) Received: from igel.home (ppp-88-217-104-84.dynamic.mnet-online.de [88.217.104.84]) by mail.mnet-online.de (Postfix) with ESMTP; Fri, 23 Apr 2010 10:16:05 +0200 (CEST) Received: by igel.home (Postfix, from userid 501) id 266D7CA297; Fri, 23 Apr 2010 10:16:05 +0200 (CEST) From: Andreas Schwab References: <20100422230854.3236.qmail@kosh.dhis.org> X-Yow: JAPAN is a WONDERFUL planet -- I wonder if we'll ever reach their level of COMPARATIVE SHOPPING... 
Date: Fri, 23 Apr 2010 10:16:04 +0200 In-Reply-To: <20100422230854.3236.qmail@kosh.dhis.org> (Alan Curry's message of "Thu, 22 Apr 2010 18:08:54 -0500 (GMT+5)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.96 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--)

"Alan Curry" writes:

> Who's the "power" here anyway?

You are, actually. Everyone can define locales to behave the way he
likes, see localedef(1).

> From the name "en_US" one might guess that it represents the behavior
> expected by English-speaking users in or from the US. But those users
> have lived with computers for a generation or two. What they expect is
> ASCIIbetical.

Nowadays most people don't know what ASCII is.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: en_US sorting is completely stupid. 
References: Resent-From: "Alan Curry" Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 23 Apr 2010 08:48:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: moreinfo To: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127201247529263 (code B ref 6007); Fri, 23 Apr 2010 08:48:02 +0000 Received: (at 6007) by debbugs.gnu.org; 23 Apr 2010 08:47:55 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O5EYE-0007bw-41 for submit@debbugs.gnu.org; Fri, 23 Apr 2010 04:47:54 -0400 Received: from c-98-226-122-10.hsd1.in.comcast.net ([98.226.122.10] helo=kosh.dhis.org) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1O5EYC-0007br-Bc for 6007@debbugs.gnu.org; Fri, 23 Apr 2010 04:47:53 -0400 Received: (qmail 22622 invoked by uid 1000); 23 Apr 2010 08:47:49 -0000 Message-ID: <20100423084749.22621.qmail@kosh.dhis.org> From: "Alan Curry" Date: Fri, 23 Apr 2010 03:47:49 -0500 (GMT+5) In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Andreas Schwab writes: > > "Alan Curry" writes: > > > Who's the "power" here anyway? > > You are, actually. Everyone can define locales to behave the way he > likes, see localedef(1). [...] 
Content analysis details: (1.5 points, 10.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.9 RCVD_IN_PBL            RBL: Received via a relay in Spamhaus PBL
                            [98.226.122.10 listed in zen.spamhaus.org]
 0.9 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [98.226.122.10 listed in dnsbl.sorbs.net]
 0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
                            [score: 0.5000]
 0.1 RDNS_DYNAMIC           Delivered to trusted network by host with
                            dynamic-looking rDNS
-0.4 AWL                    AWL: From: address is in the auto white-list

X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 1.6 (+)

Andreas Schwab writes:
>
> "Alan Curry" writes:
>
> > Who's the "power" here anyway?
>
> You are, actually. Everyone can define locales to behave the way he
> likes, see localedef(1).

I avoid this by not having any locales installed. But that doesn't help
all the other victims.

>
> > From the name "en_US" one might guess that it represents the behavior
> > expected by English-speaking users in or from the US. But those users
> > have lived with computers for a generation or two. What they expect is
> > ASCIIbetical.
>
> Nowadays most people don't know what ASCII is.

They may not know how to name it, but they do complain when it isn't used,
enough that it's a FAQ. People install a GNU/Linux distribution, pick
"English" from the language menu, and get a set of sorting rules that doesn't
make sense. Sorry, should have told the installer you speak "C".

"Donna Summer" just doesn't belong between "Don Adams" and "Don Pardo", and
everyone knows it.

Not a bug? Bah. Not a coreutils bug, but it's a bug. If glibc was in the same
bug tracking system with coreutils, reports like this one could be reassigned
there.

-- 
Alan Curry

From unknown Fri Jun 20 07:13:06 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6007: en_US sorting is completely stupid.
Resent-From: Bob Proulx Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 23 Apr 2010 19:08:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 6007 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: moreinfo To: 6007@debbugs.gnu.org Received: via spool by 6007-submit@debbugs.gnu.org id=B6007.127204967325789 (code B ref 6007); Fri, 23 Apr 2010 19:08:02 +0000 Received: (at 6007) by debbugs.gnu.org; 23 Apr 2010 19:07:53 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O5OEC-0006hu-VL for submit@debbugs.gnu.org; Fri, 23 Apr 2010 15:07:53 -0400 Received: from joseki.proulx.com ([216.17.153.58]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1O5OEB-0006hm-74 for 6007@debbugs.gnu.org; Fri, 23 Apr 2010 15:07:51 -0400 Received: from dementia.proulx.com (dementia.proulx.com [192.168.230.115]) by joseki.proulx.com (Postfix) with ESMTP id 65B8E21363 for <6007@debbugs.gnu.org>; Fri, 23 Apr 2010 13:07:46 -0600 (MDT) Received: by dementia.proulx.com (Postfix, from userid 1000) id 5D9DC3CC220; Fri, 23 Apr 2010 13:07:46 -0600 (MDT) Date: Fri, 23 Apr 2010 13:07:46 -0600 From: Bob Proulx Message-ID: <20100423190746.GA31855@dementia.proulx.com> References: <20100422230854.3236.qmail@kosh.dhis.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) X-Spam-Score: -2.6 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--)

Andreas Schwab wrote:
> Alan Curry writes:
> > From the name "en_US" one might guess that it represents the behavior
> > expected by English-speaking users in or from the US. But those users
> > have lived with computers for a generation or two. What they expect is
> > ASCIIbetical.
>
> Nowadays most people don't know what ASCII is.

Even fewer know about EBCDIC. Or why native host byte ordering might
differ between machines with different encodings.

Bob
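
[Editorial illustration, not part of the thread.] The two orderings argued over above can be sketched in a few lines of Python. This is only a rough model under stated assumptions: `c_locale_key` compares raw byte values, which is what "ASCIIbetical" refers to, while `ignore_punct_key` is a hypothetical stand-in that drops spaces and punctuation and ignores case before comparing. It is not glibc's actual collation algorithm, and both function names are invented for this example.

```python
# Illustration only: neither key function is glibc's real collation code.
names = ["Don Pardo", "Donna Summer", "Don Adams"]


def c_locale_key(s: str) -> bytes:
    """Byte-value ("ASCIIbetical") ordering: a space (0x20) sorts before letters."""
    return s.encode("utf-8")


def ignore_punct_key(s: str) -> str:
    """Hypothetical approximation of a comparison that skips spaces and
    punctuation and ignores case, so only letters and digits matter."""
    return "".join(ch for ch in s if ch.isalnum()).casefold()


c_order = sorted(names, key=c_locale_key)
dict_order = sorted(names, key=ignore_punct_key)

print(c_order)     # byte-value order
print(dict_order)  # order with spaces and punctuation ignored
```

Under the byte-value key, "Donna Summer" sorts after both "Don" entries because the space compares lower than "n". Under the punctuation-ignoring key it lands between "Don Adams" and "Don Pardo", the placement Alan Curry objects to. Applying the same key to the lines of aaa.txt from the original report reproduces the order reported for bbb.txt.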