From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 03 16:32:03 2010
Subject: RE: Problem(bug?) with basic sort command in Linux
Date: Wed, 4 Aug 2010 01:56:55 +0530
From: "George Thomas Irimben (georgeti)"

Hi,
Setting LC_ALL to C fixed the issue.
But before setting this to C, with same .cshrc file, Unix didn't give me
a problem.

Does it mean my shell script which has this sort command will not work
for others unless they set their LC_ALL?
Should I set this to C in my script itself?

Any other suggestion?

Thanks,
George

-----Original Message-----
From: George Thomas Irimben (georgeti)
Sent: Wednesday, August 04, 2010 1:33 AM
To: 'bug-coreutils@gnu.org'
Subject: Problem(bug?) with basic sort command in Linux

Hi,
I would like to report a problem(bug?)
I am facing with sort command in Linux.
Sorting of a simple text file using simple sort command is giving me
incorrect result.

Here is the problem:
Text file to sort has 3 lines

my-lnx7% cat y
abc/d,ABC
abc/,XYZ
abc/o,MNO

sort command from Linux is giving me below result(According to me, this
result is incorrect)

my-lnx7% sort y
abc/d,ABC
abc/o,MNO
abc/,XYZ

But, result expected is as below. Because "," is ahead of "d" in ASCII
table.
Same found working on Unix using same input file, same command line.

abc/,XYZ
abc/d,ABC
abc/o,MNO

Pls let me know if this is a problem in Linux or I am missing something.

Thanks,
George

From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 03 17:08:38 2010
Message-ID: <4C58856A.1030602@cs.ucla.edu>
Date: Tue, 03 Aug 2010 14:08:58 -0700
From: Paul Eggert
Organization: UCLA Computer Science Department
To: "George Thomas Irimben (georgeti)"
Subject: Re: bug#6791: Problem(bug?)
with basic sort command in Linux
Cc: 6791@debbugs.gnu.org

On 08/03/10 13:26, George Thomas Irimben (georgeti) wrote:
> Should I set this to C in my script itself?

Yes.

From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 03 17:11:05 2010
Message-ID: <4C5885C1.6080508@redhat.com>
Date: Tue, 03 Aug 2010 15:10:25 -0600
From: Eric Blake
Organization: Red Hat
To: "George Thomas Irimben (georgeti)"
Subject: Re: bug#6791: Problem(bug?)
with basic sort command in Linux
X-Enigmail-Version: 1.1.2
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="------------enig8C26A1FBD9E4EF846A110470"
Cc: 6791@debbugs.gnu.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig8C26A1FBD9E4EF846A110470
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

merge 6790 6791
tag 6790 + notabug
close 6790
thanks

On 08/03/2010 02:26 PM, George Thomas Irimben (georgeti) wrote:
> Hi,
> Setting LC_ALL to C fixed the issue.
> But before setting this to C, with same .cshrc file, Unix didn't give me
> a problem.

That just means that you have a different default locale on your Unix
box than you do on the box where you encountered the difference.

>
> Does it mean my shell script which has this sort command will not work
> for others unless they set their LC_ALL?
> Should I set this to C in my script itself?

In general, it is a good idea, if you don't want locale differences to
impact the behavior of your script.
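As a minimal sketch of what setting it in the script itself could look like, assuming a POSIX shell: the three sample lines come from the original report, while the variable name `sorted` is only illustrative. Prefixing the single `sort` call with `LC_ALL=C` leaves the rest of the script in the caller's locale.

```shell
#!/bin/sh
# Sketch: force C/POSIX (byte-value) collation for one sort invocation,
# regardless of the LANG / LC_* settings inherited from the caller.
# The input lines are the sample data from the original report.
sorted=$(printf '%s\n' 'abc/d,ABC' 'abc/,XYZ' 'abc/o,MNO' | LC_ALL=C sort)

# In the C locale "," (0x2C) sorts before "d" (0x64), so abc/,XYZ comes first.
printf '%s\n' "$sorted"
```

Putting `LC_ALL=C; export LC_ALL` near the top of the script instead would apply the same collation to every command it runs, at the cost of also switching messages and character classification to the C locale.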
This is not a bug in sort, and it is a FAQ:
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

--------------enig8C26A1FBD9E4EF846A110470
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBCAAGBQJMWIXBAAoJEKeha0olJ0NqjEYH/R8SKqAmDCUGjtGCqjZsbD2r
9uM8GvOX+VPLArTxHjrW8CtPp1BglzJ6may5NheV355oTvXO5kELk02dUcnZ6SiR
lFozsn5ERXBf8vZkyWgNeBeklrOcHuE59lrWX81s0A4GfIVf2IH2jHkh0an0FJq9
APK+f/Zhfp3ki3FqSAB0fYUOGFhz8qUj5O5Z7zFAFT7pVFh/F4TxaCCk2JV2gZsw
gsuf2HuzUJNCA/8aFdrWFJQVWgBDWTJQLXWF4gzKKly+9y4Ye992oX0lBtOpIICB
JI5anQ5ygZeRes/NjrWgsD5f/wF/cyA2knfYp/k+dVJEAi964u+CJqme2O1hLNs=
=rh5c
-----END PGP SIGNATURE-----

--------------enig8C26A1FBD9E4EF846A110470--

From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 03 17:38:48 2010
Date: Tue, 3 Aug 2010 15:39:10 -0600
From: Bob Proulx
To: "George Thomas Irimben (georgeti)"
Subject: Re: bug#6791:
Problem(bug?) with basic sort command in Linux
Message-ID: <20100803213910.GD8911@dementia.proulx.com>
Cc: 6791@debbugs.gnu.org

forcemerge 6790 6791
tags 6790 + moreinfo
retitle 6790 locale sort ordering confusion
thanks

George Thomas Irimben (georgeti) wrote:
> Setting LC_ALL to C fixed the issue.

Yes.  This is a well known behavior.

Thank you for the report anyway.  However what you are seeing is
intended behavior.  It isn't something sort has control over.  The
character collation sequence is chosen by your specified locale.  You
can see what locale you have configured with the 'locale' command.

  $ locale

> But before setting this to C, with same .cshrc file, Unix didn't give me
> a problem.

Your system probably didn't set the locale before and now it does.  Or
you were using a different system.  Or some such.  Definitely this is
a locale change.  Now that you know what to look for I am sure you
will locate the specific thing that changed.

> Does it mean my shell script which has this sort command will not work
> for others unless they set their LC_ALL?
> Should I set this to C in my script itself?

Correct.  If your script requires a standard sort order then you will
need to ensure it yourself.  Because the environment it runs in may
default to a different locale sort ordering otherwise.

You don't like it and I don't like it but the-powers-that-be have
confused working with data on a computer with talking about working
with data on a computer.
They have decided that the collation ordering (sort ordering) for data
should be dictionary ordering.  In dictionary ordering case is folded
together and punctuation is ignored.

For example by having LANG set to any of the "en_*" locales the system
is instructed to use dictionary sort ordering.  This affects almost
everything on the system that sorts.  This includes commands such as
'ls' and also commands built into your shell (e.g. 'echo *') too.

> Any other suggestion?

Your sort order depends upon your locale.  You didn't say what your
locale was and therefore I assume that you were not aware that it had
an effect.  The documentation says:

     Unless otherwise specified, all comparisons use the character
     collating sequence specified by the `LC_COLLATE' locale.(1)

     ...

     (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
     `en_US'), then `sort' may produce output that is sorted differently
     than you're accustomed to.  In that case, set the `LC_ALL'
     environment variable to `C'.  Note that setting only `LC_COLLATE'
     has two problems.  First, it is ineffective if `LC_ALL' is also
     set.  Second, it has undefined behavior if `LC_CTYPE' (or `LANG',
     if `LC_CTYPE' is unset) is set to an incompatible value.  For
     example, you get undefined behavior if `LC_CTYPE' is `ja_JP.PCK'
     but `LC_COLLATE' is `en_US.UTF-8'.

As far as I know, which isn't as much as I would like especially in
this case, it is implemented in libc.  Therefore it would need to be
addressed with libc folks.

  http://www.gnu.org/software/libc/

But very likely the chain continues well beyond that point.

Personally I have the following in my $HOME/.bashrc file.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

That sets most of my locale to a UTF-8 one but forces sorting to be
standard C/POSIX.  This probably won't work in the general case since
I have no idea how that would interact with all character sets.

You may want to look at the FAQ.
  http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

Bob

From unknown Sat Sep 06 00:10:49 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 01 Sep 2010 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator