From unknown Tue Jun 17 01:49:48 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#19021 <19021@debbugs.gnu.org> To: bug#19021 <19021@debbugs.gnu.org> Subject: Status: Possible bug in sort Reply-To: bug#19021 <19021@debbugs.gnu.org> Date: Tue, 17 Jun 2025 08:49:48 +0000 retitle 19021 Possible bug in sort reassign 19021 coreutils submitter 19021 Ben Mendis severity 19021 normal tag 19021 notabug thanks From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 11 11:41:42 2014 Received: (at submit) by debbugs.gnu.org; 11 Nov 2014 16:41:42 +0000 Received: from localhost ([127.0.0.1]:57540 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoEVh-00056H-BD for submit@debbugs.gnu.org; Tue, 11 Nov 2014 11:41:41 -0500 Received: from eggs.gnu.org ([208.118.235.92]:57952) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoETO-00052h-BY for submit@debbugs.gnu.org; Tue, 11 Nov 2014 11:39:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XoETN-0005mn-Dv for submit@debbugs.gnu.org; Tue, 11 Nov 2014 11:39:18 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:45764) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XoETN-0005mf-A5 for submit@debbugs.gnu.org; Tue, 11 Nov 2014 11:39:17 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59661) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XoETL-0004jY-Vv for bug-coreutils@gnu.org; Tue, 11 Nov 2014 11:39:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XoETK-0005mC-Pf for bug-coreutils@gnu.org; Tue, 11 
Nov 2014 11:39:15 -0500 Received: from mail-qa0-x22d.google.com ([2607:f8b0:400d:c00::22d]:51192) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XoETK-0005m5-Lo for bug-coreutils@gnu.org; Tue, 11 Nov 2014 11:39:14 -0500 Received: by mail-qa0-f45.google.com with SMTP id dc16so7224913qab.32 for ; Tue, 11 Nov 2014 08:39:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=0i78fWYliAN/L9PvlhJL+CQKFaE6a9FQHvQe022ctyk=; b=yGK0VuPxjABYAe8yI0HLRMP4lv1PsN9tLy2ZnJxRr3IvgRDRLw0wxlEWeFtBKX9WlM g31YiXe1Z3mTQzdL4fY7aosjXWrLcyYxuEreuKGnNKQaKaxaDUoXPpeirzdHCnEG6FE8 Ji/l/heDalmGLnzvwP3Byk7pCUhuK92ouogkfsw9sdfLX2G97M+npohnnydAIbFJirK5 k0soa/rGUZxXwp1sWu1Uhy8QBl8Jv3BbD7R74Vvjh9dhf9lSsnPa7yVnDHvzHiLLenBK tQ0mgU7Sl/8ux+KzG0Zfttg910oCz+65G8cFHU2r3LSKTED+tUiFh+TJb4Sx7T3eqh/E kfEw== MIME-Version: 1.0 X-Received: by 10.140.23.198 with SMTP id 64mr51506135qgp.62.1415723952707; Tue, 11 Nov 2014 08:39:12 -0800 (PST) Received: by 10.229.180.2 with HTTP; Tue, 11 Nov 2014 08:39:12 -0800 (PST) Date: Tue, 11 Nov 2014 11:39:12 -0500 Message-ID: Subject: Possible bug in sort From: Ben Mendis To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary=001a11c12e6a4d1fd0050797ebc8 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Tue, 11 Nov 2014 11:41:38 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --001a11c12e6a4d1fd0050797ebc8 Content-Type: text/plain; charset=UTF-8

http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc

Data is here: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3

This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv

This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t , -k 1n

Using 'g' instead of 'n' also produces the expected results, but I'm not clear on what the difference is between 'g' and 'n'.

Tested with sort 8.21 on Slackware64-current.

--001a11c12e6a4d1fd0050797ebc8 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--001a11c12e6a4d1fd0050797ebc8-- From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 11 12:39:24 2014 Received: (at control) by debbugs.gnu.org; 11 Nov 2014 17:39:24 +0000 Received: from localhost ([127.0.0.1]:57569 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoFPX-0006W8-2R for submit@debbugs.gnu.org; Tue, 11 Nov 2014 12:39:23 -0500 Received: from mx1.redhat.com ([209.132.183.28]:49565) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoFPR-0006Vt-I8; Tue, 11 Nov 2014 12:39:18 -0500 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sABHdEcf014799 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 11 Nov 2014 12:39:15 -0500 Received: from [10.3.113.152] (ovpn-113-152.phx2.redhat.com [10.3.113.152]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sABHdDlY019425; Tue, 11 Nov 2014 12:39:13 -0500 Message-ID: <546249C1.5050906@redhat.com> Date: Tue, 11 Nov 2014 10:39:13 -0700 From: Eric Blake Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Ben Mendis , 19021-done@debbugs.gnu.org Subject: Re: bug#19021: Possible bug in sort References: In-Reply-To: OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="1pJoMwW0avdkRigVxkAq0rkcT91Rco9gK" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26 X-Spam-Score: -5.5 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.5 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --1pJoMwW0avdkRigVxkAq0rkcT91Rco9gK Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

tag 19021 notabug
thanks

On 11/11/2014 09:39 AM, Ben Mendis wrote:
> http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc
>
> Data is here: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3

Thanks for the report.  Rather than making us chase down links, why not
provide the information inline with your email?

>
> This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv

Try using the --debug option to see what is really happening.  The bug
is NOT in sort (which correctly obeyed your locale rules and incorrect
command line), but in your command line (because you didn't tell sort
where to quit parsing numbers).

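[Editorial aside, not part of the original message.] The "locale rules" in question are the locale's numeric conventions, in particular its thousands separator, which GNU sort -n accepts between digits. As a minimal sketch, assuming the locale(1) utility is available and that an en_US.UTF-8 locale is installed (both assumptions about the reader's system), the separator can be inspected like this. The helper name show_thousands_sep is hypothetical and used only for illustration.

```shell
# Hypothetical helper (not from the thread): print the thousands separator
# of the locale named in $1. `locale KEYWORD` prints that keyword's value.
show_thousands_sep() {
    LC_ALL="$1" locale thousands_sep
}

show_thousands_sep C            # prints an empty line: no separator in the C locale
show_thousands_sep en_US.UTF-8  # assumes this locale is installed on the system
```

An empty result for the C locale is why a comma ends the number there, while a locale that reports ',' lets numeric parsing run across the field delimiter.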
I'm going to distill it down to a smaller input that still expresses the
same "swapped" lines:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | sort -t, -k1n --debug
sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
1,73,67,6
_________
_________
2,68,61,7
_________
_________
1,69,55,14
__________
__________
2,71,59,12
__________
__________

See what's happening? The -k1n argument says to start parsing at field
1, but continue parsing until either the input is no longer numeric or
until the end of line is reached (even if it goes into field 2 or
beyond). Since commas are silently ignored in the en_US.UTF-8 locale
when parsing a number, sort is thus comparing the values 268617 and
1695514, and the sort was correct.

Now, try telling sort that it must parse a numeric field, but must END
the parse at the end of the first field (if not sooner due to end of
number):

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | sort -t, -k1,1n --debug
sort: using ‘en_US.UTF-8’ sorting rules
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________

Or try using a locale where ',' is NOT part of a valid number:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | LC_ALL=C sort -t, -k1n --debug
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________

>
> This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t ,
> -k 1n

Actually, you mean 'cut -d, -f 1-3' (you transposed while transferring
from the stackoverflow site to your email).  But yeah, when you truncate
to a smaller number, you are comparing different values (17367 is less
than 26861).

>
> Using 'g' instead of 'n' also produces the expected results, but I'm not
> clear on what the difference is between 'g' and 'n'.

-n is specified by POSIX as parsing integers according to the current
locale's definition.  -g is a GNU extension, which says to parse
floating point numbers.  Apparently, in the en_US.UTF-8 locale, parsing
floating point stops at the first comma, while parsing integers does not:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | sort -t, -k1g --debug
sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________

I don't know why libc chose to make strtoll() ignore commas while
strtold() does not, when not in the C locale.

But at any rate, I hope I've demonstrated that the bug was in your usage
and not in sort.  So I'm closing this bug, although you should feel free
to add further comments or questions.  You may also want to read the FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
[Hmm - we should update that FAQ to mention the --debug option]

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--1pJoMwW0avdkRigVxkAq0rkcT91Rco9gK Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg

iQEcBAEBCAAGBQJUYknBAAoJEKeha0olJ0NqxOgH/RZbBzxhdamOmbBL6958SyCe
Q0p6QUCMOAFV3t9ER5BwHJHDHnTdNPJzyr4rVKHPmZQsN3j//r7v4e0xTbMApJkK
PZM5OB/1DpBFEGvfojIycLSYpijWT6HS1yERAr+bibugmzZxG8UjBknR/O8d33o7
QTaSujFT5jCl+XHaxSss5tGEn4YyqhMtQ+Bc8YpswQgswXgxFETphEjwkKszXWT7
ju272xBVPMeBQMgs1yZslgcjIEqrmIBpvUSUOhuD2tkxbtN0lgpPkyJ0GzaCLaCU
v0LeLDj5t8J2oidQwXB1LEVk44WfJ4riCPe1lYqScyEF3BPnDOIjp7TARHzgr08=
=mjyZ
-----END PGP SIGNATURE-----

--1pJoMwW0avdkRigVxkAq0rkcT91Rco9gK--
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 11
13:28:20 2014 Received: (at 19021-done) by debbugs.gnu.org; 11 Nov 2014 18:28:20 +0000 Received: from localhost ([127.0.0.1]:57584 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoGAt-0000lF-NV for submit@debbugs.gnu.org; Tue, 11 Nov 2014 13:28:20 -0500 Received: from nm25-vm1.bullet.mail.bf1.yahoo.com ([98.139.212.155]:37703) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoGAq-0000ky-Ng for 19021-done@debbugs.gnu.org; Tue, 11 Nov 2014 13:28:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1415730495; bh=BJxf86s6NmODdqdi3ByActa/DYRR7RZj0nRhvi5wg+Q=; h=Date:From:Reply-To:To:In-Reply-To:References:Subject:From:Subject; b=eGMfYgSem+ctDWAFq1E7+ul+vnztadt4cVjoGJJgCJHkRomZQm63skzhgb/D0dBU7Ziokx1jWPTQNtd+BnNBkbaP8LJLTo1UmLODz35SHD47WB+LC8bAQCu4Kq53EDGm0DYPWvhNn8+E6ueu1DPkVOWSB27dtsUbEvcrgL0xDcxgCzQNOT9JQ1UsskXJOvCx8tqtPmtPxl2rw60uldk11U0oo63cHAqC5gguUp244Q3wjheXWmj/M3J2R6o9QdwilWJy3mOD/gMNGd52UadgEgmWUabpGc1kI8IeF5n5VEiOmAlKcdNcZjFD6+ObwAJ1zWyGXS68SQ6ZzwQyFOE6bw== DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s2048; d=yahoo.com; b=sxAjYJ0/neUSc4oGTt+cDwndqBHF2Vxpv7ER2Ue/yKB0aAIM/pBF63Duc7RNT2lku0SsoDrX2qs2mfkiK0ThcbY1vweUBOWejhuUR0fLaUKnpOfCspdo94jTeu/bxd8GWsT42qlSH4EJPRAYqiKaS/6RWvx9NM6vfr3EYxL2rg3Nc0efSSRVmlOTg8PO7vosxwiaz7Cpo87zi8Gu7irB22bsV/Uie7sQg8roSvDp84BBz80i4rB0cIiTTYK5HEk+70rHbAFcBnyKkWdoD4ZlpxfqAHg88f5DvwWLaGfeLaev7s1Y4jN4YTFZn2PRZf1/V9EIrGc6IabAyArH3whFSA==; Received: from [66.196.81.171] by nm25.bullet.mail.bf1.yahoo.com with NNFMP; 11 Nov 2014 18:28:15 -0000 Received: from [98.139.215.229] by tm17.bullet.mail.bf1.yahoo.com with NNFMP; 11 Nov 2014 18:28:15 -0000 Received: from [127.0.0.1] by omp1069.mail.bf1.yahoo.com with NNFMP; 11 Nov 2014 18:28:15 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 697628.77088.bm@omp1069.mail.bf1.yahoo.com X-YMail-OSG: Abrp5bMVM1nKVjh12jXn58zhacJs.dqEoK77fopzyBk81_u_GmXBaBCGw0D1ki8 
Ex9HGw6mZG7nUvm8DuPpx5mgAM6r6Hi0Hv4qhNiqE.v7DOPOdEQj3nmIjOqx1y9MLyFT5A533dhC kJQ8m_UPlK_KZXsZOfB2a.U_iMdCGj2DYtHmk..qqdj..qGq2mLHIBEzYUZCVapaZpEjk6rKn4Lk S.RNFQqP2LiSpTL_Om4y6ZIGYRqRBYU8STGRV1n9OPAuWKTOA_oomgzzFz5W_YjUHs7ZpyoF.jH. Rg35mIxRw2rZc92Z9w1.s7zlRoWZUhlgrAPL5ZYbA8wwwL4tRZLwaHwJU_nTWuxbHkfCC4xZ0_kc lf41M.f.qNBhVmyimDPMYAlk8P1ab1umCRF1uvGCiHGQ_iwo7RF30zd2u8P0c0EqM5f7iSEr4bFG BqQdjGgOzBngb3P5L9Ko28_xBrb3Xe3fmSGIE2.UDWGEo3y8l3Q_mexkdOGaqZAsJFgHvtAbVXVG pMk2McZZOxlSJLv5xeLD0HeXsbK0YyD4NR4OIpnlK71jSFMk5WsUG7tSach5jjqeI9NO9MeirEMm VVa9nTP5rqKmVtsOAiRxT4LL_GN1cpbxCpiUo4D1v5nW04h46pHU6LtMxRQG0cfo.lQqOuAatBQo JH_vIY8lqolJG_m7iFrZv.O3qRiwYo4ci3dlbMwa6UeiRl7ypKII53GvoZVs7tp176lR0DQYESFa QVsl_uGqleOprzyrI4tsHRhceZglHcqneS_uPsLYTQsmkxW2xtmERMwsIO2TB.5kdc1nS_NjYec8 fTlvAYyRtsZy6WmcHnlKIJD_7 Received: by 76.13.26.65; Tue, 11 Nov 2014 18:28:15 +0000 Date: Tue, 11 Nov 2014 18:27:49 +0000 (UTC) From: Leslie S Satenstein To: Eric Blake , Ben Mendis , "19021-done@debbugs.gnu.org" <19021-done@debbugs.gnu.org> Message-ID: <1043544624.575297.1415730469500.JavaMail.yahoo@jws10693.mail.bf1.yahoo.com> In-Reply-To: <546249C1.5050906@redhat.com> References: <546249C1.5050906@redhat.com> Subject: Re: bug#19021: Possible bug in sort MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_575296_1054793460.1415730469494" Content-Length: 14939 X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 19021-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Leslie S Satenstein List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) ------=_Part_575296_1054793460.1415730469494 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Why not have used sort -t ',' -k 1n ?

Regards
Leslie
Mr. Leslie Satenstein
Montréal Québec, Canada

------=_Part_575296_1054793460.1415730469494 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
------=_Part_575296_1054793460.1415730469494-- From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 11 14:29:38 2014 Received: (at 19021-done) by debbugs.gnu.org; 11 Nov 2014 19:29:38 +0000 Received: from localhost ([127.0.0.1]:57617 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoH8D-0002Hp-LU for submit@debbugs.gnu.org; Tue, 11 Nov 2014 14:29:38 -0500 Received: from mx1.redhat.com ([209.132.183.28]:40471) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoH8B-0002Hg-8T for 19021-done@debbugs.gnu.org; Tue, 11 Nov 2014 14:29:36 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sABJTYCa028824 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 11 Nov 2014 14:29:34 -0500 Received: from [10.3.113.152] (ovpn-113-152.phx2.redhat.com [10.3.113.152]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sABJTXkL002901; Tue, 11 Nov 2014 14:29:33 -0500 Message-ID: <5462639D.6000808@redhat.com> Date: Tue, 11 Nov 2014 12:29:33 -0700 From: Eric Blake Organization: Red Hat, Inc. 
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Leslie S Satenstein , Ben Mendis , "19021-done@debbugs.gnu.org" <19021-done@debbugs.gnu.org> Subject: Re: bug#19021: Possible bug in sort References: <546249C1.5050906@redhat.com> <1043544624.575297.1415730469500.JavaMail.yahoo@jws10693.mail.bf1.yahoo.com> In-Reply-To: <1043544624.575297.1415730469500.JavaMail.yahoo@jws10693.mail.bf1.yahoo.com> OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="PcdGOfORtHeOs3VXoLpx9HpnQo6tQ2gvD" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Spam-Score: -5.5 (-----) X-Debbugs-Envelope-To: 19021-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.5 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --PcdGOfORtHeOs3VXoLpx9HpnQo6tQ2gvD Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

On 11/11/2014 11:27 AM, Leslie S Satenstein wrote:

[please don't top-post on technical lists - it makes it harder to figure
out what you are asking]

> Why not have used sort -t ',' -k 1n ?
>
>>
>> This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv

Are you asking the difference between:

sort -t , -k 1n
sort -t ',' -k 1n

If so, there's no difference.  The shell strips the '' quoting around ,
before invoking sort, so argv[] is the same in either spelling from the
shell.

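[Editorial aside, not part of the original message.] The quoting point can be checked without running sort at all. The sketch below, which assumes only a POSIX shell, uses printf to print one argument per line for each spelling and then compares the two results. The variable names unquoted and quoted are illustrative.

```shell
# Print each argument on its own line; the shell has already removed the
# quotes by the time printf (or sort) receives its arguments.
unquoted=$(printf '%s\n' sort -t , -k 1n)
quoted=$(printf '%s\n' sort -t ',' -k 1n)

if [ "$unquoted" = "$quoted" ]; then
    echo "identical argv"
else
    echo "argv differs"
fi
```

Quoting would only matter for a delimiter the shell itself treats specially, such as ';' or '|'; a bare comma is passed through unchanged.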
But that has nothing to do with the bug report, where the answer is that
the caller should have been using:

sort -t , -k 1,1n

or

LC_ALL=C sort -t , -k 1n

or the combination:

LC_ALL=C sort -t , -k 1,1n

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--PcdGOfORtHeOs3VXoLpx9HpnQo6tQ2gvD Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg

iQEcBAEBCAAGBQJUYmOdAAoJEKeha0olJ0NqaDwIAJfMt2YOEI1NMRP1N3HR8swu
TfdTs4nbKgnIiIdRDzOf/kA1JcbHEtt0ReMZHBt0vBoa67ihTS2zUSwiJTvrLYkR
UC6xj2iEB92v9aS9FJoyrK9Ixpbm42c6uoY0x1sQir7g3jY4fYo1MVTl574U5H00
DNCYk4QsYcizKkYa6BWChCxzZMDqLB+G7JkjF9W1mdYWs0/Ly6A8sqx+Rh/cO4ui
t3nDOYdouEx48K8XyP3jee46JqzugTPL3oWhpfr2l60e6pDfAEhkhSsqn6ChVz7x
o+tSOLzz4AZCj0EdKkn+o40+uLI8TcC1JQIxZ7/lrbiT6MGsoPeviHmbVxmEvEk=
=0+gl
-----END PGP SIGNATURE-----

--PcdGOfORtHeOs3VXoLpx9HpnQo6tQ2gvD--
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 11 15:07:32 2014 Received: (at 19021) by debbugs.gnu.org; 11 Nov 2014 20:07:32 +0000 Received: from localhost ([127.0.0.1]:57650 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoHit-0004Vu-8n for submit@debbugs.gnu.org; Tue, 11 Nov 2014 15:07:32 -0500 Received: from mail-qg0-f50.google.com ([209.85.192.50]:41502) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1XoHiq-0004Vg-A9 for 19021@debbugs.gnu.org; Tue, 11 Nov 2014 15:07:29 -0500 Received: by mail-qg0-f50.google.com with SMTP id a108so7670208qge.23 for <19021@debbugs.gnu.org>; Tue, 11 Nov 2014 12:07:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=FpbGcbPMAjI7gHHikw6ipgD3ZhYM2NPx3+QFkhGY4/Q=;
b=mBBQvYihJQcFgK8oa7kZaZ8H/FUshRCvL8ZUVal5xvuQuJpsgQ/83RjtvNq5pNi1At 37/1RlsyepzbNoDLpCygbnLbCa8+r5pVfCWXXQxl4JgYAfXIMNjNvBNUbMUtE2yXD4BQ LlSwWPnJjotscFb/wBs5BJnzIN2Dh/owdDwjzhwq0c1GHrGFzyThTZucWn7kMZ+hPuD3 IzktF7rqwJclqrPIZe7biKT3jP9aL/fEY99rl2L8BmaR5wbkH74dGZBREpCm5pgbGPjI i1b97tBbnzvSnVJAF7EZOysR5qjKMS6/O3oQ/0kj5A7Y15M9uWuq1xtC7SDkjBSwnygy lIUQ== MIME-Version: 1.0 X-Received: by 10.224.65.4 with SMTP id g4mr6709709qai.20.1415736447512; Tue, 11 Nov 2014 12:07:27 -0800 (PST) Received: by 10.229.180.2 with HTTP; Tue, 11 Nov 2014 12:07:27 -0800 (PST) In-Reply-To: References: <546249C1.5050906@redhat.com> Date: Tue, 11 Nov 2014 15:07:27 -0500 Message-ID: Subject: Re: bug#19021: closed (Re: bug#19021: Possible bug in sort) From: Ben Mendis To: 19021@debbugs.gnu.org Content-Type: multipart/alternative; boundary=001a11c2b9800cbb3e05079ad45b X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 19021 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a11c2b9800cbb3e05079ad45b Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Thanks for the explanation. This solves my issue.

On Tue, Nov 11, 2014 at 12:40 PM, GNU bug Tracking System <
help-debbugs@gnu.org> wrote:

> Your bug report
>
> #19021: Possible bug in sort
>
> which was filed against the coreutils package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to 19021@debbugs.gnu.org.
>
> --
> 19021: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19021
> GNU Bug Tracking System
> Contact help-debbugs@gnu.org with problems

--001a11c2b9800cbb3e05079ad45b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thanks for the explanation. This solves my issue.

On Tue, Nov 11, 2014 at 12:40 PM, GNU bug Tracking System <help-debbugs@gnu.org> wrote:
Your bug report

#19021: Possible bug in sort

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 19021@debbugs.gnu.org.

--
19021: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19021
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems


---------- Forwarded message ----------
From: Eric Blake <eblake@redhat.com>
To: Ben Mendis <dragonwisard@gmail.com>, 19021-done@debbugs.gnu.org
Cc:
Date: Tue, 11 Nov 2014 10:39:13 -0700
Subject: Re: bug#19021: Possible bug in sort
tag 19021 notabug
thanks

On 11/11/2014 09:39 AM, Ben Mendis wrote:
> http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc
>
> Data is here: https://gist.github.com/anonymous/2a7beb4871b25ae8f8b3

Thanks for the report. Rather than making us chase down links, why not
provide the information inline with your email?

>
> This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv

Try using the --debug option to see what is really happening. The bug
is NOT in sort (which correctly obeyed your locale rules and incorrect
command line), but in your command line (because you didn't tell sort
where to quit parsing numbers).

I'm going to distill it down to a smaller input that still expresses the
same "swapped" lines:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
 | sort -t, -k1n --debug
sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
1,73,67,6
_________
_________
2,68,61,7
_________
_________
1,69,55,14
__________
__________
2,71,59,12
__________
__________

See what's happening? The -k1n argument says to start parsing at field
1, but continue parsing until either the input is no longer numeric or
until the end of line is reached (even if it goes into field 2 or
beyond). Since commas are silently ignored in the en_US.UTF-8 locale
when parsing a number, sort is thus comparing the values 268617 and
1695514, and the sort was correct.
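
One way to see the numbers being compared is to strip the commas by hand and sort what is left numerically. This is only a sketch of the effect described above, not of how sort is implemented; the variable name merged is made up for illustration, and LC_ALL=C is set so the result does not depend on the local locale.

```shell
# Remove the commas so each line becomes the single number that
# -k1n effectively ends up comparing, then sort those numbers.
merged=$(printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | tr -d ',' | LC_ALL=C sort -n)
printf '%s\n' "$merged"
```

The merged values come out as 173676, 268617, 1695514 and 2715912, which is the same line order as the --debug output above.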

Now, try telling sort that it must parse a numeric field, but must END
the parse at the end of the first field (if not sooner due to end of
number):

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
 | sort -t, -k1,1n --debug
sort: using ‘en_US.UTF-8’ sorting rules
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________

Or try using a locale where ',' is NOT part of a valid number:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
 | LC_ALL=C sort -t, -k1n --debug
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________


>
> This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t ,
> -k 1n

Actually, you mean 'cut -d, -f 1-3' (you transposed while transferring
from the stackoverflow site to your email). But yeah, when you truncate
to a smaller number, you are comparing different values (17367 is less
than 26861).
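
The truncated case can be checked the same way on the distilled input. This is a sketch using the corrected cut invocation; the variable names truncated and values are made up for illustration.

```shell
# Corrected cut invocation: -d sets the delimiter, -f selects fields 1-3.
truncated=$(printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
  | cut -d, -f1-3)
printf '%s\n' "$truncated"

# The same lines with the commas removed, i.e. the values that a
# comma-ignoring numeric parse would see.
values=$(printf '%s\n' "$truncated" | tr -d ',')
printf '%s\n' "$values"
```

With only three fields left, every merged value that starts with 1 (17367, 16955) is smaller than every one that starts with 2 (26861, 27159), so the result happens to look sorted by the first field.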

>
> Using 'g' instead of 'n' also produces the expected results, but I'm not
> clear on what the difference is between 'g' and 'n'.

-n is specified by POSIX as parsing integers according to the current
locale's definition. -g is a GNU extension, which says to parse
floating point numbers. Apparently, in the en_US.UTF-8 locale, parsing
floating point stops at the first comma, while parsing integers does not:

$ printf '1,73,67,6\n2,68,61,7\n1,69,55,14\n2,71,59,12\n' \
 | sort -t, -k1g --debug
sort: using ‘en_US.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
1,69,55,14
_
__________
1,73,67,6
_
_________
2,68,61,7
_
_________
2,71,59,12
_
__________

I don't know why libc chose to make strtoll() ignore commas while
strtold() does not, when not in the C locale.
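
A difference between -n and -g that does not depend on the locale is exponent notation: -g converts each key to a floating point value, so 1e3 means 1000, while -n stops reading at the 'e'. A small sketch, with LC_ALL=C set and made-up variable names by_n and by_g:

```shell
# -n reads "1e3" as 1, while -g reads it as 1000.
by_n=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -n)
by_g=$(printf '1e3\n20\n3\n' | LC_ALL=C sort -g)
printf 'with -n:\n%s\nwith -g:\n%s\n' "$by_n" "$by_g"
```

With -n the line 1e3 sorts first (as 1, ahead of 3 and 20); with -g it sorts last (as 1000).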

But at any rate, I hope I've demonstrated that the bug was in your usage
and not in sort. So I'm closing this bug, although you should feel free
to add further comments or questions. You may also want to read the FAQ:

https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
[Hmm - we should update that FAQ to mention the --debug option]

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



---------- Forwarded message ----------
From: Ben Mendis <dragonwisard@gmail.com>
To: bug-coreutils@gnu.org
Cc:
Date: Tue, 11 Nov 2014 11:39:12 -0500
Subject: Possible bug in sort
http://stackoverflow.com/questions/26869717/why-does-sort-seem-to-sort-a-field-incorrectly-based-on-the-presence-or-absenc

This results in line 7 being sorted incorrectly: sort -t , -k 1n < weird.csv

This produced the expected results: cut -f , -d 1-3 < weird.csv | sort -t , -k 1n

Using 'g' instead of 'n' also produces the expected results, but I'm not clear on what the difference is between 'g' and 'n'.

Tested with sort 8.21 on Slackware64-current.


--001a11c2b9800cbb3e05079ad45b-- From unknown Tue Jun 17 01:49:48 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 10 Dec 2014 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator