From unknown Sat Jun 14 18:48:57 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10985: sort -k behavior possible problem: field span across the boundaries Resent-From: Oleg Moskalenko Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 09 Mar 2012 19:58:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 10985 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 10985@debbugs.gnu.org X-Debbugs-Original-To: "'bug-coreutils@gnu.org'" Received: via spool by submit@debbugs.gnu.org id=B.133132304425786 (code B ref -1); Fri, 09 Mar 2012 19:58:01 +0000 Received: (at submit) by debbugs.gnu.org; 9 Mar 2012 19:57:24 +0000 Received: from localhost ([127.0.0.1]:41007 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S65wJ-0006hp-0p for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:57:24 -0500 Received: from eggs.gnu.org ([208.118.235.92]:50027) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S65n8-0006UJ-KK for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:47:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S65mB-0007S9-9x for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:46:57 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:47110) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65mB-0007S5-6e for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:46:55 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42817) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65m8-00070v-Rh for bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:54 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S65m5-0007RA-Qh for 
bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:52 -0500 Received: from smtp02.citrix.com ([66.165.176.63]:5451) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65m5-0007QA-Hn for bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:49 -0500 X-IronPort-AV: E=Sophos;i="4.73,559,1325480400"; d="scan'208,217";a="185295783" Received: from sjcpmailmx02.citrite.net ([10.216.14.75]) by FTLPIPO02.CITRIX.COM with ESMTP/TLS/RC4-MD5; 09 Mar 2012 14:46:46 -0500 Received: from SJCPMAILBOX01.citrite.net ([10.216.4.73]) by SJCPMAILMX02.citrite.net ([10.216.14.75]) with mapi; Fri, 9 Mar 2012 11:46:45 -0800 From: Oleg Moskalenko Date: Fri, 9 Mar 2012 11:46:45 -0800 Thread-Topic: sort -k behavior possible problem: field span across the boundaries Thread-Index: Acz+LVuZ9E4BBTs+RBu3Le5MCYhZlA== Message-ID: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: multipart/alternative; boundary="_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_" MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-Mailman-Approved-At: Fri, 09 Mar 2012 14:57:21 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) --_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi

While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD) I found that some behavior is probably not what a utility user expects.

Let's say we have to sort (numerically, stably) just two lines:

$ sort -t "|" -ns -k2.3,2.7 <<!
1|234
1|2|34
!

 

The GNU sort output is:

1|234
1|2|34

The correct output (from my point of view) must be:

1|2|34
1|234

 

My reasoning is that applying the key specs "-k2.3,2.7" to string "1|234" we obtain the key "4", and applying the same key to the string "1|2|34" we must obtain "" (empty string), because the second field is just "2" and the characters from the 3rd to the 7th position of that field give us an empty string. And the empty string is smaller than a number, numerically, according to "info sort".

On the other hand, GNU sort (I suppose) just takes an offset from the field start, without taking into account the real field length. It yields the key "34", and this is larger, numerically, than "4".
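
The two readings described above can be made concrete with a small shell sketch. It is only a model of the two interpretations, not code taken from sort itself, and the variable names (from_field2, field2, key_offset, key_bounded) are made up for illustration:

```shell
line='1|2|34'

# Everything from the first character of field 2 to the end of the line.
from_field2=${line#*'|'}          # "2|34"
# Field 2 alone, stopping at the next separator.
field2=${from_field2%%'|'*}       # "2"

# Reading A: count characters 3..7 from the start of field 2,
# ignoring any separators passed on the way.
key_offset=$(printf '%s\n' "$from_field2" | cut -c3-7)

# Reading B: count characters 3..7 inside field 2 only.
key_bounded=$(printf '%s\n' "$field2" | cut -c3-7)

printf 'offset reading:  "%s"\n' "$key_offset"
printf 'bounded reading: "%s"\n' "$key_bounded"
```

For the line "1|2|34" the offset reading gives "34" and the bounded reading gives the empty string; for "1|234" both readings give "4", which is why only the second line exposes the difference.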

 

I do not know whether this is an intended behavior or a bug, but this is definitely non-intuitive and not what a reasonable user would expect.

Thanks a lot!

Oleg Moskalenko

 

= --_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_-- From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 09 15:21:54 2012 Received: (at control) by debbugs.gnu.org; 9 Mar 2012 20:21:54 +0000 Received: from localhost ([127.0.0.1]:41034 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S66K1-0007HD-9n for submit@debbugs.gnu.org; Fri, 09 Mar 2012 15:21:53 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54844) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S66Jw-0007Gx-DB; Fri, 09 Mar 2012 15:21:51 -0500 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q29KKnCt010128 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 9 Mar 2012 15:20:49 -0500 Received: from [10.3.113.110] (ovpn-113-110.phx2.redhat.com [10.3.113.110]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id q29KKmrj020676; Fri, 9 Mar 2012 15:20:49 -0500 Message-ID: <4F5A6620.8050008@redhat.com> Date: Fri, 09 Mar 2012 13:20:48 -0700 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 MIME-Version: 1.0 To: Oleg Moskalenko Subject: Re: bug#10985: sort -k behavior possible problem: field span across the boundaries References: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> In-Reply-To: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> X-Enigmail-Version: 1.3.5 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enigB3DFC489B26C37CCCF116BE1" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: control Cc: 10985-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: 
list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigB3DFC489B26C37CCCF116BE1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

tag 10985 notabug
thanks

On 03/09/2012 12:46 PM, Oleg Moskalenko wrote:
> Hi
>
> While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD) I found that some behavior is probably not what a utility user expects.

Thanks for the report.  However, you probably found behavior that is required by POSIX.

>
> Let's say we have to sort (numerically, stably) just two lines:
>
> $ sort -t "|" -ns -k2.3,2.7 <<!
> 1|234
> 1|2|34
> !

Let's use 'sort --debug' to see what really happened:

$ LC_ALL=C sort --debug -t\| -ns -k2.3,2.7 <<a
> 1|234
> 1|2|34
> a
sort: using simple byte comparison
1|234
    _
1|2|34
    __

So this sorted by locating the start of the second field ("234" of one line, and "2|34" of the other line), then starting at the 3rd byte counting from that location (even if it is in the next field).  This behavior is required by POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

>
> The correct output (from my point of view) must be:
>
> 1|2|34
> 1|234

Sorry, but that interpretation does not match POSIX.

>
> My reasoning is that applying the key specs "-k2.3,2.7" to string "1|234" we obtain the key "4", and applying the same key to the string "1|2|34" we must obtain "" (empty string),

That's where you are wrong.  POSIX states:

>> The notation:
>>
>> -k field_start[type][,field_end[type]]
>>
>> shall define a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end shall mean the last character of the line.
>>
>> A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
>>
>> The field_start portion of the keydef option-argument shall have the form:
>>
>> field_number[.first_character]
>>
>> Fields and characters within fields shall be numbered starting with 1. The field_number and first_character pieces, interpreted as positive decimal integers, shall specify the first character to be used as part of a sort key. If .first_character is omitted, it shall refer to the first character of the field.

That is, the field_start 2.3 means to start at the third character counting from the start of the second field, regardless of whether any field separators intervene, and _only_ the end of a line (and not another field separator) can result in an empty key field.

>
> I do not know whether this is an intended behavior or a bug,

Intended and mandated by the standards.

> but this is definitely non-intuitive and not what a reasonable user would expect.

Perhaps so, but if you want it changed, you need to file a bug report against POSIX.  As such, I'm going to close out this coreutils bug.

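
For readers who do want the field-bounded reading from the original report, one possible workaround is to compute the key outside of sort and then sort on that key alone. This is not something proposed in the thread; it is only a sketch of the usual decorate, sort, undecorate idiom, assuming the data contains no tab characters, and the variable names (tab, result) are made up for illustration:

```shell
# Hypothetical workaround sketch: sort on characters 3..7 of field 2 only,
# never reading past the "|" that ends the field.
tab=$(printf '\t')

# Decorate: awk prepends the bounded key (possibly empty) and a tab.
# Sort: numeric and stable, on the prepended key only.
# Undecorate: cut drops the key and the tab again.
result=$(
  printf '%s\n' '1|234' '1|2|34' |
    awk -F'|' '{ print substr($2, 3, 5) "\t" $0 }' |
    sort -t "$tab" -s -k1,1n |
    cut -f2-
)

printf '%s\n' "$result"
```

With the two sample lines this prints 1|2|34 first and 1|234 second, because the empty key of the first compares as zero under -n, which is the order the reporter expected.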
--=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enigB3DFC489B26C37CCCF116BE1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPWmYgAAoJEKeha0olJ0Nq4hsIAJXODhMKYFLHwcYRLafflUHC 2or7Hb3anMUncbJZC/KyTdMpZtbdabqP1/WzcO50h49zaXx4GOXRMgqEpmr4FGPD qXKVXcyEIKr2r9YR+RNUg0liToU3a6n2qRwcTGt543N6tbJ/YO183MMeOb6JEJ6U e0H8wDIdcHpKzsddEcj5JPBBmW5Qrz79QI8QXrcyy2wsxuve35f+XXoDTCnQ57Ns wEVA2KllsZdEVu0uh5uF7uTztE1M/BlkDQXApfEGEjRHbyfEGj0YWInSKThR3V+t qPkJpFDEtl/qnD+Ys5SW9WZKhnAdtlgiFTZBLu3xzAiD4eos95HT0mXm9j/RpTc= =j2My -----END PGP SIGNATURE----- --------------enigB3DFC489B26C37CCCF116BE1-- From unknown Sat Jun 14 18:48:57 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.428 (Entity 5.428) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Oleg Moskalenko Subject: bug#10985: closed (Re: bug#10985: sort -k behavior possible problem: field span across the boundaries) Message-ID: References: <4F5A6620.8050008@redhat.com> <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> X-Gnu-PR-Message: they-closed 10985 X-Gnu-PR-Package: coreutils X-Gnu-PR-Keywords: notabug Reply-To: 10985@debbugs.gnu.org Date: Fri, 09 Mar 2012 20:22:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1331324522-28008-1" This is a multi-part message in MIME format... ------------=_1331324522-28008-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #10985: sort -k behavior possible problem: field span across the boundaries which was filed against the coreutils package, has been closed. 
The explanation is attached below, along with your original report. If you require more details, please reply to 10985@debbugs.gnu.org.

-- 
10985: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10985
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
------------=_1331324522-28008-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 10985-done) by debbugs.gnu.org; 9 Mar 2012 20:21:53 +0000 Received: from localhost ([127.0.0.1]:41032 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S66K0-0007HA-DW for submit@debbugs.gnu.org; Fri, 09 Mar 2012 15:21:53 -0500 Received: from mx1.redhat.com ([209.132.183.28]:54844) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S66Jw-0007Gx-DB; Fri, 09 Mar 2012 15:21:51 -0500 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q29KKnCt010128 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 9 Mar 2012 15:20:49 -0500 Received: from [10.3.113.110] (ovpn-113-110.phx2.redhat.com [10.3.113.110]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id q29KKmrj020676; Fri, 9 Mar 2012 15:20:49 -0500 Message-ID: <4F5A6620.8050008@redhat.com> Date: Fri, 09 Mar 2012 13:20:48 -0700 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 MIME-Version: 1.0 To: Oleg Moskalenko Subject: Re: bug#10985: sort -k behavior possible problem: field span across the boundaries References: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> In-Reply-To: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> X-Enigmail-Version: 1.3.5 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; 
boundary="------------enigB3DFC489B26C37CCCF116BE1" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 10985-done Cc: 10985-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigB3DFC489B26C37CCCF116BE1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
--=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enigB3DFC489B26C37CCCF116BE1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPWmYgAAoJEKeha0olJ0Nq4hsIAJXODhMKYFLHwcYRLafflUHC 2or7Hb3anMUncbJZC/KyTdMpZtbdabqP1/WzcO50h49zaXx4GOXRMgqEpmr4FGPD qXKVXcyEIKr2r9YR+RNUg0liToU3a6n2qRwcTGt543N6tbJ/YO183MMeOb6JEJ6U e0H8wDIdcHpKzsddEcj5JPBBmW5Qrz79QI8QXrcyy2wsxuve35f+XXoDTCnQ57Ns wEVA2KllsZdEVu0uh5uF7uTztE1M/BlkDQXApfEGEjRHbyfEGj0YWInSKThR3V+t qPkJpFDEtl/qnD+Ys5SW9WZKhnAdtlgiFTZBLu3xzAiD4eos95HT0mXm9j/RpTc= =j2My -----END PGP SIGNATURE----- --------------enigB3DFC489B26C37CCCF116BE1-- ------------=_1331324522-28008-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 9 Mar 2012 19:57:24 +0000 Received: from localhost ([127.0.0.1]:41007 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S65wJ-0006hp-0p for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:57:24 -0500 Received: from eggs.gnu.org ([208.118.235.92]:50027) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S65n8-0006UJ-KK for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:47:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S65mB-0007S9-9x for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:46:57 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:47110) by 
eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65mB-0007S5-6e for submit@debbugs.gnu.org; Fri, 09 Mar 2012 14:46:55 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42817) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65m8-00070v-Rh for bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:54 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S65m5-0007RA-Qh for bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:52 -0500 Received: from smtp02.citrix.com ([66.165.176.63]:5451) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S65m5-0007QA-Hn for bug-coreutils@gnu.org; Fri, 09 Mar 2012 14:46:49 -0500 X-IronPort-AV: E=Sophos;i="4.73,559,1325480400"; d="scan'208,217";a="185295783" Received: from sjcpmailmx02.citrite.net ([10.216.14.75]) by FTLPIPO02.CITRIX.COM with ESMTP/TLS/RC4-MD5; 09 Mar 2012 14:46:46 -0500 Received: from SJCPMAILBOX01.citrite.net ([10.216.4.73]) by SJCPMAILMX02.citrite.net ([10.216.14.75]) with mapi; Fri, 9 Mar 2012 11:46:45 -0800 From: Oleg Moskalenko To: "'bug-coreutils@gnu.org'" Date: Fri, 9 Mar 2012 11:46:45 -0800 Subject: sort -k behavior possible problem: field span across the boundaries Thread-Topic: sort -k behavior possible problem: field span across the boundaries Thread-Index: Acz+LVuZ9E4BBTs+RBu3Le5MCYhZlA== Message-ID: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: multipart/alternative; boundary="_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_" MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 09 Mar 2012 14:57:21 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) --_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

--_000_031222CBCF33214AB2EB4ABA279428A30107B5E9D284SJCPMAILBOX_-- ------------=_1331324522-28008-1--