From: Oleg Moskalenko
To: bug-coreutils@gnu.org
Date: Fri, 9 Mar 2012 11:46:45 -0800
Subject: sort -k behavior possible problem: field span across the boundaries
Message-ID: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net>

Hi

 

While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD), I found that some behavior is probably not what a utility user expects.

 

Let's say we have to sort (numerically and stably) just two lines:

 

$ sort -t "|" -ns -k2.3,2.7 <<!
1|234
1|2|34
!

 

The GNU sort output is:

1|234
1|2|34

 

 

The correct output (from my point of view) must be:

 

1|2|34
1|234

 

My reasoning is that applying the key spec "-k2.3,2.7" to the string "1|234" yields the key "4", and applying the same key spec to the string "1|2|34" must yield "" (the empty string), because the second field is just "2" and the characters from the 3rd to the 7th position give an empty string. And the empty string is numerically smaller than a number, according to "info sort".
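
The interpretation described above can be sketched in a few lines of Python. This is only an illustrative model of the expectation stated in this message, not code from any sort implementation; the helper name `field_bounded_key` and its parameters are hypothetical.

```python
def field_bounded_key(line, sep, field, first, last):
    """Hypothetical helper: characters first..last (1-based, inclusive)
    of the given field, never reaching past the end of that field."""
    fields = line.split(sep)
    if field > len(fields):
        return ""  # the requested field does not exist
    # Slicing stops at the end of the field itself.
    return fields[field - 1][first - 1:last]


print(repr(field_bounded_key("1|234", "|", 2, 3, 7)))   # '4'
print(repr(field_bounded_key("1|2|34", "|", 2, 3, 7)))  # ''
```

Under this reading the second line has an empty key, which is why it would be expected to sort first.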

 

On the other hand, GNU sort (I suppose) just takes an offset from the field start, without taking into account the real field length. It yields the key "34", and this is numerically larger than "4".
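
The supposed behavior can be modeled the same way. The sketch below is a simplified assumption, not GNU sort's source: it handles a single-character separator and a key whose start and end refer to the same field, and the name `line_bounded_key` is hypothetical.

```python
def line_bounded_key(line, sep, field, first, last):
    """Hypothetical model: count characters from the start of the given
    field and keep going, separators included, until the line ends."""
    start = 0
    for _ in range(field - 1):
        pos = line.find(sep, start)
        if pos == -1:
            return ""  # the field would start beyond the end of the line
        start = pos + 1
    # Only the end of the line limits the slice, not the next separator.
    return line[start + first - 1:start + last]


for line in ("1|234", "1|2|34"):
    print(line, "->", repr(line_bounded_key(line, "|", 2, 3, 7)))
# 1|234 -> '4'
# 1|2|34 -> '34'
```

With these keys, 4 sorts before 34, which matches the output reported above.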

 

I do not know whether this is intended behavior or a bug, but it is definitely non-intuitive and not what a reasonable user would expect.

 

Thanks a lot!

Oleg Moskalenko

 

From: Eric Blake
To: Oleg Moskalenko
Cc: 10985-done@debbugs.gnu.org
Date: Fri, 09 Mar 2012 13:20:48 -0700
Subject: Re: bug#10985: sort -k behavior possible problem: field span across the boundaries
Message-ID: <4F5A6620.8050008@redhat.com>
In-Reply-To: <031222CBCF33214AB2EB4ABA279428A30107B5E9D284@SJCPMAILBOX01.citrite.net>

tag 10985 notabug
thanks

On 03/09/2012 12:46 PM, Oleg Moskalenko wrote:
> Hi
>
> While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD) I found that some behavior is probably not what a utility user expects.

Thanks for the report.  However, you probably found behavior that is required by POSIX.

>
> Let's, say, we have to sort (numerically stable) just two lines:
>
> $ sort -t "|" -ns -k2.3,2.7 <<!
> 1|234
> 1|2|34
> !

Let's use 'sort --debug' to see what really happened:

$ LC_ALL=C sort --debug -t\| -ns -k2.3,2.7 <<a
> 1|234
> 1|2|34
> a
sort: using simple byte comparison
1|234
    _
1|2|34
    __

So this sorted by locating the start of the second field ("234" of one line, and "2|34" of the other line), then starting at the 3rd byte counting from that location (even if it is in the next field).  This behavior is required by POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

>
> The correct output (from my point of view) must be:
>
> 1|2|34
> 1|234

Sorry, but that interpretation does not match POSIX.

>
> My reasoning is that applying the key specs "-k2.3,2.7" to string "1|234" we obtain the key "4", and applying the same key to the string "1|2|34" we must obtain "" (empty string),

That's where you are wrong.  POSIX states:

>> The notation:
>>
>> -k field_start[type][,field_end[type]]
>>
>> shall define a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end shall mean the last character of the line.
>>
>> A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
>>
>> The field_start portion of the keydef option-argument shall have the form:
>>
>> field_number[.first_character]
>>
>> Fields and characters within fields shall be numbered starting with 1. The field_number and first_character pieces, interpreted as positive decimal integers, shall specify the first character to be used as part of a sort key. If .first_character is omitted, it shall refer to the first character of the field.

That is, the field_start 2.3 means to start at the third character counting from the start of the second field, regardless of whether any field separators intervene, and _only_ the end of a line (and not another field separator) can result in an empty key field.

>
> I do not know whether this is an intended behavior or a bug,

Intended and mandated by the standards.

> but this is definitely non-intuitive and not what a reasonable user would expect.

Perhaps so, but if you want it changed, you need to file a bug report against POSIX.  As such, I'm going to close out this coreutils bug.
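
As a rough illustration of the rule that only the end of the line can produce an empty key, the following Python sketch models `-t'|' -ns -k2.3,2.7`. It is an assumption-laden model, not GNU sort's implementation: the function name `numeric_key` is hypothetical, every input line is assumed to contain the separator, keys are assumed to hold only digits, and the third input line `1|2` is added here purely to show the empty-key case. Treating an empty key as 0 follows the GNU documentation's statement that an empty number is treated as 0.

```python
def numeric_key(line, sep="|"):
    """Hypothetical model of the key -k2.3,2.7 with -t'|' and -n."""
    field2 = line.index(sep) + 1        # offset of the first character of field 2
    key = line[field2 + 2:field2 + 7]   # slicing stops only at the end of the line
    return int(key) if key else 0       # empty key compares as 0


lines = ["1|234", "1|2|34", "1|2"]
# sorted() is stable, which mirrors the -s option.
print(sorted(lines, key=numeric_key))
# ['1|2', '1|234', '1|2|34']
```

Only `1|2`, where the third character of field 2 lies beyond the end of the line, ends up with an empty key.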
-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From: Oleg Moskalenko
To: Eric Blake
Cc: 10985-done@debbugs.gnu.org
Date: Fri, 9 Mar 2012 12:28:13 -0800
Subject: RE: bug#10985: sort -k behavior possible problem: field span across the boundaries
Message-ID: <031222CBCF33214AB2EB4ABA279428A30107B5E9D285@SJCPMAILBOX01.citrite.net>
In-Reply-To: <4F5A6620.8050008@redhat.com>

Hi Blake

Thank you for the reply and explanations!

Best regards,
Oleg

From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Date: Sat, 07 Apr 2012 11:24:04 +0000
Subject: Internal Control

bug archived.