From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 15 15:31:31 2014
From: "Lennart Sorensen"
Date: Fri, 15 Aug 2014 15:30:11 -0400
To: bug-coreutils@gnu.org
Cc: Len Sorensen
Subject: sort seems to misbehave if both -u and -n or -k are used
Message-ID: <20140815193011.GB17765@csclub.uwaterloo.ca>

Here is the case that has me thinking there is a bug (it sure doesn't
make sense as valid behaviour).

input:

Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7
Version: 1.0.1e-2+deb7u11

OK output using 'sort':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

OK output using 'sort -u':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

OK output using 'sort -n':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

(I may have hoped that one would sort by the last number given everything
else is equal, but I did not expect it to actually do so).

OK output using 'sort -k 3':

Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u11
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u12
Version: 1.0.1e-2+deb7u7

Weird output using 'sort -n -u':

Version: 1.0.1e-2+deb7u12

Weird output using 'sort -k 3 -u':

Version: 1.0.1e-2+deb7u12

So is this actually the expected behaviour?
I would have thought from the documentation that -u would return unique
lines of output, not just one line based on whatever sort key it
happened to look at.

--
Len Sorensen

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 15 15:49:14 2014
From: Eric Blake
Organization: Red Hat, Inc.
Date: Fri, 15 Aug 2014 13:48:57 -0600
To: Lennart Sorensen, 18273-done@debbugs.gnu.org
Subject: Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used
Message-ID: <53EE6429.4050107@redhat.com>
References: <20140815193011.GB17765@csclub.uwaterloo.ca>
In-Reply-To: <20140815193011.GB17765@csclub.uwaterloo.ca>

tag 18273 notabug
thanks

On 08/15/2014 01:30 PM, Lennart Sorensen wrote:
> Here is the case that has me thinking there is a bug (it sure doesn't
> make sense as valid behaviour).

Thanks for the report. However, the behavior you have demonstrated is
required by POSIX, and is therefore not a bug. The --debug option can
be used to see what is really happening.

>
> OK output using 'sort -n':
>
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7
>
> (I may have hoped that one would sort by the last number given everything
> else is equal, but I did not expect it to actually do so).

Actually, using -n without any other hints says to treat _the entire
line_ as a number, and to quit parsing as soon as a non-numeric portion
is found.
Observe:

$ LC_ALL=C sort foo --debug -n
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________

Furthermore, if you disable the last-resort comparison of the entire
line, then you get the input order, since all of your keys were
identically the empty numeric string at the front of the line:

$ LC_ALL=C sort foo --debug -n -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u7
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key

>
> OK output using 'sort -k 3':
>
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u11
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u12
> Version: 1.0.1e-2+deb7u7

Umm, here, you don't HAVE a key 3: each line has only two
blank-separated fields, so the key is empty. Again, as soon as you
disable last-resort comparison, you get the original input order:

$ LC_ALL=C sort foo --debug -k3 -s
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key
Version: 1.0.1e-2+deb7u12
^ no match for key
Version: 1.0.1e-2+deb7u7
^ no match for key
Version: 1.0.1e-2+deb7u11
^ no match for key

>
> Weird output using 'sort -n -u':
>
> Version: 1.0.1e-2+deb7u12

No, perfectly defined output. -u implicitly enables -s, and I already
demonstrated that -n on your input picks the initial empty string.
Since all 5 lines have the same sort key, there is only one unique key
seen, and the output is exactly the first line with that unique sort
key.
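
The rule described above can be sketched in a few lines of Python. This
is only a model of the behaviour discussed in this thread, not GNU
sort's implementation: the function names are hypothetical, and
leading_number() is a deliberately simplified stand-in for the -n key
(ASCII digits at the very start of the line, otherwise 0).

```python
from itertools import groupby

def leading_number(line):
    """Simplified stand-in for the -n key: leading ASCII digits, else 0."""
    digits = ""
    for ch in line:
        if ch not in "0123456789":
            break
        digits += ch
    return int(digits) if digits else 0

def sort_u(lines, key):
    """Model of 'sort -u': stable sort by key, keep the first line per key."""
    ordered = sorted(lines, key=key)  # sorted() is stable, like -s
    return [next(group) for _, group in groupby(ordered, key=key)]

def sort_then_uniq(lines, key):
    """Model of 'sort | uniq': whole-line tie-break, then drop repeated lines."""
    ordered = sorted(lines, key=lambda line: (key(line), line))
    return [line for line, _ in groupby(ordered)]

lines = [
    "Version: 1.0.1e-2+deb7u12",
    "Version: 1.0.1e-2+deb7u11",
    "Version: 1.0.1e-2+deb7u12",
    "Version: 1.0.1e-2+deb7u7",
    "Version: 1.0.1e-2+deb7u11",
]

# Every line has key 0, so only the first input line survives.
print(sort_u(lines, leading_number))
# The whole-line comparison keeps the three distinct lines.
print(sort_then_uniq(lines, leading_number))
```

In this model the first call prints only the u12 line, matching the
'sort -n -u' result in the report, while the second prints the u11, u12
and u7 lines, matching what 'sort -n | uniq' would keep.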
If you want to FORCE entire-line fallback, then request that as a
fallback key (since -n by itself is global to all keys, I instead
request two keys: the first as the numeric sort of the first field, the
second as the fallback sort of the entire line):

$ LC_ALL=C sort foo --debug -k1,1n -k1 -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u11
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u12
^ no match for key
_________________________
Version: 1.0.1e-2+deb7u7
^ no match for key
________________________

>
> Weird output using 'sort -k 3 -u':
>
> Version: 1.0.1e-2+deb7u12

Again, as proven above, all 5 lines have the same empty string (no such
key at the end of the line), so the unique output is correct.

>
> So is this actually the expected behaviour? I would have thought from
> the documentation that -u would return unique lines of output, not just
> one line based on whatever sort key it happened to look at.

Yes, sort -u is required to treat lines as unique solely based on the
key(s) they were sorted by (and ignoring the default last-resort key,
since -u implicitly enables -s).

As this behavior is required by POSIX and consistent with other
implementations, I'm closing it as not a bug. But if you have further
comments or questions, you can continue to reply to this email.

By the way, have you looked at sort -V, as a way to get what you appear
to want?

$ LC_ALL=C sort foo --debug -V -u
sort: using simple byte comparison
Version: 1.0.1e-2+deb7u7
________________________
Version: 1.0.1e-2+deb7u11
_________________________
Version: 1.0.1e-2+deb7u12
_________________________

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 15 16:22:12 2014
From: "Lennart Sorensen"
Date: Fri, 15 Aug 2014 16:22:07 -0400
To: 18273@debbugs.gnu.org
Subject: Re: bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used)
Message-ID: <20140815202207.GE17765@csclub.uwaterloo.ca>
References: <53EE6429.4050107@redhat.com> <20140815193011.GB17765@csclub.uwaterloo.ca>

On Fri, Aug 15, 2014 at 07:50:04PM +0000, GNU bug Tracking System wrote:
> Your bug report
>
> #18273: sort seems to misbehave if both -u and -n or -k are used
>
> which was filed against the coreutils package, has been closed.
>
> The explanation is attached below, along with your original report.
> If you require more details, please reply to 18273@debbugs.gnu.org.
>
> --
> 18273: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18273
> GNU Bug Tracking System
> Contact help-debbugs@gnu.org with problems
> From: Eric Blake
> To: Lennart Sorensen,
> 18273-done@debbugs.gnu.org
> Subject: Re: bug#18273: sort seems to misbehave if both -u and -n or -k are
> used
>
> tag 18273 notabug
> thanks
>
> On 08/15/2014 01:30 PM, Lennart Sorensen wrote:
> > Here is the case that has me thinking there is a bug (it sure doesn't
> > make sense as valid behaviour).
>
> Thanks for the report. However, the behavior you have demonstrated is
> required by POSIX, and is therefore not a bug. The --debug option can
> be used to see what is really happening.

OK I accept that it is correct behaviour.

The documentation on the other hand is awful in that case.
I went and checked the documentation to try and make sense of what it
was doing before sending the report, and there was nothing there that
gave any hint that this was expected behaviour.

Why does it have a blob talking about which options implicitly enable
-s, rather than mention that in the documentation for the options that
do it?

Why does it not mention for -n that anything that isn't a number is
ignored and treated as if it didn't exist when it comes to deciding
things like uniqueness? Are people expected to go read the POSIX
standard instead?

--
Len Sorensen

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 15 16:32:23 2014
From: Eric Blake
Organization: Red Hat, Inc.
Date: Fri, 15 Aug 2014 14:32:14 -0600
To: Lennart Sorensen, 18273@debbugs.gnu.org
Subject: Re: bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used)
Message-ID: <53EE6E4E.90409@redhat.com>
References: <53EE6429.4050107@redhat.com> <20140815193011.GB17765@csclub.uwaterloo.ca> <20140815202207.GE17765@csclub.uwaterloo.ca>
In-Reply-To: <20140815202207.GE17765@csclub.uwaterloo.ca>

On 08/15/2014 02:22 PM, Lennart Sorensen wrote:
> OK I accept that it is correct behaviour.
>
> The documentation on the other hand is awful in that case. I went and
> checked the documentation to try and make sense of what it was doing
> before sending the report, and there was nothing there that gave any
> hint that this was expected behaviour.

'info sort' says:

The '--stable' ('-s') option
disables this "last-resort comparison" so that lines in which all fields
compare equal are left in their original relative order. The '--unique'
('-u') option also disables the last-resort comparison.

and later on:

'-u'
'--unique'

     Normally, output only the first of a sequence of lines that compare
     equal. For the '--check' ('-c' or '-C') option, check that no pair
     of consecutive lines compares equal.

     This option also disables the default last-resort comparison.

     The commands 'sort -u' and 'sort | uniq' are equivalent, but this
     equivalence does not extend to arbitrary 'sort' options. For
     example, 'sort -n -u' inspects only the value of the initial
     numeric string when checking for uniqueness, whereas 'sort -n |
     uniq' inspects the entire line. *Note uniq invocation::.

>
> Why does it have a blob talking about which options implicitly enable -s,
> rather than mention that in the documentation for the options that do it.

-u is the only option that implicitly enables -s.

You are welcome to propose a patch to the documentation that would
clarify the situation; we can reopen this bug if a patch materializes.
Maybe even a change to 'sort --help' output to mention that -u implies
-s (which would also feed the 'man sort' page).

>
> Why does it not mention for -n that anything that isn't a number is
> ignored and treated as if it didn't exist when it comes to deciding
> things like uniqueness? Are people expected to go read the posix
> standard instead?

The info page DOES mention this:

'-n'
'--numeric-sort'
'--sort=numeric'
     Sort numerically. The number begins each line and consists of
     optional blanks, an optional '-' sign, and zero or more digits
     possibly separated by thousands separators, optionally followed by
     a decimal-point character and zero or more digits. An empty number
     is treated as '0'. The 'LC_NUMERIC' locale specifies the
     decimal-point character and thousands separator. By default a
     blank is a space or a tab, but the 'LC_CTYPE' locale can change
     this.

The --help output is intentionally terse, so I don't know what we could
do there to make it more obvious without exploding the size of what is
supposed to be brief.
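
To make the quoted '-n' description concrete, here is a minimal Python
sketch of how a leading number could be read from a line under those
rules. It is an illustration, not GNU sort's parser: the names are
hypothetical, and it assumes C-locale conventions only (a blank is a
space or tab, '.' is the decimal point, and there is no thousands
separator).

```python
import re

# Assumption: C-locale rules only. Optional blanks, an optional '-',
# zero or more digits, then optionally '.' and zero or more digits.
NUMBER_PREFIX = re.compile(r"[ \t]*(-?)([0-9]*)(?:\.([0-9]*))?")

def numeric_sort_key(line):
    """Return the number at the start of the line; an empty number is 0."""
    sign, whole, frac = NUMBER_PREFIX.match(line).groups()
    if not whole and not frac:
        return 0.0  # nothing numeric at the front of the line
    value = float((whole or "0") + "." + (frac or "0"))
    return -value if sign else value

for sample in ["10 apples", "  -3.5kg", "7u12", "Version: 1.0.1e-2+deb7u12"]:
    print(repr(sample), "->", numeric_sort_key(sample))
```

Under these assumptions a line that starts with text, such as the
"Version: ..." lines in this report, yields an empty number and so a key
of 0, whatever digits appear later in the line.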

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 15 17:05:29 2014
From: "Lennart Sorensen"
Date: Fri, 15 Aug 2014 17:05:23 -0400
To: Eric Blake
Cc: 18273@debbugs.gnu.org
Subject: Re: bug#18273: closed (Re: bug#18273: sort seems to misbehave if both -u and -n or -k are used)
Message-ID: <20140815210523.GF17765@csclub.uwaterloo.ca>
References: <53EE6429.4050107@redhat.com> <20140815193011.GB17765@csclub.uwaterloo.ca> <20140815202207.GE17765@csclub.uwaterloo.ca> <53EE6E4E.90409@redhat.com>
In-Reply-To: <53EE6E4E.90409@redhat.com>

On Fri, Aug 15, 2014 at 02:32:14PM -0600, Eric Blake wrote:
> 'info sort' says:
>
> The '--stable' ('-s') option
> disables this "last-resort comparison" so that lines in which all fields
> compare equal are left in their original relative order. The '--unique'
> ('-u') option also disables the last-resort comparison.
>
> and later on:
>
> '-u'
> '--unique'
>
> Normally, output only the first of a sequence of lines that compare
> equal. For the '--check' ('-c' or '-C') option, check that no pair
> of consecutive lines compares equal.
>
> This option also disables the default last-resort comparison.
>
> The commands 'sort -u' and 'sort | uniq' are equivalent, but this
> equivalence does not extend to arbitrary 'sort' options. For
> example, 'sort -n -u' inspects only the value of the initial
> numeric string when checking for uniqueness, whereas 'sort -n |
> uniq' inspects the entire line. *Note uniq invocation::.

OK I guess that does somewhat point out the behaviour.

> -u is the only option that implicitly enables -s.
>
> You are welcome to propose a patch to the documentation that would
> clarify the situation; we can reopen this bug if a patch materializes.
> Maybe even a change to 'sort --help' output to mention that -u implies
> -s (which would also feed the 'man sort' page).

I do wonder why there isn't an option to undo that implicit option, but
perhaps it would not actually make sense.

> The info page DOES mention this:
>
> '-n'
> '--numeric-sort'
> '--sort=numeric'
> Sort numerically. The number begins each line and consists of
> optional blanks, an optional '-' sign, and zero or more digits
> possibly separated by thousands separators, optionally followed by
> a decimal-point character and zero or more digits. An empty number
> is treated as '0'. The 'LC_NUMERIC' locale specifies the
> decimal-point character and thousands separator. By default a
> blank is a space or a tab, but the 'LC_CTYPE' locale can change
> this.
>
> The --help output is intentionally terse, so I don't know what we could
> do there to make it more obvious without exploding the size of what is
> supposed to be brief.

Well I always thought info was meant to be complete documentation.
I see nothing in the above that makes me think it would ignore the part
of the line that isn't a number. The part in -u does seem to point out
that this is the behaviour.

I think this might be the first time I ever used -n when the input was
not pure numbers, so I never hit this before.

--
Len Sorensen

From unknown Tue Jun 17 01:46:42 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 13 Sep 2014 11:24:03 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator