From unknown Thu Jun 19 14:05:35 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#22109 <22109@debbugs.gnu.org> To: bug#22109 <22109@debbugs.gnu.org> Subject: Status: Sort gives incorrect order when changing delimiters Reply-To: bug#22109 <22109@debbugs.gnu.org> Date: Thu, 19 Jun 2025 21:05:35 +0000 retitle 22109 Sort gives incorrect order when changing delimiters reassign 22109 coreutils submitter 22109 Ed Brambley severity 22109 normal tag 22109 notabug thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 11:16:31 2015 Received: (at submit) by debbugs.gnu.org; 7 Dec 2015 16:16:31 +0000 Received: from localhost ([127.0.0.1]:41870 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5ySl-0006TM-1h for submit@debbugs.gnu.org; Mon, 07 Dec 2015 11:16:31 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42692) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5xpp-0005Y7-7g for submit@debbugs.gnu.org; Mon, 07 Dec 2015 10:36:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a5xpo-00006C-21 for submit@debbugs.gnu.org; Mon, 07 Dec 2015 10:36:17 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:60278) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a5xpn-000067-Vk for submit@debbugs.gnu.org; Mon, 07 Dec 2015 10:36:16 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52676) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a5xpm-0005ae-Lw for bug-coreutils@gnu.org; Mon, 07 Dec 2015 10:36:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) 
(envelope-from ) id 1a5xpl-00005O-HJ for bug-coreutils@gnu.org; Mon, 07 Dec 2015 10:36:14 -0500 Received: from mail-qg0-x234.google.com ([2607:f8b0:400d:c04::234]:34629) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a5xpl-00005G-9w for bug-coreutils@gnu.org; Mon, 07 Dec 2015 10:36:13 -0500 Received: by qgeb1 with SMTP id b1so147450968qge.1 for ; Mon, 07 Dec 2015 07:36:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=HLCouaWMPtRbVAB4GRVPvgCXiLemA0jvjeN19LBxfC0=; b=RDNeCV9yfYmPXmSLkuggP/HfXQrK7KMIzm8pQXBdBdHOChq5E7pK+Pcxun7S1DOdjA NerOCsdUfLT/OP7+JLJ9lT2jDOhkv7w31GjeUD3Kz7ObPv4Niqhlj837V7+sFEGGs4G3 3odBFMuUfNE6HLbwWRyppPHkVj2B3FXSu6hXamZi9A6y+JbWj5OM6kqRzrQlyJI4BjH4 VbAvSv1Ti81hSyXANd1/b30BlswlI2SmeYmVYhxbiAHHLkC3fDWLXlgviax6lX1R5eD/ 9UM11uRoYQ9cWfScXzC5rnxFYENDwlsWea6W6pmfXCIGXthTPQoN0Qkn3flpfb7xoeXX v4lA== MIME-Version: 1.0 X-Received: by 10.140.98.117 with SMTP id n108mr37286867qge.56.1449502572652; Mon, 07 Dec 2015 07:36:12 -0800 (PST) Received: by 10.55.4.140 with HTTP; Mon, 7 Dec 2015 07:36:12 -0800 (PST) Date: Mon, 7 Dec 2015 15:36:12 +0000 Message-ID: Subject: Sort gives incorrect order when changing delimiters From: Ed Brambley To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary=001a113abcb0f1cb920526509d5b X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 07 Dec 2015 11:16:28 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --001a113abcb0f1cb920526509d5b Content-Type: text/plain; 
charset=UTF-8

The following problem came to light following a StackOverflow question [1]. The lexical ordering of sort appears to depend on the delimiter used, and I believe it shouldn't. As a minimal example:

### Correct ordering ###
$ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
1,a,1
2,aa,2

### Incorrect ordering by replacing the "," delimiter by "~" ###
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
2~aa~2
1~a~1

I think this is because, in ASCII, "," < "a" < "~".

Many thanks,
Ed

[1] http://stackoverflow.com/questions/34134677/trying-to-understand-the-sort-utilty-in-linux

--001a113abcb0f1cb920526509d5b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a113abcb0f1cb920526509d5b-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 11:48:45 2015 Received: (at 22109) by debbugs.gnu.org; 7 Dec 2015 16:48:45 +0000 Received: from localhost ([127.0.0.1]:41899 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5yxx-0000bZ-AF for submit@debbugs.gnu.org; Mon, 07 Dec 2015 11:48:45 -0500 Received: from mail-qg0-f53.google.com ([209.85.192.53]:34504) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5yxv-0000bQ-2p for 22109@debbugs.gnu.org; Mon, 07 Dec 2015 11:48:43 -0500 Received: by qgeb1 with SMTP id b1so149473549qge.1 for <22109@debbugs.gnu.org>; Mon, 07 Dec 2015 08:48:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-type:content-transfer-encoding; bh=KkG7HvwwDvBP1zT7Ni008Cf54DqYWxHPPrVfpJXUg+g=; b=jd5Z38lw5GakwZ6P5x93Reg/PWx0wqGmNK3P6+ZYyON0qu2tgYj9eHz+Phaq8N0hw1 g0M1b2tOmqwTrd/0GMfQIHAkA3jZ8cFik1mUSsvn25j7Y4mKA8aHnvLQpUQUPYzlP9m0 ziBpdwypP7KhfTosZVq+6wFuUy+0utSrxE2ww6XFTZzGfapI74mXibShIf8Q/8irUFvv g2KD7tPxLvZckSLbSEw/ooYHilNYDQ9s2xKTvA8sR0m9br8xA4DmN4nl9WUHKa14x31w ySgsTn4OT8xRLj6/5cGIlD7wraeq2xNdHunen21P6IerMzyvci7wZl7eb8ixntw3XA1h kpqg== X-Received: by 10.140.36.232 with SMTP id p95mr37922032qgp.55.1449506922633; Mon, 07 Dec 2015 08:48:42 -0800 (PST) Received: from disco.erlich.nygenome.org ([69.74.14.178]) by smtp.googlemail.com with ESMTPSA id n138sm11889171qhc.31.2015.12.07.08.48.41 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 07 Dec 2015 08:48:41 -0800 (PST) Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters To: Ed Brambley , 22109@debbugs.gnu.org References: From: Assaf Gordon Message-ID: <5665B8A3.3020109@gmail.com> Date: Mon, 7 Dec 2015 11:49:39 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; 
charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22109 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

tag 22109 notabug
close 22109
stop

Hello Ed,

On 12/07/2015 10:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1]. The lexical ordering of sort appears to depend on the delimiter used, and I believe it shouldn't. As a minimal example:
>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2
>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1
>

This is not a bug in 'sort', but simply an incorrect usage of the key options.

The parameter "-k2" means: use the second field *and all characters until the end of the line* as the key for sorting each line. In this case, the delimiter that follows the second field (',' or '~') does come into play.

The correct usage is to specify the key as "-k2,2", meaning: sort by the second field alone (then resolve equal keys by comparing the entire line, unless --stable is used).

$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2,2 -t~
1~a~1
2~aa~2

Using sort's "--debug" option will illustrate the difference (notice the underscore characters indicating what is the key that is being used):

Incorrect usage (-k2):

$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2 -t~
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____

Better usage (-k2,2):

$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2,2 -t~
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 11:49:17 2015 Received: (at control) by debbugs.gnu.org; 7 Dec 2015 16:49:17 +0000 Received: from localhost ([127.0.0.1]:41906 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5yyT-0000d0-AS for submit@debbugs.gnu.org; Mon, 07 Dec 2015 11:49:17 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51165) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5yyQ-0000ck-8e; Mon, 07 Dec 2015 11:49:15 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id 3DB8C8E688; Mon, 7 Dec 2015 16:49:13 +0000 (UTC) Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tB7GnCeJ003812; Mon, 7 Dec 2015 11:49:12 -0500 Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters To: Ed Brambley , 22109-done@debbugs.gnu.org References: From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg X-Enigmail-Draft-Status: N1110 Organization: Red Hat, Inc. 
Message-ID: <5665B884.1080407@redhat.com> Date: Mon, 7 Dec 2015 09:49:08 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="vcdTvDl7BxEiOGNjhl41Blpv2cBRVvxKb" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --vcdTvDl7BxEiOGNjhl41Blpv2cBRVvxKb Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

tag 22109 notabug
thanks

On 12/07/2015 08:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1].
> The lexical ordering of sort appears to depend on the delimiter used, and I
> believe it shouldn't. As a minimal example:

Thanks for the report. However, you have not found a bug in sort, only in your misuse of the command line and in your incorrect assumptions. Let's investigate further with the --debug option:

>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2

$ printf '1,a,1\n2,aa,2' | LC_ALL=C sort -k2 -t, --debug
sort: using simple byte comparison
1,a,1
  ___
_____
2,aa,2
  ____
______

You are comparing the string "a,1" with "aa,2"; so the relative relation between ',' and 'a' matters.

>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1

Same goes for here.

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2 -t~ --debug
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____

You compared the string "aa~2" with "a~1".

>
> I think this is because, in ASCII, "," < "a" < "~".

Yes, so you saw exactly what you asked for. But what you asked for ("sort starting from the second field through to the end of the line") is probably not what you wanted. It sounds like you wanted "sort on ONLY the second field", which is spelled differently:

$ printf '1~a~1\n2~aa~2' | LC_ALL=C sort -k2,2 -t~ --debug
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______

Note that there is a very distinct difference between '-k2' and '-k2,2'; only the latter one limits the sort key to JUST the second field ("a" vs. "aa", regardless of delimiter), while the former slurps in the rest of the line such that the spelling of the delimiter affects the result.

I'm marking this as not a bug in the database, but feel free to add further comments.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

--vcdTvDl7BxEiOGNjhl41Blpv2cBRVvxKb
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJWZbiIAAoJEKeha0olJ0Nq9/wH/0iU24Fg2Kb5ggzsJP2DGOPc
ov0tirk4EJYnK1GMC2XbrIN/qg3Soby2OHhyExrv6GDbfojRoCteE1PMTp1miksg
8Qqj9SM/cnkw6/45CkUImlR/33w+9XFZwHNMimZvtX7nnKEStNX/jVcanGA9rEwS
7GeBTLnpyLlwuRO5+S8iOOZ2Y09KCJEs+wQ74dFEGisD1sW8fgIuL8mp8HGndOSg
aevJt55NyOLtboAscbUCHtTne3ltnDWF0cpOgSBIHvRIn88o4znXq2iq91esgnMa
7OfaXlU6DDdpim4uUJkpE1xkmP3gV4f5S3vISfuAZ2V2m1aHYH9PdHXi+HROLMM=
=YTTx
-----END PGP SIGNATURE-----

--vcdTvDl7BxEiOGNjhl41Blpv2cBRVvxKb--

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 
13:07:27 2015 Received: (at 22109) by debbugs.gnu.org; 7 Dec 2015 18:07:27 +0000 Received: from localhost ([127.0.0.1]:41964 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a60C6-0005mT-E8 for submit@debbugs.gnu.org; Mon, 07 Dec 2015 13:07:27 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:45077) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a60C3-0005mD-7l for 22109@debbugs.gnu.org; Mon, 07 Dec 2015 13:07:24 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id A3CD01605E4; Mon, 7 Dec 2015 10:07:21 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id FGZLqJsEcvwt; Mon, 7 Dec 2015 10:07:20 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 35FD2160817; Mon, 7 Dec 2015 10:07:20 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id bc7vhxt-DRtO; Mon, 7 Dec 2015 10:07:20 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1A7731605E4; Mon, 7 Dec 2015 10:07:20 -0800 (PST) Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters To: 22109@debbugs.gnu.org, edbrambley@gmail.com References: <5665B884.1080407@redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5665CAD7.30303@cs.ucla.edu> Date: Mon, 7 Dec 2015 10:07:19 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <5665B884.1080407@redhat.com> Content-Type: multipart/mixed; boundary="------------070607060006010801080107" X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22109 X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) This is a multi-part message in MIME format. --------------070607060006010801080107 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit This confusion happens often enough that I installed the attached documentation patch to try to make things clearer. --------------070607060006010801080107 Content-Type: text/x-patch; name="0001-doc-promote-sort-debug.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-doc-promote-sort-debug.patch" >From bb1f0a9cd18bc6fa9cf83f23d95929cbab36bfcb Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Mon, 7 Dec 2015 10:03:52 -0800 Subject: [PATCH] doc: promote 'sort --debug' * README, doc/coreutils.texi (Introduction, sort invocation): Suggest 'sort --debug' more prominently. --- README | 20 ++++++++----- doc/coreutils.texi | 86 ++++++++++++++++++++++++++++++------------------------ 2 files changed, 61 insertions(+), 45 deletions(-) diff --git a/README b/README index f50b6df..183d4f8 100644 --- a/README +++ b/README @@ -59,9 +59,9 @@ files (man/*.x) are welcome. However, the authoritative documentation is in texinfo form in the doc directory. -***************************************** +********************************************* On Mac OS X 10.5.1 (Darwin 9.1), test failure ------------------------------------------ +--------------------------------------------- Mac OS X 10.5.1 (Darwin 9.1) provides only partial (and incompatible) ACL support, so although "./configure && make" succeeds, "make check" @@ -82,9 +82,9 @@ the mean time, you can configure with --disable-nls. For details, see . 
-*********************** +********************* Pre-C99 build failure ------------------------ +--------------------- In 2009 we added this requirement: To build the coreutils from source, you must have a C99-conforming @@ -165,6 +165,15 @@ root than when run by less privileged users. Reporting bugs: --------------- +Send bug reports, questions, comments, etc. to bug-coreutils@gnu.org. +To suggest a patch, see the files README-hacking and HACKING for tips. + +If you have a problem with 'sort', try running 'sort --debug', as it +can can often help find and fix problems without having to wait for an +answer to a bug report. If the debug output does not suffice to fix +the problem on your own, please compress and attach it to the rest of +your bug report. + IMPORTANT: if you take the time to report a test failure, please be sure to include the output of running 'make check' in verbose mode for each failing test. For example, @@ -176,9 +185,6 @@ run this command: For some tests, you can get even more detail by adding DEBUG=yes. Then include the contents of the file 'log' in your bug report. -Send bug reports, questions, comments, etc. to bug-coreutils@gnu.org. -If you would like to suggest a patch, see the files README-hacking -and HACKING for tips. *************************************** diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 595cb9f..64b6206 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -516,10 +516,19 @@ will benefit. The GNU utilities documented here are mostly compatible with the POSIX standard. @cindex bugs, reporting -Please report bugs to @email{bug-coreutils@@gnu.org}. Remember -to include the version number, machine architecture, input files, and + +Please report bugs to @email{bug-coreutils@@gnu.org}. +Include the version number, machine architecture, input files, and any other information needed to reproduce the bug: your input, what you -expected, what you got, and why it is wrong. 
Diffs are welcome, but +expected, what you got, and why it is wrong. + +If you have a problem with @command{sort}, try running @samp{sort +--debug}, as it can can often help find and fix problems without +having to wait for an answer to a bug report. If the debug output +does not suffice to fix the problem on your own, please compress and +attach it to the rest of your bug report. + +Although diffs are welcome, please include a description of the problem as well, since this is sometimes difficult to infer. @xref{Bugs, , , gcc, Using and Porting GNU CC}. @@ -4001,6 +4010,42 @@ output. Synopsis: sort [@var{option}]@dots{} [@var{file}]@dots{} @end example +@cindex sort stability +@cindex sort's last-resort comparison +Many options affect how @command{sort} compares lines; if the results +are unexpected, try the @option{--debug} option to see what happened. +A pair of lines is compared as follows: +@command{sort} compares each pair of fields, in the +order specified on the command line, according to the associated +ordering options, until a difference is found or no fields are left. +If no key fields are specified, @command{sort} uses a default key of +the entire line. Finally, as a last resort when all keys compare +equal, @command{sort} compares entire lines as if no ordering options +other than @option{--reverse} (@option{-r}) were specified. The +@option{--stable} (@option{-s}) option disables this @dfn{last-resort +comparison} so that lines in which all fields compare equal are left +in their original relative order. The @option{--unique} +(@option{-u}) option also disables the last-resort comparison. +@vindex LC_ALL +@vindex LC_COLLATE + +Unless otherwise specified, all comparisons use the character collating +sequence specified by the @env{LC_COLLATE} locale.@footnote{If you +use a non-POSIX locale (e.g., by setting @env{LC_ALL} +to @samp{en_US}), then @command{sort} may produce output that is sorted +differently than you're accustomed to. 
In that case, set the @env{LC_ALL} +environment variable to @samp{C}@. Note that setting only @env{LC_COLLATE} +has two problems. First, it is ineffective if @env{LC_ALL} is also set. +Second, it has undefined behavior if @env{LC_CTYPE} (or @env{LANG}, if +@env{LC_CTYPE} is unset) is set to an incompatible value. For example, +you get undefined behavior if @env{LC_CTYPE} is @code{ja_JP.PCK} but +@env{LC_COLLATE} is @code{en_US.UTF-8}.} +A line's trailing newline is not part of the line for comparison +purposes. If the final byte of an input file is not a newline, GNU +@command{sort} silently supplies one. GNU @command{sort} (as +specified for all GNU utilities) has no limit on input line length or +restrictions on bytes allowed within lines. + @command{sort} has three modes of operation: sort (the default), merge, and check for sortedness. The following options change the operation mode: @@ -4042,41 +4087,6 @@ works. @end table -@cindex sort stability -@cindex sort's last-resort comparison -A pair of lines is compared as follows: -@command{sort} compares each pair of fields, in the -order specified on the command line, according to the associated -ordering options, until a difference is found or no fields are left. -If no key fields are specified, @command{sort} uses a default key of -the entire line. Finally, as a last resort when all keys compare -equal, @command{sort} compares entire lines as if no ordering options -other than @option{--reverse} (@option{-r}) were specified. The -@option{--stable} (@option{-s}) option disables this @dfn{last-resort -comparison} so that lines in which all fields compare equal are left -in their original relative order. The @option{--unique} -(@option{-u}) option also disables the last-resort comparison. 
- -@vindex LC_ALL -@vindex LC_COLLATE -Unless otherwise specified, all comparisons use the character collating -sequence specified by the @env{LC_COLLATE} locale.@footnote{If you -use a non-POSIX locale (e.g., by setting @env{LC_ALL} -to @samp{en_US}), then @command{sort} may produce output that is sorted -differently than you're accustomed to. In that case, set the @env{LC_ALL} -environment variable to @samp{C}@. Note that setting only @env{LC_COLLATE} -has two problems. First, it is ineffective if @env{LC_ALL} is also set. -Second, it has undefined behavior if @env{LC_CTYPE} (or @env{LANG}, if -@env{LC_CTYPE} is unset) is set to an incompatible value. For example, -you get undefined behavior if @env{LC_CTYPE} is @code{ja_JP.PCK} but -@env{LC_COLLATE} is @code{en_US.UTF-8}.} - -GNU @command{sort} (as specified for all GNU utilities) has no -limit on input line length or restrictions on bytes allowed within lines. -In addition, if the final byte of an input file is not a newline, GNU -@command{sort} silently supplies one. A line's trailing newline is not -part of the line for comparison purposes. 
- @cindex exit status of @command{sort} Exit status: -- 2.1.0 --------------070607060006010801080107-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 16:07:15 2015 Received: (at 22109) by debbugs.gnu.org; 7 Dec 2015 21:07:15 +0000 Received: from localhost ([127.0.0.1]:42037 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a6306-0004ta-JA for submit@debbugs.gnu.org; Mon, 07 Dec 2015 16:07:15 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51378) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a6304-0004tR-4q for 22109@debbugs.gnu.org; Mon, 07 Dec 2015 16:07:12 -0500 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 2D1615BA03; Mon, 7 Dec 2015 21:07:11 +0000 (UTC) Received: from [10.3.113.183] (ovpn-113-183.phx2.redhat.com [10.3.113.183]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tB7L7AkM016701; Mon, 7 Dec 2015 16:07:10 -0500 Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters To: Paul Eggert , 22109@debbugs.gnu.org, edbrambley@gmail.com References: <5665B884.1080407@redhat.com> <5665CAD7.30303@cs.ucla.edu> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg Organization: Red Hat, Inc. 
Message-ID: <5665F4FE.9040602@redhat.com> Date: Mon, 7 Dec 2015 14:07:10 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <5665CAD7.30303@cs.ucla.edu> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="2EnOu8FujFIRBq8GpWT9MNE18a7E1HUW1" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 22109 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --2EnOu8FujFIRBq8GpWT9MNE18a7E1HUW1 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

On 12/07/2015 11:07 AM, Paul Eggert wrote:
> This confusion happens often enough that I installed the attached
> documentation patch to try to make things clearer.

Should we also modify this paragraph in 'sort --help'? Maybe:

> KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
> field number and C a character position in the field; both are origin 1, and
> the stop position defaults to the line's end. If neither -t nor -b is in
> effect, characters in a field are counted from the beginning of the preceding
> whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV],
> which override global ordering options for that key. If no key is given, use
>-the entire line as the key.
>+the entire line as the key. Use --debug to diagnose incorrect key usage.

--=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --2EnOu8FujFIRBq8GpWT9MNE18a7E1HUW1 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJWZfT+AAoJEKeha0olJ0NqcWIH/1Z+l/Y+WSSaECZp8qedi90x aUHx+xaeGnx1yq3DXBfM6NComJYQ/nLw3NRV5tZpt334hgZC4adSdpaVTya8DWLG bEvBxrM1R2u9pO22UaL0iNkWO4IRfapLMBygvPzTnurhyJHLA6sryvDwAryzjpQ5 S8KN39Mk8gU0UMv9TF01+QXUa7ICj0RVhO4zTvvQBidJBltnI78TH9LbhxVPYLxD 0FseB/EVAqJYuJ5RYjkUELlcTApBD4/g7kW1BbSRDw6gCEXhkeeChfQpmEEuGBj9 vmd4ZGsw51VWovUEFNbiq1BZM4Q5igYifPODpeW0cTdwAFSsSfW05w3GCqiF+yA= =+39L -----END PGP SIGNATURE----- --2EnOu8FujFIRBq8GpWT9MNE18a7E1HUW1-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 07 16:08:25 2015 Received: (at 22109) by debbugs.gnu.org; 7 Dec 2015 21:08:25 +0000 Received: from localhost ([127.0.0.1]:42041 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a631F-0004vQ-BD for submit@debbugs.gnu.org; Mon, 07 Dec 2015 16:08:25 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:56793) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a631D-0004vI-EF for 22109@debbugs.gnu.org; Mon, 07 Dec 2015 16:08:24 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id B8D3C160D25; Mon, 7 Dec 2015 13:08:22 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id MzpG8sgD7KaM; Mon, 7 Dec 2015 13:08:21 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id B973F160817; Mon, 7 Dec 2015 13:08:19 -0800 (PST) X-Virus-Scanned: amavisd-new at 
zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id h9B6C3zSSqdc; Mon, 7 Dec 2015 13:08:19 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1FBF9160E64; Mon, 7 Dec 2015 13:08:18 -0800 (PST) Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters To: Eric Blake , 22109@debbugs.gnu.org, edbrambley@gmail.com References: <5665B884.1080407@redhat.com> <5665CAD7.30303@cs.ucla.edu> <5665F4FE.9040602@redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <5665F542.3060906@cs.ucla.edu> Date: Mon, 7 Dec 2015 13:08:18 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <5665F4FE.9040602@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22109 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) On 12/07/2015 01:07 PM, Eric Blake wrote: > Should we also modify this paragraph in 'sort --help'? Works for me. 
From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 08 04:50:57 2015 Received: (at 22109) by debbugs.gnu.org; 8 Dec 2015 09:50:57 +0000 Received: from localhost ([127.0.0.1]:42339 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a6EvA-0003hX-Dx for submit@debbugs.gnu.org; Tue, 08 Dec 2015 04:50:57 -0500 Received: from mail-qg0-f49.google.com ([209.85.192.49]:33030) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a6Eup-0003gU-KW for 22109@debbugs.gnu.org; Tue, 08 Dec 2015 04:50:54 -0500 Received: by qgea14 with SMTP id a14so11287748qge.0 for <22109@debbugs.gnu.org>; Tue, 08 Dec 2015 01:50:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=LVBtXoynoZodPjYJZVHcGLcyJCjzLFsDY2/7S/cgo2A=; b=F7xAX8KF2sNAzuVOJkxp1t9AtxRWF4uOG8W4ffDxi/ws+dxiSTQdP/vQkWu0STzq3W urDbJ9/iq8tr6cg2BgVfySY57vZXP/VhjU4cYtgwS68b6NTOV/IdWtxeVF+fqIuj6KHY a4BILODAmu8KgLXRD/TwLx2nGics0Fh5J8UdBbix6lmIYkdHMBn/nQzPr4muZqtvRhk5 y6QQW+qsCOp7sJpUy3Qfj5M2deKuObsOA9UZeOnSXuusS5tkBpfK7SFJiaWY4MriZ9mp F2NKhwieQW1Bpdcyg3uieKasVzeWzl60F2cdQTp0KhkoWCZW5UStTbqb2pNpgjrX7eYk YvJA== MIME-Version: 1.0 X-Received: by 10.140.172.3 with SMTP id s3mr3401841qhs.6.1449568234907; Tue, 08 Dec 2015 01:50:34 -0800 (PST) Received: by 10.55.4.140 with HTTP; Tue, 8 Dec 2015 01:50:34 -0800 (PST) In-Reply-To: <5665F542.3060906@cs.ucla.edu> References: <5665B884.1080407@redhat.com> <5665CAD7.30303@cs.ucla.edu> <5665F4FE.9040602@redhat.com> <5665F542.3060906@cs.ucla.edu> Date: Tue, 8 Dec 2015 09:50:34 +0000 Message-ID: Subject: Re: bug#22109: Sort gives incorrect order when changing delimiters From: Ed Brambley To: 22109@debbugs.gnu.org Content-Type: multipart/alternative; boundary=001a113a6e8eb82e4205265fe790 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22109 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a113a6e8eb82e4205265fe790 Content-Type: text/plain; charset=UTF-8

Dear All,

Thanks Assaf and Eric for the explanation. It's very well hidden in the man page. I know it would break backward compatibility (so don't do it) and, as Eric pointed out to me, would break POSIX compatibility, but I would think most people's expectation would be that -k2 is shorthand for -k2,2 rather than -k2,end.

Updating the documentation would really help. Your proposals so far seem good, but as far as I'm concerned they miss the main point, which is that *field separators are included in the comparison*. So I think Paul's update is a bit misleading, as it says "Sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left", but doesn't mention that it also uses the field separators when comparing fields.

If I'd seen the documentation suggesting using --debug, I would have used it, but I would still have reported a bug, since --debug would only have confirmed that sort was doing what I thought it was doing, which I thought was wrong.

So perhaps we could say somewhere in the documentation something like:

> KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a
> field number and C a character position in the field; both default to 1, and
> the stop position defaults to the line's end. Note that any field separators between
> the start and stop positions are also included in the comparison.

And also possibly something like:

> ... A line's trailing newline is not part of the line for comparison purposes, but field
> separators are included in the comparison...

Thanks again,
Ed

P.S. Sorry for emailing you directly, Eric. My fault for not replying all.
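To make the point about separators concrete, here is a minimal Python sketch. It is only a simplified model, not GNU sort's actual code. It assumes an explicit single-character separator (as with -t), bytewise comparison (as in the C locale), no ordering options, and a whole-line comparison as the tie-breaker. The names extract_key and sort_lines are made up for this illustration.

```python
def extract_key(line, sep, start_field, end_field=None):
    """Return the text a key like -kSTART[,END] would cover (simplified model).

    Fields are 1-based. If end_field is None, the key runs to the end of
    the line, so every separator after the start field is part of the key.
    """
    fields = line.split(sep)
    if end_field is None:
        selected = fields[start_field - 1:]
    else:
        selected = fields[start_field - 1:end_field]
    # Re-joining with sep puts the separators back inside the key.
    return sep.join(selected)


def sort_lines(lines, sep, start_field, end_field=None):
    """Sort by the extracted key, comparing bytes; ties fall back to the whole line."""
    def sort_key(line):
        key = extract_key(line, sep, start_field, end_field)
        return (key.encode("utf-8"), line.encode("utf-8"))
    return sorted(lines, key=sort_key)


if __name__ == "__main__":
    comma = ["1,ab,x", "1,a,y"]
    pipe = ["1|ab|x", "1|a|y"]
    print(sort_lines(comma, ",", 2))     # modelled on: sort -t, -k2
    print(sort_lines(pipe, "|", 2))      # modelled on: sort -t'|' -k2
    print(sort_lines(comma, ",", 2, 2))  # modelled on: sort -t, -k2,2
    print(sort_lines(pipe, "|", 2, 2))   # modelled on: sort -t'|' -k2,2
```

In this model the open-ended key for "1,ab,x" is "ab,x", and the comma sorts before "b", whereas "|" sorts after it, so the same data comes out in a different order once the delimiter changes. Restricting the key to the second field alone gives the same order for both delimiters.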
--001a113a6e8eb82e4205265fe790-- From unknown Thu Jun 19 14:05:35 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 05 Jan 2016 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator