From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 13:54:49 2020
Date: Wed, 25 Mar 2020 18:37:38 +0100
From: Richard Ipsum
To: bug-coreutils@gnu.org
Subject: sort: expected sort order when -c in use
Message-ID: <20200325173738.GA16172@persephone.openbsd.amsterdam>

Hi,

I'm trying to understand something and thought it would be good to ask
here.

I get different results for a case-insensitive sort using -c. My
understanding is that -f should lead to lower case characters with upper
case equivalents being converted to their upper case equivalents. This
doesn't seem to be happening for the C locale though.

% echo -e "aaaa\nAAAA" | LC_COLLATE=en_GB.UTF-8 sort -c -f -
% echo -e "aaaa\nAAAA" | LC_COLLATE=en_US.UTF-8 sort -c -f -
% echo -e "aaaa\nAAAA" | LC_COLLATE=C sort -c -f -
sort: -:2: disorder: AAAA

Is this considered a bug or an expected difference between the locales?

Thanks,
Richard

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 14:17:28 2020
Date: Wed, 25 Mar 2020 13:17:19 -0500
From: Eric Blake
Organization: Red Hat, Inc.
To: Richard Ipsum, 40226@debbugs.gnu.org
Subject: Re: bug#40226: sort: expected sort order when -c in use
References: <20200325173738.GA16172@persephone.openbsd.amsterdam>
In-Reply-To: <20200325173738.GA16172@persephone.openbsd.amsterdam>

On 3/25/20 12:37 PM, Richard Ipsum wrote:
> Hi,
>
> I'm trying to understand something and thought it would be good to ask
> here.
>
> I get different results for a case-insensitive sort using -c. My
> understanding is that -f should lead to lower case characters with upper
> case equivalents being converted to their upper case equivalents. This
> doesn't seem to be happening for the C locale though.
>
> % echo -e "aaaa\nAAAA" | LC_COLLATE=en_GB.UTF-8 sort -c -f -
> % echo -e "aaaa\nAAAA" | LC_COLLATE=en_US.UTF-8 sort -c -f -
> % echo -e "aaaa\nAAAA" | LC_COLLATE=C sort -c -f -
> sort: -:2: disorder: AAAA

First, 'echo -e' is not portable, so I'll be reproducing your example
with printf. And you are assuming that LC_ALL is not set (otherwise,
LC_COLLATE would have no impact); so I'll set LC_ALL to be sure.
Except that I can't reproduce your example (I'm using Fedora 31,
coreutils 8.31):

$ printf 'aaaa\nAAAA\n' | LC_ALL=en_US.UTF-8 sort -c -f -
sort: -:2: disorder: AAAA

So there's probably something different in the locale libraries and/or
your coreutils version on your system, compared to mine.

Next, let's debug things to see why:

$ printf 'aaaa\nAAAA\n' | LC_ALL=en_US.UTF-8 sort -c -f - --debug
sort: options '-c --debug' are incompatible

Oh, bummer - I don't know why we have that restriction. Okay, let's try
a slightly different approach:

$ printf 'aaaa\nAAAA\n' | LC_ALL=en_GB.UTF-8 sort -f - --debug
sort: text ordering performed using ‘en_GB.UTF-8’ sorting rules
AAAA
____
____
aaaa
____
____
$ printf 'aaaa\nAAAA\n' | LC_ALL=en_GB.UTF-8 sort -f - --debug -s
sort: text ordering performed using ‘en_GB.UTF-8’ sorting rules
aaaa
____
AAAA
____

See the difference? In the first case, sort is doing its default
case-insensitive comparison of the entire line (because you passed -f
but not -k), AND a stability comparison of the byte values of the entire
line (as shown by the two ____ lines per input). But in the second
case, when you add -s, the stability comparison is omitted. The two
lines are indeed different when the stability comparison is performed,
explaining why -c choked when -s is absent. Or put another way, -f
affects only -k, including the implied -k1 when you don't specify
anything, and not -s. So now that we know that, let's return to your
example:

$ printf 'aaaa\nAAAA\n' | LC_ALL=en_GB.UTF-8 sort -f - -c -s
$ echo $?
0

>
> Is this considered a bug or an expected difference between the locales?

I don't know if it's the locale definition, or something changed between
coreutils versions, or both; although I'm more likely to chalk it up to
locale issues and not something where coreutils needs a patch, other
than perhaps a documentation patch.
I'll leave the bug report itself open for a bit longer, in case anyone
else has an opinion.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 17:16:28 2020
Date: Wed, 25 Mar 2020 21:02:32 +0100
From: Richard Ipsum
To: Eric Blake
Cc: 40226@debbugs.gnu.org
Subject: Re: bug#40226: sort: expected sort order when -c in use
Message-ID: <20200325200232.GB16172@persephone.openbsd.amsterdam>
References: <20200325173738.GA16172@persephone.openbsd.amsterdam>

On Wed, Mar 25, 2020 at 01:17:19PM -0500, Eric Blake wrote:
> On 3/25/20 12:37 PM, Richard Ipsum wrote:
[snip]
>
> See the difference? In the first case, sort is doing its default
> case-insensitive comparison of the entire line (because you passed -f but
> not -k), AND a stability comparison of the byte values of the entire line
> (as shown by the two ____ lines per input). But in the second case, when
> you add -s, the stability comparison is omitted. The two lines are indeed
> different when the stability comparison is performed, explaining why -c
> choked when -s is absent. Or put another way, -f affects only -k, including
> the implied -k1 when you don't specify anything, and not -s. So now that we
> know that, let's return to your example:

I'm trying to understand this relative to POSIX, which makes no mention
of stability as far as I can see (and there is no -s in POSIX). POSIX
says that -f should override the default ordering rules. I don't
understand why the last-resort comparison is required when -c is in use,
since we're not sorting with -c, just checking if the input is already sorted?

Put another way should -c imply -s ?

Thanks,
Richard

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 17:35:58 2020
Date: Wed, 25 Mar 2020 16:35:47 -0500
From: Eric Blake
Organization: Red Hat, Inc.
To: Richard Ipsum
Cc: 40226@debbugs.gnu.org
Subject: Re: bug#40226: sort: expected sort order when -c in use
References: <20200325173738.GA16172@persephone.openbsd.amsterdam>
 <20200325200232.GB16172@persephone.openbsd.amsterdam>
In-Reply-To: <20200325200232.GB16172@persephone.openbsd.amsterdam>

On 3/25/20 3:02 PM, Richard Ipsum wrote:
> On Wed, Mar 25, 2020 at 01:17:19PM -0500, Eric Blake wrote:
>> On 3/25/20 12:37 PM, Richard Ipsum wrote:
> [snip]
>>
>> See the difference? In the first case, sort is doing its default
>> case-insensitive comparison of the entire line (because you passed -f but
>> not -k), AND a stability comparison of the byte values of the entire line
>> (as shown by the two ____ lines per input). But in the second case, when
>> you add -s, the stability comparison is omitted. The two lines are indeed
>> different when the stability comparison is performed, explaining why -c
>> choked when -s is absent. Or put another way, -f affects only -k, including
>> the implied -k1 when you don't specify anything, and not -s. So now that we
>> know that, let's return to your example:
>
> I'm trying to understand this relative to POSIX, which makes no mention
> of stability as far as I can see (and there is no -s in POSIX). POSIX
> says that -f should override the default ordering rules.
> I don't understand why the last-resort comparison is required when -c is
> in use, since we're not sorting with -c, just checking if the input is
> already sorted?

POSIX states [sort description]:

"If this collating sequence does not have a total ordering of all
characters (see XBD LC_COLLATE), any lines of input that collate equally
should be further compared byte-by-byte using the collating sequence for
the POSIX locale."

As I understand it, this is true even when -f modifies the collating
sequence to compare all lowercase characters as their uppercase
equivalent.

But POSIX further states [XBD LC_COLLATE]:

"All implementation-provided locales (either preinstalled or provided as
locale definitions which can be installed later) should define a
collation sequence that has a total ordering of all characters unless
the locale name has an '@' modifier indicating that it has a special
collation sequence (for example, @icase could indicate that each upper
and lowercase character pair collates equally).

Notes:

A future version of this standard may require these locales to define a
collation sequence that has a total ordering of all characters (by
changing "should" to "shall").

Users installing their own locales should ensure that they define a
collation sequence with a total ordering of all characters unless an '@'
modifier in the locale name (such as @icase ) indicates that it has a
special collation sequence."

>
> Put another way should -c imply -s ?

Maybe we compromise, and state that -c implies -s only for locales that
do not include @ in their name (that is, if a locale already guarantees
a total ordering of all characters, then even when -f collapses
lowercase into uppercase, we don't need the final-resort comparison; but
if a locale does not guarantee total ordering, the -s has to be
explicit)?

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
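
The compromise suggested above is only a proposal; nothing in this
thread says that sort behaves this way. As a minimal sketch, assuming
GNU sort and a POSIX shell, the idea can be emulated from outside sort.
The helper names (check_sample, has_modifier, check_with_compromise)
are hypothetical, and '@icase' is simply the modifier POSIX uses as an
example. The C locale is used because it is always available:

```shell
#!/bin/sh
# Sketch only: assumes GNU sort. All helper names are hypothetical.

# Run "sort -c -f" on the two-line sample from the report.
# $1 is the locale; remaining arguments are extra sort options.
# Returns sort's exit status (0 = sorted, 1 = disorder).
check_sample() {
    loc=$1
    shift
    printf 'aaaa\nAAAA\n' | LC_ALL=$loc sort -c -f "$@" - 2>/dev/null
}

# True if a locale name carries an '@' modifier (for example "...@icase").
has_modifier() {
    case $1 in
        *@*) return 0 ;;
        *)   return 1 ;;
    esac
}

# The proposed compromise, emulated outside sort: add -s automatically
# unless the locale name contains '@'.
check_with_compromise() {
    if has_modifier "$1"; then
        check_sample "$1"
    else
        check_sample "$1" -s
    fi
}

plain=0;      check_sample C            || plain=$?
stable=0;     check_sample C -s         || stable=$?
compromise=0; check_with_compromise C   || compromise=$?

echo "sort -c -f      exit status: $plain"
echo "sort -c -f -s   exit status: $stable"
echo "compromise (C)  exit status: $compromise"
```

With GNU sort the first check reports disorder (exit status 1), as in
the original report, while the other two succeed because -s disables
the last-resort comparison.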