From unknown Sun Jun 22 00:14:55 2025
From: bug#59275 <59275@debbugs.gnu.org>
To: bug#59275 <59275@debbugs.gnu.org>
Subject: Status: Unexpected return value of `string-collate-lessp' on Mac
Reply-To: bug#59275 <59275@debbugs.gnu.org>
Date: Sun, 22 Jun 2025 07:14:55 +0000

retitle 59275 Unexpected return value of `string-collate-lessp' on Mac
reassign 59275 emacs
submitter 59275 Ihor Radchenko
severity 59275 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 14 23:07:42 2022
From: Ihor Radchenko
To: bug-gnu-emacs@gnu.org
Subject: Unexpected return value of `string-collate-lessp' on Mac
Date: Tue, 15 Nov 2022 04:08:13 +0000
Message-ID: <87zgcsdfma.fsf@localhost>

Hi,

I am forwarding an issue originally reported on the Org mailing list.
https://orgmode.org/list/m2ilkwso8r.fsf@me.com

On Emacs 29 (adaa2fc90e) MacOS build:

(string-collate-lessp "a" "B" "C" t) ; => nil

On Linux:

(string-collate-lessp "a" "B" "C" t) ; => t

The return value on MacOS is unexpected.
See more information, including locale data, in the Org ML thread.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 04:51:51 2022
From: Robert Pluim
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <87zgcsdfma.fsf@localhost>
Date: Tue, 15 Nov 2022 10:51:42 +0100
Message-ID: <87wn7wczpt.fsf@gmail.com>

>>>>> On Tue, 15 Nov 2022 04:08:13 +0000, Ihor Radchenko said:

    Ihor> Hi,
    Ihor> I am forwarding an issue originally reported on the Org mailing list.
    Ihor> https://orgmode.org/list/m2ilkwso8r.fsf@me.com

    Ihor> On Emacs 29 (adaa2fc90e) MacOS build:

    Ihor> (string-collate-lessp "a" "B" "C" t) ; => nil

    Ihor> On Linux:

    Ihor> (string-collate-lessp "a" "B" "C" t) ; => t

    Ihor> The return value on MacOS is unexpected.
    Ihor> See more information, including locale data, in the Org ML thread.

I think this is expected. See the long thread on emacs-devel back in
July, eg
https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html

(it resulted in the addition of `string-equal-ignore-case')

Robert
-- 

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 08:46:02 2022
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <87zgcsdfma.fsf@localhost>
Date: Tue, 15 Nov 2022 15:46:07 +0200
Message-Id: <83iljgib4w.fsf@gnu.org>

> From: Ihor Radchenko
> Date: Tue, 15 Nov 2022 04:08:13 +0000
>
> I am forwarding an issue originally reported on the Org mailing list.
> https://orgmode.org/list/m2ilkwso8r.fsf@me.com
>
> On Emacs 29 (adaa2fc90e) MacOS build:
>
> (string-collate-lessp "a" "B" "C" t) ; => nil
>
> On Linux:
>
> (string-collate-lessp "a" "B" "C" t) ; => t
>
> The return value on MacOS is unexpected.

string-collate-lessp is inherently platform- (and locale-) dependent.
Don't use it if you want consistent results across platforms and
locales.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 10:05:16 2022
From: Ihor Radchenko
To: Eli Zaretskii
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <83iljgib4w.fsf@gnu.org>
Date: Tue, 15 Nov 2022 15:05:48 +0000
Message-ID: <87h6z0cl6b.fsf@localhost>

Eli Zaretskii writes:

>> On Emacs 29 (adaa2fc90e) MacOS build:
>>
>> (string-collate-lessp "a" "B" "C" t) ; => nil
>>
>> On Linux:
>>
>> (string-collate-lessp "a" "B" "C" t) ; => t
>>
>> The return value on MacOS is unexpected.
>
> string-collate-lessp is inherently platform- (and locale-) dependent.
> Don't use it if you want consistent results across platforms and
> locales.

Is there a better alternative?

Also, do I miss something, or is this pitfall not documented in the
docstring of `string-collate-lessp'?

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 10:16:10 2022
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <87h6z0cl6b.fsf@localhost>
Date: Tue, 15 Nov 2022 17:16:14 +0200
Message-Id: <837czwi6yp.fsf@gnu.org>

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Tue, 15 Nov 2022 15:05:48 +0000
>
> Eli Zaretskii writes:
>
> > string-collate-lessp is inherently platform- (and locale-) dependent.
> > Don't use it if you want consistent results across platforms and
> > locales.
>
> Is there a better alternative?

Alternative to do what job?

> Also, do I miss something, or is this pitfall not documented in the
> docstring of `string-collate-lessp'?

It isn't? then what is this about:

  This function obeys the conventions for collation order in your
  locale settings.  For example, punctuation and whitespace characters
  might be considered less significant for sorting:

  (sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
       => ("11" "1 1" "1.1" "12" "1 2" "1.2")
  [...]
  To emulate Unicode-compliant collation on MS-Windows systems,
  bind ‘w32-collate-ignore-punctuation’ to a non-nil value, since
  the codeset part of the locale cannot be "UTF-8" on MS-Windows.

The ELisp manual says in addition:

     This behavior is system-dependent; e.g., punctuation and whitespace
     are never ignored on Cygwin, regardless of locale.

If this doesn't have a big WARNING sign near it, then what would?
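
For readers who want to try the binding that the quoted docstring
mentions, a minimal sketch follows. It assumes nothing beyond what the
docstring states: the variable is only meaningful on MS-Windows builds,
and the resulting order still depends on the platform and locale, so no
expected output is shown.

```elisp
;; -*- lexical-binding: t -*-
;; Sketch only: declare the variable special so the `let' below is a
;; dynamic binding even on builds where it is not defined (non-Windows).
(defvar w32-collate-ignore-punctuation)

(let ((w32-collate-ignore-punctuation t))
  ;; Same list as in the docstring example; `list' creates a fresh list
  ;; because `sort' may modify its argument.
  (sort (list "11" "12" "1 1" "1 2" "1.1" "1.2")
        #'string-collate-lessp))
```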
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 20:33:40 2022
From: Ihor Radchenko
To: Eli Zaretskii
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <837czwi6yp.fsf@gnu.org>
Date: Wed, 16 Nov 2022 01:34:09 +0000
Message-ID: <8735ajel7y.fsf@localhost>

Eli Zaretskii writes:

>> > string-collate-lessp is inherently platform- (and locale-) dependent.
>> > Don't use it if you want consistent results across platforms and
>> > locales.
>>
>> Is there a better alternative?
>
> Alternative to do what job?

Reliable sorting.
In particular, I am looking for a better PREDICATE argument for
`sort-subr' for case-sensitive and case-insensitive sorting of strings.

>> Also, do I miss something, or is this pitfall not documented in the
>> docstring of `string-collate-lessp'?
>
> It isn't? then what is this about:
>
>   This function obeys the conventions for collation order in your
>   locale settings.  For example, punctuation and whitespace characters
>   might be considered less significant for sorting:
>
>   (sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
>        => ("11" "1 1" "1.1" "12" "1 2" "1.2")
>   [...]
>   To emulate Unicode-compliant collation on MS-Windows systems,
>   bind ‘w32-collate-ignore-punctuation’ to a non-nil value, since
>   the codeset part of the locale cannot be "UTF-8" on MS-Windows.

The above sounds like we just need to worry about some edge cases where
different approaches may exist to sorting. Like with punctuation,
numbers, and spaces.

Having

(string-collate-lessp "a" "B" "C" t) ; => nil

is totally unexpected because case-insensitive "a"<"B"<"C" sounds like
the only reasonable outcome.

I'd like the warning to be even more prominent.
Feel free to disagree.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 22:46:55 2022
From: Ihor Radchenko
To: Robert Pluim
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <87wn7wczpt.fsf@gmail.com>
Date: Wed, 16 Nov 2022 03:47:19 +0000
Message-ID: <87mt8rd0hk.fsf@localhost>

Robert Pluim writes:

> I think this is expected. See the long thread on emacs-devel back in
> July, eg
> https://lists.gnu.org/archive/html/emacs-devel/2022-07/msg00940.html
>
> (it resulted in the addition of `string-equal-ignore-case')

Ok. So, it looks like `compare-strings' is the way to go for
system-independent string comparison.

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 16 08:00:19 2022
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <8735ajel7y.fsf@localhost>
Date: Wed, 16 Nov 2022 15:00:06 +0200
Message-Id: <83mt8rgill.fsf@gnu.org>

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Wed, 16 Nov 2022 01:34:09 +0000
>
> Eli Zaretskii writes:
>
> >> > string-collate-lessp is inherently platform- (and locale-) dependent.
> >> > Don't use it if you want consistent results across platforms and
> >> > locales.
> >>
> >> Is there a better alternative?
> >
> > Alternative to do what job?
>
> Reliable sorting.
> In particular, I am looking for a better PREDICATE argument for
> `sort-subr' for case-sensitive and case-insensitive sorting of strings.

In the strict order of Unicode codepoints? Use compare-strings.

> >> Also, do I miss something, or is this pitfall not documented in the
> >> docstring of `string-collate-lessp'?
> >
> > It isn't? then what is this about:
> >
> >   This function obeys the conventions for collation order in your
> >   locale settings.  For example, punctuation and whitespace characters
> >   might be considered less significant for sorting:
> >
> >   (sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
> >        => ("11" "1 1" "1.1" "12" "1 2" "1.2")
> >   [...]
> >   To emulate Unicode-compliant collation on MS-Windows systems,
> >   bind ‘w32-collate-ignore-punctuation’ to a non-nil value, since
> >   the codeset part of the locale cannot be "UTF-8" on MS-Windows.
>
> The above sounds like we just need to worry about some edge cases where
> different approaches may exist to sorting. Like with punctuation,
> numbers, and spaces.
>
> Having
>
> (string-collate-lessp "a" "B" "C" t) ; => nil
>
> is totally unexpected because case-insensitive "a"<"B"<"C" sounds like
> the only reasonable outcome.

It is hard to guess what will be unexpected for people. When the doc
string was written, the example used there was deemed to be the most
striking surprise from using locale-dependent collation, so it was
what we used.

> I'd like the warning to be even more prominent.

You want to make it explicit that for systems where we use
string-lessp the IGNORE-CASE argument is ignored? Or do you want some
other change?

Anyway, feel free to suggest some text to that effect.
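
As a rough illustration of the `compare-strings' suggestion, a
locale-independent sort predicate might look like the sketch below. The
name `my-string-codepoint-lessp' is hypothetical and does not come from
the thread or from Emacs; only `compare-strings' and `sort' are real
Emacs functions. `compare-strings' returns t for equal strings and a
negative or positive number otherwise, so the predicate has to test for
a negative integer rather than use the return value directly.

```elisp
;; -*- lexical-binding: t -*-
;; Hypothetical helper (not part of Emacs): compare by character code,
;; independent of the system locale.
(defun my-string-codepoint-lessp (s1 s2 &optional ignore-case)
  "Return non-nil if S1 sorts before S2 by character code.
With IGNORE-CASE non-nil, case is ignored in the comparison."
  ;; nil START/END arguments mean "compare the whole string".
  (let ((result (compare-strings s1 nil nil s2 nil nil ignore-case)))
    ;; RESULT is t when the strings are equal, so check for a number.
    (and (integerp result) (< result 0))))

;; Case-sensitive: uppercase letters have lower character codes.
(sort (list "C" "a" "B") #'my-string-codepoint-lessp)
;; => ("B" "C" "a")

;; Case-insensitive: wrap the predicate to pass IGNORE-CASE.
(sort (list "C" "a" "B")
      (lambda (x y) (my-string-codepoint-lessp x y t)))
;; => ("a" "B" "C")
```

This gives the same answer on every platform, at the cost of ignoring
any language-specific ordering rules that locale collation would apply.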
From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 21 02:28:28 2022 Received: (at 59275) by debbugs.gnu.org; 21 Nov 2022 07:28:28 +0000 Received: from localhost ([127.0.0.1]:45206 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ox1EB-0005FY-R3 for submit@debbugs.gnu.org; Mon, 21 Nov 2022 02:28:28 -0500 Received: from mout01.posteo.de ([185.67.36.65]:33131) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ox1E9-0005FK-HG for 59275@debbugs.gnu.org; Mon, 21 Nov 2022 02:28:26 -0500 Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 13731240027 for <59275@debbugs.gnu.org>; Mon, 21 Nov 2022 08:28:18 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1669015699; bh=3b8BqRW74Kr3uWyd9uBDGy8/l8VELqXBDie8WZ+W14c=; h=From:To:Cc:Subject:Date:From; b=k21TzTzS53fk+SWLAqsG+i6t7KeUPnbaxw9GLiT88bS40asZLey+NuhYCcUTCNxb8 IZi+UPE3vcPtHXyZCSYkFOT9++B6rwNX2ZwWqZPjFD3UyPlTlN8ZuqGCkVqi7ouvLc SGyCGKg7mVmv16HXL9a2f76D+FyWv7vVNKQnJrlTZpgxlhnq3zZPiL9+WNWCy47ww8 RqzPVTTJFE7F2iVbP8hbl9Lhl2XLmr4EhVXHorZTDy28TDBrgBsHc3/RMuB3NwBE1S BAV9HN5SV2ydHLQBdISrDyrZ3oz9QJXPuiHC+DEtgFmrUQRCkLhLKZ0PMKvvc0He0q wjEGHqo0/7Dqw== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4NFzWj4Zjsz6tmr; Mon, 21 Nov 2022 08:28:17 +0100 (CET) From: Ihor Radchenko To: Eli Zaretskii Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac In-Reply-To: <83mt8rgill.fsf@gnu.org> References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> Date: Mon, 21 Nov 2022 07:28:55 +0000 Message-ID: <877czokbpk.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 59275 Cc: 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list 
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Eli Zaretskii writes: >> Reliable sorting. >> In particular, I am looking for a better PREDICATE argument for >> `sort-subr' for case-sensitive and case-insensitive sorting of strings. > > In the strict order of Unicode codepoints? Use compare-strings. Thanks for the clarification. After further considerations, it looks like we should still use `string-collate-lessp' on Org side as it yields expected results if libc properly implements the collation. >> I'd like the warning to be even more prominent. > > You want to make it explicit that for systems where we use > string-lessp the IGNORE-CASE argument is ignored? Or do you want some > other change? Yes, I think. > Anyway, feel free to suggest some text to that effect. Maybe change If your system does not support a locale environment, this function behaves like `string-lessp'. to Some operating systems do not implement correct collation (in specific locale environments or at all). Then, this functions falls back to case-sensitive `string-lessp' and IGNORE-CASE argument is ignored. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . 
Support Org development at ,
or support my work at

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 21 08:31:37 2022
Date: Mon, 21 Nov 2022 15:31:38 +0200
Message-Id: <8335ac4eo5.fsf@gnu.org>
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
In-Reply-To: <877czokbpk.fsf@localhost> (message from Ihor Radchenko on Mon, 21 Nov 2022 07:28:55 +0000)
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Mon, 21 Nov 2022 07:28:55 +0000
>
> Eli Zaretskii writes:
>
> >> Reliable sorting.
> >> In particular, I am looking for a better PREDICATE argument for
> >> `sort-subr' for case-sensitive and case-insensitive sorting of strings.
> >
> > In the strict order of Unicode codepoints? Use compare-strings.
>
> Thanks for the clarification.
> After further consideration, it looks like we should still use
> `string-collate-lessp' on the Org side, as it yields expected results if
> libc properly implements the collation.

Is the feature that uses it intended to be used only on glibc platforms
(which basically means GNU/Linux)? If not, I'm surprised that you arrived
at this conclusion. It is the 180 deg opposite of what I think you should
have decided.

Once again: locale-specific collation order is inherently unpredictable in
its results, and should only be used when the locale-specific order is a
_must_, like when sorting people's names for a telephone directory.

> Maybe change
>
>   If your system does not support a locale environment, this function
>   behaves like `string-lessp'.
>
> to
>
>   Some operating systems do not implement correct collation (in specific
>   locale environments or at all). Then, this function falls back to
>   case-sensitive `string-lessp' and IGNORE-CASE argument is ignored.

Fine with me.
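
An illustrative sketch of the `compare-strings' suggestion above; it is
not part of the original message, and the name
`my-string-codepoint-lessp' is hypothetical. It assumes that a
locale-independent order is wanted and wraps `compare-strings' so that it
can serve as a PREDICATE for `sort'. Note that with a non-nil IGNORE-CASE,
`compare-strings' still consults the case table in effect when it is
called, so only the case-sensitive variant is fully independent of the
environment.

```elisp
;; Hypothetical helper, not from the thread: a codepoint-order predicate.
(defun my-string-codepoint-lessp (s1 s2 &optional ignore-case)
  "Return non-nil if S1 sorts before S2 in character-code order.
With non-nil IGNORE-CASE, compare without regard to case."
  ;; `compare-strings' returns t when the strings are equal, a negative
  ;; number when S1 is less, and a positive number when S1 is greater.
  (let ((result (compare-strings s1 nil nil s2 nil nil ignore-case)))
    (and (integerp result) (< result 0))))

;; Upper-case ASCII letters have smaller codes than lower-case ones.
(sort (list "b" "C" "a") #'my-string-codepoint-lessp)
;; => ("C" "a" "b")
```

The result does not depend on the locale or on the libc in use, which is
the property asked for at the start of this exchange.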
From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 21 20:24:14 2022
From: Ihor Radchenko
To: Eli Zaretskii
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <8335ac4eo5.fsf@gnu.org>
Date: Tue, 22 Nov 2022 01:24:43 +0000
Message-ID: <87ilj7dbms.fsf@localhost>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

Eli Zaretskii writes:

>> > In the strict order of Unicode codepoints? Use compare-strings.
>>
>> Thanks for the clarification.
>> After further consideration, it looks like we should still use
>> `string-collate-lessp' on the Org side, as it yields expected results if
>> libc properly implements the collation.
>
> Is the feature that uses it intended to be used only on glibc platforms
> (which basically means GNU/Linux)? If not, I'm surprised that you arrived
> at this conclusion. It is the 180 deg opposite of what I think you should
> have decided.
>
> Once again: locale-specific collation order is inherently unpredictable in
> its results, and should only be used when the locale-specific order is a
> _must_, like when sorting people's names for a telephone directory.

We use string collation for

1. Sorting bibliographies
2. Sorting lists
3. Sorting table lines
4. Sorting tags
5. Sorting headings
6. Sorting entries in agendas
7. As a criterion for agenda/tag filtering when comparison operator is
   used on string property values (11.3.3 Matching tags and properties)

1-6 should follow the locale. I think we had a bug report in the past
where a user was confused by list sorting that did not follow the
conventions of the user's language.

7 is more debatable.

>> Maybe change
>>
>>   If your system does not support a locale environment, this function
>>   behaves like `string-lessp'.
>>
>> to
>>
>>   Some operating systems do not implement correct collation (in specific
>>   locale environments or at all). Then, this function falls back to
>>   case-sensitive `string-lessp' and IGNORE-CASE argument is ignored.
>
> Fine with me.

See the attached patch.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-src-fns.c-Fstring_collate_lessp-Clarify-docstring.patch

>From d9a67e94547ffeb6d8ac8a1202434fff1117af3f Mon Sep 17 00:00:00 2001
Message-Id: 
From: Ihor Radchenko
Date: Tue, 22 Nov 2022 09:21:17 +0800
Subject: [PATCH] * src/fns.c (Fstring_collate_lessp): Clarify docstring

Clarify that IGNORE-CASE argument might be ignored when the operating
system does not implement string collation for the specified locale.

See bug#59275.
---
 src/fns.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/fns.c b/src/fns.c
index 035fa12935..e337c0958d 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -596,8 +596,9 @@ DEFUN ("string-collate-lessp", Fstring_collate_lessp, Sstring_collate_lessp, 2,
 bind `w32-collate-ignore-punctuation' to a non-nil value, since the
 codeset part of the locale cannot be \"UTF-8\" on MS-Windows.
 
-If your system does not support a locale environment, this function
-behaves like `string-lessp'. */)
+Some operating systems do not implement correct collation (in specific
+locale environments or at all). Then, this function falls back to
+case-sensitive `string-lessp' and IGNORE-CASE argument is ignored. */)
   (Lisp_Object s1, Lisp_Object s2, Lisp_Object locale, Lisp_Object ignore_case)
 {
 #if defined __STDC_ISO_10646__ || defined WINDOWSNT
-- 
2.35.1

--=-=-=
Content-Type: text/plain

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 22 07:56:09 2022
Date: Tue, 22 Nov 2022 14:56:14 +0200
Message-Id: <83sfib172p.fsf@gnu.org>
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275-done@debbugs.gnu.org
In-Reply-To: <87ilj7dbms.fsf@localhost> (message from Ihor Radchenko on Tue, 22 Nov 2022 01:24:43 +0000)
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Tue, 22 Nov 2022 01:24:43 +0000
>
> > Once again: locale-specific collation order is inherently unpredictable in
> > its results, and should only be used when the locale-specific order is a
> > _must_, like when sorting people's names for a telephone directory.
>
> We use string collation for
>
> 1. Sorting bibliographies
> 2. Sorting lists
> 3. Sorting table lines
> 4. Sorting tags
> 5. Sorting headings
> 6. Sorting entries in agendas
> 7. As a criterion for agenda/tag filtering when comparison operator is
>    used on string property values (11.3.3 Matching tags and properties)
>
> 1-6 should follow the locale.

I think only 1 and 6 are firmly in that category. For the others it
depends on whether the results of the sorting are immediately displayed, or
used for further processing. In the former case, using string-collate-lessp
is semi-okay ("semi" because producing different results in different
locales can still confuse users); in the latter case it is wrong, IMO,
because you will cause unexpected results.

> See the attached patch.

Thanks, installed.
From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 23 05:38:52 2022
From: Ihor Radchenko
To: Eli Zaretskii
Cc: 59275-done@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <83sfib172p.fsf@gnu.org>
Date: Wed, 23 Nov 2022 10:39:22 +0000
Message-ID: <87h6yqhs4l.fsf@localhost>

Eli Zaretskii writes:

>> See the attached patch.
>
> Thanks, installed.

Should we update the manual as well?
4.5 Comparison of Characters and Strings section contains the old
docstring verbatim.

P.S. I am wondering if there is some automated way to deal with verbatim
docstrings in the manuals. They are so easy to miss when the Elisp
docstrings get updated.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 23 09:58:14 2022
Date: Wed, 23 Nov 2022 16:58:20 +0200
Message-Id: <835yf5zpir.fsf@gnu.org>
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
In-Reply-To: <87h6yqhs4l.fsf@localhost> (message from Ihor Radchenko on Wed, 23 Nov 2022 10:39:22 +0000)
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac

> From: Ihor Radchenko
> Cc: 59275-done@debbugs.gnu.org
> Date: Wed, 23 Nov 2022 10:39:22 +0000
>
> Eli Zaretskii writes:
>
> >> See the attached patch.
> >
> > Thanks, installed.
>
> Should we update the manual as well?
> 4.5 Comparison of Characters and Strings section contains the old
> docstring verbatim.

What I see in the manual is not a verbatim copy of the doc string, but
an expanded version of it with more detailed explanations. Which is how it
should be: it is IMNSHO bad documentation-fu to have the manual just copycat
the doc strings. (We sometimes do it for lack of time, but it is not a Good
Thing.)

The note about case-sensitivity of the fallback was missing from the manual,
so I added it.

> P.S. I am wondering if there is some automated way to deal with verbatim
> docstrings in the manuals. They are so easy to miss when the
> Elisp docstrings get updated.

There should be no verbatim copies of doc strings in the manual. So I'm not
interested in making that bad practice easier ;-)

Thanks.

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 23 21:22:13 2022
From: Ihor Radchenko
To: Eli Zaretskii
Cc: 59275@debbugs.gnu.org
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
In-Reply-To: <835yf5zpir.fsf@gnu.org>
Date: Thu, 24 Nov 2022 02:22:41 +0000
Message-ID: <87wn7lay6m.fsf@localhost>

Eli Zaretskii writes:

>> Should we update the manual as well?
>> 4.5 Comparison of Characters and Strings section contains the old
>> docstring verbatim.
>
> What I see in the manual is not a verbatim copy of the doc string, but
> an expanded version of it with more detailed explanations. Which is how it
> should be: it is IMNSHO bad documentation-fu to have the manual just copycat
> the doc strings. (We sometimes do it for lack of time, but it is not a Good
> Thing.)

Fair point.

> The note about case-sensitivity of the fallback was missing from the manual,
> so I added it.

Thanks!

>> P.S. I am wondering if there is some automated way to deal with verbatim
>> docstrings in the manuals. They are so easy to miss when the
>> Elisp docstrings get updated.
>
> There should be no verbatim copies of doc strings in the manual. So I'm not
> interested in making that bad practice easier ;-)

What about forgetting to update the manual when important changes are
made to the docstring? I know for certain that it has happened many times
with the Org manual. Maybe something can be done to auto-check whether
updates were made to the docstring but not to the manual?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 24 02:22:53 2022
Date: Thu, 24 Nov 2022 09:23:03 +0200
Message-Id: <83bkowyfxk.fsf@gnu.org>
From: Eli Zaretskii
To: Ihor Radchenko
Cc: 59275@debbugs.gnu.org
In-Reply-To: <87wn7lay6m.fsf@localhost> (message from Ihor Radchenko on Thu, 24 Nov 2022 02:22:41 +0000)
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac

> From: Ihor Radchenko
> Cc: 59275@debbugs.gnu.org
> Date: Thu, 24 Nov 2022 02:22:41 +0000
>
> > There should be no verbatim copies of doc strings in the manual. So I'm not
> > interested in making that bad practice easier ;-)
>
> What about forgetting to update the manual when important changes are
> made to the docstring? I know for certain that it has happened many times
> with the Org manual. Maybe something can be done to auto-check whether
> updates were made to the docstring but not to the manual?

That could be a useful feature, suitable for checkdoc.el, perhaps. But
there are 2 issues here that I'm not sure how such a feature would handle:

  . not every symbol that has a doc string is mentioned in the manuals
  . the doc string and the text in the manual are generally different,
    and so it could be that the update to a doc string doesn't require
    any update to the manual text

So a naïve implementation would probably have too many false positives. Not
sure if this could render the feature useless.

Bottom line: I'm not sure we can have a good automated way of detecting
updates that were missed, except at patch review time, and that is a
judgment call by the person who does the review, and relies on his/her
vigilance. But if someone could come up with a good way of doing that, it
will be appreciated.
From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 25 21:03:17 2022 Received: (at 59275-done) by debbugs.gnu.org; 26 Nov 2022 02:03:17 +0000 Received: from localhost ([127.0.0.1]:37504 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oykXE-0004KQ-RJ for submit@debbugs.gnu.org; Fri, 25 Nov 2022 21:03:17 -0500 Received: from mout01.posteo.de ([185.67.36.65]:43209) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oykXC-0004KA-E2 for 59275-done@debbugs.gnu.org; Fri, 25 Nov 2022 21:03:15 -0500 Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 622F2240026 for <59275-done@debbugs.gnu.org>; Sat, 26 Nov 2022 03:03:08 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1669428188; bh=RekQwok0c7maSVMx1b3K3esIFAeLUUXOjsulqZ8dHOs=; h=From:To:Cc:Subject:Date:From; b=Ji4DnQZqNIkoJqQTUg3FDPivdMaey0MOrXC/lOj7vPYuhpGRDANkQJgA3H/LUmubs FkoNjH2NJ+gNBIfDvR4aJGKnrvq6AXKgh0m9BXoAm//CqB20uOWpyTDvvxUWcq2nRs m27zi3565vN1Ai+7BTVEjmD38wkCaJDGZMgkbeVLaUXbqoxcK9sj0oEPB+F6Gp7IPf HueCGe79mAT4cyRJEarGMIaFXSb8LWl3cnEu3czCrx0mDxV3GEUNMtnnaGnlpfuQZM zUe8T3xcgs2cbbCK4GbZiYOkqKa9lfpxug2kXchDrmnNFrZk+tw6EYbSUMWl29djmf QALmUvczCqWwg== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4NJw4B6WdWz6tm9; Sat, 26 Nov 2022 03:03:03 +0100 (CET) From: Ihor Radchenko To: Eli Zaretskii Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac In-Reply-To: <83sfib172p.fsf@gnu.org> References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> Date: Sat, 26 Nov 2022 02:03:43 +0000 Message-ID: <877czimpz4.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 
quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 59275-done Cc: 59275-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) Eli Zaretskii writes: >> We use string collation for >>=20 >> 1. Sorting bibliographies >> 2. Sorting lists >> 3. Sorting table lines >> 4. Sorting tags >> 5. Sorting headings >> 6. Sorting entries in agendas >> 7. As a criterion for agenda/tag filtering when comparison operator is >> used on string property values (11.3.3 Matching tags and properties) >>=20 >> 1-6 should follow the locale. > > I think only 1 and 6 are firmly in that category. For the others it depe= nds > on whether the results of the sorting are immediately displayed, or used = for > further processing. In the former case, using string-collate-lessp is > semi-okay ("semi" because producing different results in different locales > can still confuse users); in the latter case it is wrong, IMO, because you > will cause unexpected results. 1-6 are for interactive use. As Maxim pointed out in https://orgmode.org/list/tlle59$pl3$1@ciao.gmane.io, `string-collate-lessp' generally yield better results for human consumption: " (setq lst '("semana" "se=C3=B1or" "sepia")) (sort lst #'string-lessp) ; =3D> ("semana" "sepia" "se=C3=B1or") (sort lst #'string-collate-lessp) ; =3D> ("semana" "se=C3=B1or" "sepia") " In the same thread, we also discussed what Org can do about MacOS and other systems that do not implement string collation. We concluded that a better fallback when collation is not available would be using downcase+string-lessp when `string-collate-lessp' is called with non-nil IGNORE-CASE argument. Would it be acceptable for Emacs to change the fallback behavior of `string-collate-lessp' to: 1. 
If string collation is not available and IGNORE-CASE is nil, fallback to`string-lessp'; 2. If string collation is not available and IGNORE-CASE is non-nil, use `downcase' + `string-lessp'. This will not compromise consistency and will yield slightly better fallback results. I also do not think that it will be backwards-incompatible. If the call to `string-collate-lessp' explicitly requests ignoring case, `downcase' is more expected than bare `string-lessp' that _does not_ ignore case. WDYT? --=20 Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . Support Org development at , or support my work at From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 26 03:06:26 2022 Received: (at 59275) by debbugs.gnu.org; 26 Nov 2022 08:06:26 +0000 Received: from localhost ([127.0.0.1]:37720 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyqCg-0000tz-1f for submit@debbugs.gnu.org; Sat, 26 Nov 2022 03:06:26 -0500 Received: from eggs.gnu.org ([209.51.188.92]:58354) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyqCe-0000tl-3P for 59275@debbugs.gnu.org; Sat, 26 Nov 2022 03:06:24 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyqCY-0002aM-ML; Sat, 26 Nov 2022 03:06:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=ftgoWBe3FJUCjwtE1Dq5MQ4C3Ty05UYUga/VmFZLuIs=; b=A9eBKSdCbmcbXVL1eDwO uyq8yRCM/p9rKuFyAdxEt9P6hDViL73sZ0j16PJE9wUjrcSzQHIBE0AurSlYKMqds6/Y0n+H0B52c wfjOUnAcui574BlGgffDydIO1XjtbVsYPuzNd4yeMm6OTwrbmLcmGwjcQtkhhWUBOhazaUER29iJq dtvsXkm+RssrWYEM4ywDORA/B5Rr/j42WbkwGwJG0kZ7TDw2jwVp25caLiA1nPnHl+IbGixJVYa4r q1fqZ+PfoyDw2ZyqyZIcJn0xJpL6+hHJdk4JL+LwITSYvie+H8eUPpMSxi3p3yJWo65fiAS2ZpSke 2tKP2DqSp5gvRA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by 
fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyqCX-0003iE-Uw; Sat, 26 Nov 2022 03:06:18 -0500 Date: Sat, 26 Nov 2022 10:06:42 +0200 Message-Id: <83r0xqta0d.fsf@gnu.org> From: Eli Zaretskii To: Ihor Radchenko In-Reply-To: <877czimpz4.fsf@localhost> (message from Ihor Radchenko on Sat, 26 Nov 2022 02:03:43 +0000) Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 59275 Cc: 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Ihor Radchenko > Cc: 59275-done@debbugs.gnu.org > Date: Sat, 26 Nov 2022 02:03:43 +0000 > > We concluded that a better fallback when collation is not available > would be using downcase+string-lessp when `string-collate-lessp' is > called with non-nil IGNORE-CASE argument. This has caveats, see below. I won't argue about your Org-local decision, since I don't know enough about the intended uses of what you did, but I do have something to say about this decision in general. I suggest at least a FIXME comment where you do this stuff, based on what I tell below. > Would it be acceptable for Emacs to change the fallback behavior of > `string-collate-lessp' to: > > 1. If string collation is not available and IGNORE-CASE is nil, fallback > to`string-lessp'; > 2. 
If string collation is not available and IGNORE-CASE is non-nil,
> use `downcase' + `string-lessp'.

'downcase' uses the buffer-local case table if such is defined for the
buffer that happens to be the current when you invoke 'downcase', and that's
another cause of inconsistency and user surprises, especially when the
strings you compare don't really "belong" to the current buffer. Also, in
some (rarely-used) locales, downcasing has unexpected results, even with the
default case-table. For example, downcasing "I" produces "ı", not "i" as
expected. Did you think about these cases when making the above decision?

> I also do not think that it will be backwards-incompatible. If the call
> to `string-collate-lessp' explicitly requests ignoring case, `downcase'
> is more expected than bare `string-lessp' that _does not_ ignore case.
>
> WDYT?

See above. What you suggest is perhaps fine for plain-ASCII text, but not
in general, IMNSHO.

The reason for what Emacs currently does on systems that lack collation
functions is that for such systems collation rules are indeterminate, and so
inventing them by following naïve rules of plain ASCII, in particular the
case-conversion rules, is potentially very wrong. These are general-purpose
APIs, not something concrete in specific Org contexts, and as such, these
APIs cannot "mostly work", they should work always and for every possible
use case.

And we are talking about a single system where these problems happen, which
is macOS, right? Wouldn't it be better for "Someone" who uses macOS to just
bite the bullet and write a proper collation function, or find a free
software implementation of one, and include it in Emacs? This is what I did
for MS-Windows at the time string-collate-lessp was added to Emacs. Why
cannot macOS users do the same?
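
The buffer-local case table caveat described above can be illustrated with a
minimal sketch. It is not code from this thread: the names
`my/downcase-with-table' and `my/case-table-demo' are hypothetical, and the
"Turkish-like" table is built by hand on the assumption that
`copy-case-table', `set-case-syntax-pair', `set-case-table' and
`standard-case-table' behave as the Elisp manual describes.

```elisp
(defun my/downcase-with-table (string table)
  "Return STRING downcased while TABLE is the current buffer's case table.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    ;; `set-case-table' changes the case table of the current (temporary)
    ;; buffer only; other buffers are not affected.
    (set-case-table table)
    (downcase string)))

(defun my/case-table-demo ()
  "Return \"I\" downcased with the standard table and with a modified copy."
  (let ((turkish-like (copy-case-table (standard-case-table))))
    ;; In the copy only, pair capital I with dotless ı (U+0131).
    ;; Side effect: `set-case-syntax-pair' also marks both characters as
    ;; word constituents in the standard syntax table.
    (set-case-syntax-pair ?I ?ı turkish-like)
    (list (my/downcase-with-table "I" (standard-case-table))
          (my/downcase-with-table "I" turkish-like))))
```

Calling `(my/case-table-demo)' is expected to return two different strings
for the same input, which is the inconsistency in question: the result of
`downcase' depends on whichever buffer happens to be current.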
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 26 03:46:50 2022 Received: (at 59275) by debbugs.gnu.org; 26 Nov 2022 08:46:50 +0000 Received: from localhost ([127.0.0.1]:37768 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyqpm-0001ra-7C for submit@debbugs.gnu.org; Sat, 26 Nov 2022 03:46:50 -0500 Received: from mout02.posteo.de ([185.67.36.66]:35147) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyqpj-0001rL-BK for 59275@debbugs.gnu.org; Sat, 26 Nov 2022 03:46:49 -0500 Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id 699C2240101 for <59275@debbugs.gnu.org>; Sat, 26 Nov 2022 09:46:38 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1669452401; bh=zFbRJeuMobBMo+X2FjNaqFWY7J5CaoYi8caDhrereIA=; h=From:To:Cc:Subject:Date:From; b=Ed3BI8CnfeedvqavZIRUek0LrLWjitoK7Q9SarrRZioOtiqoFQCAkNY/9bI0kfXZr MjoOOxAkux4j6kBqjXekol03u6jrhQARjIlaTx+F5kNIWw3u29YKDSB6/Ha5c4Y0xZ K2MIaSC3kvcIvtrmzCECxVJnZ6cU+E4Lo7zD3ItxArQFUUAo5+42m+SkNecu/I4RwS 8m0K9jSrykV/fs1BeUQMOoLMl3SoNRC3NXIKwWbAAOBjVGDmYI1YNz8En94S9z4C7w a7j2SPRZFfAqFiZEw9D7ReJjN9JvvzpZ4kB5LBT2fFyNppBKMFnAKQQSk9+qQK//FN fG55rHCouJb6Q== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4NK51n46Pmz6tlh; Sat, 26 Nov 2022 09:46:35 +0100 (CET) From: Ihor Radchenko To: Eli Zaretskii Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac In-Reply-To: <83r0xqta0d.fsf@gnu.org> References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> Date: Sat, 26 Nov 2022 08:47:13 +0000 Message-ID: <87v8n2je5q.fsf@localhost> MIME-Version: 1.0 Content-Type: text/plain; 
charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 59275 Cc: 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-)

Eli Zaretskii writes:

>> We concluded that a better fallback when collation is not available
>> would be using downcase+string-lessp when `string-collate-lessp' is
>> called with non-nil IGNORE-CASE argument.
>
> This has caveats, see below. I won't argue about your Org-local decision,
> since I don't know enough about the intended uses of what you did, but I do
> have something to say about this decision in general. I suggest at least a
> FIXME comment where you do this stuff, based on what I tell below.

Thanks for the information!

>> Would it be acceptable for Emacs to change the fallback behavior of
>> `string-collate-lessp' to:
>>
>> 1. If string collation is not available and IGNORE-CASE is nil, fallback
>> to `string-lessp';
>> 2. If string collation is not available and IGNORE-CASE is non-nil,
>> use `downcase' + `string-lessp'.
>
> 'downcase' uses the buffer-local case table if such is defined for the
> buffer that happens to be the current when you invoke 'downcase', and that's
> another cause of inconsistency and user surprises, especially when the
> strings you compare don't really "belong" to the current buffer.

Interesting. Is there any reason why this is not mentioned in the
docstring for `downcase'?

I now see 4.10 The Case Table section of the manual, and it looks like
case tables should be set mostly automatically (by Emacs?) according to
the language environment. Are details about this process documented
anywhere? Are these case conversion tables independent of glibc?

> Also, in
> some (rarely-used) locales, downcasing has unexpected results, even with the
> default case-table. For example, downcasing "I" produces "ı", not "i" as
> expected. Did you think about these cases when making the above decision?

I did not. However, I recall reading somewhere that it is possible to work
around this kind of issue by calling case conversion several times:
upcase -> downcase -> upcase -> downcase.

Now that you have reminded me about this caveat, I also recall
https://nullprogram.com/blog/2014/06/13/ that mentioned something
similar about caveats with composition.

Just mentioning it for your reference. (I am not sure if the caveats
discussed have been raised on Emacs devel).

>> I also do not think that it will be backwards-incompatible. If the call
>> to `string-collate-lessp' explicitly requests ignoring case, `downcase'
>> is more expected than bare `string-lessp' that _does not_ ignore case.
>>
>> WDYT?
>
> See above. What you suggest is perhaps fine for plain-ASCII text, but not
> in general, IMNSHO.
>
> The reason for what Emacs currently does on systems that lack collation
> functions is that for such systems collation rules are indeterminate, and so
> inventing them by following naïve rules of plain ASCII, in particular the
> case-conversion rules, is potentially very wrong. These are general-purpose
> APIs, not something concrete in specific Org contexts, and as such, these
> APIs cannot "mostly work", they should work always and for every possible
> use case.

I feel that I am missing something. Doesn't Emacs provide Unicode case
conversion tables? Why plain ASCII rules?

> And we are talking about a single system where these problems happen, which
> is macOS, right? Wouldn't it be better for "Someone" who uses macOS to just
> bite the bullet and write a proper collation function, or find a free
> software implementation of one, and include it in Emacs?
This is what I did
> for MS-Windows at the time string-collate-lessp was added to Emacs. Why
> cannot macOS users do the same?

It would be. But how can we ask for this? etc/TODO? Or maybe re-open
this bug report?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at .
Support Org development at ,
or support my work at

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 26 04:22:17 2022 Received: (at 59275) by debbugs.gnu.org; 26 Nov 2022 09:22:17 +0000 Received: from localhost ([127.0.0.1]:37812 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyrO5-0002jG-7C for submit@debbugs.gnu.org; Sat, 26 Nov 2022 04:22:17 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33116) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oyrO0-0002in-N9 for 59275@debbugs.gnu.org; Sat, 26 Nov 2022 04:22:14 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyrNu-0002sY-M3; Sat, 26 Nov 2022 04:22:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=3U5qW0Sw0IQZyGxzksonoeJ2TVOad1Tzbj5dzPw0VX8=; b=Bol+rKILZtb6 eV0nI6+BHIxRQDurLxOOrc4XU6DG61VRcvy5PSWOOy855QY2D7YzUUSAk+QgVJqlno3cpgNHz1KP7 GoxRlSQCmqokWG6rS7i4kM7n8m5xj3YJX9Uv1gtYlGpTCBXo7QwhI1LcDCGxP3OnUiLgBEUiT6XMO TP6HlLGF3HYoTY0iSuXjSRXogzKWxz2+LTiOsDC3MmwHA4qdNEjC4H0ejwue104s+LlfVDiNPqekw QRZbHRaGl+9omkucuRKF6I01YaqfCvaQHVzzFEiemW9y/Z+jUNdVA13Xm+1pDK4GIHr3pZM+U/gYW b620WKc45UkVspMS8Zg2lA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oyrNu-000553-5g; Sat, 26 Nov 2022 04:22:06 -0500 Date: Sat, 26 Nov 2022 11:22:29 +0200 Message-Id: <83k03it6i2.fsf@gnu.org> From: Eli Zaretskii To: Ihor Radchenko In-Reply-To:
<87v8n2je5q.fsf@localhost> (message from Ihor Radchenko on Sat, 26 Nov 2022 08:47:13 +0000) Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> <87v8n2je5q.fsf@localhost> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 59275 Cc: 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Ihor Radchenko > Cc: 59275@debbugs.gnu.org > Date: Sat, 26 Nov 2022 08:47:13 +0000 > > > 'downcase' uses the buffer-local case table if such is defined for the > > buffer that happens to be the current when you invoke 'downcase', and that's > > another cause of inconsistency and user surprises, especially when the > > strings you compare don't really "belong" to the current buffer. > > Interesting. Is there any reason why this is not mentioned in the > docstring for `downcase'? Yes: because we are ashamed of that and hope to change it at some point, if we ever figure out how to do that. The way to avoid this caveat is simple: let-bind case-table when you call 'downcase'. > I now see 4.10 The Case Table section of the manual, and it looks like > case tables should be set mostly automatically (by Emacs?) according to > the language environment. Yes. But a buffer can have its local case-table. > Are details about this process documented anywhere? No. But see characters.el and the function I mention below. > Are these case conversion tables independent of glibc? Yes. 
We build them completely separately and from scratch, as you will
see in characters.el.

> https://nullprogram.com/blog/2014/06/13/ that mentioned something
> similar about caveats with composition.

I don't see there anything about sorting or collation. What did I miss?

> Just mentioning it for your reference. (I am not sure if the caveats
> discussed have been raised on Emacs devel).

What did you think ought to be discussed?

Btw, that blog fails to distinguish between display-time features and
processing of text without displaying it. On display, Emacs combines
characters that are combining, so equivalent character sequences should look
the same. But Emacs doesn't by default consider equivalent character
sequences as equal in all situations, leaving this to the Lisp program.
Considering them always as equal looks sexy in a blog post, because it
raises some brows and has the "whoah!" effect, but isn't a good policy in
general, since some applications definitely need to know about the original
decomposed sequence. We cannot conceal this from Lisp programs by hiding
the original sequence on some low level that is not exposed to Lisp. Yes,
this makes Lisp programs more complicated, but that comes with the
territory: you cannot have power without complexity.

> I feel that I miss something. Don't Emacs provide unicode case
> conversion tables?

The case tables we provide are based on Unicode, but are tweaked by the
language-environment. See, for example, turkish-case-conversion-enable,
which is run when the Turkish language-environment is turned on.

> Why plain ASCII rules?

Your logic is. What you suggest breaks down if you consider various
complications in some locales.

> > And we are talking about a single system where these problems happen, which
> > is macOS, right? Wouldn't it be better for "Someone" who uses macOS to just
> > bite the bullet and write a proper collation function, or find a free
> > software implementation of one, and include it in Emacs?
This is what I did > > for MS-Windows at the time string-collate-lessp was added to Emacs. Why > > cannot macOS users do the same? > > It would be. But how can we ask for this? etc/TODO? Or maybe re-open > this bug report? Anything will be fine with me, but unless the people who are asking you to do these workarounds are motivated enough to sit down and do the job, we will never get there. And guess what effect these workarounds have on their motivation. From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 27 09:01:02 2022 Received: (at 59275) by debbugs.gnu.org; 27 Nov 2022 14:01:02 +0000 Received: from localhost ([127.0.0.1]:42306 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozIDN-00051R-Dy for submit@debbugs.gnu.org; Sun, 27 Nov 2022 09:01:02 -0500 Received: from mail-lj1-f172.google.com ([209.85.208.172]:34389) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozIDL-00051L-AS for 59275@debbugs.gnu.org; Sun, 27 Nov 2022 09:01:00 -0500 Received: by mail-lj1-f172.google.com with SMTP id d3so10340678ljl.1 for <59275@debbugs.gnu.org>; Sun, 27 Nov 2022 06:00:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=6SGdlg/pBotK5793f/MCmOq+3VUn158dqGG8MY2pxMs=; b=kDD7u4iaV554FFZS+9/ECC+uEKAyXafElp4C1eK24ehSkaVmH2/O/cUqHYvW0Nsiiy slQ1RRZQzE0gKQkoR9xtOZmhkxxD6P3MsgvUmID2c27xTaD1z0KgKkMEUk7wPTvKGCfu 2nljmO+PgcpbZzsqgAzC7Bj7F34bE60R9TZzb9F0MNMUwiGJ4jIndqCca1TwbAtvgB7o +Sw17m/C0krWQ7yWPM5yQFuntOQ/UfCe4CqzNjj1+bjRZFSsd9qiWoE6nOJZk46KTqZy XeVKr4kfOJqsFatnJ0S5RpHlvfOoaeEzyXdvq7N3FpZ5jM1uw6q5nKAbwTbTJ7oOyQwe JoZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to 
:content-language:subject:user-agent:mime-version:date:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=6SGdlg/pBotK5793f/MCmOq+3VUn158dqGG8MY2pxMs=; b=6Y3bHwrf2NZQU4RKbuD06B47LMbWVgR4ijFYIas0WFvEiPxxs+6NZLT1sNYQ9mZtZS Xg9q6TGr+ZqFTUE1OBM7jvRn9tdaKsZtQUT+a7ey9QumbMLhNXhR8hSsmUmnA0f0QhH0 vBSWAJdlicpuBEPYbkaJDOjRfvy2hvb+z/Xn7RtDnEypBrhO8QngtS/FA0MZmCFYcJoq 5dmzz4clA2kVAzqf2fyvx58uYzbhVnqr0m19Jnr6VE0JIl/qFr5Nq41ll/OPuNo4Oxoz Mh+dH0gxYEUiHtrtR/ZS0iEWGi/T0062NF4nq3zCmtlKWlE/RgXTY9Gezhx1H+PDk3aC ChvA== X-Gm-Message-State: ANoB5pmdGKfutFOZQePBCeNDQDdcC/M8cWmUzXfq/knr9J82AeuPYOwn IRhsh++gNVUPGiP7gzOBCco= X-Google-Smtp-Source: AA0mqf6SilwPmIkavpKwNTVLV5/4Vxs+ey3upSUpq/tL7in2eDUkwm/+fw5/5EFMdLRo97MbasZiOw== X-Received: by 2002:a05:651c:82:b0:277:2f15:4179 with SMTP id 2-20020a05651c008200b002772f154179mr9386894ljq.408.1669557653106; Sun, 27 Nov 2022 06:00:53 -0800 (PST) Received: from [192.168.0.101] (nat-0-0.nsk.sibset.net. [5.44.169.188]) by smtp.googlemail.com with ESMTPSA id f18-20020a2eb5b2000000b0026bca725cd0sm926438ljn.39.2022.11.27.06.00.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 27 Nov 2022 06:00:52 -0800 (PST) From: Maxim Nikulin X-Google-Original-From: Maxim Nikulin Message-ID: <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com> Date: Sun, 27 Nov 2022 21:00:50 +0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac Content-Language: en-US To: Eli Zaretskii , Ihor Radchenko References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> <87v8n2je5q.fsf@localhost> <83k03it6i2.fsf@gnu.org> In-Reply-To: 
<83k03it6i2.fsf@gnu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 59275 Cc: 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

On 26/11/2022 16:22, Eli Zaretskii wrote:
>> From: Ihor Radchenko Date: Sat, 26 Nov 2022 08:47:13 +0000
>>
>>> 'downcase' uses the buffer-local case table if such is defined for the
>>> buffer that happens to be the current when you invoke 'downcase', and that's
>>> another cause of inconsistency and user surprises, especially when the
>>> strings you compare don't really "belong" to the current buffer.

`downcase' is already used in Org for case-insensitive sorting. I am
unsure if it appeared earlier than `string-collate-lessp' was introduced.
A buffer-local conversion table is not a problem when table rows, list
items (text formatting object, not elisp structure), or tags local to the
current file are sorted. However, when an agenda is built from several
files, the current buffer should not affect the order of entries.

Concerning Org, my point is that caseless sorting should be uniform.
Currently different functions use distinct approaches, and that is a more
severe inconsistency.

>> https://nullprogram.com/blog/2014/06/13/ that mentioned something
>> similar about caveats with composition.
>
> I don't see there anything about sorting or collation. What did I miss?

Doesn't the composed/decomposed representation affect the comparison result?

The emacs-devel thread mentioned earlier in this bug contains a link
describing plenty of issues with string comparison:

https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison

>>> And we are talking about a single system where these problems happen, which
>>> is macOS, right?
Wouldn't it be better for "Someone" who uses macOS to just
>>> bite the bullet and write a proper collation function, or find a free
>>> software implementation of one, and include it in Emacs?

My impression was that clang should eventually get better locale support.
If so, I have doubts about a macOS-specific implementation. I do not have
a macOS machine, so my assumption about the locale implementation there
may be wrong.

However, Emacs may benefit from its own implementation of collation (based
on the built-in Unicode character database) used on (almost) all OSes. It
would allow using several locales in parallel without switching the libc
locale, which is not thread-safe.

I consider `downcase' a kind of workaround (a poor man's ignore-case) that
allows graceful degradation in comparison to `string-lessp'. From my point
of view, e.g. the case transformation rule for Turkish I is a minor issue
in comparison to completely disregarding the IGNORE-CASE argument, at least
when results are presented to users.

My argument against `downcase' in `string-collate-lessp' is that it may
add a noticeable performance penalty.

Interestingly, `compare-strings' uses upcase conversion when the
IGNORE-CASE argument is true. I believed that some implementations
(unrelated to Emacs) may have problems with e.g. ß and considered
downcase as a safer option.
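
For reference, the `downcase' + `string-lessp' fallback discussed in this
thread could look roughly like the sketch below. It is an illustration under
stated assumptions, not what Org or Emacs implements: the function names are
hypothetical, and it relies on the `with-case-table' macro and the
`standard-case-table' function to sidestep the buffer-local case table. It
does not address the locale caveat, since the standard case table itself can
be modified by the language environment (e.g. Turkish), and it is not a
substitute for real collation.

```elisp
(defun my/downcase-standard (string)
  "Downcase STRING using the standard case table.
Any buffer-local case table of the current buffer is ignored.
Hypothetical helper, for illustration only."
  (with-case-table (standard-case-table)
    (downcase string)))

(defun my/string-lessp-ignore-case (s1 s2)
  "Return non-nil if S1 sorts before S2, ignoring case.
This is a plain codepoint comparison of the downcased strings,
not a locale-aware collation."
  (string-lessp (my/downcase-standard s1)
                (my/downcase-standard s2)))
```

A typical use would be `(sort (list "b" "A" "c") #'my/string-lessp-ignore-case)'.
Note that every comparison downcases both arguments, which is where the
performance concern mentioned above comes from.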
From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 27 09:23:33 2022 Received: (at 59275) by debbugs.gnu.org; 27 Nov 2022 14:23:33 +0000 Received: from localhost ([127.0.0.1]:42350 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozIZA-0005FG-S4 for submit@debbugs.gnu.org; Sun, 27 Nov 2022 09:23:33 -0500 Received: from eggs.gnu.org ([209.51.188.92]:34060) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozIZ8-0005F9-WB for 59275@debbugs.gnu.org; Sun, 27 Nov 2022 09:23:31 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ozIZ2-0004EX-Uo; Sun, 27 Nov 2022 09:23:25 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=Vo28Vtg2+6jJc4kWNwM6JlmUEJI+sTgJsk8NQmDZ5Po=; b=Jm8d6D4BTyN7qZMjSN6h Vc3KddJYs0kas2Bu8s4k2euoArP7I7jIxJZqJgcUlHUbIsTzeozTrLxgscI0LHELBv9ipH5kVYOur pvlXf3cOwi6MTojk2KjUcL4XczEEmcShNlZalVadr/g+pawpcDkhT/HNMCa0pjbPlrw05+RgK4k/g IRv/TcXWdzX9bpf3Tn/zGnMuNWVurEw+cpdI7QbWyeGgo+Wh3aLPK2l9MwB67Kfp5hiv+D8YzerXQ iZHZQhX8h2PJvpsvTezkixorXlRHcurgjrT0AFjWMCHozgQd/obkgBzXfU2DjrCxOz5egYpDq6sgT kSxzAv6Uzm9Yfw==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ozIZ1-0006mw-Dz; Sun, 27 Nov 2022 09:23:23 -0500 Date: Sun, 27 Nov 2022 16:23:50 +0200 Message-Id: <83mt8cpjbd.fsf@gnu.org> From: Eli Zaretskii To: Maxim Nikulin In-Reply-To: <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com> (message from Maxim Nikulin on Sun, 27 Nov 2022 21:00:50 +0700) Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> 
<83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> <87v8n2je5q.fsf@localhost> <83k03it6i2.fsf@gnu.org> <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 59275 Cc: yantar92@posteo.net, 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Maxim Nikulin > Date: Sun, 27 Nov 2022 21:00:50 +0700 > Cc: 59275@debbugs.gnu.org > > Concerning Org, my point is that caseless sorting should be uniform. You need to work hard to get that. Just using 'downcase' is not enough, and neither is using 'string-collate-equalp'. > >> https://nullprogram.com/blog/2014/06/13/ that mentioned something > >> similar about caveats with composition. > > > > I don't see there anything about sorting or collation. What did I miss? > > Does not composed/decomposed representation affect comparison result? They are different texts, so yes, they do, and they should. If you want to treat such strings as equivalent, you need to work even harder, since Emacs currently doesn't have enough infrastructure to do it right in all cases. > > Emacs-devel thread mentioned earlier in this bug contains a link > describing enough issues with string comparison: > > https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison This is about Python, no? > From my point of view e.g. case transformation rule for Turkish I is a > minor issue Why, Org doesn't want to support Turkish users? 
> My argument against `downcase' in `string-collate-lessp' is that it may > add noticeable performance penalty. I'd worry about correctness before performance. > Interestingly `compare-strings' uses upcase conversion when the > IGNORE-CASE argument is true. I believed that some implementations > (unrelated to Emacs) may have problems with e.g. ß and considered > downcase as a safer option. Case conversions always have problems. From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 27 10:19:35 2022 Received: (at 59275) by debbugs.gnu.org; 27 Nov 2022 15:19:36 +0000 Received: from localhost ([127.0.0.1]:42481 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozJRP-0005ha-FS for submit@debbugs.gnu.org; Sun, 27 Nov 2022 10:19:35 -0500 Received: from mail-lf1-f44.google.com ([209.85.167.44]:36623) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ozJRN-0005hU-20 for 59275@debbugs.gnu.org; Sun, 27 Nov 2022 10:19:33 -0500 Received: by mail-lf1-f44.google.com with SMTP id g12so13862758lfh.3 for <59275@debbugs.gnu.org>; Sun, 27 Nov 2022 07:19:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:cc:references:to :content-language:subject:user-agent:mime-version:date:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=v1IH2fY7UeuaJ360fMY24t1ILDhlFl+99oNH8N+7nPg=; b=VVdfZ716W2Jyfg/5q9YvdY2fJ3VdW6krd64Lmkdn0WqmsKKh5rhwCsWr1X2kvE1WHW VYUxk0zqJsR70S+mWHI5x4EveOiA2JJAocwz1sqG99pkhHstCQPwn1CGIhdPOgKj/gBQ BElm490x5o/zmBcWrK5lQvhloOdg2wOwVs0GXx2DvMv5hLqt7S7O+Pg7B2eXM0ZuZbrz VJoMhzcdbD0N/XApqAdjBeQtCP0KktYarLBtO6dvSJQ/PJqE+2Cr2q/jTT2DgdTZSYuz FAJXgxMj32HgfGE8o6ZMDiAzBnJdLLfPf0DU1dkQkLHiu4IlDsOKnK8KbEhRvzCWe+e5 8QAQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:cc:references:to :content-language:subject:user-agent:mime-version:date:message-id 
:from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=v1IH2fY7UeuaJ360fMY24t1ILDhlFl+99oNH8N+7nPg=; b=V0oWctzND2AroqbXef5V+YCzFiAh3FDKm4Pugq0BNciZrZDpTpjC/EWK4ojDmrqrtF GP4Jqf61ppIKEsMQbMIB4zwzT/TBugdKkTVTcQ3iSo166KluwFYVbwV5u33sMnhavflT rAKG02wIJQLHAVp23rYvHdqL9HLnJl0LbcWiDx0eDfDCL/mrraNsJu3Pb+8thoQ5FNCf nqiRVSWRVxJdrcEcNVZy/U6HzY0+doK2XfdETKNWTdwCJd8YdmU2krw4zGVHSemmhhle jyN16LqDZEVeWtlM5o3T1QTPIKR/ooYLsPN1byIc8FXfydcZ2mLRJ8rXdr6IVmquhQqy Ln2g== X-Gm-Message-State: ANoB5pkSTkvNprw2JGi0AtYAfwfO8umOdPh4Ocsur0sGnzqQZ9IamwjB 7BLo2kytR9imTkuITk/QlpY= X-Google-Smtp-Source: AA0mqf5UcY8/JszTJGLAZ/4VG0ukEfLR18c9GvIXV95XcGh7WYJ1YsNAv6MDOEW0yAJIz0GLgP6mJA== X-Received: by 2002:ac2:5f9b:0:b0:4a2:5163:f61b with SMTP id r27-20020ac25f9b000000b004a25163f61bmr15516042lfe.177.1669562366614; Sun, 27 Nov 2022 07:19:26 -0800 (PST) Received: from [192.168.0.101] (nat-0-0.nsk.sibset.net. [5.44.169.188]) by smtp.googlemail.com with ESMTPSA id r7-20020ac24d07000000b0049aa20af00fsm1325692lfi.21.2022.11.27.07.19.25 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 27 Nov 2022 07:19:26 -0800 (PST) From: Maxim Nikulin X-Google-Original-From: Maxim Nikulin Message-ID: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com> Date: Sun, 27 Nov 2022 22:19:24 +0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac Content-Language: en-US To: Eli Zaretskii References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org> <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org> <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org> <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org> <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org> <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org> <87v8n2je5q.fsf@localhost> <83k03it6i2.fsf@gnu.org> <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com> <83mt8cpjbd.fsf@gnu.org> In-Reply-To: <83mt8cpjbd.fsf@gnu.org> 
Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 59275 Cc: Ihor Radchenko , 59275@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

On 27/11/2022 21:23, Eli Zaretskii wrote:
>> From: Maxim Nikulin Date: Sun, 27 Nov 2022 21:00:50 +0700
>>
>> Concerning Org, my point is that caseless sorting should be uniform.
>
> You need to work hard to get that. Just using 'downcase' is not enough, and
> neither is using 'string-collate-equalp'.

I do not like that in some functions `string-collate-lessp' with
IGNORE-CASE argument is used while strings are passed through `downcase'
in other places. When proper locales implementation is available, I
believe, it is better to consistently use IGNORE-CASE. I assume that
text is presented to users, not serialized to be saved or sent as data.

When `string-collate-lessp' disregards IGNORE-CASE, I consider it
acceptable to use `downcase' (`upcase' may be worse since Org currently
uses `downcase'). It provides reasonable balance of invested efforts and
obtained result.

>> Does not composed/decomposed representation affect comparison result?
>
> They are different texts, so yes, they do, and they should.
> If you want to treat such strings as equivalent, you need to work even
> harder, since Emacs currently doesn't have enough infrastructure to do it
> right in all cases.

    `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")

    (sort lst #'string-lessp)
    => ("semana" "señor" "sepia" "señor")
    (sort lst #'string-collate-lessp)
    => ("semana" "señor" "señor" "sepia")

`string-collate-lessp' is able to handle at least some cases, it is
another argument to use it.

>> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
>
> This is about Python, no?

The value of this link is a collection of examples that are not obvious
for everybody. They are applicable to the behavior of `string-lessp' vs.
`string-collate-lessp' as well.

>> From my point of view e.g. case transformation rule for Turkish I is a
>> minor issue
>
> Why, Org doesn't want to support Turkish users?

From my point of view it is a minor issue in comparison to

    (string-collate-lessp "a" "B" "C" t) ; => nil

that breaks comparison not only for accented letters.

You almost managed to convince Ihor to use `string-lessp' instead of
`string-collate-lessp'. I do not think it would improve the quality of
support for the Turkish language. My suggestion is to fall back to
`downcase' and `string-lessp' only if `string-collate-lessp' is unable to
provide case-insensitive comparison.

>> My argument against `downcase' in `string-collate-lessp' is that it may
>> add noticeable performance penalty.
>
> I'd worry about correctness before performance.

`downcase' with `string-lessp' handles more cases than just
`string-lessp' (leaving aside buffer-local conversion tables), so from my
point of view the former is more correct. Even `downcase' with a fixed "C"
locale may give a result more consistent with user expectations. My
impression is that users may be familiar with widespread problems with
sorting.
From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 27 10:42:22 2022
Date: Sun, 27 Nov 2022 17:42:40 +0200
Message-Id: <83ilj0pfnz.fsf@gnu.org>
From: Eli Zaretskii
To: Maxim Nikulin
In-Reply-To: <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com> (message from Maxim Nikulin on Sun, 27 Nov 2022 22:19:24 +0700)
Subject: Re: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
References: <87zgcsdfma.fsf@localhost> <83iljgib4w.fsf@gnu.org>
 <87h6z0cl6b.fsf@localhost> <837czwi6yp.fsf@gnu.org>
 <8735ajel7y.fsf@localhost> <83mt8rgill.fsf@gnu.org>
 <877czokbpk.fsf@localhost> <8335ac4eo5.fsf@gnu.org>
 <87ilj7dbms.fsf@localhost> <83sfib172p.fsf@gnu.org>
 <877czimpz4.fsf@localhost> <83r0xqta0d.fsf@gnu.org>
 <87v8n2je5q.fsf@localhost> <83k03it6i2.fsf@gnu.org>
 <2ed46071-5cd1-67ca-bd95-1c2a3060807d@gmail.com> <83mt8cpjbd.fsf@gnu.org>
 <5fb31dfb-6bde-b895-4c0d-dc1f6eed704c@gmail.com>
Cc: yantar92@posteo.net, 59275@debbugs.gnu.org

> From: Maxim Nikulin
> Date: Sun, 27 Nov 2022 22:19:24 +0700
> Cc: Ihor Radchenko, 59275@debbugs.gnu.org
>
> I do not like that in some functions `string-collate-lessp' with
> IGNORE-CASE argument is used while strings are passed through `downcase'
> in other places. When proper locales implementation is available, I
> believe, it is better to consistently use IGNORE-CASE.

I already explained up-thread why we ignore IGNORE-CASE when collation
order is not known.  I stand by that reasoning.  I believe your opinion
is based on considering only simple locales, and on the a-priori
knowledge what is the locale's collation to begin with, something that
Emacs cannot know in that case.

> When `string-collate-lessp' disregards IGNORE-CASE, I consider it
> acceptable to use `downcase' (`upcase' may be worse since Org currently
> uses `downcase'). It provides reasonable balance of invested efforts and
> obtained result.

We disagree, sorry.

> `("semana" "señor" ,(ucs-normalize-NFD-string "señor") "sepia")
> (sort lst #'string-lessp)
> => ("semana" "señor" "sepia" "señor")
> (sort lst #'string-collate-lessp)
> => ("semana" "señor" "señor" "sepia")
>
> `string-collate-lessp' is able to handle at least some cases

On what OS and with which libc?

And I don't think this is evidence of collation knowing about equivalent
sequences.  It is most probably the side effect of collation ignoring
Latin accents altogether.

> >> https://stackoverflow.com/questions/319426/how-do-i-do-a-case-insensitive-string-comparison
> >
> > This is about Python, no?
>
> The value of this link is a collection of examples that are not obvious
> for everybody. They are applicable to behavior `string-lessp' vs.
> `string-collate-lessp' as well.

Which parts are applicable, in your opinion, and in what way?

> >> From my point of view e.g. case transformation rule for Turkish I is a
> >> minor issue
> >
> > Why, Org doesn't want to support Turkish users?
>
> From my point of view it is a minor issue in comparison to
>
> (string-collate-lessp "a" "B" "C" t) ; => nil
>
> that breaks comparison not only for accented letters.

Org is free to make such misguided decisions, but Emacs won't.  We
cannot decide that some locale is "minor" and others are "major".

My suggestion is to look for a solution that works in any locale.

> You almost managed to convince Ihor to use `string-lessp' instead of
> `string-collate-lessp'. I do not think it would improve quality of
> support of Turkish language.

I didn't try to convince Ihor of anything, just point out the pitfalls
of using locale-specific collation order in portable programs.  I said
back then that I don't know enough to evaluate your decisions.  Once you
understand the subtle issues with these APIs, it is your call to decide
how to solve your particular problems.

> My suggestion is to fall back to `downcase' and `string-lessp' only if
> `string-collate-lessp' is unable to provide case insensitive comparison.

You can do that in Org if that's the decision of the Org developers.
Emacs cannot do that automatically for the reasons I explained
up-thread.

> >> My argument against `downcase' in `string-collate-lessp' is that it may
> >> add noticeable performance penalty.
> >
> > I'd worry about correctness before performance.
>
> `downcase' with `string-lessp' handles more cases than just
> `string-lessp' (leaving aside buffer-local conversion tables), so from
> my point of view the former is more correct.

I'm quite sure this is only true for the cases that you considered, not
in general.

> Even `downcase' with fixed "C" locale may give result more consistent with
> user expectations.

How does it help on systems where locale-specific collation is not
accessible to Emacs?

> My impression is that users may be familiar with widespread problems with
> sorting.

Not IME.  But that's a separate issue, and I don't pretend to know Org
users better than you do, so I will defer to you on this one.

From unknown Sun Jun 22 00:14:55 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 26 Dec 2022 12:24:09 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator