From: bug#40540 <40540@debbugs.gnu.org>
To: bug#40540 <40540@debbugs.gnu.org>
Subject: Status: Faster sort with locale
Date: Mon, 23 Jun 2025 14:49:03 +0000

retitle 40540 Faster sort with locale
reassign 40540 coreutils
submitter 40540 Ole Tange
severity 40540 normal
thanks

From: Ole Tange
Date: Fri, 10 Apr 2020 15:19:19 +0200
Subject: Faster sort with locale
To: bug-coreutils@gnu.org

I have noticed that if locale is set, then sort becomes much slower.

I imagine that it is because instead of doing

    simple_compare(string1,string2)

it does:

    localized_compare(string1,string2)

But would it be possible to convert the input string1 into a string in
a generalized format, which would sort the same way as the localized
sort, but using a simple compare?

Like this:

    string1_general = localize(string1)
    string2_general = localize(string2)
    simple_compare(string1_general,string2_general)

If that is possible, then localize() can be done by other cores in
advance and thereby offload the "primary" core.

/Ole

From: Paul Eggert
Organization: UCLA Computer Science Department
Date: Fri, 10 Apr 2020 11:56:47 -0700
Message-ID: <83d22efe-4e2b-4823-e1a3-08bb594654e3@cs.ucla.edu>
Subject: Re: bug#40540: Faster sort with locale
To: Ole Tange
Cc: 40540@debbugs.gnu.org

On 4/10/20 6:19 AM, Ole Tange wrote:
> But would it be possible to convert the input string1 into a string in
> a generalized format, which would sort the same way as the localized
> sort, but using a simple compare?

I tried doing that a long time ago by using strxfrm, but it made 'sort'
significantly slower. You're welcome to try again; perhaps things have changed.
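
For readers following the thread, the two approaches being compared can be
sketched as below. This is only an illustration under stated assumptions: it
uses Python's locale.strcoll and locale.strxfrm, which wrap the C library's
collation functions, rather than anything from the 'sort' source, and the
helper names (select_collation, sort_with_strcoll, sort_with_strxfrm) are
hypothetical. It shows that both routes give the same order; it says nothing
about which one is faster.

```python
import functools
import locale


def select_collation(name=""):
    """Set LC_COLLATE to `name` ("" means: take it from the environment).

    Falls back to the "C" locale if the requested one is not installed.
    Returns the name of the collation locale now in effect.
    """
    try:
        return locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return locale.setlocale(locale.LC_COLLATE, "C")


def sort_with_strcoll(lines):
    # Corresponds to localized_compare(string1, string2):
    # the locale rules are applied again on every single comparison.
    return sorted(lines, key=functools.cmp_to_key(locale.strcoll))


def sort_with_strxfrm(lines):
    # Corresponds to localize(string) followed by simple_compare():
    # each line is transformed once, and the resulting keys are
    # compared with ordinary string comparison.
    return sorted(lines, key=locale.strxfrm)


if __name__ == "__main__":
    print("collation locale:", select_collation(""))
    words = ["pear", "Apple", "apple", "Zebra", "banana"]
    print(sort_with_strcoll(words))
    print(sort_with_strxfrm(words))
```

Because the transformed keys are independent of each other, computing them is
the step that could in principle be handed to other cores, as suggested in
the original report. Whether that outweighs the cost of building and storing
the keys would have to be measured.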