From unknown Thu Sep 11 20:48:57 2025
From: bug#16004 <16004@debbugs.gnu.org>
To: bug#16004 <16004@debbugs.gnu.org>
Subject: Status: Multicore Core-utils
Reply-To: bug#16004 <16004@debbugs.gnu.org>
Date: Fri, 12 Sep 2025 03:48:57 +0000

retitle 16004 Multicore Core-utils
reassign 16004 coreutils
submitter 16004 CDR
severity 16004 wishlist
tag 16004 notabug
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 17:19:13 2013
Date: Fri, 29 Nov 2013 17:18:59 -0500
Subject: Multicore Core-utils
From: CDR
To: coreutils@gnu.org, bug-coreutils@gnu.org

Dear friends

In case this email is read by Richard M. Stallman and David MacKenzie.
I need a multi-core version of "comm" and "join".
The current version only uses one core and it takes hours to process two
files, with 4 columns and 510 million lines. I need to process those files
every night.

I wonder if any plan exists to jump to multicore. If not, is there a
volunteer that can do the job, for a reasonable fee? I am a one-man
company but I guess we all need a parallel-processing-capable
core-utils.

Yours
Philip Orleans

From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 17:29:08 2013
Message-ID: <5299150B.1010808@cs.ucla.edu>
Date: Fri, 29 Nov 2013 14:28:27 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
To: bug-coreutils@gnu.org
Subject: Re: bug#16004: Multicore Core-utils

CDR wrote:
> I wonder if any plan exists to jump to multicore.

There's no specific plan, and it'd be nice to have coreutils run faster
on multicore machines.

From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 18:06:04 2013
Message-ID: <52991DD2.2000909@draigBrady.com>
Date: Fri, 29 Nov 2013 23:05:54 +0000
From: Pádraig Brady
To: CDR
Cc: 16004@debbugs.gnu.org
Subject: Re: bug#16004: Multicore Core-utils

On 11/29/2013 10:18 PM, CDR wrote:
> Dear friends
>
> In case this email is read by Richard M. Stallman and David MacKenzie.
> I need a multi-core version of "comm" and "join". The current version
> only uses one core and it takes hours to process two files, with 4
> columns and 510 million lines. I need to process those files every
> night.
>
> I wonder if any plan exists to jump to multicore. If not, is there a
> volunteer that can do the job, for a reasonable fee? I am a one-man
> company but I guess we all need a parallel-processing-capable
> core-utils.

Note comm and join need sorted input, and sort(1) is already multicore
aware. Since sorting implicitly has to read all of the input before
generating any output, it makes sense for sort(1) to handle that itself.
Also the sorting operation itself is relatively expensive compared to the
corresponding I/O involved, which further justifies the multicore
knowledge within sort(1).

So if you're dealing with an already sorted file, performance often
depends on the I/O for that file, which could be a bottleneck. For
example, if your data file that "takes hours to process" is on a
mechanical hard disk, then processing with a single thread/process is
probably best; otherwise multiple ones would just keep seeking the disk
head and slow things down. The increasing prevalence of SSDs changes the
game here though, so that separate accesses to the same file could very
well be a win.

BTW you haven't said whether you're I/O or CPU bound. I presume you're
CPU bound given you're mentioning multicore, which is a little surprising
given the relatively inexpensive operations done within comm(1) and
join(1). It's worth mentioning locales here, because if you don't need
the relatively expensive locale matching rules, you can disable those
before a run by setting:

  export LC_ALL=C

If that changes things so that you are I/O bound again, then you might
consider putting each file on a separate device, to gain from parallel
I/O operations.

If you're still CPU bound, a more general technique to consider is
splitting up the file to be processed by separate _processes_. Now this
is more suited to tools that don't depend on the relative order of
particular lines, which unfortunately comm(1) and join(1) do, but perhaps
there is some way you could split your data into more files when
generating it, which could then be fed to separate join(1) processes.

thanks,
Pádraig.
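
A minimal sketch of the splitting idea in the message above, offered only
as an illustration and not as anything proposed in this thread. It assumes
both inputs are whitespace-separated with the join key in the first field,
and that it is acceptable to hash keys into a fixed number of buckets so
that equal keys always land in the same bucket for both files. The names
bucket and parallel_join, the toy hash, and the default of 4 buckets are
all hypothetical. Each bucket pair is sorted and joined by its own
process, and LC_ALL=C is exported so that sort(1) and join(1) agree on a
byte-wise ordering.

```shell
#!/bin/sh
# Hypothetical helpers, not part of coreutils.

# bucket FILE PREFIX N: write each line of FILE to PREFIX.<h>, where <h>
# is a toy hash of the first field modulo N. Characters outside the
# alphabet contribute 0, so the mapping is still deterministic per key.
bucket() {
  awk -v n="$3" -v prefix="$2" '
    BEGIN { alphabet = "0123456789abcdefghijklmnopqrstuvwxyz" }
    {
      h = 0
      for (i = 1; i <= length($1); i++)
        h = (h + index(alphabet, tolower(substr($1, i, 1)))) % n
      print > (prefix "." h)
    }' "$1"
}

# parallel_join A B OUT [N]: join A and B on field 1 using N processes.
# The body runs in a subshell so LC_ALL does not leak into the caller.
parallel_join() (
  a=$1 b=$2 out=$3 parts=${4:-4}
  export LC_ALL=C
  tmp=$(mktemp -d) || exit 1

  # Pre-create every bucket so join(1) never sees a missing file.
  i=0
  while [ "$i" -lt "$parts" ]; do
    : > "$tmp/a.$i"
    : > "$tmp/b.$i"
    i=$((i + 1))
  done
  bucket "$a" "$tmp/a" "$parts"
  bucket "$b" "$tmp/b" "$parts"

  # One background process per bucket pair: sort both sides, then join.
  i=0
  while [ "$i" -lt "$parts" ]; do
    (
      sort -k 1b,1 -o "$tmp/a.$i" "$tmp/a.$i"
      sort -k 1b,1 -o "$tmp/b.$i" "$tmp/b.$i"
      join "$tmp/a.$i" "$tmp/b.$i" > "$tmp/out.$i"
    ) &
    i=$((i + 1))
  done
  wait

  # Merge the already-sorted per-bucket results into one ordered file.
  sort -m -k 1b,1 "$tmp"/out.* > "$out"
  status=$?
  rm -rf "$tmp"
  exit "$status"
)
```

It could be called as "parallel_join a.txt b.txt joined.txt 8", where the
file names are placeholders. Whether this helps at all depends on the
I/O versus CPU question raised above: the extra pass that buckets the
data adds I/O, so it is only likely to pay off when the run is CPU bound.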

From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 11 18:15:48 2018
To: control@debbugs.gnu.org
From: Assaf Gordon
Date: Thu, 11 Oct 2018 16:15:36 -0600

tags 15308 notabug
close 15308
tags 15634 notabug
close 15634
tags 16004 notabug
severity 16004 wishlist
close 16004
tags 16245 notabug
close 16245
tags 16249 notabug
close 16249
tags 16249 notabug
close 16249
close 16309
tags 16468 notabug
close 16468
tag 16530 notabug
close 16530
tags 16718 notabug
close 16718
tags 16742 +moreinfo
close 16742
tags 16831 wontfix
close 16831
tags 16838 wontfix
close 16838
tags 16872 fixed
close 16872
close 16945
close 17224
tags 17503 + notabug
close 17503
close 17546
tags 17904 notabug
close 17904

From unknown Thu Sep 11 20:48:57 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 09 Nov 2018 12:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator