From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 02 19:21:18 2023
From: Eike Dierks
Date: Mon, 2 Jan 2023 21:49:13 +0100
Subject: feature: parallel grep --recursive
To: bug-grep@gnu.org
Content-Type: text/plain; charset="UTF-8"

Hi at the gnu grep development team

I'd like to suggest a new feature for: grep --recursive

The grep --recursive should work in parallel.

Rationale:
This could speed up the grep by the number of threads.

Currently, the --recursive option works on every file in sequence.
Instead, I want to start some greps in parallel.

If we want to be good, then we would parse the expression first
(which might be expensive) and then fork on the files.

The master grep process would then collect the results,
so that the results would be serialized
to be identical with the current implementation.

I'd like to suggest a --fast option,
where results show up as soon as they are found.

....

I am fed up with all those precomputed indexes.
I want to grep it really fast now.

I expect that the file access is fast now, but has latency.
I want the grep to saturate the machine.

// job card
.

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 02 21:35:10 2023
Message-Id: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
Date: Mon, 02 Jan 2023 20:34:39 -0600
From: "Paul Jackson"
To: bug-grep@gnu.org
Subject: Re: bug#60506: feature: parallel grep --recursive
Content-Type: text/plain

There's no need for special logic in grep to run parallel grep's.

The "parallel" command can handle that for you.

For example, on the 12 core, 24 thread Ryzen CPU that I am using:

  find $HOME -xdev -type f -ctime -333 | wc -l
  ## counts 136126 files.

  find $HOME -xdev -type f -ctime -333 | parallel -m grep -l foobar | wc -l
  ## takes about 13 seconds

  find $HOME -xdev -type f -ctime -333 | xargs -d '\n' grep -l foobar | wc -l
  ## takes about 52 seconds

The above parallel invocation ran 24 grep commands in parallel, and
took about 1/4 the time, otherwise performing rather like xargs,
which ran one grep command at a time.

(Granted, reading either the 'parallel' or 'xargs' man pages is not easy.)
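
The design proposed at the top of this thread (parse the pattern once, search
the files concurrently, and have the parent emit results in the original file
order, with an optional mode that reports hits as soon as they are found) can
be sketched briefly. What follows is a minimal Python sketch under stated
assumptions, not grep's implementation: grep_file, parallel_grep, and the
ordered flag are hypothetical names chosen for illustration. It uses threads,
which in CPython mainly help overlap file-access latency; CPU-bound matching
would need separate processes, as the "parallel" invocation above uses.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed


def grep_file(regex, path):
    """Return the lines of `path` that match the precompiled `regex`."""
    hits = []
    try:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                if regex.search(line):
                    hits.append(line.rstrip("\n"))
    except OSError:
        pass  # unreadable file: report no matches (a real tool would warn)
    return hits


def parallel_grep(pattern, paths, workers=4, ordered=True):
    """Yield (path, line) pairs for every matching line in `paths`."""
    regex = re.compile(pattern)  # parse the expression once, before fanning out
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(grep_file, regex, p) for p in paths]
        by_future = dict(zip(futures, paths))
        # ordered=True walks the futures in submission order, so the output
        # matches a sequential run; ordered=False (the hypothetical "--fast"
        # behaviour) reports each file as soon as its search finishes.
        stream = futures if ordered else as_completed(futures)
        for future in stream:
            for line in future.result():
                yield by_future[future], line
```

With ordered=True the parent may hold finished results while it waits for an
earlier file, which is the price of output identical to a sequential run.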
-- 
Paul Jackson
pj@usa.net

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 02 21:49:04 2023
Message-ID: <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu>
Date: Mon, 2 Jan 2023 18:48:49 -0800
Subject: Re: bug#60506: feature: parallel grep --recursive
To: Paul Jackson, 60506@debbugs.gnu.org
References: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
From: Paul Eggert
Organization: UCLA Computer Science Department
In-Reply-To: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 2023-01-02 18:34, Paul Jackson wrote:
> There's no need for special logic in grep to run parallel grep's.

There might be, if one wants to use a parallel grep to search a single
large file.

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 02 21:59:25 2023
Message-Id: <68afdacb-e8db-41c4-a3bd-1ce5ddd185ac@app.fastmail.com>
In-Reply-To: <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu>
References: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
 <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu>
Date: Mon, 02 Jan 2023 20:56:23 -0600
From: "Paul Jackson"
To: "Paul Eggert", 60506@debbugs.gnu.org
Subject: Re: bug#60506: feature: parallel grep --recursive
Content-Type: text/plain

<< a parallel grep to search a single large file >>

I'm but one user, and a rather idiosyncratic user at that,
but for my usage patterns, the specialized logic that it
would take to run a parallelized grep on a large file
would likely not shrink the elapsed time enough to justify
the coding, documentation, and maintenance effort.

I would expect the time to read the large file in from disk to
dominate the total elapsed time in any case.

(or maybe I am just jealous that I didn't think of that parallel
grep use case myself <grin>.)
-- 
Paul Jackson
pj@usa.net

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 03 17:33:28 2023
In-Reply-To: <786328389.5641240.1672780518915@mail.yahoo.com>
References: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
 <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu>
 <68afdacb-e8db-41c4-a3bd-1ce5ddd185ac@app.fastmail.com>
 <786328389.5641240.1672780518915@mail.yahoo.com>
Date: Tue, 03 Jan 2023 16:32:01 -0600
From: "Paul Jackson"
To: "David G. Pickett", "eggert@cs.ucla.edu",
 "60506@debbugs.gnu.org" <60506@debbugs.gnu.org>
Subject: Re: bug#60506: feature: parallel grep --recursive
Content-Type: multipart/alternative; boundary=3e9d5f89839d42a281a1fa27514752b9

--3e9d5f89839d42a281a1fa27514752b9
Content-Type: text/plain

David Pickett wrote:
<< I also wrote a simpler, line oriented, faster xargs, fxargs! >>

I've been quite pleased with an xargs wrapper I wrote that basically
converts newlines to nuls, and then invokes either "xargs" or, if asked
to run multiple threads, "parallel --xargs", passing all the "xargs" arguments
to "xargs --null".

I got all the exit statuses and such just right, and preferred having all the
xargs options available, once this hack worked around the confused
space character handling of xargs without the --null option.

I call my wrapper "x", a short name since I use it a lot, having been a regular
xargs user since it was added to Version 7 Unix, inside Bell Labs, back around
1978.

You can find my wrapper at:

http://thepythoniccow.us/x.c

By the way, even the original author of xargs, Herb Gellis, agrees that its
interface is somewhat borked. See a note Gellis posted a decade after writing
xargs, which I include in the above "x.c" source. An amusing bit of history ...

-- 
                Paul Jackson
                pj@usa.net

--3e9d5f89839d42a281a1fa27514752b9
Content-Type: text/html
Content-Transfer-Encoding: quoted-printable
--3e9d5f89839d42a281a1fa27514752b9--

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 03 18:56:44 2023
Date: Tue, 3 Jan 2023 21:15:18 +0000 (UTC)
From: "David G. Pickett"
To: "pj@usa.net", "eggert@cs.ucla.edu",
 "60506@debbugs.gnu.org" <60506@debbugs.gnu.org>
Message-ID: <786328389.5641240.1672780518915@mail.yahoo.com>
In-Reply-To: <68afdacb-e8db-41c4-a3bd-1ce5ddd185ac@app.fastmail.com>
References: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com>
 <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu>
 <68afdacb-e8db-41c4-a3bd-1ce5ddd185ac@app.fastmail.com>
Subject: Re: bug#60506: feature: parallel grep --recursive
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----=_Part_5641239_1148930522.1672780518913"
Reply-To: "David G. Pickett"

------=_Part_5641239_1148930522.1672780518913
Content-Type: text/plain; charset=UTF-8

It seems like we have 2 suggestions: parallel in different files and parallel in large files.

 - Parallel in different files is two ways tricky since you need threads and mutex on the file name stream, and in addition for parallel directories, some sort of threads and queue to pass the file names (producers) to the grep's (consumers).
   - You might need a following consumer layer to ensure the output lines are in order or at very least not commingled. A big FILE* buffer and fflush() can ensure each line is a write(), but you give up original ordering unless you arrange to arrange or sort the output.
   - You probably want to set a thread count limit.
   - You might want to start with one file name producer, one grep consumer-producer and one arrange/sort consumer, and add more threads to whichever upstream side is emptying/filling a fixed sized queue.
   - But of course, a lot of this is available from "parallel" if you make a study of it!
   - I made a C pipe fitting I called xdemux to take a stream like file name lines from stdin and spread it in rotation to N downstream popen() pipes to a given command, like xargs grep. N can be set to 2 x your local core count so it is less likely to block on IO, paging, or congestion.
   - I also wrote a simpler, line oriented, faster xargs, fxargs!
   - I also wrote a C tool I called pipebuf to buffer stdin to stdout so one slow consumer does not stop others from getting work, but more parallelism is a simpler solution.
   - Threads in Intel Hyperthreaded CPUs can run twice as many in parallel as with parallel processes.
 - Parallel in large files reminds me of AbInitio ETL, which I assume divides a file into N portions, but each thread is responsible for any line that starts in its portion, even if it ends in another. Merging output to present hits in order requires some sort of buffering or sorting of output. For very simple grep (is it in there?), you need to design it so you can call off the other threads on any hit.

Doing both the above simultaneously would be a lot! Either is a lot to focus on what is one of many simple tools! Other tools might want similar enhancement! :D

File read speeds vary wildly, between network drives on various speed and congestion networks, spinning hard drives of various RPM and bit density, solid state drives, and then files cached in DRAM (most read IO uses mmap64()), not to mention in MOBO and CPU caches at many levels. I wrote a mmap64() based fgrep and it turned out to be so "good" on a big file list that ALL the other processes on the group's server got swapped out big time (without parallelism)!

-----Original Message-----
From: Paul Jackson <pj@usa.net>
To: Paul Eggert <eggert@cs.ucla.edu>; 60506@debbugs.gnu.org
Sent: Mon, Jan 2, 2023 9:56 pm
Subject: bug#60506: feature: parallel grep --recursive

<< a parallel grep to search a single large file >>

I'm but one user, and a rather idiosyncratic user at that,
but for my usage patterns, the specialized logic that it
would take to run a parallelized grep on a large file
would likely not shrink the elapsed time enough to justify
the coding, documentation, and maintenance effort.

I would expect the time to read the large file in from disk to
dominate the total elapsed time in any case.

(or maybe I am just jealous that I didn't think of that parallel
grep use case myself <grin>.)
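
The portion-splitting scheme described above for a single large file (divide
the file into N byte ranges, make each worker responsible for every line that
starts in its range even if the line ends in the next one, then merge hits in
file order) can be illustrated as follows. This is a minimal Python sketch
under stated assumptions, not how AbInitio or grep work: search_portion and
parallel_search are hypothetical names, the pattern is a bytes regex, and the
threads illustrate the partitioning rather than a real speedup, since CPython
threads do not run CPU-bound matching in parallel.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def search_portion(path, regex, start, end):
    """Return (offset, line) for matching lines that START in [start, end)."""
    hits = []
    with open(path, "rb") as handle:
        if start > 0:
            # Step back one byte and read through the next newline. If the
            # previous byte is a newline we land exactly on `start`; otherwise
            # we skip the tail of a line owned by the previous portion.
            handle.seek(start - 1)
            handle.readline()
        while True:
            offset = handle.tell()
            if offset >= end:
                break  # the next line starts in a later portion
            line = handle.readline()  # may run past `end`; that is intended
            if not line:
                break
            if regex.search(line):
                hits.append((offset, line.rstrip(b"\n")))
    return hits


def parallel_search(path, pattern, portions=4):
    """Search one file in `portions` byte ranges; return hits in file order."""
    regex = re.compile(pattern)
    size = os.path.getsize(path)
    step = max(1, -(-size // portions))  # ceiling division
    bounds = [(s, min(s + step, size)) for s in range(0, size, step)]
    with ThreadPoolExecutor(max_workers=portions) as pool:
        results = pool.map(lambda b: search_portion(path, regex, *b), bounds)
        # pool.map returns results in portion order, and hits inside a portion
        # are already in file order, so concatenation keeps them sorted.
        return [hit for part in results for hit in part]
```

Because ownership is decided by where a line starts, no line is searched twice
and none is skipped, and no sort is needed when merging.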
-- 
                Paul Jackson
                pj@usa.net

------=_Part_5641239_1148930522.1672780518913
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
  • I made a C pipe fitting I called xdemux to take a s= tream like file name lines from stdin and spread it in rotation to N downst= ream popen() pipes to a given command, like xargs grep.  N can be set = to 2 x your local core count so it is less likely to block on IO, paging, o= r congestion.  
  • I also wrote a simpler, line oriented, fa= ster xargs, fxargs!  
  • I also wrote a C tool I called pipe= buf to buffer stdin to stdout so one slow consumer does not stop others fro= m getting work, but more parallelism is a simpler solution.  
  • Threads in Intel Hyperthreaded CPUs can run twice as many in parallel = as with parallel processes.

  •  - Parallel in large files reminds me of AbInitio ETL, which I as= sume divides a file into N portions, but each thread is responsible for any= line that starts in its portion, even if it ends in another.  Merging= output to present hits in order requires some sort of buffering or sorting= of output.  For very simple grep (is it in there?), you need to desig= n it so you can call off the other threads on any hit.

    Doing both the above simultaneously would be a lot!  Either is a = lot to focus on what is one of many simple tools!  Other tools might w= ant similar enhancement!  :D

    File read speeds vary wildly, between network drives on networks of various speed and congestion, spinning hard drives of various RPM and bit density, solid state drives, and then files cached in DRAM (most read IO uses mmap64()), not to mention in MOBO and CPU caches at many levels.  I wrote a mmap64() based fgrep and it turned out to be so "good" on a big file list that ALL the other processes on the group's server got swapped out big time (without parallelism)!
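For readers unfamiliar with the approach, a memory-mapped fixed-string search might look like the Python sketch below. It is not the author's tool: `mmap_fgrep` is a hypothetical name, and the needle is assumed to be non-empty and to contain no newline. Mapping lets the kernel page the file in on demand, which is also why a fast scan over many large files can crowd other processes' pages out of memory, as described above.

```python
import mmap


def mmap_fgrep(path, needle: bytes):
    """Return the lines of `path` that contain the fixed string `needle`."""
    if not needle or b"\n" in needle:
        raise ValueError("needle must be non-empty and contain no newline")
    hits = []
    with open(path, "rb") as f:
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return hits  # an empty file cannot be mapped
        with mm:
            pos = mm.find(needle)
            while pos != -1:
                # Widen the hit to the enclosing line.
                start = mm.rfind(b"\n", 0, pos) + 1
                end = mm.find(b"\n", pos)
                if end == -1:
                    end = len(mm)
                hits.append(mm[start:end])
                # Resume after this line so it is reported only once.
                pos = mm.find(needle, end + 1)
    return hits
```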

    -----Original Message-----
    From: Paul Jackson <pj@usa.net>
    To: Paul Eggert <eggert@cs.ucla.edu>; 60506@debbugs.gnu.org
    Sent: Mon, Jan 2, 2023 9:56 pm
    Subject: bug#60506: feature: parallel grep --recursive

    << a parallel grep to search a single large file >>

    I'm but one user, and a rather idiosyncratic user at that,
    but for my usage patterns, the specialized logic that it
    would take to run a parallelized grep on a large file
    would likely not shrink the elapsed time enough to justify
    the coding, documentation, and maintenance effort.

    I would expect the time to read the large file in from disk to
    dominate the total elapsed time in any case.

    (or maybe I am just jealous that I didn't think of that parallel
    grep use case myself <grin>.)


    --
                    Paul Jackson
                    pj@usa.net




    ------=_Part_5641239_1148930522.1672780518913--

From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 04 11:43:10 2023
Date: Wed, 4 Jan 2023 16:42:56 +0000 (UTC)
From: "David G. Pickett"
Reply-To: "David G. Pickett"
To: "60506@debbugs.gnu.org" <60506@debbugs.gnu.org>
Message-ID: <2136865415.5921209.1672850576730@mail.yahoo.com>
References: <04d86085-a044-4b9b-8451-b6e0c3586bb3@app.fastmail.com> <31e8cb06-606f-ea05-7d99-08e0311920a5@cs.ucla.edu> <68afdacb-e8db-41c4-a3bd-1ce5ddd185ac@app.fastmail.com> <786328389.5641240.1672780518915@mail.yahoo.com>
Subject: Re: bug#60506: feature: parallel grep --recursive
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_5921208_1546564133.1672850576727"

------=_Part_5921208_1546564133.1672850576727
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

xargs enhancement: I collect new args while the last set is running, use a fixed common buffer for input, and vary the arg count down for long args.

dgp@dgp-p6803w:~$ fxargs2

Usage: fxargs2 [ -n <args_per_exec> ] [ -v ] [ -p ] <cmd> [ <cmd_arg> ... ]

Reads arguments as lines from standard input and executes:
 <cmd> [ <cmd_args> ... ] <args_from_stdin>
Each line becomes one argument.  The number of args per command is limited
by <args_per_exec> (default 1024).  The command is executed when either:
 - the total number of args from standard input is <args_per_exec>, or
 - the buffer has ( 80 * <args_per_exec> ) unexecuted bytes of input, or
 - stdin EOF is detected with any args from standard input.
The <cmd> [ <cmd_args> ... ] is never executed alone.
The buffer is fixed in size at 80 * <args_per_exec>, so long args can force
fewer <args_per_exec> for any pass.
While a command is executing, reading resumes, but before another command
is executed, the prior command must return a status.
With -v, any abnormal child state returned is reported.
With -p, any child terminating on SIGPIPE causes a normal exit.

dgp@dgp-p6803w:~$

I was tempted to exec more often if stdin was temporarily dry, but better is the enemy of good enough!

-----Original Message-----
From: Paul Jackson <pj@usa.net>
To: David G. Pickett <dgpickett@aol.com>; eggert@cs.ucla.edu <eggert@cs.ucla.edu>; 60506@debbugs.gnu.org <60506@debbugs.gnu.org>
Sent: Tue, Jan 3, 2023 5:32 pm
Subject: Re: bug#60506: feature: parallel grep --recursive

David Pickett wrote:
<< I also wrote a simpler, line oriented, faster xargs, fxargs!  >>

I've been quite pleased with an xargs wrapper I wrote that basically
converts newlines to nuls, and then invokes either "xargs" or, if asked
to run multiple threads, "parallel --xargs", passing all the "xargs" arguments
to "xargs --null".

I got all the exit statuses and such just right, and preferred having all the
xargs options available, once this hack worked around the confused
space character handling of xargs without the --null option.

I call my wrapper "x", a short name since I use it a lot, having been a regular
xargs user since it was added to Version 7 Unix, inside Bell Labs, back around
1978.

You can find my wrapper at:

    http://thepythoniccow.us/x.c

By the way, even the original author of xargs, Herb Gellis, agrees that its
interface is somewhat borked.  See a note Gellis posted a decade after writing
xargs, which I include in the above "x.c" source.  An amusing bit of history ...

--
                Paul Jackson
                pj@usa.net

------=_Part_5921208_1546564133.1672850576727
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit
    xargs enhancement: I collect new args while the last set is running, use a fixed common buffer for input, and vary the arg count down for long args.

    dgp@dgp-p6803w:~$ fxargs2

    Usage: fxargs2 [ -n <args_per_exec> ] [ -v ] [ -p ] <cmd> [ <cmd_arg> ... ]

    Reads arguments as lines from standard input and executes:
     <cmd> [ <cmd_args> ... ] <args_from_stdin>
    Each line becomes one argument.  The number of args per command is limited
    by <args_per_exec> (default 1024).  The command is executed when either:
     - the total number of args from standard input is <args_per_exec>, or
     - the buffer has ( 80 * <args_per_exec> ) unexecuted bytes of input, or
     - stdin EOF is detected with any args from standard input.
    The <cmd> [ <cmd_args> ... ] is never executed alone.
    The buffer is fixed in size at 80 * <args_per_exec>, so long args can force
    fewer <args_per_exec> for any pass.
    While a command is executing, reading resumes, but before another command
    is executed, the prior command must return a status.
    With -v, any abnormal child state returned is reported.
    With -p, any child terminating on SIGPIPE causes a normal exit.

    dgp@dgp-p6803w:~$


    I was tempted to exec more often if stdin was temporarily dry, but better is the enemy of good enough!
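The flush rule in the usage text (run the command once <args_per_exec> arguments are collected or the 80 * <args_per_exec> byte buffer fills, and never run it with no input arguments) might be sketched in Python as below. This is a guess at the behaviour from the usage text alone, not a port of fxargs2: the names `batches` and `run_batches` are hypothetical, charging one extra byte per argument for a terminator is an assumption, and so is sending an over-long argument in a batch of its own. The read-ahead while a command is running is not modelled.

```python
import subprocess


def batches(lines, args_per_exec=1024, bytes_per_arg=80):
    """Group input lines into argument lists, one list per command run."""
    budget = bytes_per_arg * args_per_exec  # fixed buffer size
    batch, used = [], 0
    for line in lines:
        arg = line.rstrip("\n")
        cost = len(arg.encode()) + 1  # assumption: one terminator byte each
        # Flush before adding if the count or the byte budget is exhausted.
        if batch and (len(batch) >= args_per_exec or used + cost > budget):
            yield batch
            batch, used = [], 0
        batch.append(arg)
        used += cost
    if batch:  # EOF with pending args; an empty input runs nothing
        yield batch


def run_batches(cmd, lines, **limits):
    """Run `cmd` once per batch; return the first non-zero exit status."""
    status = 0
    for batch in batches(lines, **limits):
        rc = subprocess.run(list(cmd) + batch).returncode
        status = status or rc
    return status
```

A call such as `run_batches(["grep", "-l", "needle"], sys.stdin)` would then behave like a line-oriented xargs, after importing `sys`.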


    -----Original Message-----
    From: Paul Jackson <pj@usa.net>
    To: David G. Pickett <dgpickett@aol.com>; eggert@cs.ucla.edu <eggert@cs.ucla.edu>; 60506@debbugs.gnu.org <60506@debbugs.gnu.org>
    Sent: Tue, Jan 3, 2023 5:32 pm
    Subject: Re: bug#60506: feature: parallel grep --recursive

    David Pickett wrote:
    << I also wrote a simpler, line oriented, faster xargs, fxargs!  >>

    I've been quite pleased with an xargs wrapper I wrote that basically
    converts newlines to nuls, and then invokes either "xargs" or, if asked
    to run multiple threads, "parallel --xargs", passing all the "xargs" arguments
    to "xargs --null".

    I got all the exit statuses and such just right, and preferred having all the
    xargs options available, once this hack worked around the confused
    space character handling of xargs without the --null option.
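The core of such a wrapper, turning newline-separated input into NUL-separated input before handing it to `xargs --null`, can be sketched in a few lines of Python. This is not the "x" program itself: `newlines_to_nuls` and `x` are hypothetical names, only the plain `xargs` path is shown, and GNU xargs is assumed for the `--null` spelling.

```python
import subprocess


def newlines_to_nuls(data: bytes) -> bytes:
    """Make each input line one NUL-terminated argument."""
    # With NUL separators, spaces and quotes inside a line are no longer
    # split or interpreted by xargs.
    return data.replace(b"\n", b"\0")


def x(xargs_args, data: bytes) -> int:
    """Feed `data` to `xargs --null <xargs_args>`; return its exit status."""
    proc = subprocess.run(["xargs", "--null", *xargs_args],
                          input=newlines_to_nuls(data))
    return proc.returncode
```

For example, `x(["grep", "-l", "needle"], b"my file.h\nit's.h\n")` would pass both names through intact, where plain `xargs` would split the first on the space and choke on the unmatched quote in the second.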

    I call my wrapper "x", a short name since I use it a lot, having been a regular
    xargs user since it was added to Version 7 Unix, inside Bell Labs, back around
    1978.

    You can find my wrapper at:

        http://thepythoniccow.us/x.c

    By the way, even the original author of xargs, Herb Gellis, agrees that its
    interface is somewhat borked.  See a note Gellis posted a decade after writing
    xargs, which I include in the above "x.c" source.  An amusing bit of history ...

    -- 
                    Paul Jackson
                    pj@usa.net
    

    ------=_Part_5921208_1546564133.1672850576727--

From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 06 19:41:06 2023
MIME-Version: 1.0
From: Eike Dierks
Date: Sat, 7 Jan 2023 01:40:46 +0100
Subject: parallel grep
To: 60506@debbugs.gnu.org
Content-Type: text/plain; charset="UTF-8"

I was thinking about this again.
It looked easy at first, but it is not.

My prime use would be to grep in /usr/include
That would search a lot of files, but only return a few results.
In that case, searching a lot of files in parallel could be beneficial.

But it gets a lot more troublesome,
if you get a lot of results from a single file.
In that case a lot of results would need to be buffered,
so as to give them the very same ordering of the results.
Because the output of grep should always stay stable.

But we could make this explicit:
We could introduce a new option: --parallel (-P)
That would not have any order of the results returned.

I know, that we have to be very conservative about how grep works.
Actually a wrapper with gnu-parallel could do.

I want the grep to make my machine to scream and go.
I want to have grep to use all io and all compute,
and to return results as fast as it can get.

// hi at the grep
// eike

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 07 23:33:51 2023
Date: Sun, 8 Jan 2023 04:33:40 +0000 (UTC)
From: "David G. Pickett"
Reply-To: "David G. Pickett"
To: "60506@debbugs.gnu.org" <60506@debbugs.gnu.org>
Message-ID: <29444413.2946971.1673152420399@mail.yahoo.com>
Subject: Re: bug#60506: parallel grep
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_2946970_1205034038.1673152420398"

------=_Part_2946970_1205034038.1673152420398
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

I recommend cscope for source file analysis.

-----Original Message-----
From: Eike Dierks <foonlyboy@gmail.com>
To: 60506@debbugs.gnu.org
Sent: Fri, Jan 6, 2023 7:40 pm
Subject: bug#60506: parallel grep

I was thinking about this again.
It looked easy at first, but it is not.

My prime use would be to grep in /usr/include
That would search a lot of files, but only return a few results.
In that case, searching a lot of files in parallel could be beneficial.

But it gets a lot more troublesome,
if you get a lot of results from a single file.
In that case a lot of results would need to be buffered,
so as to give them the very same ordering of the results.
Because the output of grep should always stay stable.

But we could make this explicit:
We could introduce a new option: --parallel (-P)
That would not have any order of the results returned.

I know, that we have to be very conservative about how grep works.
Actually a wrapper with gnu-parallel could do.

I want the grep to make my machine to scream and go.
I want to have grep to use all io and all compute,
and to return results as fast as it can get.

// hi at the grep
// eike

------=_Part_2946970_1205034038.1673152420398
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
    -----Original Message-----
    From: Eike Dierks <foonlyboy@gmail.com>
    To: 60506@debbugs.gnu.org
    Sent: Fri, Jan 6, 2023 7:40 pm
    Subject: bug#60506: parallel grep

    I was thinking about this again.
    It looked easy at first, but it is not.

    My prime use would be to grep in /usr/include
    That would search a lot of files, but only return a few results.
    In that case, searching a lot of files in parallel could be beneficial.

    But it gets a lot more troublesome,
    if you get a lot of results from a single file.
    In that case a lot of results would need to be buffered,
    so as to give them the very same ordering of the results.
    Because the output of grep should always stay stable.

    But we could make this explicit:
    We could introduce a new option: --parallel (-P)
    That would not have any order of the results returned.
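The trade-off between stable and unordered output can be illustrated with a small Python sketch. Everything here is hypothetical (`grep_file`, `grep_files`, the `ordered` switch) and is not a proposal for grep's internals. Note also that GNU grep already uses `-P` for `--perl-regexp`, so a real option would need a different short name. In ordered mode, results from later files are held back until every earlier file has finished, which is the buffering cost mentioned above. In unordered mode, each file's hits are emitted as soon as that file is done.

```python
import re
from concurrent.futures import ThreadPoolExecutor, as_completed


def grep_file(path, rx):
    """Return (path, line_number, line) for each matching line in one file."""
    hits = []
    try:
        with open(path, "rb") as f:
            for number, line in enumerate(f, 1):
                if rx.search(line):
                    hits.append((path, number, line.rstrip(b"\n")))
    except OSError:
        pass  # unreadable files are skipped in this sketch
    return hits


def grep_files(paths, pattern: bytes, workers=8, ordered=True):
    """Search files concurrently, yielding hits in stable or arrival order."""
    rx = re.compile(pattern)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        if ordered:
            # map() yields per-file results in the order of `paths`,
            # buffering any file that finishes early.
            for hits in pool.map(lambda p: grep_file(p, rx), paths):
                yield from hits
        else:
            futures = [pool.submit(grep_file, p, rx) for p in paths]
            for done in as_completed(futures):
                yield from done.result()
```

Hits from a single file always stay together and in line order; only the order between files changes in unordered mode.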

    I know, that we have to be very conservative about how grep works.
    Actually a wrapper with gnu-parallel could do.
    I want the grep to make my machine to scream and go.
    I want to have grep to use all io and all compute,
    and to return results as fast as it can get.
    // hi at the grep
    // eike




    ------=_Part_2946970_1205034038.1673152420398--