From unknown Sun Jun 15 10:53:31 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#70511 <70511@debbugs.gnu.org> To: bug#70511 <70511@debbugs.gnu.org> Subject: Status: Option to grep into compressed files Reply-To: bug#70511 <70511@debbugs.gnu.org> Date: Sun, 15 Jun 2025 17:53:31 +0000 retitle 70511 Option to grep into compressed files reassign 70511 grep submitter 70511 Mary severity 70511 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 22 02:51:36 2024 Received: (at submit) by debbugs.gnu.org; 22 Apr 2024 06:51:36 +0000 Received: from localhost ([127.0.0.1]:45560 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rynWY-0005NP-In for submit@debbugs.gnu.org; Mon, 22 Apr 2024 02:51:36 -0400 Received: from lists.gnu.org ([2001:470:142::17]:39556) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ryaLE-0002sw-CN for submit@debbugs.gnu.org; Sun, 21 Apr 2024 12:47:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ryaKq-0003aX-Cv for bug-grep@gnu.org; Sun, 21 Apr 2024 12:46:39 -0400 Received: from mail-40132.protonmail.ch ([185.70.40.132]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ryaKn-00027Q-0e for bug-grep@gnu.org; Sun, 21 Apr 2024 12:46:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1713717983; x=1713977183; bh=A4Y06umtfCkbiBumMXn/9/bmxuBLYBTSAaTR+7GuXUU=; h=Date:To:From:Subject:Message-ID:Feedback-ID:From:To:Cc:Date: Subject:Reply-To:Feedback-ID:Message-ID:BIMI-Selector; b=M1mpjO5Uz4U9xdS2CTvyutAMbdPJyXO8+sfwLKusuU5DGhxQTZdt4jfqmK1/40qB9 MnF5lha17t7jIC5vY02cQmalYiQk46kW34y3PYPDdBFirvtzZPKDofovD7Np6F1Xc7 
HlmXmKjL0Xr9zqzc+nwh3OsDnHj7Y499S4CDNNUsSHmulQXjDCO5VxxQxaZvRX2gwW T2KcvQQ9ToZqJ+ZyYr9zZrId1iZ7nUMcs3F8F0r6jBV7Wq209PFQ6VfheZmxeb3rHo a07gDOGJvuJXEPHdRVS1i/9rHFqKFv6a8zAsR+KlQ4MF+DSXzGKEili39zjGzNaqnw hPH8JYKzZxRVg== Date: Sun, 21 Apr 2024 16:46:17 +0000 To: "bug-grep@gnu.org" From: Mary Subject: Option to grep into compressed files Message-ID: Feedback-ID: 107467773:user:proton X-Pm-Message-ID: 6f9ee145bc8d83ce2913fea35fb99e864c33b858 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=185.70.40.132; envelope-from=marycada@proton.me; helo=mail-40132.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 22 Apr 2024 02:51:31 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

Hello,

I added an option to grep that filters files through a specified program. The main purpose for that is to uncompress files using the zcat (or `gzip -d`) command, or an equivalent for another compression format.

It works like this:

    grep -j zcat pattern textfile.gz [textfile2.gz...]

(I chose `-j` for no particular reason. Any unused letter could go there.)

This will spawn a shell and execute the given command (zcat), which will receive each file through stdin and its stdout will be used in lieu of the file. Any valid shell command can be used instead of zcat.

This is better than the zgrep commands provided by the gzip, bzip2 and xz projects, because it supports all of the options, including `-r`.

It can also be used with arbitrary commands, like less popular compression algorithms or even commands unrelated to compression. I read at https://www.gnu.org/software/grep/devel.html that there were plans to add `-Z` and `-J` options for gzip and bzip2; my implementation can support any algorithm.

The problem I see with that, though, is that it would add a shell command option to grep. This is longer to type out than `-Z` or `-J`, and it also provides shell access to anybody who can control what options grep receives. In practice, I'm not sure how serious that is, but I thought it would be useful to point it out.

I have a patch that I can send. I believe my patch is trivial: the only part that's longer than 3 lines is a simple fork-exec pattern (40 lines including whitespace, but they're the same you've seen in plenty of other programs). I know there are specific requirements regarding copyright, and I don't want to cause problems about that (I can't sign the FSF's documents). May I send my patch?

--
Mary

PS- This is my first time using a mailing list, please let me know if I'm doing something wrong!
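
For comparison, the behaviour proposed in this message can be approximated today with a small shell function. This is only a sketch and not the patch discussed here: the name `grep_through` is hypothetical, and it assumes GNU grep, whose existing `-H` and `--label` options let matches read from standard input be reported under a chosen file name.

```shell
# Hypothetical helper, not part of grep: approximates the proposed
#   grep -j FILTER PATTERN FILE...
# by feeding each file to FILTER on stdin and searching FILTER's stdout.
# Assumes GNU grep (-H and --label are existing GNU grep options).
grep_through() {
    filter=$1
    pattern=$2
    shift 2
    status=1                        # grep convention: 1 means "no match"
    for f in "$@"; do
        # As in the proposal, FILTER is handed to a shell, so it is only
        # as trustworthy as whoever supplies the string.
        if sh -c "$filter" < "$f" | grep -H --label="$f" -e "$pattern"; then
            status=0                # at least one file matched
        fi
    done
    return "$status"
}
```

Under these assumptions, `grep_through zcat pattern textfile.gz` prints matching lines prefixed with `textfile.gz:`. Recursion (`-r`) and most other grep options are not covered by this sketch.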
From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 22 03:47:05 2024 Received: (at 70511) by debbugs.gnu.org; 22 Apr 2024 07:47:06 +0000 Received: from localhost ([127.0.0.1]:45633 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ryoOF-0005ZL-Uw for submit@debbugs.gnu.org; Mon, 22 Apr 2024 03:47:05 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:56380) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ryoOC-0005Yy-Ty for 70511@debbugs.gnu.org; Mon, 22 Apr 2024 03:47:02 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 91F7C3C011BD8; Mon, 22 Apr 2024 00:46:39 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavis, port 10032) with ESMTP id UM7knFQRPCAg; Mon, 22 Apr 2024 00:46:39 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 3D2513C00E40A; Mon, 22 Apr 2024 00:46:39 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 3D2513C00E40A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1713771999; bh=QeTrgi3iqv2IOnrw1L5oNfcv1aYwBMzAYbSjsTzSluc=; h=Message-ID:Date:MIME-Version:To:From; b=Or3iM6kaXrxrCgYTD0ytfKtF4IVXvr/WOD/Tnto+aWhPta7MX7TrfJHJrGcTwrhWf 9JUv7y519BcEbyj3ckd4pcbL6035VIwhCTmdC5IXu/6K9glomANxPmmdOQm0tbw2IP UtIDyVdIDk4oHxBSgLlOnStAhltEiBlKetuSg+o837hs5wX5Cna/HSB5SDpBXs39W4 P8YMoSeoqtTdoGGrLb8vol0lbrsbU9EZWmVBiORWQioinhdOxqerDKRESjaJKbJMFj xvMNjmKAhazRxW1qzT9l2cficRK47Bhgz3r3T6GrNOueh8Km6d4vkhwWgsTCn7BE3/ dOnA8hh7tYQiw== X-Virus-Scanned: amavis at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id oIiDEwl75a7I; Mon, 22 Apr 2024 00:46:39 -0700 (PDT) Received: from [192.168.254.12] (unknown [47.154.17.165]) by mail.cs.ucla.edu (Postfix) with ESMTPSA id 1F18D3C011BD8; Mon, 22 Apr 2024 
00:46:39 -0700 (PDT) Message-ID: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> Date: Mon, 22 Apr 2024 00:46:38 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#70511: Option to grep into compressed files To: Mary References: Content-Language: en-US From: Paul Eggert Organization: UCLA Computer Science Department In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Thanks for the suggestion. You're right, this would be better than zgrep etc. I have some qualms though, as the new option would increase the attack surface for 'grep', in that you could then execute arbitrary code by passing certain options to 'grep'. Is there some safer way to get what you want? 
From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 22 10:01:04 2024 Received: (at 70511) by debbugs.gnu.org; 22 Apr 2024 14:01:04 +0000 Received: from localhost ([127.0.0.1]:46072 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ryuEC-0004oV-2S for submit@debbugs.gnu.org; Mon, 22 Apr 2024 10:01:04 -0400 Received: from sonic318-20.consmr.mail.gq1.yahoo.com ([98.137.70.146]:40201) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ryuE8-0004n8-Rp for 70511@debbugs.gnu.org; Mon, 22 Apr 2024 10:01:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aol.com; s=a2048; t=1713794437; bh=Jdajsc0dghvY7HSDl6T8p7euV6WiGvgwcD3fvZJH5bA=; h=Date:From:To:Cc:In-Reply-To:References:Subject:From:Subject:Reply-To; b=N7Mljz3GxcWCC7XiP4Makci++MToMr3xBtMggwhI5NPD3jbFD1l2iasdw2yd41ZnTp7mIjg0SE/DBAkosBm0r/JUaAPS4JxxmXRGTf5O8PBgT2hSI6OcsLE2x9iEhEDkPZnWu/T4U3Kxm5DSAW64R1sQPf+Rl/oQuAPGFQh1Rbj4+KqdnyysSIEjTpntDkpDVkQEshBe2FZ16tC6kXex47fqRM6OQQOeEaFwpBt8FXE9eSAzdxWUvOXl5LNAfCRZrdbHNko4AQbcvbcim9cbS76ilNL5WYbHxVch6laPdvnWmtt4uf7+KE5xS2+pVV0wnglfBkHdcZVyDDOdLnhGyg== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1713794437; bh=QRBvJ6khEaK/ZZK9fgNwjdRKNGPJC5knyGVru1taj4R=; h=X-Sonic-MF:Date:From:To:Subject:From:Subject; b=Zc+lRbCTQmap48m373p5ub8ZFZfCgyNNHkhvoN1iQVtV5OBLdfl03k6FgXEWcSo3HeCGs153ENJ7a/ozVl3N+DQ8sy9EU/WxanNxeoTXVHodkq7W9WK/X5dS8dE9mlS4qEbJblHxQRS56wa9aL1AD5i5CTmFy13iI393+qMdMjrPw9cafENdtJ4kqDn0WL0sLmzwX1qV4RubxE7wWXg4lYO5hdnPK33rokncbvzcTBcP6wWtFPhBuKS3Mm4MyFj4LASvb7GJX6Sr2H4sMaqYo3Y6WU0Q80PYAdKkPiviH1Xl4LaanLUnU2SEsXVBAHN7TSmO2fUz4g9g5D16TDqhkw== X-YMail-OSG: UaIt3mEVM1nKvYZQdXqCGt54Gu8NJ4xvGcYYD4F.i6YjwS42hea0zNFklXigMOw wfMtGo7DSj7W0oQhv9EJ6qH0yJivFMNcOr_Mv7iifOSY.UrEjxuGKK9jk.6nbO.2yQzcIhJj2mZa hnabHC3yb_Ci7pzMk6Y.VucKBUZLJ1CtPx_jTgRmDY.JsRolvQnXHgBo9c40NdDQMB5PB6ynb72t sVDb8Argi.LcDlg7xlCwVk1Dq182UDLdc_jzM1fzAqqYYEctfvE6B_xoipiB1WrbtHaG7666_zIo 
VcjNzOaK5BS9KdfhZPd7B.qt7Z3IWkMjrxRxP_Lkx0Ki75WcVM9hs3K5VvLBI3UMwU2BaHXp6xuD Hb0YeCVHBnljI7E_anCy8pRjtl1CZB0cpM.h.cj8jhqm_vYdV3ZXVW0AvCIDUAkHZQHeRwhY37kg WSi84ih4ER3DE2YMAh7H7jV9GHvOyKzWRe7djRgT4wn67266G027FSw3JKnuKjjoy3Zemd0daW5N 8DNmSvILcORQmzQbjoMQEfa6yRgXke5IXNNF_FiZZeWpnzz9zjEoYPJ58UOFuInWI_MYbkGN1JCm vamsCBX_82xyDiW6OUyC17W134HruIFDLY4SVwjWtMUlzTbPBvBaRej2qW3PR.LChVegl8KVGXA1 1fDDaaeC6POrJa0x5fjIPpQfjySTf85WEjhKtwbsenwCEEfvlia.0EqSk2GxJgtI.A3QaM0dXV50 2vpBrN2XZMxu2xJEJtoWkKGpQCvQT5VqN9_..eL274Wb.CBjydHamvpm8J3p.wU9h_E5CMYoeSeT QWFGjaknyKm8Q9dj0H_pml7HHakLdduFYyzZILa4ivimP3TAjxxC.lfofHPQ4EKxkXFLFJkxEUcL jYQGneVDYPnsu4NRpv2n9gj5zqgIkX5tsDBUhkBJwU5247qjRpwgWxpxX8nOBazxhkaLokBybNfl bCUySmjPoK2DN5eQapbIwHtUOt3qPdPWdng2wQ5XmG3FxFgHgr5xNpbpr3U3qxO5.u5pb1Cky3dJ 2YvzRvEj6MHEcVcrTxAe48orxz6fIyUeiElXBvbeS6CDC1KhlNJr1Tz1p_iA2oVx6PkYTj.7RGle LyUohNLCZ.sSgKWWmj7BIfsqevqlkVctD9L8AoTfhHqp.g521RX3WAwgWnSyWCb5lymSt_CL5smX _uHrD1a_NxS1_D5RPUTTCYsE4A7k2kZJjAdQ05zsqLvrN1VRe0lt3tNZ4PGf6s9QCV1tDnGnnrRn 7F4iZBsLgJ9Dj1UnHJ.y388.0jdXVUFOcDueVp62laGqtwxR1C5COZsKgdMeQexAq5qfOtgqjk7A 5fiZfb0awLrCfW0J.zySFeeYuRtkcKqfHN2cho7CQtapzPXdT9aOF2IyQkgHGu2XVhWeH_DJspM7 8d1ZJ7_Nw.xNKFX0L73Fq1srmmmkbgvXS9GsT2xjch3mEhIBK.6gYqEJ3YgX1qbydNR87WXyv7xy _VtovIrwFsWQT_kFq.1hmEq4v8CoG5snVx0w4YtUQe_QuOrCxSv3thg.dWDxHj57y8NcAmu6W6cn hPP6TyuZP4bwv.uM3DR0GJc96mMyeYROBRR425GbG_KtTL5sPqzMudoujl46.Kh01Xana3K_2fxU ZULCSDuppSI4V2OtWb2_Ljj6PJ5h65tlw3vHknCTMf7OFU7OOvaahBER9c.Sn007l0y6YGyLTuUd rEeF5WjOYWn6HG0aEqaWSL2nSjvY2HqVLI9e6_cATOTYZatnYX8yctQAxh0UUg3c8w75_1TaPatX wXIKFGvehkymrrmTieakQCnGWXL_ckHG5IU.tJXA3HQJhZ.TnrlNJfIcrTq3jDb2BYqmylyyZLSN UY53qSeBwny0StMJ5EH_i6Ft07hz2MdKWQp8ZxVVhDLjOCKWPiHpJNq89i877c4NLSV38koPH6cL ZcZ__aMN6OGT8TrpwuoGCAU8y.qgvz3PYsafVF4nbzenJu3qAFdE1ie.0hNCY_.vDJTrrZa8HWAc yEG9ROmBe0yPu2PSEtm1cC7_FCxakehPlO.JfHa0Gq.3eIy42TGdjW6Zpdx_C6MTJLJifLOGpVJp Ec85r.2BVqX2Kn2AAGnSwAstpw5BGQO7aer5BH_RchwFDEe2Skopa132POQmzOwcdoIL2F6EsW60 
0cZVggFT2n6wKMv8HX8Um98qVFqfOYIVqjLVGuzT58pWEs70O5ZGXNhTHOhILpCjPq2KuQEOLNQb swgx2U78ZnKzGaIPCLtqCGrow3LWC6NOsCbnpSo0B8vq6ScLrgGJHZTQ3 X-Sonic-MF: X-Sonic-ID: 44db3bfe-71e3-4bd7-83da-9bfeefc5f359 Received: from sonic.gate.mail.ne1.yahoo.com by sonic318.consmr.mail.gq1.yahoo.com with HTTP; Mon, 22 Apr 2024 14:00:37 +0000 Date: Mon, 22 Apr 2024 14:00:32 +0000 (UTC) From: "David G. Pickett" To: Mary , Paul Eggert Message-ID: <616436294.1780884.1713794432436@mail.yahoo.com> In-Reply-To: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> References: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> Subject: Re: bug#70511: Option to grep into compressed files MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_1780883_916968533.1713794432434" X-Mailer: WebService/1.1.22256 AolMailNorrin Content-Length: 3045 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 70511 Cc: "70511@debbugs.gnu.org" <70511@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

------=_Part_1780883_916968533.1713794432434 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

One supposes that if the file extension is not trustworthy, one can taste file like the file command, and use libraries like the gzip libraries to handle gzipped files as a stream. There are so many others: zip files could be treated like directories and all the files in them that match the glob could be searched, and then there is bzip2, 7zip, .... It becomes a popularity contest! One can do all this with shell scripting, and leave poor old grep out of it!

On Monday, April 22, 2024 at 03:48:14 AM EDT, Paul Eggert wrote:

Thanks for the suggestion. You're right, this would be better than zgrep etc.

I have some qualms though, as the new option would increase the attack surface for 'grep', in that you could then execute arbitrary code by passing certain options to 'grep'. Is there some safer way to get what you want?

------=_Part_1780883_916968533.1713794432434 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 7bit
One supposes that if the file extension is not trustworthy, one can taste file like the file command, and use libraries like the gzip libraries to handle gzipped files as a stream.  There are so many others: zip files could be treated like directories and all the files in them that match the glob could be searched, and then there is bzip2, 7zip, ....  It becomes a popularity contest!  One can do all this with shell scripting, and leave poor old grep out of it!
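
The shell-scripting route suggested above could look roughly like the following. It is a sketch under stated assumptions, not anything posted in this thread: the function names `sniff_decompressor` and `zgrep_r` are hypothetical, the leading magic bytes are read with `od` as a stand-in for what the `file` command does, GNU grep is assumed for `--label`, and file names containing newlines are not handled.

```shell
# Hypothetical helpers sketching the "taste the file" idea: choose a
# decompressor from the file's leading magic bytes, not its extension.
sniff_decompressor() {
    # First 6 bytes as one hex string, e.g. "1f8b08000000".
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)         echo 'gzip -dc'  ;;   # gzip signature
        425a68*)       echo 'bzip2 -dc' ;;   # bzip2 signature ("BZh")
        fd377a585a00*) echo 'xz -dc'    ;;   # xz signature
        *)             echo 'cat'       ;;   # assume uncompressed
    esac
}

# Recursive search over a directory tree, decompressing on the fly.
# Assumes GNU grep for --label; each match is prefixed with its file name.
zgrep_r() {
    pattern=$1
    dir=$2
    find "$dir" -type f | while IFS= read -r f; do
        # The unquoted substitution is deliberate: it splits
        # "gzip -dc" into a command and its option.
        $(sniff_decompressor "$f") < "$f" | grep -H --label="$f" -e "$pattern"
    done
}
```

A call such as `zgrep_r pattern some/dir` would then search plain, gzip, bzip2 and xz files alike. Archive formats such as zip or 7zip would need per-member handling that this sketch does not attempt.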

On Monday, April 22, 2024 at 03:48:14 AM EDT, Paul Eggert <eggert@cs.ucla.edu> wrote:


Thanks for the suggestion. You're right, this would be better than zgrep
etc.

I have some qualms though, as the new option would increase the attack
surface for 'grep', in that you could then execute arbitrary code by
passing certain options to 'grep'. Is there some safer way to get what
you want?




------=_Part_1780883_916968533.1713794432434-- From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 23 11:33:12 2024 Received: (at 70511) by debbugs.gnu.org; 23 Apr 2024 15:33:12 +0000 Received: from localhost ([127.0.0.1]:52826 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzI8r-0002Tj-Kn for submit@debbugs.gnu.org; Tue, 23 Apr 2024 11:33:12 -0400 Received: from mail-4318.protonmail.ch ([185.70.43.18]:64715) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzHxr-0006pj-Jy for 70511@debbugs.gnu.org; Tue, 23 Apr 2024 11:21:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1713885684; x=1714144884; bh=sl1hsaOkeoZR9UNFdGR0g+Nd9FwOE9SUmSV5tXRjTe4=; h=Date:To:From:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=JCuPq9fmJEKH6x5hC/nv1ut0P4k+VWo80W1byg0BduuG4BVA0FbLhm4vkFs/FLo/w BTgjLgXWnwEXJ8m7pom6u/TE7bYz3uc/DbdzX8IRhC814HzH/+/3B4cEBEaI9iUw5E v0396Jzp/EpwJY3zmFPKcZCDF2umFekWvDGFgtRdawL8vidhbC5myiq2eGsCy+hNnU IZH14kzhQNJeQ91Yl6FWxs10w75lSFLck8mhvnKjQnJtaXqob3iXVvqtRISoB6E+oD KPH8f2UVxpilSOkMKMqq+dO7m/XHey9l45Ho/0yFDcuqmzlXsg+viljpuwD6cfWb6d R7DEJ5AtfguFQ== Date: Tue, 23 Apr 2024 15:21:15 +0000 To: "David G. 
Pickett" , "eggert@cs.ucla.edu" , "70511@debbugs.gnu.org" <70511@debbugs.gnu.org> From: Mary Subject: Re: bug#70511: Option to grep into compressed files Message-ID: In-Reply-To: <616436294.1780884.1713794432436@mail.yahoo.com> References: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> <616436294.1780884.1713794432436@mail.yahoo.com> Feedback-ID: 107467773:user:proton X-Pm-Message-ID: 2529f5ee866ca22251bb61991e0dffa17fde0241 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 70511 X-Mailman-Approved-At: Tue, 23 Apr 2024 11:33:07 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

> Thanks for the suggestion. You're right, this would be better than zgrep
> etc.
>
> I have some qualms though, as the new option would increase the attack
> surface for 'grep', in that you could then execute arbitrary code by
> passing certain options to 'grep'. Is there some safer way to get what
> you want?

There is still the possibility of including the respective compression libraries directly in grep and using the `-Z` and `-J` options as proposed, but this wouldn't allow the use of less popular compression algorithms.

One possibility, but I'm not sure what it's worth, would be to give grep a special arg0 to enable shell commands, like `jgrep zcat pattern123 file.gz`. But I'm not sure if it's worth the trouble.

> One supposes that if the file extension is not trustworthy, one can taste file like the file command, and use libraries like the gzip libraries to handle gzipped files as a stream. There are so many others: zip files could be treated like directories and all the files in them that match the glob could be searched, and then there is bzip2, 7zip, .... It becomes a popularity contest! One can do all this with shell scripting, and leave poor old grep out of it!

The reason why I wanted to do this in grep directly is because it's difficult to implement this with shell scripting. I noticed that neither zgrep, bzgrep nor xzgrep support the `-r` option, among others, presumably because it's too difficult to implement in a portable way.

I made my patch use a shell command specifically to provide maximum flexibility with minimum maintenance cost. But it does open the door to security risks, so I understand if it's not worth adding to grep.

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 23 13:07:51 2024 Received: (at submit) by debbugs.gnu.org; 23 Apr 2024 17:07:51 +0000 Received: from localhost ([127.0.0.1]:53331 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzJcU-00084W-2r for submit@debbugs.gnu.org; Tue, 23 Apr 2024 13:07:51 -0400 Received: from lists.gnu.org ([2001:470:142::17]:43286) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzJcO-00082l-Gb for submit@debbugs.gnu.org; Tue, 23 Apr 2024 13:07:47 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzJc2-00016O-9P for bug-grep@gnu.org; Tue, 23 Apr 2024 13:07:22 -0400 Received: from mail.oetec.com ([108.160.241.186]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzJbz-00077K-O7 for bug-grep@gnu.org; Tue, 23 Apr 2024 13:07:21 -0400 Received: from
[172.16.35.3] (pool-99-253-151-152.cpe.net.cable.rogers.com [99.253.151.152]) (authenticated bits=0) by mail.oetec.com (8.17.1/8.17.1) with ESMTPSA id 43NH7AUL079737 (version=TLSv1.3 cipher=TLS_AES_128_GCM_SHA256 bits=128 verify=NOT) for ; Tue, 23 Apr 2024 13:07:10 -0400 (EDT) (envelope-from dclarke@blastwave.org) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=blastwave.org; s=default; t=1713892030; bh=nBiP1ncFQ6Gb9si/SGtn4XNMgfoZJT/vu1PSHBbtDXU=; h=Date:Subject:To:References:From:In-Reply-To; b=p6zha9uqwOM1XmZa3d6Ax/G/4vjtrQqP6hSFzzkJcSAcRTzhSyn+2S46ABvqT5Ecl yt7+hT8uR48hRqUOQOAWU6lvlSbPpqTlSM2xKZtw94NqWPawQWmmrY+hO4ULQob7fU tKoS4LdVvkeBXmUf947eVJ37h+OEgq6fyjI74FIdb06VNJZMd/KACdWRiWEC0caSUo d7xPakBUrypryvdui0dvtnvoMBUFxrmGJgP0FEv8BIZOlCIYUmgFb+Dl3Wo1dVYPcV HfQLfsklWEcGm0ban41LwNLkmPf7pd/IoAzTZUDd8uouj/BktkHSS7LhULYHvcgT// lvCIRkHWIBy3w== Message-ID: <15c6ed98-3bb4-4d03-89aa-4379b2c6df43@blastwave.org> Date: Tue, 23 Apr 2024 13:07:10 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#70511: Option to grep into compressed files Content-Language: en-CA To: bug-grep@gnu.org References: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> <616436294.1780884.1713794432436@mail.yahoo.com> From: Dennis Clarke Organization: GENUNIX In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-oetec-MailScanner-Information: Please contact the ISP for more information X-oetec-MailScanner-ID: 43NH7AUL079737 X-oetec-MailScanner: Found to be clean X-oetec-MailScanner-From: dclarke@blastwave.org X-Spam-Status: No Received-SPF: pass client-ip=108.160.241.186; envelope-from=dclarke@blastwave.org; helo=mail.oetec.com X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action 
X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 4/23/24 11:21, Mary via Bug reports for GNU grep wrote: >> Thanks for the suggestion. You're right, this would be better than zgrep >> etc. What happened to the old UNIX concept of Do one thing. Do it well. Then stop. To grep a compressed stream of bits you just pass the decompressed bits along a pipe. Done. -- -- Dennis Clarke RISC-V/SPARC/PPC/ARM/CISC UNIX and Linux spoken From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 23 16:49:26 2024 Received: (at 70511) by debbugs.gnu.org; 23 Apr 2024 20:49:27 +0000 Received: from localhost ([127.0.0.1]:54273 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzN4n-0003Zp-HX for submit@debbugs.gnu.org; Tue, 23 Apr 2024 16:49:25 -0400 Received: from fhigh3-smtp.messagingengine.com ([103.168.172.154]:42739) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzN43-0003Pz-C8 for 70511@debbugs.gnu.org; Tue, 23 Apr 2024 16:48:34 -0400 Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailfhigh.nyi.internal (Postfix) with ESMTP id 045E11140169; Tue, 23 Apr 2024 16:48:09 -0400 (EDT) Received: from imap53 ([10.202.2.103]) by compute2.internal (MEProxy); Tue, 23 Apr 2024 16:48:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fastmail.com; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm3; t=1713905289; x=1713991689; bh=xMikUWN0Hv 8IVFr49ZMaM8vblB1K6n5x4y2h8G02GgU=; b=VqLXtLHzc6uweXHiSXzNfUOl1h xyrVafub6J2rcbpkOeDTmZceZbz33l0T34u4PEi7WT5BKBJagNLKxO2NOA8zCiBY ejeOlHOzLfJqwvddiQYWKtkwMS1pauRy/UaNxHD0imjW0Mj4eFYF1TiF1S6CgbdS 
qQkW2Ax3NpGzM3KIvmxdJuYmh1O7lzWuUzOEnqV2bhS5Y7uW5Fl8WfsQEAKIh7n9 7ppjt93pwCn20d2ruXArRXgPPEXNW4+bhBeefcJpGk2h4qHNsOZivSNjbHK52RCn aChiZT/Neoozs+5NR84c0AG52+PN3SyZJEE6+KzGOC1KyLKh/QdtSwgi5/zw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s= fm3; t=1713905289; x=1713991689; bh=xMikUWN0Hv8IVFr49ZMaM8vblB1K 6n5x4y2h8G02GgU=; b=LBupVMnAjjE4BJrpF5KoWyDVOwif6lcUvX+hY9otoEtn MHWtZnuQKz2RWvznENBjCfATnNaYExAow6vs00joly95ZqAeXDutF+c6b+/cwMhk dsCBXJku9Fgg8iEKS/8So/kTKSa2W2er9xnqbGW8Ox03ZdS+MU2Fe+bBH3grrxLE Sb/GtBIu4R5A+/SM7U+AvSZTk2VszYUs4OpBIH9Shl7qmmZAGcmP8//pCpEPd9pO Srb70TUAuYnzNml4glGVGSyzRAImhD6nHgK3gg7qrU35SRAbPfTutSgxTnJHOyUn qMIZ32raZz0DoqU42xE89ul51RuoKWPzpOdbA9ruSw== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrudeluddgudehhecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefofgggkfgjfhffhffvvefutgesthdtredtreertdenucfhrhhomhepjhgr tghkshhonhesfhgrshhtmhgrihhlrdgtohhmnecuggftrfgrthhtvghrnhepheevffeltd fhfefgkeetheelfefhvdfhvdffudfhjedtudelueefheeiheeikeevnecuvehluhhsthgv rhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepjhgrtghkshhonhesfhgrsh htmhgrihhlrdgtohhm X-ME-Proxy: Feedback-ID: i982440cf:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id AE17F3640072; Tue, 23 Apr 2024 16:48:08 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.11.0-alpha0-386-g4cb8e397f9-fm-20240415.001-g4cb8e397 MIME-Version: 1.0 Message-Id: In-Reply-To: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> References: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> Date: Tue, 23 Apr 2024 15:46:01 -0500 From: jackson@fastmail.com To: 
"Paul Eggert" , Mary Subject: Re: bug#70511: Option to grep into compressed files Content-Type: text/plain X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) Paul Eggert wrote: > I have some qualms though, as the new option would increase the attack > surface for 'grep', Agreed. Given the recent uproar involving liblzma being linked into ssh in systemd builds, resulting in a potentially very dangerous ssh compromise ... ... I would think that minimizing the attack surface on common commands by not linking in non-essential compression libraries would be a no brainer. -- Paul Jackson jackson@fastmail.fm From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 23 18:52:24 2024 Received: (at 70511) by debbugs.gnu.org; 23 Apr 2024 22:52:25 +0000 Received: from localhost ([127.0.0.1]:54818 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzOzs-0007dl-6E for submit@debbugs.gnu.org; Tue, 23 Apr 2024 18:52:24 -0400 Received: from sonic317-21.consmr.mail.gq1.yahoo.com ([98.137.66.147]:43422) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzOzl-0007bI-7G for 70511@debbugs.gnu.org; Tue, 23 Apr 2024 18:52:18 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aol.com; s=a2048; t=1713912710; bh=nnMwpL0i1VpMfZV20sbmj4ugKJWJn65eWefoY2WB0oo=; h=Date:From:To:Subject:References:From:Subject:Reply-To; 
b=nmmyb6ASzIolFfuscVeKsejZyg1vP1KB4N3JG5t0694aODZdkjdzmp3wNNDQm34hk+FddNP43HYISXRy1bPm9bpAeS6jLmfSBFFg1qABQ+AbMzN957FT2d1Hx5lDN/rz2olCNX4x2Ct4x1/9Zg+Jx439lTwvAIbLs2Hzi5crclm5X3gEjGVu9m/l7/Oo72oqbs4NWIeeWbwk9i58J+xrMGHgMdVK2JSwcaTPaNN0huDLUjuiUDP4jrF8N6+2wUDxfjXW6FhMZsIKUPo+aogpcQ3yh45wXLX+wtp59qP95sA744Cp75VXB9PpptIDbAx3vYJyqtWosTvhHX1loS/VOA== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1713912710; bh=OVYqvCs6RCdJyxVDpSlZd1pfLZ4yNMcGi1lZQE3utHg=; h=X-Sonic-MF:Date:From:To:Subject:From:Subject; b=AYxg2AjBx/Zye00wAyopfo3C3jzX7TqUozO5si5BCWJokOG0y82LnoAQc9TMbSh5Ud8KnJ8y+0RZ2G4HWN9RxGTib7HH911aHz0bfva2EpSGF91ZhW55HEz9XF45N3GkxmU9s68pjrVZ/qskifoHWfFzTll2Cy9Ol9Tb6+VdyESd/Snhsr7kjWjsi+k7KgYc0lt7aY8WuZ6uz8rH3Xw0yWp4pb91b8HW95sYguBrq8N0dJ1bvOg3/rhHkoapvEj9OcPA7hGopEkGdLXxplyTvtguoVn0s7AsIp0NzJlO/efVz1MH+wcangY7uFEc0Tam70cAL2dDrbhMDSENuqnA1g== X-YMail-OSG: grOMdz0VM1nSlL_hp58uh6lcrejnl9_U2QiJKB7AFIVxBjFVbE9D9nBpury_zlK 7f06WbG64q3nZVMLC_.ZzKPSLIJUb0kXnX51IOWaUKaw6q9J4KBt7Xd7W.zDNKwDpCa.kiPjk5vh EGeegIdK.DumSasIE6LCYAqrEQtSMXjjJigWD09H9ybH_y3CYoCl9j8KTVFiGrg4ZZY5skqnNH.t pFAAqPdw.zxg6co3Lk88McA.JXV6nqP2cDM8iXUW6xs72EJWm2tT6acQRqsBGCc6vSqWWDTE1Baf 3HRHQXApRYbgLGsSjJw6Ze4Fwwk.3.8PlFsoZA48RvRWcc0hZANAM2oeYZeE750QJSEk8HnB0W.D ys69mOObyTiS7Ks.i8_B1bp4zOCJw0DiZAvVLXmoTeFLrQhhmnS.O8gYXCorLD5ZMFNA2jqJykz6 CyudgfV6LDrgQn3ImSICnBBVVmgu9eZ_DKBpa4aGYMzl4.5Qak.cyy7sjy0YQz7aw9PGX7bqvM3t p_BJS1vugTVWiTA4H0tPrwHhiGh2inwnu2zOZeuq1_u5yIp1IvO73t38TllQIQfGKoQ9rZB4oF0z k0fl.vUOCuKt7CH7t6VUsO_cEf.TCDq101P1GpSJSpK3WaTGq.gKBBGZyW2bsvqJAHoPUUOmFYo0 QE55bH3qUFDjpSD91dYcuaJ4BPaHNc.hv9OvPdPv29nHRiHD1aJZ161.q4xN1i3WIpJjcLy8Ozqt Q_gFlgwymV.qcAheHn3FkMb24reoosATV_CFnXhbilmtFTCHKYgrdOyiJWsYe4nWg5vB1bkdB48F _H1GqAmEHlXMrYK5syXdL.xINWUUXcfoTNr3awY8Pwf9QoM7cNrkUVvp0hYBCFkpiiVP9SWjRCz3 cf8PRxynkxjjLTYNJuPAJzwqUgtaucubzEJD6wg1tlnVMaJcg5AGn7U.2WCn3UaOfqShbihcDkRT Envl0thej38QMtQ7tzUg4hsIY8J67TciLYw90b2Vk1TRuk2v3lPHdJrsMtVO8yd.gld8Zptv6zk8 
4hv_WRrqVjakg5LQyafXYw6YV4PAPeDtESkTokMxDexrGlji916nivHaZy6zTXvOsfps8ZJTOACf nNbG15MZzwGA98OVhF4Bhq90tyxx5puutZ73HDCmc.rSNN.BJpwNB4cb8GasboTSxBqWm3Axo8FJ xtFpbcFzqPP1Nm9y8xDOr8hmgEbR1CLvG5uWVmw8SOwuviCy6O9Z5LZax0hfbbX9dj2HGtKW0bpS Qu_KR9KpfVRAUHf6a1jzSm.fZenGVw8.HSz94d5uVPB5BHs6.Ml2rsDshOAzEajb6_NbYROQzOO1 EC2OCcEEkSRVypcbQkQj1jqzZeaDpwuGvEmNbcCXGqEiGfJ2vVdKwFPSzJh4b3aFeKAcgivhik.6 m_3nLI6DP8cHzzEX1EeQc_AcSmNNvBnSjf_edSR3eX2GYyNlBl9IltNMhWK_DfEkOvzWAHyPYBqQ by90uuteX_SaPK5_fk2IO11yHoWZ9mkURA6yS9rUI9faDenUg5LlLS2L9YjiIQayJXOSGudBk4r9 LzNdKFdQjVEh9tNgGKxgfRQ4gvOg7lbRi9JGu_6sJEEvhR1XLv6j0shcigs5U3_4AKizFendOum1 qPeu7xsx0TM5rS1sQcaWW6rIMsep9fqPFxSMgDawiuhKrbnETCiQ6paLVrpNiUifMP9CXYBVI0Oc hkkWciThpPfMk6iiJ4ZLLHElHJSpGhwL2lHK50ALc4auvlGlJ_rCuZD57dku9GMJDXb1J63Z9Lco QyUFiyw2Wwek9DELvNGtqM3Mdj7X9T0.ckp9C_GGHcFC.3qZXlhBTv7Hs.GxSBxIWwX.EgoFCaCK 3Yt7v0eqY5hpqLPJhndO_CMzWHz7YLw_qdc4c8re8YhVA2Fpai0Urh1ldjs.js1V6yXUvQCUcEHd 4X7UF5DZxE_mORXh7AIEBXBlwY_Lx3C4KgK2H2oNxDjQJ12vYbB51n.HHgmcMKmLinOsuvaN8Ucx aT8ZOrivWI3QjVQ6jWCKsZ.ACbqt_GRc00wbGt7asQ.DgwKLJKAtFVFPn0UR3hFe0UoPiV6rltVQ yNff_BCRwj48JLhi.76vmeSIJHWb.WYfvLJi2J26y7YIAAveQceYuFK2lLYeSNrBryJt64Hk.wzk nlrYDTIpYROXtnSp7Znapc67YAA2SWL0MluEjhEXr5QoNj1D_vMR6QGoYVJy5lavp19e9GX0QLwn JPp3c4YjVJN_GhXjI56kt2V_MM4VLEn9q X-Sonic-MF: X-Sonic-ID: 75ab894f-2de1-44f0-b7e0-e8b3984e7529 Received: from sonic.gate.mail.ne1.yahoo.com by sonic317.consmr.mail.gq1.yahoo.com with HTTP; Tue, 23 Apr 2024 22:51:50 +0000 Date: Tue, 23 Apr 2024 22:51:45 +0000 (UTC) From: "David G. 
Pickett" To: "eggert@cs.ucla.edu" , "70511@debbugs.gnu.org" <70511@debbugs.gnu.org>, Mary Message-ID: <544520814.2460213.1713912705107@mail.yahoo.com> Subject: Re: bug#70511: Option to grep into compressed files MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_2460212_239158558.1713912705106" References: <544520814.2460213.1713912705107.ref@mail.yahoo.com> X-Mailer: WebService/1.1.22256 AolMailNorrin Content-Length: 8648 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 70511 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

------=_Part_2460212_239158558.1713912705106 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

Shell scripting can take file names in from a find or ls with 'while read', or by globbing 'for f in pattern', and examine them one by one, run 'grep -q' to find out if the file or uncompressed stream from that file has a match, and if so 'echo' the file name out, or if you want lines, it can 'while read l' the stream out of grep to prefix each line with a file name in an 'echo'. It helps to juggle streams not file names, and to create streams not temp files that have to be cleaned up and create delay. In bash, sometimes 'while read' gets tricky as the variable(s) are local to the loop, so sometimes a parenthesis wrapper helps. Both ksh and bash also have the nice '<(command)' feature to turn streams of stdout into input file names, and '>(command)' for output streams to file names. Bash has so many nice tricks I often google for them, such as pattern matching in an 'if'. If you do not trust extensions, you can '$(file filename)' to find out what you have in hand:

    $ echo $(file .profile)
    .profile: ASCII text
    dgp@dgp-p6803w:~$

On Tuesday, April 23, 2024 at 11:21:26 AM EDT, Mary wrote:

> Thanks for the suggestion. You're right, this would be better than zgrep
> etc.
>
> I have some qualms though, as the new option would increase the attack
> surface for 'grep', in that you could then execute arbitrary code by
> passing certain options to 'grep'. Is there some safer way to get what
> you want?

There is still the possibility of including the respective compression libraries directly in grep and using the `-Z` and `-J` options as proposed, but this wouldn't allow the use of less popular compression algorithms.

One possibility, but I'm not sure what it's worth, would be to give grep a special arg0 to enable shell commands, like `jgrep zcat pattern123 file.gz`. But I'm not sure if it's worth the trouble.

> One supposes that if the file extension is not trustworthy, one can taste file like the file command, and use libraries like the gzip libraries to handle gzipped files as a stream. There are so many others: zip files could be treated like directories and all the files in them that match the glob could be searched, and then there is bzip2, 7zip, .... It becomes a popularity contest! One can do all this with shell scripting, and leave poor old grep out of it!

The reason why I wanted to do this in grep directly is because it's difficult to implement this with shell scripting. I noticed that neither zgrep, bzgrep nor xzgrep support the `-r` option, among others, presumably because it's too difficult to implement in a portable way.

I made my patch use a shell command specifically to provide maximum flexibility with minimum maintenance cost. But it does open the door to security risks, so I understand if it's not worth adding to grep.

------=_Part_2460212_239158558.1713912705106 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
------=_Part_2460212_239158558.1713912705106-- From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 24 02:19:22 2024 Received: (at submit) by debbugs.gnu.org; 24 Apr 2024 06:19:24 +0000 Received: from localhost ([127.0.0.1]:56695 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzVyP-0006vs-Su for submit@debbugs.gnu.org; Wed, 24 Apr 2024 02:19:22 -0400 Received: from lists.gnu.org ([2001:470:142::17]:56610) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rzSLK-0002vu-GX for submit@debbugs.gnu.org; Tue, 23 Apr 2024 22:26:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzSKw-0007y8-6n for bug-grep@gnu.org; Tue, 23 Apr 2024 22:26:18 -0400 Received: from mail-40132.protonmail.ch ([185.70.40.132]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rzSKr-00086N-W8 for bug-grep@gnu.org; Tue, 23 Apr 2024 22:26:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1713925569; x=1714184769; bh=n1tSehOnrqIpoOFo2S0AnxO3YH7PPIB3HYB0NOnuCvQ=; h=Date:To:From:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=l/w6j3KhTNwVo70pdAxL0R3iesvIYEqR4cS/Fukwyq4cc6itDgSQLVnvo7J4RTcwP Iwuz3x83NJRLI87InQJqkHUrnmNmgTa5Wd/uFPv02GmjbcGaImRw1U9Ae4L+Ri27kC T9Hz7wJHUEIgbrgC6gTLexWYnHCu5BJtgaU+2kGKBNz1DvmL973pae7n/LRuueVV2j SptLnfOVh5J7yzD7jTMJ1OxHlPMf2Gejsjsu1UjN8hl8LtxAnmog7RYYTfYgtTXLrV tqn6wcXWwGYyuW9ScRagXbepSO11Umkc6tjwGINpTjvjHlxvQgp+MVfhu7UPTBWFD1 0VWl0ITG/SfZA== Date: Wed, 24 Apr 2024 02:26:03 +0000 To: "bug-grep@gnu.org" From: Mary Subject: Re: bug#70511: Option to grep into compressed files Message-ID: In-Reply-To: <6627DE4B.5080802@gnu.org> References: <9102c0b5-ce68-4435-84d2-f257dbc363c0@cs.ucla.edu> <616436294.1780884.1713794432436@mail.yahoo.com> 
<6627DE4B.5080802@gnu.org> Feedback-ID: 107467773:user:proton X-Pm-Message-ID: 35485bdb9c2e0211c286733efe901a8f4b082d10 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=185.70.40.132; envelope-from=marycada@proton.me; helo=mail-40132.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Wed, 24 Apr 2024 02:19:16 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

> Do you know zgrep from zutils?

TIL! My system does not come with those by default, and instead provides a `zgrep` that is a Bash script supporting only `gzip`.

Are those the generally recommended tools to use? (I'm not sure why `zgrep`/`bzgrep`/`xzgrep` would be provided by their respective projects, given the existence of this project.)

> What happened to the old UNIX concept of
>
> Do one thing.
> Do it well.
> Then stop.
>
> To grep a compressed stream of bits you just pass the decompressed
> bits along a pipe.
>
> Done.

I'm not sure what's the threshold for that principle. GNU grep implements a certain number of options beyond POSIX ones. I decided to send my proposal because I read here: https://www.gnu.org/software/grep/devel.html that GNU grep planned to implement the `-Z` and `-J` options, though I'm not sure if that page is still up-to-date.

As for the piping mechanism, it does work for simple cases, but it doesn't work well with `--recursive`, or `--with-filename` for example. There are ways to work around it with certain shells, but they tend to give long and complex strings. They are generally better suited for ad hoc uses, and it's difficult to make them portable.

> ... I would think that minimizing the attack surface on common commands
> by not linking in non-essential compression libraries would be a no brainer.

I agree with that. I only wanted to make life easier for the maintainers of compression libraries. Perhaps it would be better security-wise to provide the regular grep vanilla, but also provide on the side some "flavored" utilities like `zgrep`/`bzgrep`/`xzgrep` which would be compiled against the relevant libraries?

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 25 17:16:03 2024 Received: (at 70511) by debbugs.gnu.org; 25 Apr 2024 21:16:05 +0000 Received: from localhost ([127.0.0.1]:33320 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s06Re-0006au-St for submit@debbugs.gnu.org; Thu, 25 Apr 2024 17:16:02 -0400 Received: from resdmta-a2p-658371.sys.comcast.net ([2001:558:fd01:2bb4::d]:16512) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s06RV-0006YL-Tz for 70511@debbugs.gnu.org; Thu, 25 Apr 2024 17:15:50 -0400 Received: from resomta-a2p-647974.sys.comcast.net ([96.103.145.228]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resdmta-a2p-658371.sys.comcast.net with ESMTPS id 058ys4bIjW2hY06R7sIJN0; Thu, 25 Apr 2024 21:15:21 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1714079721; bh=foEnnepnfd7YxOXlDbNLhLGBtkkCuY/pm8aYi96gRzc=; h=Received:Received:Received:Received:From:To:Subject:Date: Message-ID:Xfinity-Spam-Result; b=WUhLlGohEstt6/AkVJoCsIgeIeSh3yQZb9FyVa6LPQ5LxNYUUbVhkTUOQhnYOUm3c
xAEhkekNOrkdVxGs3157jmhF0OMphJ9kO/xD1BhEGB064AMnUhwPKoikkaDLIdLP8C pmJr1TZC6g0Spvl1ojP1wruUuDUcwQpPHAr8cU4vuFajb3wYM4HpKSqS5jxOrgkx+T bd8b9K37EDVW8/mZVQs0lfYxMgwebTy7hRey9M78B0HsPcx35IZ/CrpmVkv1C11pg4 rwxC95k7OpZBHkOkwdSksl0b8RNEC5Z4KTDQjuux8H0YNs7Rb6weTuEiZd5k7LAgD0 EI6CNdPwwz+6w== Received: from hobgoblin.ariadne.com ([IPv6:2601:192:4a00:430::508e]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resomta-a2p-647974.sys.comcast.net with ESMTPSA id 06R5sLqYNUSPC06R6sDJ1t; Thu, 25 Apr 2024 21:15:21 +0000 Received: from hobgoblin.ariadne.com (localhost [127.0.0.1]) by hobgoblin.ariadne.com (8.16.1/8.16.1) with ESMTPS id 43PLFJKh1509093 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT) for <70511@debbugs.gnu.org>; Thu, 25 Apr 2024 17:15:19 -0400 Received: (from worley@localhost) by hobgoblin.ariadne.com (8.16.1/8.16.1/Submit) id 43PLFJLH1509090; Thu, 25 Apr 2024 17:15:19 -0400 X-Authentication-Warning: hobgoblin.ariadne.com: worley set sender to worley@alum.mit.edu using -f From: "Dale R. Worley" To: 70511@debbugs.gnu.org Subject: Re: bug#70511: Option to grep into compressed files In-Reply-To: <544520814.2460213.1713912705107@mail.yahoo.com> (bug-grep@gnu.org) Date: Thu, 25 Apr 2024 17:15:19 -0400 Message-ID: <871q6tw3dk.fsf@hobgoblin.ariadne.com> X-CMAE-Envelope: MS4xfFwOZ+nUt6GTWZOFNkKFfjSD3h5kQ/WpNY1JaRJLteU7AnTXkZlaOP+RCQxM6JhBhHALNEgpgiXyAjDRS6TgzR2bH37Eoy0E9jimlNeBuhFvdJN0zgr6 82PShA0Ia51OzUuABAnOEG41k7MkagFboobWACB4NB6/v6QgeFKxRUGi42vh2KS8dlGJ0XlrcN12oBEwqXKtL3G2+wm9Y5J1VmWB/C+2qewOE9Tw/X8EAZwC X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 70511 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) People have mentioned that it's best for a utility "to do one thing well". 
It seems to me that even the existing grep options do three things (in the complex use cases):

- select a set of files
- uncompress the files (if they're compressed)
- search within the file contents

I am ignoring the case of extracting members from archive files.

It seems to me that if one wanted to do a version of this that isn't covered by the existing grep options, step 1 can best be done by "find". Step 3 can be done by running grep on each set of contents, with --label=name to get output lines labeled with the original file name.

What doesn't seem to exist is something that does step 2 in a general way. The tool that is needed is something that reads the first few bytes of a file, determines which compression signature is present if any, then processes the contents through the correct decompressor. Ideally, it would be programmable in something like the manner of "file" so that additional compression formats could be fitted into the framework, and it could use either a compiled-in decompression library (like zlib) or call an external decompression program, as necessary.

Actually, I'm asking whether anybody knows whether such a tool exists already. It seems like a "natural" facility that somebody would have thought to write maybe fifteen years ago.

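To make "step 2" concrete, here is a minimal sketch in Python, assuming only the standard library's gzip, bz2 and lzma modules. The names `SIGNATURES`, `open_maybe_compressed` and `grep_file` are hypothetical, the signature table covers just three formats, and the matching is a plain substring test rather than a regular expression. It is an illustration of the idea, not a description of any existing tool.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes for three common formats.
# Illustrative only: a real tool would need a longer, extensible table.
SIGNATURES = [
    (b"\x1f\x8b", gzip.open),      # gzip
    (b"BZh", bz2.open),            # bzip2
    (b"\xfd7zXZ\x00", lzma.open),  # xz
]


def open_maybe_compressed(path):
    """Open path for binary reading, decompressing if a known signature is found."""
    with open(path, "rb") as f:
        head = f.read(6)  # the longest signature above is 6 bytes
    for magic, opener in SIGNATURES:
        if head.startswith(magic):
            return opener(path, "rb")
    return open(path, "rb")  # no known signature: treat as plain data


def grep_file(pattern, path):
    """Yield 'path:line' for each line that contains the fixed string pattern."""
    with open_maybe_compressed(path) as stream:
        for raw in stream:
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if pattern in line:
                yield f"{path}:{line}"
```

Because the decision is made from the file contents, a gzip file with a misleading extension is still decompressed, which is the property asked for above. Adding a format means adding one row to the table.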
Dale From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 25 20:36:12 2024 Received: (at 70511) by debbugs.gnu.org; 26 Apr 2024 00:36:13 +0000 Received: from localhost ([127.0.0.1]:33484 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s09ZR-0003oE-JC for submit@debbugs.gnu.org; Thu, 25 Apr 2024 20:36:12 -0400 Received: from relay9-d.mail.gandi.net ([217.70.183.199]:45351) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s09ZM-0003lF-2Y for 70511@debbugs.gnu.org; Thu, 25 Apr 2024 20:36:06 -0400 Received: by mail.gandi.net (Postfix) with ESMTPSA id 27799FF804; Fri, 26 Apr 2024 00:35:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=loyalty.org; s=gm1; t=1714091739; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=7jZCnQs+hKE1fwX+BF+UIFvKGSAZZOhsHWjkhyZ9xis=; b=d8DNKIo9gQj6l7bBOCyT08Z1NnTqQA18jhlkmF/G/0zcA5uV5j4Fjroyj42V794xLMCsCY hDfvJWhcM0EnfUpGuJaXEEgDHIVsfjEhCWtic63/fqzKpJ9mjdaayx8J7IckRy09cN2NoH 6KQ7fYWGIKiIxnxSJtRS+iWKr5CCoad8Vmue3vlN1sgprOqsHggfwTDkdQDNWWn0/BTzW6 MNf8I3RxfyfNw2GzsqdeAlcTOT4GBjJ4lL7Wgo0lw+d38SCUkTJ9PQcWzInCQXqJn13Yry /rYBzpJNVpKBbVB0JaiTlUGPXPaEb2+ZQwTIcAmKyV/JxifdTOuLMjVi2dSQLQ== Date: Thu, 25 Apr 2024 17:35:34 -0700 From: Seth David Schoen To: "Dale R. 
Worley" Subject: Re: bug#70511: Option to grep into compressed files Message-ID: References: <871q6tw3dk.fsf@hobgoblin.ariadne.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <871q6tw3dk.fsf@hobgoblin.ariadne.com> X-GND-Sasl: schoen@loyalty.org X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-)

Dale R. Worley writes:

> What doesn't seem to exist is something that does step 2 in a general
> way. The tool that is needed is something that reads the first few
> bytes of a file, determines which compression signature is present if
> any, then processes the contents through the correct decompressor.
> Ideally, it would be programmable in something like the manner of "file"
> so that additional compression formats could be fitted into the
> framework, and it could use either a compiled-in decompression library
> (like zlib) or call an external decompression program, as necessary.
>
> Actually, I'm asking whether anybody knows whether such a tool exists
> already. It seems like a "natural" facility that somebody would have
> thought to write maybe fifteen years ago.

For a while, new options were getting added to GNU tar frequently in order to allow you to do things like

    compress -dc | tar xf -
    zcat | tar xf -
    bzcat | tar xf -
    lzcat | tar xf -

etc., but just using the single tar invocation without (explicitly running) an external compression program. The current ones are (in alphabetical order in the man page, not historical order of when they were added)

    -j, --bzip2
           Filter the archive through bzip2(1).

    -J, --xz
           Filter the archive through xz(1).

    --lzip Filter the archive through lzip(1).

    --lzma Filter the archive through lzma(1).

    --lzop Filter the archive through lzop(1).

    -z, --gzip, --gunzip, --ungzip
           Filter the archive through gzip(1).

    -Z, --compress, --uncompress
           Filter the archive through compress(1).

    --zstd Filter the archive through zstd(1).

Wow, _eight_ specific forms of compression! But a newer functionality in GNU tar is

    -a, --auto-compress
           Use archive suffix to determine the compression program.

and something like that (apparently also looking at the file header) is now the default.

It's weird to me to imagine having all of that functionality in grep, but maybe all of the functionality that was put into tar for this could become a separate standalone program?

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 10:27:13 2024 Received: (at 70511) by debbugs.gnu.org; 26 Apr 2024 14:27:14 +0000 Received: from localhost ([127.0.0.1]:38177 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s0MXh-0006Kf-LG for submit@debbugs.gnu.org; Fri, 26 Apr 2024 10:27:13 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53400) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s0MXg-0006Jh-3X for 70511@debbugs.gnu.org; Fri, 26 Apr 2024 10:27:12 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s0MXH-0002yQ-QY; Fri, 26 Apr 2024 10:26:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=In-Reply-To:References:Subject:To:MIME-Version:From: Date; bh=DnQeqL5TRloYtXehu3xQiLmD5pNNBvootVnx5ZrvIAY=; b=e2ozIyTqanAH2MMVNANW VsWKWFUmMNiy8Gkq2K/3kiSrXUwOG/sbd/zujAIv6e3h5p2DGV9UnCGn+MZIF8NjpOGi1DCjV4JWw LtePNJfi99I2F7TL1cLhod44CTLzQpV62okTUqOElJ2bFFZtfGVhuu1tdVGf10rhaZh+FlGI/jz46 kA9kRbi0hDKUMxdu2gCk/jrGPJKCE/YTGo6rq8Ak1/LvizW3wTvqCjnSFN5zoDgxg3mO3SM99NRWK UcCQ/7AM5uyzWmh8EapwaiCWIr4ZmEi6SV6sL5CI8p4u1r/T10Dw9WacbDM52Gf1m0SOlns6+qnj+
VIt2vwLqS2a56g==; Message-ID: <662BB9B9.1020900@gnu.org> Date: Fri, 26 Apr 2024 16:27:05 +0200 From: Antonio Diaz Diaz User-Agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 MIME-Version: 1.0 To: "Dale R. Worley" Subject: Re: bug#70511: Option to grep into compressed files References: <871q6tw3dk.fsf@hobgoblin.ariadne.com> In-Reply-To: <871q6tw3dk.fsf@hobgoblin.ariadne.com> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -5.3 (-----) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------)

Dale R. Worley wrote:

> What doesn't seem to exist is something that does step 2 in a general
> way. The tool that is needed is something that reads the first few
> bytes of a file, determines which compression signature is present if
> any, then processes the contents through the correct decompressor.

Such tool[1] does in fact exist since 2009. It is only that it is not yet widely known. :-)

[1] http://www.nongnu.org/zutils/manual/zutils_manual.html#Zgrep

Best regards,
Antonio.

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 17:14:41 2024 Received: (at submit) by debbugs.gnu.org; 26 Apr 2024 21:14:41 +0000 Received: from localhost ([127.0.0.1]:38559 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s0Su0-0000Rh-Vx for submit@debbugs.gnu.org; Fri, 26 Apr 2024 17:14:41 -0400 Received: from lists.gnu.org ([2001:470:142::17]:42396) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s0Qez-0007gT-Rc for submit@debbugs.gnu.org; Fri, 26 Apr 2024 14:51:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s0QeZ-0007ee-Mh for bug-grep@gnu.org; Fri, 26 Apr 2024 14:50:36 -0400 Received: from mail-4325.protonmail.ch ([185.70.43.25]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s0QeX-0007Y5-2h for bug-grep@gnu.org; Fri, 26 Apr 2024 14:50:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1714157429; x=1714416629; bh=XmmXKA3jLmv/iEmXW67inxbFHm0l5JnAVV7NB1MmiIk=; h=Date:To:From:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=P2GIwTNVCDOYyqcLPMANLOe9dshNeSq2tKbE/+rx568i1K/K0t++fy+GbVa62ey17 onQmZjbslVAXb4oFmZNe9KqqloHTqA5WGh+w/QprXHNw+EKCWGmpCHSPSWtMePXcpu 98mbhS+ee1B/uE9l6U2ZdQ4p/Q5SKDbiLagoKkVf+wqc0OzPtRQpFwL8OHvwiyTIH1 GnuuQo0TojfzkPSRkBEgOcxuo6mUoG6PQEhiY6b4EK+bpEqoqbLrsNNIsiQPydrPhp 2IBLdlT2nIWne07erNJ9+kkEkyV/9+PfqPj2ArhH1W1I4eyFhK1dFVs2SshGVSCY5w /v6coMgxw4POA== Date: Fri, 26 Apr 2024 18:50:24 +0000 To: "bug-grep@gnu.org" From: Mary Subject: Re: bug#70511: Option to grep into compressed files Message-ID: In-Reply-To: References: <871q6tw3dk.fsf@hobgoblin.ariadne.com> Feedback-ID: 107467773:user:proton X-Pm-Message-ID: 64944b66495847b619c065f1f7dc9b5bde614961 MIME-Version: 1.0 Content-Type: text/plain; 
charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=185.70.43.25; envelope-from=marycada@proton.me; helo=mail-4325.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 26 Apr 2024 17:14:38 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

> For a while, new options were getting added to GNU tar frequently in order
> to allow you to do things like
>
> compress -dc | tar xf -
> zcat | tar xf -
> bzcat | tar xf -
> lzcat | tar xf -
>
> etc., but just using the single tar invocation without (explicitly
> running) an external compression program. The current ones are (in
> alphabetical order in the man page, not historical order of when they
> were added)
>
> -j, --bzip2
> Filter the archive through bzip2(1).
>
> -J, --xz
> Filter the archive through xz(1).
>
> --lzip Filter the archive through lzip(1).
>
> --lzma Filter the archive through lzma(1).
>
> --lzop Filter the archive through lzop(1).
>
> -z, --gzip, --gunzip, --ungzip
> Filter the archive through gzip(1).
>
> -Z, --compress, --uncompress
> Filter the archive through compress(1).
>
> --zstd Filter the archive through zstd(1).
>
> Wow, eight specific forms of compression! But a newer functionality
> in GNU tar is
>
> -a, --auto-compress
> Use archive suffix to determine the compression program.
>
> and something like that (apparently also looking at the file header)
> is now the default.
>
> It's weird to me to imagine having all of that functionality in grep,
> but maybe all of the functionality that was put into tar for this could
> become a separate standalone program?

GNU tar also supports `-I, --use-compress-program=PROG filter through PROG (must accept -d)`, which is one of the reasons I thought it would be relevant to add a similar option to grep.

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 17:26:30 2024 Received: (at 70511) by debbugs.gnu.org; 28 Apr 2024 21:26:30 +0000 Received: from localhost ([127.0.0.1]:52963 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1C2Y-0002GU-6b for submit@debbugs.gnu.org; Sun, 28 Apr 2024 17:26:30 -0400 Received: from resqmta-c2p-570503.sys.comcast.net ([2001:558:fd00:56::5]:40418) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1C2U-0002GM-Ow for 70511@debbugs.gnu.org; Sun, 28 Apr 2024 17:26:28 -0400 Received: from resomta-c2p-555441.sys.comcast.net ([96.102.18.240]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resqmta-c2p-570503.sys.comcast.net with ESMTPS id 1AlgsUxlRbwk31C24sTeoz; Sun, 28 Apr 2024 21:26:00 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1714339560; bh=/gOnTpjlgCvnHEJnhgg6u7lL6PftUiKZucfh3ztsycw=; h=Received:Received:Received:Received:From:To:Subject:Date: Message-ID:Xfinity-Spam-Result; b=AZjwaqOLbJ/+E/6WnDXLY4B2XugS9oC+XYMqhFMCdmCDrWBG1TjGBv6jsVkg4M4LC aGCJKFOPMRzwRStAAi4p7TTqRbAllq/G8hiWAdmgXmAl+f8pr3+JTcJrBGqacHldu0 zQxXNmTl9YH/w5JGnv9SogzeS4LzerEc0yFD+4nRXAPZkOFua12ILZMaAsbnDdpPae 76Iyu7UoQxibOeZL7vILzAWsJnzSI4PjzwiKpEkIqGukK3PcSiRZ0DYoFV1L5bn83G Gw4delXmZFDOffgjUybz4ivanhLUBg43RPBzWhZaoFcTl6yzPpZPxzCXsb1Q0Yqr16 w1i/F0uDYOJiA== Received: from hobgoblin.ariadne.com ([IPv6:2601:192:4a00:430::10f6])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resomta-c2p-555441.sys.comcast.net with ESMTPSA id 1C1hsSbvFfB6P1C1isPqGR; Sun, 28 Apr 2024 21:25:38 +0000 Received: from hobgoblin.ariadne.com (localhost [127.0.0.1]) by hobgoblin.ariadne.com (8.16.1/8.16.1) with ESMTPS id 43SLPaef1933243 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT) for <70511@debbugs.gnu.org>; Sun, 28 Apr 2024 17:25:36 -0400 Received: (from worley@localhost) by hobgoblin.ariadne.com (8.16.1/8.16.1/Submit) id 43SLPaZX1933240; Sun, 28 Apr 2024 17:25:36 -0400 X-Authentication-Warning: hobgoblin.ariadne.com: worley set sender to worley@alum.mit.edu using -f From: "Dale R. Worley" To: 70511@debbugs.gnu.org Subject: Re: bug#70511: Option to grep into compressed files In-Reply-To: (bug-grep@gnu.org) Date: Sun, 28 Apr 2024 17:25:36 -0400 Message-ID: <87mspd89in.fsf@hobgoblin.ariadne.com> X-CMAE-Envelope: MS4xfIKTXXD2AIDsidm9bJTsC5ihBTcS3id5amnmnS785y2GO48WejM+M6R9FJxqDQnKWI0LjfsM3PCnb4Kme1UZf0HEGbjkALh48G12B7KPnrJCHSNYexHQ HcaeNVd8Xm8s1nn0ufsk4KxN4frcdkH2SxJIPhnd0e0cI9lNjoOSSYFgPkswAT/EgGlCgIMwijukjOnnYhX0y3m8Sgz1t0uaiTkNDgSzTqq+NRdkn6WBbM4K X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 70511 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Antonio Diaz Diaz writes: > Dale R. Worley wrote: >> What doesn't seem to exist is something that does step 2 in a general >> way. The tool that is needed is something that reads the first few >> bytes of a file, determines which compression signature is present if >> any, then processes the contents through the correct decompressor. > > Such tool[1] does in fact exist since 2009. It is only that it is not yet > widely known. 
:-) > > [1] http://www.nongnu.org/zutils/manual/zutils_manual.html#Zgrep Looking at that page, I think you meant to point to #Zcat. But yes, it does seem to do that job. Mary via Bug reports for GNU grep writes: > GNU tar also supports `-I, --use-compress-program=PROG filter > through PROG (must accept -d)`, which is one of the reasons I thought > it would be relevant to add a similar option to grep. So the construction I'm thinking of would be grep ... --use-compress-program=zcat ... pattern file ... except it looks like zcat doesn't accept -d (which would need to be a no-op for it). Though it looks like zcat supports five compression techniques and gnu tar handles eight, so zcat should be expanded there. Dale From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 30 14:27:01 2024 Received: (at 70511) by debbugs.gnu.org; 30 Apr 2024 18:27:01 +0000 Received: from localhost ([127.0.0.1]:60791 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1sBx-00071J-4s for submit@debbugs.gnu.org; Tue, 30 Apr 2024 14:27:01 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36452) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1sBv-00071C-W1 for 70511@debbugs.gnu.org; Tue, 30 Apr 2024 14:27:00 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s1sBV-00027u-8Y; Tue, 30 Apr 2024 14:26:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=In-Reply-To:References:Subject:To:MIME-Version:From: Date; bh=Uab+8vPaAz+ue6qNRZ/gSaKOyV1B78vJzHhqf49NxI0=; b=PcsBpAk9M0Ja0eU0lqxI RtflZmODS/4OX2TV/crBO0GJU+HCvgrSP1pr1wcNHzL1S7AMLvwKcYK/rmkHQAtNd82ZQUyk0sg0v aJyKxfPvqSrIhqeMFmYftGCHXGehXvgfvLYkRMSB1DMBuYDmcblaKiVRdgFrbcvbDl/VqRtclh2sk KJDx3EHD7DyXsXYFvE90wurQfHE3O4Dz0+bXZhaPaqhNS4VnJYJVg/Jlob41y3/4Jy30o4VCUIWOj 
beiwk0X7AxFZ+PpkLDYgSIG4I59b67ijmz29HXDZl3dRjIR1gBwJQajeTCG0FxLpXUJY3thCv1i1U 2DDMYz2jnWcXXQ==; Message-ID: <663137F1.7060002@gnu.org> Date: Tue, 30 Apr 2024 20:26:57 +0200 From: Antonio Diaz Diaz User-Agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 MIME-Version: 1.0 To: "Dale R. Worley" Subject: Re: bug#70511: Option to grep into compressed files References: <87mspd89in.fsf@hobgoblin.ariadne.com> In-Reply-To: <87mspd89in.fsf@hobgoblin.ariadne.com> Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -6.6 (------) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -7.6 (-------) Dale R. Worley wrote: > So the construction I'm thinking of would be > > grep ... --use-compress-program=zcat ... pattern file ... Ah! interesting. Zgrep duplicates some of the work of grep. For example it recurses through directories, feeds grep one file at a time, and prepends the file name to the output of grep if needed. Delegating the decompression to zcat as you propose would allow the full use of grep's features. If it is possible it would be possibly the best option. > except it looks like zcat doesn't accept -d (which would need to be a > no-op for it). Zcat does indeed accept (and ignore) option -d for compatibility with gzip. Therefore all that is needed is to implement a way for grep to delegate decompression to zcat. > Though it looks like zcat supports five compression techniques and gnu > tar handles eight, so zcat should be expanded there. Zcat also supports the (obsolete) compress format (.Z) through gzip. 
Of the other two, lzma should be better removed from tar, and I do not remember to have seen any tarball compressed with lzop (maybe because it compresses less than gzip). Best regards, Antonio. From debbugs-submit-bounces@debbugs.gnu.org Wed May 01 15:21:01 2024 Received: (at 70511) by debbugs.gnu.org; 1 May 2024 19:21:01 +0000 Received: from localhost ([127.0.0.1]:38975 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s2FVk-0005WL-Pb for submit@debbugs.gnu.org; Wed, 01 May 2024 15:21:01 -0400 Received: from resqmta-a2p-658919.sys.comcast.net ([2001:558:fd01:2bb4::8]:42456) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s2FVh-0005WD-KT for 70511@debbugs.gnu.org; Wed, 01 May 2024 15:20:59 -0400 Received: from resomta-a2p-647655.sys.comcast.net ([96.103.145.235]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resqmta-a2p-658919.sys.comcast.net with ESMTPS id 2BjxsZgDx50xX2FVGs2E8r; Wed, 01 May 2024 19:20:30 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1714591230; bh=wIqyogRL+GTI1gWnZwR5BGQrrLnfy2ldfwyKD4jmH6Y=; h=Received:Received:Received:Received:From:To:Subject:Date: Message-ID:Xfinity-Spam-Result; b=dPpWeb9v+jQ/erip8bVft0U1bwnjHT40xyy42LGGnEfvJui/QisxWUEFus1nufHHm 0EWFTE3oOY2ucaNpUaMiy+Q9+4TZRN/O5CcEOjRVPvefV3jiiDPaALzjA/XCDDt0Z4 whp91Y998ry/TAPvgz0rILxBxJAJTR9tRLkmrgiYens8CwQQ6ia7YNJ9yugiGgOKKl Au/+KU4G7a3kGTU90v/Y/zrpCQSgNEhvrxANXD9zPc2+Z+0IrQI2+66eE7CUAfkrWW W+8oQtoXfKOLksJFFJE1WXo+qFL2BztlRrNNTsTlfcd1gLlcN7g5NHE0TBs8D0Yezd Pm/HF4fmrzqNA== Received: from hobgoblin.ariadne.com ([IPv6:2601:192:4a00:430::10f6]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resomta-a2p-647655.sys.comcast.net with ESMTPSA id 2FUtsZ9Ebbs3Y2FUusV8ee; Wed, 01 May 2024 19:20:09 +0000 Received: from hobgoblin.ariadne.com (localhost [127.0.0.1]) by 
hobgoblin.ariadne.com (8.16.1/8.16.1) with ESMTPS id 441JK7f32432911 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 1 May 2024 15:20:07 -0400 Received: (from worley@localhost) by hobgoblin.ariadne.com (8.16.1/8.16.1/Submit) id 441JK7U12432908; Wed, 1 May 2024 15:20:07 -0400 X-Authentication-Warning: hobgoblin.ariadne.com: worley set sender to worley@alum.mit.edu using -f From: "Dale R. Worley" To: Antonio Diaz Diaz Subject: Re: bug#70511: Option to grep into compressed files In-Reply-To: <663137F1.7060002@gnu.org> (antonio@gnu.org) Date: Wed, 01 May 2024 15:20:07 -0400 Message-ID: <87zft9s5js.fsf@hobgoblin.ariadne.com> X-CMAE-Envelope: MS4xfDFh7LR2cyoo9+oIsqLSvLE0oB97sSmt6IOepGLERTcMXM5hbGcqs7zovnLzEeG3y5ktYUnWzvdfl1Cflrov2HbP1NZujhS58sSzEiPYZOa4NRcjRaxe 7fagu0DI3XUzjvsqtakU4r7wInFVYqQRTANja50n4znLkRX8BDa8+MacDG70Vi9e5GscMb7cEtCclq9lbXynX8wnUr+FBB9vg8gFS6u5OSRChOdwoB3tW9e3 YLbrfYZa2kC1GiWlSnUv9g== X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Antonio Diaz Diaz writes: > Dale R. Worley wrote: >> So the construction I'm thinking of would be >> >> grep ... --use-compress-program=zcat ... pattern file ... > Zcat does indeed accept (and ignore) option -d for compatibility with gzip. > Therefore all that is needed is to implement a way for grep to delegate > decompression to zcat. > Zcat also supports the (obsolete) compress format (.Z) through gzip. I missed those facts. I only skimmed the section of http://www.nongnu.org/zutils/manual/zutils_manual.html about Zcat and hadn't read the "Common options" section which makes those clear. I'll have to remember that zcat has this nice functionality. 
I'm not interested enough in this to implement it, but I'll leave this one note for anyone who is researching the possibility: You might be concerned that starting a e.g. zcat process for every file to scan would be excessively high overhead. But many years ago, I modified "tar" to be able to compress each file individually (rather than the entire archive collectively). Each file was processed by a separate gzip process, and in my usage written to an Exabyte tape. I was worried that all these process invocations would slow down the backup, but even on a low-speed 486, the processes were insignificant. So I never improved the implementation to use an internal compression library. Apparently once all the needed files are in the buffers, creating yet another process from them is quick. Dale From debbugs-submit-bounces@debbugs.gnu.org Sat May 04 11:39:06 2024 Received: (at submit) by debbugs.gnu.org; 4 May 2024 15:39:06 +0000 Received: from localhost ([127.0.0.1]:54315 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s3HTd-0002AE-3E for submit@debbugs.gnu.org; Sat, 04 May 2024 11:39:06 -0400 Received: from lists.gnu.org ([2001:470:142::17]:37880) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s3HTY-00029r-Dy for submit@debbugs.gnu.org; Sat, 04 May 2024 11:39:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s3HT4-0001tJ-CJ for bug-grep@gnu.org; Sat, 04 May 2024 11:38:31 -0400 Received: from mail-4318.protonmail.ch ([185.70.43.18]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1s3HT0-00069W-JL for bug-grep@gnu.org; Sat, 04 May 2024 11:38:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1714837102; x=1715096302; bh=ssr8xeYhWIpTm54ozYI3CJ2qdMZhY1luPi/gF0BWX+Y=; 
h=Date:To:From:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=ED7NYXwvdsoaURpo5x87VFUfIWDrJ7bqqnIZJ9R0m2KL3GXY2HDcz15FD4DiGnLG5 4+Q3fHCVcsoq2AWrVVtmwIFa33FMzqa+KhwLOegw0YinQvNO9ItfNFg2F+VJNFsuKO 5LPwDGn+o2FeXVtFbzgmy3MGC3lw0ZIwdUSq0HAAXQGJNmH8joKmyOmw2JpTMLbH89 7pHcAZc390MJJTQ/Ds94q8FJ8ZS4hWeAv+uoqXKP6bQa5xwf0E5+S0x7xILj5BCO+d me6y6kyelanNLdY2HSYN/+j0RZKhnhkpMGuJb+Pj29py08xTaS13V8KGbJHR9OjmPd vZ0owFQL7O8qw== Date: Sat, 04 May 2024 15:38:20 +0000 To: "bug-grep@gnu.org" From: Mary Subject: Re: bug#70511: Option to grep into compressed files Message-ID: In-Reply-To: <4me0yTI4lwJSLqgJeiclLF2fSeUck8rQ4ztnUNBLjVmCoTD6fvIUcKBG9F9PdWJzwYN0LMnNS3fIhtnqynYTKc1Z0mLaPDfLMRCzz8C2thk=@proton.me> References: <87zft9s5js.fsf@hobgoblin.ariadne.com> <4me0yTI4lwJSLqgJeiclLF2fSeUck8rQ4ztnUNBLjVmCoTD6fvIUcKBG9F9PdWJzwYN0LMnNS3fIhtnqynYTKc1Z0mLaPDfLMRCzz8C2thk=@proton.me> Feedback-ID: 107467773:user:proton X-Pm-Message-ID: 55865f859ec590bf351a816ce2c44be4305e64e5 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=185.70.43.18; envelope-from=marycada@proton.me; helo=mail-4318.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Dale R. Worley worley@alum.mit.edu wrote: > I missed those facts. 
I only skimmed the section of
> http://www.nongnu.org/zutils/manual/zutils_manual.html about Zcat and
> hadn't read the "Common options" section which makes those clear. I'll
> have to remember that zcat has this nice functionality.
>
> I'm not interested enough in this to implement it, but I'll leave this
> one note for anyone who is researching the possibility: You might be
> concerned that starting a e.g. zcat process for every file to scan would
> be excessively high overhead. But many years ago, I modified "tar" to
> be able to compress each file individually (rather than the entire
> archive collectively). Each file was processed by a separate gzip
> process, and in my usage written to an Exabyte tape. I was worried that
> all these process invocations would slow down the backup, but even on a
> low-speed 486, the processes were insignificant. So I never improved
> the implementation to use an internal compression library. Apparently
> once all the needed files are in the buffers, creating yet another
> process from them is quick.
>
> Dale

I already have a patch that I believe is trivial enough to not cause copyright concerns, would you like me to send it?
From debbugs-submit-bounces@debbugs.gnu.org Mon May 06 13:57:11 2024 Received: (at 70511) by debbugs.gnu.org; 6 May 2024 17:57:11 +0000 Received: from localhost ([127.0.0.1]:39364 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s42aN-00056X-93 for submit@debbugs.gnu.org; Mon, 06 May 2024 13:57:11 -0400 Received: from resdmta-a2p-658199.sys.comcast.net ([2001:558:fd01:2bb4::c]:58248) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s42aJ-000561-4o for 70511@debbugs.gnu.org; Mon, 06 May 2024 13:57:09 -0400 Received: from resomta-a2p-646965.sys.comcast.net ([96.103.145.237]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resdmta-a2p-658199.sys.comcast.net with ESMTPS id 41jQsZlFf09oY42ZosAFbx; Mon, 06 May 2024 17:56:36 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1715018196; bh=2BIM8LmJtyXfcFjY8qFG9k5vtzPu9VvWu9T8QYge1a8=; h=Received:Received:Received:Received:From:To:Subject:Date: Message-ID:Xfinity-Spam-Result; b=ouTAcIylYAOhqx/nnZmOPwHRJFZBvvQmmsZqmqhn9D/L9oJmWF7hBGltIIC00kWhO MAT3SKGHBf1VgV0KOtazTglfUkNCqKqgHLUOO8kHThn7v9It+4wcoZRwqOmjAUjQcV +ygtsWC766mP417ebHl1aer51uyPDWQSy7E0RSIZcMjC3ftryeapyyCMBjUtqhF5jx NzCy7O+nbG6LIl64SPeQ2UOSQ1p7cCuOx0dD5fr1hiO949bbEggRkBNL9SvTCVx1cl dqJ1Nl771YkiNFKZr6kyrBmISDWP0i+BD8LHmZrAs85kv93yjrJ0snekzQps8eJQ6B c9QWDxBRnWeuA== Received: from hobgoblin.ariadne.com ([IPv6:2601:192:4a00:430::5bcd]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 256/256 bits) (Client did not present a certificate) by resomta-a2p-646965.sys.comcast.net with ESMTPSA id 42ZmsRSwyac1d42ZnstkXL; Mon, 06 May 2024 17:56:36 +0000 Received: from hobgoblin.ariadne.com (localhost [127.0.0.1]) by hobgoblin.ariadne.com (8.16.1/8.16.1) with ESMTPS id 446HuYxV3120580 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Mon, 6 May 2024 13:56:34 -0400 Received: (from 
worley@localhost) by hobgoblin.ariadne.com (8.16.1/8.16.1/Submit) id 446HuYSN3120577; Mon, 6 May 2024 13:56:34 -0400 X-Authentication-Warning: hobgoblin.ariadne.com: worley set sender to worley@alum.mit.edu using -f From: "Dale R. Worley" To: Mary Subject: Re: bug#70511: Option to grep into compressed files In-Reply-To: (bug-grep@gnu.org) Date: Mon, 06 May 2024 13:56:34 -0400 Message-ID: <87ikzqddt9.fsf@hobgoblin.ariadne.com> X-CMAE-Envelope: MS4xfMFpUWVitYjs9kKCmnZoh8/1Jt+OZ+EQXlzPYYQlbEmKGV7qkujJ5On38us+x5A+IpSY5Ha48XEbX9Jr6TsXmty+WMnsuyEITWdnwkKTII22XdN0oXTn 9Z2DLg24O4Cuseizkvr7pIx+g8Ke6mD9hXJabMZiumoi3fmTrdzV6E+seoJEAuXkV49P4pB820Y9b20s4DAgnsq9OMNLX/mmU6ELVk2Xb79e2IQXnN2VUsRE Z3kgbPR/LXjFse6iaI3HRQ== X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 70511 Cc: 70511@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Mary via Bug reports for GNU grep writes: > Dale R. Worley worley@alum.mit.edu wrote: >> [...] > I already have a patch that I believe is trivial enough to not cause > copyright concerns, would you like me to send it? *I* am all in favor of it, but I'm not a grep maintainer! Dale
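
For anyone researching the idea discussed in this thread, the following is a minimal sketch of the "one filter process per file" approach. It is not the patch mentioned above and not how grep is implemented. The function name `grep_through_filter` is hypothetical, matching uses Python regular expressions rather than grep's matcher, and the filter is started from an argument list rather than through a shell as in the original proposal.

```python
import re
import subprocess
from typing import Iterator, List, Sequence, Tuple


def grep_through_filter(
    pattern: str, paths: Sequence[str], filter_cmd: List[str]
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for filtered lines matching pattern.

    Hypothetical helper: each file is fed to a fresh filter process on
    stdin, and the filter's stdout is searched in lieu of the file.
    """
    regex = re.compile(pattern)
    for path in paths:
        # One process per file, as discussed in the thread.
        with open(path, "rb") as raw, subprocess.Popen(
            filter_cmd, stdin=raw, stdout=subprocess.PIPE
        ) as proc:
            for lineno, data in enumerate(proc.stdout, start=1):
                line = data.decode("utf-8", errors="replace").rstrip("\n")
                if regex.search(line):
                    yield path, lineno, line
        # Leaving the "with" block has waited for the filter to exit.
        if proc.returncode != 0:
            raise RuntimeError(f"{filter_cmd[0]} failed on {path}")
```

With a zcat on the PATH, a call such as `grep_through_filter("pattern", ["textfile.gz"], ["zcat"])` would search the decompressed contents of `textfile.gz`. Because the filter is an ordinary argument list, any program that reads stdin and writes stdout could stand in for zcat.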