From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 03:15:43 2023 Received: (at submit) by debbugs.gnu.org; 21 Aug 2023 07:15:43 +0000 Received: from localhost ([127.0.0.1]:55292 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qXz8Y-00011t-Pb for submit@debbugs.gnu.org; Mon, 21 Aug 2023 03:15:43 -0400 Received: from lists.gnu.org ([2001:470:142::17]:44400) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qXnYD-0003y5-SZ for submit@debbugs.gnu.org; Sun, 20 Aug 2023 14:53:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qXnY5-0004u9-Cs for bug-grep@gnu.org; Sun, 20 Aug 2023 14:53:17 -0400 Received: from mail-vk1-xa2f.google.com ([2607:f8b0:4864:20::a2f]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qXnY3-0002YU-9k for bug-grep@gnu.org; Sun, 20 Aug 2023 14:53:17 -0400 Received: by mail-vk1-xa2f.google.com with SMTP id 71dfb90a1353d-48d2c072030so160725e0c.0 for ; Sun, 20 Aug 2023 11:53:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692557593; x=1693162393; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=bg99V1KMHAL5581gavbNU6JuY73v67XNZjB8dTmQ5zc=; b=si00ZvC5/P/lvhpbW0t334ml8E3NSVcVezI4BbDGPhpGaCorH9GheObEPL2IPGMY9w jlbtwMW0qq9tfGGvIR32uhzbvuqPHu2AI+bJ9Q3s+9CA0ClEoseNMnxTKsR3K4/rMIbG +FqXVhg8BsfFOHedRW+fH/FE5BUMEIhq7VXEPm5OdDNA+wWnxhLRFnnPIxEHj61hhpkg 1gM1Aw412C+BL01XV38hWWtENlPyFnzyGF2Wmo3Fz5qxYBGiKqrwIkRz97UCbQ5lr4R+ eubX72zOCK0jZzKJCT8dt//6Stc+vNj8qbfEIO8LjQ1yjFrWFl8NDvdgVq8BP/w1UyDr 3yyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692557593; x=1693162393; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; 
bh=bg99V1KMHAL5581gavbNU6JuY73v67XNZjB8dTmQ5zc=; b=lS3Vqi9dnBM2EFzw8oT8gYCj44C0cT7vEGENWxB/1FKoUaK/UUn4qhdhXLYtqqWzTF KWjEEGpNRoEo6E7MkFwhrwQOmTUPWZVapZMCg8ta21MaC+dSPFtLZzhUixes9HpFcLM0 xJ7P7FGmJhZ/CTw56XMxuDImdxamrRILsQ60qIZzTuTO6S62kv0UdzAw6b09DrbvBJ2A XfYIjA9XufE+FeV6qS5HXSmggPCl7W8Rno05iq7kRrV4A5DEdpLniG+Qpk2hQfRaq4N0 3jDEeMTclapVuYXAUu/Ea8qDBN0veA6cknWQ/YJBwlCqkLE3T9mE6ZAL3+ToKtz/CZdG 0oqw== X-Gm-Message-State: AOJu0YyNLHYK8XwnoxnkBg3gCCWxu2rIrIzYGl1ABM9q+gSEMUWkiQiL iOkYKat7ByB81hkmQJijPqg4bO/eE9eocngf0ssdNTZjxO0= X-Google-Smtp-Source: AGHT+IG0BokvbxUHtAEIq5VNhuZUf4s6LAAM51oeH88YOAicneMPtVdXqHjPNNfBaCkD46nJjJpgwIkXBvDR1xNd6n0= X-Received: by 2002:a05:6122:e5d:b0:486:9138:7a17 with SMTP id bj29-20020a0561220e5d00b0048691387a17mr2935358vkb.2.1692557593096; Sun, 20 Aug 2023 11:53:13 -0700 (PDT) MIME-Version: 1.0 From: Daniel Green Date: Sun, 20 Aug 2023 14:53:02 -0400 Message-ID: Subject: Feature request: include first line of file in output To: bug-grep@gnu.org Content-Type: multipart/alternative; boundary="000000000000193f5b06035f4661" Received-SPF: pass client-ip=2607:f8b0:4864:20::a2f; envelope-from=ddgreen@gmail.com; helo=mail-vk1-xa2f.google.com X-Spam_score_int: -1 X-Spam_score: -0.2 X-Spam_bar: / X-Spam_report: (-0.2 / 5.0 requ) BAYES_20=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 21 Aug 2023 03:15:39 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) --000000000000193f5b06035f4661 Content-Type: text/plain; charset="UTF-8" I'm frequently searching CSV files 
with 20-30 columns, and when there's a hit it can be hard to know what the columns are. An option to also print the first line of a file (either always, or only if that file had a match to the pattern) in addition to any hits would be nice. Thanks, Dan --000000000000193f5b06035f4661 Content-Type: text/html; charset="UTF-8"
--000000000000193f5b06035f4661-- From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 12:57:19 2023 Received: (at 65416) by debbugs.gnu.org; 21 Aug 2023 16:57:19 +0000 Received: from localhost ([127.0.0.1]:57616 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY8DP-0002zS-E8 for submit@debbugs.gnu.org; Mon, 21 Aug 2023 12:57:19 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:47286 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY8DN-0002zK-Nr for 65416@debbugs.gnu.org; Mon, 21 Aug 2023 12:57:18 -0400 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 37LGvENK031461 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 21 Aug 2023 10:57:14 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 37LGvETQ031460; Mon, 21 Aug 2023 10:57:14 -0600 From: arnold@skeeve.com Message-Id: <202308211657.37LGvETQ031460@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Mon, 21 Aug 2023 10:57:14 -0600 To: ddgreen@gmail.com, 65416@debbugs.gnu.org Subject: Re: bug#65416: Feature request: include first line of file in output References: In-Reply-To: User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Daniel Green wrote: > I'm frequently searching CSV files with 20-30 columns, and when there's a > hit it can be hard to know what the columns are. 
An option to also print
> the first line of a file (either always, or only if that file had a match
> to the pattern) in addition to any hits would be nice.
>
> Thanks,
> Dan

It sounds like awk would be a better tool:

        awk 'FNR == 1 || /pattern/' files ...

should do the trick.

HTH,

Arnold

From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 14:37:26 2023 Received: (at 65416) by debbugs.gnu.org; 21 Aug 2023 18:37:26 +0000 Received: from localhost ([127.0.0.1]:57786 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY9mI-0008Bv-2N for submit@debbugs.gnu.org; Mon, 21 Aug 2023 14:37:26 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:48482 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY9mF-0008Bk-Gg for 65416@debbugs.gnu.org; Mon, 21 Aug 2023 14:37:24 -0400 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 37LIbDEK013507 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 21 Aug 2023 12:37:13 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 37LIbC7a013506; Mon, 21 Aug 2023 12:37:12 -0600 From: arnold@skeeve.com Message-Id: <202308211837.37LIbC7a013506@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Mon, 21 Aug 2023 12:37:12 -0600 To: ddgreen@gmail.com, arnold@skeeve.com Subject: Re: bug#65416: Feature request: include first line of file in output References: <202308211657.37LGvETQ031460@freefriends.org> In-Reply-To: User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive:
List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Gawk 4.0.2 is 11 years old. Try timing the current version,
I'll bet it's faster. And it solves your problem NOW,
instead of waiting for a feature that the grep developers
aren't likely to add.

My two cents of course.

Arnold

Daniel Green wrote:

> That works, as well as the Perl version I've been using:
>
>     perl -ne 'print if ($. == 1 || /pattern/)'
>
> But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> show the problem:
>
>     grep (v2.20):   ~1.15s
>     perl (v5.36.1): ~4.48s
>     awk (v4.0.2):   ~10.81s
>
> Admittedly grep is just searching in those timings, but I suspect it could
> accomplish the full task with a minimal decrease in speed.
>
> Dan
>
> On Mon, Aug 21, 2023 at 12:57 PM wrote:
>
> > Daniel Green wrote:
> >
> > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > hit it can be hard to know what the columns are. An option to also print
> > > the first line of a file (either always, or only if that file had a match
> > > to the pattern) in addition to any hits would be nice.
> > >
> > > Thanks,
> > > Dan
> >
> > It sounds like awk would be a better tool:
> >
> >         awk 'FNR == 1 || /pattern/' files ...
> >
> > should do the trick.
> > > > HTH, > > > > Arnold > > From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 14:43:50 2023 Received: (at 65416) by debbugs.gnu.org; 21 Aug 2023 18:43:50 +0000 Received: from localhost ([127.0.0.1]:57801 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY9sT-0008LS-TM for submit@debbugs.gnu.org; Mon, 21 Aug 2023 14:43:50 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:49720) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY9sQ-0008LC-Ll for 65416@debbugs.gnu.org; Mon, 21 Aug 2023 14:43:48 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 56CAA3C011BDE; Mon, 21 Aug 2023 11:43:38 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id w8t009jgZ_xc; Mon, 21 Aug 2023 11:43:38 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 100F33C011BDF; Mon, 21 Aug 2023 11:43:38 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 100F33C011BDF DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1692643418; bh=9eq2jNM7ZFJr6VAeJFI03Zqgafb2IIC6C9E/0x9eDMs=; h=Message-ID:Date:MIME-Version:To:From; b=HWg98dlmbWTxcU4xF0fbZcT4vu0hiLEV8lIooc8Cnsm3wdSRrjd/kjN6N349Hkphb bztpxFIS5/vwIshGqE5x/hKKyglm37ijI9TSu0OR7He8+UwJy84ogwjxK0CzBSHZXp 4J9lWRZ5p6tQKtGkiWzLhU8H/YXMVjiqn1EQ6sJPscYq3m1dHgv39FA4yV9/xZ1gG3 yaiDCPyoAmrLM64Am3+8fsm6ZY0BAkHV2hDLQepXLdZR7np6schSFIgQVqkRnhBsmQ SdDELjUexfhBGKn5osxDVkCcPfIgOnrCOfxYvOD0vLICp8I/J+XLfTT8uoheGJ0B8I dM8uL9EFNlGlg== X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id c6kkMxbbkG65; Mon, 21 Aug 2023 11:43:37 -0700 (PDT) Received: from [192.168.0.57] (ip72-206-2-24.fv.ks.cox.net [72.206.2.24]) by 
mail.cs.ucla.edu (Postfix) with ESMTPSA id B366B3C011BDE; Mon, 21 Aug 2023 11:43:37 -0700 (PDT) Message-ID: Date: Mon, 21 Aug 2023 13:43:36 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: bug#65416: Feature request: include first line of file in output Content-Language: en-US To: arnold@skeeve.com, ddgreen@gmail.com References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> From: Paul Eggert In-Reply-To: <202308211837.37LIbC7a013506@freefriends.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 65416 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--)

On 8/21/23 13:37, arnold@skeeve.com wrote:
> it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.

Yes, Grep already has a lot of features that in hindsight would have been better addressed by saying "Use Awk".
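
The original request also mentioned printing the header only for files that contain a match, which the one-line awk program does not do (it prints line 1 of every file). A minimal sketch of one way to get that behavior while leaving the searching to grep is below. It assumes POSIX `head` and `tail` are available alongside grep; the function name `hgrep` is hypothetical and is not part of grep or of anything proposed in this thread.

```shell
# hgrep: hypothetical helper (not a grep feature).
# Usage: hgrep PATTERN FILE...
# Prints the first line of each FILE that contains a match for PATTERN,
# followed by the matching lines of that file.
hgrep() {
    pattern=$1
    shift
    for file in "$@"; do
        # grep -q exits at the first match, so this check is cheap
        # for files that do match.
        if grep -q -e "$pattern" -- "$file"; then
            # Print the header line.
            head -n 1 -- "$file"
            # Search from line 2 onward so a header that itself matches
            # is not printed twice.
            tail -n +2 -- "$file" | grep -e "$pattern"
        fi
    done
}
```

With a small file such as `c.csv` containing `USER,TIP`, `john,0`, `jane,10` and `carenas,100` on separate lines, `hgrep carenas c.csv` would print the header followed by the `carenas,100` line. Each matching file is opened more than once, so this is likely slower than a single grep pass; no timings were taken for it.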
From debbugs-submit-bounces@debbugs.gnu.org Mon Aug 21 15:10:47 2023 Received: (at 65416) by debbugs.gnu.org; 21 Aug 2023 19:10:47 +0000 Received: from localhost ([127.0.0.1]:57848 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYAIY-0003IJ-Pc for submit@debbugs.gnu.org; Mon, 21 Aug 2023 15:10:47 -0400 Received: from mail-oo1-xc30.google.com ([2607:f8b0:4864:20::c30]:61789) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qY9FU-0007HF-Mr for 65416@debbugs.gnu.org; Mon, 21 Aug 2023 14:03:33 -0400 Received: by mail-oo1-xc30.google.com with SMTP id 006d021491bc7-570c51530e5so1141028eaf.3 for <65416@debbugs.gnu.org>; Mon, 21 Aug 2023 11:03:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692641004; x=1693245804; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=xoy/AB262HHv1SkjonpcOxPhEirPMwSGaD00pUgNL8g=; b=B4632musLTl8zGP2WutawsYDrEABHQYdXr+QwQgUJY7E5DprkF28sgccK5ahSSQcfh XnsQyv6kWank5/uyNDfCvUomCx5DQ/N6s9GbgUJgGBv7htY5KBxGMmHguSr8kI7UIENe BJuOcNZ0oVcI/dUZIr+gmm2GANIDqfMS9nzvlRoGdEfqIPEfTyKCgjXYQzxyBIVMWqLm 2xUlMv3kTy8Bj0XVjHTZVwLr+N9kwz4AYCvoqoF2jq4ZWIhZ4zYiFp3AalmkCAMs0SUZ UM2h1/wl9g5pvE4doidHcYSVlxRa20SdXXS3V+H6MfQOKcsN49+CI05EiLoJ9vcBlNoy OQ3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692641004; x=1693245804; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xoy/AB262HHv1SkjonpcOxPhEirPMwSGaD00pUgNL8g=; b=VpwHoo7rtD/ujtYv4+TFjhlfeQyPfKlmz2ZJpJhW0sg2sCyOPjy/CgXj6ls+X3jt2s 9pwFfyZkH2ksO8z0MOOmrNNVEoLQ9ycPSto1CNA74T0wS2WoaNBwQgr43hdUx+KfdzvK txILU4d51Z549K9tjAB8ByCTqoWUpkfaV992/EUHrJxP9u3PpOxNox1aqDwz0rrr/xmW jlUfZDTleZlyFUpyqu7wlnXfKnWU7sR3cTcc9sSeGUbFU2f3wHK61fopSdICyNBotvzc kVrkH+W/tNNDo+SkJf6Lq+9IVvrgR4thaoy4V/2upvWR3S81YclvO4H0XtEom/i82cGN +ZEw== 
X-Gm-Message-State: AOJu0YyBuW7TTAN6p9oN8OsNnkeH6I4UCc3V20ntVbHaBNO4+UU8dKeA 7TUqNr3Mfa1Kz5d0CFqoaDq6QqI/Ej5xWSZzAenLRROl X-Google-Smtp-Source: AGHT+IHasE6/Vx3SDQstzGZRwtbA7ZHkeJ62XyNhp7OTuGlJ1dv89fyyDVFhPRNauFTvwwSo0lJV/pQqN8D5adr3NCw= X-Received: by 2002:a05:6358:99a1:b0:134:28d6:be7 with SMTP id j33-20020a05635899a100b0013428d60be7mr6598787rwb.9.1692641004427; Mon, 21 Aug 2023 11:03:24 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> In-Reply-To: <202308211657.37LGvETQ031460@freefriends.org> From: Daniel Green Date: Mon, 21 Aug 2023 14:03:11 -0400 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: arnold@skeeve.com Content-Type: multipart/alternative; boundary="000000000000cd1daf060372b15b" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 X-Mailman-Approved-At: Mon, 21 Aug 2023 15:10:45 -0400 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

--000000000000cd1daf060372b15b
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

That works, as well as the Perl version I've been using:

    perl -ne 'print if ($. == 1 || /pattern/)'

But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
show the problem:

    grep (v2.20):   ~1.15s
    perl (v5.36.1): ~4.48s
    awk (v4.0.2):   ~10.81s

Admittedly grep is just searching in those timings, but I suspect it could
accomplish the full task with a minimal decrease in speed.

Dan

On Mon, Aug 21, 2023 at 12:57 PM wrote:

> Daniel Green wrote:
>
> > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > hit it can be hard to know what the columns are.
An option to also print
> > the first line of a file (either always, or only if that file had a match
> > to the pattern) in addition to any hits would be nice.
> >
> > Thanks,
> > Dan
>
> It sounds like awk would be a better tool:
>
>         awk 'FNR == 1 || /pattern/' files ...
>
> should do the trick.
>
> HTH,
>
> Arnold
>

--000000000000cd1daf060372b15b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000cd1daf060372b15b-- From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 22 22:33:30 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 02:33:30 +0000 Received: from localhost ([127.0.0.1]:60736 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYdgY-00032F-0S for submit@debbugs.gnu.org; Tue, 22 Aug 2023 22:33:30 -0400 Received: from frenzy.freefriends.org ([198.99.81.75]:44552 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYdgW-000327-2a for 65416@debbugs.gnu.org; Tue, 22 Aug 2023 22:33:29 -0400 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 37N2XKiK018873 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 22 Aug 2023 20:33:20 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 37N2XJlZ018872; Tue, 22 Aug 2023 20:33:19 -0600 From: arnold@skeeve.com Message-Id: <202308230233.37N2XJlZ018872@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Tue, 22 Aug 2023 20:33:19 -0600 To: ddgreen@gmail.com, arnold@skeeve.com Subject: Re: bug#65416: Feature request: include first line of file in output References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> In-Reply-To: User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) I can't speak for the grep guys, but at least I was correct that current gawk is 
much faster than gawk 4.0.2.

Arnold

Daniel Green wrote:

> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
>     grep (v3.11):                                             ~0.60s
>     perl (v5.38.0):                                           ~3.21s
>     gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
>     gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
>
> If grep will never add this functionality I'll survive, it just seemed like
> it might not be too much work to implement, and would probably still be
> much faster than using awk/perl. I've never looked at the grep source code
> before, but could be tempted to try implementing it myself if there was any
> chance of the patch being accepted.
>
> Dan
>
> On Mon, Aug 21, 2023 at 2:37 PM wrote:
>
> > Gawk 4.0.2 is 11 years old. Try timing the current version,
> > I'll bet it's faster. And it solves your problem NOW,
> > instead of waiting for a feature that the grep developers
> > aren't likely to add.
> >
> > My two cents of course.
> >
> > Arnold
> >
> > Daniel Green wrote:
> >
> > > That works, as well as the Perl version I've been using:
> > >
> > >     perl -ne 'print if ($. == 1 || /pattern/)'
> > >
> > > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > > show the problem:
> > >
> > >     grep (v2.20):   ~1.15s
> > >     perl (v5.36.1): ~4.48s
> > >     awk (v4.0.2):   ~10.81s
> > >
> > > Admittedly grep is just searching in those timings, but I suspect it could
> > > accomplish the full task with a minimal decrease in speed.
> > >
> > > Dan
> > >
> > > On Mon, Aug 21, 2023 at 12:57 PM wrote:
> > >
> > > > Daniel Green wrote:
> > > >
> > > > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > > > hit it can be hard to know what the columns are. An option to also print
> > > > > the first line of a file (either always, or only if that file had a match
> > > > > to the pattern) in addition to any hits would be nice.
> > > > > > > > > > Thanks, > > > > > Dan > > > > > > > > It sounds like awk would be a better tool: > > > > > > > > awk 'FNR == 1 || /pattern/' files ... > > > > > > > > should do the trick. > > > > > > > > HTH, > > > > > > > > Arnold > > > > > > From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 02:20:41 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 06:20:41 +0000 Received: from localhost ([127.0.0.1]:60870 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYhEO-0003Lu-NV for submit@debbugs.gnu.org; Wed, 23 Aug 2023 02:20:40 -0400 Received: from mail-wr1-x430.google.com ([2a00:1450:4864:20::430]:45273) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYhEN-0003Le-V9 for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 02:20:40 -0400 Received: by mail-wr1-x430.google.com with SMTP id ffacd0b85a97d-31aeef88a55so3098712f8f.2 for <65416@debbugs.gnu.org>; Tue, 22 Aug 2023 23:20:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692771631; x=1693376431; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=5fGpQ0CP6EbhRalm9bYrXMf5p9zSHVEMXdFym9o8o2Q=; b=Wsv2YYaiVkbk/0HZpgAH9tRmmQKpkbzLmrtcpIoqSvMzf1nVLvn/a1gxclB58ZWQfN qjp9Is1MYWna36y+E80EqkKDM6TpHPwa7hZK8j78TsdHq5fRjbPt7jZKotLk+lKXrzxi WzKF2SwbgLEqgmJtt7GRIUX46Zd4Py8EmE80ym1V0ew4fOqRHqic5xArc7okB3cRhx8y PP5qbKp3dIes4lRoqeZkQh3Jpm7dQylu30yE2f/KXTvmeRg20oSMSZwKFedMPwtTBRYF NgIyVguhHbtbP8QjBdMg7Nle3smYaPIb0B6TT5E3NJ+Z1GPRDMHqKitjTDHT4ashRl8E pLLQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692771631; x=1693376431; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=5fGpQ0CP6EbhRalm9bYrXMf5p9zSHVEMXdFym9o8o2Q=; b=bfv1aOfkDXJ5h4IAMyK4NA3xjoNyTmm7teqKRcncr3Ga3DpztrP+6MTh9lqb9V8AT7 
b/ya3E51aY6G6sa4IZA23afrmGFhYqaEZUJGCyfw2D8lDW36ey3n9u3RdRaQPspdpfRi /cuGLzbXetqVSJQ9d3W+LWl2KBUNKaKJHs9dSHwxp4oJPI+cOIRJBj5f4X3S8kw3J6Jr A1/j12U22LzyqU7u6GxFFVXQRk0XxOuGTD2+5L4qP+fkIJurdfSRg4d9K0HrDPIz1ffH Y67ZGE1x/iLNs3l7xdBWzURGbu9IkKY02Mju98NijK0L1GbnTpFPYB9ZqLX4SMKGlsdZ 3ZKA== X-Gm-Message-State: AOJu0YxOMMf5pX7JNlKJfQ8rIzjj6p0QH3VEfMr47ygTUiy2RpOihAgy mk5/X6ehwDVkbFPL2E7+5sd8/MyTic5HIpTTSew= X-Google-Smtp-Source: AGHT+IE7lKTmUjG8BHOvIGP5+Wgnyp3fMj7izpQacsZw6dg6qKqumvUkgGSklqRxIv/FlHHzkwZFC5SlM9Lq5m4+E1s= X-Received: by 2002:a5d:5225:0:b0:319:7b59:77cb with SMTP id i5-20020a5d5225000000b003197b5977cbmr7864348wra.58.1692771630751; Tue, 22 Aug 2023 23:20:30 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <202308230233.37N2XJlZ018872@freefriends.org> In-Reply-To: <202308230233.37N2XJlZ018872@freefriends.org> From: Carlo Arenas Date: Tue, 22 Aug 2023 23:20:19 -0700 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: arnold@skeeve.com Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: ddgreen@gmail.com, 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) > Daniel Green wrote: > > > I've never looked at the grep source code > > before, but could be tempted to try implementing it myself if there was any > > chance of the path being accepted. 
A slightly more complicated Perl script would be my first choice if coding is the solution, but grep already has a feature that could be used to provide a solution, as shown by the following scriptlet (including a scaled-down data file):

$ cat > c.csv
USER,TIP
john,0
jane,10
carenas,100
$ ( grep -m1 USER && grep carenas ) < c.csv
USER,TIP
carenas,100

Carlo

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 02:55:10 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 06:55:10 +0000 Received: from localhost ([127.0.0.1]:60924 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYhll-0004V3-S0 for submit@debbugs.gnu.org; Wed, 23 Aug 2023 02:55:10 -0400 Received: from mail-vk1-xa2c.google.com ([2607:f8b0:4864:20::a2c]:47531) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYdMT-0002Vo-Uo for 65416@debbugs.gnu.org; Tue, 22 Aug 2023 22:12:46 -0400 Received: by mail-vk1-xa2c.google.com with SMTP id 71dfb90a1353d-48d0e739e32so1120571e0c.3 for <65416@debbugs.gnu.org>; Tue, 22 Aug 2023 19:12:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692756757; x=1693361557; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Grulhqsjx4PlksXa7w6D7wOpgkQ4aGdOqCnotV/Jd58=; b=fMfC9jyWu9IApcPeeYULE6KHC1MmPgkeFtAgpfDnzBtTLMg4y7PtaW+311pbp1Bn/a 6quBb0/20YpXCEd3fp6aRKvKYV74K3FKBse85WkYSnfbV/McfRg3bpgf/wXRhUalwO2Y 8yaqoisP/rGujldPAA0Kf4L80amkeSBe6rgIXYeqaIti2lcNPsRU0bWmt6SPfHB4Bwug bPPuo/U3C5Al8DJHV3ZEUqWKYbj04hIv9B/HkF7R6hT9pOXQwmKKVgZmpNJh9I+OqRrQ oRs8WZdwiTCJ9XLtTs/f6/khxbbvXfNo336AGzutClCick6jbhuKR1+z5pWiTKfivmXI Za4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692756757; x=1693361557; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to;
bh=Grulhqsjx4PlksXa7w6D7wOpgkQ4aGdOqCnotV/Jd58=; b=CtmXY+zcxTXtCNRetGRcui156dYdRwfbApEjy8LbI6rKj2gWEegZVFZ3BOl1tqIyA+ 3Wd3GX0xIdpespXcRSzcFtowO+YDFf06ppK0R3ifA4AOJjfqncCxgzEVxajNFh52/s2i lI3jFgC1KODCalbxDJxf4xv0vVEXMDEkXRLsMKv8a9ja+vCsqgBsThytmN3e/YdKEW8o KAmHpAR2CtqCEhUoNmw3Qltjv1v6SppoL6u6cduV7g4dRc8SAP6BiQoXIxoYqFS+cH0x Tb0OkAH0qJQ27XRpbupiYcJ7g/C2tPFpEq3jaZFu6OR6Uh+6MhXdl4VyGeACFEW9r+PE DIxA== X-Gm-Message-State: AOJu0YxpcAZDlAkH1OJzsEONlK1USlHtZWJ0vRBO1Vij/MHmxJxr8rCu SYIA9c72uZUmbxj5qP14kUY+VOlBD4Tj7SQiy5xERIy+ X-Google-Smtp-Source: AGHT+IFZPo46M9uZ/InBZPO7I24M8Ez6sGNObyt4PSNCCT9O5Y+ZbpCEVMTdf+xh5ds5ImGh05i1FQd4tgAJu3knj84= X-Received: by 2002:a1f:4e43:0:b0:48d:b79:d5d1 with SMTP id c64-20020a1f4e43000000b0048d0b79d5d1mr7253167vkb.1.1692756756752; Tue, 22 Aug 2023 19:12:36 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> In-Reply-To: <202308211837.37LIbC7a013506@freefriends.org> From: Daniel Green Date: Tue, 22 Aug 2023 22:12:25 -0400 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: arnold@skeeve.com Content-Type: multipart/alternative; boundary="0000000000002d716606038da59c" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 X-Mailman-Approved-At: Wed, 23 Aug 2023 02:55:06 -0400 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --0000000000002d716606038da59c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I don't have access to a newer gawk where I did the initial timings, but I ran an almost identical test on my home machine. 

    grep (v3.11):                                             ~0.60s
    perl (v5.38.0):                                           ~3.21s
    gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
    gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s

If grep will never add this functionality, I'll survive; it just seemed like
it might not be too much work to implement, and would probably still be
much faster than using awk/perl. I've never looked at the grep source code
before, but could be tempted to try implementing it myself if there was any
chance of the patch being accepted.

Dan

On Mon, Aug 21, 2023 at 2:37 PM wrote:

> Gawk 4.0.2 is 11 years old. Try timing the current version,
> I'll bet it's faster. And it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.
>
> My two cents of course.
>
> Arnold
>
> Daniel Green wrote:
>
> > That works, as well as the Perl version I've been using:
> >
> >     perl -ne 'print if ($. == 1 || /pattern/)'
> >
> > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > show the problem:
> >
> >     grep (v2.20):   ~1.15s
> >     perl (v5.36.1): ~4.48s
> >     awk (v4.0.2):   ~10.81s
> >
> > Admittedly grep is just searching in those timings, but I suspect it could
> > accomplish the full task with a minimal decrease in speed.
> >
> > Dan
> >
> > On Mon, Aug 21, 2023 at 12:57 PM wrote:
> >
> > > Daniel Green wrote:
> > >
> > > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > > hit it can be hard to know what the columns are. An option to also print
> > > > the first line of a file (either always, or only if that file had a match
> > > > to the pattern) in addition to any hits would be nice.
> > > >
> > > > Thanks,
> > > > Dan
> > >
> > > It sounds like awk would be a better tool:
> > >
> > >         awk 'FNR == 1 || /pattern/' files ...
> > >
> > > should do the trick.
> > > > > > HTH, > > > > > > Arnold > > > > --0000000000002d716606038da59c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--0000000000002d716606038da59c-- From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 04:21:41 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 08:21:41 +0000 Received: from localhost ([127.0.0.1]:60979 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYj7V-0000r3-3c for submit@debbugs.gnu.org; Wed, 23 Aug 2023 04:21:41 -0400 Received: from wout3-smtp.messagingengine.com ([64.147.123.19]:37355) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYj7S-0000qn-Jn for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 04:21:39 -0400 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id 4D3EF320090C; Wed, 23 Aug 2023 04:21:27 -0400 (EDT) Received: from imap51 ([10.202.2.101]) by compute6.internal (MEProxy); Wed, 23 Aug 2023 04:21:27 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1692778886; x=1692865286; bh=qfQUDvrmqi0XW rNQoz8fIU2SLxZD5gMZOAgxD5nVjrA=; b=2dfryQzus+pFnaNRlGdD6kwF9bi1L HUZOGWd5nBUAtsOl9rp11De7i6ojlIYndRPPKUxTP5pv48+pg6UvsCwYKswahIXM 7N4AGroyOE5MQD+vNOhjtGQAMe7m+UHVb4pdSp8Q084Kl1M+NkfzZG3dIDCl/8HW E2fmQzkxPBzAe82/kHLADfV5CPbznuz7i5pwSL7siEgzZZC5Yy/EH+af56Bb7MUC KUoVaGQ9CEoGHtlop4EPi2f051MEz/QAcvFkkMsExYNG2Lr8wD8xLAfUqgb9J9IS 1vNoD878r/yNdU/9eU32iBok+bauTY8Egqx8nm+CVbAOUAOOhMXfyZOHQ== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedruddvgedgtddvucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepofgfggfkjghffffhvfevufgtsehttdertderredtnecuhfhrohhmpedfrfgr uhhlucflrggtkhhsohhnfdcuoehpjhesuhhsrgdrnhgvtheqnecuggftrfgrthhtvghrnh 
epueehvdelvdelvedtvdejlefhveehgeeijeeuvedtfeeltdfgueffveefveehgeegnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhjsehush grrdhnvght X-ME-Proxy: Feedback-ID: i047841af:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id 646A6B60089; Wed, 23 Aug 2023 04:21:26 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.9.0-alpha0-647-g545049cfe6-fm-20230814.001-g545049cf Mime-Version: 1.0 Message-Id: <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> In-Reply-To: References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> Date: Wed, 23 Aug 2023 03:20:26 -0500 From: "Paul Jackson" To: "Daniel Green" , arnold@skeeve.com Subject: Re: bug#65416: Feature request: include first line of file in output Content-Type: text/plain X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

sed and awk can also do this (1st line plus any matching lines).

Following transcript from zsh session on my fast Ryzen:

$ <<-'@@' time sh -c "grep -m1 USER && grep carenas"
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
sh -c "grep -m1 USER && grep carenas"  0.00s user 0.00s system 93% cpu 0.003 total

$ <<-'@@' time sed -n -e 1p -e /carenas/p
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
sed -n -e 1p -e /carenas/p  0.00s user 0.00s system 80% cpu 0.001 total

$ <<-'@@' time awk 'NR == 1 || /carenas/'
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
awk 'NR == 1 || /carenas/'  0.00s user 0.00s system 88% cpu 0.002 total

As I expected, sed is fastest, grep next, and awk slowest of the three,
but the 1, 2, and 3 millisecond totals are within the margin of test error.
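
Since a four-line here-document finishes too quickly to compare, the three commands in the transcript above could be run against a larger file instead. What follows is only a minimal sketch under stated assumptions: the temporary file, the generated `userN` rows, and the helper names `with_grep`, `with_sed`, and `with_awk` are made up for illustration and do not come from this thread. Each helper reads the file by name, and the head+grep form stands in for the `grep -m1 ... && grep ...` pair.

```sh
#!/bin/sh
# Hypothetical sample data: one header line, 200000 generated rows,
# and a single row that matches the pattern.
sample=$(mktemp)
{
  echo 'USER,TIP'
  awk 'BEGIN { for (i = 1; i <= 200000; i++) print "user" i "," (i % 100) }'
  echo 'carenas,100'
} > "$sample"

# Header line plus matching lines, three ways (helper names are illustrative).
with_grep() { head -n 1 "$1" && grep carenas "$1"; }
with_sed()  { sed -n -e 1p -e /carenas/p "$1"; }
with_awk()  { awk 'NR == 1 || /carenas/' "$1"; }

out_grep=$(with_grep "$sample")
out_sed=$(with_sed "$sample")
out_awk=$(with_awk "$sample")

# To time one variant in bash or zsh, wrap the call in the time keyword:
#   time with_sed "$sample" > /dev/null

if [ "$out_grep" = "$out_sed" ] && [ "$out_sed" = "$out_awk" ]; then
  echo "outputs agree"
else
  echo "outputs differ"
fi

rm -f "$sample"
```

One caveat on the sketch: if the header line itself matched the pattern, the head+grep and sed forms would print it twice, while the awk form would print it once.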
-- Paul Jackson pj@usa.net From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 04:23:55 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 08:23:55 +0000 Received: from localhost ([127.0.0.1]:60990 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYj9f-0000ui-F9 for submit@debbugs.gnu.org; Wed, 23 Aug 2023 04:23:55 -0400 Received: from wout3-smtp.messagingengine.com ([64.147.123.19]:59793) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYj9d-0000uT-Da for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 04:23:53 -0400 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id 64E74320094D; Wed, 23 Aug 2023 04:23:44 -0400 (EDT) Received: from imap51 ([10.202.2.101]) by compute6.internal (MEProxy); Wed, 23 Aug 2023 04:23:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1692779024; x=1692865424; bh=V1v06+28mR/SJ bl/AsOaTaQFw8tzjstI1H6hmRDXGoA=; b=c6ztj9H99Ei+JTFZI5SOoGCBifhbF Qcr4/j5cVLzEId53j59pc1b2ZLHQ3gH0HPTUasfzbAEhIyealuq3jyT1AQmQGXeF Y5qy3ngSUzsnrIxSKcnfumMbCbIell0/245AenpnoGv0+W8cA9NizPzIY/n3W6s5 sXcsWuL8yC43Yxdp0WMjC9T0j5hzPaDiRr6pCs4yEUwTAqeIT06HqB3uY8kpgBei +e7v5KBSkVKbHuvagkf14g+ZEXJR5TQprBTIXDU8v9qRo68QpwyLwLjXzdCL0EaT c9OMDiommGpBfSbXDUXOsIa0uVGLH8zTTo+XTVhbF3WVxXBhxAMubQOsA== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedruddvgedgtdefucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepofgfggfkjghffffhvfevufgtsehttdertderredtnecuhfhrohhmpedfrfgr uhhlucflrggtkhhsohhnfdcuoehpjhesuhhsrgdrnhgvtheqnecuggftrfgrthhtvghrnh 
epueehvdelvdelvedtvdejlefhveehgeeijeeuvedtfeeltdfgueffveefveehgeegnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhjsehush grrdhnvght X-ME-Proxy: Feedback-ID: i047841af:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id BCD44B60089; Wed, 23 Aug 2023 04:23:43 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.9.0-alpha0-647-g545049cfe6-fm-20230814.001-g545049cf Mime-Version: 1.0 Message-Id: <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> In-Reply-To: <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> Date: Wed, 23 Aug 2023 03:23:01 -0500 From: "Paul Jackson" To: "Daniel Green" , arnold@skeeve.com Subject: Re: bug#65416: Feature request: include first line of file in output Content-Type: text/plain X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) oops - grep slower than awk, not the other way around, on these _highly_ inconclusive timings. 
-- Paul Jackson pj@usa.net From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 10:25:09 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 14:25:09 +0000 Received: from localhost ([127.0.0.1]:34857 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYonD-0005Rr-Q6 for submit@debbugs.gnu.org; Wed, 23 Aug 2023 10:25:08 -0400 Received: from mail-vs1-xe29.google.com ([2607:f8b0:4864:20::e29]:50294) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYoih-0005JO-Nr for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 10:20:30 -0400 Received: by mail-vs1-xe29.google.com with SMTP id ada2fe7eead31-44d4c3fa6a6so1438789137.0 for <65416@debbugs.gnu.org>; Wed, 23 Aug 2023 07:20:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692800418; x=1693405218; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=AxlfgewlXc4Y9MLoJgrV3ZGiQaKr1DP6LmBUEmseid8=; b=qjPmlgdsTHWQlFT03YkflWuld8DvHanBJq82Hx/15EVNstlFRBFClxtarCAsYqNW4k 3zUDB0s9gVeo/iMg+pWqyfNov4HUke2iKNlnZk61hWprW2tklkkl1uWv3hzRgIJfBz70 zq7KJyNwuZvY1/dVUGUOikIjPPvpv4MmzgaZN3ktxtu8Mku5sIgfcMxPZ8yWCpB5kLKm UfXJ+nEICWZNtmju/xfQG6sIwlrhHmux9TEJeHZm5ZJMrhxFBFWnJmdFU6NjTL7DCl2S rO8Yno3fmoivashbNSnPTTQUayuUsIvnFZn8cGehZjtU9PSVM50GyGjVop8f6oN/V6Tc RYdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692800418; x=1693405218; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=AxlfgewlXc4Y9MLoJgrV3ZGiQaKr1DP6LmBUEmseid8=; b=P4hQCzEAV1IFXF2Wp/Zs9HBiGgO3j/3QGfOwl7AQsYy2XZxumo7sx4cAKe0p5gSKOs Rkd/UKzg9A0BvpKA1Sw51Vf7mHGg0dU6RculJUb9lp61eO4tqEFD0jTIW5mClqBWL32b NIE8pBrZDcvHHSoqG4BIxw4+4561CoY58RWVg70ijW4j+QJ6vpHfzpcv2vkFRXfk7Nvf 8Dk+JFS2xqckpxNVW5S/LjZ7GdEdJwwRHYi5HR2ja8e+10QSMHCPugvpHjOm5AvkRFXm 
+Mf4WKyUXccY+AGsWrm8LIx+xANDMCN58AZ17tEVpQ3WbBlreSOF0e2PfTKThJJgwLMB pddw== X-Gm-Message-State: AOJu0YxOhqT7oXg3WDAKAvGHJBYmnuiSfTn/RAKDFPM1UNU22WXbuW3f 0vpsJos0SCgKya2CjOe61UoIOly4mh8uDXge6/73MwUFDY8= X-Google-Smtp-Source: AGHT+IGIhvieiJDBnk0LJ6H6MpEiX1r/Z4XSfmZNcofBVZrK/WaT83bnIhR3oMrfbVGaWh+Z2ndGcSjDFoB6gXNHSU4= X-Received: by 2002:a05:6102:503:b0:44d:4e57:5c4f with SMTP id l3-20020a056102050300b0044d4e575c4fmr9112778vsa.33.1692800418129; Wed, 23 Aug 2023 07:20:18 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> In-Reply-To: <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> From: Daniel Green Date: Wed, 23 Aug 2023 10:20:05 -0400 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: Paul Jackson Content-Type: multipart/alternative; boundary="000000000000992b8e060397cfd2" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 X-Mailman-Approved-At: Wed, 23 Aug 2023 10:25:04 -0400 Cc: arnold@skeeve.com, 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --000000000000992b8e060397cfd2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On the original test machine I timed the sed solution, as well as `(grep -m1 . 'file' && grep 'pattern' 'file')` and `(mapfile -n1 <'file' && echo $MAPFILE[0] && grep 'pattern' 'file')` and `(head -n1 'file' && grep 'pattern' 'file')`. Total table of speeds. 

    grep (v2.20):     ~1.15s
    perl (v5.36.1):   ~4.48s
     awk (v4.0.2):   ~10.81s
     sed (v4.2.2):    ~8.15s
     grep && grep:    ~1.15s
  mapfile && grep:    ~1.15s
     head && grep:    ~1.15s

I can write a shell function to make the head+grep version a little easier to
use in practice (i.e., loop over the list of files passed, calling head+grep on
each one instead of calling head on the list and then grep on the list), but I
believe it would be difficult to change any options given to grep. I still
think a new grep option would give the best combination of speed, the output I
have in mind, and ease of combining with whatever other grep options are in
use. But if there's no interest then this feature request can be closed.

Dan

On Wed, Aug 23, 2023 at 4:23 AM Paul Jackson <pj@usa.net> wrote:

> oops - grep slower than awk, not the other way around,
> on these _highly_ inconclusive timings.
>
> --
> Paul Jackson
> pj@usa.net
>

--000000000000992b8e060397cfd2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--000000000000992b8e060397cfd2-- From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 14:04:15 2023 Received: (at control) by debbugs.gnu.org; 23 Aug 2023 18:04:15 +0000 Received: from localhost ([127.0.0.1]:35139 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYsDG-0003Qu-Q0 for submit@debbugs.gnu.org; Wed, 23 Aug 2023 14:04:15 -0400 Received: from mail.cs.ucla.edu ([131.179.128.66]:33078) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYsDE-0003Qf-Vi for control@debbugs.gnu.org; Wed, 23 Aug 2023 14:04:13 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 94DF03C011BD7 for ; Wed, 23 Aug 2023 11:04:02 -0700 (PDT) Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Fh3PxF0vwiYX for ; Wed, 23 Aug 2023 11:04:02 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by mail.cs.ucla.edu (Postfix) with ESMTP id 4B0283C011BE3 for ; Wed, 23 Aug 2023 11:04:02 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.cs.ucla.edu 4B0283C011BE3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.ucla.edu; s=9D0B346E-2AEB-11ED-9476-E14B719DCE6C; t=1692813842; bh=8FBlI8ZP45nvz7v06kkftShF+bZeZ5zuPWIWwhet6Jc=; h=Message-ID:Date:MIME-Version:To:From; b=mUGx/AIUbpuuC9IOHGn57eFGiuDmQ/ctzYElxIc4U51yhrbui8y9XtE9XcH+byg35 pZhOHsruy4s+xCqvw65+vZKzAyo/pYWzxhkWu6F3RfAfFY0ylN2FuHcKpBJH6UEkXs BTwtTAH8tu4lUyrnUt7FSXXK2My/3rKvIVjsL0/YzPcUHj+9iF6YbwvJc15PyywX3E 0CEZEQlTUN8sv/+Cy6nTCbPIsS2msg2ZhEiFTjfTwiGz7h/GhwvHvlAvSWfnRobrBA oDHsfrsuN4VihQ7jiM7FekifmXmmLMl5SbYajHXEYfSXVWx/m7TatoqFUqXwcDV1Pa j+Xd6HsJe1soQ== X-Virus-Scanned: amavisd-new at mail.cs.ucla.edu Received: from mail.cs.ucla.edu ([127.0.0.1]) by localhost (mail.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id qKF5200aYVnd for ; Wed, 23 Aug 2023 11:04:02 -0700 (PDT) Received: from [192.168.2.17] 
(c-73-242-198-106.hsd1.nm.comcast.net [73.242.198.106]) by mail.cs.ucla.edu (Postfix) with ESMTPSA id 146143C011BD7 for ; Wed, 23 Aug 2023 11:04:02 -0700 (PDT) Message-ID: <557db0e1-578b-fd04-484b-29c7871704fb@cs.ucla.edu> Date: Wed, 23 Aug 2023 12:04:01 -0600 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Content-Language: en-US To: control@debbugs.gnu.org From: Paul Eggert Subject: close 65416 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) close 65416 From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 23 18:22:17 2023 Received: (at 65416) by debbugs.gnu.org; 23 Aug 2023 22:22:17 +0000 Received: from localhost ([127.0.0.1]:35299 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYwEz-0003d9-BY for submit@debbugs.gnu.org; Wed, 23 Aug 2023 18:22:17 -0400 Received: from wout4-smtp.messagingengine.com ([64.147.123.20]:57077) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYwEx-0003cu-7o for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 18:22:16 -0400 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id 30BBA32006F2; Wed, 23 Aug 2023 18:22:05 -0400 (EDT) Received: from imap51 ([10.202.2.101]) by compute6.internal (MEProxy); Wed, 23 Aug 2023 18:22:05 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject 
:subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1692829324; x=1692915724; bh=ZcQRwim4/vnZK xjpnYy0VnEezgx1SpdgWbPxhip2Ypw=; b=sT13S5NaPcCXOkLsXxvMovvvAo4m0 6r6Q/UutSUJZ6IbSCSPD0Kj1hh32Vu189RvJTzG2I+pBUWm3cFyFddjNC52BubGl AlFn/UITsF0iO3wFYBF7fi9kk6Pj1Bk6PeA+128FmNqo/kFComiJr2KdvcysVGzi NegRM4XSy/kx7pTjJCiB75u95WyGj5Rozj6NTpinSEQ0dCmmMQ8LQ9zZ4GD4MZ6c oObvRfzC3IIac60iIj4scxAF6He3HCKApGfrwc7FA63OBxoPEplHU/C64St6lph1 NMeHSskHGEwE0WronD9j3/770a40zY7SN2CKwkqaN0+R4+KzAdRW7cXZg== X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedviedruddvhedgtdelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepofgfggfkjghffffhvfevufgtsegrtderreerredtnecuhfhrohhmpedfrfgr uhhlucflrggtkhhsohhnfdcuoehpjhesuhhsrgdrnhgvtheqnecuggftrfgrthhtvghrnh epudehkefgleekgffgjeehkeejtdevgffffeeggfefvdehteejffefgffhfeehjeehnecu vehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepphhjsehush grrdhnvght X-ME-Proxy: Feedback-ID: i047841af:Fastmail Received: by mailuser.nyi.internal (Postfix, from userid 501) id 39DD5B60089; Wed, 23 Aug 2023 18:22:04 -0400 (EDT) X-Mailer: MessagingEngine.com Webmail Interface User-Agent: Cyrus-JMAP/3.9.0-alpha0-647-g545049cfe6-fm-20230814.001-g545049cf Mime-Version: 1.0 Message-Id: <528dc225-62dc-432b-b969-0ad8d6106a54@app.fastmail.com> In-Reply-To: References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> Date: Wed, 23 Aug 2023 17:21:37 -0500 From: "Paul Jackson" To: "Daniel Green" Subject: Re: bug#65416: Feature request: include first line of file in output Content-Type: multipart/alternative; boundary=34e72f1795004c85a268bf17cf983043 X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: arnold@skeeve.com, 65416@debbugs.gnu.org X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --34e72f1795004c85a268bf17cf983043 Content-Type: text/plain

Ah - those times show another reason why one might
be motivated to keep requesting more options be added
to grep.

From those timings, and from looking at the source, it's clear
that the FSF rewrote grep from scratch, sometime back in the
late 1980's or early 1990's, to have fast reads, whereas sed is
still using stdio fread in a classical manner, which is a painfully
slower double copy solution.

If sed were still a widely used command in performance sensitive
applications, it should have some serious TLC applied to its
performance.

However, since the pool of Jurassic Park Dinosaurs who can (and
perhaps do) compose sed commands in their sleep is a nearly
extinct breed, I see no sufficient interest in accepting such a rewrite
of sed, even if it showed up as a proposed checkin.

That grep can even seriously beat perl for such raw read performance
is impressive. Perl used to be the King of such challenges.

--
                Paul Jackson
                pj@usa.net

--34e72f1795004c85a268bf17cf983043 Content-Type: text/html Content-Transfer-Encoding: quoted-printable
--34e72f1795004c85a268bf17cf983043-- From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 24 02:57:50 2023 Received: (at 65416) by debbugs.gnu.org; 24 Aug 2023 06:57:50 +0000 Received: from localhost ([127.0.0.1]:36004 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qZ4Ht-0000nZ-7a for submit@debbugs.gnu.org; Thu, 24 Aug 2023 02:57:50 -0400 Received: from mail-vk1-xa31.google.com ([2607:f8b0:4864:20::a31]:62508) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYyJW-0006xg-UC for 65416@debbugs.gnu.org; Wed, 23 Aug 2023 20:35:10 -0400 Received: by mail-vk1-xa31.google.com with SMTP id 71dfb90a1353d-48d2c072030so241485e0c.0 for <65416@debbugs.gnu.org>; Wed, 23 Aug 2023 17:35:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692837297; x=1693442097; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=y8FV4XQrlyItP81iS6xB+5Maep3bCoGKq1Ii0w4SOm8=; b=XfkaWdX08dqDe216kcYH7UGuuGUYQ127dnnGvbMWurwcTHVIkr23dN5HIAAQSarvFA zgSo6le+LWXkTvqLK1pQKrlemIw8vMpoMhw3la+z3EikU2JPQwTXNxJnhWf2iSHm06aI ZTGD4piGSOZs4Iby16GS3RI4SyDrfW2+ln4ucQMTYGuknTp54tWnaC14O6j76DMRIukn bjDJLOlsOh7pg5XPhpfr3gzw8EcxxRjZ+UD1Qg10G5jXfZwwAuxT6LcZFhCgVH/hK/XG XXBiv9DMC21RgZlqccQuCBdLQNlLiW0OsbWzi3kX5OjB+ODFzaBb24Jpk40afuyp3GA/ O4Xg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692837297; x=1693442097; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=y8FV4XQrlyItP81iS6xB+5Maep3bCoGKq1Ii0w4SOm8=; b=DwaFMGNNfVXQbaf23C/yofexoBjfvwioXunZx+XTGyGHG/UNrF9o4gyYHfovWRsfUI QLtm0BJp3pSorApp+n1LDiLA3GS1XoP8DlAXBjZS+8AJCrUFY0yLMoa/6pVxM5dU7m6s VJMA5FY+XuwiIyREXBmldglvr0PpRD/kygnTrkTh8l6+3Y8tlYQ5pM8nt7DSE5o3zaw2 dQ7X5MXTdeVRpIT05MPm3W9PKDzGMDi7/n9anSwtyd76h05WMblNm00whe8Y8X+MyNX/ 
9XppWK1qPwRiM5ry/ip3P0eBb7lA9L4CO8Ri+4h2T/McKnwHk0Whl+hTP7MUmc0V3+nW uxSg== X-Gm-Message-State: AOJu0YwvITLtfaRCrKbKBatu9gtkR+0TIU6q38fYQ0iFsYMo4gGXqtxq 6w3hfs23WhO5nnzYRWIYaJHfiVPyQrIY0Th8ohQ= X-Google-Smtp-Source: AGHT+IE77LW6YWxEwcV9YFpKDuSxDYdNkc+VFqCGhmdQy1A3dA79Xx8ME6FeRPsnh4Qj7p1rLz4Xipl9A4mJ/T55bng= X-Received: by 2002:a1f:9c4b:0:b0:48d:d98:1266 with SMTP id f72-20020a1f9c4b000000b0048d0d981266mr5899815vke.6.1692837297443; Wed, 23 Aug 2023 17:34:57 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> <528dc225-62dc-432b-b969-0ad8d6106a54@app.fastmail.com> In-Reply-To: <528dc225-62dc-432b-b969-0ad8d6106a54@app.fastmail.com> From: Daniel Green Date: Wed, 23 Aug 2023 20:34:46 -0400 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: Paul Jackson Content-Type: multipart/alternative; boundary="000000000000c6d77a0603a0656c" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 X-Mailman-Approved-At: Thu, 24 Aug 2023 02:57:44 -0400 Cc: arnold@skeeve.com, 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --000000000000c6d77a0603a0656c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Re Perl's read speed, it's faster when not doing the line number check for
every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
in Perl, i.e., `perl -ne ''` is only ~1.38s.

Dan

On Wed, Aug 23, 2023 at 6:22 PM Paul Jackson <pj@usa.net> wrote:

> Ah - those times show another reason why one might
> be motivated to keep requesting more options be added
> to grep.
>
> From those timings, and from looking at the source, it's clear
> that the FSF rewrote grep from scratch, sometime back in the
> late 1980's or early 1990's, to have fast reads, whereas sed is
> still using stdio fread in a classical manner, which is a painfully
> slower double copy solution.
>
> If sed were still a widely used command in performance sensitive
> applications, it should have some serious TLC applied to its
> performance.
>
> However, since the pool of Jurassic Park Dinosaurs who can (and
> perhaps do) compose sed commands in their sleep is a nearly
> extinct breed, I see no sufficient interest in accepting such a rewrite
> of sed, even if it showed up as a proposed checkin.
>
> That grep can even seriously beat perl for such raw read performance
> is impressive. Perl used to be the King of such challenges.
>
> --
> Paul Jackson
> pj@usa.net
>

--000000000000c6d77a0603a0656c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
--000000000000c6d77a0603a0656c-- From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 29 09:57:31 2023 Received: (at 65416) by debbugs.gnu.org; 29 Aug 2023 13:57:31 +0000 Received: from localhost ([127.0.0.1]:51284 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qazDn-0005ky-0y for submit@debbugs.gnu.org; Tue, 29 Aug 2023 09:57:31 -0400 Received: from mail-ej1-x62f.google.com ([2a00:1450:4864:20::62f]:45214) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qazDh-0005kc-0D for 65416@debbugs.gnu.org; Tue, 29 Aug 2023 09:57:29 -0400 Received: by mail-ej1-x62f.google.com with SMTP id a640c23a62f3a-99357737980so571959766b.2 for <65416@debbugs.gnu.org>; Tue, 29 Aug 2023 06:57:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1693317432; x=1693922232; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=TlJ+7V4VtJ0Wk4lJnLdaCXzXrqOlbN/biUHqnWaAQyA=; b=laZCKYIM4ruKZRAPEzZrKDzpuv7JhY6rnS4OiyGLq0RLCBE03RKZR2i8kwxk+90NYS IC2ZfdqSJpNCjGq/qt9RWCwRcykh8QMSKQ+RYAsx3DG6ForFaxYtp9TiGh1t5bcphm1Q o1p0NB9yOtpu2IIdlUZF2bumySKulcqkTJ7RTn7At/bh3UHywy82HXPZn+tZTuQmUFir FEfcbHqOz/3dPx4E/X8G6AK6xN9+ZboHMG/fjPkKbdMZHQj7lYpzrMjVTBmEF0+avyGb GRsUp2RKMW8jFPqrTrpRNNrqVSN7ju02JWM3ucRrhR8vp8KvE0GdGQ0BGGjnc3UbcVrT P6fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693317432; x=1693922232; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TlJ+7V4VtJ0Wk4lJnLdaCXzXrqOlbN/biUHqnWaAQyA=; b=Wsx4drgs9PrunQywP3yN6HFNkqacflHfEEqPjRkTfF1/Z/VaDRwuL7cmAWfW7opU24 48jJVmRBlewE2l3HY74ocxam8Ai+GzJV2PDgiEVxTcA0Xo2x7st5qNO66xPaTz/gGaJT EFBLX0DugPaZ4KhTEWAtQy0NpwsP4UicjyIopDRvjx5tUBxLJ8ZoYGrSCqxnL3nZXjWo R+hUhSqe2gYTY8+bpOIGb/4TJZafPPvGFwqBibM1H4pUV1zZ7j5x/s9vtBZOhH635gtx 
yXapXPnF8zuzVzgd/Sl6mXKflNOMUkzdlOmJrnRSttoTD6lotxbV+FYpMrrWznJxHDMR ZoiA== X-Gm-Message-State: AOJu0YzfMMxnlEDy27iMGA+JAnjQfHA2pKT7TiVDVjEDEIunUcyttF59 b8gPzFxdAeXF45NKD89IapuVJnEv8zSNmLoA7J0= X-Google-Smtp-Source: AGHT+IF3iB3y6eWXa9Prs/JvcpmRUe6tVghfcRKcbP2oogDtRgNqjflpoknD0iAz7sedEaCTWy30+0FwOwviaSnpnes= X-Received: by 2002:a17:906:8a6a:b0:9a5:c957:ed4a with SMTP id hy10-20020a1709068a6a00b009a5c957ed4amr1704553ejc.46.1693317431740; Tue, 29 Aug 2023 06:57:11 -0700 (PDT) MIME-Version: 1.0 References: <202308211657.37LGvETQ031460@freefriends.org> <202308211837.37LIbC7a013506@freefriends.org> <2c44ee1c-b8c2-4800-95f7-e2a8b32b2dac@app.fastmail.com> <11507157-b80a-45c7-8e52-9ae5a853190a@app.fastmail.com> <528dc225-62dc-432b-b969-0ad8d6106a54@app.fastmail.com> In-Reply-To: From: lacsaP Patatetom Date: Tue, 29 Aug 2023 15:56:58 +0200 Message-ID: Subject: Re: bug#65416: Feature request: include first line of file in output To: Daniel Green Content-Type: multipart/alternative; boundary="00000000000002cb8b06041030e1" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 65416 Cc: Paul Jackson , arnold@skeeve.com, 65416@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --00000000000002cb8b06041030e1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Thu, Aug 24, 2023 at 08:58, Daniel Green wrote:

> Re Perl's read speed, it's faster when not doing the line number check for
> every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
> to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
> in Perl, i.e., `perl -ne ''` is only ~1.38s.
>
> Dan
>
> On Wed, Aug 23, 2023 at 6:22 PM Paul Jackson wrote:
>
> > Ah - those times show another reason why one might
> > be motivated to keep requesting more options be added
> > to grep.
> >
> > From those timings, and from looking at the source, it's clear
> > that the FSF rewrote grep from scratch, sometime back in the
> > late 1980's or early 1990's, to have fast reads, whereas sed is
> > still using stdio fread in a classical manner, which is a painfully
> > slower double copy solution.
> >
> > If sed were still a widely used command in performance sensitive
> > applications, it should have some serious TLC applied to its
> > performance.
> >
> > However, since the pool of Jurassic Park Dinosaurs who can (and
> > perhaps do) compose sed commands in their sleep is a nearly
> > extinct breed, I see no sufficient interest in accepting such a rewrite
> > of sed, even if it showed up as a proposed checkin.
> >
> > That grep can even seriously beat perl for such raw read performance
> > is impressive. Perl used to be the King of such challenges.
> >
> > --
> > Paul Jackson
> > pj@usa.net
> >
> >
> >
>

with a function, something like this:

headgrep() {
 head -n 1 "$2"
 grep "$1" "$2"
}

--00000000000002cb8b06041030e1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

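One caveat with the function above: if the first line itself matches the pattern, it is printed twice. A minimal sketch of a variant that avoids this, assuming POSIX `head`, `tail` and `grep`, is shown here. The loop over several files and the variable names are illustrative additions, not part of the original suggestion.

```shell
# Hypothetical variant of headgrep: print each file's first line once,
# then search only the remaining lines, so a matching header is not repeated.
# Usage: headgrep PATTERN FILE...
headgrep() {
    pattern=$1
    shift
    for file in "$@"; do
        # Always show the header line.
        head -n 1 -- "$file"
        # Search from line 2 onwards; -e protects patterns starting with "-".
        tail -n +2 -- "$file" | grep -e "$pattern"
    done
}
```

Called as `headgrep 'pattern' data.csv` (where `data.csv` is a placeholder file name), this prints the header followed by the matching rows. It reads the file twice, so it is likely slower than a single grep pass on large inputs, and the exit status is that of the last grep.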
--00000000000002cb8b06041030e1-- From unknown Fri Aug 15 21:28:07 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 27 Sep 2023 11:24:06 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator