GNU bug report logs - #65416
Feature request: include first line of file in output


Package: grep;

Reported by: Daniel Green <ddgreen <at> gmail.com>

Date: Mon, 21 Aug 2023 07:16:02 UTC

Severity: wishlist

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #26 received at 65416 <at> debbugs.gnu.org:

From: Daniel Green <ddgreen <at> gmail.com>
To: arnold <at> skeeve.com
Cc: 65416 <at> debbugs.gnu.org
Subject: Re: bug#65416: Feature request: include first line of file in output
Date: Tue, 22 Aug 2023 22:12:25 -0400
I don't have access to a newer gawk where I did the initial timings, but I
ran an almost identical test on my home machine.

    grep (v3.11):                                              ~0.60s
    perl (v5.38.0):                                            ~3.21s
    gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
    gawk (v5.2.2 built from source with `-O3 -march=native`):  ~4.95s

If grep will never add this functionality, I'll survive; it just seemed like
it might not be too much work to implement, and would probably still be
much faster than using awk/perl. I've never looked at the grep source code
before, but could be tempted to try implementing it myself if there was any
chance of the patch being accepted.

Dan

On Mon, Aug 21, 2023 at 2:37 PM <arnold <at> skeeve.com> wrote:

> Gawk 4.0.2 is 11 years old. Try timing the current version,
> I'll bet it's faster.  And it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.
>
> My two cents of course.
>
> Arnold
>
> Daniel Green <ddgreen <at> gmail.com> wrote:
>
> > That works, as well as the Perl version I've been using:
> >
> >     perl -ne 'print if ($. == 1 || /pattern/)'
> >
> > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > show the problem:
> >
> >     grep (v2.20):    ~1.15s
> >     perl (v5.36.1):  ~4.48s
> >      awk (v4.0.2):  ~10.81s
> >
> > Admittedly grep is just searching in those timings, but I suspect it
> > could accomplish the full task with a minimal decrease in speed.
> >
> > Dan
> >
> > On Mon, Aug 21, 2023 at 12:57 PM <arnold <at> skeeve.com> wrote:
> >
> > > Daniel Green <ddgreen <at> gmail.com> wrote:
> > >
> > > > I'm frequently searching CSV files with 20-30 columns, and when there's a
> > > > hit it can be hard to know what the columns are. An option to also print
> > > > the first line of a file (either always, or only if that file had a match
> > > > to the pattern) in addition to any hits would be nice.
> > > >
> > > > Thanks,
> > > > Dan
> > >
> > > It sounds like awk would be a better tool:
> > >
> > >         awk 'FNR == 1 || /pattern/' files ...
> > >
> > > should do the trick.
> > >
> > > HTH,
> > >
> > > Arnold
> > >
>
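For comparison, the same output can also be produced with standard tools and no awk or perl. The following is a minimal sketch, not something proposed in this thread: a hypothetical shell function `grep_with_header` prints line 1 of each file unconditionally (the "always" variant of the request), then passes the remaining lines to `grep -E`, so the pattern is an extended regular expression as in the awk version. Like the awk one-liner, it does not prefix output lines with file names, and its speed relative to the awk and perl versions has not been measured here.

```shell
# Hypothetical helper (not a grep feature): print the first line of each file,
# then every later line of that file matching an extended regular expression.
# Line 1 is printed even when nothing else in the file matches.
grep_with_header() {
  pattern=$1
  shift
  for f in "$@"; do
    # Header line, e.g. the CSV column names.
    head -n 1 -- "$f"
    # Lines 2 onward, filtered; -E gives the same ERE syntax awk uses.
    # Line 1 is skipped here, so it is never printed twice.
    tail -n +2 -- "$f" | grep -E -e "$pattern"
  done
}
```

Usage, e.g.: `grep_with_header 'pattern' file.csv`. The function's exit status is that of the last `grep`, so it is 1 when the last file had no matching lines.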




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.