GNU bug report logs - #65416
Feature request: include first line of file in output
Reported by: Daniel Green <ddgreen <at> gmail.com>
Date: Mon, 21 Aug 2023 07:16:02 UTC
Severity: wishlist
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
I can't speak for the grep guys, but at least I was correct that
current gawk is much faster than gawk 4.0.2.
Arnold
Daniel Green <ddgreen <at> gmail.com> wrote:
> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
> grep (v3.11): ~0.60s
> perl (v5.38.0): ~3.21s
> gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
> gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
>
> If grep will never add this functionality I'll survive, it just seemed like
> it might not be too much work to implement, and would probably still be
> much faster than using awk/perl. I've never looked at the grep source code
> before, but could be tempted to try implementing it myself if there was any
> chance of the patch being accepted.
>
> Dan
>
> On Mon, Aug 21, 2023 at 2:37 PM <arnold <at> skeeve.com> wrote:
>
> > Gawk 4.0.2 is 11 years old. Try timing the current version,
> > I'll bet it's faster. And it solves your problem NOW,
> > instead of waiting for a feature that the grep developers
> > aren't likely to add.
> >
> > My two cents of course.
> >
> > Arnold
> >
> > Daniel Green <ddgreen <at> gmail.com> wrote:
> >
> > > That works, as well as the Perl version I've been using:
> > >
> > > perl -ne 'print if ($. == 1 || /pattern/)'
> > >
> > > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > > show the problem:
> > >
> > > grep (v2.20): ~1.15s
> > > perl (v5.36.1): ~4.48s
> > > awk (v4.0.2): ~10.81s
> > >
> > > Admittedly grep is just searching in those timings, but I suspect it
> > > could accomplish the full task with a minimal decrease in speed.
> > >
> > > Dan
> > >
> > > On Mon, Aug 21, 2023 at 12:57 PM <arnold <at> skeeve.com> wrote:
> > >
> > > > Daniel Green <ddgreen <at> gmail.com> wrote:
> > > >
> > > > > I'm frequently searching CSV files with 20-30 columns, and when
> > > > > there's a hit it can be hard to know what the columns are. An option
> > > > > to also print the first line of a file (either always, or only if that
> > > > > file had a match to the pattern) in addition to any hits would be nice.
> > > > >
> > > > > Thanks,
> > > > > Dan
> > > >
> > > > It sounds like awk would be a better tool:
> > > >
> > > > awk 'FNR == 1 || /pattern/' files ...
> > > >
> > > > should do the trick.
> > > >
> > > > HTH,
> > > >
> > > > Arnold
> > > >
> >
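
For reference, a minimal sketch of the two behaviours asked about in the original request. The first variant is the awk one-liner quoted above. The second (print the first line only when that file has a match) is not from the thread; it is one possible way to write it, and it assumes the first line is a header that should not itself be tested against the pattern. The sample files, their contents, and the pattern "bob" are made up for illustration.

```shell
# Hypothetical sample data; file names and contents are invented for this sketch.
tmp=$(mktemp -d)
printf 'id,name,city\n1,alice,Oslo\n2,bob,Lima\n' > "$tmp/a.csv"
printf 'id,name,city\n3,carol,Rome\n' > "$tmp/b.csv"

# Variant 1 (from the thread): always print each file's first line, plus any matches.
always=$(awk 'FNR == 1 || /bob/' "$tmp/a.csv" "$tmp/b.csv")

# Variant 2 (a sketch, not from the thread): print a file's first line only
# when a later line in that same file matches the pattern.
only_if_hit=$(awk '
  FNR == 1 { header = $0; shown = 0; next }   # remember the header of each file
  /bob/ {
      if (!shown) { print header; shown = 1 } # emit the header once, before the first hit
      print
  }
' "$tmp/a.csv" "$tmp/b.csv")

printf '%s\n---\n%s\n' "$always" "$only_if_hit"
rm -rf "$tmp"
```

With this sample data, variant 1 prints the header of b.csv even though that file has no hit, while variant 2 skips b.csv entirely.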
This bug report was last modified 1 year and 321 days ago.