GNU bug report logs - #65416
Feature request: include first line of file in output
Reported by: Daniel Green <ddgreen <at> gmail.com>
Date: Mon, 21 Aug 2023 07:16:02 UTC
Severity: wishlist
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 65416 in the body.
You can then email your comments to 65416 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Mon, 21 Aug 2023 07:16:02 GMT)
Acknowledgement sent to Daniel Green <ddgreen <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 21 Aug 2023 07:16:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I'm frequently searching CSV files with 20-30 columns, and when there's a
hit it can be hard to know what the columns are. An option to also print
the first line of a file (either always, or only if that file had a match
to the pattern) in addition to any hits would be nice.
Thanks,
Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Mon, 21 Aug 2023 16:58:02 GMT)
Message #8 received at 65416 <at> debbugs.gnu.org (full text, mbox):
Daniel Green <ddgreen <at> gmail.com> wrote:
> I'm frequently searching CSV files with 20-30 columns, and when there's a
> hit it can be hard to know what the columns are. An option to also print
> the first line of a file (either always, or only if that file had a match
> to the pattern) in addition to any hits would be nice.
>
> Thanks,
> Dan
It sounds like awk would be a better tool:
awk 'FNR == 1 || /pattern/' files ...
should do the trick.
HTH,
Arnold
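The original request also mentioned printing the first line only for files that actually contain a match. A minimal sketch of that variant, in the same awk style, might look like the following. The function name `header_on_match` and its argument order are made up for illustration, and because the pattern is passed with `-v`, awk will process backslash escapes in it.

```shell
# Hypothetical helper (not from the thread): header_on_match PATTERN FILE...
# Prints a file's first line only if some later line in that file matches.
header_on_match() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        FNR == 1 { header = $0; shown = 0; next }    # remember the first line of each file
        $0 ~ pat {
            if (!shown) { print header; shown = 1 }  # emit the header once, on the first hit
            print
        }
    ' "$@"
}
```

It could be called as `header_on_match carenas *.csv`. Unlike `FNR == 1 || /pattern/`, this sketch never tests the first line itself against the pattern.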
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Mon, 21 Aug 2023 18:38:01 GMT)
Message #11 received at 65416 <at> debbugs.gnu.org (full text, mbox):
Gawk 4.0.2 is 11 years old. Try timing the current version,
I'll bet it's faster. And it solves your problem NOW,
instead of waiting for a feature that the grep developers
aren't likely to add.
My two cents of course.
Arnold
Daniel Green <ddgreen <at> gmail.com> wrote:
> That works, as well as the Perl version I've been using:
>
> perl -ne 'print if ($. == 1 || /pattern/)'
>
> But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> show the problem:
>
> grep (v2.20): ~1.15s
> perl (v5.36.1): ~4.48s
> awk (v4.0.2): ~10.81s
>
> Admittedly grep is just searching in those timings, but I suspect it could
> accomplish the full task with a minimal decrease in speed.
>
> Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Mon, 21 Aug 2023 18:44:02 GMT)
Message #14 received at 65416 <at> debbugs.gnu.org (full text, mbox):
On 8/21/23 13:37, arnold <at> skeeve.com wrote:
> it solves your problem NOW,
> instead of waiting for a feature that the grep developers
> aren't likely to add.
Yes, Grep already has a lot of features that in hindsight would have
been better addressed by saying "Use Awk".
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Mon, 21 Aug 2023 19:11:02 GMT)
Message #17 received at 65416 <at> debbugs.gnu.org (full text, mbox):
That works, as well as the Perl version I've been using:
perl -ne 'print if ($. == 1 || /pattern/)'
But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
show the problem:
grep (v2.20): ~1.15s
perl (v5.36.1): ~4.48s
awk (v4.0.2): ~10.81s
Admittedly grep is just searching in those timings, but I suspect it could
accomplish the full task with a minimal decrease in speed.
Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 02:34:01 GMT)
Message #20 received at 65416 <at> debbugs.gnu.org (full text, mbox):
I can't speak for the grep guys, but at least I was correct that
current gawk is much faster than gawk 4.0.2.
Arnold
Daniel Green <ddgreen <at> gmail.com> wrote:
> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
> grep (v3.11): ~0.60s
> perl (v5.38.0): ~3.21s
> gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
> gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
>
> If grep will never add this functionality I'll survive; it just seemed like
> it might not be too much work to implement, and would probably still be
> much faster than using awk/perl. I've never looked at the grep source code
> before, but could be tempted to try implementing it myself if there was any
> chance of the patch being accepted.
>
> Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 06:21:02 GMT)
Message #23 received at 65416 <at> debbugs.gnu.org (full text, mbox):
> Daniel Green <ddgreen <at> gmail.com> wrote:
>
> > I've never looked at the grep source code
> > before, but could be tempted to try implementing it myself if there was any
> > chance of the patch being accepted.
A slightly more complicated Perl script would be my first choice if
coding is the solution, but grep already has a feature that could be
used to provide a solution, as shown by the following scriptlet
(including a scaled-down data file):
$ cat > c.csv
USER,TIP
john,0
jane,10
carenas,100
$ ( grep -m1 USER && grep carenas ) < c.csv
USER,TIP
carenas,100
Carlo
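This works because GNU grep, when it stops after `-m NUM` matches and standard input is a regular file, leaves the input positioned just after the last matching line, so the second grep continues from line 2. A minimal sketch wrapping the idea in a function follows. The name `first_and_matches` is hypothetical, `grep -m1 ''` (match any line) is swapped in for `grep -m1 USER` so the header text need not be known, and GNU grep reading from a redirected file (not a pipe) is assumed.

```shell
# Hypothetical wrapper around the "grep -m1 && grep" idea; assumes GNU grep.
first_and_matches() {
    # $1 = pattern, $2 = file.
    # The first grep prints line 1 and leaves stdin positioned at line 2;
    # the second grep then searches the rest of the same open file.
    ( grep -m1 '' && grep -e "$1" ) < "$2"
}

# Recreate the scaled-down data file from the message above and try it.
csv=$(mktemp)
printf 'USER,TIP\njohn,0\njane,10\ncarenas,100\n' > "$csv"
first_and_matches carenas "$csv"
rm -f "$csv"
```

Because the second grep starts after the first line, a header that happens to match the pattern is not printed twice. With a pipe as input the repositioning is not possible, so the second grep may see little or nothing.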
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 06:56:04 GMT)
Message #26 received at 65416 <at> debbugs.gnu.org (full text, mbox):
I don't have access to a newer gawk where I did the initial timings, but I
ran an almost identical test on my home machine.
grep (v3.11): ~0.60s
perl (v5.38.0): ~3.21s
gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
gawk (v5.2.2 built from source with `-O3 -march=native`): ~4.95s
If grep will never add this functionality I'll survive; it just seemed like
it might not be too much work to implement, and would probably still be
much faster than using awk/perl. I've never looked at the grep source code
before, but could be tempted to try implementing it myself if there was any
chance of the patch being accepted.
Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 08:22:02 GMT)
Message #29 received at 65416 <at> debbugs.gnu.org (full text, mbox):
sed and awk can also do this (first line plus any matching lines).
The following transcript is from a zsh session on my fast Ryzen:
$ <<-'@@' time sh -c "grep -m1 USER && grep carenas"
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
sh -c "grep -m1 USER && grep carenas" 0.00s user 0.00s system 93% cpu 0.003 total
$ <<-'@@' time sed -n -e 1p -e /carenas/p
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
sed -n -e 1p -e /carenas/p 0.00s user 0.00s system 80% cpu 0.001 total
$ <<-'@@' time awk 'NR == 1 || /carenas/'
USER,TIP
john,0
jane,10
carenas,100
@@
USER,TIP
carenas,100
awk 'NR == 1 || /carenas/' 0.00s user 0.00s system 88% cpu 0.002 total
As I expected, sed is fastest, grep next, and awk slowest of the three,
but the 1, 2, and 3 millisecond totals are within the margin of test error.
--
Paul Jackson
pj <at> usa.net
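Since a four-line input finishes in a millisecond or two, a larger file gives more meaningful numbers. The sketch below builds a synthetic CSV (the row count and contents are arbitrary assumptions, not data from this thread), checks that the three commands produce the same output, and leaves the timing itself to the reader, who can prefix each command with `time`. GNU grep is assumed for the `-m1` repositioning behavior.

```shell
# Build a larger synthetic CSV; 200000 filler rows is an arbitrary choice.
big=$(mktemp)
{
    echo 'USER,TIP'
    awk 'BEGIN { for (i = 1; i <= 200000; i++) print "user" i "," i % 100 }'
    echo 'carenas,100'
} > "$big"

# The three approaches from the transcript, run against the same file.
out_grep=$( ( grep -m1 USER && grep carenas ) < "$big" )
out_sed=$(sed -n -e 1p -e /carenas/p "$big")
out_awk=$(awk 'NR == 1 || /carenas/' "$big")

# Confirm they agree before comparing their speed.
if [ "$out_grep" = "$out_sed" ] && [ "$out_sed" = "$out_awk" ]; then
    echo "all three agree"
else
    echo "outputs differ" >&2
fi
rm -f "$big"
```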
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 08:24:02 GMT)
Message #32 received at 65416 <at> debbugs.gnu.org (full text, mbox):
oops - grep slower than awk, not the other way around,
on these _highly_ inconclusive timings.
--
Paul Jackson
pj <at> usa.net
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 14:26:02 GMT)
Message #35 received at 65416 <at> debbugs.gnu.org (full text, mbox):
On the original test machine I timed the sed solution, as well as `(grep
-m1 . 'file' && grep 'pattern' 'file')` and `(mapfile -n1 <'file' && printf '%s'
"${MAPFILE[0]}" && grep 'pattern' 'file')` and `(head -n1 'file' && grep
'pattern' 'file')`. The full table of timings:
grep (v2.20): ~1.15s
perl (v5.36.1): ~4.48s
awk (v4.0.2): ~10.81s
sed (v4.2.2): ~8.15s
grep && grep: ~1.15s
mapfile && grep: ~1.15s
head && grep: ~1.15s
I can write a shell function to make the head+grep version a little easier
to use in practice (i.e., loop over the list of files, calling head+grep on
each one, instead of calling head on the whole list and then grep on the
whole list), but I believe it would be difficult to vary the options given
to grep. I still think a new grep option would give the best combination of
speed, the output I have in mind, and ease of use as the grep options
change. But if there's no interest, this feature request can be closed.
Dan
Bug closed; send any further explanations to 65416 <at> debbugs.gnu.org and Daniel Green <ddgreen <at> gmail.com>. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 23 Aug 2023 18:05:02 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Wed, 23 Aug 2023 22:23:02 GMT)
Message #40 received at 65416 <at> debbugs.gnu.org (full text, mbox):
Ah - those times show another reason why one might
be motivated to keep requesting more options be added
to grep.
From those timings, and from looking at the source, it's clear
that the FSF rewrote grep from scratch, sometime back in the
late 1980's or early 1990's, to have fast reads, whereas sed is
still using stdio fread in a classical manner, which is a painfully
slower double copy solution.
If sed were still a widely used command in performance sensitive
applications, it should have some serious TLC applied to its
performance.
However, since the pool of Jurassic Park Dinosaurs who can (and
perhaps do) compose sed commands in their sleep is a nearly
extinct breed, I see no sufficient interest in accepting such a rewrite
of sed, even if it showed up as a proposed checkin.
That grep can even seriously beat perl for such raw read performance
is impressive. Perl used to be the King of such challenges.
--
Paul Jackson
pj <at> usa.net
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Thu, 24 Aug 2023 06:58:03 GMT)
Message #43 received at 65416 <at> debbugs.gnu.org (full text, mbox):
Re Perl's read speed, it's faster when not doing the line number check for
every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
in Perl, i.e., `perl -ne ''` is only ~1.38s.
Dan
Information forwarded to bug-grep <at> gnu.org: bug#65416; Package grep. (Tue, 29 Aug 2023 13:58:02 GMT)
Message #46 received at 65416 <at> debbugs.gnu.org (full text, mbox):
On Thu, 24 Aug 2023 at 08:58, Daniel Green <ddgreen <at> gmail.com> wrote:
> Re Perl's read speed, it's faster when not doing the line number check for
> every line. So `perl -ne 'print if (/pattern/)'` is only ~2.60s, compared
> to ~3.28s for `perl -ne 'print if ($. == 1 || /pattern/)'`. Doing nothing
> in Perl, i.e., `perl -ne ''` is only ~1.38s.
>
> Dan
With a function, something like this:
headgrep() {
    # $1 = pattern, $2 = file: print the first line, then any matching lines
    head -n 1 -- "$2"
    grep -e "$1" -- "$2"
}
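Earlier in the thread the concern was that a wrapper makes it hard to vary the options given to grep. One possible way around that, sketched here under stated assumptions, is to treat the last argument as the file and hand every argument to grep unchanged. The name `headgrep_opts` and its calling convention are hypothetical, not something proposed in the thread.

```shell
# Hypothetical variant: headgrep_opts [GREP_OPTION...] PATTERN FILE
headgrep_opts() {
    if [ "$#" -lt 2 ]; then
        echo "usage: headgrep_opts [GREP_OPTION...] PATTERN FILE" >&2
        return 2
    fi
    # Walk the arguments so that $file ends up holding the last one.
    for file in "$@"; do :; done
    # Print the first line, then run grep with all arguments as given.
    head -n 1 -- "$file" && grep "$@"
}
```

Several files can be handled by looping, for example `for f in *.csv; do headgrep_opts -i carenas "$f"; done`. Two caveats of this sketch: a first line that itself matches the pattern is printed twice, and options that change grep's output format (such as `-c`) apply only to the grep part.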
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 27 Sep 2023 11:24:06 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.