GNU bug report logs - #37754
wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)


Package: grep;

Reported by: "Trent W. Buck" <trentbuck <at> gmail.com>

Date: Tue, 15 Oct 2019 01:49:01 UTC

Severity: wishlist

Found in version 3.3-1



Message #20 received at 37754 <at> debbugs.gnu.org (full text, mbox):

From: "Trent W. Buck" <trentbuck <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 37754 <at> debbugs.gnu.org
Subject: Re: bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)
Date: Fri, 18 Oct 2019 22:49:23 +1100
Paul Eggert wrote:
> On 10/16/19 5:19 PM, Trent W. Buck wrote:
> > I would expect "grep -Fw -e 4GB -e DDR4 --and" to print the same thing as
> >
> >      grep -Fw 4GB | grep -Fw DDR4 | grep -Fw -e 4GB -e DDR4 -o
>
> You're right, it's not obvious. :-)
>
> It may be better to just pipe greps together, as you do now. That's simple
> and fast enough for this relatively-uncommon case, and it's portable to all
> greps.
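
For reference, a minimal sketch of the piped-grep approach next to the existing union behaviour. The file contents below are made-up sample data standing in for something like computer_parts.txt; only the grep options come from the thread.

```shell
# Hypothetical sample data (not from the original report).
parts=$(mktemp)
printf '%s\n' \
  'Laptop A: 4GB DDR4' \
  'Laptop B: 8GB DDR4' \
  'Laptop C: 4GB DDR3' \
  'Laptop D: DDR4 4GB' > "$parts"

# Intersection: a line must contain both whole words, in any order.
and_lines=$(grep -Fw 4GB "$parts" | grep -Fw DDR4)

# Union, for comparison: what multiple -e options give today.
or_lines=$(grep -Fw -e 4GB -e DDR4 "$parts")

printf 'AND:\n%s\n\nOR:\n%s\n' "$and_lines" "$or_lines"
rm -f "$parts"
```

With this sample, the pipeline keeps only laptops A and D, while the -e/-e form prints all four lines.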

I admit that most of the time, I want "grep --and" for a small dataset
(<1MB computer_parts.txt), so it's merely a convenience.

Sometimes I grep audit logs (~1TB uncompressed), which takes anywhere
from 15 minutes to 3 days, depending on how I tweak my grep calls.

In that case, each grep in the pipeline has to pay the costs to
de-serialize input from the previous grep, and re-serialize output to
the next grep.  If the first grep matches (say) 200GB of the 1TB,
that can be a lot of overhead (I assume).

I was basically hoping that if it was all in a single grep process,
the de/serialization steps could be skipped completely.
I think the buzzword for that is "zero-copy"?
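
As a rough illustration of the "single process" idea with existing tools, one possible stand-in is awk, which can test several fixed strings against each line in one pass. This is only a sketch under assumptions: the sample data is invented, index() does plain substring matching (like grep -F, but without the whole-word behaviour of -w), and nothing here says it would be faster than piped greps on a 1TB log.

```shell
# Hypothetical sample data (not from the original report).
parts=$(mktemp)
printf '%s\n' \
  'Laptop A: 4GB DDR4' \
  'Laptop B: 8GB DDR4' \
  'Laptop C: 4GB DDR3' \
  'Server D: 64GB DDR4' > "$parts"

# One process: print a line only if both fixed substrings occur in it.
awk_lines=$(awk 'index($0, "4GB") && index($0, "DDR4")' "$parts")

# Two processes: the equivalent pipeline (grep -F, deliberately without -w).
pipe_lines=$(grep -F 4GB "$parts" | grep -F DDR4)

printf '%s\n' "$awk_lines"
rm -f "$parts"
```

Note that "64GB" contains "4GB" as a substring, so Server D matches here; grep -Fw would reject it.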

I've noticed plain "grep" is about 30% slower than either "grep -F" or
"LC_COLLATE=C grep", because (I think) those two avoid the cost of decoding
from UTF-8 to Unicode and back.  So I was basically expecting a
similar saving from --and.
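
A small sketch of how the three variants could be compared. The log file is synthetic and the pattern is invented for illustration. One assumption to flag: character decoding is governed by LC_CTYPE rather than LC_COLLATE, so the sketch sets LC_ALL=C, which overrides both.

```shell
# Synthetic log (hypothetical format), 1000 lines.
log=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
  printf 'user=%s action=login\n' "$i"
  i=$((i + 1))
done > "$log"

# Same search three ways; all should report the same match count.
default_count=$(grep -c 'action=login' "$log")        # current locale
c_count=$(LC_ALL=C grep -c 'action=login' "$log")     # byte-oriented C locale
fixed_count=$(grep -Fc 'action=login' "$log")         # fixed string, no regex

printf 'default=%s C=%s fixed=%s\n' "$default_count" "$c_count" "$fixed_count"
rm -f "$log"
```

Prefixing each grep with the shell's time keyword on a realistically large file would show whether any speed difference appears on a given system; on a file this small the difference is unlikely to be measurable.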

I'm only speaking as an end user - I haven't dug through the grep
source, so those expectations might be unrealistic, and implementing
it might be painful/impossible.  I figured I should at least ask :-)

If your expert opinion is that it's a pain to implement (and
maintain!) and there's not enough demand, then I'm OK with that.
This is NOT something that's burning me every day.

Regardless, I appreciate you taking the time to discuss it. :-)


PS: Regarding portability, I'm personally not worried because when I
need a GNUism badly enough (e.g. du --threshold), I can usually get
permission to install the relevant GNU software, even if it's only
into %APPDATA% or $HOME.

PPS: I noticed on bugs.gnu.org something about grep being
single-threaded, which might mean "grep --and" would end up being
SLOWER than the existing pipelines, since the kernel can distribute
a pipeline's elements across multiple CPUs/cores.




This bug report was last modified 2 years and 149 days ago.


