#37754 - wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)

GNU bug report logs - #37754
wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)

Package: grep

Reported by: "Trent W. Buck" <trentbuck <at> gmail.com>

Date: Tue, 15 Oct 2019 01:49:01 UTC

Severity: wishlist

Found in version 3.3-1

Message #20 received at 37754 <at> debbugs.gnu.org (full text, mbox):

From: "Trent W. Buck" <trentbuck <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 37754 <at> debbugs.gnu.org
Subject: Re: bug#37754: wish for grep --and -eX -eY -eZ (X∩Y∩Z intersection, not X∪Y∪Z union)
Date: Fri, 18 Oct 2019 22:49:23 +1100

Paul Eggert wrote:
> On 10/16/19 5:19 PM, Trent W. Buck wrote:
> > I would expect "grep -Fw -e 4GB -e DDR4 --and" to print the same thing as
> >
> >     grep -Fw 4GB | grep -Fw DDR4 | grep -Fw -e 4GB -e DDR4 -o
>
> You're right, it's not obvious. :-)
>
> It may be better to just pipe greps together, as you do now. That's simple
> and fast enough for this relatively-uncommon case, and it's portable to all
> greps.

I admit that most of the time, I want "grep --and" for a small dataset
(<1MB computer_parts.txt), so it's merely a convenience.

Sometimes I grep audit logs (~1TB uncompressed), which takes anywhere from
15 minutes to 3 days, depending on how I tweak my grep calls. In that case,
each grep in the pipeline has to pay the cost of de-serializing input from
the previous grep and re-serializing output for the next grep. If the first
grep matches (say) 200GB of the 1TB, that can be a lot of overhead (I assume).

I was basically hoping that if it was all in a single grep process, the
de/serialization steps could be skipped completely. I think the buzzword
for that is "zero-copy"?

I've noticed plain "grep" is about 30% slower than either "grep -F" or
"LC_COLLATE=C grep", because (I think) those two avoid the cost of decoding
from UTF-8 to Unicode and back. So I was basically expecting a similar
saving from --and.

I'm only speaking as an end user - I haven't dug through the grep source, so
those expectations might be unrealistic, and implementing it might be
painful/impossible. I figured I should at least ask :-)

If your expert opinion is that it's a pain to implement (and maintain!) and
there's not enough demand, then I'm OK with that. This is NOT something
that's burning me every day.

Regardless, I appreciate you taking the time to discuss it. :-)

PS: Regarding portability, I'm personally not worried because when I need a
GNUism badly enough (e.g. du --threshold), I can usually get permission to
install the relevant GNU software, even if it's only into %APPDATA% or $HOME.

PS: I noticed on bugs.gnu.org something about grep being single-threaded,
which might mean "grep --and" would end up being SLOWER than the existing
pipelines, since the kernel can distribute a pipeline's elements across
multiple CPUs/cores.
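
For readers who want the convenience today, the pipeline workaround discussed in the message above can be wrapped in a small shell function. This is only a sketch: grep_and is a hypothetical helper name, not a GNU grep feature, and the sample lines are made up. It chains one "grep -Fw" per word, so it still runs one process per word and does nothing for the serialization overhead raised above. It also leaves out the final "-o" stage of the quoted pipeline and prints whole matching lines.

```shell
#!/bin/sh
# grep_and: hypothetical helper (NOT part of GNU grep).
# Reads standard input and prints the lines that contain ALL of the
# given fixed-string words, by chaining one "grep -Fw" per word.
grep_and() {
    case $# in
        0) cat ;;                  # no words given: pass input through
        1) grep -Fw -e "$1" ;;     # last word: final filter in the chain
        *) grep -Fw -e "$1" | { shift; grep_and "$@"; } ;;
    esac
}

# Demo with made-up sample data (illustrative only).
printf '%s\n' \
    'stick-A 4GB DDR4' \
    'stick-B 4GB DDR3' \
    'stick-C 64GB DDR4' |
    grep_and 4GB DDR4
# prints: stick-A 4GB DDR4
```

The third sample line is rejected because -w only accepts whole-word matches, so "4GB" inside "64GB" does not count. Since each stage is a separate process, the kernel can still spread the stages across cores, as the second PS points out.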

This bug report was last modified 2 years and 245 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
