GNU bug report logs -
#71094
[PATCH] Prefer to run find and grep in parallel in rgrep
Reported by: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Tue, 21 May 2024 14:36:01 UTC
Severity: normal
Tags: patch
Done: Andrea Corallo <acorallo <at> gnu.org>
Bug is archived. No further changes may be made.
> Date: Wed, 22 May 2024 17:22:56 +0300
> Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> >> The directory where I saw significant improvement has 300K files.
> >
> > That's what I thought. So we are changing the decade-old defaults to
> > favor huge directories, which is not necessarily the wisest thing to
> > do.
>
> I don't see any regression on small directories, though. And an
> improvement on big ones.
On your system.
> > That's true, but what is your mental model of how the pipe with xargs
> > works in practice? How many invocations of grep will xargs do, and
> > when will the first invocation happen?
>
> In my mental model xargs acts like an asynchronous queue with batch
> processing. The first invocation will happen after the output reaches
> the maximum command-line length or the maximum number of arguments
> configured. Both are system-dependent by default.
And can be rather small. But if it is large, then...
> For example, on my system 'xargs --show-limits' says
>
> Size of command buffer we are actually using: 131072
>
> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
> parallelism there we'll need to either lower the limit or test it in a
> project at least twice as big.
...until xargs collects all those characters, it will not invoke grep,
right? So, for directories whose file names total less than those
200K, xargs will still wait until find ends its job, right?
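
The batching behavior under discussion can be illustrated with a rough model. The sketch below is not how xargs is implemented; it assumes that the only limit is a byte budget for the arguments (131072, the figure quoted above) and that each file name costs its length plus one terminator byte. Real xargs also accounts for the command itself and the environment, and may cap the number of arguments. The function name `xargs_batches` and the uniform 43-character file names are hypothetical, chosen so that the total matches the 202928 characters reported for the Emacs repository.

```python
def xargs_batches(name_lengths, buffer_size):
    """Return the number of file names in each simulated grep invocation.

    Simplified model (an assumption, not xargs's real accounting):
    every name costs len(name) + 1 bytes, and a batch is flushed just
    before it would exceed buffer_size.
    """
    batches = []
    count, used = 0, 0
    for length in name_lengths:
        cost = length + 1  # the name plus its terminator byte
        if count and used + cost > buffer_size:
            batches.append(count)  # this is where grep would be started
            count, used = 0, 0
        count += 1
        used += cost
    if count:
        batches.append(count)  # final batch, flushed only when find exits
    return batches


if __name__ == "__main__":
    # Hypothetical tree: 4612 names of 43 characters, 202928 bytes in total.
    print(xargs_batches([43] * 4612, 131072))  # [2978, 1634]
    # Hypothetical smaller tree whose names fit in the buffer: one batch.
    print(xargs_batches([43] * 1000, 131072))  # [1000]
```

Under these assumptions the larger tree produces two grep invocations, the first starting once find has emitted 2978 of the 4612 names. The smaller tree produces a single invocation, which in this model starts only after find has finished, so no overlap between find and grep is possible there.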
> So here is another example: a Linux kernel checkout (76K files). Also
> about 30% improvement: 1.40s vs 2.00s.
This is all highly system-dependent.
This bug report was last modified 326 days ago.