GNU bug report logs - #71094
[PATCH] Prefer to run find and grep in parallel in rgrep
Reported by: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Tue, 21 May 2024 14:36:01 UTC
Severity: normal
Tags: patch
Done: Andrea Corallo <acorallo <at> gnu.org>
Bug is archived. No further changes may be made.
> Date: Wed, 22 May 2024 17:50:42 +0300
> Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> >> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
> >> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
> >> parallelism there we'll need to either lower the limit or test it in a
> >> project at least twice as big.
> >
> > ...until xargs collects all those characters, it will not invoke grep,
> > right? So, for directories whose file names total less than those
> > 200K, xargs will still wait until find ends its job, right?
>
> That's right. And it's why we're not seeing much of a difference in
> projects of Emacs's size or smaller. No apparent regression either, though.
But we added xargs to the soup. On GNU/Linux, where GNU Findutils are
developed, it probably isn't a problem. On other systems, not
necessarily...
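The exchange above turns on how xargs batches file names. It starts a grep only once a batch is full or its input ends, so a tree whose names fit in one batch gains no overlap between find and grep. Below is a minimal Python sketch of that batching rule, not xargs itself. It assumes a 131072-byte per-invocation budget, the command-buffer size GNU xargs commonly reports via `xargs --show-limits` on GNU/Linux; the real limit varies by system and environment size. It also approximates each name's cost as its byte length plus one terminating NUL. Under that assumption, 202928 / 131072 is about 1.55, the "1.5 grep invocations" quoted above: in practice one full batch plus one partial one. The names `batches` and `ASSUMED_LIMIT` are hypothetical.

```python
from typing import Iterable, Iterator, List

# Hypothetical budget: GNU xargs often reports a 131072-byte command
# buffer on GNU/Linux, but the actual limit differs between systems.
ASSUMED_LIMIT = 131072


def batches(names: Iterable[str], limit: int = ASSUMED_LIMIT) -> Iterator[List[str]]:
    """Group file names roughly the way xargs does.

    A batch (one grep invocation) is emitted as soon as adding the next
    name would exceed `limit` bytes.  Whatever is left over is emitted
    only when the input ends, i.e. after find has finished.
    """
    batch: List[str] = []
    size = 0
    for name in names:
        cost = len(name.encode()) + 1  # name bytes plus terminating NUL
        if batch and size + cost > limit:
            yield batch  # batch is full: a grep could start right now
            batch, size = [], 0
        batch.append(name)
        size += cost
    if batch:
        yield batch  # final partial batch, flushed only at end of input
```

For example, 2029 names of 99 bytes each (about 203 KB in total) yield two batches of 1310 and 719 names, so only the first grep can overlap with find. A small tree of 100 short names yields a single batch that is released only after find exits, which is the "xargs will still wait until find ends its job" case.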
> >> So here is another example: a Linux kernel checkout (76K files). Also
> >> about 30% improvement: 1.40s vs 2.00s.
> >
> > This is all highly system-dependent.
>
> Naturally. So it'd be great to see some additional data points from
> users on other systems.
>
> Especially those where the default limit is lower than it is on mine.
I'd be happy if someone could time these methods on MS-Windows and on
some *BSD system, at least. Bonus points for macOS.
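For anyone willing to report numbers, here is a minimal timing sketch in Python. It assumes a POSIX shell with `find`, `xargs`, and `grep` on PATH, and it runs in the directory to be searched. The two command strings are hypothetical stand-ins for the strategies under discussion, not the exact commands rgrep builds from its templates. In the first, GNU find's `-exec ... {} +` runs each grep batch and waits for it before continuing the traversal. In the second, find and grep run as separate processes connected by a pipe. `time_command`, `relative_improvement`, `SEQUENTIAL`, and `PARALLEL` are illustrative names, and `PATTERN` is a placeholder to replace with a real search string.

```python
import subprocess
import time

# Hypothetical approximations of the two strategies being compared.
SEQUENTIAL = "find . -type f -exec grep -n -e PATTERN {} +"
PARALLEL = "find . -type f -print0 | xargs -0 grep -n -e PATTERN"


def time_command(cmd: str, repeats: int = 5) -> float:
    """Run `cmd` through the shell `repeats` times and return the fastest
    wall-clock time in seconds (the minimum is least affected by noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # grep exits with status 1 when nothing matches, so don't raise.
        subprocess.run(cmd, shell=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - start)
    return best


def relative_improvement(old: float, new: float) -> float:
    """Fraction of `old` saved by `new`; 2.00 s -> 1.40 s gives 0.30."""
    return (old - new) / old
```

Usage: `relative_improvement(time_command(SEQUENTIAL), time_command(PARALLEL))`. The 1.40 s vs 2.00 s kernel-checkout figure above corresponds to (2.00 - 1.40) / 2.00 = 0.30. Running a warm-up pass first keeps the file-system cache from favoring whichever command runs second.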