GNU bug report logs - #71094
[PATCH] Prefer to run find and grep in parallel in rgrep


Package: emacs

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Tue, 21 May 2024 14:36:01 UTC

Severity: normal

Tags: patch

Done: Andrea Corallo <acorallo <at> gnu.org>

Bug is archived. No further changes may be made.


From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 71094 <at> debbugs.gnu.org, rgm <at> gnu.org
Subject: bug#71094: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Wed, 22 May 2024 17:50:42 +0300

On 22/05/2024 17:42, Eli Zaretskii wrote:
>>> That's true, but what is your mental model of how the pipe with xargs
>>> works in practice?  How many invocations of grep will xargs do, and
>>> when will the first invocation happen?
>>
>> In my mental model xargs acts like an asynchronous queue with batch
>> processing. The first invocation will happen after the output reaches
>> the maximum line length or maximum number of arguments configured. They
>> are system-dependent by default.
> 
> And can be rather small.  But if it is large, then...
> 
>> For example, on my system 'xargs --show-limits' says
>>
>>     Size of command buffer we are actually using: 131072
>>
>> Whereas in the Emacs repository "find ... -print0 | wc" reports 202928
>> characters. Meaning, it uses just 1.5 'grep' invocations. To see better
>> parallelism there we'll need to either lower the limit or test it in a
>> project at least twice as big.
> 
> ...until xargs collects all those characters, it will not invoke grep,
> right?  So, for directories whose file names total less than those
> 200K, xargs will still wait until find ends its job, right?

That's right. And it's why we're not seeing much of a difference in 
projects of Emacs's size or smaller. No apparent regression either, though.
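
To make the batching model above concrete, here is a minimal sketch in Python. It is only a simplified model of a size-limited queue, not how xargs is actually implemented: the function name `batch_sizes` is made up for illustration, the uniform 39-character path length is an assumption, and real xargs also budgets for the command itself and other overhead. The two figures (202928 characters of find output, 131072-byte buffer) are the ones quoted above.

```python
def batch_sizes(name_lengths, buffer_limit):
    """Group file-name lengths into batches, as a size-limited queue would.

    Each name costs its length plus one separator byte. A batch is flushed
    (in the model: one grep invocation starts) when the next name would
    not fit, or when the input ends.
    """
    batches = []
    current = 0
    for length in name_lengths:
        cost = length + 1
        if current and current + cost > buffer_limit:
            batches.append(current)
            current = 0
        current += cost
    if current:
        # The final, partly filled batch is flushed only at end of input.
        batches.append(current)
    return batches


# Figures from this thread; the per-path length of 39 is an assumption.
names = [39] * (202928 // 40)
batches = batch_sizes(names, 131072)
print(len(batches), batches)
```

In this model the first batch is released only once roughly 128 KiB of names have been read, and the second one only when the input ends. Any tree whose names total less than the limit yields a single batch released at end of input, which matches the lack of difference seen on smaller projects.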

>> So here is another example: a Linux kernel checkout (76K files). Also
>> about 30% improvement: 1.40s vs 2.00s.
> 
> This is all highly system-dependent.

Naturally. So it'd be great to see some additional data points from 
users on other systems.

Especially those where the default limit is lower than it is on mine.




This bug report was last modified 326 days ago.


