GNU bug report logs - #71094
[PATCH] Prefer to run find and grep in parallel in rgrep


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Tue, 21 May 2024 14:36:01 UTC

Severity: normal

Tags: patch

Done: Andrea Corallo <acorallo <at> gnu.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: rgm <at> gnu.org, 71094 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: bug#71094: [PATCH] Prefer to run find and grep in parallel in rgrep
Date: Wed, 22 May 2024 14:59:39 +0300
> Cc: Glenn Morris <rgm <at> gnu.org>, dmitry <at> gutov.dev
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Tue, 21 May 2024 10:35:07 -0400
> 
> grep.el prefers to run "find" and "xargs grep" in a pipeline,
> which means that "find" can continue searching the filesystem
> while "xargs grep" searches files.  If find and xargs don't
> support the flags required for this behavior, grep.el will fall
> back to using the -exec flags to "find", which means "find" will
> wait for each "grep" process to complete before continuing to
> search the filesystem tree.  This behavior is controlled by
> grep-find-use-xargs; `gnu' produces the pipeline and `exec' is
> the slower fallback.
> 
> In f3ca7378c1336b3ff98ecb5a99a98c7b2eceece9, the `exec-plus'
> option was added for grep-find-use-xargs, which improves on
> `exec' by running one "grep" process to search multiple files,
> which `gnu' (by using xargs) already did.  However, the change
> erroneously added the `exec-plus' case before the `gnu' case in
> the autodetection code in grep-compute-defaults, so `exec-plus'
> would be used even if `gnu' was supported.
> 
> This change just swaps the two cases, so the faster `gnu' option
> is once again used in preference to `exec-plus'.  In my
> benchmarking on a large repository, this provides a ~40%
> speedup.
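
For reference, the three command shapes discussed above can be sketched roughly as follows. This is a simplified illustration, not the exact templates grep.el generates: the temporary directory, the file names, and the search string "needle" are made up for the example, and the real commands carry extra grep options and ignore rules.

```shell
#!/bin/sh
# Build a small throwaway tree to search (hypothetical file names).
dir=$(mktemp -d)
mkdir -p "$dir/src/sub"
printf 'needle here\n' > "$dir/src/a.txt"
printf 'nothing\n'     > "$dir/src/b.txt"
printf 'needle too\n'  > "$dir/src/sub/c.txt"

# `exec' style: find starts one grep per file and waits for each to finish.
exec_out=$(find "$dir" -type f -exec grep -l needle {} \; | sort)

# `exec-plus' style: find batches many file names into each grep invocation.
exec_plus_out=$(find "$dir" -type f -exec grep -l needle {} + | sort)

# `gnu' style: find streams NUL-separated names into a pipe, and xargs
# starts grep on them while find is still walking the tree.
gnu_out=$(find "$dir" -type f -print0 | xargs -0 grep -l needle | sort)

printf '%s\n' "$gnu_out"
rm -rf "$dir"
```

All three forms should report the same set of matching files; they differ only in how many grep processes are started and in whether "find" keeps walking the tree while grep runs.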

With how many files did you measure the 40% speedup?  Can you show the
performance with far fewer and with many more files than you used?  I
suspect that the effect depends on that.  (It also depends on the
system limit on the number of files and the length of the command line
that xargs can use.)  The argument about 'find' waiting is no longer
relevant with 'exec-plus', since in most cases there will be just one
invocation of 'grep'.
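
One rough way to check how many times the search command actually gets invoked is to substitute a stand-in that prints one line per invocation. The sketch below does that; the 50 empty files, the `sh -c 'echo batch'` stand-in, and the `-n 10` batch size are arbitrary choices for illustration, and the counts will differ on trees whose file lists exceed the command-line limit.

```shell
#!/bin/sh
# Create 50 empty files in a throwaway directory (names are arbitrary).
dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
  : > "$dir/file$i.txt"
  i=$((i + 1))
done

# System limit, in bytes, on the arguments plus environment of one command.
arg_max=$(getconf ARG_MAX)

# The stand-in prints "batch" once per invocation, so counting the lines
# counts the invocations that grep would have had.
plus_batches=$(find "$dir" -type f -exec sh -c 'echo batch' sh {} + | grep -c batch)
xargs_batches=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo batch' sh | grep -c batch)

# Forcing at most 10 names per invocation mimics a file list that is too
# long to fit on one command line.
small_batches=$(find "$dir" -type f -print0 | xargs -0 -n 10 sh -c 'echo batch' sh | grep -c batch)

echo "ARG_MAX=$arg_max exec-plus=$plus_batches xargs=$xargs_batches xargs-n10=$small_batches"
rm -rf "$dir"
```

With a file list this short, both the -exec ... + form and plain xargs should need a single invocation, while the forced batch size splits the same list into five.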

In any case, please modify the patch so that 'exec-plus' is still
preferred on MS-Windows (most Windows ports of xargs are, in my
experience, abysmally buggy, so they are better avoided as much as
possible).

A comment there with the justification of the order will also be
appreciated.

Thanks.



