GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92




From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 08:22:27 +0300
> Date: Thu, 27 Jul 2023 03:41:29 +0300
> Cc: Eli Zaretskii <eliz <at> gnu.org>, luangruo <at> yahoo.com, sbaugh <at> janestreet.com,
>  64735 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
> > calls and to bypass regexp matching when REGEXP is nil.
> 
> Sounds good. I haven't examined the diff closely, but it sounds like an 
> improvement that can be applied irrespective of how this discussion ends.

That change should be submitted as a separate issue and discussed in
detail before we decide we can make it.
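For reference, the quadratic pattern in question arises when each directory's results are appended to the tail of a growing list with `nconc', because every call has to walk the list accumulated so far to find its end.  Below is a minimal sketch of that pattern next to the usual linear alternative (push each element, then one `nreverse').  Both helper names are hypothetical, and this is not the patch under discussion:

```elisp
(defun my-collect-with-nconc (chunks)
  "Hypothetical helper: concatenate CHUNKS (a list of lists) at the tail.
Each `nconc' call walks the whole accumulated list to reach its end,
so the total work can grow quadratically with the size of the result."
  (let ((result nil))
    (dolist (chunk chunks)
      ;; Copy so that CHUNKS itself is not destructively modified.
      (setq result (nconc result (copy-sequence chunk))))
    result))

(defun my-collect-with-push (chunks)
  "Hypothetical helper: concatenate CHUNKS in linear time.
Every element is pushed onto the front, and the list is reversed once."
  (let ((result nil))
    (dolist (chunk chunks)
      (dolist (item chunk)
        (push item result)))
    (nreverse result)))
```

Both return the same list, for example ("a" "b" "c") for '(("a" "b") ("c")).  Only the cost of building it differs.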

> Skipping regexp matching entirely, though, will make this benchmark 
> farther removed from real-life usage: this thread started from being 
> able to handle multiple ignore entries when listing files (e.g. in a 
> project).

Agreed.  From my POV, that variant's purpose was only to show how much
time is spent in matching file names against some include or exclude
list.

> So any solution for that (whether we use it on all or just 
> some platforms) needs to be able to handle those. And it doesn't seem 
> like directory-files-recursively has any alternative solution for that 
> other than calling string-match on every found file.

There's a possibility of pushing this filtering into
file-name-all-completions, but I'm not sure that will be faster.  We
should try that and measure the results, I think.
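One way to get such numbers is `benchmark-run', which returns (ELAPSED GC-COUNT GC-ELAPSED) and therefore also shows how much of the cost is GC.  The sketch below only measures the current approach: it lists everything, then filters with `string-match-p'.  The function name is hypothetical.  A variant that filters inside `file-name-all-completions' would still have to be written and timed the same way:

```elisp
(require 'benchmark)
(require 'seq)

(defun my-measure-listing (dir ignore-regexp)
  "Hypothetical helper: time a recursive listing of DIR.
Return a plist whose values are `benchmark-run' results of the form
\(ELAPSED GC-COUNT GC-ELAPSED): one for the raw listing, and one for
the listing with file names matching IGNORE-REGEXP removed."
  (list :unfiltered
        (benchmark-run 1 (directory-files-recursively dir ""))
        :filtered
        (benchmark-run 1
          (seq-remove (lambda (file) (string-match-p ignore-regexp file))
                      (directory-files-recursively dir "")))))
```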

> > If we forget about GC, Elisp version can get fairly close to GNU find.
> > And if we do not perform regexp matching (which makes sense when the
> > REGEXP is ""), Elisp version is faster.
> 
> We can't really forget about GC, though.

But we could temporarily lift the threshold while this function runs,
if that leads to significant savings.
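Because `gc-cons-threshold' is a special variable, temporarily raising it can be a plain `let' binding around the call, which Emacs undoes on exit even if the body signals an error.  Below is a minimal sketch.  The wrapper name is hypothetical, and the unlimited value trades memory for speed, so a large finite value may be the safer choice for very big trees:

```elisp
(defun my-list-files-without-gc (dir regexp)
  "Hypothetical wrapper: run `directory-files-recursively' with GC deferred.
The `let' binding raises `gc-cons-threshold' only while the call runs;
the previous value is restored on exit, including non-local exits."
  (let ((gc-cons-threshold most-positive-fixnum))
    (directory-files-recursively dir regexp)))
```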

> But the above numbers make me hopeful about the async-parallel solution, 
> implying that the parallelization really can help (and offset whatever 
> latency we lose on pselect), as soon as we determine the source of extra 
> consing and decide what to do about it.

Isn't it clear that the additional consing comes from the fact that we
first insert find's output into a buffer or produce a string from it,
and then chop that into individual file names?
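The two allocation stages are easier to see in code.  Below is a minimal sketch of the find-based approach, assuming a find(1) that supports -print0 (GNU find does).  The function name is hypothetical, and this is not the code from the thread:

```elisp
(defun my-find-files (dir)
  "Hypothetical helper: list regular files under DIR using find(1).
The output is first inserted into a temporary buffer and copied out
as one large string by `buffer-string'; `split-string' then allocates
a separate string for every file name."
  (with-temp-buffer
    (call-process "find" nil t nil
                  (expand-file-name dir) "-type" "f" "-print0")
    ;; NUL-separated output; OMIT-NULLS drops the trailing empty string.
    (split-string (buffer-string) "\0" t)))
```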







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.