#64735 - 29.0.92; find invocations are ~15x slower because of ignores

GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Package: emacs

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92


From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: luangruo <at> yahoo.com, sbaugh <at> janestreet.com, yantar92 <at> posteo.net, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 08:22:27 +0300

> Date: Thu, 27 Jul 2023 03:41:29 +0300
> Cc: Eli Zaretskii <eliz <at> gnu.org>, luangruo <at> yahoo.com, sbaugh <at> janestreet.com,
>  64735 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
> > calls + bypassing regexp matches when REGEXP is nil.
>
> Sounds good. I haven't examined the diff closely, but it sounds like an
> improvement that can be applied irrespective of how this discussion ends.

That change should be submitted as a separate issue and discussed in
detail before we decide we can make it.

> Skipping regexp matching entirely, though, will make this benchmark
> farther removed from real-life usage: this thread started from being
> able to handle multiple ignore entries when listing files (e.g. in a
> project).

Agreed.  From my POV, that variant's purpose was only to show how much
time is spent in matching file names against some include or exclude
list.

> So any solution for that (whether we use it on all or just
> some platforms) needs to be able to handle those. And it doesn't seem
> like directory-files-recursively has any alternative solution for that
> other than calling string-match on every found file.

There's a possibility of pushing this filtering into
file-name-all-completions, but I'm not sure that will be faster.  We
should try that and measure the results, I think.

> > If we forget about GC, Elisp version can get fairly close to GNU find.
> > And if we do not perform regexp matching (which makes sense when the
> > REGEXP is ""), Elisp version is faster.
>
> We can't really forget about GC, though.

But we could temporarily lift the threshold while this function runs,
if that leads to significant savings.
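
As a minimal sketch of the idea just mentioned (temporarily lifting the GC threshold while the function runs), one could let-bind `gc-cons-threshold' around the call.  The helper name `my/list-files-with-relaxed-gc' and the choice of `most-positive-fixnum' as the temporary value are assumptions made for illustration, not something proposed in this thread; whether it leads to significant savings would still have to be measured.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/list-files-with-relaxed-gc (dir regexp)
  "Return files under DIR whose names match REGEXP.
While the listing runs, `gc-cons-threshold' is bound to a very large
value (an assumed choice), so garbage collection is unlikely to be
triggered by consing.  The previous value is restored when the `let'
form exits, including on a non-local exit."
  (let ((gc-cons-threshold most-positive-fixnum))
    (directory-files-recursively dir regexp)))
```

For example, (my/list-files-with-relaxed-gc "~/src/emacs" "") would list every file under that (assumed) directory.  The garbage produced during the call is not avoided, only collected later, so memory use grows for the duration of the listing.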

> But the above numbers make me hopeful about the async-parallel solution,
> implying that the parallelization really can help (and offset whatever
> latency we lose on pselect), as soon as we determine the source of extra
> consing and decide what to do about it.

Isn't it clear that additional consing comes from the fact that we
first insert the Find's output into a buffer or produce a string from
it, and then chop that into individual file names?
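
To picture the consing in question, here is a rough sketch of the "chop" step.  It assumes the Find output is newline-separated (as with `find -print') and has already been inserted into the current buffer; the function name `my/find-output-to-file-names' is hypothetical and this is not the code used by Emacs or by any patch in this thread.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/find-output-to-file-names ()
  "Return the newline-separated file names in the current buffer as a list."
  (let (names)
    (goto-char (point-min))
    (while (not (eobp))
      (let ((start (point)))
        ;; Move to the end of the current file name.
        (skip-chars-forward "^\n")
        ;; Each name is copied out of the buffer into a fresh string,
        ;; and `push' allocates one more cons cell per name.
        (push (buffer-substring-no-properties start (point)) names)
        ;; Step over the newline separator, if there is one.
        (unless (eobp)
          (forward-char 1))))
    (nreverse names)))

;; Usage: the text exists once in the buffer and once more as strings.
(with-temp-buffer
  (insert "./a.el\n./b/c.el\n")
  (my/find-output-to-file-names))   ; => ("./a.el" "./b/c.el")
```

In this sketch every file name is held twice for a while, once as buffer text and once as a separate string, which is the kind of additional allocation the question above refers to.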

This bug report was last modified 1 year and 327 days ago.

