GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
> Date: Thu, 27 Jul 2023 03:41:29 +0300
> Cc: Eli Zaretskii <eliz <at> gnu.org>, luangruo <at> yahoo.com, sbaugh <at> janestreet.com,
> 64735 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
> > calls + bypassing regexp matches when REGEXP is nil.
>
> Sounds good. I haven't examined the diff closely, but it sounds like an
> improvement that can be applied irrespective of how this discussion ends.
That change should be submitted as a separate issue and discussed in
detail before we decide we can make it.
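A common way `nconc` ends up costing O(N^2) is appending each new piece to the end of a growing result. Every such call must first walk the whole first list to find its tail. The Python sketch below illustrates that general pattern; it is not the patch under discussion. It models Lisp conses with a hypothetical minimal `Cons` class and counts pointer-chasing steps for tail appends against the usual "push, then reverse once" idiom (`push` + `nreverse` in Elisp).

```python
class Cons:
    """Hypothetical minimal cons cell standing in for a Lisp cons."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def to_list(cell):
    """Convert a chain of Cons cells into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def collect_with_nconc(items):
    """Append each item at the tail, as (nconc result (list item)) would.
    Each append first walks the whole list, so the total work is O(N^2)."""
    steps = 0
    result = None
    for item in items:
        cell = Cons(item)
        if result is None:
            result = cell
            continue
        tail = result
        while tail.cdr is not None:  # find the last cell
            tail = tail.cdr
            steps += 1
        tail.cdr = cell  # destructive append, like nconc
    return result, steps


def collect_with_push(items):
    """Push each item onto the front, then reverse once in place: O(N)."""
    steps = 0
    result = None
    for item in items:
        result = Cons(item, result)
        steps += 1
    prev = None
    while result is not None:  # in-place reversal, like nreverse
        nxt = result.cdr
        result.cdr = prev
        prev = result
        result = nxt
        steps += 1
    return prev, steps


if __name__ == "__main__":
    _, quadratic = collect_with_nconc(range(1000))
    _, linear = collect_with_push(range(1000))
    print(quadratic, linear)  # 498501 2000
```

Both functions build the same list. For 1000 items, the tail-append version already performs roughly 250 times as many steps, and the gap grows linearly with N.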
> Skipping regexp matching entirely, though, will make this benchmark
> farther removed from real-life usage: this thread started from being
> able to handle multiple ignore entries when listing files (e.g. in a
> project).
Agreed. From my POV, that variant's purpose was only to show how much
time is spent in matching file names against some include or exclude
list.
> So any solution for that (whether we use it on all or just
> some platforms) needs to be able to handle those. And it doesn't seem
> like directory-files-recursively has any alternative solution for that
> other than calling string-match on every found file.
There's a possibility of pushing this filtering into
file-name-all-completions, but I'm not sure that will be faster. We
should try that and measure the results, I think.
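Whichever layer does the filtering, each found file still needs at least one match against the ignore list. The sketch below assumes glob-style ignore entries matched against base names (both assumptions). It compiles all entries into a single alternation, so every file costs one regexp match instead of one per entry. It is written in Python with the standard `fnmatch.translate` and `re` modules. `build_ignore_matcher` and `filter_files` are hypothetical names, not Emacs functions.

```python
import re
from fnmatch import translate


def build_ignore_matcher(globs):
    """Compile several glob-style ignore entries into one regexp, so each
    file name costs a single match instead of one match per entry."""
    if not globs:
        return lambda name: False
    # translate() turns "*.o" into an end-anchored regexp; join them as alternatives.
    combined = re.compile("|".join("(?:%s)" % translate(g) for g in globs))
    return lambda name: combined.match(name) is not None


def filter_files(names, globs):
    """Keep only the names (assumed to be base names) that match no ignore entry."""
    is_ignored = build_ignore_matcher(globs)
    return [name for name in names if not is_ignored(name)]


print(filter_files(["main.c", "main.o", "lib.elc", "node_modules"],
                   ["*.o", "*.elc", "node_modules"]))  # ['main.c']
```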
> > If we forget about GC, Elisp version can get fairly close to GNU find.
> > And if we do not perform regexp matching (which makes sense when the
> > REGEXP is ""), Elisp version is faster.
>
> We can't really forget about GC, though.
But we could temporarily raise the GC threshold while this function
runs, if that leads to significant savings.
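In Emacs Lisp, this is usually done by let-binding `gc-cons-threshold` to a large value around the call, so the old value is restored automatically on exit. Python's memory management differs: reference counting frees most objects immediately, and `gc.disable()` only pauses the cyclic collector. The sketch below is therefore only an analogy for the "lift it temporarily, always restore it" pattern. `gc_paused` and `collect_names` are hypothetical names.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily switch off Python's cyclic garbage collector and restore
    its previous state afterwards, even if the body raises."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def collect_names(n):
    """Allocation-heavy stand-in for building a long list of file names."""
    with gc_paused():
        return ["dir/file%d" % i for i in range(n)]
```

Restoring the previous state in `finally` mirrors what a `let` binding gives for free in Elisp: the setting cannot leak out of the function even on a non-local exit.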
> But the above numbers make me hopeful about the async-parallel solution,
> implying that the parallelization really can help (and offset whatever
> latency we lose on pselect), as soon as we determine the source of extra
> consing and decide what to do about it.
Isn't it clear that the additional consing comes from the fact that we
first insert Find's output into a buffer (or produce a string from it),
and then chop that into individual file names?
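The double allocation described above is visible even in a minimal pipeline. The whole output of `find` exists first as one large object, and splitting it then allocates one more object per file name. The Python sketch below illustrates this. It assumes GNU find is on `PATH` and uses its `-print0` action so that names containing newlines split correctly. `split_find_output` and `list_files` are hypothetical names.

```python
import subprocess


def split_find_output(raw):
    """Chop NUL-separated `find -print0` output into individual file names.
    RAW already holds the whole output as one bytes object; the split
    allocates one more object per name, i.e. a second copy of the data."""
    return [part.decode("utf-8", "surrogateescape")
            for part in raw.split(b"\0") if part]


def list_files(root):
    """Run find (assumed to be GNU find on PATH) and return regular files under ROOT."""
    raw = subprocess.run(["find", root, "-type", "f", "-print0"],
                         capture_output=True, check=True).stdout
    return split_find_output(raw)
```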
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.