GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
Previous Next
Full log
View this message in rfc822 format
On 26/07/2023 12:09, Ihor Radchenko wrote:
>> It's also something to note that, GC-wise, numbers 1 and 2 are not the
>> worst: the time must be spent somewhere else.
> Indeed. I did more detailed analysis in
> https://yhetil.org/emacs-devel/87cz0p2xlc.fsf <at> localhost/
>
> Main contributors in the lisp versions are (in the order from most
> significant to less significant) (1) file name handlers; (2) regexp
> matching of the file names; (3) nconc calls in the current
> `directory-files-recursively' implementation.
>
> I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
> calls + bypassing regexp matches when REGEXP is nil.
Sounds good. I haven't examined the diff closely, but it sounds like an
improvement that can be applied irrespective of how this discussion ends.
Skipping regexp matching entirely, though, will make this benchmark
farther removed from real-life usage: this thread started from being
able to handle multiple ignore entries when listing files (e.g. in a
project). So any solution for that (whether we use it on all or just
some platforms) needs to be able to handle those. And it doesn't seem
like directory-files-recursively has any alternative solution for that
other than calling string-match on every found file.
> Here are the results (using the attached modified version of your
> benchmark file):
>
> (my-bench 10 "/usr/src/linux/" "")
> (("built-in" . "Elapsed time: 7.285597s (3.853368s in 6 GCs)")
> ("built-in no filename handler alist" . "Elapsed time: 5.855019s (3.760662s in 6 GCs)")
> ("built-in non-recursive no filename handler alist" . "Elapsed time: 5.817639s (4.326945s in 7 GCs)")
> ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 2.708306s (1.871665s in 3 GCs)")
> ("with-find" . "Elapsed time: 6.082200s (4.262830s in 7 GCs)")
> ("with-find-p" . "Elapsed time: 4.325503s (3.058647s in 5 GCs)")
> ("with-find-sync" . "Elapsed time: 3.267648s (1.903655s in 3 GCs)"))
Nice.
> (let ((gc-cons-threshold most-positive-fixnum))
> (my-bench 10 "/usr/src/linux/" ""))
> (("built-in" . "Elapsed time: 2.754473s")
> ("built-in no filename handler alist" . "Elapsed time: 1.322443s")
> ("built-in non-recursive no filename handler alist" . "Elapsed time: 1.235044s")
> ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed time: 0.750275s")
> ("with-find" . "Elapsed time: 1.438510s")
> ("with-find-p" . "Elapsed time: 1.200876s")
> ("with-find-sync" . "Elapsed time: 1.349755s"))
>
> If we forget about GC, Elisp version can get fairly close to GNU find.
> And if we do not perform regexp matching (which makes sense when the
> REGEXP is ""), Elisp version is faster.
We can't really forget about GC, though.
But the above numbers make me hopeful about the async-parallel solution,
implying that the parallelization really can help (and offset whatever
latency we lose on pselect), as soon as we determine the source of extra
consing and decide what to do about it.
This bug report was last modified 1 year and 274 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.