GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
> From: sbaugh <at> catern.com
> Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
> Cc: sbaugh <at> janestreet.com, yantar92 <at> posteo.net, rms <at> gnu.org, dmitry <at> gutov.dev,
> michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
>
> First my results:
>
> (my-bench 100 "~/public_html" "")
> (("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
> ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
>
> (my-bench 10 "~/.local/src/linux" "")
> (("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
> ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
>
> (my-bench 100 "/ssh:catern.com:~/public_html" "")
> (("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
> ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
>
> 2x speedup on local files, and almost a 10x speedup for remote files.
Thanks, that's impressive. But you omitted some of the features of
directory-files-recursively, see below.
> And my implementation *isn't even using the fact that find can run in
> parallel with Emacs*. If I did start using that, I expect even more
> speed gains from parallelism, which aren't achievable in Emacs itself.
I'm not sure I understand what you mean by "in parallel" and why it
would be faster.
> So can we add something like this (with the appropriate fallbacks to
> directory-files-recursively), since it has such a big speedup even
> without parallelism?
We can have an alternative implementation, yes. But it should support
predicate, and it should sort the files in each directory like
directory-files-recursively does, so that it's a drop-in replacement.
Also, I believe that Find does return "." in each directory, and your
implementation doesn't filter them, whereas
directory-files-recursively does AFAIR.
And I see no need for any fallback: that's for the application to do
if it wants.
> (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
It should.
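
One possible way to honor a predicate on top of find output is to post-filter the list. The sketch below assumes the file names are absolute and located under the root directory; `my-filter-by-dir-predicate` is a hypothetical name, not part of Emacs. It keeps an entry only if every directory between the root and that entry satisfies the predicate, which mirrors how directory-files-recursively uses PREDICATE to decide whether to descend. It does not stop find from walking the rejected subtrees, so it saves no work, and it does not reproduce the special handling of a PREDICATE of t.

```elisp
(require 'cl-lib)
(require 'seq)

(defun my-filter-by-dir-predicate (root files predicate)
  "Keep FILES whose ancestor directories strictly below ROOT pass PREDICATE.
FILES must be absolute file names.  Hypothetical helper, for illustration."
  (let ((root (directory-file-name (expand-file-name root)))
        ;; Cache one verdict per directory so PREDICATE runs once per directory.
        (cache (make-hash-table :test #'equal)))
    (cl-labels
        ((dir-ok (dir)
           (cond
            ;; ROOT itself is never passed to PREDICATE.
            ((string= dir root) t)
            ;; Reached the top of the file system: DIR is not under ROOT.
            ((string= dir (directory-file-name (file-name-directory dir))) nil)
            (t
             (let ((cached (gethash dir cache 'unknown)))
               (if (not (eq cached 'unknown))
                   cached
                 (puthash dir
                          (and (dir-ok (directory-file-name
                                        (file-name-directory dir)))
                               (funcall predicate dir)
                               t)
                          cache)))))))
      (seq-filter (lambda (file)
                    (dir-ok (directory-file-name (file-name-directory file))))
                  files))))
```

A directory that the predicate rejects is itself still kept in the list (only its contents are dropped), which matches what directory-files-recursively does when INCLUDE-DIRECTORIES is non-nil.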
> (if follow-symlinks
> '("-L")
> '("!" "(" "-type" "l" "-xtype" "d" ")"))
> (unless (string-empty-p regexp)
> "-regex" (concat ".*" regexp ".*"))
> (unless include-directories
> '("!" "-type" "d"))
> '("-print0")
Some of these switches are specific to GNU Find. Are we going to
support only GNU Find?
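
For reference, -xtype and -regex in the quoted command are not specified by POSIX. If the GNU-only switches were kept, the caller would need some way to tell which find is installed. A minimal sketch of such a check follows, assuming that GNU find prints a banner containing "GNU findutils" for --version and that other implementations either fail or print something else; `my-gnu-find-p` is a hypothetical name.

```elisp
(defun my-gnu-find-p ()
  "Return t if the `find' program in the current context looks like GNU find.
Uses `process-file', so a remote `default-directory' probes the remote host.
Hypothetical helper, for illustration."
  (with-temp-buffer
    (and
     ;; A missing program signals an error; treat that as \"not GNU find\".
     (eq 0 (ignore-errors (process-file "find" nil t nil "--version")))
     (progn
       (goto-char (point-min))
       (search-forward "GNU findutils" nil t))
     t)))
```

The result depends on the host, so it would have to be checked per connection rather than once per session.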
> ))
> (remote (file-remote-p dir))
> (proc
> (if remote
> (let ((proc (apply #'start-file-process
> "find" (current-buffer) command)))
> (set-process-sentinel proc (lambda (_proc _state)))
> (set-process-query-on-exit-flag proc nil)
> proc)
> (make-process :name "find" :buffer (current-buffer)
> :connection-type 'pipe
> :noquery t
> :sentinel (lambda (_proc _state))
> :command command))))
> (while (accept-process-output proc))
Why do you call accept-process-output here? It could interfere with
reading output from async subprocesses running at the same time. Come
to think of it, why use async subprocesses here and not
call-process?
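
A minimal sketch of the synchronous alternative, under some assumptions: it uses `process-file`, the counterpart of call-process that honors file name handlers for a remote default-directory, and it keeps the -print0 switch from the quoted code, which may not be available in every find implementation. The name `my-find-files-sync` is hypothetical, and the sketch handles neither regexps, predicates, nor symlinks.

```elisp
(defun my-find-files-sync (dir)
  "Return the regular files under DIR, as names relative to DIR.
Runs find synchronously, so no other process output is read meanwhile.
Hypothetical helper, for illustration."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; BUFFER = t sends the program's output to the current (temp) buffer.
      (let ((status (process-file "find" nil t nil
                                  "." "-type" "f" "-print0")))
        (unless (eq status 0)
          (error "find exited with status %s" status))
        ;; File names are NUL-terminated; drop the empty trailing element.
        (split-string (buffer-string) "\0" t)))))
```

For example, (my-find-files-sync "~/public_html") returns names such as "./index.html", in whatever order find emits them.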
This bug report was last modified 1 year and 274 days ago.