GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

From: Eli Zaretskii <eliz <at> gnu.org>
To: sbaugh <at> catern.com
Cc: sbaugh <at> janestreet.com, yantar92 <at> posteo.net, rms <at> gnu.org, dmitry <at> gutov.dev, michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 20:46:01 +0300
> From: sbaugh <at> catern.com
> Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
> Cc: sbaugh <at> janestreet.com, yantar92 <at> posteo.net, rms <at> gnu.org, dmitry <at> gutov.dev,
> 	michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
> 
> First my results:
> 
> (my-bench 100 "~/public_html" "")
> (("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
>  ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
> 
> (my-bench 10 "~/.local/src/linux" "")
> (("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
>  ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
> 
> (my-bench 100 "/ssh:catern.com:~/public_html" "")
> (("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
>  ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
> 
> 2x speedup on local files, and almost a 10x speedup for remote files.
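The speedup factors in that summary line can be computed from the elapsed times quoted above by dividing the built-in time by the with-find time. `my-bench-timings` and `my-bench-speedups` are hypothetical names used only for this illustration. By this measure the local runs are roughly 1.6x to 1.8x faster and the remote run is just under 8x faster:

```emacs-lisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helper: speedup ratios computed from the "Elapsed time"
;; figures in the benchmark output above.
(defconst my-bench-timings
  '(("~/public_html"                 1.140173  0.643306)
    ("~/.local/src/linux"            2.402341  1.544024)
    ("/ssh:catern.com:~/public_html" 36.494233 4.619035))
  "Each entry is (DIRECTORY BUILT-IN-SECONDS WITH-FIND-SECONDS).")

(defun my-bench-speedups ()
  "Return an alist of (DIRECTORY . SPEEDUP) built from `my-bench-timings'."
  (mapcar (lambda (entry)
            ;; built-in time divided by with-find time
            (cons (nth 0 entry) (/ (nth 1 entry) (nth 2 entry))))
          my-bench-timings))

;; (my-bench-speedups) yields ratios of about 1.77, 1.56 and 7.90,
;; in that order.
```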

Thanks, that's impressive.  But you omitted some of the features of
directory-files-recursively, see below.

> And my implementation *isn't even using the fact that find can run in
> parallel with Emacs*.  If I did start using that, I expect even more
> speed gains from parallelism, which aren't achievable in Emacs itself.

I'm not sure I understand what you mean by "in parallel" and why it
would be faster.

> So can we add something like this (with the appropriate fallbacks to
> directory-files-recursively), since it has such a big speedup even
> without parallelism?

We can have an alternative implementation, yes.  But it should support
predicate, and it should sort the files in each directory like
directory-files-recursively does, so that it's a drop-in replacement.
Also, I believe that Find does return "." in each directory, and your
implementation doesn't filter them, whereas
directory-files-recursively does AFAIR.
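Here is a minimal sketch of that post-processing, assuming find is run with -print0. It splits the output on NUL bytes, drops any "." or ".." entries, and sorts the names within each directory. The function names are hypothetical. The ordering is only an approximation, because directory-files-recursively (in the Emacs 29 sources) appears to list the contents of subdirectories before the plain files of their parent:

```emacs-lisp
;;; -*- lexical-binding: t -*-
(require 'seq)

(defun my-find-parse-output (output)
  "Split OUTPUT of `find ... -print0' into a list of file names.
Entries whose last component is \".\" or \"..\" are dropped, since
`directory-files-recursively' never returns them."
  (seq-remove (lambda (file)
                (member (file-name-nondirectory file) '("." "..")))
              ;; -print0 terminates every name with a NUL byte; OMIT-NULLS
              ;; discards the empty string after the final terminator.
              (split-string output "\0" t)))

(defun my-find-sort-per-directory (files)
  "Return FILES with the names in each directory sorted by `string<'.
Directories themselves are ordered by `string<' on their names.  This
does not reproduce the exact order of `directory-files-recursively'."
  (mapcan (lambda (group) (sort (cdr group) #'string<))
          ;; Group by parent directory, then order the groups.
          (sort (seq-group-by #'file-name-directory files)
                (lambda (a b) (string< (car a) (car b))))))
```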

And I see no need for any fallback: that's for the application to do
if it wants.

>   (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")

It should.
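An arbitrary Lisp predicate cannot be handed to find, so one option (an assumption, not something proposed in this thread) is to apply it to find's output afterwards. In directory-files-recursively, PREDICATE receives a directory name and decides whether to descend into it, so the hypothetical helper below keeps a file only when every directory between DIR and the file passes. This reproduces the result set but not the savings, since find still walks the pruned directories, and the special value t ("ignore errors") is not emulated:

```emacs-lisp
;;; -*- lexical-binding: t -*-
(require 'seq)

(defun my-find-filter-by-predicate (dir files predicate)
  "Return those FILES under DIR that PREDICATE would not have pruned.
FILES are assumed to be absolute names below the expanded DIR.  As in
`directory-files-recursively', PREDICATE is called with a directory
name and a nil result means \"do not descend\", so a file is kept only
if every directory strictly between DIR and the file satisfies it."
  (if (memq predicate '(nil t))
      ;; nil: no filtering; t only means "ignore errors" there.
      files
    (let ((root (directory-file-name (expand-file-name dir)))
          ;; Each directory is tested once, however many files it holds.
          (cache (make-hash-table :test #'equal)))
      (seq-filter
       (lambda (file)
         (let ((parent (directory-file-name (file-name-directory file)))
               (ok t))
           ;; Walk upwards; every step shortens PARENT, so this ends
           ;; once PARENT is ROOT (or above it).
           (while (and ok (> (length parent) (length root)))
             (setq ok (let ((seen (gethash parent cache 'unknown)))
                        (if (eq seen 'unknown)
                            (puthash parent (funcall predicate parent) cache)
                          seen))
                   parent (directory-file-name (file-name-directory parent))))
           ok))
       files))))
```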

> 	     (if follow-symlinks
> 		 '("-L")
> 	       '("!" "(" "-type" "l" "-xtype" "d" ")"))
> 	     (unless (string-empty-p regexp)
> 	       (list "-regex" (concat ".*" regexp ".*")))
> 	     (unless include-directories
> 	       '("!" "-type" "d"))
> 	     '("-print0")

Some of these switches are specific to GNU Find.  Are we going to
support only GNU Find?
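For reference, -xtype is a GNU findutils extension, and -regex is not in POSIX either (BSD find has its own variant with a different default regexp syntax). If only GNU find is supported, a caller could probe for it on the host that actually holds DIR. This is a sketch with a hypothetical function name. It assumes GNU find identifies itself as "GNU findutils" in its --version output, while other implementations reject that option:

```emacs-lisp
;;; -*- lexical-binding: t -*-
(defun my-find-gnu-p (dir)
  "Return non-nil if the `find' program on DIR's host is GNU find.
For a remote DIR, `process-file' lets Tramp run the probe remotely."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; GNU find prints e.g. "find (GNU findutils) 4.9.0" for --version;
      ;; a missing program signals an error, which counts as "no".
      (and (eql 0 (ignore-errors
                    (process-file "find" nil t nil "--version")))
           (progn (goto-char (point-min))
                  (search-forward "GNU findutils" nil t))
           t))))
```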

> 	     ))
> 	   (remote (file-remote-p dir))
> 	   (proc
> 	    (if remote
> 		(let ((proc (apply #'start-file-process
> 				   "find" (current-buffer) command)))
> 		  (set-process-sentinel proc (lambda (_proc _state)))
> 		  (set-process-query-on-exit-flag proc nil)
> 		  proc)
> 	      (make-process :name "find" :buffer (current-buffer)
> 			    :connection-type 'pipe
> 			    :noquery t
> 			    :sentinel (lambda (_proc _state))
> 			    :command command))))
>       (while (accept-process-output proc))

Why do you call accept-process-output here?  It could interfere with
reading output from async subprocesses running at the same time.  Come
to think of it, why use async subprocesses here and not
call-process?
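On the synchronous route: call-process always runs a local program, while its file-handler-aware counterpart process-file lets Tramp run the command on the remote host when default-directory is remote. One function can therefore cover both cases without a sentinel or an accept-process-output loop. This is a minimal sketch with a hypothetical name:

```emacs-lisp
;;; -*- lexical-binding: t -*-
(defun my-find-run (dir &rest args)
  "Run find synchronously in DIR with ARGS and return its standard output.
`process-file' is the file-handler-aware variant of `call-process',
so for a remote DIR Tramp runs find on the remote host."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; (t nil): stdout goes to this buffer, stderr is discarded.
      (let ((status (apply #'process-file "find" nil '(t nil) nil args)))
        ;; find exits with 1 when, e.g., a subdirectory is unreadable but
        ;; still prints everything else, so only a signal is fatal here.
        (unless (integerp status)
          (error "find was killed: %s" status))
        (buffer-string)))))
```

For example, (my-find-run "/ssh:catern.com:~/public_html" "." "-type" "f" "-print0") would return NUL-separated names of the form "./foo" relative to that directory.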




This bug report was last modified 1 year and 274 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.