GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92


From: sbaugh <at> catern.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, yantar92 <at> posteo.net, rms <at> gnu.org, dmitry <at> gutov.dev, michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: sbaugh <at> catern.com
>> Date: Sat, 22 Jul 2023 10:38:37 +0000 (UTC)
>> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, dmitry <at> gutov.dev,
>> 	yantar92 <at> posteo.net, michael.albinus <at> gmx.de, rms <at> gnu.org,
>> 	64735 <at> debbugs.gnu.org
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> > No, the first step is to use in Emacs what Find does today, because it
>> > will already be a significant speedup.
>> 
>> Why bother?  directory-files-recursively is a rarely used API, as you
>> have mentioned before in this thread.
>
> Because we could then use it much more (assuming the result will be
> performant enough -- this remains to be seen).
>
>> And there is a way to speed it up which will have a performance boost
>> which is unbeatable any other way: Use find instead of
>> directory-files-recursively, and operate on files as find prints
>> them.
>
> Not every command can operate on the output sequentially: some need to
> see all of the output, others will need to be redesigned and
> reimplemented to support such sequential mode.
>
> Moreover, piping from Find incurs overhead: data is broken into blocks
> by the pipe or PTY, reading the data can be slowed down if Emacs is
> busy processing something, etc.

I went ahead and implemented it, and I get close to a 2x speedup on
local files even *without* running find in parallel with Emacs.

First my results:

(my-bench 100 "~/public_html" "")
(("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
 ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))

(my-bench 10 "~/.local/src/linux" "")
(("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
 ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))

(my-bench 100 "/ssh:catern.com:~/public_html" "")
(("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
 ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))

Roughly a 1.6x to 1.8x speedup on local files, and about an 8x speedup
for remote files.
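
For reference, the ratios implied by the timings above can be computed
directly.  This is only a small sketch; my-speedup and
my-speedup-table are illustrative names, and the numbers are the
total elapsed times copied from the results (GC time included).

```elisp
(defun my-speedup (old new)
  "Return how many times faster NEW seconds is than OLD seconds."
  (/ (float old) new))

(defun my-speedup-table (rows)
  "Map each (LABEL OLD NEW) in ROWS to (LABEL . \"N.NNx\")."
  (mapcar (lambda (row)
            (cons (nth 0 row)
                  (format "%.2fx" (my-speedup (nth 1 row) (nth 2 row)))))
          rows))

;; Elapsed times (built-in, with-find) taken from the benchmark output.
(my-speedup-table
 '(("~/public_html"                 1.140173 0.643306)
   ("~/.local/src/linux"            2.402341 1.544024)
   ("/ssh:catern.com:~/public_html" 36.494233 4.619035)))
;; => (("~/public_html" . "1.77x")
;;     ("~/.local/src/linux" . "1.56x")
;;     ("/ssh:catern.com:~/public_html" . "7.90x"))
```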

And my implementation *isn't even using the fact that find can run in
parallel with Emacs*.  If I did start using that, I expect even more
speed gains from parallelism, which aren't achievable in Emacs itself.
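
A rough sketch of what consuming find's output in parallel might look
like, assuming a local directory only.  It is not part of the
implementation below; my-find-split-records and my-find-stream are
hypothetical names.  Since a pipe delivers output in arbitrary blocks,
the filter carries any partial file name over to the next block.

```elisp
(defun my-find-split-records (pending chunk)
  "Split PENDING followed by CHUNK into NUL-terminated records.
Return (RECORDS . REST): RECORDS holds the complete file names in
order, REST is the trailing partial name to carry into the next call."
  (let ((parts (split-string (concat pending chunk) "\0")))
    (cons (butlast parts) (car (last parts)))))

(defun my-find-stream (dir callback)
  "Run find on local DIR and call CALLBACK with each file name as it arrives.
Return the process object; the caller can keep working while find runs."
  (let ((proc (make-process
               :name "find" :buffer nil
               :connection-type 'pipe :noquery t
               :command (list "find" (expand-file-name dir)
                              "!" "-type" "d" "-print0")
               :filter
               (lambda (proc chunk)
                 (let ((split (my-find-split-records
                               (or (process-get proc 'pending) "") chunk)))
                   ;; Keep the incomplete tail for the next chunk.
                   (process-put proc 'pending (cdr split))
                   (mapc (process-get proc 'callback) (car split)))))))
    ;; Stored on the process so the filter needs no lexical closure.
    (process-put proc 'callback callback)
    proc))

;; Possible usage: print each name as soon as find reports it.
;; (my-find-stream "~/public_html" (lambda (file) (message "%s" file)))
```

The callback runs whenever Emacs reads process output, so a command
that needs the complete list would still have to wait for the process
to exit.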

So can we add something like this (with the appropriate fallbacks to
directory-files-recursively), since it has such a big speedup even
without parallelism?
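
One possible shape for such a fallback, as a sketch only:
my-directory-files-recursively is a hypothetical wrapper name, and
the conditions it checks (no predicate, a find executable on the host
that owns DIR) are assumptions about when the find-based version is
usable.  It relies on the REMOTE argument of executable-find, which
exists since Emacs 27.

```elisp
(defun my-directory-files-recursively
    (dir regexp &optional include-directories predicate follow-symlinks)
  "Like `directory-files-recursively', but use find when that looks safe.
Fall back to the built-in when PREDICATE is given, when the
find-based function is not defined, or when find is not installed."
  (if (and (null predicate)
           (fboundp 'find-directory-files-recursively)
           ;; With REMOTE non-nil, `executable-find' searches the host
           ;; that `default-directory' points at.
           (let ((default-directory
                  (file-name-as-directory (expand-file-name dir))))
             (executable-find "find" t)))
      (find-directory-files-recursively
       dir regexp include-directories nil follow-symlinks)
    (directory-files-recursively
     dir regexp include-directories predicate follow-symlinks)))
```

Callers that depend on the sorted order of the built-in would need to
sort the result themselves, since find reports files in traversal
order.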

My implementation and benchmarking:

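(require 'cl-lib)   ; cl-assert
(require 'ert)      ; should, used by my-bench below

(defun find-directory-files-recursively (dir regexp &optional include-directories predicate follow-symlinks)
  "Like `directory-files-recursively', but list the files by running find.
PREDICATE must be nil; arbitrary predicates are not supported."
  (cl-assert (null predicate) t "find-directory-files-recursively can't accept arbitrary predicates")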
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
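	    (append
	     ;; -L is a global option, so it must precede the starting point.
	     (list "find")
	     (when follow-symlinks '("-L"))
	     (list (file-local-name dir))
	     ;; When not following symlinks, skip links that point to directories.
	     (unless follow-symlinks
	       '("!" "(" "-type" "l" "-xtype" "d" ")"))
	     ;; `append' needs a list here; a bare string would be spliced
	     ;; in as a list of character codes.
	     (unless (string-empty-p regexp)
	       (list "-regex" (concat ".*" regexp ".*")))
	     (unless include-directories
	       '("!" "-type" "d"))
	     '("-print0")))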
	   (remote (file-remote-p dir))
	   (proc
	    (if remote
		(let ((proc (apply #'start-file-process
				   "find" (current-buffer) command)))
		  (set-process-sentinel proc (lambda (_proc _state)))
		  (set-process-query-on-exit-flag proc nil)
		  proc)
	      (make-process :name "find" :buffer (current-buffer)
			    :connection-type 'pipe
			    :noquery t
			    :sentinel (lambda (_proc _state))
			    :command command))))
      (while (accept-process-output proc))
      (let ((start (goto-char (point-min))) ret)
	(while (search-forward "\0" nil t)
	  (push (concat remote (buffer-substring-no-properties start (1- (point)))) ret)
	  (setq start (point)))
	ret))))

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  (let ((old (directory-files-recursively path regexp))
	(new (find-directory-files-recursively path regexp)))
    (dolist (path old)
      (should (member path new)))
    (dolist (path new)
      (should (member path old))))
  (list
   (cons "built-in" (benchmark count (list 'directory-files-recursively path regexp)))
   (cons "with-find" (benchmark count (list 'find-directory-files-recursively path regexp)))))




This bug report was last modified 1 year and 274 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.