GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

From: Michael Albinus <michael.albinus <at> gmx.de>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: yantar92 <at> posteo.net, rms <at> gnu.org, sbaugh <at> catern.com, dmitry <at> gutov.dev, Eli Zaretskii <eliz <at> gnu.org>, 64735 <at> debbugs.gnu.org
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sun, 23 Jul 2023 13:44:28 +0200
Spencer Baugh <sbaugh <at> janestreet.com> writes:

Hi Spencer,

> I mean having Emacs read output from the process and turn them into
> strings while find is still running and walking the directory tree.  So
> the two parts are running in parallel.  This, specifically:

Just as a proof of concept, I have modified your function slightly so
that it runs with both local and remote directories.

--8<---------------cut here---------------start------------->8---
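(defun find-directory-files-recursively (dir regexp &optional include-directories _predicate follow-symlinks)
  "Like `directory-files-recursively', but let find(1) walk the tree.
Output is read and split while find is still running."
  (let* (buffered
         result
         ;; make-process runs no shell, so "~" must be expanded here.
         (dir (expand-file-name dir))
         (remote (file-remote-p dir))
         ;; `:file-handler' looks up the handler for `default-directory',
         ;; so bind it to DIR to run find on the right (remote) host.
         (default-directory (file-name-as-directory dir))
         ;; For local directories, bypass file name handlers entirely.
         (file-name-handler-alist (and remote file-name-handler-alist))
         (proc
          (make-process
           :name "find" :buffer nil
           :connection-type 'pipe
           :noquery t
           :sentinel #'ignore
           :file-handler remote
           ;; Split each chunk on NUL.  A partial name at the end of a
           ;; chunk is kept in BUFFERED until the next chunk arrives.
           :filter (lambda (_proc data)
                     (let ((start 0))
                       (when-let ((end (string-search "\0" data start)))
                         (push (concat buffered (substring data start end)) result)
                         (setq buffered "")
                         (setq start (1+ end))
                         (while-let ((end (string-search "\0" data start)))
                           (push (substring data start end) result)
                           (setq start (1+ end))))
                       (setq buffered (concat buffered (substring data start)))))
           :command (append
                     '("find")
                     ;; -L is a global option and must precede the path.
                     (when follow-symlinks '("-L"))
                     (list (file-local-name dir))
                     ;; Without -L, omit symlinks to directories from the output.
                     (unless follow-symlinks
                       '("!" "(" "-type" "l" "-xtype" "d" ")"))
                     ;; -regex must match the whole path, hence the ".*".
                     ;; It must be a list: `append' would splice a bare
                     ;; string in as a sequence of characters.
                     (unless (string-empty-p regexp)
                       (list "-regex" (concat ".*" regexp ".*")))
                     (unless include-directories
                       '("!" "-type" "d"))
                     '("-print0")))))
    ;; Keep reading until find exits and its output is drained.
    (while (accept-process-output proc))
    (if remote (mapcar (lambda (file) (concat remote file)) result) result)))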
--8<---------------cut here---------------end--------------->8---
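The filter splits on NUL because -print0 terminates every name with a NUL byte, the only byte that cannot occur in a path, so a name containing a newline still arrives as one record. The shell sketch below illustrates this outside Emacs; count_files is a hypothetical helper, and it assumes a find that supports -print0 (GNU and BSD find do; a minimal BusyBox build may not).

```shell
#!/bin/sh
# Sketch: count non-directory entries under a directory using the
# NUL-terminated records of `find -print0`.
# count_files is a hypothetical helper, not part of Emacs or findutils.
count_files() {
    # tr keeps only the NUL terminators and wc -c counts them; the last
    # tr strips the padding some wc implementations print.
    find "$1" ! -type d -print0 | tr -cd '\000' | wc -c | tr -d ' '
}

# Usage: a file name containing a newline is still counted once.
demo=$(mktemp -d)
touch "$demo/plain" "$demo/$(printf 'with\nnewline')"
echo "NUL-separated count: $(count_files "$demo")"                          # 2
echo "newline-separated count: $(find "$demo" ! -type d | wc -l | tr -d ' ')" # 3
rm -rf "$demo"
```

Splitting on newlines miscounts as soon as one name contains a newline, which is why the function above asks for -print0.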

On my laptop, comparing it with the built-in implementation gives:

--8<---------------cut here---------------start------------->8---
(my-bench 100 "~/src/tramp" "")
(("built-in" . "Elapsed time: 99.177562s (3.403403s in 107 GCs)")
 ("with-find" . "Elapsed time: 83.432360s (2.820053s in 98 GCs)"))

(my-bench 100 "/ssh:remotehost:~/src/tramp" "")
(("built-in" . "Elapsed time: 128.406359s (34.981183s in 1850 GCs)")
 ("with-find" . "Elapsed time: 82.765064s (4.155410s in 163 GCs)"))
--8<---------------cut here---------------end--------------->8---

Of course, the other problems remain. For example, you cannot know
whether find on a given host (local or remote) supports all the
arguments used above. On my NAS, for example, find is BusyBox, whose
help text lists none of -L, -regex, -xtype or -print0:

--8<---------------cut here---------------start------------->8---
[~] # find -h
BusyBox v1.01 (2022.10.27-23:57+0000) multi-call binary

Usage: find [PATH...] [EXPRESSION]

Search for files in a directory hierarchy.  The default PATH is
the current directory; default EXPRESSION is '-print'

EXPRESSION may consist of:
	-follow		Dereference symbolic links.
	-name PATTERN	File name (leading directories removed) matches PATTERN.
	-print		Print (default and assumed).

	-type X		Filetype matches X (where X is one of: f,d,l,b,c,...)
	-perm PERMS	Permissions match any of (+NNN); all of (-NNN);
			or exactly (NNN)
	-mtime TIME	Modified time is greater than (+N); less than (-N);
			or exactly (N) days
--8<---------------cut here---------------end--------------->8---
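One way to cope with such differences would be to probe find before relying on a primary. The sketch below is a hypothetical helper, not part of Emacs or Tramp. It probes the local find only (a remote host would need the same check run over its connection), and it assumes that find exits with a non-zero status when it rejects an expression and that mktemp -d is available.

```shell
#!/bin/sh
# Sketch: check whether the local find accepts an expression.
# find_supports is a hypothetical helper, not part of Emacs or Tramp.
# It runs find over an empty scratch directory, so the probe is cheap,
# and relies on find exiting non-zero when it rejects a primary.
find_supports() {
    scratch=$(mktemp -d) || return 2
    find "$scratch" "$@" >/dev/null 2>&1
    rc=$?
    rmdir "$scratch"
    return "$rc"
}

# Report on some primaries used by find-directory-files-recursively.
probe() {
    if find_supports "$@"; then
        echo "supported:   $*"
    else
        echo "unsupported: $*"
    fi
}

probe -print0
probe -regex '.*'
probe -xtype d
# -L goes before the path rather than in the expression, so it would
# need a separate check of the form: find -L DIR
```

On a host where a probe fails, a caller could fall back to the built-in implementation instead of calling find.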

Best regards, Michael.





GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.