GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

Full log


Message #302 received at 64735 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: yantar92 <at> posteo.net, rms <at> gnu.org, sbaugh <at> catern.com, dmitry <at> gutov.dev,
 michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
Subject: Re: bug#64735: 29.0.92; find invocations are ~15x slower because of
 ignores
Date: Sun, 23 Jul 2023 09:15:22 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: sbaugh <at> catern.com,  yantar92 <at> posteo.net,  rms <at> gnu.org,
>    dmitry <at> gutov.dev,  michael.albinus <at> gmx.de,  64735 <at> debbugs.gnu.org
> Date: Sat, 22 Jul 2023 16:53:05 -0400
> 
> Can you try this further change on your Windows (and GNU/Linux) box?  I
> just tested on a different box and my original change gets:
> 
> (("built-in" . "Elapsed time: 4.506643s (2.276269s in 21 GCs)")
>  ("with-find" . "Elapsed time: 4.114531s (2.848497s in 27 GCs)"))
> 
> while this parallel implementation gets
> 
> (("built-in" . "Elapsed time: 4.479185s (2.236561s in 21 GCs)")
>  ("with-find" . "Elapsed time: 2.858452s (1.934647s in 19 GCs)"))
> 
> so it might have a favorable impact on Windows and your other GNU/Linux
> box.

Almost no effect here on MS-Windows:

  (("built-in" . "Elapsed time: 0.859375s (0.093750s in 4 GCs)")
   ("with-find" . "Elapsed time: 8.437500s (0.078125s in 4 GCs)"))

It was 8.578 sec with the previous version.

(The Lisp version is somewhat faster in this test because I
native-compiled the code for this test.)

On GNU/Linux:

  (("built-in" . "Elapsed time: 4.244898s (1.934182s in 56 GCs)")
   ("with-find" . "Elapsed time: 3.011574s (1.190498s in 35 GCs)"))

Faster by 10% (previous version yielded 3.327 sec).

Btw, I needed to fix the code: when-let needs 2 open parens after it,
not one.  The original code signals an error from the filter function
in Emacs 29.

> >>   (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
> >
> > It should.
> 
> This is where I think a fallback would be useful - it's basically
> impossible to support arbitrary predicates efficiently here, since it
> requires us to put Lisp in control of whether find descends into a
> directory.

There's nothing wrong with supporting this less efficiently.

And there's no need to control where Find descends: you could just
filter out the files from those directories that need to be ignored.

> So I'm thinking I would just fall back to running the old
> directory-files-recursively whenever there's a predicate.  Or just not
> supporting this at all...

We cannot not support it at all, because then it will not be a
replacement.  Fallback is okay, though I'd prefer a self-contained
function.

> >> 	     (if follow-symlinks
> >> 		 '("-L")
> >> 	       '("!" "(" "-type" "l" "-xtype" "d" ")"))
> >> 	     (unless (string-empty-p regexp)
> >> 	       "-regex" (concat ".*" regexp ".*"))
> >> 	     (unless include-directories
> >> 	       '("!" "-type" "d"))
> >> 	     '("-print0")
> >
> > Some of these switches are specific to GNU Find.  Are we going to
> > support only GNU Find?
> 
> POSIX find doesn't support -regex, so I think we have to.  We could
> stick to just POSIX find if we only allowed globs in
> find-directory-files-recursively, instead of full regexes.

The latter would again be incompatible with
directory-files-recursively, so it isn't TRT, IMO.

One other subtlety is non-ASCII file names: you use -print0 switch to
Find, which produces null bytes, and those could inhibit decoding of
non-ASCII characters. So you may need to bind
inhibit-null-byte-detection to a non-nil value to get correctly
decoded file names you get from Find.




This bug report was last modified 1 year and 273 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.