GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

Full log


Message #302 received at 64735 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: yantar92 <at> posteo.net, rms <at> gnu.org, sbaugh <at> catern.com, dmitry <at> gutov.dev,
 michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
Subject: Re: bug#64735: 29.0.92; find invocations are ~15x slower because of
 ignores
Date: Sun, 23 Jul 2023 09:15:22 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: sbaugh <at> catern.com,  yantar92 <at> posteo.net,  rms <at> gnu.org,
>    dmitry <at> gutov.dev,  michael.albinus <at> gmx.de,  64735 <at> debbugs.gnu.org
> Date: Sat, 22 Jul 2023 16:53:05 -0400
> 
> Can you try this further change on your Windows (and GNU/Linux) box?  I
> just tested on a different box and my original change gets:
> 
> (("built-in" . "Elapsed time: 4.506643s (2.276269s in 21 GCs)")
>  ("with-find" . "Elapsed time: 4.114531s (2.848497s in 27 GCs)"))
> 
> while this parallel implementation gets
> 
> (("built-in" . "Elapsed time: 4.479185s (2.236561s in 21 GCs)")
>  ("with-find" . "Elapsed time: 2.858452s (1.934647s in 19 GCs)"))
> 
> so it might have a favorable impact on Windows and your other GNU/Linux
> box.

Almost no effect here on MS-Windows:

  (("built-in" . "Elapsed time: 0.859375s (0.093750s in 4 GCs)")
   ("with-find" . "Elapsed time: 8.437500s (0.078125s in 4 GCs)"))

It was 8.578 sec with the previous version.

(The Lisp version is somewhat faster in this test because I
native-compiled the code for this test.)

On GNU/Linux:

  (("built-in" . "Elapsed time: 4.244898s (1.934182s in 56 GCs)")
   ("with-find" . "Elapsed time: 3.011574s (1.190498s in 35 GCs)"))

Faster by 10% (previous version yielded 3.327 sec).

Btw, I needed to fix the code: when-let needs 2 open parens after it,
not one.  The original code signals an error from the filter function
in Emacs 29.

> >>   (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
> >
> > It should.
> 
> This is where I think a fallback would be useful - it's basically
> impossible to support arbitrary predicates efficiently here, since it
> requires us to put Lisp in control of whether find descends into a
> directory.

There's nothing wrong with supporting this less efficiently.

And there's no need to control where Find descends: you could just
filter out the files from those directories that need to be ignored.

> So I'm thinking I would just fall back to running the old
> directory-files-recursively whenever there's a predicate.  Or just not
> supporting this at all...

We cannot not support it at all, because then it will not be a
replacement.  Fallback is okay, though I'd prefer a self-contained
function.

> >> 	     (if follow-symlinks
> >> 		 '("-L")
> >> 	       '("!" "(" "-type" "l" "-xtype" "d" ")"))
> >> 	     (unless (string-empty-p regexp)
> >> 	       "-regex" (concat ".*" regexp ".*"))
> >> 	     (unless include-directories
> >> 	       '("!" "-type" "d"))
> >> 	     '("-print0")
> >
> > Some of these switches are specific to GNU Find.  Are we going to
> > support only GNU Find?
> 
> POSIX find doesn't support -regex, so I think we have to.  We could
> stick to just POSIX find if we only allowed globs in
> find-directory-files-recursively, instead of full regexes.

The latter would again be incompatible with
directory-files-recursively, so it isn't TRT, IMO.

One other subtlety is non-ASCII file names: you use -print0 switch to
Find, which produces null bytes, and those could inhibit decoding of
non-ASCII characters. So you may need to bind
inhibit-null-byte-detection to a non-nil value to get correctly
decoded file names you get from Find.




This bug report was last modified 16 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.