GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores
Message #302 received at 64735 <at> debbugs.gnu.org:
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: sbaugh <at> catern.com, yantar92 <at> posteo.net, rms <at> gnu.org,
> dmitry <at> gutov.dev, michael.albinus <at> gmx.de, 64735 <at> debbugs.gnu.org
> Date: Sat, 22 Jul 2023 16:53:05 -0400
>
> Can you try this further change on your Windows (and GNU/Linux) box? I
> just tested on a different box and my original change gets:
>
> (("built-in" . "Elapsed time: 4.506643s (2.276269s in 21 GCs)")
> ("with-find" . "Elapsed time: 4.114531s (2.848497s in 27 GCs)"))
>
> while this parallel implementation gets
>
> (("built-in" . "Elapsed time: 4.479185s (2.236561s in 21 GCs)")
> ("with-find" . "Elapsed time: 2.858452s (1.934647s in 19 GCs)"))
>
> so it might have a favorable impact on Windows and your other GNU/Linux
> box.
Almost no effect here on MS-Windows:
(("built-in" . "Elapsed time: 0.859375s (0.093750s in 4 GCs)")
("with-find" . "Elapsed time: 8.437500s (0.078125s in 4 GCs)"))
It was 8.578 sec with the previous version.
(The Lisp version is somewhat faster in this test because I
native-compiled its code beforehand.)
On GNU/Linux:
(("built-in" . "Elapsed time: 4.244898s (1.934182s in 56 GCs)")
("with-find" . "Elapsed time: 3.011574s (1.190498s in 35 GCs)"))
Faster by 10% (previous version yielded 3.327 sec).
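The timings above are reported as `Elapsed time: ...s (...s in N GCs)` strings. The thread does not say how they were collected, so the following is only one plausible way to produce such an alist, assuming `benchmark-run` from benchmark.el. The helper name `my-compare-timings` is hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(require 'benchmark)

(defun my-compare-timings (alist)
  "Run each (NAME . FUNCTION) in ALIST once and report its timing.
Return an alist of (NAME . \"Elapsed time: ...\") entries.
Hypothetical helper, for illustration only."
  (mapcar
   (lambda (entry)
     ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED).
     (let ((r (benchmark-run 1 (funcall (cdr entry)))))
       (cons (car entry)
             (format "Elapsed time: %fs (%fs in %d GCs)"
                     (nth 0 r) (nth 2 r) (nth 1 r)))))
   alist))
```

To compare the two implementations, the FUNCTIONs would be closures that call `directory-files-recursively` and the patch's `find-directory-files-recursively` on the same directory tree.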
Btw, I needed to fix the code: when-let needs 2 open parens after it,
not one. The original code signals an error from the filter function
in Emacs 29.
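For reference, `when-let` takes a list of bindings as its first argument, so even a single binding is wrapped in a second pair of parentheses. A minimal sketch of that form; the helper `my-before-first-nul` is hypothetical and not part of the patch under discussion.

```elisp
;;; -*- lexical-binding: t -*-
(require 'subr-x)

(defun my-before-first-nul (s)
  "Return the part of S before its first NUL byte, or nil if S has none.
Hypothetical helper, only here to show the `when-let' syntax."
  ;; SPEC is a list of (VAR VALUE) bindings, hence the two open parens.
  ;; If any VALUE is nil, the body is skipped and nil is returned.
  (when-let ((pos (string-search "\0" s)))
    (substring s 0 pos)))
```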
> >> (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
> >
> > It should.
>
> This is where I think a fallback would be useful - it's basically
> impossible to support arbitrary predicates efficiently here, since it
> requires us to put Lisp in control of whether find descends into a
> directory.
There's nothing wrong with supporting this less efficiently.
And there's no need to control where Find descends: you could just
filter out the files from those directories that need to be ignored.
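A minimal sketch of that post-filtering approach, assuming find has already returned absolute file names under DIR. The name `my-filter-by-predicate` is hypothetical. It calls PREDICATE once per intermediate directory name (without a trailing slash, which is assumed to match what `directory-files-recursively` passes) and drops every file below a directory the predicate rejects. It does not handle the special PREDICATE values nil and t, and find still walks the ignored directories, so it trades speed for compatibility.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)
(require 'subr-x)

(defun my-filter-by-predicate (dir files predicate)
  "Return FILES under DIR whose intermediate directories all pass PREDICATE.
FILES are absolute names below DIR, e.g. as printed by find.
Hypothetical helper, for illustration only."
  (let ((root (directory-file-name dir))
        (cache (make-hash-table :test #'equal)))
    (cl-remove-if-not
     (lambda (file)
       (let* ((rel (string-remove-prefix (file-name-as-directory root) file))
              ;; Directory components between ROOT and FILE (drop FILE's own name).
              (parts (butlast (split-string rel "/" t)))
              (ancestor root))
         (cl-every
          (lambda (part)
            (setq ancestor (concat (file-name-as-directory ancestor) part))
            ;; Call PREDICATE at most once per directory.
            (let ((hit (gethash ancestor cache 'unset)))
              (if (eq hit 'unset)
                  (puthash ancestor (funcall predicate ancestor) cache)
                hit)))
          parts)))
     files)))
```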
> So I'm thinking I would just fall back to running the old
> directory-files-recursively whenever there's a predicate. Or just not
> supporting this at all...
We cannot leave it unsupported, because then it would not be a
replacement. Fallback is okay, though I'd prefer a self-contained
function.
> >> (if follow-symlinks
> >> '("-L")
> >> '("!" "(" "-type" "l" "-xtype" "d" ")"))
> >> (unless (string-empty-p regexp)
> >> "-regex" (concat ".*" regexp ".*"))
> >> (unless include-directories
> >> '("!" "-type" "d"))
> >> '("-print0")
> >
> > Some of these switches are specific to GNU Find. Are we going to
> > support only GNU Find?
>
> POSIX find doesn't support -regex, so I think we have to. We could
> stick to just POSIX find if we only allowed globs in
> find-directory-files-recursively, instead of full regexes.
The latter would again be incompatible with
directory-files-recursively, so it isn't TRT, IMO.
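A minimal sketch of building such a GNU find argument list, with the hypothetical helper name `my-find-args`. Note that `unless` returns only the value of its last body form, so, as quoted, the `"-regex"` fragment would drop the `-regex` word itself; wrapping both words in `list` keeps them together. The sketch also assumes `-L` goes before the starting directory, as GNU find expects.

```elisp
;;; -*- lexical-binding: t -*-
(require 'subr-x)

(defun my-find-args (dir regexp include-directories follow-symlinks)
  "Build a GNU find argument list for DIR.
Hypothetical helper, for illustration only."
  (append
   ;; -L is a global option and must precede the starting directory.
   (when follow-symlinks '("-L"))
   (list dir)
   ;; Without -L, leave symlinks to directories out of the output (GNU -xtype).
   (unless follow-symlinks '("!" "(" "-type" "l" "-xtype" "d" ")"))
   ;; Both words in one list; a bare "-regex" body form would be discarded.
   (unless (string-empty-p regexp)
     (list "-regex" (concat ".*" regexp ".*")))
   (unless include-directories '("!" "-type" "d"))
   '("-print0")))
```

The result can be passed to a process call such as `(apply #'process-file "find" nil t nil args)`.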
One other subtlety is non-ASCII file names: you use the -print0 switch
to Find, which produces null bytes, and those could inhibit decoding of
non-ASCII characters. So you may need to bind
inhibit-null-byte-detection to a non-nil value to get correctly
decoded file names from Find.
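A minimal sketch of that suggestion, assuming `find` is on `exec-path` and ARGS end in `-print0`. The helper name `my-find-files` is hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-find-files (args)
  "Run find with ARGS and return the NUL-separated names it prints.
ARGS should end in \"-print0\".  Hypothetical helper, for illustration only."
  (with-temp-buffer
    ;; Without this binding, the NUL separators can make coding-system
    ;; detection treat the output as binary, leaving non-ASCII names undecoded.
    (let ((inhibit-null-byte-detection t))
      (apply #'process-file "find" nil t nil args))
    ;; Split on the NUL separator and drop the trailing empty string.
    (split-string (buffer-string) "\0" t)))
```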