GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores
Several important commands and functions invoke find, for example rgrep
and project-find-regexp.
Most of these add a set of ignores to the find command (in rgrep's case,
taken from grep-find-ignored-files), so the find command looks like:
    find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \) \
        -prune -o -type f -print0
Alas, on my system, using GNU find, these ignores slow find down by
about 15x on a large directory tree, taking it from around 0.5 seconds
to 7.8 seconds.
This is very noticeable overhead; removing the ignores makes rgrep and
other find-invoking commands substantially faster for me.
The overhead is linear in the number of ignores: each additional ignore
adds a small fixed cost. This suggests that find scans the list of
ignores linearly, checking each one in turn, rather than combining them
into a single regexp and checking only that.
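One way to check the linear-cost claim is to time find with a growing
number of ignores that never match. Below is a minimal bash sketch,
assuming GNU find; the function names and the "no-such-dir-N" patterns
are hypothetical, chosen only so that the printed files stay the same
and any extra time comes from evaluating the additional tests:

```shell
#!/usr/bin/env bash
# Sketch: measure how find's run time grows with the number of -path ignores.
# The "no-such-dir-N" patterns are hypothetical and never match, so the set of
# files printed is unchanged; only the per-entry test cost grows.
set -euo pipefail

# Run find over directory $1 with $2 never-matching "-path" ignores, in the
# same "\( ... -o ... \) -prune -o -type f" shape as the real command.
find_with_n_ignores() {
  local dir=$1 n=$2 i
  local args=()
  for ((i = 0; i < n; i++)); do
    if ((i > 0)); then args+=(-o); fi
    args+=(-path "*/no-such-dir-$i/*")
  done
  if ((n == 0)); then
    find -H "$dir" -type f -print
  else
    find -H "$dir" \( "${args[@]}" \) -prune -o -type f -print
  fi
}

# Print the wall-clock time for a few list sizes; if each ignore adds a fixed
# cost, the times should grow roughly in proportion to N on top of the N=0 run.
time_ignores() {
  local dir=$1 n
  local TIMEFORMAT='%R s'   # output format of the bash "time" keyword
  for n in 0 10 20 40; do
    printf 'N=%d: ' "$n"
    time find_with_n_ignores "$dir" "$n" >/dev/null
  done
}
```

For example, "time_ignores ~/src/emacs" prints one wall-clock time per
list size; the actual numbers will depend on the tree and the machine.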
Obviously, GNU find should be optimizing this. However, they have
previously said they will not; I commented on bug
https://savannah.gnu.org/bugs/index.php?58197 to ask them to reconsider.
Hopefully, as a fellow GNU project, they will be interested in helping
us.
Within Emacs alone, without any change to find, there are a few things
we could do:
- we could mitigate the find bug by optimizing the ignores into a single
regexp before passing them to find; this should remove essentially all
the overhead, but makes the find command uglier and harder to edit
- we could remove rare and likely irrelevant things from
completion-ignored-extensions and vc-ignore-dir-regexp (which are used
to build these lists of ignores)
- we could use our own recursive directory-tree walker
(directory-files-recursively), if we found a nice way to pipe its output
directly to grep etc. without going through Lisp (this could be nice
for project-files, at least)
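The first option could look roughly like this: the per-directory -path
ignores collapse into a single -regex test whose alternation lists all
the names. Below is a minimal bash sketch, assuming GNU find
(-regextype is a GNU extension) and a short, hypothetical list of
ignored directories; it uses -print instead of -print0 for readability
and only shows that both forms select the same files. Whether the
single regexp is actually faster with GNU find's regex matching would
still need measuring on a real tree.

```shell
#!/usr/bin/env bash
# Sketch: the same directory ignores written two ways for GNU find.
# ignored_dirs is a hypothetical, shortened stand-in for the real ignore list.
set -euo pipefail

ignored_dirs=(SCCS RCS CVS .git .hg)

# Variant 1: one "-path */NAME/*" test per directory, joined with -o,
# in the same shape as the command shown at the top of this report.
find_with_paths() {
  local args=() d
  for d in "${ignored_dirs[@]}"; do
    if ((${#args[@]} > 0)); then args+=(-o); fi
    args+=(-path "*/$d/*")
  done
  find -H "$1" \( "${args[@]}" \) -prune -o -type f -print
}

# Variant 2: a single POSIX extended regexp, ".*/(SCCS|RCS|...)/.*".
# -regex matches against the whole path, hence the leading and trailing ".*".
find_with_regex() {
  local alt
  # Escape literal dots (".git" -> "\.git") and join the names with "|".
  alt=$(printf '%s\n' "${ignored_dirs[@]}" | sed 's/\./\\./g' | paste -s -d '|' -)
  find -H "$1" -regextype posix-extended -regex ".*/($alt)/.*" \
       -prune -o -type f -print
}
```

For example, "find_with_regex . | head" lists files outside the ignored
directories. The regexp form runs a single test per entry, at the cost
of a command that is harder to read and edit by hand.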
Incidentally, I also tried bfs (https://github.com/tavianator/bfs), an
alternative to find; sadly, it doesn't optimize this either, so it shows
the same 15x slowdown.
In GNU Emacs 29.0.92 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.15.12, Xaw scroll bars) of 2023-07-10 built on
Repository revision: dd15432ffacbeff0291381c0109f5b1245060b1d
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.8 (Green Obsidian)
Configured using:
'configure --config-cache --with-x-toolkit=lucid
--with-gif=ifavailable'
Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM LUCID
ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Shell
Memory information:
((conses 16 1939322 193013)
(symbols 48 76940 49)
(strings 32 337371 45355)
(string-bytes 1 12322013)
(vectors 16 148305)
(vector-slots 8 3180429 187121)
(floats 8 889 751)
(intervals 56 152845 1238)
(buffers 976 235)
(heap 1024 978725 465480))