GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92



Message #5 received at submit <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.92; find invocations are ~15x slower because of ignores
Date: Wed, 19 Jul 2023 17:16:31 -0400
Several important commands and functions invoke find, for example rgrep
and project-find-regexp.

Most of these add some set of ignores to the find command (rgrep, for
example, pulls them from grep-find-ignored-files).  So the find command
looks like:

  find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \) \
    -prune -o -type f -print0
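
Readers who want to experiment can reproduce the command shape above with a short Python sketch. The sample ignore list and the helper name find_argv are hypothetical. Emacs builds the real command in Lisp from variables such as grep-find-ignored-files, and the exact globs it emits may differ.

```python
import shlex

# Hypothetical sample of ignore globs; the real lists come from Emacs
# variables such as grep-find-ignored-files and are much longer.
IGNORES = ["*/SCCS/*", "*/RCS/*", "*/CVS/*", "*.o", "*~"]


def find_argv(ignores, root="."):
    """Build a find argument vector of the shape shown in the report:
    prune anything matching one of IGNORES, print remaining regular files.
    """
    argv = ["find", "-H", root]
    if ignores:
        argv.append("(")
        for i, glob in enumerate(ignores):
            if i:
                argv.append("-o")    # OR the tests together
            argv += ["-path", glob]  # one -path test per ignore
        argv += [")", "-prune", "-o"]
    argv += ["-type", "f", "-print0"]
    return argv


if __name__ == "__main__":
    # shlex.join quotes the parentheses and globs for display in a shell.
    print(shlex.join(find_argv(IGNORES)))
```

Building an argument list instead of a shell string avoids the backslash quoting seen in the report. Here shlex.join only renders the list for display. Each additional ignore adds one more -path test, and find evaluates it for every entry that none of the earlier tests matched.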

Alas, on my system, using GNU find, these ignores slow down find by
about 15x on a large directory tree, taking it from around 0.5 seconds
to 7.8 seconds.

This is very noticeable overhead; removing the ignores makes rgrep and
other find-invoking commands substantially faster for me.

The overhead is linear in the number of ignores: each additional
ignore adds a small fixed cost.  This suggests that find scans the list
of ignores linearly, checking each one in turn, rather than combining
them into a single optimized regexp and checking only that.
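
The hypothesis above contrasts two ways of checking a path against the ignores. This Python sketch shows both, using a hypothetical sample of globs. The first strategy runs one fnmatch test per glob. The second builds a single regexp by translating every glob with fnmatch.translate and joining the results. The sketch demonstrates that the two strategies agree, not that the combined form is faster. Python's re is a backtracking engine that still tries alternatives in order, so the speedup the report expects would come from an engine that compiles the alternation into a DFA, or from prefix factoring of the kind Emacs's regexp-opt performs.

```python
import fnmatch
import re

# Hypothetical ignore globs in the style of the find -path tests.
IGNORES = ["*/SCCS/*", "*/RCS/*", "*/CVS/*", "*/.git/*", "*.o", "*~"]


def ignored_linear(path, globs):
    """Test each glob in turn, as the report suspects find does.

    A path that matches nothing costs one test per glob."""
    return any(fnmatch.fnmatchcase(path, g) for g in globs)


def compile_globs(globs):
    """Translate every glob to a regex and join them into one alternation.

    Each translated glob is self-contained and anchored at its end, so
    joining them with '|' keeps their individual meanings."""
    return re.compile("|".join(fnmatch.translate(g) for g in globs))


def ignored_combined(path, combined):
    """A single match call against the combined pattern."""
    return combined.match(path) is not None


if __name__ == "__main__":
    combined = compile_globs(IGNORES)
    for p in ["./src/main.c", "./lib/RCS/foo.c,v", "./build/x.o", "./notes.txt~"]:
        # Both strategies must agree on every path.
        assert ignored_linear(p, IGNORES) == ignored_combined(p, combined)
        print(p, ignored_combined(p, combined))
```

Like find's -path, fnmatch lets "*" match "/", and fnmatchcase is case-sensitive, so the sample globs behave as they would in the find command.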

Obviously, GNU find should be optimizing this.  However, its maintainers
have previously said they will not; I commented on bug
https://savannah.gnu.org/bugs/index.php?58197 to ask them to reconsider.
Hopefully, as a fellow GNU project, they will be interested in helping
us...

In Emacs alone, there are a few things we could do:
- we could mitigate the find bug by optimizing the ignores into a single
  regexp before we pass them to find; this should remove basically all
  the overhead, but makes the find command uglier and harder to edit
- we could remove rare and likely irrelevant entries from
  completion-ignored-extensions and vc-ignore-dir-regexp (which are used
  to build these lists of ignores)
- we could use our own recursive directory-tree walking implementation
  (directory-files-recursively), if we found a nice way to pipe its
  output directly to grep etc. without going through Lisp (this could be
  nice for project-files, at least)
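
The first option can be sketched for the common case where the ignores are directory names. Instead of one -path '*/NAME/*' test per name, the names are joined into a single POSIX extended regular expression passed to find's -regex test. This assumes GNU find, since -regextype is a GNU option. The helper names and the directory list are hypothetical. The sketch makes no timing claim beyond the report's expectation that this removes most of the overhead.

```python
import re
import shlex

# Hypothetical directory names to skip (a sample, not Emacs's real list).
IGNORED_DIRS = ["SCCS", "RCS", "CVS", ".svn", ".git", "{arch}"]

# Characters that are special in POSIX extended regular expressions.
_ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def ere_escape(s):
    """Backslash-escape ERE metacharacters so each name matches literally."""
    return "".join("\\" + c if c in _ERE_SPECIAL else c for c in s)


def single_regex_find_argv(dirs, root="."):
    """Replace one -path '*/NAME/*' test per directory with one -regex test.

    find's -regex must match the whole path, hence the leading and
    trailing '.*' around the '/NAME/' alternation.
    """
    alternation = "|".join(ere_escape(d) for d in dirs)
    regex = ".*/(" + alternation + ")/.*"
    return ["find", "-H", root,
            "-regextype", "posix-extended",  # GNU find option; must precede -regex
            "-regex", regex,
            "-prune", "-o", "-type", "f", "-print0"]


if __name__ == "__main__":
    print(shlex.join(single_regex_find_argv(IGNORED_DIRS)))
```

The printed command also shows the cost mentioned in that item: one long regexp is harder to read and edit by hand than a list of -path tests.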

Incidentally, I tried a find alternative, "bfs"
(https://github.com/tavianator/bfs); sadly, it doesn't optimize this
either, so it shows the same 15x slowdown.



In GNU Emacs 29.0.92 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2023-07-10 built on

Repository revision: dd15432ffacbeff0291381c0109f5b1245060b1d
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.8 (Green Obsidian)

Configured using:
 'configure --config-cache --with-x-toolkit=lucid
 --with-gif=ifavailable'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM LUCID
ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Shell

Memory information:
((conses 16 1939322 193013)
 (symbols 48 76940 49)
 (strings 32 337371 45355)
 (string-bytes 1 12322013)
 (vectors 16 148305)
 (vector-slots 8 3180429 187121)
 (floats 8 889 751)
 (intervals 56 152845 1238)
 (buffers 976 235)
 (heap 1024 978725 465480))






