GNU bug report logs -
#64735
29.0.92; find invocations are ~15x slower because of ignores
Message #11 received at 64735 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Wed, 19 Jul 2023 17:16:31 -0400
>>
>>
>> Several important commands and functions invoke find; for example rgrep
>> and project-find-regexp.
>>
>> Most of these add some set of ignores to the find command, pulling from
>> grep-find-ignored-files in the former case. So the find command looks
>> like:
>>
>> find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
>> -prune -o -type f -print0
>>
>> Alas, on my system, using GNU find, these ignores slow down find by
>> about 15x on a large directory tree, taking it from around .5 seconds to
>> 7.8 seconds.
>>
>> This is very noticeable overhead; removing the ignores makes rgrep and
>> other find-invoking commands substantially faster for me.
>
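
The shape of the command quoted above can be sketched as follows. This is only an illustration of how a list of ignored directory names might expand into `-path ... -o -path ...` clauses; `build_find_argv` is a hypothetical helper, not the code Emacs uses, and the ignore list is limited to the two names shown in the report.

```python
def build_find_argv(root, ignored_dirs):
    """Return a find argv that prunes each ignored directory, then prints files.

    Hypothetical helper for illustration; it mirrors the command shape
    quoted in the report, not the actual Emacs implementation.
    """
    argv = ["find", "-H", root]
    if ignored_dirs:
        argv.append("(")
        for i, name in enumerate(ignored_dirs):
            if i > 0:
                argv.append("-o")  # alternatives are OR-ed together
            argv += ["-path", f"*/{name}/*"]
        # Prune anything that matched; otherwise fall through to -print0.
        argv += [")", "-prune", "-o"]
    argv += ["-type", "f", "-print0"]
    return argv


if __name__ == "__main__":
    # As an argv list no shell escaping is needed, so the parentheses
    # and globs appear without the backslashes seen in the shell form.
    print(" ".join(build_find_argv(".", ["SCCS", "RCS"])))
```

Each extra entry in the ignore list adds one more `-path` test to the expression, which is consistent with the per-ignore cost described in the report.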
> grep-find-ignored-files is a customizable user option, so if this
> slowdown bothers you, just customize it to avoid that.

I think it is bad that the default behavior is very slow.

> And if there are patterns there that are no longer pertinent or rare,
> we could remove them from the default value.

Sure!

So the thing to narrow down would be completion-ignored-extensions,
which is what populates grep-find-ignored-files. Most things in that
list are irrelevant to most users, but all of them are relevant to some
users.

Most of these are language-specific things - e.g. there's a bunch of
Common Lisp compiled object (or something) extensions.

Perhaps we could modularize this, so that individual packages add things
to completion-ignored-extensions at load time. Then
completion-ignored-extensions would only include things which are
relevant to a given user, as determined by what packages they load.

> I'm not sure we should bother more than these two simple measures.

Unfortunately those two simple measures help rgrep but they don't help
project-find-regexp (and other project.el commands using
project--files-in-directory, such as project-find-file), since those
project commands pull their ignores from the version control system
through vc (not grep-find-ignored-files), and then pass them to find.

>> The overhead is linear in the number of ignores - that is, each
>> additional ignore adds a small fixed cost. This suggests that find is
>> linearly scanning the list of ignores and checking each one, rather than
>> optimizing them to a single regexp and checking that regexp.
>
> If it uses fnmatch, it cannot do it any other way, I think.
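
The two strategies contrasted above (testing each ignore pattern in turn versus folding them into one regexp) can be sketched in Python. This is a minimal illustration under stated assumptions: it says nothing about how GNU find is actually implemented, the pattern list is made up for the example, and the helper names are hypothetical. It relies on the standard library's `fnmatch.translate`, which converts a glob into a regular expression.

```python
import fnmatch
import re

# Illustrative ignore patterns only; not the Emacs defaults.
IGNORES = ["*/SCCS/*", "*/RCS/*", "*/.git/*"]


def ignored_linear(path, patterns=IGNORES):
    """Test the path against each glob in turn (one check per pattern)."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)


def compile_ignores(patterns):
    """Fold all globs into a single alternation regexp."""
    return re.compile("|".join(fnmatch.translate(p) for p in patterns))


COMBINED = compile_ignores(IGNORES)


def ignored_combined(path, regex=COMBINED):
    """Test the path once against the combined regexp."""
    return regex.match(path) is not None


if __name__ == "__main__":
    for path in ["./src/RCS/foo,v", "./src/main.c", "./.git/config"]:
        print(path, ignored_linear(path), ignored_combined(path))
```

Both functions give the same answers; whether the combined form is actually faster depends on the regexp engine, so the sketch shows equivalence of results rather than a performance claim.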
This bug report was last modified 1 year and 273 days ago.