GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92

Full log


Message #11 received at 64735 <at> debbugs.gnu.org (full text, mbox):

From: sbaugh <at> catern.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 64735 <at> debbugs.gnu.org
Subject: Re: bug#64735: 29.0.92; find invocations are ~15x slower because of
 ignores
Date: Thu, 20 Jul 2023 12:22:19 +0000 (UTC)
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Wed, 19 Jul 2023 17:16:31 -0400
>> 
>> 
>> Several important commands and functions invoke find; for example rgrep
>> and project-find-regexp.
>> 
>> Most of these add some set of ignores to the find command, pulling from
>> grep-find-ignored-files in the former case.  So the find command looks
>> like:
>> 
>> find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
>> -prune -o -type f -print0
>> 
>> Alas, on my system, using GNU find, these ignores slow down find by
>> about 15x on a large directory tree, taking it from around .5 seconds to
>> 7.8 seconds.
>> 
>> This is very noticeable overhead; removing the ignores makes rgrep and
>> other find-invoking commands substantially faster for me.
>
> grep-find-ignored-files is a customizable user option, so if this
> slowdown bothers you, just customize it to avoid that.

I think the fact that the default behavior is very slow, is bad.

> And if there are patterns there that are no longer pertinent or rare,
> we could remove them from the default value.

Sure!

So the thing to narrow down would be completion-ignored-extensions,
which is what populates grep-find-ignored-files.  Most things in that
list are irrelevant to most users, but all of them are relevant to some
users.

Most of these are language-specific things - e.g. there's a bunch of
Common Lisp compiled object (or something) extensions.

Perhaps we could modularize this, so that individual packages add things
to completion-ignored-extensions at load time.  Then
completion-ignored-extensions would only include things which are
relevant to a given user, as determined by what packages they load.

> I'm not sure we should bother more than these two simple measures.

Unfortunately those two simple measures help rgrep but they don't help
project-find-regexp (and others project.el commands using
project--files-in-directory such as project-find-file), since those
project commands pull their ignores from the version control system
through vc (not grep-find-ignored-files), and then pass them to find.

>> The overhead is linear in the number of ignores - that is, each
>> additional ignore adds a small fixed cost.  This suggests that find is
>> linearly scanning the list of ignores and checking each one, rather than
>> optimizing them to a single regexp and checking that regexp.
>
> If it uses fnmatch, it cannot do it any other way, I think




This bug report was last modified 1 year and 273 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.