GNU bug report logs - #64735
29.0.92; find invocations are ~15x slower because of ignores


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Wed, 19 Jul 2023 21:17:02 UTC

Severity: normal

Found in version 29.0.92



Message #68 received at 64735 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: sbaugh <at> catern.com, Eli Zaretskii <eliz <at> gnu.org>, 64735 <at> debbugs.gnu.org
Subject: Re: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 20 Jul 2023 21:54:32 +0300

On 20/07/2023 16:43, Spencer Baugh wrote:

>> That's only a problem when the default file listing logic is used (and
>> we usually delegate to something like 'git ls-files' instead, when the
>> vc-aware backend is used).
> 
> Hm, yes, but things like C-u project-find-regexp will use the default
> find-based file listing logic instead of git ls-files, as do a few other
> things.

Right.

> I wonder, could we just go ahead and make a vc function which is
> list-files(GLOBS) and returns a list of files?  Both git and hg support
> this.  Then we could have C-u project-find-regexp use that instead of
> find, by taking the cross product of dirs-to-search and
> file-name-patterns-to-search.  (And this would let me delete a big chunk
> of my own project backend, so I'd be happy to implement it.)

I started on this in the branch scratch/project-regen. I haven't had
time to dedicate to it recently, but the basics are there; take a
look (the method is called project-files-filtered).
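
As an illustration only, the cross product of dirs-to-search and
file-name-patterns-to-search from the quoted proposal could be sketched
like this. It is Python rather than Emacs Lisp, it is not code from the
scratch/project-regen branch, and the function names are hypothetical.
It assumes the directories are given relative to the repository root and
relies on Git's ":(glob)" pathspec magic, where "**/" matches any number
of directory levels.

```python
from itertools import product


def pathspecs(dirs, patterns):
    """Return one Git pathspec per (directory, pattern) pair.

    Hypothetical helper: directories are assumed to be relative to the
    repository root; "." or "" stands for the root itself.
    """
    specs = []
    for directory, pattern in product(dirs, patterns):
        directory = directory.rstrip("/")
        prefix = "" if directory in ("", ".") else directory + "/"
        # ":(glob)" makes "**/" match zero or more directory levels.
        specs.append(f":(glob){prefix}**/{pattern}")
    return specs


def ls_files_command(dirs, patterns):
    """Build the argv for a single 'git ls-files' call (not executed here)."""
    # -z separates names with NUL so unusual file names survive parsing.
    return ["git", "ls-files", "-z", "--", *pathspecs(dirs, patterns)]
```

A single VCS invocation then covers every directory/pattern pair,
instead of one find call filtered through a long list of ignores.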

The difficulty with making such changes is that the project protocol
grows in size, and it becomes difficult for a user to understand what is
mandatory, what's obsolete, and how to use it, especially in the face of
backward compatibility requirements.

Take a look; feedback is welcome and should help move this forward. We
should also transition to returning relative file names when possible,
for performance (optionally or always).

> Fundamentally it seems a little silly for project-ignores to ever be
> used for a vc project; if the vcs gives us ignores, we can probably just
> ask the vcs to list the files too, and it will have an efficient
> implementation of that.

Possibly, yes. But there will likely remain cases when the project-files 
could stay useful for callers, to construct some bigger command line for 
some new feature. Though perhaps we'll be able to drop that need by 
extracting the theoretically best performance from project-files (using 
a process object or some abstraction), to facilitate low-overhead piping.
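
To illustrate what low-overhead piping could mean here, the sketch
below consumes NUL-separated file names from a child process as they
arrive, without first collecting the whole listing. It is a Python
illustration under stated assumptions, not the project.el API: the
function name and the chunk size are hypothetical, and the command is
assumed to emit NUL-separated names (as 'git ls-files -z' or
'find -print0' do).

```python
import subprocess


def iter_nul_separated(argv, chunk_size=65536):
    """Yield NUL-separated names from a command's stdout incrementally.

    Hypothetical helper: names are decoded as UTF-8, with undecodable
    bytes preserved via surrogateescape.
    """
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        pending = b""
        while True:
            # read1 returns whatever is available, up to chunk_size bytes.
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:
                break
            # Everything before the last NUL is complete; keep the tail.
            *complete, pending = (pending + chunk).split(b"\0")
            for name in complete:
                yield name.decode("utf-8", "surrogateescape")
        if pending:
            # Output that did not end with a NUL terminator.
            yield pending.decode("utf-8", "surrogateescape")
```

A caller can start processing the first names while the listing is
still being produced, which is the kind of pipelining a process object
would allow.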

> If we do that uniformly, then this find slowness would only affect
> transient projects, and transient projects pull their ignores from
> grep-find-ignored-files just like rgrep, so improvements will more
> easily be applied to both.  (And maybe we could even get rid of
> project-ignores entirely, then?)

Regarding removing it, see above. And it'll take a number of years 
anyway ;-(




This bug report was last modified 1 year and 274 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.