GNU bug report logs - #69188
30.0.50; project-files + project-find-file is slow in large repositories

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Sun, 18 Feb 2024 18:22:02 UTC

Severity: normal

Merged with 69233

Found in version 30.0.50

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 69233 <at> debbugs.gnu.org, 69188 <at> debbugs.gnu.org
Subject: bug#69188: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories
Date: Sun, 5 May 2024 06:32:12 +0300

On 30/04/2024 00:04, Spencer Baugh wrote:

> Oh, interesting, I see roughly the same result.
> 
> Benchmarking with:
> (benchmark-run 10 (project-files (project-current)))
> 
> Running in my long-lived existing Emacs 29 session:
> Old:
> (4.434228319 14 2.850654906999921)
> New:
> (4.983809167 16 3.2989908669999295)
> 
> In Emacs 29 emacs -Q:
> Old:
> (3.5112438729999997 130 1.9230644630000002)
> New:
> (3.819248509 171 2.309731412)
> 
> But, in Emacs 30 emacs -Q:
> Old:
> (7.949549188 65 3.3445626799999992)
> New:
> (7.270785783999999 87 4.0610532379999995)
> 
> So... the performance improvement seems highly unreliable.  Probably not
> worth changing this area, then - the other patch to allow relative files
> will probably be more worth it.

All right then, let's hold off on this potential change for now, and 
maybe revisit it later. Maybe the new GC engine will swing the needle in 
one or the other direction.
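
For reading the numbers above: benchmark-run returns a list of the total
elapsed time, the number of garbage collections, and the time spent in
GC. A minimal sketch for separating the two, where
my/benchmark-non-gc-time is a hypothetical helper and not part of
benchmark.el or project.el:

```elisp
;; Hypothetical helper for illustration only.
(defun my/benchmark-non-gc-time (result)
  "Return the elapsed time in RESULT minus the time spent in GC.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED), the shape returned
by `benchmark-run'."
  (- (nth 0 result) (nth 2 result)))

;; The Emacs 30 `emacs -Q' figures quoted above.
(my/benchmark-non-gc-time '(7.949549188 65 3.3445626799999992))        ; old
(my/benchmark-non-gc-time '(7.270785783999999 87 4.0610532379999995))  ; new
```

For those two results this comes out at roughly 4.6 s (old) and 3.2 s
(new) outside GC, which makes it easier to see how much of each run is
GC time.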

> I think the defvar approach seems reasonable.
> 
> The existing project-read-file-name-functions certainly don't expect
> relative names, but they do actually work OK.  e.g.
> 
> (project--read-file-cpd-relative "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)

Evaluating this one with the version in master results in

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  expand-file-name(nil)

hence the associated change in the patch.

> (project--read-file-absolute "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)

No errors here, but two problems are that a) it doesn't show the 
default-directory [meaning no indication in which project the read is 
happening], and b) returning the relative name will mess up the 
file-name-history entry.

Good thing you noted the latter, it needs explicit handling. The former 
can be shown in the prompt, at least.
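
To illustrate what "explicit handling" of the history entry could look
like, here is a minimal sketch. It is not the code from the patch;
my/record-absolute-file-name is a hypothetical name, and the only
assumption is that the caller knows the project root:

```elisp
;; Hypothetical helper for illustration only; not taken from project.el.
(defun my/record-absolute-file-name (file root)
  "Expand FILE against ROOT and record it in `file-name-history'.
FILE may be relative to ROOT or already absolute.
Return the absolute file name."
  (let ((absolute (if (file-name-absolute-p file)
                      file
                    (expand-file-name file root))))
    ;; Store the absolute name, so that later commands reading the
    ;; history do not see a name relative to some unknown directory.
    (add-to-history 'file-name-history absolute)
    absolute))

;; Example with a made-up project root:
;; (my/record-absolute-file-name "foo/bar" "/tmp/proj/")
```

The point is only that the relative name returned by the reader gets
expanded before it reaches file-name-history.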

> Both complete fine and return a filename fine.  read-file-cpd-relative
> returns an absolute filename, read-file-absolute returns a relative
> filename.
> 
> Maybe the same is true for any custom project-read-file-name-functions
> that exist?  Maybe they will just work?

So, apparently not.

Anyway, I've pushed the patch in commit 370b216f086. Here's hoping the 
breakage will be minimal.

>>> However, that would make it easy for project-files as a whole to be
>>> asynchronous.  Then that would allow project-find-file to start the
>>> listing in the background, and then we'd write a completion table which
>>> completes only over whatever files we've already read into Emacs.  I
>>> think this would be a lot nicer for most use-cases, and I'd again be
>>> happy to implement this.
>>
>> Could this be that simple?
>>
>> Whatever the source of the file listing, as soon as the UI (or
>> completion styles) calls try-completion or all-completions, the search
>> has to finish first, shouldn't it? That seems like the semantics of
>> this API. Or if perhaps we allow it to operate on incomplete results,
>> how would we indicate to the user at the end that the scan has
>> finished, and they can press TAB once more to refresh the results? Or
>> perhaps to be able to find a file they hadn't managed to find in the
>> incomplete set.
>>
>> This seems like it might require both a new UI and an extension of
>> completion table API. E.g. in certain cases we could say that we only
>> need N matches, so if the current incomplete set can provide as many,
>> we don't have to wait until the end. But 'try-completion' would become
>> unreliable either way.
> 
> Yes, that's all true, and this is definitely not the intended semantics
> of the API, but I vaguely suspect it might be fine in practice?  That
> vague suspicion can wait until later, though, because I think the more
> conservative approach you suggest is also a good improvement on its own.

Some async stuff could make a big improvement on top of it, but it seems 
to require a fair bit more complexity.
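
To make the concern about try-completion concrete, here is a small
sketch of a completion table over an incomplete listing. The names
my/files-so-far and my/partial-file-table are hypothetical; the table
simply delegates to complete-with-action over whatever has been read so
far:

```elisp
;; Hypothetical names for illustration only.
(defvar my/files-so-far nil
  "File names received from the listing so far.")

(defun my/partial-file-table (string pred action)
  "Completion table over the current value of `my/files-so-far'."
  (complete-with-action action my/files-so-far string pred))

;; While only part of the listing has arrived, "foo" has a single match:
(setq my/files-so-far '("foo/bar"))
(try-completion "foo" #'my/partial-file-table)   ; "foo/bar"

;; Once more files arrive, the same input gives a different answer:
(setq my/files-so-far '("foo/bar" "foo1/bar"))
(try-completion "foo" #'my/partial-file-table)   ; "foo"
(all-completions "foo" #'my/partial-file-table)  ; both names
```

So a TAB pressed early could expand the input to a name that later turns
out not to be the unique match, which is the unreliability mentioned
above.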

>> Even if keeping to the most conservative approach, though, it should
>> be possible to at least render the prompt before the file listing is
>> finished. That could make the UI look a bit more responsive.
> 
> True, that would be pretty nice.  And further I suppose in the case of
> the default completion UI (which doesn't automatically display
> completions), the user can even type some input before hitting TAB and
> waiting.

It could be advantageous if the search process starts right when (or 
before) the prompt is shown; then by the time the first input is entered 
the search could either be finished or have found some matches at least.

> Also, I suppose that even non-default completion UIs would allow the
> user to type input, if the non-default completion UI uses
> while-no-input.  So it would be a pretty responsive experience for such
> UIs (assuming we are careful in our implementation and don't have bugs
> when being interrupted).

Not sure about this one:

1) If you only do the search while the user is not typing, it will 
finish later compared to the scheme in the previous paragraph.
2) Suppose you type a char, pause, then another one. Will the search 
start, abort, and then start again? That seems wasteful.

I'd ultimately prefer a scheme where work isn't thrown away - but that 
would require a more complex API. Including a way to abort the 
background computation (since typing won't do that anymore).
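
A rough sketch of that kind of scheme, assuming the listing comes from
an external process whose output is one file name per line. All of the
my/listing-* names are hypothetical, and the command passed to
my/listing-start is up to the caller; this is not how project.el does
it:

```elisp
;; Hypothetical names for illustration only.
(defvar my/listing-files nil
  "File names received so far, most recent first.")
(defvar my/listing-pending ""
  "Unfinished trailing line from the previous output chunk.")

(defun my/listing-filter (_proc chunk)
  "Process filter: add the complete lines in CHUNK to `my/listing-files'."
  (let ((lines (split-string (concat my/listing-pending chunk) "\n")))
    ;; The last element is "" or an unfinished line; keep it for later.
    (setq my/listing-pending (car (last lines)))
    (dolist (line (butlast lines))
      (unless (equal line "")
        (push line my/listing-files)))))

(defun my/listing-start (command)
  "Start COMMAND (a list of strings) in the background; return the process."
  (setq my/listing-files nil
        my/listing-pending "")
  (make-process :name "my-file-listing"
                :command command
                :connection-type 'pipe
                :noquery t
                :filter #'my/listing-filter))

(defun my/listing-abort (proc)
  "Stop the background listing PROC if it is still running."
  (when (process-live-p proc)
    (delete-process proc)))
```

Results accumulate regardless of typing, so no work is thrown away, and
my/listing-abort stands in for the explicit way to cancel (for instance
when the minibuffer is exited).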

For some UIs and commands, restarting the search on every keystroke 
makes sense (e.g. incremental interfaces like counsel-rg), because they 
perform the search with different inputs each time you type a new 
character. That kind of works for small-to-medium projects, and you can 
enjoy the responsiveness of the process. I'm not sure about this 
approach for big projects.





This bug report was last modified 1 year and 42 days ago.

