GNU bug report logs - #50733
28.0.1; project-find-regexp can block Emacs for a long time

Package: emacs;

Reported by: Daniel Martín <mardani29 <at> yahoo.es>

Date: Wed, 22 Sep 2021 09:31:02 UTC

Severity: normal

Found in version 28.0.1


Message #14 received at 50733 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Daniel Martín <mardani29 <at> yahoo.es>
Cc: 50733 <at> debbugs.gnu.org
Subject: Re: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Thu, 23 Sep 2021 02:09:16 +0300
On 23.09.2021 00:58, Daniel Martín wrote:
> Dmitry Gutov <dgutov <at> yandex.ru> writes:
>>
>> IIRC you are using macOS. I received another report recently that
>> find/grep based tooling, and project-find-regexp in particular, are
>> pretty slow on that OS.
> 
> Yes, this is on macOS.
> 
>>
>> When you say "block for a long time", how long are we talking about?
>>
>> To try it, evaluate
>>
>>    (benchmark 1 '(project-find-regexp "new-collection"))
> 
> I usually work on a monorepo with ~67000 tracked files (many of them big
> binary files).  Here's what I get when using ripgrep as the xref search
> program:
> 
> Elapsed time: 36.087181s (8.067474s in 22 GCs)

Thanks for testing. Did the switch to ripgrep help much?

I wonder if we should advertise this setting and recommendation more 
prominently, at least until we get auto-detection.
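For reference, the setting meant here is the user option xref-search-program, which is new in Emacs 28. Here is a minimal init-file sketch, assuming the rg executable is installed and can be found on exec-path:

   ;; Use ripgrep instead of grep for Xref-based searches such as
   ;; project-find-regexp.  Assumes `rg' is installed and on `exec-path'.
   (setq xref-search-program 'ripgrep)

The same option can also be changed with M-x customize-option.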

> Running the same search with ripgrep from the command line takes around
> 6 seconds.

Is that with an SSD?

Your project sounds respectable. The torvalds-linux repo I have checked 
out here also has about 70000 files, but I guess your files are bigger.

>> Another benchmark to try is
>>
>>    (benchmark 1 '(project-files (project-current)))
> 
> Elapsed time: 1.590223s (0.432372s in 1 GCs)

That's a while (I wonder if you find 'project-find-file' usable with 
this kind of performance), but still better than I might have expected.

> Here's an ELisp profile of the first benchmark:
> 
>          8696  78% - command-execute
>          8696  78%  - call-interactively
>          8493  76%   - funcall-interactively
>          8480  76%    - eval-expression
>          8479  76%     - eval
>          8479  76%      - project-find-regexp
>          8227  74%       - xref--show-xrefs
>          8227  74%        - xref--show-xref-buffer
>          5584  50%         - #<compiled 0x140b5a40100bafc6>
>          5584  50%          - apply
>          5584  50%           - project--find-regexp-in-files
>          5574  50%            - xref-matches-in-files
>          3016  27%             - xref--convert-hits
>          3000  27%              - mapcan
>          2992  27%               - #<compiled -0x6cdcd56218925c3>
>          2734  24%                - xref--collect-matches
>          2094  18%                 - xref--collect-matches-1
>           800   7%                  + xref-make-match
>           774   7%                  + xref-make-file-location
>           104   0%                   xref--find-file-buffer
>            80   0%                   file-remote-p
>            51   0%                   xref--regexp-syntax-dependent-p
>           906   8%             + xref--process-file-region
>           331   2%               sort
>          1413  12%         + xref--analyze
>          1230  11%         + xref--show-common-initialize
>           249   2%       + project-files
>             3   0%       + project-current
>             9   0%    + minibuffer-complete
>             4   0%    + execute-extended-command
>           203   1%   + byte-code
>          2314  20% - ...
>          2314  20%    Automatic GC
>            27   0% + timer-event-handler

When you have a lot of matches, at some point Lisp overhead is going to 
show up. E.g., the searches seem almost instantaneous here with up to 
several thousand matches, but with tens or hundreds of thousands, yeah, 
I have to wait.

Help with optimizations in that area (around/in xref-matches-in-files 
and xref--convert-hits) is welcome, but I'm not sure how much more we 
can squeeze.

> The search time is reduced when I use a more specific search term,
> presumably because the number of results is lower and the Elisp
> post-processing takes less time.  Here's what I got, for example, when I
> search for something with results from only one file:
> 
> Elapsed time: 6.859815s (0.864738s in 2 GCs)
> 
> Comparing this to the time taken by the same query from the command line
> (6.5s) shows that the Elisp post-processing time is probably negligible
> in this scenario.

It's a good result. A little suspicious, though: given that 
project-find-regexp calls project-files first, and the latter takes 
1.5s, the difference should be about that much (i.e. roughly 1.5s + 6.5s 
= 8s in total, not 6.9s). But I guess rg also needs to traverse the 
directory tree when run from the command line, and spends some time 
doing that too.

As for what else can be done: again, if someone wants to investigate an 
asynchronous/nonblocking API for Xref (or an approach using threads), 
that would be welcome. This case, where most of the time is spent in the 
subprocess, is a good match for it. But I don't think we'll manage this 
for the upcoming release.

Another thing you can do is set up additional ignores for the 
project. If those big binary files are not something you are interested 
in searching or touching, you could add ignore entries for them. When 
the vc project backend is in use (the default), this is currently done 
via .dir-locals.el: the variable is project-vc-ignores, and its value is 
a list of glob pattern strings. See its docstring and the explanation in 
project-ignores's docstring.
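As a sketch, a .dir-locals.el at the project root could look like the following. The patterns are hypothetical placeholders, not taken from this project. Per project-ignores's docstring, a leading "./" anchors an entry at the project root and a trailing "/" makes it match directories only:

   ;; .dir-locals.el in the project root.  The patterns are
   ;; hypothetical examples; substitute the project's real ones.
   ((nil . ((project-vc-ignores . ("*.psd"          ; any .psd file
                                   "*.bin"          ; any .bin file
                                   "./assets/")))))  ; top-level assets dir only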

Note that ignores also affect project-find-file.




This bug report was last modified 3 years and 261 days ago.
