GNU bug report logs - #62837
[PATCH] Add a semantic-symref backend which uses xref-matches-in-files


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Fri, 14 Apr 2023 15:38:01 UTC

Severity: wishlist

Tags: patch


From: sbaugh <at> catern.com
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 62837 <at> debbugs.gnu.org
Subject: bug#62837: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
Date: Sat, 15 Apr 2023 21:56:24 +0000 (UTC)

Dmitry Gutov <dgutov <at> yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback.  Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around 0.8s. 'find' is not that slow, actually:
>
>   time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- fewer files) can take 3 seconds or more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of file names from the first process's stdout, to a buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).

Yes, this is a very good point.

> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.

In short: I have a project.el backend for a large monorepo whose
project-files implementation returns only the subset of files that are
relevant to the work happening in a given clone.  (Generally a user will
have many clones and be doing different work in each one.)  The
relevant-files subset is determined by integration with the build
system.

So running find returns a vast number of files and then searches over
those, whereas running a search over project-files searches a much
smaller number of files.
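
As a rough illustration of that difference, here is a minimal sketch.
The directory layout, the file names, and the "relevant-files" list are
all made up for the example; the list merely stands in for whatever a
project-files implementation would return, and none of this is the
actual backend.

```shell
# Hypothetical layout: a tiny stand-in for one clone of a monorepo.
tmp=$(mktemp -d)
mkdir -p "$tmp/relevant" "$tmp/unrelated"
printf 'int frob(void);\n'  > "$tmp/relevant/a.cpp"
printf 'int frob(void);\n'  > "$tmp/unrelated/b.cpp"
printf 'int other(void);\n' > "$tmp/unrelated/c.cpp"

# Approach 1: find walks the whole tree, and grep then searches
# every file that find produced.
all_hits=$(find "$tmp" -type f -name '*.cpp' -print0 \
             | xargs -0 grep -l -w frob | wc -l | tr -d ' ')

# Approach 2: search only a precomputed, NUL-separated list of
# relevant files (an assumed stand-in for project-files output).
printf '%s\0' "$tmp/relevant/a.cpp" > "$tmp/relevant-files"
subset_hits=$(xargs -0 grep -l -w frob < "$tmp/relevant-files" \
                | wc -l | tr -d ' ')

echo "whole tree: $all_hits matching files; subset: $subset_hits"
```

In the second approach grep only ever opens the files named in the
list, so the cost scales with the size of the subset rather than with
the size of the tree.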

Regarding your medium-term plans to improve project-files performance -
wildly guessing, but perhaps you have in mind a way to run a subprocess
that outputs the project-files list?  Let's call it
"project-files-process".  And then project-files-process could be piped
to grep instead, for maximum efficiency?  If that was the idea, then my
own backend could certainly have a project-files-process implementation
too, for maximum efficiency.
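
To make the guess concrete, here is a sketch of the shape I have in
mind, written as plain shell rather than Lisp.  The function name
project_files_process is purely hypothetical (no such function exists
in project.el today), and the find call inside it is only a
placeholder for whatever a real backend would run.

```shell
# Hypothetical "project-files-process": prints the project's files,
# NUL-separated, on stdout.  A real backend might ask its build system
# instead of calling find.
project_files_process() {
    find "$1" -type f -name '*.py' -print0
}

# Made-up sample project for the demonstration.
demo=$(mktemp -d)
printf 'def frob():\n    pass\n' > "$demo/one.py"
printf 'frob()\n'                > "$demo/two.py"
printf 'frob\n'                  > "$demo/notes.txt"

# The file list flows straight from one process into grep, without
# being turned into an intermediate list along the way.
matches=$(project_files_process "$demo" \
            | xargs -0 grep -n -w -H frob | wc -l | tr -d ' ')

echo "matching lines: $matches"
```

The point of the sketch is only that the list of file names never has
to leave the pipeline; how Emacs would wire two processes together
like this is exactly the open question.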

> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?

Nope, Linux.



