GNU bug report logs - #62837
[PATCH] Add a semantic-symref backend which uses xref-matches-in-files

Package: emacs

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Fri, 14 Apr 2023 15:38:01 UTC

Severity: wishlist

Tags: patch

To reply to this bug, email your comments to 62837 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#62837; Package emacs. (Fri, 14 Apr 2023 15:38:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 14 Apr 2023 15:38:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Cc: Dmitry Gutov <dgutov <at> yandex.ru>
Subject: [PATCH] Add a semantic-symref backend which uses xref-matches-in-files
Date: Fri, 14 Apr 2023 11:37:43 -0400
Tags: patch


When project-files is available, this is a much more efficient
fallback than the current grep fallback.  Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.

* lisp/cedet/semantic/symref/project.el: Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist): Add
project tool.

In GNU Emacs 29.0.60 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2023-03-13 built on
 igm-qws-u22796a
Repository revision: e759905d2e0828eac4c8164b09113b40f6899656
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: CentOS Linux 7 (Core)

Configured using:
 'configure --with-x-toolkit=lucid --with-modules
 --with-gif=ifavailable'

[0001-Add-a-semantic-symref-backend-which-uses-xref-matche.patch (text/patch, attachment)]

Message #8 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Spencer Baugh <sbaugh <at> janestreet.com>, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Sat, 15 Apr 2023 01:38:18 +0300
Hi!

On 14/04/2023 18:37, Spencer Baugh wrote:
> When project-files is available, this is a much more efficient
> fallback than the current grep fallback.  Ultimately, this is
> motivated by making xref-find-references faster by default even in the
> absence of an index.

It's a clever enough idea, but unfortunately it doesn't look like the 
performance is always improved by this change.

E.g. I have this checkout of gecko-dev (a big project, just for testing: 
https://github.com/mozilla/gecko-dev) which contains different types of 
files: cpp, js, py.

If I do an xref-find-references search with the current code, it 
finishes in around ~0.8s. 'find' is not that slow, actually:

  time find . -type f -name "*.cpp" >/dev/null

reports just 400 ms here.

Whereas with your patch the search, depending on the language (cpp -- 
more files, py -- less files) can take 3 seconds and more.

Why? First of all, project-files returns all files (which are then all 
searched), whereas semantic-symref-filepattern-alist contains a mapping 
from modes to file globs, limiting both the scan and subsequent search 
to those.
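
The effect of the first point can be seen outside Emacs with a rough
shell sketch. This is not the command either backend actually runs; the
temporary tree, the file names, and the symbol "frob" are made up for
illustration. With a -name glob only the matching files are listed and
handed to grep, whereas an unfiltered listing sends every file through
grep:

```shell
# Hypothetical demo tree; file names and the symbol "frob" are invented.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/tools"
printf 'int frob(void);\n' > "$demo/src/a.cpp"
printf 'frob();\n'         > "$demo/src/b.cpp"
printf 'frob()\n'          > "$demo/tools/c.py"
printf 'frob();\n'         > "$demo/tools/d.js"

# Glob-limited: only *.cpp files are listed, so only they are searched.
limited=$(find "$demo" -type f -name '*.cpp' -print0 \
            | xargs -0 grep -l -w frob | wc -l | tr -d ' ')

# Unfiltered: every file is listed and searched, whatever its type.
unfiltered=$(find "$demo" -type f -print0 \
               | xargs -0 grep -l -w frob | wc -l | tr -d ' ')

echo "glob-limited: $limited files, unfiltered: $unfiltered files"
rm -rf "$demo"
```

In this toy tree the unfiltered search visits twice as many files as the
glob-limited one; on a large mixed-language checkout the gap would
presumably be much wider.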

Second -- using project-files means we're forced to round-trip the list 
of file names from the first process's stdout, to a buffer, then to a 
list of Lisp strings, and then back to another buffer, to use as stdin. 
I have a couple of things planned in the medium term to improve that, 
but some overhead is probably unavoidable (unless we get some new 
primitive that would allow "piping" between process buffers).

Perhaps you could describe your case where you *did* see a significant 
improvement from this patch, and we can discuss the best steps to 
address that.

BTW, at first I figured you're using MacOS (which historically has 
bundled outdated versions of find and grep, with worse performance). But 
apparently not?




Message #11 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: sbaugh <at> janestreet.com, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Sat, 15 Apr 2023 09:50:36 +0300
> Date: Sat, 15 Apr 2023 01:38:18 +0300
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> On 14/04/2023 18:37, Spencer Baugh wrote:
> > When project-files is available, this is a much more efficient
> > fallback than the current grep fallback.  Ultimately, this is
> > motivated by making xref-find-references faster by default even in the
> > absence of an index.
> 
> It's a clever enough idea, but unfortunately it doesn't look like the 
> performance is always improved by this change.

Maybe we could offer that as optional behavior, turned on by some user
option?  Then people who do experience performance boost could use it.




Message #14 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sbaugh <at> janestreet.com, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Sat, 15 Apr 2023 15:37:20 +0300
On 15/04/2023 09:50, Eli Zaretskii wrote:
>> Date: Sat, 15 Apr 2023 01:38:18 +0300
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>> On 14/04/2023 18:37, Spencer Baugh wrote:
>>> When project-files is available, this is a much more efficient
>>> fallback than the current grep fallback.  Ultimately, this is
>>> motivated by making xref-find-references faster by default even in the
>>> absence of an index.
>> It's a clever enough idea, but unfortunately it doesn't look like the
>> performance is always improved by this change.
> Maybe we could offer that as optional behavior, turned on by some user
> option?  Then people who do experience performance boost could use it.

Sure. That's also possible. But I'd like more info anyway, for example, 
to be able to make the choice about which value of said option should be 
the default.

Or if the scenario with the improvement turns out to be a rare one, 
concentrate on what project.el needs to provide to make it better.




Message #17 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: sbaugh <at> catern.com
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Sat, 15 Apr 2023 21:56:24 +0000 (UTC)
Dmitry Gutov <dgutov <at> yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback.  Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in around ~0.8s. 'find' is not that slow, actually:
>
>   time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp --
> more files, py -- less files) can take 3 seconds and more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second -- using project-files means we're forced to round-trip the
> list of file names from the first process's stdout, to a buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned in the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).

Yes, this is a very good point.

> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.

In short: I have a project.el backend for a large monorepo which has a
project-files backend which returns only the subset of files which are
relevant to work happening in a given clone.  (Generally a user will
have many clones and be doing different work in each one.)  The
relevant-files subset is determined by integration with the build
system.

So running find returns a vast number of files and then searches over
those, whereas running a search over project-files searches a much
smaller number of files.

Regarding your medium-term plans to improve project-files performance -
wildly guessing, but perhaps you have in mind a way to run a subprocess
that outputs the project-files list?  Let's call it
"project-files-process".  And then project-files-process could be piped
to grep instead, for maximum efficiency?  If that was the idea, then my
own backend could certainly have a project-files-process implementation
too, for maximum efficiency.
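
A shell analogy of that idea, as a sketch only: project_files_process
below is a hypothetical stand-in for whatever command a backend would
run to print its file list (here it is just find over a made-up
"relevant/" directory). Its NUL-separated output goes straight into
grep through a pipe, with no intermediate list built in between.

```shell
# Made-up tree: only relevant/ belongs to the current clone's working set.
demo=$(mktemp -d)
mkdir -p "$demo/relevant" "$demo/other"
printf 'let frob = 1\n' > "$demo/relevant/a.ml"
printf 'let frob = 2\n' > "$demo/other/b.ml"

# Hypothetical "project-files-process": prints the relevant files,
# NUL-separated, on stdout. A real backend would ask the build system.
project_files_process() {
  find "$demo/relevant" -type f -print0
}

# Pipe the listing directly into the search; -H forces the file name
# to be printed even when grep receives a single file.
matches=$(project_files_process | xargs -0 grep -H -n -w frob)

printf '%s\n' "$matches"
rm -rf "$demo"
```

Only the file under relevant/ is reported; the one under other/ is never
listed, so it is never searched.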

> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?

Nope, Linux.




Message #20 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: sbaugh <at> catern.com
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Wed, 19 Apr 2023 04:10:24 +0300
On 16/04/2023 00:56, sbaugh <at> catern.com wrote:

>> Perhaps you could describe your case where you *did* see a significant
>> improvement from this patch, and we can discuss the best steps to
>> address that.
> 
> In short: I have a project.el backend for a large monorepo which has a
> project-files backend which returns only the subset of files which are
> relevant to work happening in a given clone.  (Generally a user will
> have many clones and be doing different work in each one.)  The
> relevant-files subset is determined by integration with the build
> system.
> 
> So running find returns a vast number of files and then searches over
> those, whereas running a search over project-files searches a much
> smaller number of files.

Neat.

> Regarding your medium-term plans to improve project-files performance -
> wildly guessing, but perhaps you have in mind a way to run a subprocess
> that outputs the project-files list?  Let's call it
> "project-files-process".  And then project-files-process could be piped
> to grep instead, for maximum efficiency?  If that was the idea, then my
> own backend could certainly have a project-files-process implementation
> too, for maximum efficiency.

That might be step number 3, although I'm not sure yet which kind of 
code will be required for the piping to be done efficiently enough.

The other two things I was looking at are:

- Use relative file names (less text to parse, memory to allocate, GC to 
thrash). The awkward part is how to merge that with the idea that 
project-files can include files from directories ("external roots"). 
Split those off into a different method? Treat them as separate projects 
to flat-map the lists of files at?

- Add arguments to allow filtering the files using the underlying tool. 
That can also result in far fewer files to parse in the output under 
suitable circumstances (e.g. we'd be able to pass a list of globs here).

There is one implementation of the second item in the branch 
scratch/etags-regen.

And both items need to be done carefully enough to maintain some 
backward compatibility.
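
Both items can be illustrated with a small shell sketch (an analogy
using a made-up temporary tree, not the project.el API): listing from
inside the root yields shorter, relative names, and handing the glob to
the listing tool means non-matching names are never printed at all.

```shell
# Made-up tree with a deliberately deep directory.
demo=$(mktemp -d)
mkdir -p "$demo/src/deep/dir"
: > "$demo/src/deep/dir/a.cpp"
: > "$demo/src/deep/dir/b.cpp"
: > "$demo/src/deep/dir/notes.txt"

# Absolute names: every output line repeats the project root.
abs_bytes=$(find "$demo" -type f | wc -c | tr -d ' ')

# Relative names: list from inside the root, so the prefix is just "./".
rel_bytes=$(cd "$demo" && find . -type f | wc -c | tr -d ' ')

# Filtering in the lister itself: the glob is passed to find, so the
# consumer only ever has to parse the matching names.
cpp_count=$(cd "$demo" && find . -type f -name '*.cpp' | wc -l | tr -d ' ')

echo "absolute: $abs_bytes bytes, relative: $rel_bytes bytes"
echo "files matching *.cpp: $cpp_count"
rm -rf "$demo"
```

The byte counts depend on where the temporary directory lives, but the
relative listing is always the smaller of the two.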

So unless you're in a hurry, give me a few weeks to get around to this.

Further suggestions and patches are welcome, of course.




Message #23 received at 62837 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: sbaugh <at> catern.com, 62837 <at> debbugs.gnu.org
Subject: Re: bug#62837: [PATCH] Add a semantic-symref backend which uses
 xref-matches-in-files
Date: Tue, 18 Apr 2023 21:26:13 -0400
Dmitry Gutov <dgutov <at> yandex.ru> writes:
> On 16/04/2023 00:56, sbaugh <at> catern.com wrote:
>
>>> Perhaps you could describe your case where you *did* see a significant
>>> improvement from this patch, and we can discuss the best steps to
>>> address that.
>> In short: I have a project.el backend for a large monorepo which has a
>> project-files backend which returns only the subset of files which are
>> relevant to work happening in a given clone.  (Generally a user will
>> have many clones and be doing different work in each one.)  The
>> relevant-files subset is determined by integration with the build
>> system.
>>
>> So running find returns a vast number of files and then searches over
>> those, whereas running a search over project-files searches a much
>> smaller number of files.
>
> Neat.
>
>> Regarding your medium-term plans to improve project-files performance -
>> wildly guessing, but perhaps you have in mind a way to run a subprocess
>> that outputs the project-files list?  Let's call it
>> "project-files-process".  And then project-files-process could be piped
>> to grep instead, for maximum efficiency?  If that was the idea, then my
>> own backend could certainly have a project-files-process implementation
>> too, for maximum efficiency.
>
> That might be step number 3, although I'm not sure yet which kind of
> code will be required for the piping to be done efficiently enough.
>
> The other two things I was looking at are:
>
> - Use relative file names (less text to parse, memory to allocate, GC
>   to thrash). The awkward part is how to merge that with the idea that
>   project-files can include files from directories ("external
>   roots"). Split those off into a different method? Treat them as
>   separate projects to flat-map the lists of files at?
>
> - Add arguments to allow filtering the files using the underlying
>   tool. That can also result in far fewer files to parse in the
>   output under suitable circumstances (e.g. we'd be able to pass a
>   list of globs here).
>
> There is one implementation of the second item in the branch
> scratch/etags-regen.
>
> And both items need to be done carefully enough to maintain some
> backward compatibility.
>
> So unless you're in a hurry, give me a few weeks to get around to this.
>
> Further suggestions and patches are welcome, of course.

I'm in no hurry.  I will probably add this backend locally at my site in
the meantime.  We have no existing (non-trivial) xref-find-references
backend, so speeding this one up isn't too urgent (it's not competing
with anything), but definitely I am interested in project-files (and
project.el in general) speed improvements and will try to help out as it
becomes relevant.




Severity set to 'wishlist' from 'normal' Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 11 Sep 2023 23:25:02 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 279 days ago.
