GNU bug report logs -
#62837
[PATCH] Add a semantic-symref backend which uses xref-matches-in-files
To reply to this bug, email your comments to 62837 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Fri, 14 Apr 2023 15:38:02 GMT)
Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 14 Apr 2023 15:38:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Tags: patch
When project-files is available, this is a much more efficient
fallback than the current grep fallback. Ultimately, this is
motivated by making xref-find-references faster by default even in the
absence of an index.
* lisp/cedet/semantic/symref/project.el: Add.
* lisp/cedet/semantic/symref.el (semantic-symref-tool-alist): Add project tool.
In GNU Emacs 29.0.60 (build 3, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.15.12, Xaw scroll bars) of 2023-03-13 built on
igm-qws-u22796a
Repository revision: e759905d2e0828eac4c8164b09113b40f6899656
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: CentOS Linux 7 (Core)
Configured using:
'configure --with-x-toolkit=lucid --with-modules
--with-gif=ifavailable'
[0001-Add-a-semantic-symref-backend-which-uses-xref-matche.patch (text/patch, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Fri, 14 Apr 2023 22:39:02 GMT)
Message #8 received at 62837 <at> debbugs.gnu.org:
Hi!
On 14/04/2023 18:37, Spencer Baugh wrote:
> When project-files is available, this is a much more efficient
> fallback than the current grep fallback. Ultimately, this is
> motivated by making xref-find-references faster by default even in the
> absence of an index.
It's a clever enough idea, but unfortunately it doesn't look like the
performance is always improved by this change.
E.g. I have this checkout of gecko-dev (a big project, just for testing:
https://github.com/mozilla/gecko-dev) which contains different types of
files: cpp, js, py.
If I do an xref-find-references search with the current code, it
finishes in about 0.8s. 'find' is not that slow, actually:
time find . -type f -name "*.cpp" >/dev/null
reports just 400 ms here.
Whereas with your patch the search, depending on the language (cpp:
more files; py: fewer files), can take 3 seconds or more.
Why? First of all, project-files returns all files (which are then all
searched), whereas semantic-symref-filepattern-alist contains a mapping
from modes to file globs, limiting both the scan and subsequent search
to those.
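A rough shell analogy of the glob-limited search described above: both the file scan and the grep are restricted to names matching one mode's globs, so files in other languages are never read. The helper names `list_files_by_globs` and `grep_by_globs` are hypothetical, and the globs merely stand in for an entry of semantic-symref-filepattern-alist; this is a sketch, not the exact command line the grep fallback builds.

```shell
#!/bin/sh
# Sketch only: list_files_by_globs and grep_by_globs are hypothetical helpers.

# Print regular files under DIR whose names match any GLOB, NUL-separated.
# The globs play the role of one semantic-symref-filepattern-alist entry
# (for example "*.cpp" "*.h" for C++).
list_files_by_globs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: list_files_by_globs DIR GLOB..." >&2
    return 2
  fi
  dir=$1
  shift
  # Rewrite the positional parameters into: -name G1 -o -name G2 ...
  first=1
  for glob in "$@"; do
    shift
    if [ "$first" -eq 1 ]; then
      set -- "$@" -name "$glob"
      first=0
    else
      set -- "$@" -o -name "$glob"
    fi
  done
  find "$dir" -type f \( "$@" \) -print0
}

# Search for SYMBOL as a whole word, but only in files matching the GLOBs.
# Matches are printed as FILE:LINE:TEXT.
grep_by_globs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: grep_by_globs SYMBOL DIR GLOB..." >&2
    return 2
  fi
  symbol=$1
  dir=$2
  shift 2
  list_files_by_globs "$dir" "$@" | xargs -0 grep -H -n -w -F -e "$symbol" --
}
```

For example, `grep_by_globs some_symbol . '*.cpp' '*.h'` never opens the .py or .js files, which is where the savings over "search everything project-files returns" come from.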
Second, using project-files means we're forced to round-trip the list
of file names from the first process's stdout to a buffer, then to a
list of Lisp strings, and then back to another buffer, to use as stdin.
I have a couple of things planned for the medium term to improve that,
but some overhead is probably unavoidable (unless we get some new
primitive that would allow "piping" between process buffers).
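To make the round-trip cost concrete, here is a shell analogy, not a description of what Emacs does internally: the first (hypothetical) function writes the complete file list to a temporary file before grep reads it back, much as the list currently passes through a buffer and Lisp strings; the second connects the two processes with a pipe, which is roughly what a "piping" primitive would allow.

```shell
#!/bin/sh
# Sketch only: search_round_trip and search_piped are hypothetical helpers.

# Round trip: store the whole NUL-separated list first, then feed it back to
# grep as stdin in a separate step.  The temp file stands in for the
# intermediate buffer and list of Lisp strings.
search_round_trip() {
  symbol=$1
  dir=$2
  list=$(mktemp) || return 2
  find "$dir" -type f -print0 > "$list"
  xargs -0 grep -H -n -w -F -e "$symbol" -- < "$list"
  status=$?
  rm -f "$list"
  return "$status"
}

# Direct pipe: file names go straight from find's stdout to xargs/grep,
# without being written out and read back.
search_piped() {
  symbol=$1
  dir=$2
  find "$dir" -type f -print0 | xargs -0 grep -H -n -w -F -e "$symbol" --
}
```

Both functions print the same FILE:LINE:TEXT matches; only the route taken by the file list differs.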
Perhaps you could describe your case where you *did* see a significant
improvement from this patch, and we can discuss the best steps to
address that.
BTW, at first I figured you were using macOS (which has historically
bundled outdated versions of find and grep, with worse performance). But
apparently not?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Sat, 15 Apr 2023 06:51:01 GMT)
Message #11 received at 62837 <at> debbugs.gnu.org:
> Date: Sat, 15 Apr 2023 01:38:18 +0300
> From: Dmitry Gutov <dgutov <at> yandex.ru>
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
> > When project-files is available, this is a much more efficient
> > fallback than the current grep fallback. Ultimately, this is
> > motivated by making xref-find-references faster by default even in the
> > absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
Maybe we could offer that as optional behavior, turned on by some user
option? Then people who do experience a performance boost could use it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Sat, 15 Apr 2023 12:38:01 GMT)
Message #14 received at 62837 <at> debbugs.gnu.org:
On 15/04/2023 09:50, Eli Zaretskii wrote:
>> Date: Sat, 15 Apr 2023 01:38:18 +0300
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>> On 14/04/2023 18:37, Spencer Baugh wrote:
>>> When project-files is available, this is a much more efficient
>>> fallback than the current grep fallback. Ultimately, this is
>>> motivated by making xref-find-references faster by default even in the
>>> absence of an index.
>> It's a clever enough idea, but unfortunately it doesn't look like the
>> performance is always improved by this change.
> Maybe we could offer that as optional behavior, turned on by some user
> option? Then people who do experience a performance boost could use it.
Sure, that's also possible. But I'd like more info anyway, for example
to decide which value of said option should be the default.
Or, if the scenario with the improvement turns out to be a rare one, we
could concentrate on what project.el needs to provide to make it better.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Sat, 15 Apr 2023 21:57:02 GMT)
Message #17 received at 62837 <at> debbugs.gnu.org:
Dmitry Gutov <dgutov <at> yandex.ru> writes:
> Hi!
>
> On 14/04/2023 18:37, Spencer Baugh wrote:
>> When project-files is available, this is a much more efficient
>> fallback than the current grep fallback. Ultimately, this is
>> motivated by making xref-find-references faster by default even in the
>> absence of an index.
>
> It's a clever enough idea, but unfortunately it doesn't look like the
> performance is always improved by this change.
>
> E.g. I have this checkout of gecko-dev (a big project, just for
> testing: https://github.com/mozilla/gecko-dev) which contains
> different types of files: cpp, js, py.
>
> If I do an xref-find-references search with the current code, it
> finishes in about 0.8s. 'find' is not that slow, actually:
>
> time find . -type f -name "*.cpp" >/dev/null
>
> reports just 400 ms here.
>
> Whereas with your patch the search, depending on the language (cpp:
> more files; py: fewer files), can take 3 seconds or more.
>
> Why? First of all, project-files returns all files (which are then all
> searched), whereas semantic-symref-filepattern-alist contains a
> mapping from modes to file globs, limiting both the scan and
> subsequent search to those.
>
> Second, using project-files means we're forced to round-trip the
> list of file names from the first process's stdout to a buffer, then
> to a list of Lisp strings, and then back to another buffer, to use as
> stdin. I have a couple of things planned for the medium term to improve
> that, but some overhead is probably unavoidable (unless we get some
> new primitive that would allow "piping" between process buffers).
Yes, this is a very good point.
> Perhaps you could describe your case where you *did* see a significant
> improvement from this patch, and we can discuss the best steps to
> address that.
In short: I have a project.el backend for a large monorepo whose
project-files implementation returns only the subset of files that are
relevant to the work happening in a given clone. (Generally a user will
have many clones and be doing different work in each one.) The
relevant-files subset is determined by integration with the build
system.
So the find-based fallback lists a vast number of files and then
searches all of them, whereas a search over project-files covers a much
smaller number of files.
Regarding your medium-term plans to improve project-files performance:
wildly guessing, but perhaps you have in mind a way to run a subprocess
that outputs the project-files list? Let's call it
"project-files-process". And then project-files-process could be piped
to grep instead, for maximum efficiency? If that was the idea, then my
own backend could certainly have a project-files-process implementation
too.
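The "project-files-process" idea above separates whoever produces the file list from whoever searches it. A minimal shell sketch, assuming the producer writes NUL-separated names to stdout: `grep_file_list` does not care where its input comes from, and `files_from_manifest` is a hypothetical stand-in for a backend (such as one integrated with a build system) that already knows which files are relevant. Neither name exists in project.el.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Consumer: read NUL-separated file names on stdin and search them for
# SYMBOL as a whole word.  Any producer can be piped into it.
grep_file_list() {
  xargs -0 grep -H -n -w -F -e "$1" --
}

# Producer stand-in: emit the files listed (one per line) in MANIFEST,
# NUL-separated, the way a build-system query might.
files_from_manifest() {
  tr '\n' '\0' < "$1"
}
```

Usage: `files_from_manifest relevant-files.txt | grep_file_list some_symbol` searches only the files named in the manifest, however many other files the checkout contains.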
> BTW, at first I figured you're using MacOS (which historically has
> bundled outdated versions of find and grep, with worse
> performance). But apparently not?
Nope, Linux.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Wed, 19 Apr 2023 01:11:01 GMT)
Message #20 received at 62837 <at> debbugs.gnu.org:
On 16/04/2023 00:56, sbaugh <at> catern.com wrote:
>> Perhaps you could describe your case where you *did* see a significant
>> improvement from this patch, and we can discuss the best steps to
>> address that.
>
> In short: I have a project.el backend for a large monorepo which has a
> project-files backend which returns only the subset of files which are
> relevant to work happening in a given clone. (Generally a user will
> have many clones and be doing different work in each one.) The
> relevant-files subset is determined by integration with the build
> system.
>
> So running find returns a vast number of files and then searches over
> those, whereas running a search over project-files searches a much
> smaller number of files.
Neat.
> Regarding your medium-term plans to improve project-files performance -
> wildly guessing, but perhaps you have in mind a way to run a subprocess
> that outputs the project-files list? Let's call it
> "project-files-process". And then project-files-process could be piped
> to grep instead, for maximum efficiency? If that was the idea, then my
> own backend could certainly have a project-files-process implementation
> too, for maximum efficiency.
That might be step number 3, although I'm not sure yet which kind of
code will be required for the piping to be done efficiently enough.
The other two things I was looking at are:
- Use relative file names (less text to parse, memory to allocate, GC to
thrash). The awkward part is how to merge that with the idea that
project-files can include files from other directories ("external roots").
Split those off into a different method? Treat them as separate projects
and flat-map their lists of files?
- Add arguments to allow filtering the files using the underlying tool.
That can also result in far fewer files to parse in the output under
suitable circumstances (e.g. we'd be able to pass a list of globs here).
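The two ideas in the list above (relative names, and letting the listing tool apply the globs) can be illustrated at the shell level. This sketch assumes a plain directory tree and uses hypothetical function names: `list_absolute` prints every file with its full path, while `list_relative_filtered` runs the listing inside the root and lets find apply one glob, so fewer and shorter names reach the consumer. For a Git checkout, `git ls-files -z -- '*.cpp'` gives a similarly filtered, relative listing. Files under external roots cannot simply be made relative to the project root, which is the awkward part mentioned above.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.
# Output is newline-separated, so names containing newlines are not handled.

# Every regular file under ROOT, with full names (ROOT assumed absolute).
list_absolute() {
  find "$1" -type f
}

# Only files under ROOT matching GLOB, as names relative to ROOT.
# The subshell keeps the cd from affecting the caller; sed drops the "./".
list_relative_filtered() {
  (cd "$1" && find . -type f -name "$2") | sed 's|^\./||'
}
```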
There is one implementation of the second item in the branch
scratch/etags-regen.
And both items need to be done carefully enough to maintain some
backward compatibility.
So unless you're in a hurry, give me a few weeks to get around to this.
Further suggestions and patches are welcome, of course.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#62837; Package emacs. (Wed, 19 Apr 2023 01:27:01 GMT)
Message #23 received at 62837 <at> debbugs.gnu.org:
Dmitry Gutov <dgutov <at> yandex.ru> writes:
> On 16/04/2023 00:56, sbaugh <at> catern.com wrote:
>
>>> Perhaps you could describe your case where you *did* see a significant
>>> improvement from this patch, and we can discuss the best steps to
>>> address that.
>> In short: I have a project.el backend for a large monorepo which has a
>> project-files backend which returns only the subset of files which are
>> relevant to work happening in a given clone. (Generally a user will
>> have many clones and be doing different work in each one.) The
>> relevant-files subset is determined by integration with the build
>> system.
>> So running find returns a vast number of files and then searches over
>> those, whereas running a search over project-files searches a much
>> smaller number of files.
>
> Neat.
>
>> Regarding your medium-term plans to improve project-files performance -
>> wildly guessing, but perhaps you have in mind a way to run a subprocess
>> that outputs the project-files list? Let's call it
>> "project-files-process". And then project-files-process could be piped
>> to grep instead, for maximum efficiency? If that was the idea, then my
>> own backend could certainly have a project-files-process implementation
>> too, for maximum efficiency.
>
> That might be step number 3, although I'm not sure yet which kind of
> code will be required for the piping to be done efficiently enough.
>
> The other two things I was looking at are:
>
> - Use relative file names (less text to parse, memory to allocate, GC
> to thrash). The awkward part is how to merge that with the idea that
> project-files can include files from directories ("external
> roots"). Split those off into a different method? Treat them as
> separate projects to flat-map the lists of files at?
>
> - Add arguments to allow filtering the files using the underlying
> tool. That can also result in far fewer files to parse in the
> output under suitable circumstances (e.g. we'd be able to pass a
> list of globs here).
>
> There is one implementation of the second item in the branch
> scratch/etags-regen.
>
> And both items need to be done carefully enough to maintain some
> backward compatibility.
>
> So unless you're in a hurry, give me a few weeks to get around to this.
>
> Further suggestions and patches are welcome, of course.
I'm in no hurry. I will probably add this backend locally at my site in
the meantime. We have no existing (non-trivial) xref-find-references
backend, so speeding this one up isn't too urgent (it's not competing
with anything), but I am definitely interested in project-files (and
project.el in general) speed improvements and will try to help out as
they become relevant.
Severity set to 'wishlist' from 'normal'. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 11 Sep 2023 23:25:02 GMT)
This bug report was last modified 1 year and 279 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.