GNU bug report logs - #77588
Catastrophic slowdown in eglot via `flymake-diag-region'

Package: emacs;

Reported by: JD Smith <jdtsmith <at> gmail.com>

Date: Sun, 6 Apr 2025 22:51:02 UTC

Severity: normal

Done: João Távora <joaotavora <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77588 in the body.
You can then email your comments to 77588 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Sun, 06 Apr 2025 22:51:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to JD Smith <jdtsmith <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 06 Apr 2025 22:51:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: JD Smith <jdtsmith <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Catastrophic slowdown in eglot via `flymake-diag-region'
Date: Sun, 6 Apr 2025 18:50:26 -0400

[Message part 1 (text/plain, inline)]

I was diagnosing a painful eglot pause of about 10 seconds that would reliably occur every few edits when completing candidates and working in a long python file (~8k lines). I am using the basedpyright LSP server. I narrowed this down to eglot's `textDocument/publishDiagnostics' notification handler.

The file I was working on has several hundred warnings and errors, so that represents a substantial message from the LSP server. After a fair bit of timing and sleuthing, I narrowed it down further. A few interrelated misbehaviors combined to produce the huge intermittent pauses:

When completing text directly after entering/removing a newline, the Diagnostics ranges received by eglot are out of date and "one line off".
Some of these off-by-one-line diagnostic message ranges now land on blank lines in the file, so the effective range there is empty.
Seeing an empty range, eglot declares the server has "botched" it, so it tries to make a by-hand recalculation of the range using `flymake-diag-region'.
`flymake-diag-region' for some reason calls (end-of-thing 'sexp) and (end-of-thing 'symbol).
In some positions in a long python file these commands are VERY SLOW. Rather than the typical call time of ~1ms, at these certain positions `flymake-diag-region' takes ~400ms.

It doesn't take too many "botched eglot ranges" interacting with slow `thingatpt' misbehavior to add up to a 10s delay.

I'm not certain of the best solution. A few ideas, from hardest to easiest:

Teach eglot textDocument/diagnostic
++++++++++++++++++++++++++++

textDocument/publishDiagnostics messages arrive way too frequently IMO, after every buffer change of any kind. They are pushed to eglot from the LSP server, and if they contain hundreds of errors, this becomes very inefficient (re-painting with flymake the same hundreds of regions over and over after each keystroke).

The best solution here would probably be to adopt "pull" diagnostics using textDocument/diagnostic, perhaps in an idle-timer whose duration the user can configure. I don't believe EGLOT can do diagnostic pulls at the moment.

Don't use thingatpt in `flymake-diag-region'
+++++++++++++++++++++++++++++++++

`flymake-diag-region' should perhaps not use thingapt, which is subject to the performance vagaries of the major-mode and underlying file. I am uncertain why it relies on that. Perhaps the performance of those will be improved with treesitter variants.

Eglot could detect off-by-one diagnostics
+++++++++++++++++++++++++++++++

Hard to know the best heuristic, but lots of null effective ranges is a good one.

Eglot can simply ignore null range diagnostics
+++++++++++++++++++++++++++++++++++

Eglot doesn't need to use `flymake-diag-region' to try to calculate an update range if it encounters a null diagnostic range. It could simply drop those, as they are probably wrong anyway, and will shortly be updated.

[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Mon, 07 Apr 2025 11:22:02 GMT) Full text and rfc822 format available.

Message #8 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: JD Smith <jdtsmith <at> gmail.com>, João Távora
 <joaotavora <at> gmail.com>, Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Mon, 07 Apr 2025 14:21:38 +0300

> From: JD Smith <jdtsmith <at> gmail.com>
> Date: Sun, 6 Apr 2025 18:50:26 -0400
> 
> I was diagnosing a painful eglot pause of about 10 seconds that would reliably occur every few edits when
> completing candidates and working in a long python file (~8k lines).  I am using the basedpyright LSP server. 
> I narrowed this down to eglot's `textDocument/publishDiagnostics' notification handler.
> 
> The file I was working on has several hundred warnings and errors, so that represents a substantial message
> from the LSP server.  After a fair bit of timing and sleuthing, I narrowed it down further.  A few interrelated
> misbehaviors combined to produce the huge intermittent pauses:
> 
> 1 When completing text directly after entering/removing a newline, the Diagnostics ranges received by eglot
>  are out of date and "one line off".
> 2 Some of these off-by-one-line diagnostic message ranges now land on blank lines in the file, so the
>  effective range there is empty.
> 3 Seeing an empty range, eglot declares the server has "botched" it, so it tries to make a by-hand
>  recalculation of the range using `flymake-diag-region'.
> 4 `flymake-diag-region' for some reason calls (end-of-thing 'sexp) and (end-of-thing 'symbol).  
> 5 In some positions in a long python file these commands are VERY SLOW.  Rather than the typical call time
>  of ~1ms, at these certain positions `flymake-diag-region' takes ~400ms.
> 
> It doesn't take too many "botched eglot ranges" interacting with slow `thingatpt' misbehavior to add up to a
> 10s delay.
> 
> I'm not certain of the best solution.  A few ideas, from hardest to easiest:
> 
> Teach eglot textDocument/diagnostic
> ++++++++++++++++++++++++++++
> 
>  textDocument/publishDiagnostics messages arrive way too frequently IMO, after every buffer change of any
> kind.  They are pushed to eglot from the LSP server, and if they contain hundreds of errors, this becomes
> very inefficient (re-painting with flymake the same hundreds of regions over and over after each keystroke).
> 
> The best solution here would probably be to adopt "pull" diagnostics using textDocument/diagnostic, perhaps
> in an idle-timer whose duration the user can configure.  I don't believe EGLOT can do diagnostic pulls at the
> moment.
> 
> Don't use thingatpt in `flymake-diag-region'
> +++++++++++++++++++++++++++++++++
> 
> `flymake-diag-region' should perhaps not use thingapt, which is subject to the performance vagaries of the
> major-mode and underlying file.  I am uncertain why it relies on that.  Perhaps the performance of those will be
> improved with treesitter variants.
> 
> Eglot could detect off-by-one diagnostics
> +++++++++++++++++++++++++++++++
> 
> Hard to know the best heuristic, but lots of null effective ranges is a good one.
> 
> Eglot can simply ignore null range diagnostics
> +++++++++++++++++++++++++++++++++++
> 
> Eglot doesn't need to use `flymake-diag-region' to try to calculate an update range if it encounters a null
> diagnostic range.  It could simply drop those, as they are probably wrong anyway, and will shortly be updated.

Adding João and Spencer to the discussion.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Mon, 07 Apr 2025 16:00:02 GMT) Full text and rfc822 format available.

Message #11 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 77588 <at> debbugs.gnu.org,
 JD Smith <jdtsmith <at> gmail.com>
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Mon, 07 Apr 2025 17:00:13 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

>> It doesn't take too many "botched eglot ranges" interacting with slow
>> `thingatpt' misbehavior to add up to a 10s delay.

ACK.

>> I'm not certain of the best solution.  A few ideas, from hardest to easiest:
>> 
>> Teach eglot textDocument/diagnostic
>> ++++++++++++++++++++++++++++
>> 
>>  textDocument/publishDiagnostics messages arrive way too frequently IMO, after every buffer change of any
>> kind.  They are pushed to eglot from the LSP server, and if they contain hundreds of errors, this becomes
>> very inefficient (re-painting with flymake the same hundreds of regions over and over after each keystroke).
>> 
>> The best solution here would probably be to adopt "pull" diagnostics using textDocument/diagnostic, perhaps
>> in an idle-timer whose duration the user can configure.  I don't believe EGLOT can do diagnostic pulls at the
>> moment.

You're right it can't.  A patch that implements without much code
repetition and keeps support for the "push" diagnostics model is
welcome.  Would you like to work on it, JD?

As far as I understand, this model is much more complicated and allows
you to pull diagnostics for individual LSP documents or the whole
project.  One of the difficulties I envison is to do it in a way that
maintains support for project-wide diagnostics.

But its certainly not impossible and would be a wonderful addition,
fixing many problems such as
https://github.com/joaotavora/eglot/issues/1296.

>> 
>> Don't use thingatpt in `flymake-diag-region'
>> +++++++++++++++++++++++++++++++++
>> 
>> `flymake-diag-region' should perhaps not use thingapt, which is
>> subject to the performance vagaries of the major-mode and underlying
>> file.  I am uncertain why it relies on that.  Perhaps the performance
>> of those will be improved with treesitter variants.

When a Flymake backend passes on to Flymake a 0-dimensional point in a
file you still want Flymake to create a diagnostic emcompassing a
1-dimensional span of buffer positions.

thingatpt.el is, AFAICT, the standard Emacs's way to move from the 0
dimension to the 1 dimension space.  It needs, quite understandibly,
help from the major mode to do that job.

If that help comes at a dog slow pace, I think that's a problem in
itself.

It takes around 10~20 microsseconds on my machine in Emacs Lisp mode as
measured by:

(/ (car (benchmark-run 10000 (thing-at-point 'sexp))) 10000)

Even less in c++-ts-mode, around 4 microsseconds.

So if python-mode is taking around a hundred thousand times (!) more to
compute that piece of information for some buffer positions, I'd say
it's definitely something to look at...

From a quick check it seems indeed that the time it takes to compute it
is proportional to the value of point the buffer.  Suggest creating a
bug for python-mode, or calling in whoever wrote that part.

>> Eglot could detect off-by-one diagnostics
>> +++++++++++++++++++++++++++++++
>> 
>> Hard to know the best heuristic, but lots of null effective ranges is
>> a good one.
>> 
>> Eglot can simply ignore null range diagnostics
>> +++++++++++++++++++++++++++++++++++
>> 
>> Eglot doesn't need to use `flymake-diag-region' to try to calculate an update range if it encounters a null
>> diagnostic range.  It could simply drop those, as they are probably wrong anyway, and will shortly be updated.

Maybe, IF we can confidently say that the server is in the wrong. 

But I added it there for a reason, and quite a long time ago. See the
commit log for

commit 7826b265a0ecd9357719b2fb9491c9bcb517d4cc
Author: João Távora <joaotavora <at> gmail.com>
Date:   Thu Jun 21 23:32:14 2018 +0100

I'm moderately sure someone somewhere expects the botches ranges to be
auto-corrected by Eglot (though admittedly not at the expense of large
delays).

João

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Mon, 07 Apr 2025 21:43:02 GMT) Full text and rfc822 format available.

Message #14 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: JD Smith <jdtsmith <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Mon, 7 Apr 2025 17:41:55 -0400

[Message part 1 (text/plain, inline)]

> On Apr 7, 2025, at 12:00 PM, João Távora <joaotavora <at> gmail.com> wrote:
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
>>> It doesn't take too many "botched eglot ranges" interacting with slow
>>> `thingatpt' misbehavior to add up to a 10s delay.
> 
> ACK.
> 
>>> I'm not certain of the best solution.  A few ideas, from hardest to easiest:
>>> 
>>> Teach eglot textDocument/diagnostic
>>> ++++++++++++++++++++++++++++
>>> 
>>> textDocument/publishDiagnostics messages arrive way too frequently IMO, after every buffer change of any
>>> kind.  They are pushed to eglot from the LSP server, and if they contain hundreds of errors, this becomes
>>> very inefficient (re-painting with flymake the same hundreds of regions over and over after each keystroke).
>>> 
>>> The best solution here would probably be to adopt "pull" diagnostics using textDocument/diagnostic, perhaps
>>> in an idle-timer whose duration the user can configure.  I don't believe EGLOT can do diagnostic pulls at the
>>> moment.
> 
> You're right it can't.  A patch that implements without much code
> repetition and keeps support for the "push" diagnostics model is
> welcome.  Would you like to work on it, JD?
> 
> As far as I understand, this model is much more complicated and allows
> you to pull diagnostics for individual LSP documents or the whole
> project.  One of the difficulties I envison is to do it in a way that
> maintains support for project-wide diagnostics.
> 
> But its certainly not impossible and would be a wonderful addition,
> fixing many problems such as
> https://github.com/joaotavora/eglot/issues/1296.

I agree diagnostics pull would be a great addition.  I know neovim added support over a year ago[1].  That said, I don't know much about how this has been implemented in other clients, and there are many questions: 

When is the appropriate time to pull?
Should pull be done in idle time?  Driven by flymake?
When should you ask for the full project's diagnostics vs. just the current buffer?  Always?
Should diagnostics be versioned (likely)?

As well, my familiarity with eglot/jsonrpc's internal structure and comm model is rudimentary at best.  I'd be happy to help with logic and provide deep testing, but I'm afraid it's not something I could tackle alone.

>>> Don't use thingatpt in `flymake-diag-region'
>>> +++++++++++++++++++++++++++++++++
>>> 
>>> `flymake-diag-region' should perhaps not use thingapt, which is
>>> subject to the performance vagaries of the major-mode and underlying
>>> file.  I am uncertain why it relies on that.  Perhaps the performance
>>> of those will be improved with treesitter variants.
> 
> When a Flymake backend passes on to Flymake a 0-dimensional point in a
> file you still want Flymake to create a diagnostic emcompassing a
> 1-dimensional span of buffer positions.

> thingatpt.el is, AFAICT, the standard Emacs's way to move from the 0
> dimension to the 1 dimension space.  It needs, quite understandibly,
> help from the major mode to do that job.

That's a good way to put it.  The situation here is a bit more subtle.  The LSP server provides a valid range, such as:

(:start (:line 5272 :character 48) :end (:line 5272 :character 61))

but it is out of date compared to the current state of the buffer.  Since in some instances the range points at a blank line or similar bad location, eglot collapses it down to dimension 0, then asks flymake for help to expand it back to 1d.

> If that help comes at a dog slow pace, I think that's a problem in
> itself.
> 
> It takes around 10~20 microsseconds on my machine in Emacs Lisp mode as
> measured by:
> 
> (/ (car (benchmark-run 10000 (thing-at-point 'sexp))) 10000)
> 
> Even less in c++-ts-mode, around 4 microsseconds.

At a randomly selected position at around line 8200 in a python-ts-mode file, I get 19ms.  After a few tries on subsequent lines, I found a position that takes 583ms for the same (had to drop to 10 iterations).  

In case people want to play along, try João's test at the start of L8817 in this file (no eglot needed; either python-mode or python-ts-mode is fine, as both show the same issue):

https://raw.githubusercontent.com/matplotlib/matplotlib/refs/heads/main/lib/matplotlib/axes/_axes.py
_axes
Text Document · 353 KB

I will open a separate bug for that.

> So if python-mode is taking around a hundred thousand times (!) more to
> compute that piece of information for some buffer positions, I'd say
> it's definitely something to look at...
> 
> From a quick check it seems indeed that the time it takes to compute it
> is proportional to the value of point the buffer.  Suggest creating a
> bug for python-mode, or calling in whoever wrote that part.

I agree it's suboptimal for python-mode to be so (intermittently) slow at computing end-of-thing.  It's certainly an issue worth solving, but not the main issue IMO.  

The main issue is that eglot is trying to "correct" a region using data which is already outdated, and therefore not likely to result in anything useful, however hard flymake tries.

>>> Eglot could detect off-by-one diagnostics
>>> +++++++++++++++++++++++++++++++
>>> 
>>> Hard to know the best heuristic, but lots of null effective ranges is
>>> a good one.
>>> 
>>> Eglot can simply ignore null range diagnostics
>>> +++++++++++++++++++++++++++++++++++
>>> 
>>> Eglot doesn't need to use `flymake-diag-region' to try to calculate an update range if it encounters a null
>>> diagnostic range.  It could simply drop those, as they are probably wrong anyway, and will shortly be updated.
> 
> Maybe, IF we can confidently say that the server is in the wrong. 
> 
> But I added it there for a reason, and quite a long time ago. See the
> commit log for
> 
> commit 7826b265a0ecd9357719b2fb9491c9bcb517d4cc
> Author: João Távora <joaotavora <at> gmail.com>
> Date:   Thu Jun 21 23:32:14 2018 +0100
> 
> I'm moderately sure someone somewhere expects the botches ranges to be
> auto-corrected by Eglot (though admittedly not at the expense of large
> delays).

It appears from that commit message that the issue is rather that certain servers send null ranges up front.  One middle ground here would then be to attempt to correct ranges only if they start out null from the server, not if they are collapsed that way by eglot.  Implementing this makes performance not that great, but predictable: no 10s pauses.

Another much better option I just discovered: avoid applying out-of-date diagnostics in the first place.  After taking another look, I noticed that since LSP spec v3.15 (v3.17 current), textDocument/publishDiagnostics includes:

/**
 * Optional the version number of the document the diagnostics are published
 * for.
 *
 * @since 3.15.0
 */
version?: integer;

It seems eglot keeps track of the "document version" in `eglot--versioned-identifier'.  I believe that can be compared to the one in the `textDocument/diagnostic' message to avoid this mess altogether.  

See attached patch for both of these options rolled together; this has performance which is now actually quite decent in my long file.  It drops the outdated push diagnostics of my chatty server, which seem to be about half of them.  Not only does this eliminate the "off by one" catastrophic flymake issue, it improves performance overall significantly, probably due to less flymake overlay churn.

Best,
JD

[1] https://atlee.ca/posts/pull-diagnostic-support-for-neovim/

[Message part 2 (text/html, inline)]

[preview.png (image/png, inline)]

[Message part 4 (text/html, inline)]

[null-ranges-only-use-version.patch (application/octet-stream, attachment)]

[Message part 6 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 08:01:02 GMT) Full text and rfc822 format available.

Message #17 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: JD Smith <jdtsmith <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 08 Apr 2025 09:00:55 +0100

JD Smith <jdtsmith <at> gmail.com> writes:

> know much about how this has been implemented in other clients, and there are many questions: 
>
> * When is the appropriate time to pull?
> * Should pull be done in idle time?  Driven by flymake?

You are overthinking things.  Pull diagnostics are actually much closer
to Flymake's original model than push diagnostics.  You pull them when
Flymake decides it's a good time to pull, and that's it.

  See https://www.gnu.org/software/emacs/manual/html_node/flymake/Backend-functions.html

> * When should you ask for the full project's diagnostics vs. just the
> current buffer?  Always?

You could start with just the current buffer.

> * Should diagnostics be versioned (likely)?

I don't know what that means, but the answer is likely no.  Flymake
knows how to manage its objects.

> As well, my familiarity with eglot/jsonrpc's internal structure and
> comm model is rudimentary at best.  I'd be happy to help with logic
> and provide deep testing, but I'm afraid it's not something I could
> tackle alone.

You don't need to know any jsonrpc internals to implement this.  As to
Eglot's internals, you need to know no more than what you've
demonstrated you already know in this issue.

> I will open a separate bug for that.

Paste the number here when you do that.

> The main issue is that eglot is trying to "correct" a region using
> data which is already outdated, and therefore not likely to result in
> anything useful, however hard flymake tries.

Agree.

> It seems eglot keeps track of the "document version" in `eglot--versioned-identifier'.  I believe that can be compared to
> the one in the `textDocument/diagnostic' message to avoid this mess
> altogether.

I've implemented this.  Please test and let's close.

The second thing you propose is too risky and redundant anyway.  Eglot
is generally not in the business of trying very hard to fix server
flaws.  So I will let the "botched ranged" code be.  It was introduced
early on when this policy was less clear and I wanted a young Eglot to
work with flawed servers.  I'll stay for backward compatibility reasons,
I've been doing this long enough to know someonw will grumble if I
delete it.

I suggest you open bugs against the LSP server (and the ridiculously
slow thingatpt.el for Python, which you already said you did)

João

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 13:47:02 GMT) Full text and rfc822 format available.

Message #20 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: JD Smith <jdtsmith <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 8 Apr 2025 09:46:13 -0400

> On Apr 8, 2025, at 4:00 AM, João Távora <joaotavora <at> gmail.com> wrote:
> 
> JD Smith <jdtsmith <at> gmail.com> writes:
> 
> You pull them when Flymake decides it's a good time to pull, and that's it.

I see.  What I was thinking here is that since flymake doesn't understand that it is talking with an LSP server, it may ask for updates at inconvenient times.  Ideally you'd arrange the timing of diagnostic pulls to maximize the chances the document version matches. 

>> * Should diagnostics be versioned (likely)?
> 
> I don't know what that means, but the answer is likely no.  Flymake
> knows how to manage its objects.

I meant LSP version identifiers.  After my success solving the current problem using them, I believe ensuring the document version the LSP server worked against when working up a response to the diagnostic pull request matches the current version is vitally important.  Luckily this looks to be straightforward.

>> As well, my familiarity with eglot/jsonrpc's internal structure and
>> comm model is rudimentary at best.  I'd be happy to help with logic
>> and provide deep testing, but I'm afraid it's not something I could
>> tackle alone.
> 
> You don't need to know any jsonrpc internals to implement this.  As to
> Eglot's internals, you need to know no more than what you've
> demonstrated you already know in this issue.

I'll take it under consideration, though I suspect it would be 10x easier for someone more familiar with eglot's semantics and code conventions, which are more complex than typical packages.

>> I will open a separate bug for that.
> 
> Paste the number here when you do that.

Opened at bug#77620.

>> I believe that can be compared to the one in the `textDocument/diagnostic' message to avoid this mess altogether.
> 
> I've implemented this.  Please test and let's close.

Thanks.  This tests fine on master.  Performance in my large python buffer is now refreshingly acceptable.  Two questions:

- Can we be certain that `eglot--versioned-identifier' is always an integer?
- Any reason not to implement this on the emacs-30 branch?  

> The second thing you propose is too risky and redundant anyway.  Eglot
> is generally not in the business of trying very hard to fix server
> flaws.  So I will let the "botched ranged" code be.  

That's fine, it likely won't matter since that code will now rarely be triggered, and possibly thingatpt performance can be improved in python-mode for those rare times in which it is.

> I suggest you open bugs against the LSP server (and the ridiculously
> slow thingatpt.el for Python, which you already said you did)

I don't think the LSP server is to blame here.  Yes, it's sending large messages (too often) and getting behind, but it is dutifully providing the document version number.  The client is presumably responsible to act on that accordingly.  That said, I suspect eglot predates the availability of this particular version information.

Thanks for your work on eglot.  Used daily.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 15:43:01 GMT) Full text and rfc822 format available.

Message #23 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: JD Smith <jdtsmith <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org, 77588-done <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 8 Apr 2025 16:42:15 +0100

[Message part 1 (text/plain, inline)]

On Tue, Apr 8, 2025, 14:46 JD Smith <jdtsmith <at> gmail.com> wrote:.

I meant LSP version identifiers.  After my success solving the current
> problem using them, I believe ensuring the document version the LSP server
> worked against when working up a response to the diagnostic pull request
> matches the current version is vitally important.  Luckily this looks to be
> straightforward.
>

Yes.

I'll take it under consideration, though I suspect it would be 10x easier
> for someone more familiar with eglot's semantics and code conventions,
> which are more complex than typical packages.
>

The main difficulty seems to be finding a server which supports this mode.
Clangd and basedpyright don't seem to.

>
> Thanks.  This tests fine on master.  Performance in my large python buffer
> is now refreshingly acceptable.  Two questions:
>
> - Can we be certain that `eglot--versioned-identifier' is always an
> integer?
>

No. It can be nil. But what it's not nil, it's an integer as specified in
the standard.

- Any reason not to implement this on the emacs-30 branch?

Eglot is a GNU Elpa :core package and so it has its own release rhythm.

I don't think the LSP server is to blame here.  Yes, it's sending large
> messages (too often) and getting behind, but it is dutifully providing the
> document version number.

I beg to differ. A server that is sending useless junk down the wire,
however legal that junk is, is a server that should be fixed. As you well
know, performance problems arise often from just parsing JSON into
expensive Lisp structures.

João

[Message part 2 (text/html, inline)]

Reply sent to João Távora <joaotavora <at> gmail.com>:
You have taken responsibility. (Tue, 08 Apr 2025 15:43:02 GMT) Full text and rfc822 format available.

Notification sent to JD Smith <jdtsmith <at> gmail.com>:
bug acknowledged by developer. (Tue, 08 Apr 2025 15:43:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 16:25:01 GMT) Full text and rfc822 format available.

Message #31 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: JD Smith <jdtsmith <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org, 77588-done <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 8 Apr 2025 12:24:00 -0400

[Message part 1 (text/plain, inline)]

> On Apr 8, 2025, at 11:42 AM, João Távora <joaotavora <at> gmail.com> wrote:
> 
> The main difficulty seems to be finding a server which supports this mode. Clangd and basedpyright don't seem to. 

basedpyright actually does; it was added about a month ago via pyright:

https://github.com/microsoft/pyright/pull/10055
Implement support for pull diagnostics in Pyright by rchiodo · Pull Request #10055 · microsoft/pyright
github.com

>> - Can we be certain that `eglot--versioned-identifier' is always an integer?
> 
> 
> No. It can be nil. But what it's not nil, it's an integer as specified in the standard.

Then it might be worth switching the test to `(not (eq ...' from `(/= ...'.

>> I don't think the LSP server is to blame here.  Yes, it's sending large messages (too often) and getting behind, but it is dutifully providing the document version number. 
> 
> 
> I beg to differ. A server that is sending useless junk down the wire, however legal that junk is, is a server that should be fixed. As you well know, performance problems arise often from just parsing JSON into expensive Lisp structures.

My point is that the message's version info is an explicit acknowledgement that the information may be out of date by the time the client processes it.  I see no way around that (other than everything working quicker).  A fair critique is just how many push updates arrive.  I suspect if we raise this they'd point us to pull diagnostics anyway.  A nice feature of pull is it can respond "no changes" when there are none.

[Message part 2 (text/html, inline)]

[10055.png (image/png, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 16:25:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 21:01:01 GMT) Full text and rfc822 format available.

Message #37 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: JD Smith <jdtsmith <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 08 Apr 2025 22:00:22 +0100

[I dropped the bug report and previous recipients, adding them back and
hoping they can piece together the context from the quotations]

JD Smith <jdtsmith <at> gmail.com> writes:

>> Btw your html mails are hard to read.
>
> This is hopefully better.

Yes it is, thanks!

>> On Apr 8, 2025, at 12:44 PM, João Távora <joaotavora <at> gmail.com> wrote:
>> 
>> 
>> On Tue, Apr 8, 2025, 17:24 JD Smith <jdtsmith <at> gmail.com> wrote
>> 
>> Then it might be worth switching the test to `(not (eq ...' from `(/= ...'.
>> 
>> I don't see why. = Is the lisp way of comparing things we know are numbers.
>
> You just mentioned one of them can be nil though...

Right, but the patch I pushed checks for that before using `/=`.

>> My point is that the message's version info is an explicit
>> acknowledgement that the information may be out of date by the time
>> the client processes it.  I see no way around that (other than
>> everything working quicker).
>> 
>> What could be the server's rationale for sending out of date
>> information when it _knows_ that the information is out of date?
>> Presumably we the client have informed it of the new version in
>> didChange. Or am I missing something?
>
> Maybe I am missing something, but if the push diagnostics and
> didChange cross paths, I could see how the latter could be out of
> date.

Indeed, but they don't cross paths, at least not on virtually all
servers that I know of.  Servers on a "push diagnostics" model operate
by sending at most 1 publishDiagnostics notification _after_ receiiving
a didChange.  Which makes sense.  You're notified of some changes V1,
you do some calculations, and you publish your results.  If you operate
asynchronously and during that time you get another notification of
changes V2, you probably have throw most work out (certainly don't
publish it).

If you supply the Eglot events log to your server session we can check
what's happening.  I have used basedpyright+eglot in the past though,
and I never noticed any problems (doesn't mean they can't or weren't
happening under the hood, but I certainly didn't experience any
catastrophic slowdown).

João

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77588; Package emacs. (Tue, 08 Apr 2025 23:46:02 GMT) Full text and rfc822 format available.

Message #40 received at 77588 <at> debbugs.gnu.org (full text, mbox):

From: JD Smith <jdtsmith <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>,
 77588 <at> debbugs.gnu.org
Subject: Re: bug#77588: Catastrophic slowdown in eglot via
 `flymake-diag-region'
Date: Tue, 8 Apr 2025 19:45:05 -0400


> On Apr 8, 2025, at 5:00 PM, João Távora <joaotavora <at> gmail.com> wrote:
> 
>> 
>> Maybe I am missing something, but if the push diagnostics and
>> didChange cross paths, I could see how the latter could be out of
>> date.
> 
> Indeed, but they don't cross paths, at least not on virtually all
> servers that I know of.  Servers on a "push diagnostics" model operate
> by sending at most 1 publishDiagnostics notification _after_ receiiving
> a didChange.  Which makes sense.  You're notified of some changes V1,
> you do some calculations, and you publish your results.  If you operate
> asynchronously and during that time you get another notification of
> changes V2, you probably have throw most work out (certainly don't
> publish it).
> 
> If you supply the Eglot events log to your server session we can check
> what's happening.  I have used basedpyright+eglot in the past though,
> and I never noticed any problems (doesn't mean they can't or weren't
> happening under the hood, but I certainly didn't experience any
> catastrophic slowdown).

This is an interesting point.  Here's an event log: [1].  I modified the old notifier to simply alert when versions didn't match.  It reports:

Diagnostics version mismatch 28 received, 29 current

In the log, you can see the following pattern:

[jsonrpc] e[19:15:32.870] --> textDocument/inlayHint[107] 
[jsonrpc] e[19:15:32.905] <-- textDocument/inlayHint[107] 
[jsonrpc] e[19:15:33.079] --> textDocument/didChange             ; Version 29 change sent
[jsonrpc] e[19:15:33.079] --> textDocument/completion[108] 
[jsonrpc] e[19:15:33.383]   --> textDocument/inlayHint[109] 
[jsonrpc] e[19:15:33.446]   <-- textDocument/publishDiagnostics  ; Version 28 diagnostics received
[jsonrpc] e[19:15:33.800]   --> textDocument/signatureHelp[110] 
[jsonrpc] e[19:15:33.801]   --> textDocument/hover[111] 
[jsonrpc] e[19:15:33.801]   --> textDocument/documentHighlight[112] 
[jsonrpc] e[19:15:33.807]   <-- textDocument/completion[108] 
[jsonrpc] e[19:15:33.809] <-- textDocument/inlayHint[109] 
[jsonrpc] e[19:15:34.095] <-- textDocument/publishDiagnostics    ; Version 29 diagnostics received

I.e. v28 diagnostics are received *after* v29 didChange is sent.  Up to 50% of the updates have this version mismatch.  Is it significant that the publishDiagnostics is "nested" inside a completion?  Note that it also happens (though less frequently) even when typing a multiple line comment, always shortly after a newline is entered.  I agree this is strange: basepyright should drop the v28 update like a rock.

Also during these pauses, I found that end-of-thing often (but not always) errors out:

1 -> (end-of-thing symbol)
1 <- end-of-thing: !non-local\ exit!
======================================================================
1 -> (end-of-thing sexp)
1 <- end-of-thing: !non-local\ exit!
======================================================================
1 -> (end-of-thing symbol)
1 <- end-of-thing: 306546
======================================================================
1 -> (end-of-thing sexp)
1 <- end-of-thing: !non-local\ exit!
======================================================================
1 -> (end-of-thing symbol)
1 <- end-of-thing: 306546
======================================================================
1 -> (end-of-thing sexp)
1 <- end-of-thing: !non-local\ exit!
======================================================================

I think this explains why my slow-down isn't as bad here: end-of-thing is now *crashing* (inside ignore-errors).  Not the most useful speedup to be sure, and not sure what changed, probably a parsing difference in the file. 

Repeating that command by hand shows the real error:

Debugger entered--Lisp error: (error "No sexp here")
  signal(error ("No sexp here"))
  error("No %s here" sexp)
  #<subr end-of-thing>(sexp)
  apply(#<subr end-of-thing> sexp)
  #f(compiled-function (body &rest args) #<bytecode 0x1dcd258a89e09bb5>)(#<subr end-of-thing> sexp)
  apply(#f(compiled-function (body &rest args) #<bytecode 0x1dcd258a89e09bb5>) #<subr end-of-thing> sexp)
  end-of-thing(sexp)
  eval((end-of-thing 'sexp) t)
  #f(compiled-function () #<bytecode 0x12f7d3ddabb81c2a>)()
  #f(compiled-function () #<bytecode -0x5db3e1955cb81d1>)()
  handler-bind-1(#f(compiled-function () #<bytecode -0x5db3e1955cb81d1>) (error) eval-expression--debug)
  eval-expression((end-of-thing 'sexp) nil nil 127)
  funcall-interactively(eval-expression (end-of-thing 'sexp) nil nil 127)
  call-interactively(eval-expression nil nil)
  command-execute(eval-expression)


[1] https://gist.githubusercontent.com/jdtsmith/a604c1e99f98f48da4d76f0abf07aedc/raw/0eb71ea878c70dc4d44afe56e44731d7dd135e19/event-buffer.txt

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 07 May 2025 11:24:25 GMT) Full text and rfc822 format available.

This bug report was last modified 94 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #77588 Catastrophic slowdown in eglot via `flymake-diag-region'

GNU bug report logs - #77588
Catastrophic slowdown in eglot via `flymake-diag-region'