Package: emacs;
Reported by: JD Smith <jdtsmith <at> gmail.com>
Date: Sun, 6 Apr 2025 22:51:02 UTC
Severity: normal
Done: João Távora <joaotavora <at> gmail.com>
Bug is archived. No further changes may be made.
Message #14 received at 77588 <at> debbugs.gnu.org (full text, mbox):
From: JD Smith <jdtsmith <at> gmail.com> To: João Távora <joaotavora <at> gmail.com> Cc: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>, 77588 <at> debbugs.gnu.org Subject: Re: bug#77588: Catastrophic slowdown in eglot via `flymake-diag-region' Date: Mon, 7 Apr 2025 17:41:55 -0400
[Message part 1 (text/plain, inline)]
> On Apr 7, 2025, at 12:00 PM, João Távora <joaotavora <at> gmail.com> wrote: > > Eli Zaretskii <eliz <at> gnu.org> writes: > >>> It doesn't take too many "botched eglot ranges" interacting with slow >>> `thingatpt' misbehavior to add up to a 10s delay. > > ACK. > >>> I'm not certain of the best solution. A few ideas, from hardest to easiest: >>> >>> Teach eglot textDocument/diagnostic >>> ++++++++++++++++++++++++++++ >>> >>> textDocument/publishDiagnostics messages arrive way too frequently IMO, after every buffer change of any >>> kind. They are pushed to eglot from the LSP server, and if they contain hundreds of errors, this becomes >>> very inefficient (re-painting with flymake the same hundreds of regions over and over after each keystroke). >>> >>> The best solution here would probably be to adopt "pull" diagnostics using textDocument/diagnostic, perhaps >>> in an idle-timer whose duration the user can configure. I don't believe EGLOT can do diagnostic pulls at the >>> moment. > > You're right it can't. A patch that implements without much code > repetition and keeps support for the "push" diagnostics model is > welcome. Would you like to work on it, JD? > > As far as I understand, this model is much more complicated and allows > you to pull diagnostics for individual LSP documents or the whole > project. One of the difficulties I envison is to do it in a way that > maintains support for project-wide diagnostics. > > But its certainly not impossible and would be a wonderful addition, > fixing many problems such as > https://github.com/joaotavora/eglot/issues/1296. I agree diagnostics pull would be a great addition. I know neovim added support over a year ago[1]. That said, I don't know much about how this has been implemented in other clients, and there are many questions: When is the appropriate time to pull? Should pull be done in idle time? Driven by flymake? When should you ask for the full project's diagnostics vs. just the current buffer? Always? Should diagnostics be versioned (likely)? As well, my familiarity with eglot/jsonrpc's internal structure and comm model is rudimentary at best. I'd be happy to help with logic and provide deep testing, but I'm afraid it's not something I could tackle alone. >>> Don't use thingatpt in `flymake-diag-region' >>> +++++++++++++++++++++++++++++++++ >>> >>> `flymake-diag-region' should perhaps not use thingapt, which is >>> subject to the performance vagaries of the major-mode and underlying >>> file. I am uncertain why it relies on that. Perhaps the performance >>> of those will be improved with treesitter variants. > > When a Flymake backend passes on to Flymake a 0-dimensional point in a > file you still want Flymake to create a diagnostic emcompassing a > 1-dimensional span of buffer positions. > thingatpt.el is, AFAICT, the standard Emacs's way to move from the 0 > dimension to the 1 dimension space. It needs, quite understandibly, > help from the major mode to do that job. That's a good way to put it. The situation here is a bit more subtle. The LSP server provides a valid range, such as: (:start (:line 5272 :character 48) :end (:line 5272 :character 61)) but it is out of date compared to the current state of the buffer. Since in some instances the range points at a blank line or similar bad location, eglot collapses it down to dimension 0, then asks flymake for help to expand it back to 1d. > If that help comes at a dog slow pace, I think that's a problem in > itself. > > It takes around 10~20 microsseconds on my machine in Emacs Lisp mode as > measured by: > > (/ (car (benchmark-run 10000 (thing-at-point 'sexp))) 10000) > > Even less in c++-ts-mode, around 4 microsseconds. At a randomly selected position at around line 8200 in a python-ts-mode file, I get 19ms. After a few tries on subsequent lines, I found a position that takes 583ms for the same (had to drop to 10 iterations). In case people want to play along, try João's test at the start of L8817 in this file (no eglot needed; either python-mode or python-ts-mode is fine, as both show the same issue): https://raw.githubusercontent.com/matplotlib/matplotlib/refs/heads/main/lib/matplotlib/axes/_axes.py _axes Text Document · 353 KB I will open a separate bug for that. > So if python-mode is taking around a hundred thousand times (!) more to > compute that piece of information for some buffer positions, I'd say > it's definitely something to look at... > > From a quick check it seems indeed that the time it takes to compute it > is proportional to the value of point the buffer. Suggest creating a > bug for python-mode, or calling in whoever wrote that part. I agree it's suboptimal for python-mode to be so (intermittently) slow at computing end-of-thing. It's certainly an issue worth solving, but not the main issue IMO. The main issue is that eglot is trying to "correct" a region using data which is already outdated, and therefore not likely to result in anything useful, however hard flymake tries. >>> Eglot could detect off-by-one diagnostics >>> +++++++++++++++++++++++++++++++ >>> >>> Hard to know the best heuristic, but lots of null effective ranges is >>> a good one. >>> >>> Eglot can simply ignore null range diagnostics >>> +++++++++++++++++++++++++++++++++++ >>> >>> Eglot doesn't need to use `flymake-diag-region' to try to calculate an update range if it encounters a null >>> diagnostic range. It could simply drop those, as they are probably wrong anyway, and will shortly be updated. > > Maybe, IF we can confidently say that the server is in the wrong. > > But I added it there for a reason, and quite a long time ago. See the > commit log for > > commit 7826b265a0ecd9357719b2fb9491c9bcb517d4cc > Author: João Távora <joaotavora <at> gmail.com> > Date: Thu Jun 21 23:32:14 2018 +0100 > > I'm moderately sure someone somewhere expects the botches ranges to be > auto-corrected by Eglot (though admittedly not at the expense of large > delays). It appears from that commit message that the issue is rather that certain servers send null ranges up front. One middle ground here would then be to attempt to correct ranges only if they start out null from the server, not if they are collapsed that way by eglot. Implementing this makes performance not that great, but predictable: no 10s pauses. Another much better option I just discovered: avoid applying out-of-date diagnostics in the first place. After taking another look, I noticed that since LSP spec v3.15 (v3.17 current), textDocument/publishDiagnostics includes: /** * Optional the version number of the document the diagnostics are published * for. * * @since 3.15.0 */ version?: integer; It seems eglot keeps track of the "document version" in `eglot--versioned-identifier'. I believe that can be compared to the one in the `textDocument/diagnostic' message to avoid this mess altogether. See attached patch for both of these options rolled together; this has performance which is now actually quite decent in my long file. It drops the outdated push diagnostics of my chatty server, which seem to be about half of them. Not only does this eliminate the "off by one" catastrophic flymake issue, it improves performance overall significantly, probably due to less flymake overlay churn. Best, JD [1] https://atlee.ca/posts/pull-diagnostic-support-for-neovim/ 
[Message part 2 (text/html, inline)]
[preview.png (image/png, inline)]
[Message part 4 (text/html, inline)]
[null-ranges-only-use-version.patch (application/octet-stream, attachment)]
[Message part 6 (text/html, inline)]
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.