GNU bug report logs -
#61726
[PATCH] Eglot: Support positionEncoding capability
Previous Next
Reported by: Augusto Stoffel <arstoffel <at> gmail.com>
Date: Thu, 23 Feb 2023 08:06:01 UTC
Severity: normal
Tags: patch
Done: João Távora <joaotavora <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
Message #65 received at 61726 <at> debbugs.gnu.org (full text, mbox):
On Fri, 24 Feb 2023 at 10:38, Eli Zaretskii wrote:
>> From: Augusto Stoffel <arstoffel <at> gmail.com>
>> Cc: joaotavora <at> gmail.com, 61726 <at> debbugs.gnu.org
>> Date: Fri, 24 Feb 2023 08:18:30 +0100
>>
>> On Fri, 24 Feb 2023 at 08:43, Eli Zaretskii wrote:
>>
>> > It does? then please humor me by walking me through the code and the
>> > patch to show how that would work after applying the patch.
>>
>> + :general
>> + (list
>> + :positionEncodings ["utf-32" "utf-8" "utf-16"])
>> :experimental eglot--{})))
>
> Is "UTF-32" an LSP thing and terminology? Because I'd prefer a
> different name if we can. At least for our internal nomenclature,
> let's use "codepoint" or "character" instead.
Yes, this is how the LSP spec refers to the 3 offset counting methods.
>> -(defun eglot-current-column () (- (point) (line-beginning-position)))
>> +(defun eglot-current-column ()
>> + "Calculate current column, counting Unicode codepoints."
>> + (- (point) (line-beginning-position)))
>
> Can we please take this opportunity to get rid of the confusing
> "column" terminology? As became evident from this discussion, we are
> not talking columns here, we are talking offsets in characters from
> BOL. So something like "pos" or "linepos" or "line-offset" should be
> better.
>
> João, are you okay with such a sweeping change in all of eglot.el?
I like linepos, if João is fine with not making the absolute minimal
amount of changes to the code.
>> +(defun eglot--current-column-utf-8 ()
>> + "Calculate current column, counting bytes."
>> + (- (position-bytes (point)) (position-bytes (line-beginning-position))))
>
> As discussed, position-bytes is incorrect. You should instead do
> something like
>
> (length (encode-coding-string
> (buffer-substring-no-properties (point)
> (line-beginning-position))
> 'utf-8-unix t))
But it is incorrect only if the buffer contains characters outside of
the Unicode range, right? If that happens, we already lost, because a
few steps later we will serialize the buffer text as JSON to send it to
the server:
(progn
(insert ?x (max-char) ?y)
(json-serialize (buffer-substring-no-properties (pos-bol)
(pos-eol))))
⇒ Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p " (json-serialize (buffer-substring-no-properties (...")
> Also, for 100% reliable results, we should bind
> inhibit-field-text-motion to t when calling line-beginning-position.
We should rather be using pos-bol, no? But how do we keep compatibility
with older Emacsen?
>> +(defun eglot--move-to-column-utf-8 (column)
>> + "Move to COLUMN, regarded as a byte offset."
>> + (goto-char (min (byte-to-position
>> + (+ (position-bytes (line-beginning-position)) column))
>> + (line-end-position))))
>
> Likewise here.
>
>> @@ -1515,14 +1536,20 @@ eglot--lsp-position-to-point
>> (forward-line (min most-positive-fixnum
>> (plist-get pos-plist :line)))
>> (unless (eobp) ;; if line was excessive leave point at eob
>> - (let ((tab-width 1)
>> + (let ((movefn (or eglot-move-to-column-function
>> + (pcase (plist-get (eglot--capabilities (eglot-current-server))
>> + :positionEncoding)
>> + ("utf-32" #'eglot-move-to-column)
>> + ("utf-8" #'eglot--move-to-column-utf-8)
>> + (_ #'eglot-move-to-lsp-abiding-column))))
>> + (tab-width 1)
> ^^^^^^^^^^^
> This last part shouldn't be necessary: we should move by characters,
> not by columns. Why is it necessary?
Maybe João can clarify, but I'm pretty sure this is there to support the
UTF-16 way of counting offsets, so this ideally should move to
eglot-move-to-lsp-abiding-column.
This bug report was last modified 2 years and 138 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.