GNU bug report logs - #61726
[PATCH] Eglot: Support positionEncoding capability
Reported by: Augusto Stoffel <arstoffel <at> gmail.com>
Date: Thu, 23 Feb 2023 08:06:01 UTC
Severity: normal
Tags: patch
Done: João Távora <joaotavora <at> gmail.com>
Bug is archived. No further changes may be made.
Message #26 received at 61726 <at> debbugs.gnu.org:
> From: Augusto Stoffel <arstoffel <at> gmail.com>
> Cc: 61726 <at> debbugs.gnu.org, joaotavora <at> gmail.com
> Date: Thu, 23 Feb 2023 12:46:48 +0100
>
> >> +(defun eglot--current-column-utf-8 ()
> >> + "Calculate current column, counting bytes."
> >> + (- (position-bytes (point)) (position-bytes (line-beginning-position))))
> >
> > This is subtly incorrect: position-bytes doesn't count UTF-8 bytes, it
> > counts the bytes in the internal representation Emacs uses for buffer
> > and string text. The differences are minor and subtle, but not
> > negligible.
>
> Right, if the buffer contains a char outside of the Unicode range, we
> lose.
>
> But just to confirm: position-bytes and byte-to-position are always with
> respect to Emacs's internal extended UTF-8 representation and have
> nothing to do with the buffer file encoding, right?
Yes. See bufferpos-to-filepos to get an idea of what hoops we need to
jump through to get it right, even just with UTF-8.
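As a point of comparison, the LSP `positionEncoding` kinds ("utf-8", "utf-16", "utf-32") measure a column as the length of the line prefix in UTF-8 bytes, UTF-16 code units, or UTF-32 code units (one per codepoint). The Python sketch below is an illustration, not Eglot code. `lsp_column` is a hypothetical helper, and the sketch assumes the line is already a valid Unicode string. Characters Emacs can store outside the Unicode range (such as raw bytes) are therefore not modeled at all.

```python
def lsp_column(line: str, char_index: int, encoding: str) -> int:
    """Return the LSP column for codepoint offset CHAR_INDEX in LINE.

    Hypothetical helper for illustration; assumes LINE is a valid
    Unicode string.  ENCODING is an LSP positionEncoding kind.
    """
    prefix = line[:char_index]  # Python str slicing counts codepoints
    if encoding == "utf-8":
        # Real UTF-8 bytes, not Emacs's internal (extended) representation.
        return len(prefix.encode("utf-8"))
    if encoding == "utf-16":
        # 2 bytes per code unit; astral characters take a surrogate pair.
        return len(prefix.encode("utf-16-le")) // 2
    if encoding == "utf-32":
        # One unit per codepoint.
        return len(prefix)
    raise ValueError(f"unknown position encoding: {encoding}")


line = "x⇒😃y"
for enc in ("utf-8", "utf-16", "utf-32"):
    print(enc, lsp_column(line, 3, enc))
# utf-8 8
# utf-16 4
# utf-32 3
```

The same three characters that count as 1 unit each under UTF-32 take 8 units under UTF-8, because ⇒ needs 3 bytes and 😃 needs 4.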
> > What does this stuff do with double-width or zero-width characters?
> > Emacs takes character-width into consideration when it counts columns,
> > but it is unclear to me what LSP servers do in those cases.
> > Likewise with characters that are composed on display.
>
> `eglot-move-to-column' is supposed to count Unicode codepoints, so
> e.g. x, ⇒ and 😃 all contribute 1 unit.
But if the resulting column is then used in move-to-column etc., it
might go to the wrong column, because in Emacs each column is not
necessarily a single codepoint. The simplest example is a TAB
character, but there are more examples, some of which are quite
complicated (see below).
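The TAB case can be made concrete with a simplified model. The Python sketch below assumes tab stops every 8 columns (Emacs's default `tab-width`) and gives every other character a width of 1. It ignores double-width and composed characters. `display_column` is a hypothetical name, not an Emacs or Eglot function.

```python
def display_column(line: str, char_index: int, tab_width: int = 8) -> int:
    """Column reached after the first CHAR_INDEX codepoints of LINE.

    Simplified, hypothetical model: TAB advances to the next multiple
    of TAB_WIDTH; every other codepoint occupies exactly one column.
    """
    col = 0
    for ch in line[:char_index]:
        if ch == "\t":
            col = (col // tab_width + 1) * tab_width
        else:
            col += 1
    return col


line = "\tfoo(x)"
# A server counting codepoints puts "x" at character 5 ...
print(line.index("x"))                        # 5
# ... but in this model "x" starts at display column 12.
print(display_column(line, line.index("x")))  # 12
```

Feeding the codepoint offset 5 to a column-based function would therefore not land on `x`.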
> On the other hand, the emoji
> 🧛♀️ contributes 4 units. This is independent of the screen display.
Not in Emacs.
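The "4 units" above are codepoints. The sketch below assumes the emoji is the usual zero-width-joiner sequence for "woman vampire": U+1F9DB U+200D U+2640 U+FE0F. The joiner is invisible in this log. The Python sketch lists those codepoints and the number of units each LSP position encoding assigns to the whole sequence. It says nothing about how many columns Emacs displays, which is the separate point made in the reply.

```python
import unicodedata

# Assumed codepoint sequence: VAMPIRE, ZERO WIDTH JOINER,
# FEMALE SIGN, VARIATION SELECTOR-16.
WOMAN_VAMPIRE = "\U0001F9DB\u200D\u2640\uFE0F"

for ch in WOMAN_VAMPIRE:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

units = {
    "utf-8": len(WOMAN_VAMPIRE.encode("utf-8")),            # bytes
    "utf-16": len(WOMAN_VAMPIRE.encode("utf-16-le")) // 2,  # code units
    "utf-32": len(WOMAN_VAMPIRE),                           # codepoints
}
print(units)  # {'utf-8': 13, 'utf-16': 5, 'utf-32': 4}
```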
> By the way, I don't understand your claim about column counting. If I
> move point over 🧛♀️, the mode line column count increments by 3 units,
> which seems to make no sense: this emoji is 4 codepoints long and
> occupies 1 screen column. What's the logic here?
If that is what you see, it could be a bug. Does current-column agree
with what you see in the mode line?
In general, characters (codepoints) that are composed on display into
a single glyph or "grapheme cluster" are supposed to be counted as a
single column. Try typing this in "emacs -Q"
a C-x 8 RET COMBINING ACUTE ACCENT RET
If your default font is capable enough, you will see a single glyph of
'a' with acute accent (á), and it will count as 1 column, although
there are 2 codepoints in the buffer. And "M-: (move-to-column 1) RET"
will move past both codepoints. Now imagine that we get such sequences
from the LSP server -- what will Eglot do in terms of column counting?
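One way to see the mismatch is to count the "á" typed above in each unit. The Python sketch below assumes the two-codepoint form: U+0061 followed by U+0301 COMBINING ACUTE ACCENT. `approx_columns` is a hypothetical, deliberately crude estimate that gives combining marks zero width. It is far simpler than Emacs's composition logic or Unicode grapheme-cluster segmentation (UAX #29).

```python
import unicodedata

decomposed = "a\u0301"  # 'a' + COMBINING ACUTE ACCENT, one glyph on display


def approx_columns(s: str) -> int:
    """Hypothetical, crude column estimate: combining marks add no width."""
    return sum(0 if unicodedata.combining(ch) else 1 for ch in s)


print(len(decomposed))                           # 2 codepoints (LSP utf-32)
print(len(decomposed.encode("utf-16-le")) // 2)  # 2 code units (LSP utf-16)
print(len(decomposed.encode("utf-8")))           # 3 bytes (LSP utf-8)
print(approx_columns(decomposed))                # 1 column on display
```

Whatever encoding is negotiated, the end of that glyph is offset 2 or 3 on the LSP side but column 1 in Emacs. The two numbers cannot be used interchangeably.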
This bug report was last modified 2 years and 139 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.