#61726 - [PATCH] Eglot: Support positionEncoding capability


Package: emacs;

Reported by: Augusto Stoffel <arstoffel <at> gmail.com>

Date: Thu, 23 Feb 2023 08:06:01 UTC

Severity: normal

Tags: patch

Done: João Távora <joaotavora <at> gmail.com>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 61726 <at> debbugs.gnu.org, joaotavora <at> gmail.com
Subject: bug#61726: [PATCH] Eglot: Support positionEncoding capability
Date: Thu, 23 Feb 2023 14:54:50 +0200

> From: Augusto Stoffel <arstoffel <at> gmail.com>
> Cc: 61726 <at> debbugs.gnu.org, joaotavora <at> gmail.com
> Date: Thu, 23 Feb 2023 12:46:48 +0100
>
> >> +(defun eglot--current-column-utf-8 ()
> >> +  "Calculate current column, counting bytes."
> >> +  (- (position-bytes (point)) (position-bytes (line-beginning-position))))
> >
> > This is subtly incorrect: position-bytes doesn't count UTF-8 bytes, it
> > counts the bytes in the internal representation Emacs uses for buffer
> > and string text.  The differences are minor and subtle, but not
> > negligible.
>
> Right, if the buffer contains a char outside of the Unicode range, we
> lose.
>
> But just to confirm: position-bytes and byte-to-position are always with
> respect to Emacs's internal extended UTF-8 representation and have
> nothing to do with the buffer file encoding, right?

Yes.  See bufferpos-to-filepos to get an idea of what hoops we need to
jump through to get it right, even just with UTF-8.

> > What does this stuff do with double-width or zero-width characters?
> > Emacs takes character-width into consideration when it counts columns,
> > but it is unclear to me what LSP servers do in those cases.
> > Likewise with characters that are composed on display.
>
> `eglot-move-to-column' is supposed to count Unicode codepoints, so
> e.g. x, ⇒ and 😃 all contribute 1 unit.

But if the resulting column is then used in move-to-column etc., it
might go to the wrong column, because in Emacs each column is not
necessarily a single codepoint.  The simplest example is a TAB
character, but there are more examples, some of which are quite
complicated (see below).

> On the other hand, the Emoji
> 🧛‍♀️ contributes 4 units.  This is independent of screen display.

Not in Emacs.

> By the way, I don't understand your claim about column counting.  If I
> move point over 🧛‍♀️, the mode line column count increments by 3 units,
> which seems to make no sense: this Emoji is 4 codepoints long and
> occupies 1 screen column.
> What's the logic here?

If that is what you see, it could be a bug.  Does current-column agree
with what you see in the mode line?

In general, characters (codepoints) that are composed on display into
a single glyph or "grapheme cluster" are supposed to be counted as a
single column.  Try typing this in "emacs -Q":

  a C-x 8 RET COMBINING ACUTE ACCENT RET

If your default font is capable enough, you will see a single glyph of
'a' with acute accent (á), and it will count as 1 column, although
there are 2 codepoints in the buffer.  And "M-: (move-to-column 1) RET"
will move past both codepoints.

Now imagine that we get such sequences from the LSP server -- what
will Eglot do in terms of column counting?
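
To make the three ways of counting discussed in this message concrete, here is a minimal sketch in Emacs Lisp.  It is not the code from the patch: the `example-' function names are hypothetical.  The assumption is that encoding the text explicitly with encode-coding-region (passing t as DESTINATION so the encoded string is returned) avoids the dependence on the internal representation that position-bytes has.  Display columns, as reported by current-column, are deliberately left out, because they depend on composition and fonts.

```elisp
;; Hypothetical helpers for illustration only; not part of Eglot.

(defun example-column-codepoints ()
  "Return the number of characters between line start and point."
  (- (point) (line-beginning-position)))

(defun example-column-utf-8 ()
  "Return the number of UTF-8 bytes between line start and point.
The text is encoded explicitly rather than measured with
`position-bytes', which counts bytes of the internal representation."
  (length (encode-coding-region (line-beginning-position) (point)
                                'utf-8-unix t)))

(defun example-column-utf-16 ()
  "Return the number of UTF-16 code units between line start and point."
  ;; `utf-16le' writes no byte order mark, so each code unit is
  ;; exactly two bytes of the encoded string.
  (/ (length (encode-coding-region (line-beginning-position) (point)
                                   'utf-16le t))
     2))

(defun example-measure (string)
  "Return (CODEPOINTS UTF-8-BYTES UTF-16-UNITS) for STRING.
STRING is inserted on a line by itself in a temporary buffer and
measured with point at its end."
  (with-temp-buffer
    (insert string)
    (list (example-column-codepoints)
          (example-column-utf-8)
          (example-column-utf-16))))
```

For example, (example-measure "a\u0301"), the 'a' plus COMBINING ACUTE ACCENT sequence mentioned above, gives 2 codepoints, 3 UTF-8 bytes and 2 UTF-16 units, even though Emacs may display it as a single column.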

This bug report was last modified 2 years and 137 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
