#70000 - 29.2; Grapheme handling incorrect

GNU bug report logs - #70000
29.2; Grapheme handling incorrect

Package: emacs

Reported by: Phillip Susi <phill <at> thesusis.net>

Date: Mon, 25 Mar 2024 18:47:01 UTC

Severity: normal

Tags: notabug, wontfix

Found in version 29.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: jman <jman <at> city17.xyz>
Cc: 70000 <at> debbugs.gnu.org, phill <at> thesusis.net
Subject: bug#70000: 29.2; Grapheme handling incorrect
Date: Sun, 29 Jun 2025 16:29:02 +0300

> From: jman <jman <at> city17.xyz>
> Cc: 70000 <at> debbugs.gnu.org, phill <at> thesusis.net
> Date: Sun, 29 Jun 2025 14:36:31 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > The protocol described in
> >
> >   https://sw.kovidgoyal.net/kitty/text-sizing-protocol/
> >
> > if we decide to implement it in Emacs, will need some non-trivial
> > changes in how Emacs currently accounts for character width on
> > display. That is because this protocol does NOT allow to query the
> > terminal about the display width of a string of characters. Instead,
> > it allows a program running on the terminal to instruct the terminal
> > about the display width it expects to get, and the terminal needs to
> > obey. What this means for Emacs is that we will have to add code
> > which will determine the expected display width of each composed
> > sequence of characters. By contrast, what we have now is that we
> > expect the display backend to tell us the display width.
> >
> > This is important because Emacs has code which performs layout
> > calculations by using the display code without actually displaying
> > anything. Cursor movement commands in Emacs, and many places within
> > the display engine, use these capabilities. When this code runs, it
> > needs some way of computing the display width of each glyph that will
> > or would be displayed. If we need to compute that ourselves, we will
> > need to add such a code, which currently doesn't exist.
> >
> > Beyond that, there's the issue of how widely will this protocol be
> > supported by terminal emulators other than kitty, and what should
> > Emacs do when it runs on a terminal which doesn't support this.
>
> Thank you Eli for the overview.
>
> I infer we're still at a point with no solution at the horizon (and unfortunately I cannot
> contribute one).

Even if implementing this protocol were the complete solution, someone
would need to code it for Emacs.

> Meanwhile, is there a suggested workaround for users of Emacs TTY? The issue is that multi-byte
> graphemes clusters are not correctly rendered. I've been suggested to play with
> `glyphless-char-display` but (IIUC) it only works with single-bytes graphemes. For example Emacs
> `describe-char` reports that the "writing hand" emoji is hex U+270D but the Emojipedia[0] describes
> it as (U+270D, U+FE0F), U-FE0F being the VS16 variant selector[1] so I am not sure I can just hide
> or replace it with something else.

If auto-composition mode is turned ON (it is by default), Emacs
expects the terminal to combine the modifier characters (such as
U+FE0F) with the preceding base character, producing a single glyph.
The width of that glyph is expected to be according to the width of
the base character as stored in char-width-table.

As long as the terminal behaves as Emacs expects, you should be okay.
So the suggested workaround is to find a terminal emulator which
behaves like described above or can be forced to behave like that.
The sequence U+270D followed by U+FE0F should thus work in most cases.

If you are talking about Emoji sequences that include characters which
are not modifiers (i.e., they are characters on their own right and
have non-zero width in char-width-table), things will generally not
work in Emacs, I'm afraid, not without some auxiliary protocol which
will allow Emacs to know the display width of an arbitrary sequence of
characters.
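
The distinction drawn in the reply above, between sequences whose trailing characters are zero-width modifiers and sequences that contain further characters with a width of their own, can be illustrated with a rough sketch. This is Python, not Emacs code, and it does not consult Emacs's char-width-table: `char_width` is a hypothetical stand-in built on Unicode general categories and East Asian Width, so the widths it reports may differ from what Emacs stores. The function names are made up for this illustration.

```python
import unicodedata

# Assumption: treat nonspacing marks (Mn, which includes variation
# selectors such as U+FE0F), enclosing marks (Me) and format characters
# (Cf, which includes ZERO WIDTH JOINER U+200D) as zero-width.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def char_width(ch: str) -> int:
    """Hypothetical per-character width lookup (NOT Emacs's char-width-table)."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    # Wide and Fullwidth characters take two columns, everything else one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def summed_width(seq: str) -> int:
    """Width obtained by adding up the width of every character."""
    return sum(char_width(ch) for ch in seq)


def base_width(seq: str) -> int:
    """Width of the first (base) character only."""
    return char_width(seq[0])


def only_zero_width_followers(seq: str) -> bool:
    """True if every character after the base has zero width in this model."""
    return all(char_width(ch) == 0 for ch in seq[1:])


writing_hand = "\u270d\ufe0f"  # U+270D followed by VS16
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # ZWJ sequence

for name, seq in (("writing hand", writing_hand), ("family", family)):
    print(name, base_width(seq), summed_width(seq), only_zero_width_followers(seq))
```

In this model the U+270D, U+FE0F pair has the same width whether one counts only the base character or sums all characters, so a terminal that draws it as one glyph of the base character's width agrees with the per-character accounting. The ZWJ family sequence contains three characters that each have a width of their own, so the summed width is three times the base width, and nothing in the per-character data says how many columns a terminal will actually use for the combined glyph.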

This bug report was last modified 12 days ago.

