#70000 - 29.2; Grapheme handling incorrect

GNU bug report logs - #70000
29.2; Grapheme handling incorrect

Package: emacs

Reported by: Phillip Susi <phill <at> thesusis.net>

Date: Mon, 25 Mar 2024 18:47:01 UTC

Severity: normal

Tags: notabug, wontfix

Found in version 29.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Message #22 received at 70000 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 70000 <at> debbugs.gnu.org, Phillip Susi <phill <at> thesusis.net>
Subject: Re: bug#70000: 29.2; Grapheme handling incorrect
Date: Fri, 28 Feb 2025 19:04:51 -0800

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Phillip Susi <phill <at> thesusis.net>
>> Cc: 70000 <at> debbugs.gnu.org
>> Date: Wed, 27 Mar 2024 10:11:30 -0400
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> > Querying the cursor position won't help in this case because it is
>> > Emacs that moves the cursor when you type C-f, not the terminal.
>>
>> I'm not talking about C-f, but simply displaying the characters on the
>> screen. Emacs assumes the width is 4 when it prints this character, and
>> so it thinks that the cursor moved over 4 places. When the terminal
>> actually only moves the cursor over 2 spaces, emacs gets out of sync
>> with the terminal, and massive breakage occurs.
>
> I understand what you are saying, but this is not how Emacs display
> code works. It needs to know the width of every character displayed
> on the screen, and it needs to be able to determine that even without
> actually displaying the character.
>
> When Emacs is about to redraw some portion of the screen, it moves the
> cursor to that place. To be able to move the cursor there, it needs
> to be able to compute the coordinates on the screen of every character
> that is currently shown, so it can construct the command for the
> terminal driver to move cursor to that place. If Emacs were to rely
> on displaying characters for that, it would have needed to constantly
> redraw large portions of the screen, and that would both be much
> slower and cause unpleasant flickering of the display, due to
> redrawing of screen portions that don't actually change.
>
> So this technique is out of the question for Emacs.
>
>> By reading back the cursor position from the terminal after displaying a
>> grapheme cluster, it would learn how the terminal displayed it and
>> update its idea of where the cursor is correctly.
>
> I understand. But Emacs needs this information also long after the
> characters were already drawn. For example, imagine that Emacs
> displays these characters on the screen, and then leaves most of the
> screen intact and periodically redraws some small portion of the
> screen, like updating current time in the lower-right corner of the
> screen when Emacs is otherwise idle. To do that, Emacs needs to move
> the cursor from its current position somewhere on the screen to the
> lower-right corner, redraw the time there, then move the cursor back
> to where it was. These cursor moves are based on the ability to
> calculate the geometry of each character on display without actually
> writing the characters to the screen.
>
> In addition, if Emacs had to query the cursor position after each
> written character, its redisplay would be much slower than it is now.
>
>> I originally ran into this problem not with a ZWJ, but with an emoji
>> followed by alternate selector 16 that someone used in a subject line of
>> an email, and when browsing my inbox with notmuch, the terminal went
>> FUBAR.
>
> Yes, that's a known issue with some of the terminal emulators that
> compose Emoji and other similar character sequences into grapheme
> clusters, while ignoring the width that is expected from the result.
> I'm not aware of any good solution, unfortunately. Sometimes,
> disabling auto-composition-mode helps, but even that cannot solve all
> the problems, especially when each of the characters composed by the
> terminal into a single grapheme cluster has non-zero width according
> to the Unicode tables. (If only the first character in the composed
> sequence has non-zero width and the rest are zero-width, disabling
> auto-composition-mode might produce a correct display.)
>
> The bottom line is what I said at the beginning: we need some protocol
> by which a terminal emulator could be queried about whether it
> supports character composition, and if so, what is the screen width of
> a given sequence of codepoints that will be composed, without actually
> displaying them. Better yet, some standard table of such widths could
> be accepted by complying terminal emulators, and then Emacs could use
> such a table to know the width in advance (similarly to how it knows
> that from the Unicode data files).
>
> Until such protocols or tables exist, Emacs will be unable to produce
> correct display on these terminal emulators.

It seems to me like this should be closed as a wontfix?
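
The width mismatch discussed in this message can be illustrated with a small sketch. This is not Emacs's display code. It is a simplified Python approximation that assumes a width of 0 for nonspacing marks, enclosing marks and format characters, 2 for East Asian Wide/Fullwidth code points, and 1 otherwise. The names codepoint_width and summed_width, and both sample sequences, are hypothetical choices made for this illustration.

```python
import unicodedata


def codepoint_width(ch: str) -> int:
    """Approximate column width of a single code point.

    Simplified assumption (not Emacs's actual rules): nonspacing marks,
    enclosing marks and format characters take 0 columns, East Asian
    Wide/Fullwidth code points take 2, everything else takes 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def summed_width(text: str) -> int:
    """Width obtained by adding per-code-point widths, with no composition."""
    return sum(codepoint_width(ch) for ch in text)


# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: two wide emoji joined by ZWJ.
zwj_sequence = "\U0001F469\u200D\U0001F4BB"
# HEAVY BLACK HEART + VARIATION SELECTOR-16: a narrow base followed by VS16.
vs16_sequence = "\u2764\uFE0F"

if __name__ == "__main__":
    print(summed_width(zwj_sequence))
    print(summed_width(vs16_sequence))
```

Under these assumptions the ZWJ sequence sums to 4 columns and the VS16 sequence to 1. A terminal emulator that composes either sequence into a single emoji glyph may advance the cursor by 2 columns instead, and an application that tracks the cursor from precomputed widths then disagrees with the terminal about where the cursor is.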

This bug report was last modified 12 days ago.

