#70000 - 29.2; Grapheme handling incorrect

GNU bug report logs - #70000
29.2; Grapheme handling incorrect

Package: emacs

Reported by: Phillip Susi <phill <at> thesusis.net>

Date: Mon, 25 Mar 2024 18:47:01 UTC

Severity: normal

Tags: notabug, wontfix

Found in version 29.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Message #27 received at control <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 70000 <at> debbugs.gnu.org, phill <at> thesusis.net
Subject: Re: bug#70000: 29.2; Grapheme handling incorrect
Date: Sat, 01 Mar 2025 11:33:53 +0200

tags 70000 wontfix
close 70000
thanks

> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Fri, 28 Feb 2025 19:04:51 -0800
> Cc: Phillip Susi <phill <at> thesusis.net>, 70000 <at> debbugs.gnu.org
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> >> From: Phillip Susi <phill <at> thesusis.net>
> >> Cc: 70000 <at> debbugs.gnu.org
> >> Date: Wed, 27 Mar 2024 10:11:30 -0400
> >>
> >> Eli Zaretskii <eliz <at> gnu.org> writes:
> >>
> >> > Querying the cursor position won't help in this case because it is
> >> > Emacs that moves the cursor when you type C-f, not the terminal.
> >>
> >> I'm not talking about C-f, but simply displaying the characters on the
> >> screen. Emacs assumes the width is 4 when it prints this character, and
> >> so it thinks that the cursor moved over 4 places. When the terminal
> >> actually only moves the cursor over 2 spaces, emacs gets out of sync
> >> with the terminal, and massive breakage occurs.
> >
> > I understand what you are saying, but this is not how Emacs display
> > code works. It needs to know the width of every character displayed
> > on the screen, and it needs to be able to determine that even without
> > actually displaying the character.
> >
> > When Emacs is about to redraw some portion of the screen, it moves the
> > cursor to that place. To be able to move the cursor there, it needs
> > to be able to compute the coordinates on the screen of every character
> > that is currently shown, so it can construct the command for the
> > terminal driver to move cursor to that place. If Emacs were to rely
> > on displaying characters for that, it would have needed to constantly
> > redraw large portions of the screen, and that would both be much
> > slower and cause unpleasant flickering of the display, due to
> > redrawing of screen portions that don't actually change.
> >
> > So this technique is out of the question for Emacs.
> >
> >> By reading back the cursor position from the terminal after displaying a
> >> grapheme cluster, it would learn how the terminal displayed it and
> >> update its idea of where the cursor is correctly.
> >
> > I understand. But Emacs needs this information also long after the
> > characters were already drawn. For example, imagine that Emacs
> > displays these characters on the screen, and then leaves most of the
> > screen intact and periodically redraws some small portion of the
> > screen, like updating current time in the lower-right corner of the
> > screen when Emacs is otherwise idle. To do that, Emacs needs to move
> > the cursor from its current position somewhere on the screen to the
> > lower-right corner, redraw the time there, then move the cursor back
> > to where it was. These cursor moves are based on the ability to
> > calculate the geometry of each character on display without actually
> > writing the characters to the screen.
> >
> > In addition, if Emacs had to query the cursor position after each
> > written character, its redisplay would be much slower than it is now.
> >
> >> I originally ran into this problem not with a ZWJ, but with an emoji
> >> followed by alternate selector 16 that someone used in a subject line of
> >> an email, and when browsing my inbox with notmuch, the terminal went
> >> FUBAR.
> >
> > Yes, that's a known issue with some of the terminal emulators that
> > compose Emoji and other similar character sequences into grapheme
> > clusters, while ignoring the width that is expected from the result.
> > I'm not aware of any good solution, unfortunately. Sometimes,
> > disabling auto-composition-mode helps, but even that cannot solve all
> > the problems, especially when each of the characters composed by the
> > terminal into a single grapheme cluster has non-zero width according
> > to the Unicode tables. (If only the first character in the composed
> > sequence has non-zero width and the rest are zero-width, disabling
> > auto-composition-mode might produce a correct display.)
> >
> > The bottom line is what I said at the beginning: we need some protocol
> > by which a terminal emulator could be queried about whether it
> > supports character composition, and if so, what is the screen width of
> > a given sequence of codepoints that will be composed, without actually
> > displaying them. Better yet, some standard table of such widths could
> > be accepted by complying terminal emulators, and then Emacs could use
> > such a table to know the width in advance (similarly to how it knows
> > that from the Unicode data files).
> >
> > Until such protocols or tables exist, Emacs will be unable to produce
> > correct display on these terminal emulators.
>
> It seems to me like this should be closed as a wontfix?

Yes, now done.
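
The width disagreement discussed in the message above can be illustrated with a minimal Python sketch. It is not Emacs code and does not use Emacs's width tables. `codepoint_width`, `summed_width`, and `terminal_composed_width` are hypothetical names, and the rule that a composing terminal draws a cluster as wide as its widest member is only an assumption about how some emulators behave.

```python
import unicodedata

def codepoint_width(ch: str) -> int:
    """Approximate column width of one code point from Unicode data.

    Assumption: format characters and nonspacing/enclosing marks take 0
    columns, East Asian Wide/Fullwidth take 2, everything else takes 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def summed_width(text: str) -> int:
    """Width an application predicts by adding per-code-point widths."""
    return sum(codepoint_width(ch) for ch in text)

def terminal_composed_width(cluster: str) -> int:
    """Hypothetical model of a terminal that composes the whole cluster
    into one glyph as wide as its widest member (an assumption)."""
    return max(codepoint_width(ch) for ch in cluster)

# MAN + ZERO WIDTH JOINER + WOMAN: two wide emoji joined by a ZWJ.
cluster = "\U0001F468\u200D\U0001F469"

predicted = summed_width(cluster)            # what the application expects
drawn = terminal_composed_width(cluster)     # what the modelled terminal uses
print(f"predicted={predicted} drawn={drawn} drift={predicted - drawn}")
```

Under these assumptions the application expects the cursor to advance 4 columns while the modelled terminal advances 2, which is the mismatch the reporter describes. The sketch also shows why a width table agreed on in advance would help: both sides could compute the same number without drawing anything.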

This bug report was last modified 12 days ago.

