GNU bug report logs - #70000
29.2; Grapheme handling incorrect
Reported by: Phillip Susi <phill <at> thesusis.net>
Date: Mon, 25 Mar 2024 18:47:01 UTC
Severity: normal
Tags: notabug, wontfix
Found in version 29.2
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Message #22 received at 70000 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Phillip Susi <phill <at> thesusis.net>
>> Cc: 70000 <at> debbugs.gnu.org
>> Date: Wed, 27 Mar 2024 10:11:30 -0400
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> > Querying the cursor position won't help in this case because it is
>> > Emacs that moves the cursor when you type C-f, not the terminal.
>>
>> I'm not talking about C-f, but simply displaying the characters on the
>> screen. Emacs assumes the width is 4 when it prints this character, and
>> so it thinks that the cursor moved over 4 places. When the terminal
>> actually only moves the cursor over 2 spaces, emacs gets out of sync
>> with the terminal, and massive breakage occurs.
>
> I understand what you are saying, but this is not how Emacs display
> code works. It needs to know the width of every character displayed
> on the screen, and it needs to be able to determine that even without
> actually displaying the character.
>
> When Emacs is about to redraw some portion of the screen, it moves the
> cursor to that place. To be able to move the cursor there, it needs
> to be able to compute the coordinates on the screen of every character
> that is currently shown, so it can construct the command for the
> terminal driver to move cursor to that place. If Emacs were to rely
> on displaying characters for that, it would have needed to constantly
> redraw large portions of the screen, and that would both be much
> slower and cause unpleasant flickering of the display, due to
> redrawing of screen portions that don't actually change.
>
> So this technique is out of the question for Emacs.
>
>> By reading back the cursor position from the terminal after displaying a
>> grapheme cluster, it would learn how the terminal displayed it and
>> update its idea of where the cursor is correctly.
>
> I understand. But Emacs needs this information also long after the
> characters were already drawn. For example, imagine that Emacs
> displays these characters on the screen, and then leaves most of the
> screen intact and periodically redraws some small portion of the
> screen, like updating current time in the lower-right corner of the
> screen when Emacs is otherwise idle. To do that, Emacs needs to move
> the cursor from its current position somewhere on the screen to the
> lower-right corner, redraw the time there, then move the cursor back
> to where it was. These cursor moves are based on the ability to
> calculate the geometry of each character on display without actually
> writing the characters to the screen.
>
> In addition, if Emacs had to query the cursor position after each
> written character, its redisplay would be much slower than it is now.
>
>> I originally ran into this problem not with a ZWJ, but with an emoji
>> followed by variation selector 16 that someone used in a subject line of
>> an email, and when browsing my inbox with notmuch, the terminal went
>> FUBAR.
>
> Yes, that's a known issue with some of the terminal emulators that
> compose Emoji and other similar character sequences into grapheme
> clusters, while ignoring the width that is expected from the result.
> I'm not aware of any good solution, unfortunately. Sometimes,
> disabling auto-composition-mode helps, but even that cannot solve all
> the problems, especially when each of the characters composed by the
> terminal into a single grapheme cluster has non-zero width according
> to the Unicode tables. (If only the first character in the composed
> sequence has non-zero width and the rest are zero-width, disabling
> auto-composition-mode might produce a correct display.)
>
> The bottom line is what I said at the beginning: we need some protocol
> by which a terminal emulator could be queried about whether it
> supports character composition, and if so, what is the screen width of
> a given sequence of codepoints that will be composed, without actually
> displaying them. Better yet, some standard table of such widths could
> be accepted by complying terminal emulators, and then Emacs could use
> such a table to know the width in advance (similarly to how it knows
> that from the Unicode data files).
>
> Until such protocols or tables exist, Emacs will be unable to produce
> correct display on these terminal emulators.
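
To make the mismatch discussed above concrete, here is a minimal sketch of computing a display width purely from per-codepoint Unicode properties, without drawing anything. It is not Emacs's actual code; the helper names `char_width` and `table_width` are made up for illustration, and the rules are a rough approximation built only on Python's standard `unicodedata` module.

```python
import unicodedata

ZWJ = "\u200d"    # ZERO WIDTH JOINER
VS16 = "\ufe0f"   # VARIATION SELECTOR-16

def char_width(ch: str) -> int:
    """Approximate column width of one codepoint (hypothetical helper)."""
    # Combining marks and format characters (ZWJ, VS16) occupy no columns.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters occupy two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def table_width(text: str) -> int:
    """Sum per-codepoint widths, as a table-driven program would."""
    return sum(char_width(ch) for ch in text)

# Two wide emoji joined by ZWJ: the table-based sum is 2 + 0 + 2.
zwj_sequence = "\U0001F468" + ZWJ + "\U0001F469"
print(table_width(zwj_sequence))

# A text-presentation symbol followed by VS16: the table-based sum is 1 + 0.
vs16_sequence = "\u2764" + VS16
print(table_width(vs16_sequence))
```

A terminal that composes the first sequence into a single two-column glyph, or that draws the second one as a two-column emoji, ends up with its cursor in a different column than this sum predicts. That is the disagreement described in this thread, and nothing in the per-codepoint tables alone can resolve it.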
It seems to me like this should be closed as a wontfix?
This bug report was last modified 83 days ago.