GNU bug report logs - #56237
29.0.50; delete-forward-char fails to delete character
Reported by: visuweshm <at> gmail.com
Date: Sun, 26 Jun 2022 16:08:02 UTC
Severity: normal
Tags: moreinfo
Found in version 29.0.50
Done: Visuwesh <visuweshm <at> gmail.com>
Bug is archived. No further changes may be made.
[Monday, June 27, 2022] Visuwesh wrote:
> [Sunday, June 26, 2022] Eli Zaretskii wrote:
>
>>> From: Visuwesh <visuweshm <at> gmail.com>
>>> Cc: 56237 <at> debbugs.gnu.org
>>> Date: Sun, 26 Jun 2022 22:36:31 +0530
>>>
>>> > Invoke find-composition, and you will see that it returns a single
>>> > composition there.
>>>
>>> If find-composition is indeed right, then its return value is very
>>> unintuitive to a native speaker: ப் and போ are two separate characters,
>>> and combining them into a single cluster is weird...
>>
>> Maybe you are right, but then Someone(TM) will have to either modify
>> find-composition or explain how to interpret its return value
>> differently from what we do now. What is now in delete-forward-char
>> expresses my level of knowledge in this area, which admittedly is
>> limited.
>>
>
> It turns out that Someone™ was closer to us than I thought: describe-char.
> With a bit of edebug and reading the code in composition.h (for the
> LGLYPH_* macros) and the defsubsts in composite.el, I think I figured out
> the logic:
>
> We need to call find-composition with a non-nil DETAIL-P argument to get
> the gstring. The gstring contains the glyphs that will be used to
> construct the grapheme cluster [1]. According to composition.h, glyphs
> that have the same FROM and TO indices are part of the same grapheme
> cluster, so to get the actual number of individual codepoints, we need to
> count the glyphs that have equal FROM and TO indices.
>
> Understanding all this, I came up with the following code:
>
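> ;; DETAIL-P is non-nil, so for an automatic composition
> ;; (nth 2 composition) is the gstring.
> (let* ((composition (find-composition 0 nil "ப்போ" t))
>        (gstring (nth 2 composition))
>        (num-glyphs (lgstring-glyph-len gstring))
>        (i 1)
>        ;; FROM and TO indices of the first glyph.
>        (from (lglyph-from (lgstring-glyph gstring 0)))
>        (to (lglyph-to (lgstring-glyph gstring 0))))
>   ;; Count the following glyphs that share the first glyph's FROM and TO.
>   (while (and (< i num-glyphs)
>               (= from (lglyph-from (lgstring-glyph gstring i)))
>               (= to (lglyph-to (lgstring-glyph gstring i))))
>     (setq i (1+ i)))
>   i)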
>
> Here, i is the number of characters we need to delete using delete-char.
>
> [1] For the gstring format, see composition-get-gstring.
>
> But I think we should test this code in cases where a grapheme cluster
> contains more than two codepoints, since all the composed characters in
> Tamil are made up of two Unicode codepoints. I can't test it on emojis
> since I don't know of an emoji font that has enough coverage and won't
> potentially crash Xft.
>
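For reference, a minimal sketch (not from the thread) that packages the quoted loop as a function and feeds its count to delete-char. The names my-cluster-prefix-length and my-delete-forward-cluster are hypothetical. It adds a guard that stops at a nil glyph slot, assuming unused trailing slots of a gstring may be nil, and it only acts on automatic compositions, whose find-composition value is (FROM TO GSTRING). As the reply below reports, counting glyphs undercounts for ரு, so this illustrates the heuristic rather than fixing the bug.

```elisp
;; Sketch only: the glyph-counting heuristic from the quoted message,
;; wrapped as functions.  All `my-' names are hypothetical.
(require 'composite)   ; lgstring-* and lglyph-* accessors

(defun my-cluster-prefix-length (gstring)
  "Count leading glyphs of GSTRING sharing glyph 0's FROM and TO indices.
Stops at the first nil glyph slot (assumed to mark unused slots)."
  (let* ((num-glyphs (lgstring-glyph-len gstring))
         (glyph0 (lgstring-glyph gstring 0))
         (from (lglyph-from glyph0))
         (to (lglyph-to glyph0))
         (i 1)
         glyph)
    (while (and (< i num-glyphs)
                (setq glyph (lgstring-glyph gstring i))
                (= from (lglyph-from glyph))
                (= to (lglyph-to glyph)))
      (setq i (1+ i)))
    i))

(defun my-delete-forward-cluster ()
  "Delete the characters counted by `my-cluster-prefix-length' at point.
Fall back to deleting one character when point is not at the start
of an automatic composition."
  (interactive)
  (let ((comp (find-composition (point) nil nil t)))
    (if (and comp
             (= (length comp) 3)          ; automatic: (FROM TO GSTRING)
             (vectorp (nth 2 comp))       ; an invalid static one has nil here
             (= (car comp) (point)))
        (delete-char (my-cluster-prefix-length (nth 2 comp)))
      (delete-char 1))))
```

The synthetic gstrings in a quick test can be plain vectors, since the accessors only use aref: a gstring is [HEADER ID GLYPH ...] and a glyph's FROM and TO are its first two slots.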
I got my hopes up too high. :(
This fails for the simple case of ரு (C-u C-x = also fails!), so I guess
we are back to square one. Although ரு is composed of U+0BB0 U+0BC1, the
gstring has only one glyph.
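To make the mismatch concrete, a small sketch that lists the codepoints behind each example string; my-codepoints is a hypothetical helper, not part of Emacs. The strings are built from explicit codepoints, assuming ப்போ uses the precomposed vowel sign U+0BCB rather than its decomposed form. ப்போ is four codepoints forming two user-perceived characters, and ரு is two codepoints, so counting glyphs gives 1 where delete-char would need 2.

```elisp
;; List the codepoints of a string as "U+XXXX" strings.
;; `my-codepoints' is a hypothetical helper, not part of Emacs.
(defun my-codepoints (str)
  "Return the codepoints of STR as a list of \"U+XXXX\" strings."
  (mapcar (lambda (c) (format "U+%04X" c)) str))

;; ப் + போ: PA, VIRAMA, PA, VOWEL SIGN OO.
(my-codepoints (string #x0BAA #x0BCD #x0BAA #x0BCB))
;; => ("U+0BAA" "U+0BCD" "U+0BAA" "U+0BCB")

;; ரு: RA, VOWEL SIGN U.
(my-codepoints (string #x0BB0 #x0BC1))
;; => ("U+0BB0" "U+0BC1")
```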