GNU bug report logs - #63731
[PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Reported by: Steven Allen <steven <at> stebalien.com>
Date: Fri, 26 May 2023 03:19:01 UTC
Severity: normal
Tags: fixed, patch
Fixed in version 29.1
Done: Robert Pluim <rpluim <at> gmail.com>
Bug is archived. No further changes may be made.
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
> Date: Thu, 01 Jun 2023 15:30:18 +0200
>
> Eli> OK, the issue is quite clear even without stepping with a debugger.
>
> Eli> Bottom line: we cannot support a situation where the same character
> Eli> can be composed by more than one slot in composition-function-table.
> Eli> If there is more than one slot for the same character, one of
> Eli> them will be tried, and the rest will be ignored (not even tried).
> Eli> In particular, if a character CH has a "forward" composition rule that
> Eli> starts with itself, and also has a "backward" rule (one with non-zero
> Eli> look-back parameter) triggered by a different character (which should
> Eli> follow CH), the latter rule will never be tried.
>
> OK, that makes sense. Where would be a good place to document this?
In the doc string of composition-function-table, I think. We already
document there the caveat of arranging rules in descending order of
look-back, which is part of the same "misfeature".
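
The one-slot-per-character limitation quoted above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the display engine's actual code: the dictionary `table`, the function `compose_at`, and the sample rules are hypothetical stand-ins for composition-function-table and its entries, and real rules carry more than a regexp and a look-back count.

```python
import re

VS16 = "\uFE0F"   # VARIATION SELECTOR-16
ZWJ = "\u200D"    # ZERO WIDTH JOINER

def compose_at(table, text, pos):
    """Toy model: return the substring composed at POS, or None.

    TABLE maps a character to a list of (regexp, look_back) rules.
    """
    ch = text[pos]
    slot = table.get(ch)
    if slot is not None:
        # CH has its own slot, so only that slot is consulted.
        for pattern, look_back in slot:
            if look_back == 0:
                m = re.compile(pattern).match(text, pos)
                if m:
                    return m.group(0)
        return None  # slots of the following characters are never tried
    # CH has no slot: a look-back rule of the next character may apply.
    if pos + 1 < len(text):
        for pattern, look_back in table.get(text[pos + 1], []):
            if look_back == 1:
                m = re.compile(pattern, re.DOTALL).match(text, pos)
                if m:
                    return m.group(0)
    return None

HEART, SNOWMAN, FIRE = "\u2764", "\u2603", "\U0001F525"
table = {
    HEART: [(re.escape(HEART + ZWJ + FIRE), 0)],  # hypothetical "forward" rule
    VS16: [("." + VS16, 1)],                      # hypothetical "backward" rule
}

no_slot = compose_at(table, SNOWMAN + VS16, 0)    # composed via VS16's rule
own_slot = compose_at(table, HEART + VS16, 0)     # None: VS16's rule is not tried

# Listing HEART+VS16 in HEART's own slot makes the sequence compose.
table[HEART].append((re.escape(HEART + VS16), 0))
fixed = compose_at(table, HEART + VS16, 0)
```

In this model the character without its own slot composes through the look-back rule, the character with its own slot does not, and adding the VS-16 sequence to that character's own slot resolves it.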
> Eli> Which means that to have #xFE0F compose correctly with Emoji
> Eli> codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
>
> Thatʼs easy enough:
>
> diff --git a/admin/unidata/emoji-zwj.awk b/admin/unidata/emoji-zwj.awk
> index 7d2ff6cb900..d1195ebbad8 100644
> --- a/admin/unidata/emoji-zwj.awk
> +++ b/admin/unidata/emoji-zwj.awk
> @@ -106,7 +106,8 @@ END {
>
> for (elt in ch)
> {
> - printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, vec[elt])
> + entries = sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
> + printf("(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n", elt, entries)
> }
> print "))"
> print " (set-char-table-range composition-function-table"
>
> That makes all the VS-16 sequences in
> admin/unidata/emoji-variation-sequences.txt display with the emoji
> font for me.
Ready to install this on the emacs-29 branch?
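
To make the effect of the patched awk lines concrete, here is a minimal Python sketch that mirrors the two format strings from the diff. The function names and the sample value of `existing` (standing in for `vec[elt]`) are assumptions made for illustration; the real contents of `vec[elt]` come from the awk script's input and are not shown in this thread.

```python
def entries_with_vs16(existing, elt):
    # Mirrors: sprintf("%s\n\"\\N{U+%s}\\N{U+FE0F}\"", vec[elt], elt)
    # Appends one extra string literal: the base character followed by VS-16.
    return '%s\n"\\N{U+%s}\\N{U+FE0F}"' % (existing, elt)

def slot_form(existing, elt):
    # Mirrors the printf that wraps the entries in a regexp-opt form.
    return "(#x%s .\n,(eval-when-compile (regexp-opt\n'(\n%s\n))))\n" % (
        elt, entries_with_vs16(existing, elt))

# Hypothetical vec[elt] for code point 2764: a single existing sequence.
existing = '"\\N{U+2764}\\N{U+200D}\\N{U+1F525}"'
form = slot_form(existing, "2764")
print(form)
```

The only change relative to the old output is the additional last entry in the list passed to regexp-opt, which pairs the character with U+FE0F inside that character's own slot.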
> Eli> The reason why "C-u C-x =" lies to us saying there's a composition
> Eli> where really there isn't is because descr-text.el uses the
> Eli> find-composition primitive, whose implementation is parallel and
> Eli> separate from that of the display-engine routines, and is structured
> Eli> differently. So find-composition does succeed in detecting the second
> Eli> rule, the one triggered by #xFE0F, which the display engine ignores.
> Eli> I will think whether this can be fixed, to avoid such false positives,
> Eli> but if we accept that there can be only one set of composition rules
> Eli> for a character, then we basically invoked undefined behavior here,
> Eli> and we got what we deserved.
>
> If find-composition DTRT, could we not use it in the display engine?
Not easily, because the display code calls subroutines of
find-composition in a certain order, and that's what causes the
behavior I described.
And even if we could make this happen, I'm not sure we should:
basically, having multiple matching slots would mean users and callers
will never be sure which one "wins".
This bug report was last modified 1 year and 350 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.