GNU bug report logs - #63731
[PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate


Package: emacs

Reported by: Steven Allen <steven <at> stebalien.com>

Date: Fri, 26 May 2023 03:19:01 UTC

Severity: normal

Tags: fixed, patch

Fixed in version 29.1

Done: Robert Pluim <rpluim <at> gmail.com>

Bug is archived. No further changes may be made.



Message #113 received at 63731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rpluim <at> gmail.com
Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
Subject: Re: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F)
 where appropriate
Date: Thu, 01 Jun 2023 15:43:26 +0300
> Cc: 63731 <at> debbugs.gnu.org, steven <at> stebalien.com
> Date: Wed, 31 May 2023 19:18:22 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > From: Robert Pluim <rpluim <at> gmail.com>
> > Cc: 63731 <at> debbugs.gnu.org,  steven <at> stebalien.com
> > Date: Wed, 31 May 2023 18:11:36 +0200
> > 
> >     Eli> So there are two issues here: (a) why there's no composition in the
> >     Eli> first case, and (b) why does "C-u C-x =" says there is when there
> >     Eli> isn't.
> > 
> > OK. I can poke around in gdb if you give me some idea of what I should
> > be looking at.
> 
> I don't really know.  I plan to just step through the code in
> composite.c tomorrow, unless you beat me to it.  Once we understand
> issue (a), I think we will also understand issue (b).

OK, the issue is quite clear even without stepping with a debugger.

Bottom line: we cannot support a situation where the same character
can be composed by more than one slot in composition-function-table.
If there is more than one slot for the same character, one of
them will be tried, and the rest will be ignored (not even tried).
In particular, if a character CH has a "forward" composition rule that
starts with itself, and also has a "backward" rule (one with non-zero
look-back parameter) triggered by a different character (which should
follow CH), the latter rule will never be tried.

This is what happens in this case: the character #x1F44D has several
rules that start with itself in emoji-zwj.el:

  (#x1F44D .
  ,(eval-when-compile (regexp-opt
   '(
   "\N{U+1F44D}\N{U+1F3FB}"
   "\N{U+1F44D}\N{U+1F3FC}"
   "\N{U+1F44D}\N{U+1F3FD}"
   "\N{U+1F44D}\N{U+1F3FE}"
   "\N{U+1F44D}\N{U+1F3FF}"
   ))))

and it also has a "backward" rule:

  (set-char-table-range
   composition-function-table
   #xFE0F '(["\\c.\ufe0f" 1 font-shape-gstring]))

The latter is triggered by #xFE0F and has a 1-character look-back,
which will match #x1F44D, since its category is '.' (it's a "base
character").  This latter rule is never tried.  Why?  Because the
former rules, anchored at #x1F44D, are tried first (Emacs redisplay
examines characters in the order of their buffer positions), and fail
to match.  When those rules fail to match, due to how the
composition-related functions called by the display engine are
factored, we never again consider compositions triggered by a later
character which "cover" also #x1F44D: once that position was examined
and the attempted composition failed, we move to the next character.
IOW, we assume that this first set of composition rules we find for a
given character are the only ones that could possibly be relevant for
that character.
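
To see the two competing slots for yourself, something like the
following can be evaluated in *scratch*.  This is only a rough
diagnostic sketch: it assumes emoji-zwj.el has been loaded in the
running session, and it makes no claim about what the forms return on
any particular build.

```elisp
;; Rules stored in the slot for #x1F44D (the "forward" rules,
;; anchored at the character itself).
(aref composition-function-table #x1F44D)

;; Rules stored in the slot for #xFE0F (the "backward" rule with a
;; 1-character look-back).
(aref composition-function-table #xFE0F)

;; Category mnemonics of #x1F44D; the "\\c." part of the backward
;; rule can only match if `.' appears in this string.
(category-set-mnemonics (char-category-set #x1F44D))
```

If both slots are non-nil, this is the situation described above: only
the rules in the first slot are tried when redisplay reaches #x1F44D.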

Which means that to have #xFE0F compose correctly with Emoji
codepoints, we should include #xFE0F in the sequences in emoji-zwj.el.
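
As a rough illustration of that idea (not the actual change to
emoji-zwj.el, and not a tested patch), the sequences for #x1F44D could
be extended with one that ends in #xFE0F.  The use of
font-shape-gstring below is an assumption borrowed from the #xFE0F rule
quoted above; this message does not show which function emoji-zwj.el
really installs for its sequences.

```elisp
;; Experimental sketch only.  This REPLACES whatever rules are
;; currently in the #x1F44D slot for the running session.
(let ((regexp (regexp-opt
               '("\N{U+1F44D}\N{U+1F3FB}"
                 "\N{U+1F44D}\N{U+1F3FC}"
                 "\N{U+1F44D}\N{U+1F3FD}"
                 "\N{U+1F44D}\N{U+1F3FE}"
                 "\N{U+1F44D}\N{U+1F3FF}"
                 ;; New: the variation-selector sequence, anchored at
                 ;; the emoji itself instead of at #xFE0F.
                 "\N{U+1F44D}\N{U+FE0F}"))))
  (set-char-table-range
   composition-function-table
   #x1F44D
   ;; Rule format: [PATTERN PREV-CHARS FUNC]; PREV-CHARS is 0 because
   ;; the pattern starts at #x1F44D, so no look-back is needed.
   ;; FUNC here is an assumption, see the note above.
   (list (vector regexp 0 #'font-shape-gstring))))
```

With the #xFE0F sequence living in the same slot as the other
#x1F44D sequences, there is only one set of rules to try when
redisplay reaches that character.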

The reason "C-u C-x =" lies to us, saying there's a composition
where really there isn't, is that descr-text.el uses the
find-composition primitive, whose implementation is parallel to and
separate from that of the display-engine routines, and is structured
differently.  So find-composition does succeed in detecting the second
rule, the one triggered by #xFE0F, which the display engine ignores.
I will think about whether this can be fixed, to avoid such false positives,
but if we accept that there can be only one set of composition rules
for a character, then we basically invoked undefined behavior here,
and we got what we deserved.
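
For anyone who wants to compare what find-composition reports with
what is actually displayed, a small helper along these lines may be
useful.  The command name my/composition-at-point is hypothetical, and
the exact call descr-text.el makes may differ in detail; this is just
a sketch of the kind of query involved.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my/composition-at-point ()
  "Show what `find-composition' reports for the character at point."
  (interactive)
  ;; Arguments are POS, LIMIT, STRING, DETAIL-P; a non-nil DETAIL-P
  ;; asks for details of the composition, not just its bounds.
  (message "%S" (find-composition (point) nil nil t)))
```

Put point on #x1F44D in a buffer containing #x1F44D followed by
#xFE0F, run the command, and compare the result with what is on the
screen.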

Thanks.




This bug report was last modified 1 year and 350 days ago.
