GNU bug report logs - #63731
[PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate


Package: emacs;

Reported by: Steven Allen <steven <at> stebalien.com>

Date: Fri, 26 May 2023 03:19:01 UTC

Severity: normal

Tags: fixed, patch

Fixed in version 29.1

Done: Robert Pluim <rpluim <at> gmail.com>

Bug is archived. No further changes may be made.


From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63731 <at> debbugs.gnu.org, Steven Allen <steven <at> stebalien.com>
Subject: bug#63731: [PATCH] Support Emoji Variation Sequence 16 (FE0F) where appropriate
Date: Fri, 26 May 2023 10:34:02 +0200
Disclaimer: I havenʼt looked at the patch yet

>>>>> On Fri, 26 May 2023 09:41:42 +0300, Eli Zaretskii <eliz <at> gnu.org> said:

    >> From: Steven Allen <steven <at> stebalien.com>
    >> Date: Thu, 25 May 2023 20:18:02 -0700
    >> 
    >> This patch imports the full list from unicode.org instead of
    >> special-casing a few characters as was done previously.
    >> 
    >> With this patch, '👍️' (1F44D FE0F) should look the same as '👍' (1F44D).
    >> Without it, it will look like '👍‌️'.
    >> 
    >> As a simple regression test, '✔' (2714) should still display as "text" while '✔️'
    >> (2714 FE0F) should still display as an emoji.
    >> 
    >> Fixes https://github.com/alphapapa/ement.el/issues/137
    >> 
    >> NOTE: I'm not a Unicode expert, nor do I understand how Emacs handles
    >> Unicode (beyond what was required to implement this patch). But this
    >> patch appears to work and I can't find any regressions.

    Eli> AFAIU, this change will populate composition-function-table for many
    Eli> "normal" characters, including ASCII digits and symbol/punctuation
    Eli> characters from the 0x2xxx blocks.  E.g., after you build Emacs with
    Eli> this patch, what do the following evaluations yield:

    Eli>   M-: (aref composition-function-table ?0) RET
    Eli>   M-: (aref composition-function-table #x2122) RET

    Eli> If they yield non-nil values, it could mean dramatic slowdown of
    Eli> redisplay with these characters.  Which is precisely what we wanted to
    Eli> avoid when we made the decision which parts of the Unicode-defined
    Eli> Emoji sequences to support in Emacs, and how to arrange for that
    Eli> support to work.

Yes. We donʼt want to do composition checks for ASCII if we can avoid it.
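Eli's two spot checks can be generalized to whole ranges of characters. The helper below is a hypothetical sketch: the name `my-chars-with-composition-rules` is made up and is not part of Emacs. It only calls `aref` on `composition-function-table`, so it can be evaluated in a session with or without the patch to compare which characters carry composition rules.

```elisp
;; Hypothetical helper (not part of Emacs): return the characters
;; between FROM and TO (inclusive) whose `composition-function-table'
;; entry is non-nil, i.e. characters for which redisplay has to try
;; a composition rule.
(defun my-chars-with-composition-rules (from to)
  "Return the list of characters in [FROM, TO] that have composition rules."
  (let (result)
    (dotimes (i (1+ (- to from)))
      (let ((ch (+ from i)))
        (when (aref composition-function-table ch)
          (push ch result))))
    (nreverse result)))

;; Eli's checks, widened to ranges: the ASCII digits, and the
;; Letterlike Symbols block, which contains U+2122 TRADE MARK SIGN.
(my-chars-with-composition-rules ?0 ?9)
(my-chars-with-composition-rules #x2100 #x214F)
```

Eli's concern is that the patch would make lists like these non-empty. That would put ASCII and common symbols on the slower composition path during redisplay.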

    Eli> The issue you site is strange: according to the "C-u C-x =" display
    Eli> there, Emacs did compose #x1f44d with VS-16 using the Noto Color Emoji
    Eli> font, so I don't quite understand why VS-16 is then also shown as an
    Eli> empty rectangle.  On my system Noto Color Emoji doesn't work, and "C-u
    Eli> C-x =" says this instead:

    Eli>   Composed with the following character(s) "️" using this font:
    Eli>     harfbuzz:-outline-Noto Emoji-regular-normal-normal-mono-15-*-*-*-c-*-iso10646-1
    Eli>   by these glyphs:
    Eli>     [0 1 128077 422 19 2 17 14 2 nil]
    Eli>     [0 1 65039 3 19 0 1 0 1 [0 0 0]]
    Eli>   with these character(s):
    Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

    Eli> which explains why I see two glyphs and not 1.  But in the display
    Eli> shown in the above issue, I see

    Eli>   Composed with the following character(s) "️" using this font:
    Eli>     ftcrhb:-GOOG-Noto Color Emoji-regular-normal-normal-*-18-*-*-*-m-0-iso10646-1
    Eli>   by these glyphs:
    Eli>     [0 1 128077 569 22 0 23 17 5 [0 0 136]]
    Eli>   with these character(s):
    Eli>     ️ (#xfe0f) VARIATION SELECTOR-16

    Eli> which describes only one glyph, not two.  So the result ought to be
    Eli> what you expect.
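The glyph vectors printed by "C-u C-x =" follow the layout documented for `composition-get-gstring`:

[FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT]

The sketch below labels those fields. Its names, `my-lglyph-fields` and `my-decode-lglyph`, are hypothetical. Labeling the fields makes the difference between the two outputs visible:

- In the first output, the second glyph is VS-16 itself (C = 65039 = #xFE0F), drawn with its own font glyph.
- In the second output, there is a single glyph for U+1F44D.

```elisp
;; Hypothetical names (not part of Emacs): the fields of a glyph
;; vector, in the order documented for `composition-get-gstring'.
(defconst my-lglyph-fields
  '(from-idx to-idx char code width lbearing rbearing
    ascent descent adjustment))

(defun my-decode-lglyph (glyph)
  "Return an alist pairing each field name with its value in GLYPH."
  (let ((i 0) result)
    (dolist (field my-lglyph-fields (nreverse result))
      (push (cons field (aref glyph i)) result)
      (setq i (1+ i)))))

;; Eli's setup (Noto Emoji): two glyphs, the second one is VS-16.
(mapcar (lambda (g) (alist-get 'char (my-decode-lglyph g)))
        '([0 1 128077 422 19 2 17 14 2 nil]
          [0 1 65039 3 19 0 1 0 1 [0 0 0]]))
;; => (128077 65039)

;; The setup from the ement.el issue (Noto Color Emoji): one glyph.
(my-decode-lglyph [0 1 128077 569 22 0 23 17 5 [0 0 136]])
```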

I see the emoji followed by a blank box with Noto Color Emoji here. I
donʼt yet understand why.
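To reproduce the comparison from the original report, the test strings can be put into a buffer and each line inspected with "C-u C-x =". This is a minimal sketch; the function name `my-vs16-test-buffer` and the buffer name are hypothetical.

```elisp
;; Hypothetical helper (not part of Emacs): fill a buffer with the
;; test cases from the report, one per line, so that each can be
;; examined with "C-u C-x =".
(defun my-vs16-test-buffer ()
  "Return a buffer holding emoji presentation test cases."
  (let ((buf (get-buffer-create "*vs16-test*")))
    (with-current-buffer buf
      (erase-buffer)
      ;; U+1F44D THUMBS UP SIGN with and without U+FE0F (VS-16):
      ;; the two lines are expected to look the same.
      (insert "\U0001F44D\uFE0F\n"
              "\U0001F44D\n")
      ;; U+2714 HEAVY CHECK MARK: text style on its own,
      ;; emoji style when followed by VS-16.
      (insert "\u2714\n"
              "\u2714\uFE0F\n")
      (goto-char (point-min)))
    buf))

;; Usage: M-: (switch-to-buffer (my-vs16-test-buffer)) RET
```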

    Eli> Robert, what am I missing here?

1F44D FE0F is a valid sequence according to tr51, but the entry for
1F44D only covers the skin-tone modifier sequences:

(aref composition-function-table #x1f44d)
=> (["\\(?:👍[🏻-🏿]\\)" 0 compose-gstring-for-graphic])

which means that the composition is being triggered by this entry:

(aref composition-function-table #xfe0f)
=> (["\\c.\\c^+" 1 compose-gstring-for-graphic] [nil 0 compose-gstring-for-graphic])
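The docstring of `composition-function-table` describes each rule as a vector [PATTERN PREV-CHARS FUNC]:

- PREV-CHARS is how many characters before C the match of PATTERN starts.
- A nil PATTERN (allowed only when PREV-CHARS is 0) means C is composed by itself.

Read that way, the FE0F entry above has two rules. The first is a backward-looking rule for a base character followed by combining characters. The second is a fallback that composes FE0F on its own, and it is the one the first diff below removes. The helper below is hypothetical (not part of Emacs) and spells an entry out in those terms.

```elisp
;; Hypothetical helper (not part of Emacs): describe each rule of a
;; `composition-function-table' entry.  A rule is a vector
;; [PATTERN PREV-CHARS FUNC]; a nil PATTERN means "compose C alone".
(defun my-describe-composition-rules (rules)
  "Return a list of strings describing each rule in RULES."
  (mapcar
   (lambda (rule)
     (let ((pattern (aref rule 0))
           (prev-chars (aref rule 1))
           (func (aref rule 2)))
       (if pattern
           (format "%s: match %S starting %d char(s) before C"
                   func pattern prev-chars)
         (format "%s: compose C by itself" func))))
   rules))

;; The FE0F entry shown above:
(my-describe-composition-rules
 '(["\\c.\\c^+" 1 compose-gstring-for-graphic]
   [nil 0 compose-gstring-for-graphic]))

;; Or on the live table:
;; (my-describe-composition-rules (aref composition-function-table #xfe0f))
```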

(time passes)

Ugh. The following fixes it for me:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..af86d1436d3 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -756,7 +756,7 @@ compose-gstring-for-dotted-circle
 ;; Allow for bootstrapping without uni-*.el.
 (when unicode-category-table
   (let ((elt `([,(purecopy "\\c.\\c^+") 1 compose-gstring-for-graphic]
-	       [nil 0 compose-gstring-for-graphic])))
+	       )))
     (map-char-table
      #'(lambda (key val)
 	 (if (memq val '(Mn Mc Me))
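The first diff can be approximated in a running session, without rebuilding, by re-running the same loop with only the backward-looking rule. This is a sketch for experimentation only. Like the diff, it touches every Mn, Mc and Me character, and for the current session it replaces whatever rules those characters have at that point. `purecopy` is omitted because it only matters while dumping.

```elisp
;; Reinstall the combining-mark rules with only the backward-looking
;; rule, mirroring the first diff, for the current session only.
(when unicode-category-table
  (let ((elt (list (vector "\\c.\\c^+" 1 'compose-gstring-for-graphic))))
    (map-char-table
     (lambda (key val)
       ;; KEY is a character or a (FROM . TO) range; VAL is the
       ;; Unicode general category symbol for it.
       (when (memq val '(Mn Mc Me))
         (set-char-table-range composition-function-table key elt)))
     unicode-category-table)))
```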

Although the following is less invasive:

diff --git a/lisp/composite.el b/lisp/composite.el
index fb8b76114f4..333428f008a 100644
--- a/lisp/composite.el
+++ b/lisp/composite.el
@@ -762,6 +762,11 @@ compose-gstring-for-dotted-circle
 	 (if (memq val '(Mn Mc Me))
 	     (set-char-table-range composition-function-table key elt)))
      unicode-category-table))
+  ;; for Emoji presentation selector
+  (set-char-table-range
+   composition-function-table
+   #xFE0F
+    `([,(purecopy "\\c.\ufe0f") 1 compose-gstring-for-graphic]))
   ;; for dotted-circle
   (aset composition-function-table #x25CC
 	`([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))
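The effect of the second diff can likewise be tried in a running session by evaluating the same `set-char-table-range` form. This is a sketch for experimentation only:

- It replaces FE0F's existing rules for the current session.
- Text that is already on screen may need to be redisplayed before the change is visible.

```elisp
;; Give U+FE0F (VARIATION SELECTOR-16) a single backward-looking
;; rule: PREV-CHARS = 1, so matching starts at the character before
;; FE0F, and the pattern requires a base character (category ".")
;; immediately followed by FE0F.  With no nil-pattern fallback, a
;; lone FE0F is no longer composed by itself.
(set-char-table-range
 composition-function-table
 #xFE0F
 (list (vector "\\c.\ufe0f" 1 'compose-gstring-for-graphic)))
```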

Didnʼt we conclude that composition had some issues with multiple
entries for the same codepoint if there was a mix of forward- and
backward-looking regexps?
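One way to look for the situation described in the question is to scan `composition-function-table` for characters whose rule list mixes two kinds of rule, as the FE0F entry shown earlier does:

- forward-looking rules (PREV-CHARS 0)
- backward-looking rules (PREV-CHARS > 0)

This is a sketch with hypothetical helper names. It only inspects the table and says nothing about whether such a mix actually misbehaves.

```elisp
;; Hypothetical helpers (not part of Emacs).
(defun my-mixed-lookback-p (rules)
  "Return non-nil if RULES mixes PREV-CHARS 0 with PREV-CHARS > 0.
RULES is a `composition-function-table' entry, a list of vectors
[PATTERN PREV-CHARS FUNC].  Non-vector elements are ignored."
  (let (forward backward)
    (dolist (rule rules)
      (when (vectorp rule)
        (if (> (aref rule 1) 0)
            (setq backward t)
          (setq forward t))))
    (and forward backward)))

(defun my-chars-with-mixed-lookback ()
  "Return characters and (FROM . TO) ranges whose rules mix lookback."
  (let (result)
    (map-char-table
     (lambda (key val)
       (when (and (consp val) (my-mixed-lookback-p val))
         ;; Copy range keys, since `map-char-table' may reuse the cons.
         (push (if (consp key) (cons (car key) (cdr key)) key) result)))
     composition-function-table)
    (nreverse result)))

;; The FE0F entry shown earlier mixes both kinds of rule:
(my-mixed-lookback-p '(["\\c.\\c^+" 1 compose-gstring-for-graphic]
                       [nil 0 compose-gstring-for-graphic]))
;; => t
```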

Robert
-- 



