GNU bug report logs - #39799
28.0.50; Most emoji sequences don’t render correctly


Package: emacs;

Reported by: Mike FABIAN <mfabian <at> redhat.com>

Date: Wed, 26 Feb 2020 14:30:03 UTC

Severity: normal

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #221 received at 39799 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 39799 <at> debbugs.gnu.org, mfabian <at> redhat.com
Subject: Re: bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date: Tue, 21 Sep 2021 12:16:38 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: rgm <at> gnu.org,  39799 <at> debbugs.gnu.org,  mfabian <at> redhat.com
> Date: Mon, 20 Sep 2021 22:38:28 +0200
> 
> Iʼve just pushed a change to master that should fix (almost) all the
> issues with displaying emoji sequences (except for keycaps). Feedback
> welcome.

Thanks, this is mostly okay, IMO.  The only issue I have with this is
here:

> --- a/admin/unidata/blocks.awk
> +++ b/admin/unidata/blocks.awk
> @@ -221,6 +221,46 @@ FILENAME ~ "emoji-data.txt" && /^[0-9A-F].*; Emoji_Presentation / {
>  }
>  
>  END {
> +    ## These codepoints have Emoji_Presentation = No, but they are
> +    ## used in emoji-sequences.txt and emoji-zwj-sequences.txt (with a
> +    ## Variation Selector), so force them into the emoji script so
> +    ## they will get composed correctly.  FIXME: delete this when we
> +    ## can change the font used for a codepoint based on whether it's
> +    ## followed by a VS (usually VS-16)
> +    idx = 0
> +    override_start[idx] = "261D"
> +    override_end[idx] = "261D"
> +    idx++
> +    override_start[idx] = "26F9"
> +    override_end[idx] = "26F9"
> +    idx++
> +    override_start[idx] = "270C"
> +    override_end[idx] = "270D"
> +    idx++
> +    override_start[idx] = "2764"
> +    override_end[idx] = "2764"
> +    idx++
> +    override_start[idx] = "1F3CB"
> +    override_end[idx] = "1F3CC"
> +    idx++
> +    override_start[idx] = "1F3F3"
> +    override_end[idx] = "1F3F4"
> +    idx++
> +    override_start[idx] = "1F441"
> +    override_end[idx] = "1F441"
> +    idx++
> +    override_start[idx] = "1F575"
> +    override_end[idx] = "1F575"
> +
> +    for (k in override_start)
> +    {
> +        i++
> +        start[i] = override_start[k]
> +        end[i] = override_end[k]
> +        alt[i] = "emoji"
> +        name[i] = "Autogenerated emoji (override)"
> +    }

Specifically, the U+2xxx codepoints are now in the 'emoji' script,
which I think is undesirable, even if the price is that we won't
support the sequences in which those codepoints are followed by
VS-16.  So I think we should remove those codepoints from the above,
leaving only the U+1Fxxx ones.
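
For reference, here are the override ranges from the patch above split by
plane, as a quick Python sketch (the ranges are copied verbatim from the
diff; the variable names are mine):

```python
# Override ranges from the blocks.awk patch, as (start, end) codepoints.
overrides = [
    (0x261D, 0x261D), (0x26F9, 0x26F9), (0x270C, 0x270D), (0x2764, 0x2764),
    (0x1F3CB, 0x1F3CC), (0x1F3F3, 0x1F3F4), (0x1F441, 0x1F441), (0x1F575, 0x1F575),
]

# The BMP (U+2xxx) entries are the ones proposed for removal here;
# the SMP (U+1Fxxx) entries would stay in the 'emoji' script.
bmp = [r for r in overrides if r[0] <= 0xFFFF]
smp = [r for r in overrides if r[0] > 0xFFFF]
```

So four ranges would be dropped from the override list and four would remain.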

Btw, currently U+261D followed by VS-16 doesn't compose for me.  Is
that because compose-gstring-for-variation-glyph is hardcoded to work
only for Han characters, and U+261D isn't one, or because that
function is not suited to VS-16 (it looks for glyph variations in the
font)?  Or am I missing something?
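
For concreteness, the sequence in question is the base character followed
by VARIATION SELECTOR-16 (U+FE0F), which requests emoji presentation; a
minimal Python check of the codepoint names:

```python
# The "U+261D VS-16" sequence discussed above: base character plus
# VARIATION SELECTOR-16 (U+FE0F), which requests emoji presentation.
import unicodedata

seq = "\u261D\uFE0F"
names = [unicodedata.name(ch) for ch in seq]
# names == ['WHITE UP POINTING INDEX', 'VARIATION SELECTOR-16']
```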

Now to my idea of supporting those "U+2xxx VS-16" sequences without
assigning them to the 'emoji' script:

The function autocmp_chars uses font_range to find out whether the
sequence of characters to be composed is supported by a single font.
font_range currently takes the first character of the sequence, calls
font_for_char for it, then checks that all the rest of the characters
are supported by that font by calling font_encode_char.  In our case,
the first character of the sequence is U+2xxx, which is not in the
'emoji' script, so Emacs is likely to pick a font that doesn't
support Emoji, and the composition will fail.  To avoid that, I
propose the following change:

  . add a new argument to font_range, the codepoint that triggered the
    composition
  . inside font_range, if that codepoint belongs to the 'emoji' script
    (use char-script-table to find that out), call font_for_char with
    a representative character for 'emoji' (from
    script-representative-chars) instead of the first character of the
    sequence, then check that all the sequence characters, including
    the first one, can be supported by that font; if they can, return
    that font to the caller, to be used for the composition
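
A rough Python model of that proposal (font_for_char and
font_encode_char here are stand-in stubs for the C functions of the
same names, and char_script stands in for char-script-table; this is a
sketch of the control flow, not the real Emacs API):

```python
def font_range(seq, trigger, char_script, representative,
               font_for_char, font_encode_char):
    """Return a font that supports every character in SEQ, or None.

    Sketch of the proposed change: if TRIGGER (the codepoint that
    triggered the composition) belongs to the 'emoji' script, probe
    with a representative emoji character instead of seq[0], then
    verify that ALL characters of the sequence, including the first,
    are supported by the chosen font.
    """
    if char_script.get(trigger) == "emoji":
        probe = representative["emoji"]   # from script-representative-chars
    else:
        probe = seq[0]                    # current behavior
    font = font_for_char(probe)
    if font is not None and all(font_encode_char(font, c) for c in seq):
        return font
    return None
```

For a sequence like U+2764 U+FE0F, with U+2764 assigned to 'emoji' in
char-script-table, the probe becomes the representative emoji
character, so an emoji-capable font gets selected and the whole
sequence is then verified against it.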

WDYT?

Btw, if you use Firefox or Chrome, or some other application that can
show Emoji sequences, or maybe just HarfBuzz's hb-view, how does the
display of the U+2xxx codepoints change when they are followed by
VS-16?  Is the change prominent enough for us to try to support it?
If not, perhaps the above should be left out for the moment.






