On Wed, Sep 3, 2025 at 9:10 PM Eli Zaretskii <eliz@gnu.org> wrote:

That's the idea, yes.  It would mean to have a function in hbfont.c
that is the subset of hbfont_shape, and which accepts a single
character (not a Lisp string) and a font, and then constructs the
hb_buffer and submits that to hb_shape_full.

But please test if this should give good results by simulating it, as
follows:

 . make composition-function-table whose cells for several characters
   match only that one character, and see how a string of such
   characters is rendered using a font with relevant OpenType features
 . then compare that with rendering when composition-function-table
   has the same rule in the cell of each of those characters, matching
   any sequence of these characters (as in "[abcdefg]+")

If applying stylistic sets by rendering text one character at a time
produces different results from rendering them all as a single string,
then this idea is not workable, and we will need to use the (slower
and more complex) composite.c machinery instead.

If the idea does work, then presumably a change in
get_glyph_face_and_encoding for characters that have this special
face attribute will be all that's needed, perhaps together with some
flag in the 'struct it' to make that faster.  Details later, when we
know whether the idea works or not.


I compared the results of

#+begin_src elisp
  (set-char-table-range composition-function-table
    ?!
    '(["\\(!==\\)" 0 font-shape-gstring]))
#+end_src

and

#+begin_src elisp
  (set-char-table-range composition-function-table
    ?!
    '(["\\(!\\)" 0 font-shape-gstring]))

  (set-char-table-range composition-function-table
    ?=
    '(["\\(=\\)" 0 font-shape-gstring]))
#+end_src

The results differ: only the rule that matches the whole sequence produces the desired multi-character ligature.
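
For reference, the character-class form from the second bullet of the quoted test plan (as in "[abcdefg]+") could be
sketched as follows for these two characters. The pattern "[!=]+" and the dolist loop are illustrative assumptions, and
this variant is not one of the two configurations compared above.

#+begin_src elisp
  ;; Sketch only: install the same rule in the cell of each character,
  ;; matching any run of those characters, so the whole run is handed
  ;; to the shaper as one sequence.
  (dolist (ch '(?! ?=))
    (set-char-table-range composition-function-table
      ch
      '(["[!=]+" 0 font-shape-gstring])))
#+end_src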


I read the hbfont.c code, and the HarfBuzz buffer is cleared every time shaping is performed. I think it makes sense that
the HarfBuzz buffer should not store the state of the Emacs buffer, and according to the HarfBuzz documentation, HarfBuzz
needs to see the whole sequence in order to shape it.

I did some research on how other programs use HarfBuzz. They typically pass an entire paragraph or an entire line to the
shaping function. Emacs's approach of first detecting a sequence and then shaping only that sequence with HarfBuzz is
quite an interesting one. There may be historical reasons for it, but it seems that either a lot more work needs to be
done in composite.c or we need to figure out something better.