GNU bug report logs - #20173
24.4; Rendering misallocates combining marks on ligatures


Package: emacs;

Reported by: Richard Wordingham <richard.wordingham <at> ntlworld.com>

Date: Mon, 23 Mar 2015 01:07:02 UTC

Severity: normal

Found in version 24.4

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20173 in the body.
You can then email your comments to 20173 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Mon, 23 Mar 2015 01:07:02 GMT)

Acknowledgement sent to Richard Wordingham <richard.wordingham <at> ntlworld.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 23 Mar 2015 01:07:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; Rendering misallocates combining marks on ligatures
Date: Mon, 23 Mar 2015 01:06:26 +0000
When a ligature of two base characters has two combining marks on the
first component but none on the second, the second combining mark is
rendered as though it applied to the second component. A good example
is the Arabic sequence لَّا (lam, shadda, fatha, alef: <U+0644, U+0651,
U+064E, U+0627>), where the shadda is rendered on the lam part of the
lam-alif ligature and the fatha on the alif part.  This problem is not
restricted to right-to-left scripts; I encountered the problem when
debugging left-to-right rendering.  Lam-alif is one of the most
reliably generated ligatures bearing marks on different components.
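A minimal way to reproduce the report from Lisp is sketched below. The names `bug20173-test-string` and `bug20173-show` are hypothetical helpers, not part of Emacs, and whether the fatha is actually misplaced still depends on the font and shaping backend in use.

```elisp
;; Hypothetical helpers for reproducing bug#20173; not part of Emacs.
(defconst bug20173-test-string
  (string #x0644 #x0651 #x064E #x0627)
  "LAM, SHADDA, FATHA, ALEF: the test sequence from the report.")

(defun bug20173-show ()
  "Insert `bug20173-test-string' into a fresh buffer and display it."
  (interactive)
  (let ((buf (get-buffer-create "*bug20173*")))
    (with-current-buffer buf
      (erase-buffer)
      (insert bug20173-test-string)
      (goto-char (point-min)))
    (pop-to-buffer buf)))
```

After M-x bug20173-show, typing C-u C-x = with point on the lam should describe the character together with the composition and font Emacs used for it.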




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Mon, 23 Mar 2015 15:41:02 GMT)

Message #8 received at 20173 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4;
 Rendering misallocates combining marks on ligatures
Date: Mon, 23 Mar 2015 17:38:52 +0200
> Date: Mon, 23 Mar 2015 01:06:26 +0000
> From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> 
> When a ligature of two base characters has two combining marks on the
> first component but none on the second, the second combining mark is
> rendered as though it applied to the second component. A good example
> is the Arabic sequence لَّا (lam, shadda, fatha, alef: <U+0644, U+0651,
> U+064E, U+0627>), where the shadda is rendered on the lam part of the
> lam-alif ligature and the fatha on the alif part.  This problem is not
> restricted to right-to-left scripts; I encountered the problem when
> debugging left-to-right rendering.  Lam-alif is one of the most
> reliably generated ligatures bearing marks on different components.

Is it possible that some rule(s) are missing from the end of
lisp/language/misc-lang.el?  Could you please take a look and see if
something needs to be fixed/added in how we set up the compositions
for Arabic?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Mon, 23 Mar 2015 22:42:01 GMT)

Message #11 received at 20173 <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4; Rendering misallocates combining marks on
 ligatures
Date: Mon, 23 Mar 2015 22:41:07 +0000
On Mon, 23 Mar 2015 17:38:52 +0200
Eli Zaretskii <eliz <at> gnu.org> wrote:

> > Date: Mon, 23 Mar 2015 01:06:26 +0000
> > From: Richard Wordingham <richard.wordingham <at> ntlworld.com>

> Is it possible that some rule(s) are missing from the end of
> lisp/language/misc-lang.el?  Could you please take a look and see if
> something needs to be fixed/added in how we set up the compositions
> for Arabic?

There's no relevant problem there.  I demonstrated the bug to myself by
first rendering Tai Tham <NA, TONE-2, SIGN AA> and confirming that
TONE-2 rendered above the first component of the ligature NAA, formed
from <NA, SIGN AA>.  I then hacked my font so that the glyph for TONE-2
was decomposed into the glyphs for MAI KANG and TONE-2, in that order,
and observed TONE-2 being rendered on the second component of the
ligature.  I then turned to Arabic so that a custom font would not be
needed to demonstrate the bug.

As to what needs fixing in the Arabic section of misc-lang.el:

Clusters containing letters should be limited to letters and marks on
them.  Otherwise, the digits 1, 2, 3 are reversed in a variable name
like بج١٢٣د.  (I'm not sure why the problem doesn't appear with بج١٢٣.)

(set-char-table-range
 composition-function-table
 '(#x600 . #x6FF)
 (list ["[\u0600-\u06FF]+" 0 font-shape-gstring]))

should change to something like

(set-char-table-range
 composition-function-table
 '(#x610 . #x615)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip punctuation

(set-char-table-range
 composition-function-table
 '(#x621 . #x65F)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip digits and punctuation

(set-char-table-range
 composition-function-table
 '(#x66E . #x6D3)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip punctuation

(set-char-table-range
 composition-function-table
 '(#x6D5 . #x6EF)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip digits

(set-char-table-range
 composition-function-table
 '(#x6FA . #x6FC)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

;; Skip symbols

(set-char-table-range
 composition-function-table
 '(#x6FF . #x6FF)
 (list
  ["[\u0610-\u0615\u0621-\u065F\u066E-\u06D3\u06D5-\u06EF\u06FA-\u06FC\u06FF]+"
   0 font-shape-gstring]))

There are more elegant ways of expressing this, which is just as well,
for there are also blocks Arabic Supplement (U+0750 to U+077F) and
Arabic Extended-A (U+08A0 to U+08FF).  Being an international script,
the Arabic script has a lot of letters, just like the Latin script.

Richard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Tue, 24 Mar 2015 03:43:01 GMT)

Message #14 received at 20173 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4;
 Rendering misallocates combining marks on ligatures
Date: Tue, 24 Mar 2015 05:42:18 +0200
> Date: Mon, 23 Mar 2015 22:41:07 +0000
> From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> Cc: 20173 <at> debbugs.gnu.org
> 
> On Mon, 23 Mar 2015 17:38:52 +0200
> Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> > > Date: Mon, 23 Mar 2015 01:06:26 +0000
> > > From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> 
> > Is it possible that some rule(s) are missing from the end of
> > lisp/language/misc-lang.el?  Could you please take a look and see if
> > something needs to be fixed/added in how we set up the compositions
> > for Arabic?
> 
> There's no relevant problem there.  I demonstrated the bug to myself by
> first rendering Tai Tham <NA, TONE-2, SIGN AA> and confirming that
> TONE-2 rendered above the first component of the ligature NAA, formed
> from <NA, SIGN AA>.  I then hacked my font so that the glyph for TONE-2
> was decomposed into the glyphs for MAI KANG and TONE-2, in that order,
> and observed TONE-2 being rendered on the second component of the
> ligature.  I then turned to Arabic so that a custom font would not be
> needed to demonstrate the bug.

Sorry, I'm not sure I understand you.  If the setting of composition
rules for Arabic is not the culprit, then what is?  AFAIK, there are
no rules that guide Emacs's shaping except what's in
composition-function-table.  Beyond that, the only other factor is the
font backend and how it shapes glyphs given the chunks of text Emacs
presents to it.
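Which rules composition-function-table holds for a given character can be checked from Lisp. The helper below is hypothetical (not part of Emacs); it assumes, as in the rules quoted in this thread, that each rule is a vector whose first element is the regexp matched against the buffer text.

```elisp
;; Hypothetical inspection helper; not part of Emacs.
(defun bug20173-composition-patterns (char)
  "Return the regexps of the composition rules registered for CHAR.
Each rule in `composition-function-table' is assumed to be a vector
whose first element is the regexp that selects the text to shape."
  (let (patterns)
    (dolist (rule (aref composition-function-table char) (nreverse patterns))
      ;; Keep only well-formed vector rules with a string pattern.
      (when (and (vectorp rule) (stringp (aref rule 0)))
        (push (aref rule 0) patterns)))))
```

For example, (bug20173-composition-patterns #x644) lists the pattern(s) Emacs will try when a composition may start at a lam.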

> As to what needs fixing in the Arabic section of misc-lang.el:

Thanks, I will look into these.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Tue, 24 Mar 2015 08:29:02 GMT)

Message #17 received at 20173 <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4; Rendering misallocates combining marks on
 ligatures
Date: Tue, 24 Mar 2015 08:28:28 +0000
On Tue, 24 Mar 2015 05:42:18 +0200
Eli Zaretskii <eliz <at> gnu.org> wrote:

> If the setting of composition
> rules for Arabic is not the culprit, then what is?  AFAIK, there are
> no rules that guide Emacs's shaping except what's in
> composition-function-table.  Beyond that, the only other factor is the
> font backend and how it shapes glyphs given the chunks of text Emacs
> presents to it.

The font backend on Unixy systems consists of three components: m17n
(shaping control), libotf (OTL lookup implementation) and FreeType
(glyph rendering).  The glue between them is in Emacs, most relevantly
in the function ftfont_drive_otf() in ftfont.c.

My analysis of the problem, which could quite easily be wrong, is as
follows.  To control the positioning of marks for the mark2ligature
lookup, it is necessary to record in some fashion which component of
the ligature a mark applies to.  I cannot see this information being
stored.  The information should be generated and used by libotf, but
needs to be stored between callbacks of ftfont_drive_otf() by m17n.
(The initial settings are implicit in the sequence of codepoints.)
Storing this information would, so far as I can see, require a change to
ftfont_drive_otf().
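To make the bookkeeping concrete: for a run of characters that will form one ligature, each base starts a new component and each mark inherits the component of the nearest preceding base. The sketch below computes those indices from a list of `base`/`mark` tags. It is an illustration under that assumption only, with a hypothetical function name, and is not code from libotf, m17n or ftfont.c.

```elisp
;; Illustrative only: a hypothetical model of ligature-component
;; bookkeeping, not code taken from libotf, m17n or ftfont.c.
(defun bug20173-ligature-components (classes)
  "Return the ligature component index for each element of CLASSES.
CLASSES lists, in logical order, the symbols `base' or `mark' for the
characters that are about to form one ligature.  Each base starts a
new component; each mark inherits the component of the nearest
preceding base (component 0 if no base precedes it)."
  (let ((component -1)
        (result nil))
    (dolist (class classes (nreverse result))
      ;; A base glyph opens the next component of the ligature.
      (when (eq class 'base)
        (setq component (1+ component)))
      (push (max component 0) result))))
```

For lam, shadda, fatha, alef, (bug20173-ligature-components '(base mark mark base)) returns (0 0 0 1): both marks belong to the lam component, whereas the rendering described in this report behaves as though the fatha had been given component 1.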

I may be able to change my font to work round this bug; I can certainly
change it to hide the symptom I observed.  The solution will be to
categorise the ligature NAA <U+1A36, U+1A63> as a base glyph rather
than as a ligature glyph.

There are other places where the HarfBuzz rendering system, which aims
to be compatible with Windows, uses this information.  In particular,
marks applied to a ligature are only allowed to ligate if they apply to
the same component of a ligature, and mark2mark positioning only
applies if the two marks apply to the same component.  This logic is
described as 'the most tricky part of the OpenType specification'.
Part of the trickiness may be that it seems not to have been
published externally (possibly not even internally) by Microsoft.  The
guiding principle seems to be that one should do the right things to the
marks on a ligature of Arabic consonants.
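The "same component" rule described above can be stated as a small predicate. In this hypothetical sketch each mark is a cons (LIGATURE-ID . COMPONENT), where LIGATURE-ID is nil for a mark on an ordinary base glyph; it is a deliberate simplification and ignores the subtler HarfBuzz and Windows differences discussed in the next paragraph.

```elisp
;; Hypothetical, simplified model of the "same component" rule.
(defun bug20173-same-component-p (mark1 mark2)
  "Non-nil if MARK1 and MARK2 attach to the same ligature component.
Each mark is a cons (LIGATURE-ID . COMPONENT); LIGATURE-ID is nil
when the mark sits on an ordinary base glyph rather than a ligature."
  (let ((lig1 (car mark1))
        (lig2 (car mark2)))
    (if (and lig1 (eql lig1 lig2))
        ;; Marks on the same ligature interact only within one component.
        (eql (cdr mark1) (cdr mark2))
      ;; Otherwise they interact only if neither is on a ligature.
      (and (null lig1) (null lig2)))))
```

Under this model mark-to-mark positioning or mark ligation would be attempted only when the predicate holds; for the lam-alif example the shadda and fatha are both (L . 0) for the same ligature L, so they may interact with each other but not with marks on the alef.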

I have become well-acquainted with this logic because the 'same
component logic' seems to be applied by HarfBuzz regardless of whether
the marks are preceded by a base glyph or a ligature glyph.  The
Windows logic seems similar, but is subtly different.  I hit problems
with the Tai Tham NAA ligature, because the marks above on its two
components do interact.  The marks below should probably also interact,
but combinations where I would expect them to have to interact seem not
to occur in natural text.

> > As to what needs fixing in the Arabic section of misc-lang.el:

> Thanks, I will look into these.

You might want to first check whether composed Arabic is
usable. Doesn't making each word a grapheme cluster make editing
unpleasant?  It might be worth restricting the clustering to
cursively connected sequences of letters within a word.

Richard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Tue, 24 Mar 2015 17:04:01 GMT)

Message #20 received at 20173 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4;
 Rendering misallocates combining marks on ligatures
Date: Tue, 24 Mar 2015 19:03:38 +0200
> Date: Tue, 24 Mar 2015 08:28:28 +0000
> From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> Cc: 20173 <at> debbugs.gnu.org
> 
> On Tue, 24 Mar 2015 05:42:18 +0200
> Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> > If the setting of composition
> > rules for Arabic is not the culprit, then what is?  AFAIK, there are
> > no rules that guide Emacs's shaping except what's in
> > composition-function-table.  Beyond that, the only other factor is the
> > font backend and how it shapes glyphs given the chunks of text Emacs
> > presents to it.
> 
> The font backend on Unixy systems consists of three components: m17n
> (shaping control), libotf (OTL lookup implementation) and FreeType
> (glyph rendering).  The glue between them is in Emacs, most relevantly
> in the function ftfont_drive_otf() in ftfont.c.
> 
> My analysis of the problem, which could quite easily be wrong, is as
> follows.  To control the positioning of marks for the mark2ligature
> lookup, it is necessary to record in some fashion which component of
> the ligature a mark applies to.  I cannot see this information being
> stored.  The information should be generated and used by libotf, but
> needs to be stored between callbacks of ftfont_drive_otf() by m17n.
> (The initial settings are implicit in the sequence of codepoints.)
> Storing this information would, so far as I can see, require a change to
> ftfont_drive_otf().

So this means that on Windows this problem does not exist?

> You might want to first check whether composed Arabic is
> usable. Doesn't making each word a grapheme cluster make editing
> unpleasant?

I don't know; I don't speak or write any of the languages that use the
Arabic script.  I expect the users that do to come up and ask for
features they miss.  We already allow deletion of single codepoints,
even when they are composed; we might as well provide similar features
for movement or whatever.  But the requests (and, perhaps, even the
code) should come from people who actually use these scripts,
otherwise it's a sure way to white elephants and other similar
creatures.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Tue, 24 Mar 2015 20:23:02 GMT)

Message #23 received at 20173 <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4; Rendering misallocates combining marks on
 ligatures
Date: Tue, 24 Mar 2015 20:22:51 +0000
On Tue, 24 Mar 2015 19:03:38 +0200
Eli Zaretskii <eliz <at> gnu.org> wrote:

> So this means that on Windows this problem does not exist?

Correct.  The Arabic test sequence renders properly in Emacs 24.4.1
on Windows 7.

Richard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Fri, 27 Mar 2015 09:05:02 GMT)

Message #26 received at 20173 <at> debbugs.gnu.org:

From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4; Rendering misallocates combining marks on
 ligatures
Date: Fri, 27 Mar 2015 09:04:44 +0000
On Tue, 24 Mar 2015 19:03:38 +0200
Eli Zaretskii <eliz <at> gnu.org> wrote:

> > Date: Tue, 24 Mar 2015 08:28:28 +0000
> > From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> > Cc: 20173 <at> debbugs.gnu.org

> > You might want to first check whether composed Arabic is
> > usable. Doesn't making each word a grapheme cluster make editing
> > unpleasant?

> I don't know; I don't speak or write any of the languages that use the
> Arabic script.  I expect the users that do to come up and ask for
> features they miss.  We already allow deletion of single codepoints,
> even when they are composed; we might as well provide similar features
> for movement or whatever.

I forgot that grapheme clustering is done in m17n, not Emacs itself.
The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
with combining marks.  It *seems* I have a problem with tpu-forward-char
and tpu-backward-char; it's as though there's an initialisation fault
which stops them stepping through the Arabic compositions at first.  It
may be an issue with the presumably underlying forward-char and
backward-char; I haven't investigated further.  I'll have to record
the exact actions provoking the problem before I formally record a bug.

Richard. 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#20173; Package emacs. (Fri, 27 Mar 2015 09:55:02 GMT)

Message #29 received at 20173 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20173 <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4;
 Rendering misallocates combining marks on ligatures
Date: Fri, 27 Mar 2015 12:54:22 +0300
> Date: Fri, 27 Mar 2015 09:04:44 +0000
> From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> Cc: 20173 <at> debbugs.gnu.org
> 
> On Tue, 24 Mar 2015 19:03:38 +0200
> Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> > > Date: Tue, 24 Mar 2015 08:28:28 +0000
> > > From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
> > > Cc: 20173 <at> debbugs.gnu.org
> 
> > > You might want to first check whether composed Arabic is
> > > usable. Doesn't making each word a grapheme cluster make editing
> > > unpleasant?
> 
> > I don't know; I don't speak or write any of the languages that use the
> > Arabic script.  I expect the users that do to come up and ask for
> > features they miss.  We already allow deletion of single codepoints,
> > even when they are composed; we might as well provide similar features
> > for movement or whatever.
> 
> I forgot that grapheme clustering is done in m17n, not Emacs itself.
> The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
> with combining marks.  It *seems* I have a problem with tpu-forward-char
> and tpu-backward-char; it's as though there's an initialisation fault
> which stops them stepping through the Arabic compositions at first.  It
> may be an issue with the presumably underlying forward-char and
> backward-char; I haven't investigated further.  I'll have to record
> the exact actions provoking the problem before I formally record a bug.

Please try in "emacs -Q" without activating the TPU emulation.




Reply sent to Stefan Kangas <stefan <at> marxist.se>:
You have taken responsibility. (Mon, 17 Aug 2020 22:46:03 GMT)

Notification sent to Richard Wordingham <richard.wordingham <at> ntlworld.com>:
bug acknowledged by developer. (Mon, 17 Aug 2020 22:46:03 GMT)

Message #34 received at 20173-done <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Richard Wordingham <richard.wordingham <at> ntlworld.com>,
 20173-done <at> debbugs.gnu.org
Subject: Re: bug#20173: 24.4;
 Rendering misallocates combining marks on ligatures
Date: Mon, 17 Aug 2020 22:45:26 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Fri, 27 Mar 2015 09:04:44 +0000
>> From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
>> Cc: 20173 <at> debbugs.gnu.org
>>
>> On Tue, 24 Mar 2015 19:03:38 +0200
>> Eli Zaretskii <eliz <at> gnu.org> wrote:
>>
>> > > Date: Tue, 24 Mar 2015 08:28:28 +0000
>> > > From: Richard Wordingham <richard.wordingham <at> ntlworld.com>
>> > > Cc: 20173 <at> debbugs.gnu.org
>>
>> > > You might want to first check whether composed Arabic is
>> > > usable. Doesn't making each word a grapheme cluster make editing
>> > > unpleasant?
>>
>> > I don't know; I don't speak or write any of the languages that use the
>> > Arabic script.  I expect the users that do to come up and ask for
>> > features they miss.  We already allow deletion of single codepoints,
>> > even when they are composed; we might as well provide similar features
>> > for movement or whatever.
>>
>> I forgot that grapheme clustering is done in m17n, not Emacs itself.
>> The m17n code (in ARAB-OTF.flt) is reasonable - it clusters letters
>> with combining marks.  It *seems* I have a problem with tpu-forward-char
>> and tpu-backward-char; it's as though there's an initialisation fault
>> which stops them stepping through the Arabic compositions at first.  It
>> may be an issue with the presumably underlying forward-char and
>> backward-char; I haven't investigated further.  I'll have to record
>> the exact actions provoking the problem before I formally record a bug.
>
> Please try in "emacs -Q" without activating the TPU emulation.

More information was requested, but none was given within 5 years, so
I'm closing this bug.  If this is still an issue, please reply to this
email (use "Reply to all" in your email client) and we can reopen the
bug report.

Best regards,
Stefan Kangas




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 15 Sep 2020 11:24:07 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.