GNU bug report logs - #11860
24.1; Arabic - Harakat (diacritics, short vowels) don't appear


Package: emacs

Reported by: Steffan <smias@yandex.ru>

Date: Wed, 4 Jul 2012 18:43:12 UTC

Severity: normal

Found in version 24.1

Done: Stefan Kangas <stefan@marxist.se>

Message #62 received at 11860@debbugs.gnu.org:

From: Eli Zaretskii <eliz@gnu.org>
To: Kenichi Handa <handa@gnu.org>, Jason Rumney <jasonr@gnu.org>
Cc: 11860@debbugs.gnu.org, smias@yandex.ru
Subject: Re: bug#11860: 24.1;
	Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 21:22:40 +0300

> From: Kenichi Handa <handa@gnu.org>
> Cc: eliz@gnu.org, 11860@debbugs.gnu.org, smias@yandex.ru
> Date: Sat, 18 Aug 2012 11:45:27 +0900
> 
> So apparently, Emacs on Windows and on GNU/Linux uses
> different glyph metrics.  As the shaper on GNU/Linux (the
> m17n-lib library) handles the same font correctly, and other
> applications on Windows have no problem either, I suspect
> that the problem is in Emacs's interface with Uniscribe
> (w32font.c or w32uniscribe.c).

I agree.

> If this problem happens only for bidi scripts

Can you suggest how to test this hypothesis?

> one possibility is that Emacs's rendering engine (xdisp.c) expects
> the glyphs in a glyph-string to be rendered in that order, from
> left to right, but the glyph-string returned on Windows should be
> rendered in reverse order.

You may be right, but it's hard to be sure.  At least the advances[]
array returned by ScriptPlace seems to point in that direction.
Here's what I see in the debugger:

  Breakpoint 8, uniscribe_shape (lgstring=55041941) at w32uniscribe.c:373
  373                       LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
  (gdb) p items@nitems
  $1 = {0x35195a0}
  (gdb) p items[0]@nitems
  $2 = {{
      iCharPos = 0,
      a = {
	eScript = 26,
	fRTL = 1,
	fLayoutRTL = 1,
	fLinkBefore = 0,
	fLinkAfter = 0,
	fLogicalOrder = 1,
	fNoGlyphIndex = 0,
	s = {
	  uBidiLevel = 1,
	  fOverrideDirection = 0,
	  fInhibitSymSwap = 0,
	  fCharShape = 0,
	  fDigitSubstitute = 0,
	  fInhibitLigate = 0,
	  fDisplayZWG = 0,
	  fArabicNumContext = 0,
	  fGcpClusters = 0,
	  fReserved = 0,
	  fEngineReserved = 0
	}
      }
    }}
  (gdb) p nitems
  $3 = 1
  (gdb) p nglyphs
  $4 = 2
  (gdb) p advances[0]@nglyphs
  $5 = {8, 0}
  (gdb) p offsets[0]@nglyphs
  $6 = {{
      du = 0,
      dv = 0
    }, {
      du = 1,
      dv = -2
    }}
  (gdb) p chars[0]@2
  $7 = L"\x639\x652"

(Note that the fRTL member of items[0].a is set to TRUE.)  My
understanding of the advances[] array is that it gives, for each glyph
in the cluster, the number of pixels to advance to the right after
drawing that glyph.  So the fact that it is 8 for the first (base)
character and zero for the second one tells me that this grapheme
cluster is supposed to be rendered in reverse order: first the Sukun
(U+0652), then the Ayin (U+0639) at the same location, and then an
advance of 8 pixels to the next character.  Is this correct?
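
To make sure we are talking about the same model, here is a minimal
sketch of the rendering loop I have in mind (draw_glyph and
baseline_y are placeholders for illustration, not Emacs code):

  /* Hypothetical pen-advance model: draw each glyph at the current
     pen position plus its GOFFSET, then move the pen to the right
     by its advance.  */
  int pen_x = 0;
  for (int i = 0; i < nglyphs; i++)
    {
      draw_glyph (glyphs[i],
                  pen_x + offsets[i].du,
                  baseline_y + offsets[i].dv);
      pen_x += advances[i];
    }

Iterating forward with advances = {8, 0} would put the Sukun 8 pixels
past the Ayin; iterating the arrays in reverse puts both glyphs at
the same pen position and only then advances by 8, which is what the
numbers above suggest.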

If that is correct, then why do the glyphs produced on GNU/Linux also
have a non-zero xadvance:

  [0 1 1593 969 8 2 8 4 4 nil]
  [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]
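
(For reference, my reading of the lglyph layout here, going by the
LGLYPH_IX_* indices in font.h, is [FROM TO CHAR CODE WIDTH LBEARING
RBEARING ASCENT DESCENT ADJUSTMENT]; so 1593/#x639 is the Ayin and
1618/#x652 the Sukun, 969 and 760 are the glyph codes, and 8 and 0
are the widths in question.)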

The value 8 after 969 comes directly from xadvance, as this code in
ftfont.c shows:

      LGLYPH_SET_WIDTH (lglyph, g->xadv >> 6);
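
(The >> 6 there is the conversion from FreeType's 26.6 fixed-point
units to whole pixels.)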

Is the meaning of xadvance in libotf different from its meaning in
Uniscribe?  (And why is the glyph-string element called WIDTH instead
of ADVANCE?)  If not, what am I missing?

> For instance, in the above case, we may have to render glyphs in
> this order (diacritical mark first):
> 
>   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>   [0 1 1593 969 8 1 8 12 4 nil]

I tried the naive patch below, but it didn't quite work.  It seems
that those changes somehow prevented character composition.  Perhaps
Handa-san could give me some guidance here.

> I think further debugging must be done by those who
> know Uniscribe, w32font.c, and w32uniscribe.c.

It's very hard, given that the glyph-string documentation leaves a
lot to be desired, and that the way its various components are used
during drawing is likewise not clearly documented.  E.g., this:

    FROM-IDX and TO-IDX are used internally and should not be touched.

is not really helpful for explaining what FROM-IDX and TO-IDX are, so
how can I figure out whether the code you asked about is doing TRT?
And without knowing what each component of a glyph-string is used for
during drawing, how can I compare the values produced by the
Uniscribe APIs with what the glyph-string needs?  If someone could
explain all these things, it would make debugging possible.
Otherwise, I'm just randomly poking around...

Here's the patch I tried:

--- src/w32uniscribe.c~	2012-07-08 07:24:56.000000000 +0300
+++ src/w32uniscribe.c	2012-08-19 15:55:17.323623900 +0300
@@ -331,17 +331,13 @@ uniscribe_shape (Lisp_Object lgstring)
 		  Lisp_Object lglyph = LGSTRING_GLYPH (lgstring, lglyph_index);
 		  ABC char_metric;
 		  unsigned gl;
+		  int j1;
 
 		  if (NILP (lglyph))
 		    {
 		      lglyph = Fmake_vector (make_number (LGLYPH_SIZE), Qnil);
 		      LGSTRING_SET_GLYPH (lgstring, lglyph_index, lglyph);
 		    }
-		  /* Copy to a 32-bit data type to shut up the
-		     compiler warning in LGLYPH_SET_CODE about
-		     comparison being always false.  */
-		  gl = glyphs[j];
-		  LGLYPH_SET_CODE (lglyph, gl);
 
 		  /* Detect clusters, for linking codes back to
 		     characters.  */
@@ -365,6 +361,16 @@ uniscribe_shape (Lisp_Object lgstring)
 			    }
 			}
 		    }
+		  if (items[i].a.fRTL)
+		    j1 = to - (j - from);
+		  else
+		    j1 = j;
+
+		  /* Copy to a 32-bit data type to shut up the
+		     compiler warning in LGLYPH_SET_CODE about
+		     comparison being always false.  */
+		  gl = glyphs[j1];
+		  LGLYPH_SET_CODE (lglyph, gl);
 
 		  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos
 						 + from]);
@@ -372,13 +378,13 @@ uniscribe_shape (Lisp_Object lgstring)
 		  LGLYPH_SET_TO (lglyph, items[i].iCharPos + to);
 
 		  /* Metrics.  */
-		  LGLYPH_SET_WIDTH (lglyph, advances[j]);
+		  LGLYPH_SET_WIDTH (lglyph, advances[j1]);
 		  LGLYPH_SET_ASCENT (lglyph, font->ascent);
 		  LGLYPH_SET_DESCENT (lglyph, font->descent);
 
 		  result = ScriptGetGlyphABCWidth (context,
 						   &(uniscribe_font->cache),
-						   glyphs[j], &char_metric);
+						   glyphs[j1], &char_metric);
 		  if (result == E_PENDING && !context)
 		    {
 		      /* Cache incomplete... */
@@ -387,7 +393,7 @@ uniscribe_shape (Lisp_Object lgstring)
 		      old_font = SelectObject (context, FONT_HANDLE (font));
 		      result = ScriptGetGlyphABCWidth (context,
 						       &(uniscribe_font->cache),
-						       glyphs[j], &char_metric);
+						       glyphs[j1], &char_metric);
 		    }
 
 		  if (SUCCEEDED (result))
@@ -399,17 +405,17 @@ uniscribe_shape (Lisp_Object lgstring)
 		  else
 		    {
 		      LGLYPH_SET_LBEARING (lglyph, 0);
-		      LGLYPH_SET_RBEARING (lglyph, advances[j]);
+		      LGLYPH_SET_RBEARING (lglyph, advances[j1]);
 		    }
 
-		  if (offsets[j].du || offsets[j].dv)
+		  if (offsets[j1].du || offsets[j1].dv)
 		    {
 		      Lisp_Object vec;
 		      vec = Fmake_vector (make_number (3), Qnil);
-		      ASET (vec, 0, make_number (offsets[j].du));
-		      ASET (vec, 1, make_number (offsets[j].dv));
+		      ASET (vec, 0, make_number (offsets[j1].du));
+		      ASET (vec, 1, make_number (offsets[j1].dv));
 		      /* Based on what ftfont.c does... */
-		      ASET (vec, 2, make_number (advances[j]));
+		      ASET (vec, 2, make_number (advances[j1]));
 		      LGLYPH_SET_ADJUSTMENT (lglyph, vec);
 		    }
 		  else




