Package: emacs
Reported by: Steffan <smias <at> yandex.ru>
Date: Wed, 4 Jul 2012 18:43:12 UTC
Severity: normal
Found in version 24.1
Done: Stefan Kangas <stefan <at> marxist.se>
Bug is archived. No further changes may be made.
From: Eli Zaretskii <eliz <at> gnu.org>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: handa <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 20:56:57 +0300
> From: Jason Rumney <jasonr <at> gnu.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> Date: Sun, 19 Aug 2012 11:02:52 +0800
>
> Kenichi Handa <handa <at> gnu.org> writes:
>
> > In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> >> > From: Kenichi Handa <handa <at> gnu.org>
> >> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
> >> > Date: Sat, 18 Aug 2012 11:45:27 +0900
> >> >
> >> > So, apparently Emacs on Windows and GNU/Linux uses the
> >> > different metrics of glyphs.
>
> Right, but adding the offsets to the corresponding metrics, we get the
> same result with both the Windows and GNU/Linux cases

I think the results of addition are not relevant to the problem.  The
problem is that the diacriticals and/or vowels are not drawn at correct
horizontal positions.  The values of the offsets are directly relevant
to that, because they describe how many pixels to advance after drawing
each glyph.  By contrast, the sum of the offsets will always be
approximately the same, since the entire grapheme cluster occupies a
single character cell.

> So I'm not sure that this is causing us problems (see Eli's report about
> Hebrew), it's just a case of a different reference point being used
> between Windows and GNU/Linux.

My report about Hebrew is not relevant either; see below.

> If you are seeing something different than Eli for Hebrew with the same
> font, then I suspect the cause is linked with the version of Uniscribe
> that is installed.  Maybe diacritic handling for Hebrew and Arabic is a
> more recent addition to Uniscribe than the basic support for those
> languages.

That appears to be the case, indeed.  My initial attempts to reproduce
this were on XP SP3, where Hebrew rendering appeared to be OK.  I now
tried on Windows 7, and there I see the problem with Hebrew as well.

Moreover, when I type the Hebrew characters specified by the OP, I
don't see that the uniscribe_shape function is called at all on XP: a
breakpoint inside it never breaks.  On Windows 7, that function does
get called.  Jason, how can I find out whether Uniscribe is used for
rendering Hebrew, or why Emacs doesn't call uniscribe_shape?  (I know
about uniscribe_font->cache, but I don't see that function called even
if I start Emacs with a breakpoint in it, so it seems the cache is not
the issue here.  The cache is per application, right?)

For Arabic characters in the recipe, uniscribe_shape _is_ called on XP.
I guess that's why the problem with Arabic is visible on both XP and
Windows 7.
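To spell out the point about the offsets, and to help decode the
"C-u C-x =" data below: each glyph entry is a vector
[FROM TO CHAR CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT],
where ADJUSTMENT is either nil or [XOFF YOFF WADJUST].  As I understand
the display code (what follows is only a standalone sketch of my
reading, not the actual Emacs code), a glyph with a nil adjustment is
drawn at the current pen position and the pen then advances by WIDTH,
while a glyph with an adjustment is drawn at pen + XOFF and the pen
advances by WADJUST instead.  That is why two shapings with the same
total advance can still put the diacritic at different horizontal
positions:

  /* Standalone sketch: where each glyph of a grapheme cluster is
     drawn, given its WIDTH and an optional [XOFF YOFF WADJUST]
     adjustment.  The sample data is modeled on the XP Hebrew output
     below: the base letter advances 8 pixels, and the vowel point is
     shifted back by 8 with a zero width adjustment, so it lands on
     top of the base and the whole cluster still occupies 8 pixels.  */
  #include <stdio.h>
  #include <stdbool.h>

  struct glyph_entry
  {
    int width;                /* WIDTH field of the glyph vector */
    bool has_adjustment;      /* is the ADJUSTMENT field non-nil?  */
    int xoff, yoff, wadjust;  /* the ADJUSTMENT sub-vector */
  };

  static void
  show_positions (const struct glyph_entry *g, int nglyphs)
  {
    int pen = 0, i;

    for (i = 0; i < nglyphs; i++)
      {
        int draw_x = g[i].has_adjustment ? pen + g[i].xoff : pen;

        printf ("glyph %d drawn at x=%d\n", i, draw_x);
        pen += g[i].has_adjustment ? g[i].wadjust : g[i].width;
      }
    printf ("total advance: %d\n", pen);
  }

  int
  main (void)
  {
    struct glyph_entry xp_hebrew[] = {
      { 8, false, 0, 0, 0 },   /* [0 1 1490 674 8 0 6 12 4 nil] */
      { 8, true, -8, 0, 0 }    /* [0 1 1467 663 8 0 7 12 4 [-8 0 0]] */
    };

    show_positions (xp_hebrew, 2);
    return 0;
  }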
For the record, here's the output of "C-u C-x =" on XP for the Hebrew
character composition mentioned earlier:

             position: 193 of 194 (99%), column: 1
            character: ג (displayed as ג) (codepoint 1490, #o2722, #x5d2)
    preferred charset: iso-8859-8 (ISO/IEC 8859/8)
code point in charset: 0xE2
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong)
             to input: type "d" with hebrew-full
          buffer code: #xD7 #x92
            file code: #xE2 (encoded by coding system hebrew-iso-8bit-dos)
              display: composed to form "גֻ" (see below)

Composed with the following character(s) "ֻ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-8
by these glyphs:
  [0 1 1490 674 8 0 6 12 4 nil]
  [0 1 1467 663 8 0 7 12 4 [-8 0 0]]

Compare with the output on Windows 7 to see the differences:

             position: 193 of 194 (99%), column: 1
            character: ג (displayed as ג) (codepoint 1490, #o2722, #x5d2)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x05D2
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong)
             to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
          buffer code: #xD7 #x92
            file code: not encodable by coding system iso-latin-1-dos
              display: composed to form "גֻ" (see below)

Composed with the following character(s) "ֻ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1490 674 8 1 6 12 4 nil]
  [0 1 1490 663 0 2 6 12 4 nil]

And here's the output of "C-u C-x =" for the Arabic character Ayin
with sukun on XP:

             position: 197 of 198 (99%), column: 0
            character: ع (displayed as ع) (codepoint 1593, #o3071, #x639)
    preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x0639
               syntax: w 	which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
          buffer code: #xD8 #xB9
            file code: not encodable by coding system hebrew-iso-8bit-dos
              display: composed to form "عْ" (see below)

Composed with the following character(s) "ْ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1593 969 8 2 8 12 4 nil]
  [0 1 1593 1028 0 3 6 12 4 nil]

Note that the glyph index of the sukun is different from the Windows 7
output.  I have no idea why.

> >> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
> >> >   [0 1 1593 969 8 1 8 12 4 nil]
>
> I'm curious as to how we ended up with the same C entry in those
> vectors.

That's because the code in uniscribe_shape does this:

	  LGLYPH_SET_CHAR (lglyph, chars[items[i].iCharPos + from]);

and it does that for all the 'nglyphs' glyphs produced by ScriptPlace.
As Handa-san writes, the character code is never used, because we have
the font glyph index and its metrics, so I think this is a non-issue.

> Could this be causing us problems later on?  The glyph index
> is correct (comparing to the GNU/Linux version), but I wonder if
> Uniscribe is referring back to the character at some point and tripping
> up because it has been changed.

Uniscribe cannot refer to this code, because Uniscribe doesn't use
LGSTRING, IIUC.  Or does it?  (If it does, please show where in the
code it uses that value.)
> >       /* Detect clusters, for linking codes back to
> >          characters.  */
> >       if (attributes[j].fClusterStart)
> >         {
> >           while (from < nchars_in_run && clusters[from] < j)
> >             from++;
> >           if (from >= nchars_in_run)
> >             from = to = nchars_in_run - 1;
> >           else
> >             {
> >               int k;
> >               to = nchars_in_run - 1;
> >               for (k = from + 1; k < nchars_in_run; k++)
> >                 {
> >                   if (clusters[k] > j)
> >                     {
> >                       to = k - 1;
> >                       break;
> >                     }
> >                 }
> >             }
> >         }
> >
> > The comment refers to "clusters".  I don't know what it
> > exactly means in uniscribe, but I guess it relates to
> > grapheme cluster, and if so, this part seems to relate to
> > the ordering of glyphs in this kind of grapheme cluster:
> >
> >   [0 1 1593 969 8 1 8 12 4 nil]
> >   [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>
> That seems to be correct.  Maybe this is the code that is changing the
> character code to 1593.

It doesn't _change_ the character code, it simply sets it to the code
of the base character.  But again, I don't think this is relevant.
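For reference, here is how I read that cluster loop (a standalone
sketch under my assumptions, not the actual uniscribe_shape code): the
clusters[] array filled by ScriptShape gives, for each character of
the run, the index of the first glyph of its cluster, and the loop
widens [from, to] until it covers every character whose cluster starts
at or before glyph j.  For a two-character cluster such as Ayin plus
sukun shaped into two glyphs, both characters map to the cluster that
starts at glyph 0, so the range becomes 0..1, and LGLYPH_SET_CHAR then
stores the base character for every glyph of the cluster.

  /* Standalone sketch of the cluster-to-character mapping above.
     CLUSTERS[i] is the index of the first glyph of the cluster that
     character I belongs to (as returned by ScriptShape); the function
     computes the character range [FROM, TO] for a glyph J that starts
     a cluster.  In the real code this is recomputed only when
     attributes[j].fClusterStart is set.  */
  #include <stdio.h>

  static void
  glyph_to_char_range (const int *clusters, int nchars_in_run,
                       int j, int *from, int *to)
  {
    while (*from < nchars_in_run && clusters[*from] < j)
      (*from)++;
    if (*from >= nchars_in_run)
      *from = *to = nchars_in_run - 1;
    else
      {
        int k;

        *to = nchars_in_run - 1;
        for (k = *from + 1; k < nchars_in_run; k++)
          if (clusters[k] > j)
            {
              *to = k - 1;
              break;
            }
      }
  }

  int
  main (void)
  {
    /* Hypothetical run of 2 characters (base letter + diacritic)
       whose glyphs both belong to the cluster starting at glyph 0.  */
    int clusters[] = { 0, 0 };
    int from = 0, to = 0;

    glyph_to_char_range (clusters, 2, 0, &from, &to);
    printf ("glyph 0 -> characters %d..%d\n", from, to);
    return 0;
  }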