#11860 - 24.1; Arabic - Harakat (diacritics, short vowels) don't appear

GNU bug report logs - #11860
24.1; Arabic - Harakat (diacritics, short vowels) don't appear

Package: emacs;

Reported by: Steffan <smias <at> yandex.ru>

Date: Wed, 4 Jul 2012 18:43:12 UTC

Severity: normal

Found in version 24.1

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.


From: Jason Rumney <jasonr <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sun, 19 Aug 2012 11:02:52 +0800

Kenichi Handa <handa <at> gnu.org> writes:

> In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> > From: Kenichi Handa <handa <at> gnu.org>
>> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
>> > Date: Sat, 18 Aug 2012 11:45:27 +0900
>> >
>> > So, apparently Emacs on Windows and GNU/Linux uses the
>> > different metrics of glyphs.

Right, but if we add the offsets to the corresponding metrics, we get the same result in both the Windows and GNU/Linux cases. The exception is the total height of the font, and I think that differs because Windows includes inter-line spacing while GNU/Linux keeps it separate. So I'm not sure this is what is causing our problems (see Eli's report about Hebrew). Windows and GNU/Linux simply use a different reference point.

> For Hebrew too, on Windows, I see the same problem as what
> Steffan <smias <at> yandex.ru> reported:

If you are seeing something different from what Eli sees for Hebrew with the same font, then I suspect the cause is the version of Uniscribe that is installed. Maybe diacritic handling for Hebrew and Arabic was added to Uniscribe later than basic support for those languages.

>> > For instance, in the above case, we may have to render glyphs in
>> > this order (diacritical mark first):
>> >
>> > [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> > [0 1 1593 969 8 1 8 12 4 nil]

I'm curious how we ended up with the same C entry in both of those vectors. Could this be causing problems later on? The glyph index is correct (compared with the GNU/Linux version), but I wonder whether Uniscribe refers back to the character at some point and trips up because it has been changed.

> I've just read the function uniscribe_shape in
> w32uniscribe.c. It seems that these are the key API for
> uniscribe:
>
> * ScriptItemize -- no idea what is this

This should be a no-op in Emacs, because we have already split the string into LGSTRING components.
But if it is not called, subsequent Uniscribe operations fail, so it must also be initializing some internal structures.

> * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
> * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)

Yes, I think that is correct.

> So at first please check the documentation of ScriptShape
> and figure out how it works for bidi script; i.e. what order
> does it expect for input, and what order does it produce.
>
> Next please find the meaning of this code fragment:
>
>       /* Detect clusters, for linking codes back to
>          characters.  */
>       if (attributes[j].fClusterStart)
>         {
>           while (from < nchars_in_run && clusters[from] < j)
>             from++;
>           if (from >= nchars_in_run)
>             from = to = nchars_in_run - 1;
>           else
>             {
>               int k;
>               to = nchars_in_run - 1;
>               for (k = from + 1; k < nchars_in_run; k++)
>                 {
>                   if (clusters[k] > j)
>                     {
>                       to = k - 1;
>                       break;
>                     }
>                 }
>             }
>         }
>
> The comment refers to "clusters". I don't know what it
> exactly means in uniscribe, but I guess it relates to
> grapheme cluster, and if so, this part seems to relate to
> the ordering of glyphs in this kind of grapheme cluster:
>
> [0 1 1593 969 8 1 8 12 4 nil]
> [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

That seems to be correct. Maybe this is the code that is changing the character code to 1593. I seem to recall that something like this was required for Indic languages, to let Emacs know which characters had been linked back into one glyph.


