GNU bug report logs -
#11860
24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Reported by: Steffan <smias <at> yandex.ru>
Date: Wed, 4 Jul 2012 18:43:12 UTC
Severity: normal
Found in version 24.1
Done: Stefan Kangas <stefan <at> marxist.se>
Bug is archived. No further changes may be made.
Kenichi Handa <handa <at> gnu.org> writes:
> In article <83txw0aczg.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> > From: Kenichi Handa <handa <at> gnu.org>
>> > Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
>> > Date: Sat, 18 Aug 2012 11:45:27 +0900
>> >
>> > So, apparently Emacs on Windows and GNU/Linux use
>> > different glyph metrics.
Right, but after adding the offsets to the corresponding metrics, we get
the same result in both the Windows and GNU/Linux cases, except for the
total height of the font. I think that difference arises because Windows
counts inter-line spacing in the height, while on GNU/Linux that is
separate.
So I'm not sure that this is causing us problems (see Eli's report about
Hebrew); it's just a case of a different reference point being used
on Windows and on GNU/Linux.
> For Hebrew too, on Windows, I see the same problem as what
> Steffan <smias <at> yandex.ru> reported:
If you are seeing something different from what Eli sees for Hebrew with
the same font, then I suspect the cause is the version of Uniscribe
that is installed. Maybe diacritic handling for Hebrew and Arabic is a
more recent addition to Uniscribe than the basic support for those
languages.
>> > For instance, in the above case, we may have to render glyphs in
>> > this order (diacritical mark first):
>> >
>> > [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
>> > [0 1 1593 969 8 1 8 12 4 nil]
I'm curious as to how we ended up with the same C entry in those
vectors. Could this be causing us problems later on? The glyph index
is correct (comparing to the GNU/Linux version), but I wonder if
Uniscribe is referring back to the character at some point and tripping
up because it has been changed.
> I've just read the function uniscribe_shape in
> w32uniscribe.c. It seems that these are the key API for
> uniscribe:
>
> * ScriptItemize -- no idea what this is
This should be a no-op in Emacs, as we have already split the string into
LGSTRING components. But if it is not called, subsequent Uniscribe
operations fail, so it must also be initializing some internal
structures.
> * ScriptShape -- perhaps for glyph substitution (GSUB features of opentype)
> * ScriptPlace -- perhaps for glyph positioning (GPOS features of opentype)
Yes, I think that is correct.
> So first, please check the documentation of ScriptShape
> and figure out how it works for bidi scripts; i.e. what order
> does it expect for input, and what order does it produce.
>
> Next please find the meaning of this code fragment:
>
>   /* Detect clusters, for linking codes back to
>      characters.  */
>   if (attributes[j].fClusterStart)
>     {
>       while (from < nchars_in_run && clusters[from] < j)
>         from++;
>       if (from >= nchars_in_run)
>         from = to = nchars_in_run - 1;
>       else
>         {
>           int k;
>           to = nchars_in_run - 1;
>           for (k = from + 1; k < nchars_in_run; k++)
>             {
>               if (clusters[k] > j)
>                 {
>                   to = k - 1;
>                   break;
>                 }
>             }
>         }
>     }
>
> The comment refers to "clusters". I don't know exactly what
> that means in uniscribe, but I guess it relates to
> grapheme clusters, and if so, this part seems to relate to
> the ordering of glyphs in this kind of grapheme cluster:
>
> [0 1 1593 969 8 1 8 12 4 nil]
> [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
That seems to be correct. Maybe this is the code that is changing the
character code to 1593. I seem to recall that something like this was
required for Indic languages to let Emacs know which characters had been
linked back into one glyph.
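
As a rough illustration of what that loop computes, here is a standalone
sketch that can be compiled outside Emacs. The function name, the output
arrays and the sample data in the note below are hypothetical; only the
inner logic is taken from the quoted fragment. It assumes that clusters[i]
holds the index of the first glyph of the cluster containing character i,
and that cluster_start[j] stands in for attributes[j].fClusterStart.

```c
#include <assert.h>

/* Hypothetical standalone version of the quoted loop.  For each glyph
   j, FROM_OUT[j] and TO_OUT[j] receive the character range that the
   glyph's cluster covers.  Glyphs that do not start a cluster inherit
   the range of the preceding cluster start.  */
static void
link_glyphs_to_chars (const unsigned short *clusters, int nchars_in_run,
                      const int *cluster_start, int nglyphs,
                      int *from_out, int *to_out)
{
  int from = 0, to = 0, j;

  assert (nchars_in_run > 0);
  for (j = 0; j < nglyphs; j++)
    {
      if (cluster_start[j])
        {
          /* Skip characters belonging to earlier clusters.  */
          while (from < nchars_in_run && clusters[from] < j)
            from++;
          if (from >= nchars_in_run)
            from = to = nchars_in_run - 1;
          else
            {
              int k;
              /* The range ends just before the first character that
                 maps to a later glyph.  */
              to = nchars_in_run - 1;
              for (k = from + 1; k < nchars_in_run; k++)
                if (clusters[k] > j)
                  {
                    to = k - 1;
                    break;
                  }
            }
        }
      from_out[j] = from;
      to_out[j] = to;
    }
}
```

With made-up data for three characters (base, combining mark, base) in
logical order, clusters = {0, 0, 2} and cluster_start = {1, 0, 1}, glyphs 0
and 1 both get the range 0..1 and glyph 2 gets 2..2. That is the same shape
as the two glyphs in the report, which both carry FROM-IDX 0 and TO-IDX 1.
The sketch says nothing about how Uniscribe fills the cluster array for
right-to-left runs.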
This bug report was last modified 4 years and 275 days ago.