GNU bug report logs - #11860
24.1; Arabic - Harakat (diacritics, short vowels) don't appear

Previous Next

Package: emacs;

Reported by: Steffan <smias <at> yandex.ru>

Date: Wed, 4 Jul 2012 18:43:12 UTC

Severity: normal

Found in version 24.1

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Kenichi Handa <handa <at> gnu.org>
To: Kenichi Handa <handa.kenichi <at> aist.go.jp>
Cc: eliz <at> gnu.org, 11860 <at> debbugs.gnu.org, smias <at> yandex.ru
Subject: bug#11860: 24.1; Arabic - Harakat (diacritics, short vowels) don't appear
Date: Sat, 18 Aug 2012 11:45:27 +0900
In article <tl7628nslq9.fsf <at> m17n.org>, Kenichi Handa <handa.kenichi <at> aist.go.jp> writes:

> I'm very sorry for the late response.  I was just back from
> Europe.  I'll start investigating this problem soon.

I first confirmed that the described problems of Arabic and
Hebrew occur with Emacs running on Windows.  Typing C-u C-x
= on the first Arabic character (U+0639) showed that "Courier
New" font is used for it, and showed this composition
information.

Composed with the following character(s) "ْ" using this font:
  uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1
by these glyphs:
  [0 1 1593 969 8 1 8 12 4 nil]
  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]

Next, I used the same "Courier New" font on GNU/Linux, and
specified it for Arabic as this with Emacs running on
GNU/Linux:
  (set-fontset-font t 'arabic '("courier new"  . "unicode-bmp"))
With this setting, Emacs correctly displayed Arabic, and
typing C-u C-x = on U+0639 showed this composition
information.

Composed with the following character(s) "ْ" using this font:
  xft:-monotype-Courier New-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1
by these glyphs:
  [0 1 1593 969 8 2 8 4 4 nil]
  [0 1 1618 760 0 -6 -3 8 -11 [-9 2 0]]

Each vector is a GLYPH described in the docstring of
composition-get-gstring as this:
----------------------------------------------------------------------
GLYPH is a vector whose elements have this form:
    [ FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT
      [ [X-OFF Y-OFF WADJUST] | nil] ]
where
    FROM-IDX and TO-IDX are used internally and should not be touched.
    C is the character of the glyph.
    CODE is the glyph-code of C in FONT-OBJECT.
    WIDTH thru DESCENT are the metrics (in pixels) of the glyph.
    X-OFF and Y-OFF are offsets to the base position for the glyph.
    WADJUST is the adjustment to the normal width of the glyph.
----------------------------------------------------------------------

So, apparently Emacs on Windows and GNU/Linux uses the
different metrics of glyphs.  As the shaper on GNU/Linux
(m17n-lib library) works correctly for the same font, and
the other applications on Windows have no problem, I suspect
that the problem is in Emacs' interface with uniscribe
(w32font.c or w32uniscribe.c).

If this problem happens only for bidi scripts, one
possibility is that Emacs's rendering engine (xdisp.c)
expects glyphs in a glyph-string are rendered in that order
from left to right, but the returned glyph-string on Windows
should be rendered in reverse order.  For instance, in the
above case, we may have to render glyphs in this order
(diacritical mark first):

  [0 1 1593 760 0 3 6 12 4 [1 -2 0]]
  [0 1 1593 969 8 1 8 12 4 nil]

I think the further debugging must be done by those who
knows uniscribe, w32font.c, and w32uniscribe.c.

---
Kenichi Handa
handa <at> gnu.org




This bug report was last modified 4 years and 274 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.