GNU bug report logs - #33729
27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)

Package: emacs;

Reported by: Kaushal Modi <kaushal.modi <at> gmail.com>

Date: Thu, 13 Dec 2018 20:22:02 UTC

Severity: normal

Found in version 27.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: dr.khaled.hosny <at> gmail.com
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
Subject: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 15:43:50 -0500
On Thu, Dec 13, 2018 at 3:31 PM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
wrote:

>
> The HarfBuzz rendering of Arabic is the correct one in this screenshot.
>

Thanks. So here's the status so far:

Rendering of Namaste as seen in C-h h (M-x view-hello-file):

|          | harfbuzz | m17n    |
|----------+----------+---------|
| Hindi    | correct  | correct |
| Gujarati | wrong    | correct |
| Arabic   | correct  | wrong   |



> For debugging such rendering differences, the actual font used by
> Emacs for a given part of the text needs to be known,


I am using the Mukta Vaani font for Gujarati. It is a free font and can be
downloaded from https://ektype.in/mukta-vaani.html.

The string being rendered is "નમસ્તે".
By placing the cursor on each of those characters and typing C-u C-x = (on the
m17n build), I get:

(1) ન

             position: 1610 of 3509 (46%), column: 32
            character: ન (displayed as ન) (codepoint 2728, #o5250, #xaa8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3968
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa8" or "C-x 8 RET GUJARATI LETTER
NA"
          buffer code: #xE0 #xAA #xA8
            file code: #xE0 #xAA #xA8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x234)

Character code properties: customize what to show
  name: GUJARATI LETTER NA
  general-category: Lo (Letter, Other)
  decomposition: (2728) ('ન')

There are text properties here:
  charset              mule-unicode-0100-24ff

(2) મ

             position: 1611 of 3509 (46%), column: 33
            character: મ (displayed as મ) (codepoint 2734, #o5256, #xaae)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x396E
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aae" or "C-x 8 RET GUJARATI LETTER
MA"
          buffer code: #xE0 #xAA #xAE
            file code: #xE0 #xAA #xAE (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x239)

Character code properties: customize what to show
  name: GUJARATI LETTER MA
  general-category: Lo (Letter, Other)
  decomposition: (2734) ('મ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3) સ્તે

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER
SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: composed to form "સ્તે" (see below)

Composed with the following character(s) "્તે" using this font:
  xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
by these glyphs:
  [0 3 0 645 8 0 11 11 0 [0 0 8]]
  [0 3 2724 560 11 1 11 11 1 nil]
  [0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff
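
For readers unfamiliar with the glyph vectors printed under "by these glyphs": as far as I understand the Emacs Lisp documentation for `composition-get-gstring`, each vector has the form [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT], where ADJUSTMENT is either nil or [X-OFF Y-OFF WADJUST]. The following is only a minimal sketch that labels those fields; the three vectors are transcribed by hand from the output above, and the names `Glyph`, `GLYPHS` and `describe` are illustrative, not part of Emacs.

```python
from typing import NamedTuple, Optional, Tuple


class Glyph(NamedTuple):
    """One glyph vector from the "by these glyphs" listing (field names assumed)."""
    from_idx: int   # index of the first character covered by the glyph
    to_idx: int     # index of the last character covered by the glyph
    char: int       # character code associated with the glyph (0 if none)
    code: int       # glyph code in the font
    width: int
    lbearing: int
    rbearing: int
    ascent: int
    descent: int
    adjustment: Optional[Tuple[int, int, int]]  # (x-off, y-off, wadjust) or None


# Transcribed from the m17n output for entry (3).
GLYPHS = [
    Glyph(0, 3, 0, 645, 8, 0, 11, 11, 0, (0, 0, 8)),
    Glyph(0, 3, 2724, 560, 11, 1, 11, 11, 1, None),
    Glyph(0, 3, 2759, 589, 0, -9, -2, 16, -11, (-1, 0, 0)),
]


def describe(glyph: Glyph) -> str:
    """Format the fields that matter most when comparing two builds."""
    char = f"U+{glyph.char:04X}" if glyph.char else "(none)"
    return (f"chars {glyph.from_idx}..{glyph.to_idx}  char {char}  "
            f"glyph #x{glyph.code:x}  width {glyph.width}")


for g in GLYPHS:
    print(describe(g))
```

Read this way, all three glyphs cover character indices 0 through 3, i.e. the four characters starting at સ are shaped as one cluster. The second glyph's code, 560, is #x230 in hexadecimal.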


=====


On the HarfBuzz build, the "સ્તે" part is different. I can place the cursor
separately on સ્ and તે, and C-u C-x = on each gives:

(3.1) સ્
             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER
SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x241)

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3.2) તે

             position: 1614 of 3509 (46%), column: 35
            character: ત (displayed as ત) (codepoint 2724, #o5244, #xaa4)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3964
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa4" or "C-x 8 RET GUJARATI LETTER
TA"
          buffer code: #xE0 #xAA #xA4
            file code: #xE0 #xAA #xA4 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x230)

Character code properties: customize what to show
  name: GUJARATI LETTER TA
  general-category: Lo (Letter, Other)
  decomposition: (2724) ('ત')

There are text properties here:
  charset              mule-unicode-0100-24ff
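
To make the comparison easier to follow, here is a small sketch that lists the code points of the test string together with their Unicode names and UTF-8 bytes. It assumes the string is the six code points U+0AA8 U+0AAE U+0AB8 U+0ACD U+0AA4 U+0AC7 (the virama U+0ACD is not shown in either output, but it is implied by the jump from position 1612 to 1614); `TEXT` and `describe_string` are illustrative names.

```python
import unicodedata

# "નમસ્તે" written with escapes so the code points are explicit (assumed sequence).
TEXT = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"


def describe_string(text):
    """Return (code point, Unicode name, UTF-8 bytes) for each character."""
    rows = []
    for ch in text:
        utf8 = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), utf8))
    return rows


for codepoint, name, utf8 in describe_string(TEXT):
    print(f"{codepoint}  {name:<24} {utf8}")
```

The UTF-8 bytes for the letters should match the "buffer code" lines in the describe-char output, for example E0 AA A8 for GUJARATI LETTER NA.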



> then the text and
> the font can be checked against vanilla HarfBuzz (e.g. using the hb-view
> command line tool); if it gives the same rendering then it is either a
> HarfBuzz or font issue, if not then it is a bug in the HarfBuzz
> integration code in Emacs.
>
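
The check against vanilla HarfBuzz suggested in the quoted text could be scripted roughly as follows. This is a sketch only: it assumes the `hb-view` tool is on PATH and accepts a font file and the text as positional arguments plus `--output-file`, and the font file name `MuktaVaani-Regular.ttf` and the output name are hypothetical placeholders for wherever the font is installed.

```python
import shutil
import subprocess


def hb_view_command(font_path, text, output_path):
    """Build the hb-view invocation: hb-view --output-file=OUT FONT-FILE TEXT."""
    return ["hb-view", f"--output-file={output_path}", font_path, text]


def render(font_path, text, output_path):
    """Run hb-view if it is installed; return False when the tool is missing."""
    cmd = hb_view_command(font_path, text, output_path)
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Hypothetical font path; adjust it to the local copy of Mukta Vaani.
FONT = "MuktaVaani-Regular.ttf"
TEXT = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"  # the Gujarati test string, as escapes

print(" ".join(hb_view_command(FONT, TEXT, "namaste.png")))
```

Calling `render(FONT, TEXT, "namaste.png")` would then write an image that can be compared by eye with what Emacs shows in each build.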

This bug report was last modified 3 years and 22 days ago.
