GNU bug report logs - #50951
28.0.50; Urdu text is not displayed correctly


Package: emacs;

Reported by: Rah Guzar <aikrahguzar <at> gmail.com>

Date: Fri, 1 Oct 2021 20:19:01 UTC

Severity: normal

Tags: moreinfo

Found in version 28.0.50

Done: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>

Bug is archived. No further changes may be made.


From: Rah Guzar <aikrahguzar <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50951 <at> debbugs.gnu.org
Subject: bug#50951: Fwd: bug#50951: 28.0.50; Urdu text is not displayed correctly
Date: Sat, 2 Oct 2021 16:19:01 +0200
On Sat, Oct 2, 2021 at 3:09 PM Eli Zaretskii <eliz <at> gnu.org> wrote:

> The way to investigate such problems is to see what does hb-view, a
> program that is part of the HarfBuzz installation, produce for the
> same text with the same font.  If hb-view produces correct display,
> but Emacs doesn't, then the problem is indeed in Emacs; otherwise the
> problem is probably with the font, and in any case should be taken up
> with the HarfBuzz developers.
>

I tried hb-view with NotoNastaliqUrdu and the text:
 خوبی اپنی قسمت کی
This is what I get
[image: urduhbtestnoto.png]
In Emacs, however, I discovered that how it is displayed depends a lot on
the font size.

For the same text at size 16, I get

[image: emacsq16.png]

At size 24 it looks almost correct
[image: emacsq24.png]
At size 32 it is really bad again
[image: emacsq32.png]
And the issue seems to be glyph placement rather than shaping.

NotoNastaliqUrdu seems to be the only font with this issue. I am not sure
if the problem is due to Nastaliq.
The other two Nastaliq fonts seem to handle joining characters through
composition. If I change the font using

(set-fontset-font t 'arabic (font-spec :family "Jameel Noori Nastaleeq" :size 32))

and move the cursor to the word "قسمت", which has 4 characters, the cursor
encompasses all of them and "C-u C-x ="
gives

-----------------------------------------------------------------------------------------------------------------------
             position: 157 of 283 (55%), column: 11
            character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x0642
               script: arabic
               syntax: w which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
             to input: type "C-x 8 RET 642" or "C-x 8 RET ARABIC LETTER QAF"
          buffer code: #xD9 #x82
            file code: #xD9 #x82 (encoded by coding system utf-8-unix)
              display: composed to form "قسمت" (see below)

Composed with the following character(s) "سمت" using this font:
  ftcrhb:-pdms-Jameel Noori Nastaleeq-normal-normal-normal-*-32-*-*-*-*-0-iso10646-1
by these glyphs:
  [0 3 1578 11352 50 1 51 30 1 nil]
with these character(s):
  س (#x633) ARABIC LETTER SEEN
  م (#x645) ARABIC LETTER MEEM
  ت (#x62a) ARABIC LETTER TEH

Character code properties: customize what to show
  name: ARABIC LETTER QAF
  general-category: Lo (Letter, Other)
  decomposition: (1602) ('ق')

There are text properties here:
  fontified            nil
-----------------------------------------------------------------------------------------------------------------------------

Changing to NotoNastaliqUrdu using

(set-fontset-font t 'arabic (font-spec :family "NotoNastaliqUrdu" :size 32))

the cursor moves through one character at a time. With the cursor at
the beginning of the same word,
"C-u C-x =" gives

-----------------------------------------------------------------------------------------------------------------------------------------
             position: 157 of 282 (55%), column: 11
            character: ق‎ (displayed as ق‎) (codepoint 1602, #o3102, #x642)
              charset: unicode (Unicode (ISO10646))
code point in charset: 0x0642
               script: arabic
               syntax: w which means: word
             category: .:Base, R:Right-to-left (strong), b:Arabic
             to input: type "C-x 8 RET 642" or "C-x 8 RET ARABIC LETTER QAF"
          buffer code: #xD9 #x82
            file code: #xD9 #x82 (encoded by coding system utf-8-unix)
              display: composed to form "ق" (see below)

Composed using this font:
  ftcrhb:-GOOG-Noto Nastaliq Urdu-normal-normal-normal-*-32-*-*-*-*-0-iso10646-1
by these glyphs:
  [0 0 1602 16 0 -6 6 35 -26 [3 -16 0]]
  [0 0 1602 983 0 0 0 0 0 nil]
  [0 0 1602 284 8 -1 8 24 6 [0 -23 8]]

Character code properties: customize what to show
  name: ARABIC LETTER QAF
  general-category: Lo (Letter, Other)
  decomposition: (1602) ('ق')

There are text properties here:
  fontified            t
-----------------------------------------------------------------------------------------------------------------------------------------
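
For anyone comparing the two glyph lists above, here is a minimal sketch
that decodes one glyph entry as printed by "C-u C-x =". It assumes the
field order given in the docstring of composition-get-gstring (FROM-IDX
TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT, followed by either
nil or [X-OFF Y-OFF WADJUST]). The names Glyph and parse_glyph are made up
for this illustration and are not part of Emacs.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Glyph(NamedTuple):
    """One glyph entry, assuming the composition-get-gstring field order."""
    from_idx: int
    to_idx: int
    char: int        # character code point (C)
    code: int        # glyph code in the font (CODE)
    width: int
    lbearing: int
    rbearing: int
    ascent: int
    descent: int
    adjustment: Optional[Tuple[int, int, int]]  # (x_off, y_off, wadjust) or None


def parse_glyph(line: str) -> Glyph:
    """Parse a line such as '[0 0 1602 16 0 -6 6 35 -26 [3 -16 0]]'."""
    tokens = re.findall(r"-?\d+|nil", line)
    if len(tokens) < 10:
        raise ValueError(f"not a glyph entry: {line!r}")
    head = [int(t) for t in tokens[:9]]
    tail = tokens[9:]
    if tail == ["nil"]:
        adjustment = None
    elif len(tail) == 3 and "nil" not in tail:
        adjustment = (int(tail[0]), int(tail[1]), int(tail[2]))
    else:
        raise ValueError(f"unexpected adjustment field: {line!r}")
    return Glyph(*head, adjustment)


# Glyph entries copied from the two describe-char outputs above.
jameel = ["[0 3 1578 11352 50 1 51 30 1 nil]"]
noto = [
    "[0 0 1602 16 0 -6 6 35 -26 [3 -16 0]]",
    "[0 0 1602 983 0 0 0 0 0 nil]",
    "[0 0 1602 284 8 -1 8 24 6 [0 -23 8]]",
]

for name, lines in (("Jameel Noori Nastaleeq", jameel), ("NotoNastaliqUrdu", noto)):
    for g in map(parse_glyph, lines):
        print(name, chr(g.char), "width", g.width, "adjustment", g.adjustment)
```

Under that reading, the single Jameel Noori Nastaleeq glyph covers
characters 0 to 3 and has no adjustment, while two of the three
NotoNastaliqUrdu glyphs carry non-zero x/y offsets, which is where glyph
placement could go wrong.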

> (Are you sure that LibreOffice uses NotoNastaliqUrdu for the text you
> type there?  They could use a different font under the hood.)
>

LibreOffice uses a different font by default. When I changed it to
NotoNastaliqUrdu, the appearance changed
and is now the same as what I get with hb-view.

This bug report was last modified 2 years and 241 days ago.
