GNU bug report logs - #50951
28.0.50; Urdu text is not displayed correctly


Package: emacs;

Reported by: Rah Guzar <aikrahguzar <at> gmail.com>

Date: Fri, 1 Oct 2021 20:19:01 UTC

Severity: normal

Tags: moreinfo

Found in version 28.0.50

Done: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>

Bug is archived. No further changes may be made.



Message #14 received at 50951 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Rah Guzar <aikrahguzar <at> gmail.com>
Cc: 50951 <at> debbugs.gnu.org
Subject: Re: bug#50951: Fwd: bug#50951: 28.0.50;
 Urdu text is not displayed correctly
Date: Sat, 02 Oct 2021 15:18:28 +0300
> From: Rah Guzar <aikrahguzar <at> gmail.com>
> Date: Sat, 2 Oct 2021 13:43:47 +0200
> 
> Let us consider the word نہیں
> 
> It is composed of four letters. Below, I use the character field from `describe-char` for each of them:
> 1) ن‎ (displayed as ن‎) (codepoint 1606, #o3106, #x646)
> 2)  ہ‎ (displayed as ہ‎) (codepoint 1729, #o3301, #x6c1)
> 3)  ی‎ (displayed as ی‎) (codepoint 1740, #o3314, #x6cc)
> 4) ں‎ (displayed as ں‎) (codepoint 1722, #o3272, #x6ba)
> 
> It should be displayed with all 4 characters joined together; instead, they are all displayed individually.

What font displays them individually?  You should be able to tell that
if you type "C-u C-x =" on one of these characters.

For me, they display joined together.

> If I change to `NotoNastaliqUrdu`, this word is displayed correctly. But there is a problem with حرف
> 
> It consists of three letters:
> 1) ح‎ (displayed as ح‎) (codepoint 1581, #o3055, #x62d)
> 2) ر‎ (displayed as ر‎) (codepoint 1585, #o3061, #x631)
> 3) ف‎ (displayed as ف‎) (codepoint 1601, #o3101, #x641)
> 
> The first two characters should be joined and the last one should be on its own. This seems to be the case.
> But the two groups are rendered on top of each other making it illegible.
> 
> > So isn't this a matter of finding a proper font, in particular given
> > the "Nastaliq vs Naskh" issues?  NotoNastaliqUrdu is not the only font
> > supporting Nastaliq, so perhaps other fonts fare better?
>
> My knowledge here is very limited, but my impression is that Nastaliq and Naskh are styles and shouldn't
> affect composition.
> NotoNastaliqUrdu was the only Urdu font available from my distro. LibreOffice, which also uses HarfBuzz,
> renders it correctly, so I didn't try another font at first. Like Emacs, LibreOffice also uses a Naskh font
> by default, but all the characters are joined properly.
> 
> I did try some fonts from https://urdufonts.net/ after your suggestions, and they render correctly.
> Specifically, the fonts I tried were:
> Jameel Noori Nastaleeq Regular
> Alvi Nastaleeq 
> Zohra Unicode
> Manzor Unicode
> 
> I didn't notice a problem with any of them, except a very minor one with the last two, which have
> visible boundaries where glyphs are joined.

So would it be correct to say that using a proper font solves the
problem?

> > Since Urdu uses the Arabic characters, Emacs uses character
> > composition rules for Arabic when displaying this text.  Do you know
> > if the composition rules for Urdu are different?
> 
> I think using Arabic composition rules might be part of the problem. The Urdu alphabet is a superset of
> the Arabic alphabet, and if I don't set a font specifically designed for Urdu, the words where some
> characters should be joined but aren't always seem to include a character like ہ, which is in the Urdu
> alphabet but not in the Arabic one.

I don't think the problem is with compositions, because in the 2
examples you described above, there are no character compositions.

Moreover, our pattern for asking HarfBuzz to shape Arabic text is
this:

   "[\u0600-\u074F\u200C\u200D]+"

which includes all of the characters, including U+06C1, which you say
causes problems.
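To illustrate the coverage point, here is a minimal sketch that checks both words from the report against the same character class. It assumes Python's `re` module as a stand-in: Emacs applies the pattern with its own regexp engine, so this mirrors only the character class, not Emacs's shaping code. The names `ARABIC_RUN`, `WORDS`, and `covered` are hypothetical, and the codepoints are the ones from the `describe-char` output quoted above.

```python
import re
import unicodedata

# Stand-in for the character class quoted above, compiled with Python's re
# module (illustration only; Emacs uses its own regexp engine).
# U+0600..U+074F spans the Arabic and Syriac blocks; U+200C/U+200D are ZWNJ/ZWJ.
ARABIC_RUN = re.compile("[\u0600-\u074F\u200C\u200D]+")

# Hypothetical table of the two Urdu words from the report, spelled with
# escapes so every codepoint is explicit (values from describe-char).
WORDS = {
    "nahin": "\u0646\u06C1\u06CC\u06BA",
    "harf": "\u062D\u0631\u0641",
}


def covered(word: str) -> bool:
    """Return True if the whole word is one run matched by the class."""
    return ARABIC_RUN.fullmatch(word) is not None


if __name__ == "__main__":
    for label, word in WORDS.items():
        print(f"{label}: covered={covered(word)}")
        for ch in word:
            # Same codepoint in the decimal/octal/hex forms describe-char shows.
            cp = ord(ch)
            print(f"  {cp} #o{cp:o} #x{cp:x} {unicodedata.name(ch)}")
```

Both words match as single runs, and the printed triples agree with the `describe-char` readouts (for example, 1729 #o3301 #x6c1 ARABIC LETTER HEH GOAL). This is consistent with the argument that U+06C1 is handed to HarfBuzz like any other Arabic-script character.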

You could try setting current-iso639-language to the symbol 'ur'
(without the quotes), that should tell HarfBuzz to shape the text as
appropriate for Urdu.  But I think the real problem is with the font,
not with shaping.







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.