GNU bug report logs - #19993
25.0.50; Unicode fonts defective on Windows

Package: emacs;

Reported by: Ilya Zakharevich <nospam-abuse <at> ilyaz.org>

Date: Tue, 3 Mar 2015 22:32:01 UTC

Severity: normal

Found in version 25.0.50

From: Ilya Zakharevich <ilya <at> math.berkeley.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 19993 <at> debbugs.gnu.org
Subject: bug#19993: 25.0.50; Unicode fonts defective on Windows
Date: Sat, 7 Mar 2015 23:41:58 -0800

On Sat, Mar 07, 2015 at 10:14:16AM +0200, Eli Zaretskii wrote:
> >   What can it mean that a font “supports a script”?
> > 
> > Theoretically, it may mean that
> >   • it “knows” all the characters in the script, and
> >   • has enough extra infrastructure to shape these characters
> >     into a correct glyphic representation.
> > 
> > I can see that the second part may be described by one bit per
> > script.  But what about the first one?  The repertoire of a script
> > changes every year (sometimes several times per year).  How can this
> > be encapsulated into a bit?
> 
> All I know about this is what the MSDN documentation says:
> 
>   FONTSIGNATURE structure
> 
>   Contains information identifying the code pages and Unicode subranges
>   for which a given font provides glyphs.
>   [...]
>   Members
> 
>   fsUsb
> 
>       A 128-bit Unicode subset bitfield (USB) identifying up to 126
>       Unicode subranges. Each bit, except the two most significant bits,
>       represents a single subrange. The most significant bit is always 1
>       and identifies the bitfield as a font signature; the second most
>       significant bit is reserved and must be 0. Unicode subranges are
>       numbered in accordance with the ISO 10646 standard. For more
>       information, see Unicode Subset Bitfields.

So these bits “identify” a subrange.  Of course, nothing is said about
what this actually MEANS.  So I did an experiment with Cour.ttf.

The following subrange is “identified”:

  Bit   Range          Block
  9     0400 - 04FF    Cyrillic
        0500 - 052F    Cyrillic Supplement
        2DE0 - 2DFF    Cyrillic Extended-A
        A640 - A69F    Cyrillic Extended-B
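
For reference, testing such a bit can be sketched as below. This is only a minimal illustration, assuming fsUsb is held as four 32-bit words with word 0 carrying bits 0..31; the helper name `usb_bit_set` and the sample values are hypothetical and were not read from Cour.ttf.

```python
def usb_bit_set(fs_usb, bit):
    """Return True if Unicode-subrange bit `bit` (0..127) is set.

    fs_usb is a sequence of four 32-bit words; word 0 holds bits 0..31
    (an assumption about how the 128-bit field is stored).
    """
    if not 0 <= bit < 128:
        raise ValueError("bit must be in 0..127")
    return bool((fs_usb[bit // 32] >> (bit % 32)) & 1)


# Hypothetical signature: only bit 9 (Cyrillic) and the top marker bit are set.
sample = (1 << 9, 0, 0, 1 << 31)

print(usb_bit_set(sample, 9))   # bit 9: Cyrillic
print(usb_bit_set(sample, 10))  # neighbouring bit, not set in this sample
```

Note that the test only says the bit is present; it says nothing about which code points in the range have glyphs.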

What is actually supported:

  0400 - 04FF    Everything but 04D8, 04D9 (Schwa, used in Cyrillic Azeri, although contemporary Azeri is written in Latin)
  0500 - 052F    Only 0500 - 0513, 051A - 051D supported
  2DE0 - 2DFF    None supported (5.1)
  A640 - A69F    None supported (5.1 and later)
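
The comparison above can be put into numbers with a small sketch. It assumes the set of supported code points is exactly what is reported in this message (it is typed in by hand here, not re-read from the font), and it counts every code point of each block, assigned or not. The names `CLAIMED` and `coverage` are made up for the illustration.

```python
# Ranges "identified" by subrange bit 9, as listed in this message.
CLAIMED = {
    "Cyrillic": (0x0400, 0x04FF),
    "Cyrillic Supplement": (0x0500, 0x052F),
    "Cyrillic Extended-A": (0x2DE0, 0x2DFF),
    "Cyrillic Extended-B": (0xA640, 0xA69F),
}

# Code points reported as supported in this message (not re-measured).
supported = set(range(0x0400, 0x0500)) - {0x04D8, 0x04D9}
supported |= set(range(0x0500, 0x0514)) | set(range(0x051A, 0x051E))


def coverage(claimed, supported):
    """Map block name -> (supported code points, code points in block)."""
    result = {}
    for name, (lo, hi) in claimed.items():
        block = range(lo, hi + 1)
        result[name] = (sum(cp in supported for cp in block), len(block))
    return result


for name, (have, total) in coverage(CLAIMED, supported).items():
    print(f"{name}: {have}/{total}")
```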

Looking in DerivedAge.txt:

   04D0..04EB    ; 1.1 #  [28] CYRILLIC CAPITAL LETTER A WITH BREVE..CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS

   0500..050F    ; 3.2 #  [16] CYRILLIC CAPITAL LETTER KOMI DE..CYRILLIC SMALL LETTER KOMI TJE
   0510..0513    ; 5.0 #   [4] CYRILLIC CAPITAL LETTER REVERSED ZE..CYRILLIC SMALL LETTER EL WITH HOOK
   0514..0523    ; 5.1 #  [16] CYRILLIC CAPITAL LETTER LHA..CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK

So two characters of 1.1 are not supported, all characters of 3.2 and 5.0 are
supported, and part of 5.1 is supported.
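
The per-version tally can be reproduced with a short sketch. It parses only the four DerivedAge.txt lines quoted above, and the supported set is again the one reported in this message, restricted to the code points those lines cover; the regular expression and the name `by_age` are assumptions of the sketch, not part of any tool.

```python
import re
from collections import defaultdict

# The DerivedAge.txt lines quoted in this message.
DERIVED_AGE = """\
04D0..04EB    ; 1.1 #  [28] CYRILLIC CAPITAL LETTER A WITH BREVE..CYRILLIC SMALL LETTER BARRED O WITH DIAERESIS
0500..050F    ; 3.2 #  [16] CYRILLIC CAPITAL LETTER KOMI DE..CYRILLIC SMALL LETTER KOMI TJE
0510..0513    ; 5.0 #   [4] CYRILLIC CAPITAL LETTER REVERSED ZE..CYRILLIC SMALL LETTER EL WITH HOOK
0514..0523    ; 5.1 #  [16] CYRILLIC CAPITAL LETTER LHA..CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK
"""

# "START..END ; AGE" or "START ; AGE"; anything after the age is ignored.
LINE = re.compile(r"^\s*([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([\d.]+)")

# Code points reported as supported in this message (not re-measured).
supported = set(range(0x04D0, 0x04EC)) - {0x04D8, 0x04D9}
supported |= set(range(0x0500, 0x0514)) | set(range(0x051A, 0x051E))


def by_age(text, supported):
    """Map Unicode version -> (supported code points, listed code points)."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        lo = int(m.group(1), 16)
        hi = int(m.group(2) or m.group(1), 16)
        for cp in range(lo, hi + 1):
            totals[m.group(3)][0] += cp in supported
            totals[m.group(3)][1] += 1
    return {age: tuple(counts) for age, counts in totals.items()}


for age, (have, total) in by_age(DERIVED_AGE, supported).items():
    print(f"Unicode {age}: {have}/{total} supported")
```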

Does this look like a good indication of anything?  I would say no… Do
you know of any other tool that looks at this bitmap when choosing which
font to pick for a particular character?

Ilya




This bug report was last modified 10 years and 155 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.