#31315 - wrong font encoding for fallback font

GNU bug report logs - #31315
wrong font encoding for fallback font

Package: emacs

Reported by: Werner LEMBERG <wl <at> gnu.org>

Date: Mon, 30 Apr 2018 07:22:01 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Message #53 received at 31315 <at> debbugs.gnu.org (full text, mbox):

From: Werner LEMBERG <wl <at> gnu.org>
To: eliz <at> gnu.org
Cc: handa <at> gnu.org, 31315 <at> debbugs.gnu.org
Subject: Re: bug#31315: wrong font encoding for fallback font
Date: Thu, 03 May 2018 07:52:27 +0200 (CEST)

> If by "xft" you mean the part of the X libraries that supports the
> APIs used by xfont.c, then I think we are on the same page now.

OK.

>> While this is correct for other CJK encodings like GB, JIS, KSC, or
>> Big5, it is *not* true for GB18030.  This is *only* an encoding and
>> *not* a charset!  It is simply another representation of Unicode,
>> comparable to UTF-8 or UCS4.  There doesn't exist a single font
>> natively encoded in GB18030!  This encoding only exists to be
>> code-wise backward compatible with GB 2312.
>
> Maybe so, but GB18030 is a Chinese encoding, and as such it behaves
> in Emacs as all the other Chinese encodings.

I know, and I agree.  BUT!  xft doesn't do what Emacs expects.  *Any*
font that covers the whole BMP (in particular, the whole CJK part of
it) gets a `GB18030' tag from xft.  In other words, the `Chinese'
property isn't in the font from the very beginning.[*]

> Emacs employs that logic for every charset it has defined, including
> Latin-2, for example: if text was decoded from an encoding which
> supports a particular charset, Emacs puts the corresponding
> 'charset' text property on the decoded text, and the machinery which
> selects the appropriate font tries first to find a font which
> supports that charset.  The idea is that users in a particular
> culture have certain distinct preferences wrt fonts, and that an
> encoding that supports a certain charset or culture provides a hint
> about those preferences.  This idea is very central in how Emacs
> selects fonts.

Being the FreeType maintainer, and having co-developed Emacs's
internal buffer encoding scheme many, many years ago, I know all
this.  I can only repeat that Emacs might tag a certain text with
GB18030 so that the user can deduce a Chinese origin.  However, there
is *no* guarantee that the user gets a Chinese-flavoured font -- at
least not from the xft interface.[**]

As a corollary, it is fully sufficient for xft to handle GB18030 the
same as Unicode (i.e., `iso10646').


    Werner


[*] Actually, having Unicode fonts that provide CJK glyphs for the
    whole BMP completely spoils Emacs's font selection scheme based on
    charsets -- as shown in one of my previous e-mails, xft provides
    all common CJK encodings for such fonts because Unicode is a
    superset of those encodings.

[**] If, say, the Pango font interface is used instead to access a
     modern CJK OpenType font, Emacs might request `script=hani,
     lang=ZHS' if it encounters GB18030 to resolve Unicode's Han
     unification, ensuring simplified Chinese glyph representation
     forms.
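
The claim above that GB18030 is only another representation of Unicode, comparable to UTF-8 or UCS4, can be checked with a small sketch. It is an illustration only, not part of the original message, and it assumes Python's built-in `gb18030` and `gb2312` codecs stand in for the encodings under discussion. The helper `roundtrips` and the `SAMPLE` string are hypothetical names chosen for this example.

```python
def roundtrips(text: str, codec: str) -> bool:
    """Return True if `text` survives encode/decode through `codec` unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        # The codec cannot represent at least one character of `text`.
        return False


# Latin, Han, Hangul, Thai, and a supplementary-plane emoji.
SAMPLE = "Aé中한ก😀"

if __name__ == "__main__":
    for codec in ("utf-8", "utf-32", "gb18030", "gb2312"):
        print(codec, roundtrips(SAMPLE, codec))
    # A GB 2312 character keeps the same bytes in GB18030.
    print("中".encode("gb2312") == "中".encode("gb18030"))
```

In this sketch GB18030 round-trips the whole sample just as the UTF encodings do, whereas GB 2312 covers only its own repertoire. A character that GB 2312 does cover keeps the same byte sequence in GB18030, which is the code-wise backward compatibility mentioned in the message.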

This bug report was last modified 7 years and 79 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
