GNU bug report logs - #19910
24.4; Japanese font names are decoded incorrectly in Cygwin's emacs-w32 in LANG=ja_JP.UTF-8

Package: emacs;

Reported by: Fujii Hironori <fujii.hironori <at> gmail.com>

Date: Fri, 20 Feb 2015 10:41:01 UTC

Severity: normal

Found in version 24.4

To reply to this bug, email your comments to 19910 AT debbugs.gnu.org.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Fri, 20 Feb 2015 10:41:01 GMT)

Acknowledgement sent to Fujii Hironori <fujii.hironori <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 20 Feb 2015 10:41:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Fujii Hironori <fujii.hironori <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; Japanese font names are decoded incorrectly in Cygwin's
 emacs-w32 in LANG=ja_JP.UTF-8
Date: Fri, 20 Feb 2015 19:39:55 +0900

(font-family-list) returns incorrectly decoded Japanese font names.
My locale-coding-system is utf-8-unix.

If I do (setq locale-coding-system 'cp932), it returns correct font names.
But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
locale-coding-system must be utf-8 in my Emacs.



In GNU Emacs 24.4.1 (x86_64-unknown-cygwin)
 of 2015-02-13 on desktop-new
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure
 --srcdir=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/src/emacs-24.4
 --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin
 --libexecdir=/usr/libexec --datadir=/usr/share --localstatedir=/var
 --sysconfdir=/etc --libdir=/usr/lib --datarootdir=/usr/share
 --docdir=/usr/share/doc/emacs --htmldir=/usr/share/doc/emacs/html -C
 --with-w32 'CFLAGS=-ggdb -O2 -pipe -Wimplicit-function-declaration
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/build=/usr/src/debug/emacs-24.4-3
 -fdebug-prefix-map=/home/kbrown/src/cygemacs/emacs-24.4-3.x86_64/src/emacs-24.4=/usr/src/debug/emacs-24.4-3'
 CPPFLAGS= LDFLAGS='

Important settings:
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<language-change> <help-echo> <help-echo> <help-echo>
<help-echo> <help-echo> <help-echo> <help-echo> <menu-bar>
<help-menu> <send-emacs-bug-report>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
No docstring slot for setup-japanese-environment-internal

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util help-fns mail-prsvr mail-utils time-date japan-util tooltip
electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
w32-common-fns disp-table w32-win w32-vars tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment lisp-mode prog-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind
gfilenotify w32 multi-tty emacs)

Memory information:
((conses 16 76005 6531)
 (symbols 48 17442 0)
 (miscs 40 60 88)
 (strings 32 10617 5193)
 (string-bytes 1 268020)
 (vectors 16 9545)
 (vector-slots 8 454592 39464)
 (floats 8 57 94)
 (intervals 56 193 0)
 (buffers 960 12))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Fri, 20 Feb 2015 11:22:01 GMT)

Message #8 received at 19910 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Fujii Hironori <fujii.hironori <at> gmail.com>
Cc: 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4;
 Japanese font names are decoded incorrectly in Cygwin's emacs-w32
 in	LANG=ja_JP.UTF-8
Date: Fri, 20 Feb 2015 13:21:18 +0200

> Date: Fri, 20 Feb 2015 19:39:55 +0900
> From: Fujii Hironori <fujii.hironori <at> gmail.com>
> 
> (font-family-list) returns incorrectly decoded Japanese font names.
> My locale-coding-system is utf-8-unix.
> 
> If I do (setq locale-coding-system 'cp932), it returns correct font names.
> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
> locale-coding-system must be utf-8 in my Emacs.

The problem is in w32font.c: it should call the "wide" (a.k.a.
"Unicode") APIs, and then decode strings using utf-16le, like we do in
w32fns.c with encoding strings we pass to w32 GUI APIs.
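
To illustrate the suggestion above, here is a minimal, portable sketch of the decoding step only. It assumes the "wide" API has already filled a buffer with UTF-16LE bytes; the function name demo_utf16le_to_utf8 is hypothetical and is not part of Emacs or the Windows API. Because UTF-16LE is independent of the locale, the result does not depend on locale-coding-system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert N_BYTES of UTF-16LE text in SRC to a
   NUL-terminated UTF-8 string in DST (capacity DST_SIZE bytes).
   Returns the number of UTF-8 bytes written, not counting the NUL,
   or (size_t) -1 on malformed input or insufficient space.  */
size_t
demo_utf16le_to_utf8 (const unsigned char *src, size_t n_bytes,
                      char *dst, size_t dst_size)
{
  size_t out = 0;

  if (dst_size == 0 || n_bytes % 2 != 0)
    return (size_t) -1;

  for (size_t i = 0; i < n_bytes; i += 2)
    {
      /* Assemble one little-endian 16-bit code unit.  */
      uint32_t cp = (uint32_t) src[i] | ((uint32_t) src[i + 1] << 8);

      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          /* High surrogate: a low surrogate must follow.  */
          if (i + 3 >= n_bytes)
            return (size_t) -1;
          uint32_t lo = (uint32_t) src[i + 2] | ((uint32_t) src[i + 3] << 8);
          if (lo < 0xDC00 || lo > 0xDFFF)
            return (size_t) -1;
          cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
          i += 2;
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        return (size_t) -1;     /* Unpaired low surrogate.  */

      size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (out + need + 1 > dst_size)
        return (size_t) -1;

      if (need == 1)
        dst[out++] = (char) cp;
      else if (need == 2)
        {
          dst[out++] = (char) (0xC0 | (cp >> 6));
          dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
      else if (need == 3)
        {
          dst[out++] = (char) (0xE0 | (cp >> 12));
          dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
          dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
      else
        {
          dst[out++] = (char) (0xF0 | (cp >> 18));
          dst[out++] = (char) (0x80 | ((cp >> 12) & 0x3F));
          dst[out++] = (char) (0x80 | ((cp >> 6) & 0x3F));
          dst[out++] = (char) (0x80 | (cp & 0x3F));
        }
    }

  dst[out] = '\0';
  return out;
}
```

In Emacs itself the conversion would presumably go through the existing coding-system machinery rather than a hand-written loop; the sketch only shows that the wide string carries enough information to be decoded without consulting the locale.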

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Fri, 27 Feb 2015 15:23:02 GMT)

Message #11 received at 19910 <at> debbugs.gnu.org:

From: Fujii Hironori <fujii.hironori <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4; Japanese font names are decoded incorrectly in
 Cygwin's emacs-w32 in LANG=ja_JP.UTF-8
Date: Sat, 28 Feb 2015 00:22:00 +0900

Tags: patch

On Fri, Feb 20, 2015 at 8:21 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> The problem is in w32font.c: it should call the "wide" (a.k.a.
> "Unicode") APIs, and then decode strings using utf-16le, like we do in
> w32fns.c with encoding strings we pass to w32 GUI APIs.

A Unicode API patch is attached. Could you review it?
Should I use GetProcAddress to support Windows 9x?

[font.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Fri, 27 Feb 2015 16:04:02 GMT)

Message #14 received at 19910 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Fujii Hironori <fujii.hironori <at> gmail.com>
Cc: 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4;
 Japanese font names are decoded incorrectly in Cygwin's emacs-w32 in
 LANG=ja_JP.UTF-8
Date: Fri, 27 Feb 2015 18:03:46 +0200

> Date: Sat, 28 Feb 2015 00:22:00 +0900
> From: Fujii Hironori <fujii.hironori <at> gmail.com>
> Cc: 19910 <at> debbugs.gnu.org
> 
> On Fri, Feb 20, 2015 at 8:21 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> > The problem is in w32font.c: it should call the "wide" (a.k.a.
> > "Unicode") APIs, and then decode strings using utf-16le, like we do in
> > w32fns.c with encoding strings we pass to w32 GUI APIs.
> 
> A Unicode API patch is attached. Could you review it?
> Should I use GetProcAddress to support Windows 9x?

Thanks.

However, this goes too far: there's no need to replace all the
functions with "wide" versions, only those functions that return font
name strings from the system.  For example, I don't think
CreateFontIndirect needs to be switched to Unicode, does it?  And CRT
functions like _wcslwr and swprintf that work on wchar_t arguments
aren't supported on Windows 9X, AFAIK, so we cannot call them.  (One
reason for using the minimum number of "wide" APIs is that we don't
have good ways of testing the development code on Windows 9X.)

And yes, for Windows 9X you will need to call these functions through
function pointers, after assigning them with GetProcAddress, as
w32font.c does elsewhere.

I would actually suggest having Cygwin-only branches of the code,
where you can freely call the "wide" APIs without bothering about
Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
since this is a Cygwin-specific problem due to the difference between
file-name encoding and the locale emulated by Cygwin.  There are a
bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
w32term.h that can be used to minimize #ifdef's to the absolute
minimum.
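
The macro approach described above can be sketched as follows. This is a self-contained illustration of the token-pasting technique only: every DEMO_* name and both demo_face_len functions are hypothetical stand-ins, not the actual definitions in w32term.h or any Windows API.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for an ANSI/"wide" pair of API entry points.  */
size_t
demo_face_lenA (const char *s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

size_t
demo_face_lenW (const wchar_t *s)
{
  return wcslen (s);
}

/* Assumption for this sketch: pretend we are in the Cygwin-w32 build,
   where the wide variants may be called unconditionally.  */
#define DEMO_UNICODE 1

#ifdef DEMO_UNICODE
# define DEMO_FN(fn) fn ## W      /* select the wide entry point */
# define DEMO_STR(x) L ## x       /* make the literal a wide string */
typedef wchar_t demo_char_t;
#else
# define DEMO_FN(fn) fn ## A      /* select the ANSI entry point */
# define DEMO_STR(x) x
typedef char demo_char_t;
#endif

/* Call site written once; no #ifdef needed here.  */
size_t
demo_default_face_length (void)
{
  const demo_char_t *face = DEMO_STR ("Courier New");
  return DEMO_FN (demo_face_len) (face);
}
```

The point of the pattern is that the conditional lives in one place, and each call site stays free of #ifdef's whichever variant is selected.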

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Sat, 28 Feb 2015 12:15:03 GMT)

Message #17 received at 19910 <at> debbugs.gnu.org:

From: Fujii Hironori <fujii.hironori <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4; Japanese font names are decoded incorrectly in
 Cygwin's emacs-w32 in LANG=ja_JP.UTF-8
Date: Sat, 28 Feb 2015 21:14:00 +0900

Thank you for reviewing my patch, Eli.

On Sat, Feb 28, 2015 at 1:03 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> However, this goes too far: there's no need to replace all the
> functions with "wide" versions, only those functions that return font
> name strings from the system.  For example, I don't think
> CreateFontIndirect needs to be switched to Unicode, does it?  And CRT
> functions like _wcslwr and swprintf that work on wchar_t arguments
> aren't supported on Windows 9X, AFAIK, so we cannot call them.  (One
> reason for using the minimum number of "wide" APIs is that we don't
> have good ways of testing the development code on Windows 9X.)

This is the code:

|    862  hfont = CreateFontIndirect (&logfont);
| (...)
|    912 = DECODE_SYSTEM (build_string (logfont.lfFaceName));

logfont.lfFaceName is ANSI text and DECODE_SYSTEM is the problem.
CreateFontIndirect should be wide.
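
A small sketch of why the decode step goes wrong, assuming the ANSI codepage is cp932 (as the original report suggests) while the locale says UTF-8. The function demo_is_valid_utf8 is a hypothetical, simplified structural check, not code from Emacs: it verifies lead and continuation bytes only and does not reject overlong forms or surrogates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the LEN bytes at S have the
   lead-byte/continuation-byte structure of UTF-8.  */
bool
demo_is_valid_utf8 (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char b = s[i];
      size_t extra;

      if (b < 0x80)
        extra = 0;                      /* ASCII */
      else if (b >= 0xC2 && b <= 0xDF)
        extra = 1;                      /* 2-byte sequence */
      else if (b >= 0xE0 && b <= 0xEF)
        extra = 2;                      /* 3-byte sequence */
      else if (b >= 0xF0 && b <= 0xF4)
        extra = 3;                      /* 4-byte sequence */
      else
        return false;                   /* stray continuation or bad lead */

      if (len - i - 1 < extra)
        return false;                   /* truncated sequence */

      for (size_t k = 1; k <= extra; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;                 /* expected a continuation byte */

      i += extra + 1;
    }

  return true;
}
```

For example, the cp932 bytes for a katakana face name start with 0x83, which is a continuation byte in UTF-8 and can never begin a character, so a UTF-8 decoder cannot recover the name from them.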

> I would actually suggest having Cygwin-only branches of the code,
> where you can freely call the "wide" APIs without bothering about
> Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
> since this is a Cygwin-specific problem due to the difference between
> file-name encoding and the locale emulated by Cygwin.  There are a
> bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
> w32term.h that can be used to minimize #ifdef's to the absolute
> minimum.

If this approach is used, structs such as LOGFONT and ENUMLOGFONTEX
should be renamed to GUI_FN(LOGFONT) and GUI_FN(ENUMLOGFONTEX).
This looks ugly.

The best way to solve this is to define _UNICODE.
A request to define _UNICODE was already filed, but it was closed as wontfix.

#265 - Build error with _UNICODE on w32. - GNU bug report logs
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=265

If Bug#265 is resolved, this bug (Bug#19910) will be resolved automatically.
And the _UNICODE macro can be used not only for Cygwin but also for NTEmacs.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Sat, 28 Feb 2015 12:47:01 GMT)

Message #20 received at 19910 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Fujii Hironori <fujii.hironori <at> gmail.com>
Cc: 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4;
 Japanese font names are decoded incorrectly in Cygwin's emacs-w32 in
 LANG=ja_JP.UTF-8
Date: Sat, 28 Feb 2015 14:46:19 +0200

> Date: Sat, 28 Feb 2015 21:14:00 +0900
> From: Fujii Hironori <fujii.hironori <at> gmail.com>
> Cc: 19910 <at> debbugs.gnu.org
> 
> > I would actually suggest having Cygwin-only branches of the code,
> > where you can freely call the "wide" APIs without bothering about
> > Windows 9X, since that's what the Cygwin-w32 build does elsewhere, and
> > since this is a Cygwin-specific problem due to the difference between
> > file-name encoding and the locale emulated by Cygwin.  There are a
> > bunch of macros like GUI_STR and GUI_ENCODE_FILE near the end of
> > w32term.h that can be used to minimize #ifdef's to the absolute
> > minimum.
> 
> If this approach is used, structs such as LOGFONT and ENUMLOGFONTEX
> should be renamed to GUI_FN(LOGFONT) and GUI_FN(ENUMLOGFONTEX).
> This looks ugly.

We use it in quite a few places in Emacs, so ugly or not, this is a
kind of de-facto standard for resolving these issues.  More
importantly, it doesn't run the risk of breaking Emacs on Windows 9X.

> The best way to solve this is to define _UNICODE.
> A request to define _UNICODE was already filed, but it was closed as wontfix.
> 
> #265 - Build error with _UNICODE on w32. - GNU bug report logs
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=265
> 
> If Bug#265 is resolved, this bug (Bug#19910) will be resolved automatically.
> And the _UNICODE macro can be used not only for Cygwin but also for NTEmacs.

Most, if not all, of the issues which could motivate someone to use
_UNICODE were meanwhile fixed, so reviving that now makes very little
sense.  In particular, the native Windows build already uses the
Unicode APIs wherever feasible.  (The particular issue discussed in
this thread doesn't exist in the native build, AFAIU, because
DECODE_SYSTEM does its job there.)

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Sun, 01 Dec 2019 08:23:02 GMT)

Message #23 received at 19910 <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Fujii Hironori <fujii.hironori <at> gmail.com>, 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4; Japanese font names are decoded incorrectly in
 Cygwin's emacs-w32 in	LANG=ja_JP.UTF-8
Date: Sun, 01 Dec 2019 09:21:56 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Fri, 20 Feb 2015 19:39:55 +0900
>> From: Fujii Hironori <fujii.hironori <at> gmail.com>
>> 
>> (font-family-list) returns incorrectly decoded Japanese font names.
>> My locale-coding-system is utf-8-unix.
>> 
>> If I do (setq locale-coding-system 'cp932), it returns correct font names.
>> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
>> locale-coding-system must be utf-8 in my Emacs.
>
> The problem is in w32font.c: it should call the "wide" (a.k.a.
> "Unicode") APIs, and then decode strings using utf-16le, like we do in
> w32fns.c with encoding strings we pass to w32 GUI APIs.

That was 5 years ago.  Is any of this still an issue on recent
versions of Emacs?

Best regards,
Stefan Kangas

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#19910; Package emacs. (Sun, 01 Dec 2019 17:37:02 GMT)

Message #26 received at 19910 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: fujii.hironori <at> gmail.com, 19910 <at> debbugs.gnu.org
Subject: Re: bug#19910: 24.4; Japanese font names are decoded incorrectly in
 Cygwin's emacs-w32 in	LANG=ja_JP.UTF-8
Date: Sun, 01 Dec 2019 19:35:48 +0200

> From: Stefan Kangas <stefan <at> marxist.se>
> Cc: Fujii Hironori <fujii.hironori <at> gmail.com>,  19910 <at> debbugs.gnu.org
> Date: Sun, 01 Dec 2019 09:21:56 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> Date: Fri, 20 Feb 2015 19:39:55 +0900
> >> From: Fujii Hironori <fujii.hironori <at> gmail.com>
> >> 
> >> (font-family-list) returns incorrectly decoded Japanese font names.
> >> My locale-coding-system is utf-8-unix.
> >> 
> >> If I do (setq locale-coding-system 'cp932), it returns correct font names.
> >> But, locale-coding-system is used in other places (e.g. M-x term and M-x man).
> >> locale-coding-system must be utf-8 in my Emacs.
> >
> > The problem is in w32font.c: it should call the "wide" (a.k.a.
> > "Unicode") APIs, and then decode strings using utf-16le, like we do in
> > w32fns.c with encoding strings we pass to w32 GUI APIs.
> 
> That was 5 years ago.  Is any of this still an issue on recent
> versions of Emacs?

I don't think anything's changed in that department, so the problem
should still be there.

However, I have an idea of a much simpler fix for this, but I need a
volunteer who has this problem to test a patch I'd like to write to
fix this.  Anyone?

This bug report was last modified 5 years and 252 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
