#79296 - 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Message #8 received at 79296 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shingo Tanaka <shingo.fg8 <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>, Bruno Haible <bruno <at> clisp.org>
Cc: 79296 <at> debbugs.gnu.org
Subject: Re: bug#79296: 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Date: Sun, 24 Aug 2025 09:15:59 +0300

> Date: Sun, 24 Aug 2025 11:16:05 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
>
> MS Windows has a UTF-8 configuration, enabled by turning on the setting
> "Beta: Use Unicode UTF-8 for worldwide language support" in its language
> settings. In a Japanese environment, the system code page is cp932 (the MS
> version of Japanese SJIS) without the setting, but becomes cp65001 (UTF-8)
> with the setting on.
>
> Emacs appears to detect the change successfully, because locale-coding-system
> changes from cp932 to cp65001 as expected, and everything works as expected
> overall.
>
> However, I found a bug: format-time-string returns a day-of-the-week string
> that is encoded wrongly, in cp932, even when locale-coding-system is cp65001.
>
> Conditions:
> - Windows 11 Pro 24H2 with Japanese settings and "Beta: Use Unicode UTF-8
>   for worldwide language support" on
> - Emacs 30.2 (latest, https://ftp.gnu.org/gnu/emacs/windows/emacs-30/)
>
> Here is how to reproduce:
> 1. Run Emacs with --no-init-file.
> 2. Go to the *scratch* buffer and evaluate:
>      (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> 3. You will get the following wrongly encoded string:
>      "25,01,01 \220\205\227j\223\372"
> 4. Evaluate:
>      (decode-coding-string
>       (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
>       'cp932)
> 5. You will get the following correctly decoded string:
>      #("25,01,01 水曜日" 9 12 (charset cp932-2-byte))
>
> This issue doesn't happen when "Beta: Use Unicode UTF-8 for worldwide language
> support" is off (locale-coding-system is cp932).

Thanks.

I think this is an issue with Gnulib, whose nstrftime function we use to
format the time in Emacs: it seems to produce time strings encoded in cp932
even though UTF-8 support is turned on on MS-Windows.

I've added the Gnulib folks to the discussion.

Bruno and Paul, does Gnulib's nstrftime support the UTF-8 system codepage on
MS-Windows?
I see some COMPILE_WIDE preprocessor conditions in the source, but it is not
clear to me whether that is necessary for Unicode support, and whether
strftime (as opposed to wcsftime) from the Windows C runtime properly supports
this "beta feature". Do you happen to know?

The "\220\205\227j\223\372" byte stream shown above is, AFAICT, the correct
text properly encoded in cp932, so if we cannot get Windows and Gnulib to
produce a UTF-8 string in this case, we might need, as a last resort, to use
cp932 when decoding time strings, even if locale-coding-system is UTF-8 on
MS-Windows.

Shingo Tanaka, could you please tell us the value of w32-multibyte-code-page
on your system, both when "Beta: Use Unicode UTF-8 for worldwide language
support" is ON and when it is OFF?
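The byte-level claim about the "\220\205\227j\223\372" string can be checked in any Emacs session, on any platform. What follows is a minimal sketch, assuming only the built-in cp932 and utf-8 coding systems (cp65001 is the Windows code page number for UTF-8). It does not call format-time-string or exercise Gnulib, and the constant name my-w32-dow-bytes is hypothetical. Decoding the six bytes as cp932 gives back the day name from step 5, while decoding them as UTF-8 leaves every non-ASCII byte as a raw-byte character, which is consistent with the string shown in step 3.

```elisp
;; Hypothetical constant: the six bytes of the day-of-week name that
;; format-time-string returned in step 3.  A string literal that uses
;; only ASCII and octal escapes is unibyte, so each escape is one byte.
(defconst my-w32-dow-bytes "\220\205\227j\223\372"
  "Hypothetical name: raw bytes of the day name from bug#79296.")

;; Decoded as cp932 (Microsoft's Shift_JIS variant), the bytes are
;; three valid two-byte characters.
(decode-coding-string my-w32-dow-bytes 'cp932)
;; => "水曜日" (the characters also carry a `charset' text property)

;; Decoded as UTF-8 (what cp65001 denotes), none of the non-ASCII
;; bytes starts a valid UTF-8 sequence, so Emacs keeps each of them
;; as a raw-byte (eight-bit) character instead of producing 水曜日.
(decode-coding-string my-w32-dow-bytes 'utf-8)
```

The second call only reproduces the symptom; it says nothing about where in the Windows C runtime or Gnulib the cp932 bytes originate.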
