GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Message #8 received at 79296 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Shingo Tanaka <shingo.fg8 <at> gmail.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>, Bruno Haible <bruno <at> clisp.org>
Cc: 79296 <at> debbugs.gnu.org
Subject: Re: bug#79296: 30.2;
 format-time-string returns wrongly encoded string in MS Windows
 Japanese with cp65001 beta config
Date: Sun, 24 Aug 2025 09:15:59 +0300

> Date: Sun, 24 Aug 2025 11:16:05 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
> 
> MS Windows has a UTF-8 configuration, enabled by turning on the setting
> "Beta: Use Unicode UTF-8 for worldwide language support" in its language
> settings.  In a Japanese environment, the system coding is cp932 (the MS
> version of Japanese SJIS) without the setting, but becomes cp65001 (UTF-8)
> with the setting on.
> 
> Emacs appears to detect the change successfully, because
> locale-coding-system changes from cp932 to cp65001 as expected, and
> overall it works as expected.
> 
> However, I found a bug: format-time-string returns a day-of-the-week string
> that is wrongly encoded, in cp932 even though locale-coding-system is cp65001.
> 
> Conditions:
> - Windows 11 Pro 24H2 with Japanese setting and "Beta: Use Unicode UTF-8
>   for worldwide language support" on
> - Emacs 30.2 (latest, https://ftp.gnu.org/gnu/emacs/windows/emacs-30/)
> 
> Here is how to reproduce:
> 1. Run Emacs with --no-init-file
> 2. Go to *scratch* buffer and evaluate:
>    (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> 3. You will get the wrongly encoded string below:
>    "25,01,01 \220\205\227j\223\372"
> 4. Evaluate:
>    (decode-coding-string
>     (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
>     'cp932)
> 5. You will get the correctly decoded string below:
>    #("25,01,01 水曜日" 9 12 (charset cp932-2-byte))
> 
> This issue doesn't happen when "Beta: Use Unicode UTF-8 for worldwide language
> support" is off (locale-coding-system is cp932).

Thanks.

I think this is an issue with Gnulib, whose nstrftime function we use
to format the time in Emacs: it seems to produce time strings encoded
in cp932 even though the UTF-8 support is turned on on MS-Windows.
I've added the Gnulib folks to the discussion.

Bruno and Paul, does Gnulib's nstrftime support the UTF-8 system
codepage on MS-Windows?  I see some COMPILE_WIDE preprocessor
conditions in the source, but it is not clear to me whether they are
necessary for Unicode support, and whether strftime (as opposed to
wcsftime) from the Windows C runtime properly supports this "beta
feature".  Do you happen to know?

The "\220\205\227j\223\372" byte sequence shown above is AFAICT the
correct text properly encoded in cp932, so if we cannot get Windows
and Gnulib to produce a UTF-8 string in this case, we might need, as
a last resort, to use cp932 when decoding time strings, even if
locale-coding-system is UTF-8 on MS-Windows.
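
For readers who want to check the byte sequence outside Emacs, here is
a minimal sketch in Python.  It rebuilds the six bytes from the octal
escapes in the report, decodes them as cp932, and illustrates one
possible shape of the "last resort" idea: try the locale coding first,
then fall back to cp932.  The helper name decode_time_string and its
parameters are hypothetical, and the try-then-fall-back order is an
assumption made for illustration; this is not how Emacs or Gnulib
handles the case.

```python
# Bytes from the report, which Emacs prints as "\220\205\227j\223\372"
# (octal escapes, with the ASCII byte "j" shown literally).
raw = bytes([0o220, 0o205, 0o227, ord("j"), 0o223, 0o372])


def decode_time_string(data, locale_coding="utf-8", fallback_coding="cp932"):
    """Hypothetical helper: decode with the locale coding, else the fallback.

    Illustrates the fallback idea only; the names and the order of
    attempts are assumptions, not taken from Emacs.
    """
    try:
        return data.decode(locale_coding)
    except UnicodeDecodeError:
        # The bytes are not valid in the locale coding, so assume the
        # C runtime produced them in the legacy codepage instead.
        return data.decode(fallback_coding)


print(raw.decode("cp932"))      # decoding explicitly as cp932
print(decode_time_string(raw))  # UTF-8 attempt fails, cp932 fallback is used
```

A try-then-fall-back scheme like this is only a heuristic: some byte
sequences are valid in both codings, so an unconditional choice of
codepage for time strings may turn out to be more robust.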

Shingo Tanaka, could you please tell us the value of
w32-multibyte-code-page on your system, both when "Beta: Use Unicode
UTF-8 for worldwide language support" is ON and when it is OFF?
