#79296 - 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>

Message #50 received at 79296 <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79296 <at> debbugs.gnu.org, corwin <at> bru.st, shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu
Subject: Re: bug#79296: 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Date: Wed, 27 Aug 2025 02:05:42 +0200

Eli Zaretskii wrote:
> Any details beyond that general consideration?  Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?  Do the tests you wrote fail when linked with
> MSVCRT?

Tried it now: running that unit test in the Windows UTF-8 environment,
linked against MSVCRT:

  * GetACP() returns 65001. This is not surprising, since GetACP() is a
    Windows API, not a libc API.
  * setlocale (LC_ALL, "") fails. [This was the Gnulib setlocale()
    override. I assume the MSVCRT setlocale failed in the same way.]
  * If you ignore the setlocale failure, MB_CUR_MAX is not >= 4, meaning
    that the locale encoding is not UTF-8. MSVCRT supports only
    MB_CUR_MAX == 1 or == 2.

Looking at the output of "dumpbin /imports emacs.exe", I see that the
Emacs binary uses the following symbols from MSVCRT:

   6C ___lc_codepage_func
   6F ___mb_cur_max_func
  188 _getmbcp
  240 _mbschr
  252 _mbsinc
  256 _mbslwr
  27A _mbsncpy
  27E _mbsnextc
  28C _mbspbrk
  28E _mbsrchr
  302 _snprintf
  33C _stricmp
  343 _strlwr
  34A _strnicmp
  4B1 fprintf
  4D4 isalpha
  4DC isspace
  4EB isxdigit
  4EF localeconv
  51E setlocale
  534 strerror
  535 strftime
  556 tolower
  557 toupper
  55D vfprintf

Most of these are sensitive to the locale encoding and therefore will not
produce the expected results in a UTF-8 environment.

Additionally, the Emacs binary uses several DLLs, some of which also use
locale-aware functions from libc. These DLLs will not work as expected
either.

So, the only reasonable way forward for supporting the Windows UTF-8
environment is to produce two sets of binaries for Emacs:

  - one set of .exe and .dlls linked with MSVCRT, for use on old Windows
    versions,
  - one set of .exe and .dlls linked with UCRT, for use on Windows
    versions from 2019 or newer [1].

For producing such binaries with only Free Software (no MSVC compiler, no
MSVC header files) one can use MSYS2. For a year or two now, it has
supported two target environments:

  - mingw-w64 with MSVCRT,
  - mingw-w64 with UCRT.

These two development environments are very similar, which means that the
Makefile will need very few adaptations.

Bruno

[1] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
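
For readers who want to repeat the three observations reported above (the GetACP() value, the setlocale result, and MB_CUR_MAX), here is a minimal sketch in C. It is not the Gnulib unit test referred to in the message; the helper names locale_may_be_utf8 and report_utf8_support are hypothetical, and the MB_CUR_MAX >= 4 test is only the necessary condition mentioned above, not proof that the locale encoding is UTF-8.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
# include <windows.h>   /* GetACP() is a Windows API, not a libc API */
#endif

/* Hypothetical helper: return 1 if the current C locale allows
   multibyte characters of at least 4 bytes, which a UTF-8 locale
   requires.  This is a necessary condition only.  */
static int
locale_may_be_utf8 (void)
{
  return MB_CUR_MAX >= 4;
}

/* Hypothetical helper: switch to the environment's locale, print the
   three values discussed in the message, and return 1 only if
   setlocale succeeded and MB_CUR_MAX is large enough for UTF-8.  */
static int
report_utf8_support (void)
{
  const char *name = setlocale (LC_ALL, "");

#ifdef _WIN32
  printf ("GetACP() = %u\n", GetACP ());
#endif
  printf ("setlocale (LC_ALL, \"\") = %s\n", name ? name : "(failed)");
  printf ("MB_CUR_MAX = %d\n", (int) MB_CUR_MAX);

  return name != NULL && locale_may_be_utf8 ();
}
```

Calling report_utf8_support () from main, once in a binary linked with MSVCRT and once in a binary linked with UCRT, would allow the two runtimes to be compared in the same Windows UTF-8 environment.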

This bug report was last modified 21 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
