GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config


Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>



Message #50 received at 79296 <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79296 <at> debbugs.gnu.org, corwin <at> bru.st, shingo.fg8 <at> gmail.com,
 eggert <at> cs.ucla.edu
Subject: Re: bug#79296: 30.2;
 format-time-string returns wrongly encoded string in MS Windows Japanese with
 cp65001 beta config
Date: Wed, 27 Aug 2025 02:05:42 +0200

Eli Zaretskii wrote:
> Any details beyond that general consideration?  Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?  Do the tests you wrote fail when linked with
> MSVCRT?

Tried it now: running that unit test in the Windows UTF-8 environment, linked
against MSVCRT:

  * GetACP() returns 65001. Which is not surprising, since GetACP() is a
    Windows API, not a libc API.

  * setlocale (LC_ALL, "") fails. [This was the Gnulib setlocale() override.
    I assume the MSVCRT setlocale failed in the same way.]

  * If you ignore the setlocale failure, MB_CUR_MAX is not >= 4. Meaning
    that the locale encoding is not UTF-8.

MSVCRT supports only MB_CUR_MAX == 1 or == 2.
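
For anyone who wants to repeat the three observations above, here is a
minimal sketch of such a probe. It is not the Gnulib unit test itself; the
function names probe_locale and locale_looks_utf8_capable are made up for
this illustration. It only uses GetACP(), setlocale() and MB_CUR_MAX.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper: returns 1 if the current C locale allows
   multibyte characters of 4 bytes, as a UTF-8 locale must, else 0.  */
int
locale_looks_utf8_capable (void)
{
  return MB_CUR_MAX >= 4;
}

/* Hypothetical helper: prints the active code page (Windows only),
   the result of setlocale (LC_ALL, ""), and MB_CUR_MAX.
   Returns 0 if setlocale succeeded, 1 if it failed.  */
int
probe_locale (FILE *out)
{
  const char *name;

#ifdef _WIN32
  /* GetACP() is a Windows API call, independent of the C runtime.  */
  fprintf (out, "GetACP() = %u\n", (unsigned int) GetACP ());
#endif

  name = setlocale (LC_ALL, "");
  fprintf (out, "setlocale (LC_ALL, \"\") = %s\n",
           name != NULL ? name : "(failed)");

  /* Printed even after a setlocale failure, to see what the
     runtime thinks the locale encoding is.  */
  fprintf (out, "MB_CUR_MAX = %d\n", (int) MB_CUR_MAX);

  return name == NULL;
}
```

Calling probe_locale (stdout) from main and then checking
locale_looks_utf8_capable () should be enough to compare a binary linked
against MSVCRT with one linked against UCRT.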

Looking at the output of "dumpbin /imports emacs.exe", I see that the Emacs
binary uses the following symbols from MSVCRT:

                          6C ___lc_codepage_func
                          6F ___mb_cur_max_func
                         188 _getmbcp
                         240 _mbschr
                         252 _mbsinc
                         256 _mbslwr
                         27A _mbsncpy
                         27E _mbsnextc
                         28C _mbspbrk
                         28E _mbsrchr
                         302 _snprintf
                         33C _stricmp
                         343 _strlwr
                         34A _strnicmp
                         4B1 fprintf
                         4D4 isalpha
                         4DC isspace
                         4EB isxdigit
                         4EF localeconv
                         51E setlocale
                         534 strerror
                         535 strftime
                         556 tolower
                         557 toupper
                         55D vfprintf

Most of these are sensitive to the locale encoding and therefore
will not produce the expected results for a UTF-8 environment.

Additionally, the Emacs binary uses several DLLs, some of which
also use locale-aware functions from libc. These DLLs will not
work as expected either.
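
To make the encoding sensitivity concrete for strftime, the function behind
the original report, here is a rough diagnostic sketch. It is not code from
Emacs or Gnulib; is_valid_utf8 and strftime_output_is_utf8 are hypothetical
helpers. The idea is simply to format a time and check whether the bytes
that come back are well-formed UTF-8. Bytes in a legacy encoding such as
CP932 would typically fail this check.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the N bytes at S are well-formed
   UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF),
   else 0.  */
int
is_valid_utf8 (const unsigned char *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = s[i];
      unsigned long cp;
      size_t len, k;

      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
      else
        return 0;               /* stray continuation or invalid lead byte */

      if (n - i < len)
        return 0;               /* truncated sequence */

      for (k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return 0;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if ((len == 2 && cp < 0x80)
          || (len == 3 && cp < 0x800)
          || (len == 4 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF)
          || cp > 0x10FFFF)
        return 0;

      i += len;
    }
  return 1;
}

/* Hypothetical helper: formats TM with strftime in the current locale.
   Returns 1 if the output is valid UTF-8, 0 if it is not, and -1 if
   strftime produced no output.  */
int
strftime_output_is_utf8 (const char *format, const struct tm *tm)
{
  char buf[256];
  size_t n = strftime (buf, sizeof buf, format, tm);

  if (n == 0)
    return -1;
  return is_valid_utf8 ((const unsigned char *) buf, n);
}
```

A result of 0 for a locale-dependent format such as "%A" or "%B", in a
process whose active code page is 65001, would suggest that the runtime is
still producing text in some other encoding.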

So, the only reasonable way forward, for supporting the Windows UTF-8
environment, is to produce two sets of binaries for Emacs:
  - one set of .exe and .dlls linked with MSVCRT, for use on old
    Windows versions,
  - one set of .exe and .dlls linked with UCRT, for use on Windows
    versions from 2019 or newer [1].

For producing such binaries with only Free Software (no MSVC compiler,
no MSVC header files), one can use MSYS2. It has supported two target
environments for a year or two:
  - mingw-w64 with MSVCRT,
  - mingw-w64 with UCRT.
These two development environments are very similar, which means that
the Makefile will need very few adaptations.
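
If the same sources are built for both runtimes, it can help to have the
binary report which one it was compiled against. The sketch below assumes
that the C runtime headers define the macro _UCRT when targeting UCRT; I
believe this holds for mingw-w64, but treat it as an assumption and verify
it for your toolchain. The function name crt_flavor is made up for this
example.

```c
#include <stdio.h>   /* any CRT header; _UCRT, if defined at all,
                        is assumed to come from the CRT headers */

/* Hypothetical helper: returns a short description of the C runtime
   that this translation unit was compiled against.  */
const char *
crt_flavor (void)
{
#if !defined _WIN32
  return "not a native Windows build";
#elif defined _UCRT
  return "UCRT";
#else
  return "MSVCRT (or another pre-UCRT runtime)";
#endif
}
```

Printing crt_flavor () at startup, or in a --version style output, would
make it easy to tell the two sets of binaries apart.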

Bruno

[1] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page







This bug report was last modified 21 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.