#79296 - 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>


From: Bruno Haible <bruno <at> clisp.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79296 <at> debbugs.gnu.org, corwin <at> bru.st, shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu
Subject: bug#79296: 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Date: Tue, 26 Aug 2025 23:08:16 +0200

Eli Zaretskii wrote:
> Emacs by default calls setlocale with the argument of "", thus setting
> up to use the default system locale.

OK.

> Are you saying that a call like
>
>   setlocale (LC_TIME, "");
>
> is insufficient to force UTF-8 encoding of time-related strings, on
> MS-Windows with the UTF-8 system-codepage feature turned on?

No, with the Windows UCRT libc and the enabled UTF-8 setting/checkbox
this is enough to get nstrftime() to produce UTF-8 encoded output.
That's what I can infer by playing with variations of my unit test.

On GNU systems, you will also need

  setlocale (LC_CTYPE, "");

because glibc requires that the LC_TIME and LC_CTYPE categories specify
the same encoding. (This is a kind of sanity check in glibc.)

> Can you
> try running your tests with a locale of "" and see if the codeset is
> set to UTF-8 or codepage 65001?

If I use

  setlocale (LC_ALL, "");

instead of just

  setlocale (LC_TIME, "");

then - again, in UCRT only - MB_CUR_MAX gets set to >= 4, which
indicates an UTF-8 encoding.

Even without a setlocale invocation, GetACP() returns 65001, since
that's the direct effect of the UTF-8 setting/checkbox.

> > Microsoft's UCRT has many changes compared to MSVCRT, probably worth of 10 years
> > of development. Support for the UTF-8 environment is certainly only one of
> > the many improvements.
>
> Any details beyond that general consideration? Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?

Yes, that's what I'm saying. With MSVCRT, there is no way to get a
MB_CUR_MAX value > 2. Which means, no UTF-8 support.

> Do the tests you wrote fail when linked with MSVCRT?

Yes, the tests already fail at the 'MB_CUR_MAX >= 4' assertion when
linked with MSVCRT.

Bruno
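
[Editorial illustration, not part of the original message.] The checks described in the message above can be sketched in portable C. This is a minimal sketch under stated assumptions, not Bruno's actual unit test: the helper names probe_locale and format_month_name are hypothetical, standard strftime() stands in for Gnulib's nstrftime(), and the Windows-only GetACP() call is left out so the code compiles everywhere. The 'MB_CUR_MAX >= 4' comparison is the same heuristic the message uses to recognize a UTF-8 locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: switch all locale categories to `name`
   (pass "" for the system default, as Emacs does) and report
   whether the resulting locale looks like UTF-8.
   Returns 1 if MB_CUR_MAX >= 4, 0 if not, -1 if setlocale failed. */
int probe_locale(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;
    return MB_CUR_MAX >= 4 ? 1 : 0;
}

/* Hypothetical helper: write the locale's full name of month
   `month` (0 = January) into `buf`, using standard strftime()
   as a stand-in for nstrftime().  Returns the number of bytes
   written, or 0 if the result did not fit. */
size_t format_month_name(char *buf, size_t size, int month)
{
    struct tm tm;

    assert(buf != NULL && size > 0);
    memset(&tm, 0, sizeof tm);
    tm.tm_mon = month;
    tm.tm_mday = 1;
    tm.tm_year = 125;   /* years since 1900, i.e. 2025 */
    return strftime(buf, size, "%B", &tm);
}
```

Calling probe_locale("") and then format_month_name() would, per the message, be expected to yield UTF-8 bytes with UCRT when the UTF-8 setting is enabled, while with MSVCRT the probe would report a non-UTF-8 locale because MB_CUR_MAX never exceeds 2. Because the probe sets LC_ALL, it also covers the glibc requirement that LC_TIME and LC_CTYPE agree on the encoding.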

This bug report was last modified 21 days ago.
