GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>

From: Bruno Haible <bruno <at> clisp.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79296 <at> debbugs.gnu.org, corwin <at> bru.st, shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu
Subject: bug#79296: 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Date: Tue, 26 Aug 2025 23:08:16 +0200

Eli Zaretskii wrote:
> Emacs by default calls setlocale with the argument of "", thus setting
> up to use the default system locale.

OK.

> Are you saying that a call like
> 
>   setlocale (LC_TIME, "");
> 
> is insufficient to force UTF-8 encoding of time-related strings, on
> MS-Windows with the UTF-8 system-codepage feature turned on?

No, it is sufficient. With the Windows UCRT libc and the UTF-8 setting/checkbox
enabled, this call is enough to get nstrftime() to produce UTF-8 encoded output.
That's what I infer from experimenting with variations of my unit test.

On GNU systems, you will also need
   setlocale (LC_CTYPE, "");
because glibc requires that the LC_TIME and LC_CTYPE categories specify
the same encoding. (This is a kind of sanity check in glibc.)
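
As a minimal sketch of the two calls just described (not the code from my
unit test, and not what Emacs does): the helper name format_month_name is
made up for illustration, and plain strftime() stands in for gnulib's
nstrftime().

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format the full month name for MONTH
   (0 = January) into BUF, using the locale from the environment.
   Returns the number of bytes written, excluding the terminating NUL,
   or 0 if the result did not fit in SIZE bytes.  */
size_t
format_month_name (char *buf, size_t size, int month)
{
  /* Set both categories so that they specify the same encoding.
     If a call fails, setlocale leaves that category unchanged.  */
  setlocale (LC_CTYPE, "");
  setlocale (LC_TIME, "");

  struct tm tm = { 0 };
  tm.tm_mon = month;
  tm.tm_mday = 1;

  /* Assumption: standard strftime is used here in place of nstrftime.  */
  return strftime (buf, size, "%B", &tm);
}
```

The bytes stored in buf are in whatever encoding the selected locale uses,
so the caller still has to know (or query) that encoding before decoding them.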

> Can you
> try running your tests with a locale of "" and see if the codeset is
> set to UTF-8 or codepage 65001?

If I use
  setlocale (LC_ALL, "");
instead of just
  setlocale (LC_TIME, "");
then - again, in UCRT only - MB_CUR_MAX gets set to >= 4, which indicates
a UTF-8 encoding.

Even without a setlocale invocation, GetACP() returns 65001, since that's the
direct effect of the UTF-8 setting/checkbox.
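
The two checks above could be written roughly as follows. This is only a
sketch under stated assumptions: the function names are invented for
illustration, 'MB_CUR_MAX >= 4' is a heuristic (some other multibyte
encodings can reach 4 as well), and the GetACP() part only means something
on Windows.

```c
#include <locale.h>
#include <stdlib.h>
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper: switch all locale categories to LOCALE_NAME and
   return the resulting MB_CUR_MAX, or -1 if setlocale fails.  */
int
mb_cur_max_for (const char *locale_name)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -1;
  return (int) MB_CUR_MAX;
}

/* Heuristic: after setlocale (LC_ALL, ""), a value of MB_CUR_MAX >= 4
   suggests a UTF-8 encoding.  Returns 1 if so, 0 otherwise.  */
int
locale_looks_like_utf8 (void)
{
  return mb_cur_max_for ("") >= 4;
}

/* On Windows, return the active ANSI code page (65001 means UTF-8).
   On other systems there is no such notion, so return 0.  */
unsigned int
active_code_page (void)
{
#ifdef _WIN32
  return GetACP ();
#else
  return 0;
#endif
}
```

Note that active_code_page() does not depend on any setlocale call, whereas
locale_looks_like_utf8() changes the process-wide locale as a side effect.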

> > Microsoft's UCRT has many changes compared to MSVCRT, probably worth of 10 years
> > of development. Support for the UTF-8 environment is certainly only one of
> > the many improvements.
> 
> Any details beyond that general consideration?  Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?

Yes, that's what I'm saying. With MSVCRT, there is no way to get an MB_CUR_MAX
value > 2, which means no UTF-8 support.

> Do the tests you wrote fail when linked with MSVCRT?

Yes, the tests already fail at the 'MB_CUR_MAX >= 4' assertion when linked
with MSVCRT.

Bruno






