GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config

Package: emacs;

Reported by: Shingo Tanaka <shingo.fg8 <at> gmail.com>

Date: Sun, 24 Aug 2025 02:17:02 UTC

Severity: normal

Found in version 30.2

Done: Eli Zaretskii <eliz <at> gnu.org>

From: Eli Zaretskii <eliz <at> gnu.org>
To: Bruno Haible <bruno <at> clisp.org>, corwin <at> bru.st, shingo.fg8 <at> gmail.com
Cc: 79296 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: bug#79296: 30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Date: Tue, 26 Aug 2025 14:18:34 +0300

> From: Bruno Haible <bruno <at> clisp.org>
> Cc: shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu, 79296 <at> debbugs.gnu.org
> Date: Tue, 26 Aug 2025 00:34:16 +0200
> 
> > * Actions:
> >   - Bruno: Add a unit test for nstrftime in w32utf8 mode.
> 
> Done. The test verifies that nstrftime produces the Japanese weekday
> in UTF-8 encoding. It passes, provided the locale name used is
> "Japanese_Japan.65001", *not* "Japanese_Japan.932".

Thanks.  See below about that, in the context of Emacs.

> > > * Hypothesis 2:
> > >   The Gnulib support included in Emacs 30.2 misses the commits
> > >   https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
> > >   https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
> > >   https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
> > >   https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
> > 
> > These commits are in Gnulib files that are not used in Emacs.  What
> > are their effects on the issue at hand, which is the non-ASCII strings
> > produced by Gnulib's nstrftime?
> 
> That's most likely the problem, then. For Emacs, the third commit should be
> the essential one: It forces a setlocale() argument that ends in ".65001",
> thus telling the Microsoft UCRT that you want the UTF-8 environment.

Emacs by default calls setlocale with the argument of "", thus setting
up to use the default system locale.  Are you saying that a call like

  setlocale (LC_TIME, "");

is insufficient to force UTF-8 encoding of time-related strings, on
MS-Windows with the UTF-8 system-codepage feature turned on?  Can you
try running your tests with a locale of "" and see if the codeset is
set to UTF-8 or codepage 65001?
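
The check asked for here can be sketched in standard C, as below. This is
only a sketch: the names locale_name_ends_with and
probe_default_time_locale are hypothetical, and the runtime may spell the
codeset differently from ".65001" in the name it returns, so the printed
locale name is what should be read.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if NAME ends with SUFFIX, e.g. ".65001".
   A NULL NAME (setlocale failure) never matches.  */
int
locale_name_ends_with (const char *name, const char *suffix)
{
  size_t n, s;

  assert (suffix != NULL);
  if (name == NULL)
    return 0;
  n = strlen (name);
  s = strlen (suffix);
  return n >= s && strcmp (name + n - s, suffix) == 0;
}

/* Select the LC_TIME locale from the environment, as the call quoted
   above does, and print the name the C runtime reports for it.  */
const char *
probe_default_time_locale (void)
{
  const char *name = setlocale (LC_TIME, "");

  printf ("LC_TIME locale: %s\n", name ? name : "(setlocale failed)");
  printf ("name ends in .65001: %s\n",
          locale_name_ends_with (name, ".65001") ? "yes" : "no");
  return name;
}
```

Calling probe_default_time_locale () from main on the affected system
would show which locale name the default "" argument resolves to.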

> > >   - This UTF-8 system codepage is only supported with Microsoft UCRT, not
> > >     with the MSVCRT. At compile time, this configuration can be tested via
> > >     '#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
> > 
> > What is it in UCRT that is required for Gnulib to support the UTF-8
> > system codepage on Windows, in particular for strftime?  IOW, what
> > does the UCRT implementation of libc do that the MSVCRT one doesn't,
> > that affects this aspect of Gnulib's strftime?
> 
> Microsoft's UCRT has many changes compared to MSVCRT, probably worth 10 years
> of development. Support for the UTF-8 environment is certainly only one of
> the many improvements.

Any details beyond that general consideration?  Are you saying that
MSVCRT doesn't support codepage 65001 as a codeset of a locale,
whereas UCRT does?  Do the tests you wrote fail when linked with
MSVCRT?
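
The compile-time test mentioned above ('#ifdef _UCRT') could be used
roughly as follows to report which runtime a binary was built against.
This is a sketch under assumptions: crt_flavor is a hypothetical name, and
it assumes _UCRT becomes visible once a C runtime header such as
<stdio.h> has been included.

```c
#include <stdio.h>  /* Included first, so the runtime's macros are visible.  */

/* Report which C runtime this translation unit was compiled against,
   based only on the _UCRT macro named in the discussion.  */
const char *
crt_flavor (void)
{
#ifdef _UCRT
  return "UCRT";
#elif defined _WIN32
  return "Windows, but not UCRT (presumably MSVCRT)";
#else
  return "not a Windows C runtime";
#endif
}
```

Printing crt_flavor () from a test program built with the same toolchain
and flags as the Emacs binaries is one way to approach Hypothesis 3. It
says nothing about a binary that was built elsewhere.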

> So, the remaining hypotheses are:
> 
> * Hypothesis 2:
>   The string that Emacs passes to the setlocale() function does not end in ".65001".

AFAIU, it shouldn't, not if Windows does TRT with the default locale
when the UTF-8 option is turned on.

However, since this is Emacs, Shingo Tanaka could test this by setting
the Lisp variable system-time-locale to the string
"Japanese_Japan.65001" and repeating the test presented at the
beginning of this discussion.  Assuming that the build is a UCRT build
(Corwin?), this should fix the problem, if your analysis is correct.
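
Outside Emacs, the same experiment can be approximated in C by naming the
locale explicitly and dumping the bytes of the weekday name. This is a
sketch only: the function names are hypothetical, and it calls the C
runtime's strftime rather than Gnulib's nstrftime, so it exercises the
runtime's behavior, not Gnulib's.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format the full weekday name for Sunday, 2025-08-24, under
   LOCALE_NAME.  Return the number of bytes stored in BUF, or -1 if
   the C runtime does not accept LOCALE_NAME.  */
int
weekday_name_in_locale (const char *locale_name, char *buf, size_t bufsize)
{
  struct tm tm;

  if (setlocale (LC_TIME, locale_name) == NULL)
    return -1;
  memset (&tm, 0, sizeof tm);
  tm.tm_year = 2025 - 1900;
  tm.tm_mon = 7;   /* August */
  tm.tm_mday = 24;
  tm.tm_wday = 0;  /* Sunday */
  return (int) strftime (buf, bufsize, "%A", &tm);
}

/* Print each byte of S as two hex digits, to make the encoding visible.  */
void
print_hex (const char *s)
{
  for (; *s != '\0'; s++)
    printf ("%02X ", (unsigned char) *s);
  putchar ('\n');
}
```

Passing "Japanese_Japan.65001" and then "" as the locale name and
comparing the hex dumps would show whether the two produce the same
bytes. For reference, the UTF-8 encoding of 日曜日 (Sunday) is
E6 97 A5 E6 9B 9C E6 97 A5.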

> * Hypothesis 3:
>   The Emacs 30.2 binaries are linked with MSVCRT, not with UCRT.
>   -> Corwin?

Corwin?




This bug report was last modified 21 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.