GNU bug report logs -
#79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
To reply to this bug, email your comments to 79296 AT debbugs.gnu.org.
There is no need to reopen the bug first.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 02:17:02 GMT)
Acknowledgement sent to Shingo Tanaka <shingo.fg8 <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 24 Aug 2025 02:17:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
MS Windows has a UTF-8 configuration, enabled by turning on "Beta: Use Unicode UTF-8 for
worldwide language support" in its language settings. In a Japanese
environment, the system coding is cp932 (Microsoft's version of Japanese SJIS) without the
setting, but becomes cp65001 (UTF-8) with the setting on.
Emacs appears to detect the change successfully, because locale-coding-system
changes from cp932 to cp65001 as expected, and it works as expected overall.
However, I found a bug: format-time-string returns a day-of-the-week string
encoded wrongly, in cp932, even though locale-coding-system is cp65001.
Conditions:
- Windows 11 Pro 24H2 with Japanese setting and "Beta: Use Unicode UTF-8
for worldwide language support" on
- Emacs 30.2 (latest, https://ftp.gnu.org/gnu/emacs/windows/emacs-30/)
Here is how to reproduce it:
1. Run Emacs with --no-init-file (-q)
2. Go to *scratch* buffer and evaluate:
(format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
3. You will get the following wrongly encoded string:
"25,01,01 \220\205\227j\223\372"
4. evaluate:
(decode-coding-string
(format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
'cp932)
5. You will get the following correctly decoded string:
#("25,01,01 水曜日" 9 12 (charset cp932-2-byte))
This issue doesn't happen when "Beta: Use Unicode UTF-8 for worldwide language
support" is off (locale-coding-system is cp932).
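The octal escapes in step 3 are, presumably, raw bytes that Emacs could not decode as UTF-8 (cp65001). A minimal C sketch of a UTF-8 well-formedness check illustrates this: the UTF-8 encoding of 水曜日 (E6 B0 B4 E6 9B 9C E6 97 A5) passes, while the bytes from step 3 (0x90 0x85 0x97 0x6A 0x93 0xFA, which is 水曜日 in cp932) are rejected at their first byte, because 0x90 can only be a UTF-8 continuation byte. The function name `is_valid_utf8` is hypothetical; it is not part of Emacs or Gnulib.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the NUL-terminated string STR is
   well-formed UTF-8.  Rejects stray continuation bytes, truncated
   sequences, overlong forms, UTF-16 surrogates and values > U+10FFFF.  */
static bool
is_valid_utf8 (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t len = strlen (str);
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;                 /* number of continuation bytes expected */
      unsigned long cp;         /* code point being decoded */

      if (c < 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        { n = 1; cp = c & 0x1F; }
      else if (c >= 0xE0 && c <= 0xEF)
        { n = 2; cp = c & 0x0F; }
      else if (c >= 0xF0 && c <= 0xF4)
        { n = 3; cp = c & 0x07; }
      else
        return false;           /* e.g. 0x90: a continuation byte cannot lead */

      if (len - i <= n)
        return false;           /* sequence truncated */
      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;       /* expected a 10xxxxxx continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;           /* overlong, surrogate, or out of range */
      i += n + 1;
    }
  return true;
}
```

In Emacs terms, this is why decoding the same bytes with 'cp932 (step 4) recovers the text, while the default cp65001 decoding cannot.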
If you have any questions or need further information, please let me know.
Regards,
Shingo
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 06:17:02 GMT)
Message #8 received at 79296 <at> debbugs.gnu.org:
> Date: Sun, 24 Aug 2025 11:16:05 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
>
> MS Windows has a UTF-8 configuration, enabled by turning on "Beta: Use Unicode UTF-8 for
> worldwide language support" in its language settings. In a Japanese
> environment, the system coding is cp932 (Microsoft's version of Japanese SJIS) without the
> setting, but becomes cp65001 (UTF-8) with the setting on.
>
> Emacs appears to detect the change successfully, because locale-coding-system
> changes from cp932 to cp65001 as expected, and it works as expected overall.
>
> However, I found a bug: format-time-string returns a day-of-the-week string
> encoded wrongly, in cp932, even though locale-coding-system is cp65001.
>
> Conditions:
> - Windows 11 Pro 24H2 with Japanese setting and "Beta: Use Unicode UTF-8
> for worldwide language support" on
> - Emacs 30.2 (latest, https://ftp.gnu.org/gnu/emacs/windows/emacs-30/)
>
> Here is how to reproduce it:
> 1. Run Emacs with --no-init-file (-q)
> 2. Go to *scratch* buffer and evaluate:
> (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> 3. You will get the following wrongly encoded string:
> "25,01,01 \220\205\227j\223\372"
> 4. evaluate:
> (decode-coding-string
> (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> 'cp932)
> 5. You will get the following correctly decoded string:
> #("25,01,01 水曜日" 9 12 (charset cp932-2-byte))
>
> This issue doesn't happen when "Beta: Use Unicode UTF-8 for worldwide language
> support" is off (locale-coding-system is cp932).
Thanks.
I think this is an issue with Gnulib, whose nstrftime function we use
to format the time in Emacs: it seems to produce time strings encoded
in cp932 even though the UTF-8 support is turned on on MS-Windows.
I've added the Gnulib folks to the discussion.
Bruno and Paul, does Gnulib's nstrftime support the UTF-8 system
codepage on MS-Windows? I see some COMPILE_WIDE preprocessor
conditions in the source, but it is not clear to me whether it is
necessary for Unicode support, and whether strftime (as opposed to
wcsftime) from the Windows C runtime properly supports this "beta
feature". Do you happen to know?
The "\220\205\227j\223\372" bytestream shown above is AFAICT the
correct text properly encoded in cp932, so if we cannot get Windows
and Gnulib to produce a UTF-8 string in this case, we might need, as a
last resort, to use cp932 when decoding time strings, even if
locale-coding-system is UTF-8 on MS-Windows.
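The last-resort idea in the paragraph above could be expressed as a small policy function that picks the code page for decoding strftime output. This is only a sketch under stated assumptions: the function name is hypothetical, and its inputs (whether the C runtime can produce UTF-8 time strings at all, and the language's traditional ANSI code page, 932 for Japanese) would have to be determined elsewhere.

```c
#include <stdbool.h>

/* Hypothetical policy sketch: choose the code page used to decode the
   bytes returned by strftime.
     locale_cp     code page Emacs would normally use (65001 here)
     crt_has_utf8  true if the C runtime can produce UTF-8 time strings
     legacy_cp     the language's traditional ANSI code page (932 here) */
static unsigned
time_string_codepage (unsigned locale_cp, bool crt_has_utf8,
                      unsigned legacy_cp)
{
  /* If the locale is not UTF-8, or the runtime honors UTF-8, the
     locale's own code page is the right one.  */
  if (locale_cp != 65001 || crt_has_utf8)
    return locale_cp;
  /* Last resort: the runtime still encodes time strings in the legacy
     code page (cp932 in this report), so decode with that instead.  */
  return legacy_cp;
}
```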
Shingo Tanaka, could you please tell us the value of
w32-multibyte-code-page on your system, both when "Beta: Use Unicode
UTF-8 for worldwide language support" is ON and when it is OFF?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 07:14:01 GMT)
Message #11 received at 79296 <at> debbugs.gnu.org:
Hi Eli,
> > 2. Go to *scratch* buffer and evaluate:
> > (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> > 3. You will get the following wrongly encoded string:
> > "25,01,01 \220\205\227j\223\372"
> ...
> I think this is an issue with Gnulib, whose nstrftime function we use
> to format the time in Emacs: it seems to produce time strings encoded
> in cp932 even though the UTF-8 support is turned on on MS-Windows.
> I've added the Gnulib folks to the discussion.
>
> Bruno and Paul, does Gnulib's nstrftime support the UTF-8 system
> codepage on MS-Windows?
* Facts:
- Gnulib supports the UTF-8 system codepage of Windows, since 2024-12-23.
It includes some unit tests, namely gnulib/tests/*w32utf8* .
- This UTF-8 system codepage is only supported with Microsoft UCRT, not
with the MSVCRT. At compile time, this configuration can be tested via
'#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
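The compile-time test mentioned above could look like the following minimal C sketch. It assumes, as stated above, that the CRT headers of both mingw-w64 and MSVC define `_UCRT` when targeting the Universal CRT; a CRT header is included first so that the macro, if defined, is visible. The function name `crt_flavor` is hypothetical.

```c
/* Any CRT header should do; it is assumed to pull in the runtime
   configuration that defines _UCRT on UCRT toolchains.  */
#include <stdlib.h>

/* Hypothetical helper: report, at compile time, which C runtime this
   translation unit targets.  */
static const char *
crt_flavor (void)
{
#if defined _UCRT
  return "UCRT";        /* UTF-8 system code page can be honored */
#elif defined _WIN32
  return "MSVCRT";      /* or an older MSVC runtime: no UTF-8 locales */
#else
  return "not Windows";
#endif
}
```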
* Hypothesis 1:
The Gnulib support included in Emacs 30.2 is older than 2024-12-23.
* Hypothesis 2:
The Gnulib support included in Emacs 30.2 misses the commits
https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
* Hypothesis 3:
The Emacs 30.2 binaries are linked with MSVCRT, not with UCRT.
* Hypothesis 4:
Enabling the option "Beta: Use Unicode UTF-8 for worldwide language support"
has a different effect than creating a .manifest file like the Gnulib
test suite does.
<https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>
Hypothesis 4 sounds unlikely.
> I see some COMPILE_WIDE preprocessor
> conditions in the source, but it is not clear to me whether it is
> necessary for Unicode support
This COMPILE_WIDE condition is needed only by glibc for the wcsftime() function.
It is not used by Gnulib. It is not needed for i18n or Unicode support.
* Actions:
- Bruno: Add a unit test for nstrftime in w32utf8 mode.
- Eli or Paul: Disprove hypotheses 1, 2, 3.
> Shingo Tanaka, could you please tell what is the value of
> w32-multibyte-code-page on your system, both when "Beta: Use Unicode
> UTF-8 for worldwide language support" is ON and when it is OFF?
Yes, this info would be useful.
Bruno
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 07:28:02 GMT)
Message #14 received at 79296 <at> debbugs.gnu.org:
> From: Bruno Haible <bruno <at> clisp.org>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 79296 <at> debbugs.gnu.org
> Date: Sun, 24 Aug 2025 09:13:22 +0200
>
> > > 2. Go to *scratch* buffer and evaluate:
> > > (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> > > 3. You will get the following wrongly encoded string:
> > > "25,01,01 \220\205\227j\223\372"
> > ...
> > I think this is an issue with Gnulib, whose nstrftime function we use
> > to format the time in Emacs: it seems to produce time strings encoded
> > in cp932 even though the UTF-8 support is turned on on MS-Windows.
> > I've added the Gnulib folks to the discussion.
> >
> > Bruno and Paul, does Gnulib's nstrftime support the UTF-8 system
> > codepage on MS-Windows?
>
> * Facts:
> - Gnulib supports the UTF-8 system codepage of Windows, since 2024-12-23.
> It includes some unit tests, namely gnulib/tests/*w32utf8* .
> - This UTF-8 system codepage is only supported with Microsoft UCRT, not
> with the MSVCRT. At compile time, this configuration can be tested via
> '#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
>
> * Hypothesis 1:
> The Gnulib support included in Emacs 30.2 is older than 2024-12-23.
>
> * Hypothesis 2:
> The Gnulib support included in Emacs 30.2 misses the commits
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
>
> * Hypothesis 3:
> The Emacs 30.2 binaries are linked with MSVCRT, not with UCRT.
>
> * Hypothesis 4:
> Enabling the option "Beta: Use Unicode UTF-8 for worldwide language support"
> has a different effect than creating a .manifest file like the Gnulib
> test suite does.
> <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>
>
> Hypothesis 4 sounds unlikely.
>
> > I see some COMPILE_WIDE preprocessor
> > conditions in the source, but it is not clear to me whether it is
> > necessary for Unicode support
>
> This COMPILE_WIDE condition is needed only by glibc for the wcsftime() function.
> It is not used by Gnulib. It is not needed for i18n or Unicode support.
>
> * Actions:
> - Bruno: Add a unit test for nstrftime in w32utf8 mode.
> - Eli or Paul: Disprove hypotheses 1, 2, 3.
Hypothesis 3 is actually for Corwin (CC'ed), since he built that
binary.
> > Shingo Tanaka, could you please tell what is the value of
> > w32-multibyte-code-page on your system, both when "Beta: Use Unicode
> > UTF-8 for worldwide language support" is ON and when it is OFF?
>
> Yes, this info would be useful.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 08:15:01 GMT)
Message #17 received at 79296 <at> debbugs.gnu.org:
Hi Eli, Bruno,
> > Shingo Tanaka, could you please tell what is the value of
> > w32-multibyte-code-page on your system, both when "Beta: Use Unicode
> > UTF-8 for worldwide language support" is ON and when it is OFF?
>
> Yes, this info would be useful.
Here are the results. It looks like the value of w32-multibyte-code-page is wrong?
- "Beta: Use Unicode UTF-8 for worldwide language support": OFF
w32-system-coding-system
cp932
w32-multibyte-code-page
932
w32-ansi-code-page
932
- "Beta: Use Unicode UTF-8 for worldwide language support": ON
w32-system-coding-system
cp65001
w32-multibyte-code-page
0
w32-ansi-code-page
65001
Please let me know if you need further information.
--
Shingo
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 09:13:02 GMT)
Message #20 received at 79296 <at> debbugs.gnu.org:
> Date: Sun, 24 Aug 2025 17:14:37 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
> Cc: Shingo Tanaka <shingo.fg8 <at> gmail.com>,
> Paul Eggert <eggert <at> cs.ucla.edu>,
> 79296 <at> debbugs.gnu.org
>
> Hi Eli, Bruno,
>
> > > Shingo Tanaka, could you please tell what is the value of
> > > w32-multibyte-code-page on your system, both when "Beta: Use Unicode
> > > UTF-8 for worldwide language support" is ON and when it is OFF?
> >
> > Yes, this info would be useful.
>
> Here are the results. It looks like the value of w32-multibyte-code-page is wrong?
>
> - "Beta: Use Unicode UTF-8 for worldwide language support": OFF
>
> w32-system-coding-system
> cp932
>
> w32-multibyte-code-page
> 932
>
> w32-ansi-code-page
> 932
>
> - "Beta: Use Unicode UTF-8 for worldwide language support": ON
>
> w32-system-coding-system
> cp65001
>
> w32-multibyte-code-page
> 0
>
> w32-ansi-code-page
> 65001
>
> Please let me know if you need further information.
Thanks. This means w32-multibyte-code-page doesn't provide a good way
of detecting Japanese Windows where the system codepage was changed to
be UTF-8. What other aspects of your environment can be evidence that
this is the case? Please tell us what the following produce when
evaluated via "M-:" in a running Emacs session. Please show the
values both when "Beta: Use Unicode UTF-8 for worldwide language
support" is ON and when it is OFF.
(w32-get-current-locale-id)
(w32-get-locale-info (w32-get-current-locale-id))
(w32-get-default-locale-id)
(w32-get-locale-info (w32-get-default-locale-id))
(w32-get-console-codepage)
(w32-get-console-output-codepage)
Hopefully, some of these will allow us to identify the combination of
a Japanese Windows with UTF-8 as a system codepage.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 09:51:02 GMT)
Message #23 received at 79296 <at> debbugs.gnu.org:
On Mon, 25 Aug 2025 03:12:20 +0900,
Eli Zaretskii wrote:
>
> Thanks. This means w32-multibyte-code-page doesn't provide a good way
> of detecting Japanese Windows where the system codepage was changed to
> be UTF-8. What other aspects of your environment can be evidence that
> this is the case? Please tell us what the following produce when
> evaluated via "M-:" in a running Emacs session. Please show the
> values both when "Beta: Use Unicode UTF-8 for worldwide language
> support" is ON and when it is OFF.
>
> (w32-get-current-locale-id)
> (w32-get-locale-info (w32-get-current-locale-id))
> (w32-get-default-locale-id)
> (w32-get-locale-info (w32-get-default-locale-id))
> (w32-get-console-codepage)
> (w32-get-console-output-codepage)
Here you are.
- "Beta: Use Unicode UTF-8 for worldwide language support": OFF
w32-system-coding-system: cp932
w32-multibyte-code-page: 932
w32-ansi-code-page: 932
(w32-get-current-locale-id): 1041
(w32-get-locale-info (w32-get-current-locale-id)): "JPN"
(w32-get-default-locale-id): 1041
(w32-get-locale-info (w32-get-default-locale-id)): "JPN"
(w32-get-console-codepage): 932
(w32-get-console-output-codepage): 932
- "Beta: Use Unicode UTF-8 for worldwide language support": ON
w32-system-coding-system: cp65001
w32-multibyte-code-page: 0
w32-ansi-code-page: 65001
(w32-get-current-locale-id): 1041
(w32-get-locale-info (w32-get-current-locale-id)): "JPN"
(w32-get-default-locale-id): 1041
(w32-get-locale-info (w32-get-default-locale-id)): "JPN"
(w32-get-console-codepage): 65001
(w32-get-console-output-codepage): 65001
--
Shingo
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 10:42:02 GMT)
Message #26 received at 79296 <at> debbugs.gnu.org:
> From: Bruno Haible <bruno <at> clisp.org>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 79296 <at> debbugs.gnu.org
> Date: Sun, 24 Aug 2025 09:13:22 +0200
>
> * Hypothesis 1:
> The Gnulib support included in Emacs 30.2 is older than 2024-12-23.
How does one know? I see Paul last ran admin/merge-gnulib on the
emacs-30 release branch on Aug 2, 2025, but maybe this is not what I
should be looking at? In any case, Dec 2024 sounds too old even for
the release branch.
> * Hypothesis 2:
> The Gnulib support included in Emacs 30.2 misses the commits
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
> https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
These commits are in Gnulib files that are not used in Emacs. What
are their effects on the issue at hand, which is the non-ASCII strings
produced by Gnulib's nstrftime?
> - This UTF-8 system codepage is only supported with Microsoft UCRT, not
> with the MSVCRT. At compile time, this configuration can be tested via
> '#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
What is it in UCRT that is required for Gnulib to support the UTF-8
system codepage on Windows, in particular for strftime? IOW, what
does the UCRT implementation of libc do that the MSVCRT one doesn't,
that affects this aspect of Gnulib's strftime?
> * Hypothesis 4:
> Enabling the option "Beta: Use Unicode UTF-8 for worldwide language support"
> has a different effect than creating a .manifest file like the Gnulib
> test suite does.
> <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>
This is about defining a process-specific codepage, which is not what
happens in this case. So I don't think it's relevant.
> * Actions:
> - Bruno: Add a unit test for nstrftime in w32utf8 mode.
I'd be interested to see how this test works for you.
> - Eli or Paul: Disprove hypotheses 1, 2, 3.
>
> > Shingo Tanaka, could you please tell what is the value of
> > w32-multibyte-code-page on your system, both when "Beta: Use Unicode
> > UTF-8 for worldwide language support" is ON and when it is OFF?
>
> Yes, this info would be useful.
The upshot is that we can only reliably know the system's language ID
(0x11), but it is still a mystery to me where strftime got the cp932
codepage with which it encoded the time-related strings, because all the
other APIs I know of that report codepages say it's UTF-8.
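Since only the language ID is reliable, a detection heuristic could combine it with the ANSI code page. A minimal sketch, assuming the LCID is the value of (w32-get-default-locale-id) (1041 in the report) and the ANSI code page is w32-ansi-code-page (65001 with the option on): the arithmetic mirrors the Windows macros LANGIDFROMLCID and PRIMARYLANGID, reimplemented here so the code compiles anywhere, and 0x11 is LANG_JAPANESE. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Mirrors LANGIDFROMLCID (lcid & 0xFFFF) followed by
   PRIMARYLANGID (langid & 0x3FF) from the Windows headers.  */
static unsigned
primary_langid_from_lcid (unsigned long lcid)
{
  unsigned langid = (unsigned) (lcid & 0xFFFF);
  return langid & 0x3FF;
}

/* Hypothetical heuristic: Japanese Windows with the UTF-8 beta option
   turned on.  1041 == 0x0411, whose primary language is 0x11.  */
static bool
japanese_utf8_system_p (unsigned long lcid, unsigned ansi_cp)
{
  return primary_langid_from_lcid (lcid) == 0x11   /* LANG_JAPANESE */
         && ansi_cp == 65001;
}
```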
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Sun, 24 Aug 2025 15:33:02 GMT)
Message #29 received at 79296 <at> debbugs.gnu.org:
On 2025-08-24 03:41, Eli Zaretskii wrote:
>> From: Bruno Haible <bruno <at> clisp.org>
>> Date: Sun, 24 Aug 2025 09:13:22 +0200
>>
>> * Hypothesis 1:
>> The Gnulib support included in Emacs 30.2 is older than 2024-12-23.
>
> How does one know? I see Paul last ran admin/merge-gnulib on the
> emacs-30 release branch on Aug 2, 2025, but maybe this is not what I
> should be looking at? In any case, Dec 2024 sounds too old even for
> the release branch.
I ran admin/merge-gnulib on bleeding-edge Gnulib as usual, so the
release branch should have Gnulib as of August 2.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Mon, 25 Aug 2025 22:35:02 GMT)
Message #32 received at 79296 <at> debbugs.gnu.org:
> * Actions:
> - Bruno: Add a unit test for nstrftime in w32utf8 mode.
Done. The test verifies that nstrftime produces the Japanese weekday
in UTF-8 encoding. It passes, provided the locale name used is
"Japanese_Japan.65001", *not* "Japanese_Japan.932".
Eli Zaretskii wrote:
> > * Hypothesis 1:
> > The Gnulib support included in Emacs 30.2 is older than 2024-12-23.
>
> How does one know? I see Paul last ran admin/merge-gnulib on the
> emacs-30 release branch on Aug 2, 2025
With Paul's newer comment, this hypothesis is falsified.
> > * Hypothesis 2:
> > The Gnulib support included in Emacs 30.2 misses the commits
> > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
> > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
> > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
> > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
>
> These commits are in Gnulib files that are not used in Emacs. What
> are their effects on the issue at hand, which is the non-ASCII strings
> produced by Gnulib's nstrftime?
That's most likely the problem, then. For Emacs, the third commit should be
the essential one: It forces a setlocale() argument that ends in ".65001",
thus telling the Microsoft UCRT that you want the UTF-8 environment.
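Forcing the setlocale() argument to end in ".65001" amounts to rewriting the codeset part of a Windows locale name. The following C sketch is hypothetical and is not the code from the Gnulib commit referenced above; it only illustrates the string transformation, e.g. "Japanese_Japan.932" becoming "Japanese_Japan.65001". Whether UCRT accepts every name produced this way is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Gnulib's code: return NAME with any
   ".codeset" suffix replaced by ".65001" (or ".65001" appended if there
   is none).  Uses a static buffer, so it is not thread-safe; returns
   NULL if the result would not fit.  */
static const char *
force_utf8_codepage (const char *name)
{
  static char buf[128];
  const char *dot = strrchr (name, '.');
  size_t base_len = dot ? (size_t) (dot - name) : strlen (name);
  int n = snprintf (buf, sizeof buf, "%.*s.65001", (int) base_len, name);

  return (n < 0 || (size_t) n >= sizeof buf) ? NULL : buf;
}
```

On a UCRT build, one might then call setlocale (LC_ALL, force_utf8_codepage ("Japanese_Japan.932")).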
> > - This UTF-8 system codepage is only supported with Microsoft UCRT, not
> > with the MSVCRT. At compile time, this configuration can be tested via
> > '#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
>
> What is it in UCRT that is required for Gnulib to support the UTF-8
> system codepage on Windows, in particular for strftime? IOW, what
> does the UCRT implementation of libc do that the MSVCRT one doesn't,
> that affects this aspect of Gnulib's strftime?
Microsoft's UCRT has many changes compared to MSVCRT, probably worth 10 years
of development. Support for the UTF-8 environment is certainly only one of
the many improvements.
So, the remaining hypotheses are:
* Hypothesis 2:
The string that Emacs passes to the setlocale() function does not end in ".65001".
* Hypothesis 3:
The Emacs 30.2 binaries are linked with MSVCRT, not with UCRT.
-> Corwin?
Bruno
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Tue, 26 Aug 2025 11:20:01 GMT)
Message #35 received at 79296 <at> debbugs.gnu.org:
> From: Bruno Haible <bruno <at> clisp.org>
> Cc: shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu, 79296 <at> debbugs.gnu.org
> Date: Tue, 26 Aug 2025 00:34:16 +0200
>
> > * Actions:
> > - Bruno: Add a unit test for nstrftime in w32utf8 mode.
>
> Done. The test verifies that nstrftime produces the Japanese weekday
> in UTF-8 encoding. It passes, provided the locale name used is
> "Japanese_Japan.65001", *not* "Japanese_Japan.932".
Thanks. See below about that, in the context of Emacs.
> > > * Hypothesis 2:
> > > The Gnulib support included in Emacs 30.2 misses the commits
> > > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=927a70e0853345315570f051fd6996cfeb7b4d96
> > > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=9f7ff4f423cd805866cd4edef806c32393621df0
> > > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=00211fc69c926d6c8f6e3f3cf1d8802623db2af9
> > > https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=8e795a8d9f8c3269a3d30d0d1adbaf0ea9ad4a84
> >
> > These commits are in Gnulib files that are not used in Emacs. What
> > are their effects on the issue at hand, which is the non-ASCII strings
> > produced by Gnulib's nstrftime?
>
> That's most likely the problem, then. For Emacs, the third commit should be
> the essential one: It forces a setlocale() argument that ends in ".65001",
> thus telling the Microsoft UCRT that you want the UTF-8 environment.
Emacs by default calls setlocale with the argument of "", thus setting
up to use the default system locale. Are you saying that a call like
setlocale (LC_TIME, "");
is insufficient to force UTF-8 encoding of time-related strings, on
MS-Windows with the UTF-8 system-codepage feature turned on? Can you
try running your tests with a locale of "" and see if the codeset is
set to UTF-8 or codepage 65001?
> > > - This UTF-8 system codepage is only supported with Microsoft UCRT, not
> > > with the MSVCRT. At compile time, this configuration can be tested via
> > > '#ifdef _UCRT'. (This is true for both the mingw and the MSVC toolchains.)
> >
> > What is it in UCRT that is required for Gnulib to support the UTF-8
> > system codepage on Windows, in particular for strftime? IOW, what
> > does the UCRT implementation of libc do that the MSVCRT one doesn't,
> > that affects this aspect of Gnulib's strftime?
>
> Microsoft's UCRT has many changes compared to MSVCRT, probably worth 10 years
> of development. Support for the UTF-8 environment is certainly only one of
> the many improvements.
Any details beyond that general consideration? Are you saying that
MSVCRT doesn't support codepage 65001 as a codeset of a locale,
whereas UCRT does? Do the tests you wrote fail when linked with
MSVCRT?
> So, the remaining hypotheses are:
>
> * Hypothesis 2:
> The string that Emacs passes to the setlocale() function does not end in ".65001".
AFAIU, it shouldn't, not if Windows does TRT with the default locale
when the UTF-8 option is turned on.
However, since this is Emacs, Shingo Tanaka could test this by setting
the Lisp variable system-time-locale to the string
"Japanese_Japan.65001" and repeating the test presented at the
beginning of this discussion. Assuming that the build is a UCRT build
(Corwin?), this should fix the problem, if your analysis is correct.
> * Hypothesis 3:
> The Emacs 30.2 binaries are linked with MSVCRT, not with UCRT.
> -> Corwin?
Corwin?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Tue, 26 Aug 2025 14:20:02 GMT)
Message #38 received at 79296 <at> debbugs.gnu.org:
On Wed, 27 Aug 2025 05:18:34 +0900,
Eli Zaretskii wrote:
>
> However, since this is Emacs, Shingo Tanaka could test this by setting
> the Lisp variable system-time-locale to the string
> "Japanese_Japan.65001" and repeating the test presented at the
> beginning of this discussion. Assuming that the build is a UCRT build
> (Corwin?), this should fix the problem, if your analysis is correct.
Here is the result. Unfortunately it doesn't fix the issue.
(setq system-time-locale "Japanese_Japan.65001")
"Japanese_Japan.65001"
(format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
"25,01,01 \220\205\227j\223\372"
Regards,
Shingo
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Tue, 26 Aug 2025 15:51:01 GMT)
Message #41 received at 79296 <at> debbugs.gnu.org:
> Date: Tue, 26 Aug 2025 23:19:14 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
> Cc: Bruno Haible <bruno <at> clisp.org>,
> corwin <at> bru.st,
> shingo.fg8 <at> gmail.com,
> eggert <at> cs.ucla.edu,
> 79296 <at> debbugs.gnu.org
>
> On Wed, 27 Aug 2025 05:18:34 +0900,
> Eli Zaretskii wrote:
> >
> > However, since this is Emacs, Shingo Tanaka could test this by setting
> > the Lisp variable system-time-locale to the string
> > "Japanese_Japan.65001" and repeating the test presented at the
> > beginning of this discussion. Assuming that the build is a UCRT build
> > (Corwin?), this should fix the problem, if your analysis is correct.
>
> Here is the result. Unfortunately it doesn't fix the issue.
>
> (setq system-time-locale "Japanese_Japan.65001")
> "Japanese_Japan.65001"
>
> (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> "25,01,01 \220\205\227j\223\372"
OK, now let's try to establish whether your Emacs was linked against
UCRT or MSVCRT. If you have objdump.exe (part of Binutils) installed,
please do
objdump -p /path/to/emacs.exe | fgrep "DLL Name"
and see if the output includes msvcrt.dll (case-insensitive) or
ucrtbase.dll.
If you don't have objdump, try the dependency walker
(https://www.dependencywalker.com/) instead. Or Process Explorer with
its lower panel set to show DLLs. Look for msvcrt.dll or ucrtbase.dll.
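The same check can be automated by scanning the DLL names that objdump or dumpbin report. A hedged C sketch: the function names are hypothetical, and it assumes that a UCRT build imports either ucrtbase.dll or the api-ms-win-crt-*.dll forwarder DLLs (mingw-w64 UCRT toolchains typically link against the latter), while an MSVCRT build imports msvcrt.dll. Names are compared case-insensitively.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum crt_kind { CRT_UNKNOWN, CRT_MSVCRT, CRT_UCRT };

/* Case-insensitive (ASCII) prefix test; stops safely if S is shorter.  */
static bool
has_prefix_ci (const char *s, const char *prefix)
{
  for (; *prefix; s++, prefix++)
    if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
      return false;
  return true;
}

/* Hypothetical helper: guess the C runtime from the imported DLL names
   (the "DLL Name:" entries printed by objdump -p).  */
static enum crt_kind
crt_from_imports (const char *const *dlls, size_t count)
{
  enum crt_kind kind = CRT_UNKNOWN;

  for (size_t i = 0; i < count; i++)
    {
      if (has_prefix_ci (dlls[i], "ucrtbase.dll")
          || has_prefix_ci (dlls[i], "api-ms-win-crt-"))
        return CRT_UCRT;
      if (has_prefix_ci (dlls[i], "msvcrt.dll"))
        kind = CRT_MSVCRT;
    }
  return kind;
}
```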
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Tue, 26 Aug 2025 16:20:01 GMT)
Message #44 received at 79296 <at> debbugs.gnu.org:
Eli Zaretskii wrote:
> OK, now let's try to establish whether your Emacs was linked against
> UCRT or MSVCRT.
I just did that:
> - Emacs 30.2 (latest, https://ftp.gnu.org/gnu/emacs/windows/emacs-30/)
Downloaded and unpacked it, and ran
$ dumpbin /imports emacs.exe
Result: It is linked against msvcrt.dll.
So, there is no way to make these binaries work right in the UTF-8 environment
of Windows.
Bruno
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Tue, 26 Aug 2025 21:09:02 GMT)
Message #47 received at 79296 <at> debbugs.gnu.org:
Eli Zaretskii wrote:
> Emacs by default calls setlocale with the argument of "", thus setting
> up to use the default system locale.
OK.
> Are you saying that a call like
>
> setlocale (LC_TIME, "");
>
> is insufficient to force UTF-8 encoding of time-related strings, on
> MS-Windows with the UTF-8 system-codepage feature turned on?
No, with the Windows UCRT libc and the enabled UTF-8 setting/checkbox
this is enough to get nstrftime() to produce UTF-8 encoded output.
That's what I can infer by playing with variations of my unit test.
On GNU systems, you will also need
setlocale (LC_CTYPE, "");
because glibc requires that the LC_TIME and LC_CTYPE categories specify
the same encoding. (This is a kind of sanity check in glibc.)
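A minimal C sketch of this setup sets both categories before formatting. The helper `weekday_name` is hypothetical; passing "" selects the user's default locale, while the test uses "C" so that the result is predictable. Note that setlocale changes the locale of the whole process, and the returned pointer refers to a static buffer.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: return the full weekday name for WDAY (0 = Sunday)
   in locale LOCALE_NAME, or NULL on failure.  LC_CTYPE is set together
   with LC_TIME because glibc requires both categories to specify the
   same encoding.  Uses a static buffer; changes the process-wide locale.  */
static const char *
weekday_name (const char *locale_name, int wday)
{
  static char buf[64];
  struct tm tm = { 0 };

  tm.tm_wday = wday;    /* %A only looks at tm_wday */
  if (setlocale (LC_CTYPE, locale_name) == NULL
      || setlocale (LC_TIME, locale_name) == NULL)
    return NULL;
  return strftime (buf, sizeof buf, "%A", &tm) > 0 ? buf : NULL;
}
```

Going by the results described here, on a UCRT build with the UTF-8 option on, weekday_name ("", 3) on a Japanese system would be expected to return 水曜日 encoded in UTF-8.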
> Can you
> try running your tests with a locale of "" and see if the codeset is
> set to UTF-8 or codepage 65001?
If I use
setlocale (LC_ALL, "");
instead of just
setlocale (LC_TIME, "");
then - again, in UCRT only - MB_CUR_MAX gets set to >= 4, which indicates
a UTF-8 encoding.
Even without a setlocale invocation, GetACP() returns 65001, since that's the
direct effect of the UTF-8 setting/checkbox.
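The MB_CUR_MAX test used in these experiments can be sketched in C. The thresholds follow the reasoning above: a value of at least 4 is needed for UTF-8, since a UTF-8 character can take up to 4 bytes, while 2 indicates a double-byte code page such as cp932. The names are hypothetical.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

enum mb_class { MB_SINGLE_BYTE, MB_LEGACY_MULTIBYTE, MB_UTF8_CAPABLE };

/* Classify an encoding by its MB_CUR_MAX value.  */
static enum mb_class
classify_mb_cur_max (size_t mb_cur_max)
{
  if (mb_cur_max >= 4)
    return MB_UTF8_CAPABLE;       /* UTF-8 needs up to 4 bytes per char */
  if (mb_cur_max >= 2)
    return MB_LEGACY_MULTIBYTE;   /* e.g. cp932, a double-byte code page */
  return MB_SINGLE_BYTE;
}

/* Switch to the user's default locale and classify it; returns -1 if
   setlocale fails.  Changes the process-wide locale.  */
static int
user_locale_class (void)
{
  if (setlocale (LC_ALL, "") == NULL)
    return -1;
  return (int) classify_mb_cur_max (MB_CUR_MAX);
}
```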
> > Microsoft's UCRT has many changes compared to MSVCRT, probably worth 10 years
> > of development. Support for the UTF-8 environment is certainly only one of
> > the many improvements.
>
> Any details beyond that general consideration? Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?
Yes, that's what I'm saying. With MSVCRT, there is no way to get a MB_CUR_MAX
value > 2. Which means, no UTF-8 support.
> Do the tests you wrote fail when linked with MSVCRT?
Yes, the tests already fail at the 'MB_CUR_MAX >= 4' assertion when linked
with MSVCRT.
Bruno
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Wed, 27 Aug 2025 00:06:02 GMT)
Message #50 received at 79296 <at> debbugs.gnu.org:
Eli Zaretskii wrote:
> Any details beyond that general consideration? Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does? Do the tests you wrote fail when linked with
> MSVCRT?
Tried it now: running that unit test in the Windows UTF-8 environment, linked
against MSVCRT:
* GetACP() returns 65001. Which is not surprising, since GetACP() is a
Windows API, not a libc API.
* setlocale (LC_ALL, "") fails. [This was the Gnulib setlocale() override.
I assume the MSVCRT setlocale failed in the same way.]
* If you ignore the setlocale failure, MB_CUR_MAX is not >= 4. Meaning
that the locale encoding is not UTF-8.
MSVCRT supports only MB_CUR_MAX == 1 or == 2.
Looking at the output of "dumpbin /imports emacs.exe", I see that the Emacs
binary uses the following symbols from MSVCRT:
6C ___lc_codepage_func
6F ___mb_cur_max_func
188 _getmbcp
240 _mbschr
252 _mbsinc
256 _mbslwr
27A _mbsncpy
27E _mbsnextc
28C _mbspbrk
28E _mbsrchr
302 _snprintf
33C _stricmp
343 _strlwr
34A _strnicmp
4B1 fprintf
4D4 isalpha
4DC isspace
4EB isxdigit
4EF localeconv
51E setlocale
534 strerror
535 strftime
556 tolower
557 toupper
55D vfprintf
Most of these are sensitive to the locale encoding and therefore
will not produce the expected results in a UTF-8 environment.
Additionally, the Emacs binary uses several DLLs, some of which
also use locale-aware functions from libc. These DLLs will not
work as expected either.
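To make the failure mode concrete: strftime writes the weekday name as bytes in whatever multibyte encoding the C runtime's current locale uses, and the caller has to decode those bytes with that same encoding. In this bug, the bytes were CP932 while Emacs decoded them as cp65001. The sketch below assumes a POSIX system (nl_langinfo and <langinfo.h> are not available under MSVCRT) and returns the codeset the runtime claims its output is in; format_weekday is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <time.h>

/* Format the weekday name of TM into BUF (capacity BUFSIZE) using
   "%A", and return the codeset of the current locale, i.e. the
   encoding BUF's bytes are in (assuming LC_TIME and LC_CTYPE name the
   same locale, as they do after setlocale (LC_ALL, "")).
   Returns NULL if the name did not fit into BUF.  */
static const char *
format_weekday (const struct tm *tm, char *buf, size_t bufsize)
{
  assert (tm != NULL && buf != NULL && bufsize > 0);
  if (strftime (buf, bufsize, "%A", tm) == 0)
    return NULL;  /* buffer too small */
  return nl_langinfo (CODESET);
}
```

In the "C" locale this yields "Wednesday" for 2025-01-01; in a Japanese locale the same call would be expected to produce 水曜日, encoded in that locale's codeset, which is exactly what the caller must decode with.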
So, the only reasonable way forward, for supporting the Windows UTF-8
environment, is to produce two sets of binaries for Emacs:
- one set of .exe and .dlls linked with MSVCRT, for use on old
Windows versions,
- one set of .exe and .dlls linked with UCRT, for use on Windows
versions from 2019 or newer [1].
For producing such binaries with only Free Software (no MSVC compiler,
no MSVC header files) one can use MSYS2. For a year or two now,
it has supported two target environments:
- mingw-w64 with MSVCRT,
- mingw-w64 with UCRT.
These two development environments are very similar, which means that
the Makefile will need very few adaptations.
Bruno
[1] https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Wed, 27 Aug 2025 12:06:02 GMT)
Message #53 received at 79296 <at> debbugs.gnu.org:
> From: Bruno Haible <bruno <at> clisp.org>
> Cc: corwin <at> bru.st, shingo.fg8 <at> gmail.com, eggert <at> cs.ucla.edu,
> 79296 <at> debbugs.gnu.org
> Date: Wed, 27 Aug 2025 02:05:42 +0200
>
> Eli Zaretskii wrote:
> > Any details beyond that general consideration? Are you saying that
> > MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> > whereas UCRT does? Do the tests you wrote fail when linked with
> > MSVCRT?
>
> Tried it now: running that unit test in the Windows UTF-8 environment, linked
> against MSVCRT:
>
> * GetACP() returns 65001. Which is not surprising, since GetACP() is a
> Windows API, not a libc API.
>
> * setlocale (LC_ALL, "") fails. [This was the Gnulib setlocale() override.
> I assume the MSVCRT setlocale failed in the same way.]
>
> * If you ignore the setlocale failure, MB_CUR_MAX is not >= 4. Meaning
> that the locale encoding is not UTF-8.
>
> MSVCRT supports only MB_CUR_MAX == 1 or == 2.
Thanks for this info.
> Looking at the output of "dumpbin /imports emacs.exe", I see that the Emacs
> binary uses the following symbols from MSVCRT:
>
> 6C ___lc_codepage_func
> 6F ___mb_cur_max_func
> 188 _getmbcp
> 240 _mbschr
> 252 _mbsinc
> 256 _mbslwr
> 27A _mbsncpy
> 27E _mbsnextc
> 28C _mbspbrk
> 28E _mbsrchr
> 302 _snprintf
> 33C _stricmp
> 343 _strlwr
> 34A _strnicmp
> 4B1 fprintf
> 4D4 isalpha
> 4DC isspace
> 4EB isxdigit
> 4EF localeconv
> 51E setlocale
> 534 strerror
> 535 strftime
> 556 tolower
> 557 toupper
> 55D vfprintf
Those are in most cases used only when w32-unicode-filenames is turned
off, which is supposed to happen only on Windows 9X (or in debugging).
The rest are used at startup, when the system locale and the
corresponding encoding machinery are not yet set up.
But yes, if turning on this UTF-8 feature doesn't make these functions
in MSVCRT use UTF-8 as the multibyte encoding, things will fall apart
in subtle ways when non-ASCII strings are involved.
> Additionally, the Emacs binary uses several DLLs, some of which
> also use locale-aware functions from libc. These DLLs will not
> work as expected either.
That's a separate issue, and it doesn't get resolved by linking Emacs
with UCRT. That's because, AFAIK, if a DLL was linked against MSVCRT
at its build time, it will continue using MSVCRT even when called from
a program that uses UCRT. So a person who wants to use UTF-8 as the
system codepage will need to make sure _all_ of the optional libraries
used by Emacs were also linked with UCRT. Moreover, the source code
of those libraries should be UTF-8 aware. For example, it should use
multibyte-aware functions for walking a string by character, instead
of assuming that each byte is a separate character. And how many
ported Unix and GNU libraries are aware of that? As a simple example,
it's enough to have something like
  char filename[MAX_PATH];
to run the risk of blowing up the stack if the file name is non-ASCII,
encoded in UTF-8, and is long enough. (Emacs handles this particular
problem in its own code, but many external libraries don't.)
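Both pitfalls can be illustrated in portable C. The sketch below is hypothetical and not taken from Emacs or any library it uses: utf8_char_count walks a string by character by skipping UTF-8 continuation bytes instead of treating every byte as a character, and copy_name_checked refuses to copy a name whose UTF-8 byte length does not fit into a fixed buffer instead of overflowing it. PATH_BUF_SIZE = 260 is an assumption chosen to mirror the value of the Windows MAX_PATH macro; note that a Japanese character typically takes 3 bytes in UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the value of the Windows MAX_PATH macro (an assumption for
   illustration; this is not the <windows.h> definition).  */
#define PATH_BUF_SIZE 260

/* Count characters in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx).  Assumes the input
   is valid UTF-8.  */
static size_t
utf8_char_count (const char *s)
{
  size_t n = 0;
  for (const unsigned char *p = (const unsigned char *) s; *p; p++)
    if ((*p & 0xC0) != 0x80)
      n++;
  return n;
}

/* Copy NAME into DST (capacity PATH_BUF_SIZE bytes) only if its byte
   length plus the terminating NUL fits; otherwise leave DST untouched
   and return false instead of writing past the end of the buffer.  */
static bool
copy_name_checked (char dst[PATH_BUF_SIZE], const char *name)
{
  assert (dst != NULL && name != NULL);
  size_t len = strlen (name);  /* bytes, not characters */
  if (len + 1 > PATH_BUF_SIZE)
    return false;
  memcpy (dst, name, len + 1);
  return true;
}
```

For example, a name of 100 such characters is only 100 characters long but needs 300 bytes in UTF-8, so it no longer fits into a 260-byte buffer.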
> So, the only reasonable way forward, for supporting the Windows UTF-8
> environment, is to produce two sets of binaries for Emacs:
> - one set of .exe and .dlls linked with MSVCRT, for use on old
> Windows versions,
> - one set of .exe and .dlls linked with UCRT, for use on Windows
> versions from 2019 or newer [1].
The Emacs project doesn't produce binaries. That is left to distros.
The MS-Windows binaries on the GNU FTP site are produced by Corwin, who
volunteered for this job, so it is up to him what he wants to support
and how much he would agree to complicate his job. Windows versions
before Vista (perhaps even before Windows 8.1) are already unsupported
by those binaries, since MSYS2 dropped support for them, so the resulting binaries
depend on APIs and DLLs that older systems don't have, and will thus
refuse to run on those older systems.
In addition, linking Emacs itself against UCRT is not enough, see
above.
For these reasons, I stand by my opinion that UTF-8 support on Windows
is not yet ready for prime time, and advise against turning it on if
one wants to use Emacs reliably on MS-Windows. MS knew what they were
doing when they designated this feature "Beta".
As a stopgap, we could introduce Windows-specific variables in Emacs
through which users could specify the encoding to decode time strings
and perhaps other strings if needed, instead of automatically falling
back on locale-coding-system. Then users like Shingo Tanaka could say
(setq w32-time-coding-system 'cp932)
and have the time strings decoded correctly.
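The stopgap amounts to decoding the strftime bytes with an explicitly chosen encoding rather than the locale's. Note that w32-time-coding-system is only a proposal here, and Emacs would do the decoding with its own coding systems (as decode-coding-string does in the original report), not with iconv. Purely as an illustration outside Emacs, the following POSIX C sketch converts the CP932 weekday bytes from the original report into UTF-8 with iconv(3). It assumes an iconv that recognizes the name "CP932" (glibc's usually does); if iconv_open fails, the hypothetical helper decode_time_string simply reports failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert LEN bytes at IN, encoded in FROM_CODE (e.g. "CP932"), to
   NUL-terminated UTF-8 in OUT (capacity OUTSIZE).  Returns the number
   of UTF-8 bytes written, or -1 if the conversion is unsupported, the
   input is invalid or incomplete, or OUT is too small.  */
static long
decode_time_string (const char *from_code, const char *in, size_t len,
                    char *out, size_t outsize)
{
  assert (from_code != NULL && in != NULL && out != NULL && outsize > 0);
  iconv_t cd = iconv_open ("UTF-8", from_code);
  if (cd == (iconv_t) -1)
    return -1;  /* this iconv does not know FROM_CODE */

  char *inp = (char *) in;       /* iconv's prototype takes char ** */
  char *outp = out;
  size_t inleft = len;
  size_t outleft = outsize - 1;  /* reserve room for the NUL */
  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  if (rc != (size_t) -1)
    rc = iconv (cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
  iconv_close (cd);
  if (rc == (size_t) -1 || inleft != 0)
    return -1;
  *outp = '\0';
  return (long) (outp - out);
}
```

Fed the bytes "\220\205\227j\223\372" from the report with FROM_CODE "CP932", this should produce the 9-byte UTF-8 string 水曜日, matching the decode-coding-string result shown at the start of the thread.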
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79296; Package emacs. (Wed, 27 Aug 2025 13:55:02 GMT)
Message #56 received at 79296 <at> debbugs.gnu.org:
On Wed, 27 Aug 2025 18:05:42 +0900,
Bruno Haible wrote:
>
> For producing such binaries with only Free Software (no MSVC compiler,
> no MSVC header files) one can use MSYS2. For a year or two already
> it supports two target environments:
> - mingw-w64 with MSVCRT,
> - mingw-w64 with UCRT.
> These two development environments are very similar, which means that
> the Makefile will need very few adaptations.
I've just installed the UCRT build of Emacs from MSYS2 and ran it with no init file.
~> pacman -S mingw-w64-ucrt-x86_64-emacs
~> /ucrt64/bin/runemacs --no-init-file
I confirmed that the issue I reported does not occur, even with
"Beta: Use Unicode UTF-8 for worldwide language support" on.
In the *scratch* buffer:
(format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
"25,01,01 水曜日"
w32-system-coding-system
cp65001
w32-multibyte-code-page
0
w32-ansi-code-page
65001
(w32-get-current-locale-id)
1041
(w32-get-locale-info (w32-get-current-locale-id))
"JPN"
(w32-get-default-locale-id)
1041
(w32-get-locale-info (w32-get-default-locale-id))
"JPN"
(w32-get-console-codepage)
65001
(w32-get-console-output-codepage)
65001
Regards,
Shingo
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Sat, 30 Aug 2025 09:26:01 GMT)
Notification sent to Shingo Tanaka <shingo.fg8 <at> gmail.com>: bug acknowledged by developer. (Sat, 30 Aug 2025 09:26:02 GMT)
Message #61 received at 79296-done <at> debbugs.gnu.org:
> Date: Wed, 27 Aug 2025 22:54:30 +0900
> From: Shingo Tanaka <shingo.fg8 <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,
> corwin <at> bru.st,
> shingo.fg8 <at> gmail.com,
> eggert <at> cs.ucla.edu,
> 79296 <at> debbugs.gnu.org
>
> On Wed, 27 Aug 2025 18:05:42 +0900,
> Bruno Haible wrote:
> >
> > For producing such binaries with only Free Software (no MSVC compiler,
> > no MSVC header files) one can use MSYS2. For a year or two already
> > it supports two target environments:
> > - mingw-w64 with MSVCRT,
> > - mingw-w64 with UCRT.
> > These two development environments are very similar, which means that
> > the Makefile will need very few adaptations.
>
> I've just installed MSYS2 Emacs of UCRT version and run it with no init file.
>
> ~> pacman -S mingw-w64-ucrt-x86_64-emacs
> ~> /ucrt64/bin/runemacs --no-init-file
>
> And confirmed the issue I reported doesn't happen even with
> "Beta: Use Unicode UTF-8 for worldwide language support" on.
>
> In *scratch* buffer:
>
> (format-time-string "%y,%d,%m %A" (date-to-time (concat "2025,1,Jan")))
> "25,01,01 水曜日"
>
> w32-system-coding-system
> cp65001
>
> w32-multibyte-code-page
> 0
>
> w32-ansi-code-page
> 65001
>
> (w32-get-current-locale-id)
> 1041
>
> (w32-get-locale-info (w32-get-current-locale-id))
> "JPN"
>
> (w32-get-default-locale-id)
> 1041
>
> (w32-get-locale-info (w32-get-default-locale-id))
> "JPN"
>
> (w32-get-console-codepage)
> 65001
>
> (w32-get-console-output-codepage)
> 65001
Thanks. I've now added a new section to the Emacs w32 FAQ about these
issues, and I'm therefore closing this bug.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.