GNU bug report logs - #79296
30.2; format-time-string returns wrongly encoded string in MS Windows Japanese with cp65001 beta config
Eli Zaretskii wrote:
> Emacs by default calls setlocale with the argument of "", thus setting
> up to use the default system locale.
OK.
> Are you saying that a call like
>
> setlocale (LC_TIME, "");
>
> is insufficient to force UTF-8 encoding of time-related strings, on
> MS-Windows with the UTF-8 system-codepage feature turned on?
No. With the Windows UCRT libc and the UTF-8 setting/checkbox enabled,
this is enough to make nstrftime() produce UTF-8 encoded output.
That is what I infer from experimenting with variations of my unit test.
On GNU systems, you will also need
setlocale (LC_CTYPE, "");
because glibc requires that the LC_TIME and LC_CTYPE categories specify
the same encoding. (This is a kind of sanity check in glibc.)
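As a minimal sketch of the two-category setup described above: the helper name format_month_name is hypothetical, and standard strftime() stands in for nstrftime() so that the snippet needs nothing beyond the C library. It is an illustration under those assumptions, not the code from the unit test.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format the full month name of *TM into BUF.
   Returns the number of bytes written (excluding the NUL), or 0 if
   the result does not fit.  */
size_t
format_month_name (char *buf, size_t bufsize, const struct tm *tm)
{
  /* Adopt the user's default locale for LC_CTYPE as well as LC_TIME,
     so that both categories specify the same encoding.  If either
     call fails, the previous locale stays in effect and formatting
     proceeds with it.  */
  setlocale (LC_CTYPE, "");
  setlocale (LC_TIME, "");

  /* %B is the locale's full month name.  */
  return strftime (buf, bufsize, "%B", tm);
}
```

A caller would fill in a struct tm (for example via localtime()) and pass a buffer of a few hundred bytes; the bytes written are in whatever encoding the selected locale uses.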
> Can you
> try running your tests with a locale of "" and see if the codeset is
> set to UTF-8 or codepage 65001?
If I use
setlocale (LC_ALL, "");
instead of just
setlocale (LC_TIME, "");
then - again, in UCRT only - MB_CUR_MAX gets set to >= 4, which indicates
a UTF-8 encoding.
Even without a setlocale invocation, GetACP() returns 65001, since that's the
direct effect of the UTF-8 setting/checkbox.
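The two checks above can be sketched roughly as follows. The function names locale_looks_utf8 and active_codepage_is_utf8 are hypothetical. Note that MB_CUR_MAX >= 4 is only a heuristic: it is consistent with UTF-8 but does not prove it, since other multibyte encodings can also need 4 bytes per character.

```c
#include <locale.h>
#include <stdlib.h>
#ifdef _WIN32
# include <windows.h>
#endif

/* Hypothetical helper: switch all locale categories to LOCALE_NAME
   ("" means the user's default) and report whether MB_CUR_MAX is at
   least 4.  Returns 0 if the locale cannot be set.  */
int
locale_looks_utf8 (const char *locale_name)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return 0;
  return MB_CUR_MAX >= 4;
}

#ifdef _WIN32
/* Hypothetical helper: report whether the active ANSI code page is
   65001 (CP_UTF8).  This does not depend on any setlocale call.  */
int
active_codepage_is_utf8 (void)
{
  return GetACP () == CP_UTF8;
}
#endif
```

Calling locale_looks_utf8 ("") corresponds to the setlocale (LC_ALL, "") case above; the GetACP() check is compiled only on Windows.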
> > Microsoft's UCRT has many changes compared to MSVCRT, probably worth of 10 years
> > of development. Support for the UTF-8 environment is certainly only one of
> > the many improvements.
>
> Any details beyond that general consideration? Are you saying that
> MSVCRT doesn't support codepage 65001 as a codeset of a locale,
> whereas UCRT does?
Yes, that's what I'm saying. With MSVCRT, there is no way to get an MB_CUR_MAX
value > 2, which means no UTF-8 support.
> Do the tests you wrote fail when linked with MSVCRT?
Yes, the tests already fail at the 'MB_CUR_MAX >= 4' assertion when linked
with MSVCRT.
Bruno
This bug report was last modified 21 days ago.