GNU bug report logs - #63029
[BUG?] format inconsistency in deciding string widths on different locales

Package: emacs

Reported by: Ruijie Yu <ruijie <at> netyu.xyz>

Date: Sun, 23 Apr 2023 10:39:02 UTC

Severity: normal

Done: Ruijie Yu <ruijie <at> netyu.xyz>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63029 in the body.
You can then email your comments to 63029 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 10:39:02 GMT)

Acknowledgement sent to Ruijie Yu <ruijie <at> netyu.xyz>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 23 Apr 2023 10:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ruijie Yu <ruijie <at> netyu.xyz>
To: bug-gnu-emacs <at> gnu.org
Subject: [BUG?] format inconsistency in deciding string widths on different
 locales
Date: Sun, 23 Apr 2023 18:23:02 +0800
Hello,

I don't quite know yet whether this is a bug in Emacs.  Here are the
observed results, and note the Unicode character:

--8<---------------cut here---------------start------------->8---
$ for locale in {en_US,fr_FR,de_DE,zh_CN,ja_JA}.UTF-8; do
    printf "$locale\t"
    LANG="$locale" src/emacs -Q -batch \
                   -eval '(message "%S" (format "%-5.5s" "1234…"))'
done
--8<---------------cut here---------------end--------------->8---

This results in the following output:

--8<---------------cut here---------------start------------->8---
en_US.UTF-8	"1234…"
fr_FR.UTF-8	"1234…"
de_DE.UTF-8	"1234…"
zh_CN.UTF-8	"1234 "
ja_JA.UTF-8	"1234 "
--8<---------------cut here---------------end--------------->8---

Notice that in zh_CN and ja_JA, we have a space instead of the expected
ellipsis character.


If this is expected behavior, how do we know how "wide" the `format'
function thinks any given character is?  In other words, why _does_ it
think "…" should be two characters wide?  And how do we, the elisp users,
get this information?  I tried to dive into the C code for
`styled_format', but got lost.  Thanks.

----------

Reproduced on this in-source build:

In GNU Emacs 30.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version
 3.24.37, cairo version 1.17.8) of 2023-04-23 built on fw.net.yu
Repository revision: 3badd2358d5f0af71887ee1cc9d39c2f312b6888
Repository branch: master
System Description: Arch Linux

Configured using:
 'configure --sysconfdir=/etc --prefix=/usr --localstatedir=/var
 --with-cairo --with-harfbuzz --with-libsystemd --with-modules
 --with-pgtk --with-native-compilation CFLAGS=-Og'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XIM GTK3 ZLIB

-- 
Best,


RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 11:04:02 GMT)

Message #8 received at 63029 <at> debbugs.gnu.org:

From: Ihor Radchenko <yantar92 <at> posteo.net>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: 63029 <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string
 widths on different locales
Date: Sun, 23 Apr 2023 11:06:38 +0000
Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs <at> gnu.org> writes:

>                    -eval '(message "%S" (format "%-5.5s" "1234…"))'
> ...
> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "

Context: https://orgmode.org/list/sdv7cu4ugk2.fsf <at> netyu.xyz

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 11:06:02 GMT)

Message #11 received at 63029 <at> debbugs.gnu.org:

From: Ihor Radchenko <yantar92 <at> posteo.net>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: 63029 <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string
 widths on different locales
Date: Sun, 23 Apr 2023 11:08:33 +0000
Ruijie Yu via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs <at> gnu.org> writes:

> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "

I can reproduce on the latest master, Emacs 28, Emacs 27, and Emacs 26.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 14:19:02 GMT)

Message #14 received at 63029 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: 63029 <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string widths
 on different locales
Date: Sun, 23 Apr 2023 17:19:10 +0300
> Date: Sun, 23 Apr 2023 18:23:02 +0800
> From:  Ruijie Yu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> I don't quite know yet whether this is a bug in Emacs.  Here are the
> observed results, and note the Unicode character:
> 
> --8<---------------cut here---------------start------------->8---
> $ for locale in {en_US,fr_FR,de_DE,zh_CN,ja_JA}.UTF-8; do
>     printf "$locale\t"
>     LANG="$locale" src/emacs -Q -batch \
>                    -eval '(message "%S" (format "%-5.5s" "1234…"))'
> done
> --8<---------------cut here---------------end--------------->8---
> 
> This results in the following output:
> 
> --8<---------------cut here---------------start------------->8---
> en_US.UTF-8	"1234…"
> fr_FR.UTF-8	"1234…"
> de_DE.UTF-8	"1234…"
> zh_CN.UTF-8	"1234 "
> ja_JA.UTF-8	"1234 "
> --8<---------------cut here---------------end--------------->8---
> 
> Notice that in zh_CN and ja_JA, we have a space instead of the expected
> ellipsis character.
> 
> 
> If this is expected behavior, how do we know how "wide" the `format'
> function thinks any given character is?  In other words, why _does_ it
> think "…" should be two characters wide?

This is a kludgey feature: in CJK locales some characters are always
considered double-width.  See code in characters.el that begins with a
comment around line 1140.  The function use-cjk-char-width-table
defined there is invoked (via the setup-function of the language
environment) when the language environment in Emacs is set to one of
those CJK locales.

The reason for this is that in CJK fonts these characters are supposed
to be rendered using full-width glyphs.

See also bug#54138 and
https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.

> And how do we, the elisp users, get this information?

I don't understand this question.  Please elaborate: what information
do you want to get, besides the width of the characters (which is
accessible via char-width-table).
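
A minimal sketch of the lookup described above, assuming the "English",
"Chinese-GB" and "Japanese" language environments are available in the
running Emacs.  The helper name my-call-in-language-environment is
hypothetical; char-width, set-language-environment and
current-language-environment are standard.  It switches the language
environment temporarily, so it is best tried in a throwaway
`emacs -Q' session:

```elisp
(defun my-call-in-language-environment (language-environment function)
  "Call FUNCTION with no arguments while LANGUAGE-ENVIRONMENT is active.
Hypothetical helper for illustration.  The previously active language
environment is restored afterwards, even if FUNCTION signals an error."
  (let ((previous current-language-environment))
    (unwind-protect
        (progn
          ;; Selecting a CJK environment runs its setup function, which
          ;; is what installs the CJK character width table.
          (set-language-environment language-environment)
          (funcall function))
      (set-language-environment previous))))

;; U+2026 HORIZONTAL ELLIPSIS is the character from the report.
(dolist (env '("English" "Chinese-GB" "Japanese"))
  (message "%-12s char-width=%d format=%S"
           env
           (my-call-in-language-environment
            env (lambda () (char-width ?\u2026)))
           (my-call-in-language-environment
            env (lambda () (format "%-5.5s" "1234\u2026")))))
```

If the width table behaves as described in this thread, the two CJK
environments should report a width of 2 for the ellipsis.  The
precision in "%-5.5s" truncates by width, so the ellipsis no longer
fits in 5 columns and is replaced by padding, as in the zh_CN and
ja_JA lines of the original report.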




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 14:30:02 GMT)

Message #17 received at 63029 <at> debbugs.gnu.org:

From: Ruijie Yu <ruijie <at> netyu.xyz>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63029 <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string
 widths on different locales
Date: Sun, 23 Apr 2023 22:23:16 +0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> If this is expected behavior, how do we know how "wide" the `format'
>> function thinks any given character is?  In other words, why _does_ it
>> think "…" should be two characters wide?
>
> This is a kludgey feature: in CJK locales some characters are always
> considered double-width.  See code in characters.el that begins with a
> comment around line 1140.  The function use-cjk-char-width-table
> defined there is invoked (via the setup-function of the language
> environment) when the language environment in Emacs is set to one of
> those CJK locales.
>
> The reason for this is that in CJK fonts these characters are supposed
> to be rendered using full-width glyphs.
>
> See also bug#54138 and
> https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.

Thanks for the link.  I have found the answer in your response there.

>> And how do we, the elisp users, get this information?
>
> I don't understand this question.  Please elaborate: what information
> do you want to get, besides the width of the characters (which is
> accessible via char-width-table).

You mentioning `char-width-table' here and `char-width' on the linked
thread precisely answered my question.  I was looking for `char-width'
without knowing its name.  Thanks.

-- 
Best,


RY




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63029; Package emacs. (Sun, 23 Apr 2023 14:33:01 GMT)

Message #20 received at 63029 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: 63029 <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string
 widths on different locales
Date: Sun, 23 Apr 2023 17:32:43 +0300
> From: Ruijie Yu <ruijie <at> netyu.xyz>
> Cc: 63029 <at> debbugs.gnu.org
> Date: Sun, 23 Apr 2023 22:23:16 +0800
> 
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > See also bug#54138 and
> > https://lists.gnu.org/archive/html/emacs-devel/2022-02/msg00917.html.
> 
> Thanks for the link.  I have found the answer in your response there.
> 
> >> And how do we, the elisp users, get this information?
> >
> > I don't understand this question.  Please elaborate: what information
> > do you want to get, besides the width of the characters (which is
> > accessible via char-width-table).
> 
> You mentioning `char-width-table' here and `char-width' on the linked
> thread precisely answered my question.  I was looking for `char-width'
> without knowing its name.  Thanks.

OK, so can we close this issue?

Btw, the recommended method of computing the width of a string is via
string-pixel-width.
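
A minimal sketch contrasting the two measurements, assuming Emacs 29.1
or later, where string-pixel-width is available.  The helper name
my-string-widths is hypothetical.  The pixel figure depends on the
fonts and display in use, so no expected output is given:

```elisp
(require 'subr-x)

(defun my-string-widths (string)
  "Return a cons cell (COLUMNS . PIXELS) for STRING.
Hypothetical helper for illustration.  COLUMNS comes from
`string-width', which consults `char-width-table'; PIXELS comes from
`string-pixel-width', which measures the text as it would be displayed."
  (cons (string-width string)
        (string-pixel-width string)))

;; The string from the original report.
(message "%S" (my-string-widths "1234\u2026"))
```

The column count changes with the language environment, while the
pixel width reflects what is actually displayed, which is presumably
why it is the recommended measurement when alignment on screen matters.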




Reply sent to Ruijie Yu <ruijie <at> netyu.xyz>:
You have taken responsibility. (Sun, 23 Apr 2023 14:40:01 GMT)

Notification sent to Ruijie Yu <ruijie <at> netyu.xyz>:
bug acknowledged by developer. (Sun, 23 Apr 2023 14:40:01 GMT)

Message #25 received at 63029-done <at> debbugs.gnu.org:

From: Ruijie Yu <ruijie <at> netyu.xyz>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63029-done <at> debbugs.gnu.org
Subject: Re: bug#63029: [BUG?] format inconsistency in deciding string
 widths on different locales
Date: Sun, 23 Apr 2023 22:38:33 +0800
Eli Zaretskii <eliz <at> gnu.org> writes:

> OK, so can we close this issue?

We can -- done.

> Btw, the recommended method of computing the width of a string is via
> string-pixel-width.

Will take a look at this function.  Thanks.

-- 
Best,


RY




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 22 May 2023 11:24:08 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.