GNU bug report logs - #64420
string-width of … is 2 in CJK environments

Package: emacs;

Reported by: Dmitry Gutov <dmitry <at> gutov.dev>

Date: Sun, 2 Jul 2023 12:58:02 UTC

Severity: normal


From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>, Yuan Fu <casouri <at> gmail.com>
Cc: 64420 <at> debbugs.gnu.org
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Thu, 27 Jul 2023 04:52:57 +0300
On 13/07/2023 08:23, Eli Zaretskii wrote:
>> From: Yuan Fu<casouri <at> gmail.com>
>> Date: Wed, 12 Jul 2023 14:11:14 -0700
>> Cc: Eli Zaretskii<eliz <at> gnu.org>,
>>   64420 <at> debbugs.gnu.org
>>
>> Here’s what I know: In a CJK “context”, “…” is supposed to be one ideograph wide (like all CJK punctuation), i.e., width=2.
>>
>> However, it’s not as simple as “they used the wrong font”, because both Latin and CJK use the same Unicode code point for “…”, but expect different glyphs. In publication, this is solved by manually marking the text with style or font, so the software uses the desired glyph. Terminals and editors don’t have this luxury.
>>
>> BTW, it’s not just ellipses: CJK and Latin share the same code points for quotes, em dash, and middle dot, while expecting different glyphs for them.
>>
>> Since most terminals and editors (especially terminals) query ASCII/Latin fonts before falling back to CJK fonts, I expect most terminals and editors to show the Latin glyph for “…” (width=1) most of the time.
>>
>> So practically, it would be correct most of the time if we assume the following code points have a width of 1, regardless of locale:
>>
>> – HORIZONTAL ELLIPSIS …
>> – LEFT/RIGHT DOUBLE QUOTATION MARK “”
>> – LEFT/RIGHT SINGLE QUOTATION MARK ‘’
>> – EM DASH —
>> – MIDDLE DOT ·
>>
>> But obviously, if someone configures their terminal or editor to use a CJK font first, these characters MIGHT have width = 2. I said MIGHT because there are plenty of CJK fonts that use the 1-width Latin glyph for these characters by default.
>>
>> It might be helpful to have a wrapper around string-width that considers heuristics like this, while string-width itself goes strictly by Unicode and locale.
> Thanks.  My conclusion from the above is a bit different: we should
> introduce a user option to modify the behavior of
> use-cjk-char-width-table, such that users who have fonts where these
> characters are not double-width could have the width of these
> characters left at their Unicode values.

We could add an option, and then go with the default value which 
corresponds to whatever seems the common opinion here.
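
To make the idea of such an option concrete, here is a minimal sketch, not a proposed patch. The names my/cjk-narrow-punctuation and my/cjk-apply-narrow-punctuation are hypothetical, and advising use-cjk-char-width-table is only a stand-in for changing that function itself. The code points are the ones listed in the quoted message.

```elisp
;; Sketch only: `my/cjk-narrow-punctuation' and
;; `my/cjk-apply-narrow-punctuation' are hypothetical names.

(defcustom my/cjk-narrow-punctuation
  '(#x2026 #x201C #x201D #x2018 #x2019 #x2014 #x00B7)
  "Characters to keep one column wide in CJK language environments.
Set to nil to keep the CJK widths."
  :type '(repeat integer)
  :group 'i18n)

(defun my/cjk-apply-narrow-punctuation (&rest _)
  "Set the width of each char in `my/cjk-narrow-punctuation' to 1."
  (dolist (ch my/cjk-narrow-punctuation)
    (set-char-table-range char-width-table ch 1)))

;; Re-apply the override each time the CJK width table is installed.
(advice-add 'use-cjk-char-width-table :after
            #'my/cjk-apply-narrow-punctuation)
```

Note that this writes into the installed width table, so emptying the list later does not restore the wide values by itself; a real option would need to handle that.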

Anyway, it doesn't seem like anybody else in this discussion is better 
equipped to choose that user option's name, or write the rest of the patch.
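
For comparison, the wrapper approach from the quoted message could look roughly like the following. This is an illustration under assumptions: my/narrow-punctuation and my/string-width-heuristic are hypothetical names, and unlike string-width the sketch ignores compositions and display properties.

```elisp
;; Sketch only: both names below are hypothetical.

(defvar my/narrow-punctuation
  '(#x2026 #x201C #x201D #x2018 #x2019 #x2014 #x00B7)
  "Code points assumed to be displayed one column wide.")

(defun my/string-width-heuristic (string)
  "Return the width of STRING, counting listed punctuation as 1 column.
All other characters use `char-width', so they still follow the
current language environment."
  (let ((width 0))
    (dolist (ch (append string nil) width)
      (setq width (+ width
                     (if (memq ch my/narrow-punctuation)
                         1
                       (char-width ch)))))))
```

With this, (my/string-width-heuristic "a…b") counts the ellipsis as one column whether or not a CJK width table is in effect, while string-width keeps reporting the locale-dependent value.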




This bug report was last modified 2 years and 1 day ago.
