GNU bug report logs - #64420
string-width of … is 2 in CJK environments

Package: emacs

Reported by: Dmitry Gutov <dmitry <at> gutov.dev>

Date: Sun, 2 Jul 2023 12:58:02 UTC

Severity: normal

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 64420 <at> debbugs.gnu.org
Subject: bug#64420: string-width of … is 2 in CJK environments
Date: Tue, 11 Jul 2023 14:41:40 +0300

> Date: Tue, 11 Jul 2023 05:13:57 +0300
> Cc: 64420 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> >> Would it be appropriate to fix the entry for … in that table either way?
> > 
> > "Fix" in what way?  In most language-environments we get
> > 
> >    (char-width ?…) => 1
> > 
> > What's wrong with that?
> 
> It returns 2 in Chinese-BIG5, while the actual metrics of the char don't 
> change.

I explained why this happens and why Emacs works that way.  If
something in my explanation is unclear, please ask more specific
questions.

> >> Or does that not match the principle with which those entries are done?
> > 
> > Sorry, I don't understand the question: what principle are you talking
> > about?
> 
> The principles by which we fill in said char-table, which we populate "by 
> hand": e.g. which characters to include, and which to leave with 
> "automatic" metrics.

We fill the table by hand, but the data is synchronized with the
Unicode Standard, and is reviewed each time we import a new Unicode
version.  The tweaking of the char-width tables in CJK locales is due
to the issue I explained in my previous message:

> >>> In CJK locales, most characters are double-width because those locales
> >>> use fonts where the glyphs are wider.  Or at least this is the theory.
> >>> string-pixel-width is free from these assumptions because it actually
> >>> measures the font glyphs.

> What I meant is, string-pixel-width must be slower than string-width 
> because it uses a temp buffer and actual measurements, whereas the 
> latter function only does a table lookup, more or less (N times).

It is slower, yes, but much more accurate.  TANSTAAFL.
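
For illustration only (this is not part of the original message), here
is a minimal Emacs Lisp sketch of the trade-off discussed above.
string-width and char-width answer from the width tables, while
string-pixel-width measures the text with the current font.  The
helper name my/width-report is hypothetical, and the sketch assumes
Emacs 29 or later, where string-pixel-width is available.

```elisp
(require 'subr-x)  ; assumption: string-pixel-width is provided here (Emacs 29+)

(defun my/width-report (str)
  "Return a plist comparing table-based and measured widths of STR.
`my/width-report' is a hypothetical helper, not part of Emacs."
  (list
   ;; Columns according to the char-width tables; this is the value
   ;; that depends on the language environment.
   :columns (string-width str)
   ;; Measured width in pixels; only attempted on a graphical display.
   :pixels (and (fboundp 'string-pixel-width)
                (display-graphic-p)
                (string-pixel-width str))
   ;; Pixel width of one default-font column, for comparison.
   :column-pixels (and (display-graphic-p)
                       (frame-char-width))))

;; Example call; the result depends on the language environment and font:
;; (my/width-report "…")
```

On a graphical frame, comparing :pixels with :column-pixels shows
whether the font really draws the character wider than one column,
independently of what the width table says.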

This bug report was last modified 2 years and 2 days ago.
