GNU bug report logs - #1399
23.0.60; Some Unicode glyphs incorrectly mapped to CJK


Package: emacs;

Reported by: Ian Eure <ian <at> digg.com>

Date: Thu, 20 Nov 2008 23:40:04 UTC

Severity: normal

Tags: notabug, wontfix

Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 1399 in the body.
You can then email your comments to 1399 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#1399; Package emacs. Full text and rfc822 format available.

Acknowledgement sent to Ian Eure <ian <at> digg.com>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Ian Eure <ian <at> digg.com>
To: emacs-pretest-bug <at> gnu.org
Subject: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
Date: Thu, 20 Nov 2008 15:33:47 -0800
It seems that some Unicode glyphs are incorrectly categorized.

For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE
QUOTATION MARK) are all mapped into the CJK category. This results in
the use of the STHeiti font for those characters, which have a
different width than the normal font I've chosen.

I think it's incorrect for them to be categorized as CJK, since they
are widely used in Latin scripts.


In GNU Emacs 23.0.60.1 (i386-apple-darwin9.5.0, NS apple-appkit-949.35)
 of 2008-11-20 on neutron.local
Windowing system distributor `Apple', version  
97.112.112.108.101.45.97.112.112.107.105.116.45.57.52.57.46.51.53
configured using `configure  '--with-ns''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  value of $XMODIFIERS: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Major mode: Help

Minor modes in effect:
  ime-bindings: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t
  view-mode: t

Recent input:
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n
C-n C-n C-x b * <backspace> f o o <return> <return>
<backspace> C-x k RET C-x b f o <tab> <return> C-a
C-k C-p C-f C-f C-k C-a C-f C-a M-x d e s c r i b e
- c h a r <tab> <return> w C-_ C-x o C-n C-e C-b C-b
C-b C-b <return> C-c C-b C-b C-p C-b C-b C-b C-b C-b
C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b C-b
C-b C-b C-b C-b M-> C-p C-p C-p C-p C-p C-n C-n C-e
C-a C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-n C-n C-n C-n C-n C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p M-p M-p C-p C-p C-p
C-p C-p <help-echo> <down-mouse-1> <mouse-movement>
<drag-mouse-1> C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-a
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-p <help-echo> <down-mouse-1>
<down-mouse-1> <mouse-1> <wheel-up> <double-wheel-up>
<down-mouse-1> <mouse-1> <wheel-up> <wheel-down> <double-wheel-down>
<triple-wheel-down> <triple-wheel-down> <triple-wheel-down>
<triple-wheel-down> <triple-wheel-down> <triple-wheel-down>
<triple-wheel-up> <triple-wheel-up> <triple-wheel-up>
<triple-wheel-up> <triple-wheel-up> <triple-wheel-up>
<triple-wheel-up> <triple-wheel-up> <down-mouse-1>
<mouse-1> <help-echo> <down-mouse-1> <mouse-1> C-x
b f o n t <tab> <return> C-x 1 C-v C-v C-v M-v C-v
C-v C-v M-v M-v M-v C-x b <return> C-x b <return> C-x
b * H <tab> <return> C-p C-p C-p C-p C-p C-p C-p C-p
C-p C-p C-p C-p C-p C-n C-a C-SPC C-e M-w <menu-bar>
<help-menu> <send-emacs-bug-report>

Recent messages:
uncompressing fontset.el.gz...done
call-interactively: Beginning of buffer
Type C-x 1 to delete the help window.
Undo!
Mark set
Auto-saving...done
byte-code: Beginning of buffer [3 times]
byte-code: End of buffer [6 times]
byte-code: Beginning of buffer [7 times]
Mark set




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#1399; Package emacs. Full text and rfc822 format available.

Acknowledgement sent to Chong Yidong <cyd <at> stupidchicken.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. Full text and rfc822 format available.

Message #10 received at 1399 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa  <handa <at> m17n.org>
Cc: 1399 <at> debbugs.gnu.org
Subject: Re: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
Date: Sat, 29 Nov 2008 21:26:05 -0500
Hi Handa-san,

Could you take a look at this bug report?

Ian Eure <ian <at> digg.com> wrote:

> It seems that some Unicode glyphs are incorrectly categorized.
>
> For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE
> QUOTATION MARK) are all mapped into the CJK category. This results in
> the use of the STHeiti font for those characters, which have a
> different width than the normal font I've chosen.
>
> I think it's incorrect for them to be categorized as CJK, since they
> are widely used in Latin scripts.




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#1399; Package emacs. (Wed, 17 Dec 2008 06:43:15 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Tue, 17 Mar 2009 02:10:05 GMT) Full text and rfc822 format available.

Message #15 received at 1399 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: 1399 <at> debbugs.gnu.org
Cc: ian <at> digg.com
Subject: 23.0.60; Some Unicode glyphs incorrectly mapped to CJK
Date: Tue, 17 Mar 2009 11:00:30 +0900
Sorry for the late response.

> It seems that some Unicode glyphs are incorrectly categorized.
> 
> For example, U+201C, U+201D, U+2018, U+2019 (LEFT/RIGHT SINGLE/DOUBLE
> QUOTATION MARK) are all mapped into the CJK category. This results in
> the use of the STHeiti font for those characters, which have a
> different width than the normal font I've chosen.

A character's category doesn't affect font selection.

As all of those characters belong to the `symbol' script, Emacs
first lists fonts that have at least one of #x201C, #x2200,
#x2500 (see script-representative-chars), and then selects the
one that best matches your default font's family, foundry,
etc.
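
The lookup described above can be inspected from Lisp. This is a minimal sketch, assuming an Emacs 23 or later session; the values returned depend on the Emacs version, so none are shown here:

```elisp
;; Script that Emacs assigns to U+201C (LEFT DOUBLE QUOTATION MARK).
;; `char-script-table' is a char-table indexed by character code.
(aref char-script-table #x201C)

;; Representative characters Emacs uses when listing candidate
;; fonts for the `symbol' script.
(cdr (assq 'symbol script-representative-chars))
```

Evaluating each form with C-x C-e (or in M-: ) shows which script a character falls under and which characters a candidate font is tested against.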

In your case, perhaps all the listed fonts have a different
family, foundry, etc. than the default font, and thus Emacs
selects an arbitrary one from the listed fonts.

Currently, Emacs can't know which kind of font is more
suitable for those characters: a font that has double-width
glyphs for them, or a font that has single-width glyphs.

So, if you prefer a specific font for symbol characters, you
must modify the default fontset (or whatever fontset you are
using) for symbol characters, for example, like this:

(set-fontset-font
  "fontset-default"
  'symbol
  '("FAMILYNAME" . "iso10646-1"))
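
A narrower variant of the same idea is to override only the quotation-mark range rather than the whole `symbol' script. This is a sketch, not a tested recommendation: the family name below is a placeholder assumption and must be replaced with a font that is actually installed and has single-width glyphs for these characters.

```elisp
;; Assumption: "DejaVu Sans Mono" is only a placeholder family name.
;; The target is a (FROM . TO) character range, so this affects
;; U+2018 through U+201D and leaves other symbol characters alone.
(set-fontset-font
 "fontset-default"
 '(#x2018 . #x201D)
 (font-spec :family "DejaVu Sans Mono"))
```

Placing the form in the init file applies it at startup; M-x describe-char on one of the affected characters then reports which font is in use.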

> I think it's incorrect for them to be categorized as CJK, since they
> are widely used in Latin scripts.

Character categories are not exclusive.  Even if a character
has a CJK category, that doesn't mean the character is not
also Latin.
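
The non-exclusive nature of categories can be checked directly. A small sketch, assuming the standard category table, in which the mnemonic `l' stands for Latin; the exact set of mnemonics returned varies between Emacs versions:

```elisp
;; All category mnemonics assigned to U+201C, as a string.
;; A character may carry several categories at once.
(category-set-mnemonics (char-category-set #x201C))

;; Non-nil if U+201C is (also) in the Latin category.
;; A category set is a bool-vector indexed by mnemonic character.
(aref (char-category-set #x201C) ?l)
```

M-x describe-categories lists what each mnemonic means in the current category table.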

---
Kenichi Handa
handa <at> m17n.org




Tags added: wontfix, notabug. Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> emacsbugs.donarmstrong.com. (Tue, 17 Mar 2009 14:10:04 GMT) Full text and rfc822 format available.

Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#1399; Package emacs. (Thu, 16 Apr 2009 10:50:06 GMT) Full text and rfc822 format available.

Message #20 received at control <at> debbugs.gnu.org (full text, mbox):

From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
To: control <at> debbugs.gnu.org
Subject: control message for bug #1399
Date: Tue, 02 Aug 2011 18:04:27 +0200
close 1399 




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 31 Aug 2011 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 353 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.