GNU bug report logs - #20789
auto-generate more Unicode data from sources

Package: emacs;

Reported by: Glenn Morris <rgm <at> gnu.org>

Date: Thu, 11 Jun 2015 22:06:02 UTC

Severity: wishlist

Found in version 25.0.50


Message #41 received at 20789 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: handa <at> gnu.org, 20789 <at> debbugs.gnu.org
Subject: Re: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sat, 27 Jun 2015 10:42:51 +0300
> From: Glenn Morris <rgm <at> gnu.org>
> Cc: Kenichi Handa <handa <at> gnu.org>,  20789 <at> debbugs.gnu.org
> Date: Fri, 26 Jun 2015 22:02:36 -0400
> 
> Eli Zaretskii wrote:
> 
> >> The width 2 characters look like they might be the "W" and "F" characters,
> >
> > Yes.
> >
> >> but just doing that gives a list that has many differences to the list
> >> Emacs uses.
> 
> This is the list of "F" and "W" characters, compared to the 11 ranges that
> Emacs uses:

Looks good to me.  The 11 ranges we have now are either identical to or
coarser than the list derived from the UCD that you show.

> > I don't see any significant differences, except perhaps in unassigned
> > codepoints (see paragraph 6.1 of UAX#11 for the treatment of
> > unassigned CJK codepoints).
> 
> I don't know if this means that the above needs modifying?

I was saying that we need to augment the list with the 5 ranges of
unassigned codepoints that belong to the CJK blocks and planes, as
described in that section of UAX#11.  An unassigned codepoint has its
'general-category' property set to 'Cn', and the list of those 5 ranges
could be in some defconst, because it will probably never change.

Thanks.




This bug report was last modified 9 years and 356 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.