GNU bug report logs - #20789
auto-generate more Unicode data from sources

Package: emacs;

Reported by: Glenn Morris <rgm <at> gnu.org>

Date: Thu, 11 Jun 2015 22:06:02 UTC

Severity: wishlist

Found in version 25.0.50


From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: handa <at> gnu.org, 20789 <at> debbugs.gnu.org
Subject: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sat, 27 Jun 2015 10:42:51 +0300

> From: Glenn Morris <rgm <at> gnu.org>
> Cc: Kenichi Handa <handa <at> gnu.org>,  20789 <at> debbugs.gnu.org
> Date: Fri, 26 Jun 2015 22:02:36 -0400
> 
> Eli Zaretskii wrote:
> 
> >> The width 2 characters look like they might be the "W" and "F" characters,
> >
> > Yes.
> >
> >> but just doing that gives a list that has many differences to the list
> >> Emacs uses.
> 
> This is the list of "F" and "W" characters, compared to the 11 ranges that
> Emacs uses:

Looks good to me.  Each of the 11 ranges we have now is either identical
to, or coarser than, the corresponding part of the list derived from the
UCD that you show.
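For illustration, here is a minimal Python sketch of that derivation and comparison. It is not the script Glenn used. It assumes the data lines of EastAsianWidth.txt have the form "START..END;PROP # comment" or "CODE;PROP # comment" (with varying whitespace around the ';'). The names parse_wide_ranges, merge_ranges and uncovered are hypothetical. Adjacent F/W ranges are merged first, so that a coarse Emacs range can be checked for containment against them.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points


def merge_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Sort RANGES and coalesce overlapping or adjacent ones."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def parse_wide_ranges(lines: Iterable[str]) -> List[Range]:
    """Collect merged ranges whose East_Asian_Width is F or W (hypothetical helper)."""
    ranges: List[Range] = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        cps, prop = [field.strip() for field in data.split(";", 1)]
        if prop not in ("F", "W"):
            continue
        lo, _, hi = cps.partition("..")
        ranges.append((int(lo, 16), int(hi or lo, 16)))
    return merge_ranges(ranges)


def uncovered(fine: Iterable[Range], coarse: Iterable[Range]) -> List[Range]:
    """Return the ranges in FINE that no single range in COARSE contains."""
    coarse = list(coarse)
    return [(lo, hi) for lo, hi in fine
            if not any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in coarse)]


if __name__ == "__main__":
    # Abbreviated sample lines in the EastAsianWidth.txt format.
    sample = [
        "# comment line",
        "1100..115F;W  # Lo  HANGUL CHOSEONG ...",
        "1160..11FF;N  # Lo  HANGUL JUNGSEONG ...",
        "3000;F  # Zs  IDEOGRAPHIC SPACE",
        "3001..3003;W  # Po  ...",
    ]
    wide = parse_wide_ranges(sample)
    print([(hex(a), hex(b)) for a, b in wide])
```

An empty result from uncovered(ucd_list, emacs_list) corresponds to "identical or coarser": every UCD-derived range lies inside some existing Emacs range.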

> > I don't see any significant differences, except perhaps in unassigned
> > codepoints (see paragraph 6.1 of UAX#11 for the treatment of
> > unassigned CJK codepoints).
> 
> I don't know if this means that the above needs modifying?

I was saying that we need to augment the list with the 5 ranges of
unassigned codepoints in the CJK blocks and planes described in that
section of UAX#11.  An unassigned codepoint has its 'general-category'
property set to 'Cn', and the list of the 5 ranges could be in some
defconst, because it will probably never change.
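A rough Python analogue of that defconst follows. The real one would be Emacs Lisp. The five ranges are my reading of UAX #11 (CJK Unified Ideographs Extension A, CJK Unified Ideographs, CJK Compatibility Ideographs, and Planes 2 and 3), so they should be checked against the UAX #11 revision in use. UNASSIGNED_CJK_WIDE_RANGES and default_wide are hypothetical names. The general category is passed as a string such as "Cn".

```python
from typing import Tuple

# Assumed from UAX #11: unassigned (Cn) code points in these ranges
# default to East_Asian_Width "W".  Verify against the current UAX #11.
UNASSIGNED_CJK_WIDE_RANGES: Tuple[Tuple[int, int], ...] = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2FFFD),  # Plane 2
    (0x30000, 0x3FFFD),  # Plane 3
)


def default_wide(cp: int, general_category: str) -> bool:
    """True if CP is unassigned (Cn) and lies in one of the ranges above."""
    return general_category == "Cn" and any(
        lo <= cp <= hi for lo, hi in UNASSIGNED_CJK_WIDE_RANGES)
```

Assigned characters keep whatever width the F/W list gives them. This check only supplies the default for codepoints that the UCD does not yet assign.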

Thanks.

