GNU bug report logs - #20789
auto-generate more Unicode data from sources
Message #17 received at 20789 <at> debbugs.gnu.org:
> From: Glenn Morris <rgm <at> gnu.org>
> Cc: 20789 <at> debbugs.gnu.org
> Date: Mon, 15 Jun 2015 20:22:07 -0400
>
> Eli Zaretskii wrote:
>
> >> I don't suppose that big list can be auto-generated from the inputs?
> >
> > It's not trivial. I describe below some of the issues, in the hope
> > that Someone™ will volunteer:
>
> Thanks. Script that processes Blocks.txt attached. Some questions:
>
> 1. In Blocks.txt:
>
> FF00..FFEF; Halfwidth and Fullwidth Forms
>
> In Emacs:
>
> (#xFF00 #xFF5F cjk-misc)
> (#xFF61 #xFF9F kana)
> (#xFFE0 #xFFEF cjk-misc)
>
> Is ff60 (FULLWIDTH RIGHT WHITE PARENTHESIS) intentionally omitted?
AFAICT, there's a small mess around there. Based on the names of the
pertinent characters, I think we should have this instead of the above
3 ranges:
(#xFF00 #xFF60 cjk-misc)
(#xFF61 #xFF9F kana)
(#xFFA0 #xFFDF hangul)
(#xFFE0 #xFFEF cjk-misc)
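
For illustration only, a gap check along these lines would flag holes such as FF60. This is a minimal Python sketch, not the attached script; the function name `find_gaps` and the tuple layout are assumptions made up for the example.

```python
def find_gaps(block_start, block_end, ranges):
    """Return (start, end) pairs inside the block that no range covers.

    `ranges` is a list of (start, end, script) tuples, inclusive on both ends.
    """
    gaps = []
    expected = block_start  # next code point we expect some range to cover
    for start, end, _script in sorted(ranges):
        if start > expected:
            gaps.append((expected, start - 1))
        expected = max(expected, end + 1)
    if expected <= block_end:
        gaps.append((expected, block_end))
    return gaps


# The three ranges currently in Emacs for FF00..FFEF.
current = [
    (0xFF00, 0xFF5F, "cjk-misc"),
    (0xFF61, 0xFF9F, "kana"),
    (0xFFE0, 0xFFEF, "cjk-misc"),
]

# The four ranges proposed above.
proposed = [
    (0xFF00, 0xFF60, "cjk-misc"),
    (0xFF61, 0xFF9F, "kana"),
    (0xFFA0, 0xFFDF, "hangul"),
    (0xFFE0, 0xFFEF, "cjk-misc"),
]

for label, ranges in (("current", current), ("proposed", proposed)):
    gaps = find_gaps(0xFF00, 0xFFEF, ranges)
    print(label, [(f"{s:X}", f"{e:X}") for s, e in gaps])
```

With the current ranges this reports FF60 and FFA0..FFDF as uncovered; with the proposed ranges it reports nothing.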
> 2. In Emacs "olt-italic" looks like a typo ("old-italic"). Can it be renamed?
Yes, please.
> 3. In Blocks.txt, Anatolian Hieroglyphs ends at 1467F.
> In Emacs, it ends at 1457F. Typo?
Yes.
> 4. In Blocks.txt:
>
> 20000..2A6DF; CJK Unified Ideographs Extension B
> 2A700..2B73F; CJK Unified Ideographs Extension C
> 2B740..2B81F; CJK Unified Ideographs Extension D
> 2B820..2CEAF; CJK Unified Ideographs Extension E
> 2F800..2FA1F; CJK Compatibility Ideographs Supplement
>
> In Emacs:
>
> (#x20000 #x2CEAF han)
> (#x2F800 #x2FFFF han)
>
> Emacs adds the ranges 2a6e0:2a6ff and 2fa20:2ffff, which Blocks.txt does
> not cover. Intentional?
I don't know, but probably not intentional. I think we had better
make it consistent with the UCD.
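
As a rough illustration of what "consistent with the UCD" means here, the following Python sketch lists the parts of the Emacs ranges that fall outside every block in a Blocks.txt excerpt. It is not the attached script; `parse_blocks` and `uncovered` are hypothetical helper names, and the excerpt contains only the five lines quoted above.

```python
import re

# Matches data lines such as "20000..2A6DF; CJK Unified Ideographs Extension B".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$")


def parse_blocks(text):
    """Parse Blocks.txt-style lines into (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        m = BLOCK_LINE.match(line.strip())
        if m:  # comment and blank lines simply do not match
            blocks.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return blocks


def uncovered(ranges, blocks):
    """Return sub-ranges of `ranges` that lie outside every block."""
    extra = []
    for start, end in ranges:
        pos = start  # first code point not yet accounted for
        for b_start, b_end, _name in sorted(blocks):
            if b_end < pos or b_start > end:
                continue  # block does not touch the remaining part
            if b_start > pos:
                extra.append((pos, b_start - 1))
            pos = b_end + 1
        if pos <= end:
            extra.append((pos, end))
    return extra


excerpt = """\
20000..2A6DF; CJK Unified Ideographs Extension B
2A700..2B73F; CJK Unified Ideographs Extension C
2B740..2B81F; CJK Unified Ideographs Extension D
2B820..2CEAF; CJK Unified Ideographs Extension E
2F800..2FA1F; CJK Compatibility Ideographs Supplement
"""

emacs_han = [(0x20000, 0x2CEAF), (0x2F800, 0x2FFFF)]

for s, e in uncovered(emacs_han, parse_blocks(excerpt)):
    print(f"{s:X}..{e:X}")
```

On this excerpt it prints 2A6E0..2A6FF and 2FA20..2FFFF, the two stretches mentioned in the question.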
> 5. Newly added "sutton-sign-writing" - should be "sutton-signwriting"?
> (The case-insensitive source says "Sutton SignWriting".)
Well, "signwriting" is not a word, AFAIK, it's 2 words (and the funny
camel-case seems to agree with me). AFAIU, they used "SignWriting"
because it's the commercial name. But if you insist, I won't...
Thank you for doing this.
P.S. Does the script work with mawk? (Some systems have it as their
default Awk, I think.)
This bug report was last modified 9 years and 356 days ago.