GNU bug report logs - #20499
[PROPOSED PATCH] C-x 8 shorthands for curved quotes, Euro, etc.
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Mon, 4 May 2015 01:15:03 UTC
Severity: wishlist
Tags: patch
Merged with 16082
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #114 received at 20499 <at> debbugs.gnu.org (full text, mbox):
> From: Ivan Shmakov <ivan <at> siamics.net>
> Date: Thu, 07 May 2015 10:00:38 +0000
>
> > Although I suppose it comes from a decomposition table, I don't know
> > what the table was designed for, and it's not clear to me how it's
> > relevant.
>
> I hope someone more knowledgeable could comment on this.
I'm not sure I'm your man, or what needs to be commented on, but I
will try nonetheless ;-)
The 'decomposition property of a character (like every other property
accessed by get-char-code-property) comes directly from the Unicode
Character Database.  In this case, you will see that some characters in
UnicodeData.txt have a non-empty decomposition field (the sixth one):
1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;
                                                 ^^^^^^^^^
This gives the so-called "canonical decomposition" of the character;
in this case, we are told that U+1E99's decomposition is a sequence of
U+0079 (lower-case y) followed by U+030A (combining ring above).
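To make the field layout concrete, here is a minimal sketch in Python (not Emacs Lisp, and not how Emacs itself reads the database) that pulls the decomposition out of one UnicodeData.txt record. The helper name decomposition_field is made up for illustration; the only assumptions are that fields are separated by semicolons and that an optional tag in angle brackets may precede the code points.

```python
# One record copied from UnicodeData.txt; fields are separated by ';'.
RECORD = "1E99;LATIN SMALL LETTER Y WITH RING ABOVE;Ll;0;L;0079 030A;;;;N;;;;;"


def decomposition_field(record):
    """Return (tag, code_points) for one UnicodeData.txt line.

    Hypothetical helper for illustration only.  `tag` is None for a
    canonical decomposition, or the text inside <...> when the field
    starts with such a tag.
    """
    fields = record.split(";")
    parts = fields[5].split()          # sixth field, index 5
    tag = None
    if parts and parts[0].startswith("<"):
        tag = parts.pop(0).strip("<>")
    return tag, [int(p, 16) for p in parts]


tag, code_points = decomposition_field(RECORD)
print(tag, [f"U+{cp:04X}" for cp in code_points])
```

For the record above this yields no tag and the two code points U+0079 and U+030A, matching the field marked with carets.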
Some characters have "compatibility decompositions" instead, like
this:
1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING;Ll;0;L;<compat> 0061 02BE;;;;N;;;;;
                                                      ^^^^^^^^^^^^^^^^^^
Such decompositions are useful for collation-driven sorting and for
loose comparisons à la string-collate-lessp.
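As a rough illustration of the difference between the two kinds of decomposition, the sketch below uses Python's standard unicodedata module, which is built from the same Unicode Character Database. It is only an analogue of what get-char-code-property returns, and loose_equal is a hypothetical helper, far cruder than real collation such as string-collate-lessp.

```python
import unicodedata

# The raw decomposition field, as stored in UnicodeData.txt.
canonical = unicodedata.decomposition("\u1e99")   # y with ring above
compat = unicodedata.decomposition("\u1e9a")      # a with right half ring

# NFD applies canonical decompositions only; NFKD applies the
# compatibility ones as well.
nfd_y = unicodedata.normalize("NFD", "\u1e99")
nfd_a = unicodedata.normalize("NFD", "\u1e9a")
nfkd_a = unicodedata.normalize("NFKD", "\u1e9a")


def loose_equal(a, b):
    """Hypothetical loose comparison: equal after NFKD normalization."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


print(canonical, "|", compat)
print(loose_equal("\u1e9a", "a\u02be"))
```

Here U+1E99 splits under NFD, while U+1E9A is left alone by NFD and only splits under NFKD, which is why a loose comparison can treat it as equal to the sequence U+0061 U+02BE.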
For more details about this, see http://unicode.org/reports/tr44/, the
Unicode Technical Report that describes the Unicode Character
Database.
This bug report was last modified 4 years and 343 days ago.