GNU bug report logs -
#16216
24.3.50; <control> entries in `ucs-names'
Reported by: Drew Adams <drew.adams <at> oracle.com>
Date: Sun, 22 Dec 2013 02:10:01 UTC
Severity: normal
Found in version 24.3.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#16216: 24.3.50; <control> entries in `ucs-names'
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 16216 <at> debbugs.gnu.org.
--
16216: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16216
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
>
> > Look at UnicodeData.txt, near the beginning of the file.
>
> I see; thanks. And I recall now that you pointed me to that
> file once before.
>
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.
I looked deeper and decided that this was a bug. The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels. The 'name' property cannot have lower-case characters or
"<>" in it anyway.
So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match". (Some of the control characters have an 'old-name' property, so
they can still be called up by name.)
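The distinction drawn above, between a real Name and a "<control>" label, is visible in the raw UnicodeData.txt format. Field 1 holds the Name, and entries written in angle brackets are labels. Field 10 holds the Unicode 1.0 name, which the ELisp manual describes as the source of the 'old-name' property. The following is a minimal sketch under those assumptions, not how Emacs itself generates its character tables. The sample lines are a few entries in UnicodeData.txt format, and `parse_line` and `CharProps` are hypothetical names chosen for illustration.

```python
# Minimal sketch: derive 'name' and 'old-name' from UnicodeData.txt lines.
# Field 1 is the Name; values in angle brackets such as "<control>" are
# labels, not names. Field 10 is the Unicode 1.0 name ('old-name' in Emacs).
# This sketch ignores range entries such as "<CJK Ideograph, First>",
# whose names are derived algorithmically rather than being absent.
from typing import NamedTuple, Optional

SAMPLE = """\
0000;<control>;Cc;0;BN;;;;;N;NULL;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0080;<control>;Cc;0;BN;;;;;N;;;;;
009F;<control>;Cc;0;BN;;;;;N;APPLICATION PROGRAM COMMAND;;;;
"""


class CharProps(NamedTuple):
    code: int
    name: Optional[str]
    old_name: Optional[str]


def parse_line(line: str) -> CharProps:
    fields = line.split(";")
    code = int(fields[0], 16)
    raw_name = fields[1]
    # A leading "<" marks a label, so the character has no Name property.
    name = None if raw_name.startswith("<") else raw_name
    # An empty field 10 means there is no Unicode 1.0 name either.
    old_name = fields[10] or None
    return CharProps(code, name, old_name)


props = {p.code: p for p in map(parse_line, SAMPLE.splitlines())}
```

With this reading, U+0080 ends up with neither a name nor an old name, while U+009F keeps "APPLICATION PROGRAM COMMAND" as its old name. This matches the description above: only some control characters remain reachable by name.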
> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?
The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.
Thanks.
[Message part 3 (message/rfc822, inline)]
The doc for `insert-char' and `ucs-names' is sketchy. But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."
So what are all of those `<control>' character names about? Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':
C-x 8 RET TAB C-g
C-h v ucs-names
C-s <control> C-s C-s...
And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html
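One way to cross-check the claim above is Python's standard `unicodedata` module, which exposes the UCD Name property independently of Emacs. This sketch assumes that the control characters in question are U+0000..U+001F and U+007F..U+009F (the Cc characters). The exact UCD version depends on the Python release, but these code points are stable. `control_names` and `any_name_contains` are hypothetical helper names.

```python
import unicodedata

# C0 controls, DEL, and C1 controls: U+0000..U+001F and U+007F..U+009F.
CONTROL_CODES = list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))


def control_names():
    """Map each control code point to its Name property (None if it has none)."""
    return {cp: unicodedata.name(chr(cp), None) for cp in CONTROL_CODES}


def any_name_contains(fragment):
    """Return True if the Name of any code point contains FRAGMENT."""
    return any(fragment in unicodedata.name(chr(cp), "") for cp in range(0x110000))


names = control_names()
print(all(n is None for n in names.values()))  # True
print(any_name_contains("<control>"))          # False
```

Both results agree with the point made above: no character is named `<control>`, and the control characters carry no Name at all.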
This seems like a bug. But since the description of `ucs-names' is
so sketchy it's hard to assert that. If this is not a bug, then:
1. In what way is `<control>' a "CHAR-NAME" for a character with any
code point? What does CHAR-NAME mean in this case?
2. What is the purpose of the multiple `<control>' CHAR-NAMEs?
3. Why are different CHAR-CODE values associated with the same
CHAR-NAME, `<control>'? What does that mean?
4. Try `C-x 8 RET <contr TAB RET'. You get only one particular
character "named" <control>, the one with code point decimal
159. That's the character named "APPLICATION PROGRAM COMMAND".
Why that one?
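The thread does not say why question 4 yields code point 159 (U+009F) in particular. One plausible mechanism, offered here only as an assumption and not as a description of Emacs 24.3.50's code, is that any table keyed by name can hold only one code point per key. When the duplicate "<control>" entries collapse, whichever entry is processed last wins. In this sketch, `build_name_table` is a hypothetical helper, and the entries are added in ascending code-point order.

```python
# Hypothetical illustration: a name-to-code table keeps one value per key,
# so the 65 duplicate "<control>" labels collapse into a single entry.
CONTROLS = list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))
entries = [(cp, "<control>") for cp in CONTROLS]
entries.append((0x41, "LATIN CAPITAL LETTER A"))


def build_name_table(pairs):
    table = {}
    for cp, name in pairs:
        table[name] = cp  # a later entry with the same name overwrites an earlier one
    return table


table = build_name_table(entries)
print(hex(table["<control>"]))  # 0x9f
```

Under this assumption, U+009F is simply the highest-numbered control character, so it is the one left standing.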
In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics <at> gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
`configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
CPPFLAGS=-Ic:/Devel/emacs/include'
This bug report was last modified 11 years and 245 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.