GNU bug report logs - #16216
24.3.50; <control> entries in `ucs-names'


Package: emacs;

Reported by: Drew Adams <drew.adams <at> oracle.com>

Date: Sun, 22 Dec 2013 02:10:01 UTC

Severity: normal

Found in version 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#16216: closed (24.3.50; <control> entries in `ucs-names')
Date: Sun, 22 Dec 2013 18:11:01 +0000

[Message part 1 (text/plain, inline)]

Your message dated Sun, 22 Dec 2013 20:10:36 +0200
with message-id <83haa092oz.fsf <at> gnu.org>
and subject line Re: bug#16216: 24.3.50; <control> entries in `ucs-names'
has caused the debbugs.gnu.org bug report #16216,
regarding 24.3.50; <control> entries in `ucs-names'
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
16216: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16216
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems

[Message part 2 (message/rfc822, inline)]

From: Drew Adams <drew.adams <at> oracle.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; <control> entries in `ucs-names'
Date: Sat, 21 Dec 2013 18:09:17 -0800 (PST)

The doc for `insert-char' and `ucs-names' is sketchy.  But it does at
least say that it is about inserting a character "using its UNICODE
name or its code point."

So what are all of those `<control>' character names about?  Many
characters are listed in `ucs-names' as having this same "character
name", `<control>':

 C-x 8 RET TAB C-g
 C-h v ucs-names
 C-s <control> C-s C-s...

And yet, AFAICT, there is no UNICODE character that has the name
`<control>', or even any name that has that as a substring.
http://www.unicode.org/charts/charindex.html

This seems like a bug.  But since the description of `ucs-names' is
so sketchy it's hard to assert that.  If this is not a bug, then:

1. In what way is `<control>' a "CHAR-NAME" for a character with any
   code point?  What does CHAR-NAME mean in this case?

2. What is the purpose of the multiple `<control>' CHAR-NAMEs?

3. Why are different CHAR-CODE values associated with the same
   CHAR-NAME, `<control>'?  What does that mean?

4. Try `C-x 8 RET <contr TAB RET'.  You get only one particular
   character "named" <control>, the one with code point decimal
   159.  That's the character named "APPLICATION PROGRAM COMMAND".
   Why that one?


In GNU Emacs 24.3.50.1 (i686-pc-mingw32)
 of 2013-12-16 on ODIEONE
Bzr revision: 115543 rudalics <at> gmx.at-20131216095844-lbjh5yerk6ff0tm7
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --prefix=/c/Devel/emacs/binary --enable-checking=yes,glyphs
 'CFLAGS=-O0 -g3' LDFLAGS=-Lc:/Devel/emacs/lib
 CPPFLAGS=-Ic:/Devel/emacs/include'

[Message part 3 (message/rfc822, inline)]

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 16216-done <at> debbugs.gnu.org
Subject: Re: bug#16216: 24.3.50; <control> entries in `ucs-names'
Date: Sun, 22 Dec 2013 20:10:36 +0200

> Date: Sat, 21 Dec 2013 21:08:35 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 16216 <at> debbugs.gnu.org
> 
> > Look at UnicodeData.txt, near the beginning of the file.
> 
> I see; thanks.  And I recall now that you pointed me to that
> file once before.
> 
> Still, that does not really answer the questions I posed, AFAICT.
> At least not for a user of `ucs-names' or the other functions
> mentioned.

I looked deeper and decided that this was a bug.  The Unicode Standard
explicitly says that control characters have no 'name' property (see
Section 4.8 in the Standard), and that those "<control>" things are
just labels.  The 'name' property cannot have lower-case characters or
"<>" in it anyway.

So starting with trunk revision 115693, all control characters will
have nil as their 'name' property, and "C-x 8 RET < TAB" will say "No
match".  (Some of the control characters have an 'old-name' property, so
they still can be called out by name.)
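
The claim that control characters carry no 'name' property can be
cross-checked outside Emacs.  The following is a minimal sketch, not
part of the original report, assuming Python's standard `unicodedata`
module as an independent reading of the UCD; the helper `ucd_name` and
the list names are hypothetical and chosen for illustration only.

```python
import unicodedata


def ucd_name(code_point):
    """Return the UCD 'name' of a code point, or None if it has none."""
    # unicodedata.name() raises ValueError for unnamed characters unless
    # a default is supplied; None is used as that default here.
    return unicodedata.name(chr(code_point), None)


# Collect the control characters (general category "Cc") among the
# first 256 code points: the C0 controls, DEL, and the C1 controls.
control_points = [
    cp for cp in range(0x100) if unicodedata.category(chr(cp)) == "Cc"
]

# Keep only those control characters that have no 'name' property.
unnamed = [cp for cp in control_points if ucd_name(cp) is None]

print(len(control_points), len(unnamed))
print(ucd_name(159))   # the code point from item 4 of the report
print(ucd_name(0x41))  # an ordinary named character, for contrast
```

If the two counts printed on the first line agree, every control
character in that range lacks a 'name', which is the behaviour the fix
gives `ucs-names'.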

> If `ucs-names' essentially corresponds to UnicodeData.txt, how
> about citing that in its doc?

The exact file is an implementation detail (there's a corresponding
XML file, which could be used if we wanted); the ELisp manual
documents that the properties are derived from UCD, the Unicode
Character Database.

Thanks.

This bug report was last modified 11 years and 210 days ago.


