GNU bug report logs - #17412
24.3; Unicode key events broken, not usable in input method

Package: emacs;

Reported by: Stefan Dorn <mail <at> muflax.com>

Date: Mon, 5 May 2014 22:51:02 UTC

Severity: normal

Found in version 24.3

To reply to this bug, email your comments to 17412 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Mon, 05 May 2014 22:51:02 GMT)

Acknowledgement sent to Stefan Dorn <mail <at> muflax.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 05 May 2014 22:51:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stefan Dorn <mail <at> muflax.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3; Unicode key events broken, not usable in input method
Date: Mon, 5 May 2014 23:29:52 +0100
My keyboard layout includes Unicode keys like "ł", U+0142 (l with
stroke), and combining diacritics (U+0300 etc). I've been trying to use
them in quail layouts, eg:

  (quail-define-package
   "custom" "custom layout" "^" t
   "Proof-of-concept layout." nil t t nil nil nil nil nil nil nil t)

  (quail-define-rules ("ł" ?l))

The key is never passed into the input-method-function, and so just
inserted literally. (Typing Unicode keys directly works fine.)

Digging around in keyboard.c, I found that read_char() only passes
events with keycode < 256 (line 3050ff) to input-method-function:

  /* Pass this to the input method, if appropriate.  */
  if (INTEGERP (c)
      && ! NILP (Vinput_method_function)
      /* Don't run the input method within a key sequence,
         after the first event of the key sequence.  */
      && NILP (prev_event)
      && ' ' <= XINT (c) && XINT (c) < 256 && XINT (c) != 127)
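
The range part of the condition quoted above can be modelled in isolation. The following is only a sketch: `passes_range_test` and `check_examples` are hypothetical names, and the event is assumed to be a plain integer character code rather than the Lisp_Object that read_char() actually handles. It shows why ß (U+00DF, 223) would reach the input method while ł (U+0142, 322) and the combining diacritics would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone model of the range test in the quoted
   condition.  The real code works on a Lisp_Object inside read_char();
   here the event is assumed to be a plain integer character code.  */
static bool
passes_range_test (long c)
{
  return ' ' <= c && c < 256 && c != 127;
}

/* Checks for the characters mentioned in this report.  */
static void
check_examples (void)
{
  assert (passes_range_test ('x'));     /* plain ASCII */
  assert (passes_range_test (0xDF));    /* ß, U+00DF = 223 */
  assert (!passes_range_test (0x142));  /* ł, U+0142 = 322 */
  assert (!passes_range_test (0x300));  /* combining grave accent */
  assert (!passes_range_test (127));    /* DEL is excluded explicitly */
}
```

Calling `check_examples ()` from a `main` function exercises all five cases. The other clauses of the real condition (INTEGERP, Vinput_method_function, prev_event) are left out on purpose.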

Using read-key-sequence, Emacs seems to parse "ł" as [322] (0x142 in
decimal). Disabling the condition in read_char() (so the key is
actually passed to quail) only seems to cause an infinite loop in quail
that I've not been able to diagnose yet.

[322] as key event seems strange to me. The XLib keycode for "ł" (as
reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?

Interestingly, quail shows the key in the guidance screen just fine, ie:

  (quail-define-rules ("xł" ?l))

and typing "x" correctly suggests "xł" as a pattern; it's just impossible
to pass "ł" to quail and have it be parsed correctly.


In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, X toolkit)
 of 2014-05-05 on scabeiathrax
Windowing system distributor `The X.Org Foundation', version 11.0.11500000
System Description:    NAME=Gentoo

Configured using:
 `configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu'
 '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man'
 '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc'
 '--localstatedir=/var/lib' '--libdir=/usr/lib64'
 '--disable-silent-rules' '--disable-dependency-tracking'
 '--program-suffix=-emacs-24' '--infodir=/usr/share/info/emacs-24'
 '--localstatedir=/var'
 '--enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.8.2/../../../../lib64'
 '--with-gameuser=games' '--without-compress-info' '--without-hesiod'
 '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus'
 '--without-gnutls' '--without-xml2' '--without-selinux'
 '--without-wide-int' '--with-sound' '--with-x' '--without-ns'
 '--without-gconf' '--without-gsettings' '--without-toolkit-scroll-bars'
 '--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff'
 '--with-xpm' '--without-imagemagick' '--with-xft' '--with-libotf'
 '--with-m17n-flt' '--with-x-toolkit=lucid' '--with-xaw3d'
 'GENTOO_PACKAGE=app-editors/emacs-24.3-r4'
 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
 'CFLAGS=-O2 -pipe -march=core2' 'LDFLAGS=-Wl,-O1 -Wl,--sort-common
 -Wl,--hash-style=gnu -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
( r e d a <backspace> <backspace> a d - k e y - s e
q u e n c e SPC " k e y : SPC " ) <left> <left> C-M-x
̈ C-M-x ł C-M-x e M-x b u g <tab> <backspace> <backspace>
<backspace> <backspace> <backspace> <backspace> <backspace>
<backspace> <backspace> <backspace> <backspace> <backspace>
<backspace> <backspace> r e p o <tab> r t <tab> <return>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
[776]
[322]
"e"
Making completion list...

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils help-mode easymenu time-date tooltip ediff-hook
vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment lisp-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote make-network-process dbusbind dynamic-setting
font-render-setting x-toolkit x multi-tty emacs)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 06 May 2014 16:07:02 GMT)

Message #8 received at 17412 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Stefan Dorn <mail <at> muflax.com>
Cc: 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3; Unicode key events broken,
 not usable in input method
Date: Tue, 06 May 2014 12:06:47 -0400
> Digging around in keyboard.c, I found that read_char() only passes
> events with keycode < 256 (line 3050ff) to input-method-function:

Indeed, this has been in the input-method design from the start.
I'd be interested to know why.  Handa?

> [322] as key event seems strange to me. The XLib keycode for "ł" (as
> reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?

322 = U+0142, so it's really not strange at all: Emacs uses
Unicode internally.


        Stefan
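
The relation between the value reported by xev and the event 322 can be sketched as follows. The value 0x1000142 is an X11 keysym (not a keycode), and keysyms in the range 0x01000100 to 0x0110FFFF carry a Unicode code point offset by 0x01000000. The function name `keysym_to_codepoint` is hypothetical, and this is an illustration of the encoding only, not the conversion code Emacs itself uses.

```c
#include <assert.h>
#include <stdint.h>

/* Return the Unicode code point carried by a keysym in the X11
   Unicode range (0x01000100..0x0110FFFF), or -1 for any other keysym.
   Hypothetical helper for illustration.  */
static long
keysym_to_codepoint (uint32_t keysym)
{
  if (keysym >= 0x01000100u && keysym <= 0x0110FFFFu)
    return (long) (keysym - 0x01000000u);
  return -1;
}

/* Checks based on the values quoted in this thread.  */
static void
check_examples (void)
{
  assert (keysym_to_codepoint (0x1000142u) == 0x142); /* ł as shown by xev */
  assert (keysym_to_codepoint (0x1000142u) == 322);   /* same value, decimal */
  assert (keysym_to_codepoint (0xFF0Du) == -1);       /* XK_Return: not Unicode range */
}
```

So nothing is "cut off" arbitrarily: removing the 0x01000000 offset leaves exactly U+0142.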




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 06 May 2014 18:39:02 GMT)

Message #11 received at 17412 <at> debbugs.gnu.org:

From: Stefan Dorn <mail <at> muflax.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Tue, 6 May 2014 19:38:24 +0100
>> Digging around in keyboard.c, I found that read_char() only passes
>> events with keycode < 256 (line 3050ff) to input-method-function:
>
> Indeed, this has been in the input-method design from the start.
> I'd be interested to know why.  Handa?

I write a lot of linguistic analysis, and so added common IPA symbols
to my core keyboard layout, like ß, ł or æ. (I could type them through
an input method, but that would be slower and force me to use a
different typing method inside and outside of Emacs, which would slow
me down a lot.)

I recently set up a Cyrillic input method, but was surprised that I
could use ß in quail but not ł, simply because ß falls below the
magic threshold of 256. Unfortunately, merely turning off the
conditional in read_char() is not enough to get it to work.

More importantly, I also have most combining diacritic characters
(U+0301 ff) on keys and use them a lot. Switching them to some
"similar looking punctuation -> diacritic" input method would be
seriously annoying due to lots of conflicts (quoting a letter vs
umlauting it etc).

Most search features in Emacs don't do Unicode normalization, so ä (a
with umlaut) and ä (a with combining diacritic umlaut) don't match. I
added some normalization hacks to isearch and just force-normalize the
buffer when I save it, but wanted a more universal and clean solution.

I thought I could just set up a "letter + combining diacritic" ->
"normalized character" input method to fix most of this, but again
arbitrarily can't use any of the diacritics in quail.
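
The "letter + combining diacritic" -> "normalized character" mapping described above amounts to a pair lookup. A minimal sketch, assuming a tiny hand-written table: the names `compose_pair` and `compositions` are hypothetical, and a real table would be derived from the Unicode Character Database rather than typed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct composition
{
  uint32_t base;      /* base letter */
  uint32_t mark;      /* combining diacritic */
  uint32_t composed;  /* precomposed character */
};

/* Tiny illustrative subset; not a complete composition table.  */
static const struct composition compositions[] = {
  { 'a', 0x0308, 0x00E4 },  /* a + combining diaeresis -> ä */
  { 'o', 0x0308, 0x00F6 },  /* o + combining diaeresis -> ö */
  { 'u', 0x0308, 0x00FC },  /* u + combining diaeresis -> ü */
  { 'e', 0x0301, 0x00E9 },  /* e + combining acute     -> é */
};

/* Return the precomposed character for BASE followed by MARK,
   or 0 if the pair is not in the table.  */
static uint32_t
compose_pair (uint32_t base, uint32_t mark)
{
  for (size_t i = 0; i < sizeof compositions / sizeof compositions[0]; i++)
    if (compositions[i].base == base && compositions[i].mark == mark)
      return compositions[i].composed;
  return 0;
}

static void
check_examples (void)
{
  assert (compose_pair ('a', 0x0308) == 0x00E4);
  assert (compose_pair ('e', 0x0301) == 0x00E9);
  assert (compose_pair ('x', 0x0308) == 0);  /* not in this small table */
}
```

The second element of every pair is at or above U+0300, which is why such a method cannot work while events of 256 and above bypass the input method.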

>> [322] as key event seems strange to me. The XLib keycode for "ł" (as
>> reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?
>
> 322 = U+0142, so it's really not strange at all: Emacs uses
> Unicode internally.

Ah, cool.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 06 May 2014 18:56:02 GMT)

Message #14 received at 17412 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Dorn <mail <at> muflax.com>
Cc: monnier <at> iro.umontreal.ca, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Tue, 06 May 2014 21:55:41 +0300
> From: Stefan Dorn <mail <at> muflax.com>
> Date: Tue, 6 May 2014 19:38:24 +0100
> 
> Most search features in Emacs don't do Unicode normalization, so ä (a
> with umlaut) and ä (a with combining diacritic umlaut) don't match. I
> added some normalization hacks to isearch and just force-normalize the
> buffer when I save it, but wanted a more universal and clean solution.
> 
> I thought I could just set up a "letter + combining diacritic" ->
> "normalized character" input method to fix most of this, but again
> arbitrarily can't use any of the diacritics in quail.

That's not how to add normalization support to Emacs search.  It is
much better to define a case-table that maps each normalization
variant to a single canonical one, and then search functions will (or
at least should: I didn't actually try that) automatically do the
mapping for you, both in the search string and in the buffer/string
text you are searching through.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 06 May 2014 20:13:02 GMT)

Message #17 received at 17412 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Stefan Dorn <mail <at> muflax.com>, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Tue, 06 May 2014 16:12:13 -0400
> That's not how to add normalization support to Emacs search.  It is
> much better to define a case-table that maps each normalization
> variant to a single canonical one, and then search functions will (or
> at least should: I didn't actually try that) automatically do the

Can case-tables do such normalization?  Last I checked, they work "one
char at a time" and can't handle multi-char mappings at all (neither as
input nor as output).


        Stefan
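
The "one char at a time" limitation can be illustrated with a small model. This is a sketch under stated assumptions, not the Emacs case-table implementation: `canon` is a hypothetical per-character canonical mapping and `canon_equal` a hypothetical comparison that applies it position by position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-character canonical mapping: folds ASCII upper
   case and maps Ä (U+00C4) to ä (U+00E4); everything else is
   returned unchanged.  */
static uint32_t
canon (uint32_t c)
{
  if (c >= 'A' && c <= 'Z')
    return c - 'A' + 'a';
  if (c == 0x00C4)
    return 0x00E4;
  return c;
}

/* Compare two code point sequences after mapping each character
   individually.  Sequences of different length can never match.  */
static bool
canon_equal (const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (canon (a[i]) != canon (b[i]))
      return false;
  return true;
}

static void
check_examples (void)
{
  const uint32_t upper[] = { 0x00C4 };            /* Ä, precomposed */
  const uint32_t composed[] = { 0x00E4 };         /* ä, precomposed */
  const uint32_t decomposed[] = { 'a', 0x0308 };  /* a + combining diaeresis */

  assert (canon_equal (upper, 1, composed, 1));       /* one-to-one mapping works */
  assert (!canon_equal (decomposed, 2, composed, 1)); /* two chars vs one: no match */
}
```

A mapping of this shape can equate Ä with ä, but it has no way to treat the two-character sequence as equal to the single precomposed character.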




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 06 May 2014 20:15:02 GMT)

Message #20 received at 17412 <at> debbugs.gnu.org:

From: Daniel Colascione <dancol <at> dancol.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>, 
 Eli Zaretskii <eliz <at> gnu.org>
Cc: Stefan Dorn <mail <at> muflax.com>, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3; Unicode key events broken, not usable in input
 method
Date: Tue, 06 May 2014 13:14:08 -0700
On 05/06/2014 01:12 PM, Stefan Monnier wrote:
>> That's not how to add normalization support to Emacs search.  It is
>> much better to define a case-table that maps each normalization
>> variant to a single canonical one, and then search functions will (or
>> at least should: I didn't actually try that) automatically do the
> 
> Can case-tables do such normalization?  Last I checked, they work "one
> char at a time" and can't handle multi-char mappings at all (neither as
> input nor as output).

So why not make them stateful?


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Wed, 07 May 2014 18:14:01 GMT)

Message #23 received at 17412 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: mail <at> muflax.com, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Wed, 07 May 2014 21:13:01 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Stefan Dorn <mail <at> muflax.com>,  17412 <at> debbugs.gnu.org
> Date: Tue, 06 May 2014 16:12:13 -0400
> 
> > That's not how to add normalization support to Emacs search.  It is
> > much better to define a case-table that maps each normalization
> > variant to a single canonical one, and then search functions will (or
> > at least should: I didn't actually try that) automatically do the
> 
> Can case-tables do such normalization?  Last I checked, they work "one
> char at a time" and can't handle multi-char mappings at all (neither as
> input nor as output).

I meant the canonical slot of the case-tables.  Of course, doing what
I suggested will need some changes on the C level, but they are
straightforward, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Mon, 12 May 2014 23:24:02 GMT)

Message #26 received at 17412 <at> debbugs.gnu.org:

From: handa <at> gnu.org (K. Handa)
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: mail <at> muflax.com, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Tue, 13 May 2014 08:23:50 +0900
In article <jwviopiki5k.fsf-monnier+emacsbugs <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> > Digging around in keyboard.c, I found that read_char() only passes
> > events with keycode < 256 (line 3050ff) to input-method-function:

> Indeed, this has been in the input-method design from the start.
> I'd be interested to know why.  Handa?

As far as I remember, the relevant code was written by RMS,
and I'm sorry but I don't remember what I discussed with RMS
at that time.

Perhaps we had expected that a user typed C as a character
if C >= 256, not as a key to input another character.

---
Kenichi Handa
handa <at> gnu.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 13 May 2014 01:18:02 GMT)

Message #29 received at 17412 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: handa <at> gnu.org (K. Handa)
Cc: mail <at> muflax.com, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Mon, 12 May 2014 21:17:47 -0400
> Perhaps we had expected that a user typed C as a character
> if C >= 256, not as a key to input another character.

Sounds like it, indeed, but since we have decoded chars by the time we
get to input-event processing, it doesn't seem very useful to prevent
users from using non-ASCII keys for input-methods.

IOW, we should try and lift this restriction.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17412; Package emacs. (Tue, 13 May 2014 12:12:02 GMT)

Message #32 received at 17412 <at> debbugs.gnu.org:

From: handa <at> gnu.org (K. Handa)
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: mail <at> muflax.com, 17412 <at> debbugs.gnu.org
Subject: Re: bug#17412: 24.3;
 Unicode key events broken, not usable in input method
Date: Tue, 13 May 2014 21:11:49 +0900
In article <jwvwqdqzdeh.fsf-monnier+emacsbugs <at> gnu.org>, Stefan Monnier <monnier <at> IRO.UMontreal.CA> writes:

> > Perhaps we had expected that a user typed C as a character
> > if C >= 256, not as a key to input another character.

> Sounds like it, indeed, but since we have decoded chars by the time we
> get to input-event processing, it doesn't seem very useful to prevent
> users from using non-ASCII keys for input-methods.

> IOW, we should try and lift this restriction,

Yes, I agree.

---
Kenichi Handa
handa <at> gnu.org




This bug report was last modified 11 years and 33 days ago.
