Package: emacs;
Reported by: Ilya Zakharevich <nospam-abuse <at> ilyaz.org>
Date: Tue, 3 Mar 2015 23:11:02 UTC
Severity: normal
Found in version 25.0.50
Done: Stefan Kangas <stefan <at> marxist.se>
Bug is archived. No further changes may be made.
From: Ilya Zakharevich <ilya <at> math.berkeley.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 19994 <at> debbugs.gnu.org
Subject: bug#19994: 25.0.50; Unicode keyboard input on Windows
Date: Wed, 1 Jul 2015 03:07:12 -0700
On Wed, Mar 04, 2015 at 08:01:01PM +0200, Eli Zaretskii wrote:
> > Date: Tue, 3 Mar 2015 15:09:49 -0800
> > From: Ilya Zakharevich <nospam-abuse <at> ilyaz.org>
> >
> > I’m working on a patch to make Unicode keyboard input work properly on
> > Windows (in graphic mode).

> I suggest, indeed, cleaning up the code so we could commit it to the
> master branch.  That way, it will get wider testing, and we can fix
> whatever problems it might cause.  Any deficiencies that don't cause
> regressions wrt the current code can be fixed later, or even not at
> all (if we decide they are not important enough).

I had no time to work on the code itself, but
  • I fixed the formatting,
  • I pumped up the docs,
  • I put in the suggested eassert().

----------------

As it was before, the patch
  • defines two new static functions,
  • delays modification of wParam as late as needed (moves 1 LoC in
    w32_wnd_proc()), and
  • adds 8 LoC to w32_wnd_proc().

The call to these static functions is conditional on w32_unicode_gui.
Enjoy,
Ilya

--- w32fns.c-ini 2015-01-30 15:33:23.505201400 -0800
+++ w32fns.c 2015-07-01 02:56:30.787672000 -0700
@@ -2832,6 +2832,233 @@ post_character_message (HWND hwnd, UINT
   my_post_msg (&wmsg, hwnd, msg, wParam, lParam);
 }
 
+static int
+get_wm_chars (HWND aWnd, int *buf, int buflen, int ignore_ctrl, int ctrl,
+              int *ctrl_cnt, int *is_dead, int vk, int exp)
+{
+  MSG msg;
+  /* If doubled is at the end, ignore it.  */
+  int i = buflen, doubled = 0, code_unit;
+
+  if (ctrl_cnt)
+    *ctrl_cnt = 0;
+  if (is_dead)
+    *is_dead = -1;
+  eassert (w32_unicode_gui);
+  while (buflen
+         /* Should be called only when w32_unicode_gui:  */
+         && PeekMessageW (&msg, aWnd, WM_KEYFIRST, WM_KEYLAST,
+                          PM_NOREMOVE | PM_NOYIELD)
+         && (msg.message == WM_CHAR || msg.message == WM_SYSCHAR
+             || msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR
+             || msg.message == WM_UNICHAR))
+    {
+      /* We extract character payload, but in this call we handle only the
+         characters which come BEFORE the next keyup/keydown message.  */
+      int dead;
+
+      GetMessageW (&msg, aWnd, msg.message, msg.message);
+      dead = (msg.message == WM_DEADCHAR || msg.message == WM_SYSDEADCHAR);
+      if (is_dead)
+        *is_dead = (dead ? msg.wParam : -1);
+      if (dead)
+        continue;
+      code_unit = msg.wParam;
+      if (doubled)
+        {
+          /* had surrogate */
+          if (msg.message == WM_UNICHAR
+              || code_unit < 0xDC00 || code_unit > 0xDFFF)
+            { /* Mismatched first surrogate.
+                 Pass both code units as if they were two characters.  */
+              *buf++ = doubled;
+              if (!--buflen)
+                return i; /* Drop the 2nd char if at the end of the buffer.  */
+            }
+          else /* see https://en.wikipedia.org/wiki/UTF-16 */
+            {
+              code_unit = (doubled << 10) + code_unit - 0x35FDC00;
+            }
+          doubled = 0;
+        }
+      else if (code_unit >= 0xD800 && code_unit <= 0xDBFF)
+        {
+          /* Handle mismatched 2nd surrogate the same as a normal character.  */
+          doubled = code_unit;
+          continue;
+        }
+
+      /* The only "fake" characters delivered by ToUnicode() or
+         TranslateMessage() are:
+         0x01 .. 0x1a for Ctrl-letter, Enter, Tab, Ctrl-Break, Backspace
+         0x00 and 0x1b .. 0x1f for Control- []\@^_ (0x1b also for Esc)
+         0x7f for Control-BackSpace
+         0x20 for Control-Space */
+      if (ignore_ctrl
+          && (code_unit < 0x20 || code_unit == 0x7f
+              || (code_unit == 0x20 && ctrl)))
+        {
+          /* Non-character payload in a WM_CHAR
+             (Ctrl-something pressed, see above).  Ignore, and report.  */
+          if (ctrl_cnt)
+            (*ctrl_cnt)++;
+          continue;
+        }
+      /* Traditionally, Emacs would ignore the character payload of VK_NUMPAD*
+         keys, and would treat them later via `function-key-map'.  In addition
+         to the usual 102-key NUMPAD keys, this map also treats `kp-'-variants
+         of space, tab, enter, separator, equal.  TAB and EQUAL, apparently,
+         cannot be generated on the Win-GUI branch.  ENTER is already handled
+         by the code above.  According to `lispy_function_keys', kp_space is
+         generated by non-extended VK_CLEAR.  (kp-tab != VK_OEM_NEC_EQUAL!)
+
+         We do the same for backward compatibility, but ignore only the
+         characters restorable later by `function-key-map'.  */
+      if (code_unit < 0x7f
+          && ((vk >= VK_NUMPAD0 && vk <= VK_DIVIDE)
+              || (exp && ((vk >= VK_PRIOR && vk <= VK_DOWN) ||
+                          vk == VK_INSERT || vk == VK_DELETE || vk == VK_CLEAR)))
+          && strchr ("0123456789/*-+.,", code_unit))
+        continue;
+      *buf++ = code_unit;
+      buflen--;
+    }
+  return i - buflen;
+}
+
+#ifdef DBG_WM_CHARS
+# define FPRINTF_WM_CHARS(ARG) fprintf ARG
+#else
+# define FPRINTF_WM_CHARS(ARG) 0
+#endif
+
+static int
+deliver_wm_chars (int do_translate, HWND hwnd, UINT msg, UINT wParam,
+                  UINT lParam, int legacy_alt_meta)
+{
+  /* An "old style" keyboard description may assign up to 125 UTF-16 code
+     units to a keypress.
+     (However, the "old style" TranslateMessage() would deliver at most 16 of
+     them.)  Be on the safe side, and prepare to treat many more.  */
+  int ctrl_cnt, buf[1024], count, is_dead;
+
+  /* Since the keypress processing logic of Windows has a lot of state, it
+     is important to call TranslateMessage() for every keyup/keydown, AND
+     do it exactly once.  (The actual change of state is done by
+     ToUnicode[Ex](), which is called by TranslateMessage().  So one can
+     call ToUnicode[Ex]() instead.)
+
+     The "usual" message pump calls TranslateMessage() for EVERY event.
+     Emacs calls TranslateMessage() very selectively (is it needed for doing
+     some tricky stuff with Win95???  With newer Windows, selectiveness is,
+     most probably, not needed, and harms a lot).
+
+     So, with the usual message pump, the following call to TranslateMessage()
+     is not needed (and is going to be VERY harmful).  With Emacs' message
+     pump, the call is needed.  */
+  if (do_translate) {
+    MSG windows_msg = { hwnd, msg, wParam, lParam, 0, {0,0} };
+
+    windows_msg.time = GetMessageTime ();
+    TranslateMessage (&windows_msg);
+  }
+  count = get_wm_chars (hwnd, buf, sizeof (buf) / sizeof (*buf), 1,
+                        /* The message may have been synthesized by
+                           who knows what; be conservative.  */
+                        modifier_set (VK_LCONTROL)
+                          || modifier_set (VK_RCONTROL)
+                          || modifier_set (VK_CONTROL),
+                        &ctrl_cnt, &is_dead, wParam,
+                        (lParam & 0x1000000L) != 0);
+  if (count) {
+    W32Msg wmsg;
+    int *b = buf, strip_Alt = 1;
+
+    /* wParam is checked when converting CapsLock to Shift */
+    wmsg.dwModifiers = do_translate
+      ? w32_get_key_modifiers (wParam, lParam) : 0;
+
+    /* What follows is just heuristics; the correct treatment requires
+       non-destructive ToUnicode():
+         http://search.cpan.org/~ilyaz/UI-KeyboardLayout/lib/UI/KeyboardLayout.pm#Can_an_application_on_Windows_accept_keyboard_events?_Part_IV:_application-specific_modifiers
+
+       What one needs to find is:
+         * which of the present modifiers AFFECT the resulting char(s)
+           (so should be stripped, since their EFFECT is "already
+           taken into account" in the string in buf), and
+         * which modifiers are not affecting buf, so should be reported to
+           the application for further treatment.
+
+       Example: assume that we know:
+         (A) lCtrl+rCtrl+rAlt modifiers with VK_A key produce a Latin "f"
+             (plausible with a JCUKEN-flavored Russian keyboard layout);
+         (B) removing any one of lCtrl, rCtrl, rAlt changes the produced char;
+         (C) the Win modifier does not affect the produced character
+             (this is the common case: happens with all "standard" layouts).
+
+       Suppose the user presses Win+lCtrl+rCtrl+rAlt modifiers with VK_A.
+       What is the intent of the user?  We need to guess the intent to decide
+       which event to deliver to the application.
+
+       This looks like reasonable logic: since the Win modifier does not affect
+       the output string, the user was pressing Win for SOME OTHER purpose.
+       So the user wanted to generate a Win-SOMETHING event.  Now, what is
+       SOMETHING?  If one takes the mantra that "character payload is more
+       important than the combination of keypresses which resulted in this
+       payload", then one should ignore lCtrl+rCtrl+rAlt, ignore VK_A, and
+       assume that the user wanted to generate Win-f.
+
+       Unfortunately, without non-destructive ToUnicode(), checking (B) and (C)
+       is out of the question.  So we use heuristics (hopefully, covering
+       99.9999% of cases).
+     */
+
+    /* If ctrl-something delivers chars, ctrl and the rest should be hidden,
+       so that the key-event consumer won't interpret it as an accelerator.  */
+    if (wmsg.dwModifiers & ctrl_modifier)
+      wmsg.dwModifiers = wmsg.dwModifiers & shift_modifier;
+    /* In many keyboard layouts, (left) Alt does not change the character.
+       Unless we are in this situation, strip Alt/Meta.  */
+    if (wmsg.dwModifiers & (alt_modifier | meta_modifier)
+        /* If alt-something delivers non-ASCII chars, alt should be hidden */
+        && count == 1 && *b < 0x10000)
+      {
+        SHORT r = VkKeyScanW (*b);
+
+        FPRINTF_WM_CHARS ((stderr, "VkKeyScanW %#06x %#04x\n", (int) r, wParam));
+        if ((r & 0xFF) == wParam && !(r & ~0x1FF))
+          {
+            /* Char available without Alt modifier, so Alt is "on top" */
+            if (legacy_alt_meta
+                && *b > 0x7f && ('A' <= wParam && wParam <= 'Z'))
+              /* For backward compatibility with older Emacsen, let
+                 this be processed by another branch below (which would convert
+                 it to Alt-Latin char via wParam).  */
+              return 0;
+            strip_Alt = 0;
+          }
+      }
+    if (strip_Alt)
+      wmsg.dwModifiers = wmsg.dwModifiers & ~(alt_modifier | meta_modifier);
+
+    signal_user_input ();
+    while (count--)
+      {
+        FPRINTF_WM_CHARS ((stderr, "unichar %#06x\n", *b));
+        my_post_msg (&wmsg, hwnd, WM_UNICHAR, *b++, lParam);
+      }
+    if (!ctrl_cnt) /* Process ALSO as ctrl */
+      return 1;
+    else
+      FPRINTF_WM_CHARS ((stderr, "extra ctrl char\n"));
+    return -1;
+  } else if (is_dead >= 0) {
+    FPRINTF_WM_CHARS ((stderr, "dead %#06x\n", is_dead));
+    return 1;
+  }
+  return 0;
+}
+
 /* Main window procedure */
 
 static LRESULT CALLBACK
@@ -3007,7 +3234,6 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
       /* Synchronize modifiers with current keystroke.  */
       sync_modifiers ();
       record_keydown (wParam, lParam);
-      wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
 
       windows_translate = 0;
 
@@ -3117,6 +3343,46 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
             wParam = VK_NUMLOCK;
           break;
         default:
+          if (w32_unicode_gui) {
+            /* If this event generates characters or deadkeys, do not interpret
+               it as a "raw combination of modifiers and keysym".  Hide
+               deadkeys, and use the generated character(s) instead of the
+               keysym.  (Backward compatibility: exceptions for numpad keys
+               generating 0-9 . , / * - +, and for extra-Alt combined with a
+               non-Latin char.)
+
+               Try not to report modifiers which have an effect on which
+               character or deadkey is generated.
+
+               Example (contrived): if rightAlt-? generates f (on a Cyrillic
+               keyboard layout), and Ctrl, leftAlt do not affect the generated
+               character, one wants to report Ctrl-leftAlt-f if the user
+               presses Ctrl-leftAlt-rightAlt-?.  */
+            int res;
+#if 0
+            /* Some WM_CHAR messages may be fed to us directly, some are
+               results of TranslateMessage().  Using 0 as the first argument
+               (in a separate call) might help us distinguish these two cases.
+
+               However, the keypress feeders would most probably expect the
+               "standard" message pump, when TranslateMessage() is called on
+               EVERY KeyDown/KeyUp event.  So they may feed us Down-Ctrl
+               Down-FAKE Char-o and expect us to recognize it as Ctrl-o.
+               Using 0 as the first argument would interfere with this.  */
+            deliver_wm_chars (0, hwnd, msg, wParam, lParam, 1);
+#endif
+            /* Processing the generated WM_CHAR messages *WHILE* we handle
+               KEYDOWN/UP event is the best choice, since without any fuss,
+               we know all 3 of: scancode, virtual keycode, and expansion.
+               (Additionally, one knows boundaries of expansion of different
+               keypresses.)  */
+            res = deliver_wm_chars (1, hwnd, msg, wParam, lParam, 1);
+            windows_translate = -(res != 0);
+            if (res > 0) /* Bound to character(s) or a deadkey */
+              break;
+            /* deliver_wm_chars() may make some branches after this vestigial.  */
+          }
+          wParam = map_keypad_keys (wParam, (lParam & 0x1000000L) != 0);
           /* If not defined as a function key, change it to a WM_CHAR message.  */
           if (wParam > 255 || !lispy_function_keys[wParam])
             {
@@ -3184,6 +3450,8 @@ w32_wnd_proc (HWND hwnd, UINT msg, WPARA
             }
         }
 
+      if (windows_translate == -1)
+        break;
     translate:
       if (windows_translate)
         {
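In get_wm_chars(), the single expression `(doubled << 10) + code_unit - 0x35FDC00` is ordinary UTF-16 surrogate-pair decoding with all of the constants folded together: 0x35FDC00 = (0xD800 << 10) + 0xDC00 - 0x10000. The standalone C sketch below is not code from the patch or from Emacs, and every function name in it is hypothetical. It checks that the folded form equals the textbook formula. It also replays the patch's handling of unpaired surrogates on a plain array instead of a Windows message queue. It assumes the input contains no WM_UNICHAR messages and no dead keys, which the patch handles separately.

```c
#include <assert.h>

/* 0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000, the constant that
   get_wm_chars() folds into its one-line surrogate decoding.  */
#define FOLDED_SURROGATE_OFFSET 0x35FDC00L

/* Textbook UTF-16 decoding of a surrogate pair.  */
static long
combine_textbook (long hi, long lo)
{
  assert (hi >= 0xD800 && hi <= 0xDBFF);
  assert (lo >= 0xDC00 && lo <= 0xDFFF);
  return ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000;
}

/* The folded form used by the patch.  */
static long
combine_folded (long hi, long lo)
{
  return (hi << 10) + lo - FOLDED_SURROGATE_OFFSET;
}

/* Hypothetical helper: decode IN[0..INLEN) into code points in OUT,
   following the surrogate policy of get_wm_chars():
   - a high surrogate followed by anything but a low surrogate is passed
     through as its own "character", and the unit after it is emitted
     as is, even if it is another high surrogate;
   - a lone low surrogate is passed through unchanged;
   - a high surrogate still pending at the end of input is dropped.
   Returns the number of code points stored (at most OUTLEN).  */
static int
decode_like_get_wm_chars (const long *in, int inlen, long *out, int outlen)
{
  int n = 0;
  long pending = 0;             /* plays the role of `doubled' */

  for (int k = 0; k < inlen && n < outlen; k++)
    {
      long cu = in[k];

      if (pending)
        {
          if (cu < 0xDC00 || cu > 0xDFFF)
            {
              out[n++] = pending;       /* mismatched high surrogate */
              if (n == outlen)
                return n;               /* no room left for CU itself */
            }
          else
            cu = combine_folded (pending, cu);
          pending = 0;
        }
      else if (cu >= 0xD800 && cu <= 0xDBFF)
        {
          pending = cu;                 /* wait for the low half */
          continue;
        }
      out[n++] = cu;
    }
  return n;
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600 under both formulas. The sequence 0xD83D 0x41 yields two separate values, 0xD83D and 0x41, the same way the patch passes a mismatched first surrogate through.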
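Before storing a code unit, get_wm_chars() applies two filters: it drops the "fake" Ctrl payload, and it drops keypad characters that `function-key-map' can restore later. Both filters can be restated as pure predicates. The sketch below assumes ignore_ctrl is set, which is always the case in deliver_wm_chars(). The SK_* constants are hypothetical stand-ins that copy the winuser.h values of the VK_* codes used in the patch, so the sketch compiles without <windows.h>. One subtlety: strchr() also matches the terminating NUL, so a code unit of 0 would count as a keypad character. In the patch, 0 never reaches that test because the Ctrl filter has already dropped it. The sketch guards against 0 explicitly instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the <windows.h> virtual-key codes used by the patch
   (values copied from winuser.h).  */
enum
{
  SK_CLEAR = 0x0C, SK_PRIOR = 0x21, SK_DOWN = 0x28,
  SK_INSERT = 0x2D, SK_DELETE = 0x2E,
  SK_NUMPAD0 = 0x60, SK_DIVIDE = 0x6F
};

/* True for the "fake" code units that ToUnicode()/TranslateMessage()
   synthesize for Ctrl- combinations, Enter, Tab, Esc, Backspace, etc.  */
static bool
is_ctrl_payload (int code_unit, bool ctrl_down)
{
  /* CODE_UNIT may already be a combined supplementary code point.  */
  assert (code_unit >= 0 && code_unit <= 0x10FFFF);
  return code_unit < 0x20 || code_unit == 0x7f
         || (code_unit == 0x20 && ctrl_down);
}

/* True when the character produced by key VK is one that the patch drops
   so that `function-key-map' can regenerate it.  EXTENDED is the
   "extended key" flag (bit 24 of lParam, i.e. lParam & 0x1000000).  */
static bool
is_restorable_keypad_char (int code_unit, int vk, bool extended)
{
  bool keypad_vk
    = (vk >= SK_NUMPAD0 && vk <= SK_DIVIDE)
      || (extended && ((vk >= SK_PRIOR && vk <= SK_DOWN)
                       || vk == SK_INSERT || vk == SK_DELETE
                       || vk == SK_CLEAR));

  /* Guard against 0: strchr() would match the terminating NUL.  */
  return code_unit > 0 && code_unit < 0x7f && keypad_vk
         && strchr ("0123456789/*-+.,", code_unit) != NULL;
}
```

For example, '5' produced by VK_NUMPAD5 (0x65) is dropped, but the same '5' produced by the main-row key (VK code 0x35) is kept as an ordinary character.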
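The modifier heuristics near the end of deliver_wm_chars() depend on how VkKeyScanW() packs its SHORT result. The low byte is the virtual-key code and the high byte is the shift state (1 = Shift, 2 = Ctrl, 4 = Alt). A result of -1 means no key produces the character. The test `!(r & ~0x1FF)` therefore accepts only "no modifier" or "Shift only". The sketch below restates the Ctrl and Alt/Meta rules as a pure function over a modifier mask. Several simplifications apply. The MOD_* bits and both function names are hypothetical; Emacs's own ctrl_modifier, alt_modifier and so on have different values. The VkKeyScanW() result is passed in rather than queried from a keyboard layout. The patch's legacy_alt_meta early return is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modifier bits for this sketch only.  */
enum { MOD_SHIFT = 1, MOD_CTRL = 2, MOD_ALT = 4, MOD_META = 8 };

/* Mirrors `(r & 0xFF) == wParam && !(r & ~0x1FF)': the character is
   produced by key VK alone or with Shift, so Alt was not needed for it.
   A VkKeyScanW() result of -1 ("no such key") always fails this test.  */
static bool
char_reachable_without_alt (short vkscan, unsigned vk)
{
  return (unsigned) (vkscan & 0xFF) == vk && !(vkscan & ~0x1FF);
}

/* Modifiers to report along with COUNT generated characters (the first
   being FIRST_CHAR), given the modifiers MODS that were held down.  */
static unsigned
reported_modifiers (unsigned mods, int count, long first_char,
                    short vkscan, unsigned vk)
{
  bool strip_alt = true;

  assert (count > 0);
  /* Ctrl produced characters: keep only Shift, so the event is not
     taken for an accelerator.  */
  if (mods & MOD_CTRL)
    mods &= MOD_SHIFT;
  /* Keep Alt/Meta only for a single BMP character that did not need Alt.  */
  if ((mods & (MOD_ALT | MOD_META)) && count == 1 && first_char < 0x10000
      && char_reachable_without_alt (vkscan, vk))
    strip_alt = false;
  if (strip_alt)
    mods &= ~(unsigned) (MOD_ALT | MOD_META);
  return mods;
}
```

For example, Alt+A producing 'a' (VkKeyScanW() result 0x0041) keeps Alt. Consider instead a layout where € is typed with AltGr+E, so that VkKeyScanW() reports 0x0645 (Ctrl+Alt with VK 0x45). There, Alt is stripped because it was needed to produce the character.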