GNU bug report logs - #29837
UTF-16 char display problems and the macOS "character palette"

Package: emacs;

Reported by: Alan Third <alan <at> idiocy.org>

Date: Sun, 24 Dec 2017 16:02:02 UTC

Severity: normal

Tags: fixed

Fixed in version 27.1

Done: Alan Third <alan <at> idiocy.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29837 in the body.
You can then email your comments to 29837 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 16:02:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Alan Third <alan <at> idiocy.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 24 Dec 2017 16:02:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: bug-gnu-emacs <at> gnu.org
Subject: UTF-16 char display problems and the macOS "character palette"
Date: Sun, 24 Dec 2017 16:00:53 +0000
Hi, I’ve had a go at enabling the macOS character palette, which is
just a virtual keyboard that helps you to enter special characters,
emoji, etc.

It’s easy enough to bring it up (patch attached) but some special
characters are put into Emacs incorrectly. I think the problem is that
we have multi code‐point UTF‐16 characters, and when they are ‘typed’
into Emacs they are entered as individual 16 bit code‐points and are
therefore displayed as a series of blank spaces.

An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
enter it using C‐x 8 RET, it appears correctly, but if I use the
character palette it shows up as two blank spaces. Describe-char
reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
that order.

I can’t work out if Emacs should be able to handle these multi
code‐point characters being entered from a ‘keyboard’ input or not. If
so, does anyone have any idea what I need to do?

(Another minor irritation is that some characters (like pointing
hands) seem to insert the desired character then follow up with
VARIATION SELECTOR-15. I assume this is supposed to tell us what
colour we want the hand? If so should it be displayed?)
-- 
Alan Third
[0001-Add-macOS-character-palette.patch (text/plain, attachment)]
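
For reference, the two surrogates that describe-char reports above are
the UTF-16 encoding of a single code point outside the Basic
Multilingual Plane. A minimal sketch of the arithmetic in C follows.
The function names are hypothetical and are not taken from the Emacs
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if UNIT is a UTF-16 high (leading) surrogate.  */
bool
is_high_surrogate (uint16_t unit)
{
  return unit >= 0xD800 && unit <= 0xDBFF;
}

/* True if UNIT is a UTF-16 low (trailing) surrogate.  */
bool
is_low_surrogate (uint16_t unit)
{
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Combine a valid surrogate pair into the code point it encodes.
   The caller must pass a high surrogate followed by a low one.  */
uint32_t
combine_surrogates (uint16_t high, uint16_t low)
{
  assert (is_high_surrogate (high) && is_low_surrogate (low));
  return 0x10000
         + ((uint32_t) (high - 0xD800) << 10)
         + (uint32_t) (low - 0xDC00);
}
```

With the values from the report, combine_surrogates (0xD83E, 0xDCAB)
yields 0x1F8AB, the code point of RIGHTWARDS FRONT-TILTED SHADOWED
WHITE ARROW.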

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 16:57:01 GMT) Full text and rfc822 format available.

Message #8 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Sun, 24 Dec 2017 18:56:29 +0200
> Date: Sun, 24 Dec 2017 16:00:53 +0000
> From: Alan Third <alan <at> idiocy.org>
> 
> It’s easy enough to bring it up (patch attached) but some special
> characters are put into Emacs incorrectly. I think the problem is that
> we have multi code‐point UTF‐16 characters, and when they are ‘typed’
> into Emacs they are entered as individual 16 bit code‐points and are
> therefore displayed as a series of blank spaces.
> 
> An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> enter it using C‐x 8 RET, it appears correctly, but if I use the
> character palette it shows up as two blank spaces. Describe-char
> reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> that order.

You need to tell Emacs that keyboard input is in UTF-16.  Did you try
"C-x RET k"?

> (Another minor irritation is that some characters (like pointing
> hands) seem to insert the desired character then follow up with
> VARIATION SELECTOR-15. I assume this is supposed to tell us what
> colour we want the hand? If so should it be displayed?)

Emacs doesn't yet support variation selectors.  Patches to add that
are welcome (I guess it will need some change in our interface with
font back-ends?).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 18:24:01 GMT) Full text and rfc822 format available.

Message #11 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Sun, 24 Dec 2017 18:23:21 +0000
On Sun, Dec 24, 2017 at 06:56:29PM +0200, Eli Zaretskii wrote:
> > An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> > enter it using C‐x 8 RET, it appears correctly, but if I use the
> > character palette it shows up as two blank spaces. Describe-char
> > reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> > that order.
> 
> You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> "C-x RET k"?

I have now but I can’t find a utf-16 option that is ‘suitable’ for
keyboard input.

-- 
Alan Third




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 18:58:02 GMT) Full text and rfc822 format available.

Message #14 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Sun, 24 Dec 2017 20:57:04 +0200
> Date: Sun, 24 Dec 2017 18:23:21 +0000
> From: Alan Third <alan <at> idiocy.org>
> Cc: 29837 <at> debbugs.gnu.org
> 
> > You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> > "C-x RET k"?
> 
> I have now but I can’t find a utf-16 option that is ‘suitable’ for
> keyboard input.

What do you mean by "option" and by "suitable"?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 19:29:02 GMT) Full text and rfc822 format available.

Message #17 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Sun, 24 Dec 2017 19:28:07 +0000
On Sun, Dec 24, 2017 at 08:57:04PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Dec 2017 18:23:21 +0000
> > From: Alan Third <alan <at> idiocy.org>
> > Cc: 29837 <at> debbugs.gnu.org
> > 
> > > You need to tell Emacs that keyboard input is in UTF-16.  Did you try
> > > "C-x RET k"?
> > 
> > I have now but I can’t find a utf-16 option that is ‘suitable’ for
> > keyboard input.
> 
> What do you mean by "option" and by "suitable"?

If I try to select utf-16 I get this

    set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16

and I used tab completion to find which other coding systems were
available but all the ones beginning utf-16 that I tried return the
same message.
-- 
Alan Third




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Sun, 24 Dec 2017 19:35:02 GMT) Full text and rfc822 format available.

Message #20 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Sun, 24 Dec 2017 21:34:37 +0200
> Date: Sun, 24 Dec 2017 19:28:07 +0000
> From: Alan Third <alan <at> idiocy.org>
> Cc: 29837 <at> debbugs.gnu.org
> 
> If I try to select utf-16 I get this
> 
>     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> 
> and I used tab completion to find which other coding systems were
> available but all the ones beginning utf-16 that I tried return the
> same message.

Oh, I now recollect that Handa-san said at some point that keyboard
input doesn't support UTF-16...

How do other macOS programs read UTF-16 keyboard input?  Maybe you
could use the same way to read the sequences, and then decode them
internally as UTF-16 using coding.c facilities, and feed them into the
Emacs event queue?  Just a thought.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Mon, 25 Dec 2017 20:15:02 GMT) Full text and rfc822 format available.

Message #23 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Alan Third <alan <at> idiocy.org>, 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Mon, 25 Dec 2017 20:13:55 +0000
Eli Zaretskii <eliz <at> gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:

> > Date: Sun, 24 Dec 2017 19:28:07 +0000
> > From: Alan Third <alan <at> idiocy.org>
> > Cc: 29837 <at> debbugs.gnu.org
> >
> > If I try to select utf-16 I get this
> >
> >     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> >
> > and I used tab completion to find which other coding systems were
> > available but all the ones beginning utf-16 that I tried return the
> > same message.
>
> Oh, I now recollect that Handa-san said at some point that keyboard
> input doesn't support UTF-16...
>
> How do other macOS programs read UTF-16 keyboard input?  Maybe you
> could use the same way to read the sequences, and then decode them
> internally as UTF-16 using coding.c facilities, and feed them into the
> Emacs event queue?  Just a thought.
>
>
IIUC Emacs receives the input as a single UTF-16 string (in insertText),
then iterates over the UTF-16 code units, converting each into an Emacs
event. That's wrong, no matter whether the input comes from the character
palette or from the keyboard; normal keyboard layouts just happen to not
contain non-BMP characters. The loop needs to account for surrogates.
As a small optimization (which is warranted because the function is
probably called on every keystroke), this should use [NSString
getCharacters:range:] to copy all the UTF-16 code units to a buffer first,
to avoid repeated calls to characterAtIndex.
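
The surrogate-aware loop described above could look roughly like the
following C sketch. It is not the code that was committed to Emacs.
The function name utf16_to_code_points is hypothetical, and a plain
array stands in for the buffer that getCharacters:range: would fill.
Unpaired surrogates are passed through unchanged here, which is an
assumption about the desired behaviour rather than something the
thread specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Emacs code: decode LEN UTF-16 code units
   from UNITS into code points stored in OUT, which must have room
   for LEN entries.  Returns the number of code points written.  */
size_t
utf16_to_code_points (const uint16_t *units, size_t len, uint32_t *out)
{
  assert (len == 0 || (units != NULL && out != NULL));

  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      uint32_t c = units[i];

      /* A high surrogate followed by a low surrogate encodes one
         code point above U+FFFF; consume both code units.  */
      if (c >= 0xD800 && c <= 0xDBFF
          && i + 1 < len
          && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
        {
          c = 0x10000 + ((c - 0xD800) << 10)
              + (uint32_t) (units[i + 1] - 0xDC00);
          i++;
        }

      /* Each entry in OUT would become one input event.  */
      out[n++] = c;
    }
  return n;
}
```

Each value written to out corresponds to one character that would be
turned into an input event, so a non-BMP character arrives as a single
event instead of two surrogate events.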

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Mon, 25 Dec 2017 21:08:01 GMT) Full text and rfc822 format available.

Message #26 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Alan Third <alan <at> idiocy.org>, 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Mon, 25 Dec 2017 21:07:12 +0000
Philipp Stephani <p.stephani2 <at> gmail.com> wrote on Mon, 25 Dec 2017 at 21:13:

>
>
> Eli Zaretskii <eliz <at> gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:
>
>> > Date: Sun, 24 Dec 2017 19:28:07 +0000
>> > From: Alan Third <alan <at> idiocy.org>
>> > Cc: 29837 <at> debbugs.gnu.org
>> >
>> > If I try to select utf-16 I get this
>> >
>> >     set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
>> >
>> > and I used tab completion to find which other coding systems were
>> > available but all the ones beginning utf-16 that I tried return the
>> > same message.
>>
>> Oh, I now recollect that Handa-san said at some point that keyboard
>> input doesn't support UTF-16...
>>
>> How do other macOS programs read UTF-16 keyboard input?  Maybe you
>> could use the same way to read the sequences, and then decode them
>> internally as UTF-16 using coding.c facilities, and feed them into the
>> Emacs event queue?  Just a thought.
>>
>>
> IIUC Emacs receives the input as a single UTF-16 string (in insertText) ...
>

On a somewhat related note, insertText: is itself deprecated and should be
replaced with insertText:replacementRange:.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29837; Package emacs. (Tue, 26 Dec 2017 01:35:02 GMT) Full text and rfc822 format available.

Message #29 received at 29837 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 29837 <at> debbugs.gnu.org
Subject: Re: bug#29837: UTF-16 char display problems and the macOS "character
 palette"
Date: Tue, 26 Dec 2017 01:34:23 +0000
On Mon, Dec 25, 2017 at 08:13:55PM +0000, Philipp Stephani wrote:
> IIUC Emacs receives the input as a single UTF-16 string (in
> insertText), then iterates over the UTF-16 code units, converting
> each into an Emacs event. That's wrong, no matter whether the input
> comes from the character palette or from the keyboard; normal
> keyboard layouts just happen to not contain non-BMP characters. The
> loop needs to account for surrogates.

I finally came to this conclusion myself. I now know a lot more about
UTF‐16 than I did yesterday. :)

Wish I’d looked at my email earlier, though.

> As a small optimization (which is warranted because the function is
> probably called on every keystroke), this should use [NSString
> getCharacters:range:] to copy all the UTF-16 code units to a buffer
> first, to avoid repeated calls to characterAtIndex.

Presumably the vast majority of input will consist of just one code
unit, though?
-- 
Alan Third




Added tag(s) fixed. Request was from Alan Third <alan <at> idiocy.org> to control <at> debbugs.gnu.org. (Sun, 07 Jan 2018 20:44:01 GMT) Full text and rfc822 format available.

bug marked as fixed in version 27.1, send any further explanations to 29837 <at> debbugs.gnu.org and Alan Third <alan <at> idiocy.org> Request was from Alan Third <alan <at> idiocy.org> to control <at> debbugs.gnu.org. (Sun, 07 Jan 2018 20:44:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Feb 2018 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 138 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.