GNU bug report logs - #29837
UTF-16 char display problems and the macOS "character palette"
Reported by: Alan Third <alan <at> idiocy.org>
Date: Sun, 24 Dec 2017 16:02:02 UTC
Severity: normal
Tags: fixed
Fixed in version 27.1
Done: Alan Third <alan <at> idiocy.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29837 in the body.
You can then email your comments to 29837 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 16:02:02 GMT)
Acknowledgement sent to Alan Third <alan <at> idiocy.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 24 Dec 2017 16:02:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi, I’ve had a go at enabling the macOS character palette, which is
just a virtual keyboard that helps you to enter special characters,
emoji, etc.
It’s easy enough to bring it up (patch attached) but some special
characters are put into Emacs incorrectly. I think the problem is that
we have multi code‐point UTF‐16 characters, and when they are ‘typed’
into Emacs they are entered as individual 16 bit code‐points and are
therefore displayed as a series of blank spaces.
An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
enter it using C‐x 8 RET, it appears correctly, but if I use the
character palette it shows up as two blank spaces. Describe-char
reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
that order.
I can’t work out if Emacs should be able to handle these multi
code‐point characters being entered from a ‘keyboard’ input or not. If
so, does anyone have any idea what I need to do?
(Another minor irritation is that some characters (like pointing
hands) seem to insert the desired character then follow up with
VARIATION SELECTOR-15. I assume this is supposed to tell us what
colour we want the hand? If so should it be displayed?)
--
Alan Third
[0001-Add-macOS-character-palette.patch (text/plain, attachment)]
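For reference, the two surrogates reported by describe-char can be combined by hand into the intended character. The following is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic; the function name combine_surrogates is made up for illustration and is not part of Emacs.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of (code point - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the report: HIGH SURROGATE-D83E followed by LOW SURROGATE-DCAB.
code_point = combine_surrogates(0xD83E, 0xDCAB)
print(f"U+{code_point:X}")  # prints "U+1F8AB"
```

U+1F8AB lies outside the Basic Multilingual Plane, which is why it needs two 16-bit code units in UTF-16.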
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 16:57:01 GMT)
Message #8 received at 29837 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 24 Dec 2017 16:00:53 +0000
> From: Alan Third <alan <at> idiocy.org>
>
> It’s easy enough to bring it up (patch attached) but some special
> characters are put into Emacs incorrectly. I think the problem is that
> we have multi code‐point UTF‐16 characters, and when they are ‘typed’
> into Emacs they are entered as individual 16 bit code‐points and are
> therefore displayed as a series of blank spaces.
>
> An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> enter it using C‐x 8 RET, it appears correctly, but if I use the
> character palette it shows up as two blank spaces. Describe-char
> reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> that order.
You need to tell Emacs that keyboard input is in UTF-16. Did you try
"C-x RET k"?
> (Another minor irritation is that some characters (like pointing
> hands) seem to insert the desired character then follow up with
> VARIATION SELECTOR-15. I assume this is supposed to tell us what
> colour we want the hand? If so should it be displayed?)
Emacs doesn't yet support variation selectors. Patches to add that
are welcome (I guess it will need some change in our interface with
font back-ends?).
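As background on the trailing character: in the Unicode Standard, VARIATION SELECTOR-15 (U+FE0E) requests text-style presentation of the preceding character, and VARIATION SELECTOR-16 (U+FE0F) requests emoji-style presentation; skin tone is expressed by separate emoji modifier characters (U+1F3FB to U+1F3FF). The Python sketch below only lists what such a sequence contains. The helper name describe and the choice of U+261D as the sample "pointing hand" are assumptions made for illustration, not taken from the report.

```python
import unicodedata

TEXT_PRESENTATION = "\uFE0E"   # VARIATION SELECTOR-15
EMOJI_PRESENTATION = "\uFE0F"  # VARIATION SELECTOR-16


def describe(text: str) -> list:
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch, "<unnamed>") for ch in text]


# A pointing hand followed by the selector asking for text presentation.
sample = "\u261D" + TEXT_PRESENTATION
for name in describe(sample):
    print(name)
```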
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 18:24:01 GMT)
Message #11 received at 29837 <at> debbugs.gnu.org (full text, mbox):
On Sun, Dec 24, 2017 at 06:56:29PM +0200, Eli Zaretskii wrote:
> > An example is '🢫' (RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW). If I
> > enter it using C‐x 8 RET, it appears correctly, but if I use the
> > character palette it shows up as two blank spaces. Describe-char
> > reveals these to be HIGH SURROGATE-D83E and LOW SURROGATE-DCAB, in
> > that order.
>
> You need to tell Emacs that keyboard input is in UTF-16. Did you try
> "C-x RET k"?
I have now but I can’t find a utf-16 option that is ‘suitable’ for
keyboard input.
--
Alan Third
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 18:58:02 GMT)
Message #14 received at 29837 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 24 Dec 2017 18:23:21 +0000
> From: Alan Third <alan <at> idiocy.org>
> Cc: 29837 <at> debbugs.gnu.org
>
> > You need to tell Emacs that keyboard input is in UTF-16. Did you try
> > "C-x RET k"?
>
> I have now but I can’t find a utf-16 option that is ‘suitable’ for
> keyboard input.
What do you mean by "option" and by "suitable"?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 19:29:02 GMT)
Message #17 received at 29837 <at> debbugs.gnu.org (full text, mbox):
On Sun, Dec 24, 2017 at 08:57:04PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Dec 2017 18:23:21 +0000
> > From: Alan Third <alan <at> idiocy.org>
> > Cc: 29837 <at> debbugs.gnu.org
> >
> > > You need to tell Emacs that keyboard input is in UTF-16. Did you try
> > > "C-x RET k"?
> >
> > I have now but I can’t find a utf-16 option that is ‘suitable’ for
> > keyboard input.
>
> What do you mean by "option" and by "suitable"?
If I try to select utf-16 I get this
set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
and I used tab completion to find which other coding systems were
available but all the ones beginning utf-16 that I tried return the
same message.
--
Alan Third
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Sun, 24 Dec 2017 19:35:02 GMT)
Message #20 received at 29837 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 24 Dec 2017 19:28:07 +0000
> From: Alan Third <alan <at> idiocy.org>
> Cc: 29837 <at> debbugs.gnu.org
>
> If I try to select utf-16 I get this
>
> set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
>
> and I used tab completion to find which other coding systems were
> available but all the ones beginning utf-16 that I tried return the
> same message.
Oh, I now recollect that Handa-san said at some point that keyboard
input doesn't support UTF-16...
How do other macOS programs read UTF-16 keyboard input? Maybe you
could use the same way to read the sequences, and then decode them
internally as UTF-16 using coding.c facilities, and feed them into the
Emacs event queue? Just a thought.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Mon, 25 Dec 2017 20:15:02 GMT)
Message #23 received at 29837 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:
> > Date: Sun, 24 Dec 2017 19:28:07 +0000
> > From: Alan Third <alan <at> idiocy.org>
> > Cc: 29837 <at> debbugs.gnu.org
> >
> > If I try to select utf-16 I get this
> >
> > set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
> >
> > and I used tab completion to find which other coding systems were
> > available but all the ones beginning utf-16 that I tried return the
> > same message.
>
> Oh, I now recollect that Handa-san said at some point that keyboard
> input doesn't support UTF-16...
>
> How do other macOS programs read UTF-16 keyboard input? Maybe you
> could use the same way to read the sequences, and then decode them
> internally as UTF-16 using coding.c facilities, and feed them into the
> Emacs event queue? Just a thought.
>
>
IIUC Emacs receives the input as a single UTF-16 string (in insertText),
then iterates over the UTF-16 code units, converting each into an Emacs
event. That's wrong, no matter whether the input comes from the character
palette or from the keyboard; normal keyboard layouts just happen to not
contain non-BMP characters. The loop needs to account for surrogates.
As a small optimization (which is warranted because the function is
probably called on every keystroke), this should use [NSString
getCharacters:range:] to copy all the UTF-16 code units to a buffer first,
to avoid repeated calls to characterAtIndex.
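The loop described above can be sketched as follows. This is an illustrative Python model, not the Emacs Objective-C code: the function name decode_utf16_units is hypothetical, the input list stands in for the buffer that getCharacters:range: would fill, and passing unpaired surrogates through unchanged is an assumption rather than something this thread specifies.

```python
from typing import Iterable, List


def decode_utf16_units(units: Iterable[int]) -> List[int]:
    """Turn a sequence of UTF-16 code units into Unicode code points."""
    buf = list(units)  # copy all code units once, then index the copy
    code_points = []
    i = 0
    while i < len(buf):
        unit = buf[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(buf) and 0xDC00 <= buf[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A surrogate pair encodes one non-BMP character: emit one event.
            low = buf[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(unit)
            i += 1
    return code_points


# "a", the arrow from the original report, "b": four code units, three characters.
events = decode_utf16_units([0x61, 0xD83E, 0xDCAB, 0x62])
print([f"U+{cp:04X}" for cp in events])  # prints ['U+0061', 'U+1F8AB', 'U+0062']
```

With a per-code-unit loop the same input would yield four events, two of them lone surrogates, which matches the two blank spaces seen in the report.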
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Mon, 25 Dec 2017 21:08:01 GMT)
Message #26 received at 29837 <at> debbugs.gnu.org (full text, mbox):
Philipp Stephani <p.stephani2 <at> gmail.com> wrote on Mon, 25 Dec 2017 at 21:13:
>
>
> Eli Zaretskii <eliz <at> gnu.org> wrote on Sun, 24 Dec 2017 at 20:35:
>
>> > Date: Sun, 24 Dec 2017 19:28:07 +0000
>> > From: Alan Third <alan <at> idiocy.org>
>> > Cc: 29837 <at> debbugs.gnu.org
>> >
>> > If I try to select utf-16 I get this
>> >
>> > set-keyboard-coding-system: Unsuitable coding system for keyboard: utf-16
>> >
>> > and I used tab completion to find which other coding systems were
>> > available but all the ones beginning utf-16 that I tried return the
>> > same message.
>>
>> Oh, I now recollect that Handa-san said at some point that keyboard
>> input doesn't support UTF-16...
>>
>> How do other macOS programs read UTF-16 keyboard input? Maybe you
>> could use the same way to read the sequences, and then decode them
>> internally as UTF-16 using coding.c facilities, and feed them into the
>> Emacs event queue? Just a thought.
>>
>>
> IIUC Emacs receives the input as a single UTF-16 string (in insertText) ...
>
On a somewhat related note, insertText: is itself deprecated and should be
replaced with insertText:replacementRange:.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#29837; Package emacs. (Tue, 26 Dec 2017 01:35:02 GMT)
Message #29 received at 29837 <at> debbugs.gnu.org (full text, mbox):
On Mon, Dec 25, 2017 at 08:13:55PM +0000, Philipp Stephani wrote:
> IIUC Emacs receives the input as a single UTF-16 string (in
> insertText), then iterates over the UTF-16 code units, converting
> each into an Emacs event. That's wrong, no matter whether the input
> comes from the character palette or from the keyboard; normal
> keyboard layouts just happen to not contain non-BMP characters. The
> loop needs to account for surrogates.
I finally came to this conclusion myself. I now know a lot more about
UTF‐16 than I did yesterday. :)
Wish I’d looked at my email earlier, though.
> As a small optimization (which is warranted because the function is
> probably called on every keystroke), this should use [NSString
> getCharacters:range:] to copy all the UTF-16 code units to a buffer
> first, to avoid repeated calls to characterAtIndex.
Presumably the vast majority of input will consist of just one code
unit, though?
--
Alan Third
Added tag(s) fixed. Request was from Alan Third <alan <at> idiocy.org> to control <at> debbugs.gnu.org. (Sun, 07 Jan 2018 20:44:01 GMT)
Bug marked as fixed in version 27.1; send any further explanations to 29837 <at> debbugs.gnu.org and Alan Third <alan <at> idiocy.org>. Request was from Alan Third <alan <at> idiocy.org> to control <at> debbugs.gnu.org. (Sun, 07 Jan 2018 20:44:01 GMT)
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Feb 2018 12:24:05 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.