GNU bug report logs - #17133
json-encode-string incorrectly encodes extra-BMP characters


Package: emacs

Reported by: Nathan Trapuzzano <nbtrap <at> nbtrap.com>

Date: Fri, 28 Mar 2014 22:24:01 UTC

Severity: normal

Done: Simen Heggestøyl <simenheg <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17133 in the body.
You can then email your comments to 17133 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#17133; Package emacs. (Fri, 28 Mar 2014 22:24:01 GMT)

Acknowledgement sent to Nathan Trapuzzano <nbtrap <at> nbtrap.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 28 Mar 2014 22:24:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Nathan Trapuzzano <nbtrap <at> nbtrap.com>
To: bug-gnu-emacs <at> gnu.org
Subject: json-encode-string incorrectly encodes extra-BMP characters
Date: Fri, 28 Mar 2014 18:22:25 -0400

M-: (princ (json-encode "\U0001d11e"))
==> "\u1d11e"  ;; should be "\ud834\udd1e" or "𝄞"

From ECMA-404:

  To escape a code point that is not in the Basic Multilingual Plane,
  the character is represented as a twelve-character sequence, encoding
  the UTF-16 surrogate pair. So for example, a string containing only
  the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
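
The surrogate-pair rule quoted above can be sketched in a few lines. This is an illustrative Python sketch of the arithmetic only, not the code in Emacs's json.el; the function name `json_escape_code_point` is hypothetical and chosen for this example.

```python
def json_escape_code_point(cp: int) -> str:
    # Hypothetical helper: return a JSON \u escape for one Unicode code point.
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0xFFFF:
        # Inside the Basic Multilingual Plane: one six-character escape.
        return f"\\u{cp:04x}"
    # Outside the BMP: split the 20-bit offset into a UTF-16 surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return f"\\u{high:04x}\\u{low:04x}"


# G clef, U+1D11E, the character from this report.
print(json_escape_code_point(0x1D11E))  # \ud834\udd1e
```

For U+1D11E the offset is 0xD11E, whose top ten bits are 0x34 and bottom ten bits are 0x11E, giving the pair D834 DD1E from the ECMA-404 example. Formatting the code point directly with four-digit hex, as in the reported output "\u1d11e", produces five hex digits, which a JSON parser reads as the escape \u1d11 followed by a literal "e".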




Reply sent to Simen Heggestøyl <simenheg <at> gmail.com>:
You have taken responsibility. (Sun, 04 Oct 2015 15:56:01 GMT)

Notification sent to Nathan Trapuzzano <nbtrap <at> nbtrap.com>:
bug acknowledged by developer. (Sun, 04 Oct 2015 15:56:02 GMT)

Message #10 received at 17133-done <at> debbugs.gnu.org:

From: Simen Heggestøyl <simenheg <at> gmail.com>
To: Nathan Trapuzzano <nbtrap <at> nbtrap.com>
Cc: 17133-done <at> debbugs.gnu.org, dgutov <at> yandex.ru
Subject: Re: bug#17133: json-encode-string incorrectly encodes extra-BMP characters
Date: Sun, 04 Oct 2015 17:55:22 +0200

Nathan Trapuzzano <nbtrap <at> nbtrap.com> writes:
> M-: (princ (json-encode "\U0001d11e"))
> ==> "\u1d11e"  ;; should be "\ud834\udd1e" or "𝄞"
>
> From ECMA-404:
>
>   To escape a code point that is not in the Basic Multilingual Plane,
>   the character is represented as a twelve-character sequence, encoding
>   the UTF-16 surrogate pair. So for example, a string containing only
>   the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".

This seems to be working as expected in master now; (json-encode
"\U0001d11e") produces "𝄞" as described.

-- Simen




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 02 Nov 2015 12:24:04 GMT)



