GNU bug report logs - #24784
26.0.50; JSON strings with utf-16 escape codes
Reported by: Helmut Eller <eller.helmut <at> gmail.com>
Date: Mon, 24 Oct 2016 18:07:01 UTC
Severity: normal
Found in version 26.0.50
Done: Philipp Stephani <p.stephani2 <at> gmail.com>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#24784: 26.0.50; JSON strings with utf-16 escape codes
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 24784 <at> debbugs.gnu.org.
--
24784: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24784
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
Dmitry Gutov <dgutov <at> yandex.ru> wrote on Sun, 1 Jan 2017 at 01:45:
> On 31.12.2016 19:53, Philipp Stephani wrote:
>
> > Agreed; converted to defun. I've only used defsubst because some other
> > helper functions also used defsubst.
>
> Thanks. Those others can probably be changed as well.
>
Yes, but rather not in this commit.
>
> > No, the below case is more general and therefore has to come last.
>
> Makes sense.
>
> > It's not 100% related to the patch, but I think it can be included for
> > symmetry reasons (testing encoding as well as decoding).
>
> Of course. These are testing utf-8 encoding, though, right? It would be
> better if you split them to a separate commit, I think.
>
OK, I've removed it from this patch and pushed it as 93be35e038.
[Message part 5 (message/rfc822, inline)]
json-read-from-string doesn't parse strings correctly if the \u
syntax is used to write UTF-16 surrogates:
(equal (json-read-from-string "\"\\uD834\\uDD1E\"") "\U0001D11E")
=> nil
The correct result is t. To quote RFC 7159[*]:
To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a 12-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".
[*] https://tools.ietf.org/html/rfc7159#section-7
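
The arithmetic behind the quoted rule can be sketched as follows. This is only an illustration in Python, not the Emacs Lisp code of json.el or of the eventual fix; the function name combine_surrogates is hypothetical. It combines a high and a low surrogate into one code point and cross-checks the result against Python's own JSON decoder.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration; not part of json.el.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 payload bits; the pair encodes the
    # offset of the code point above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The G clef example from the RFC: \uD834\uDD1E
code_point = combine_surrogates(0xD834, 0xDD1E)
print(hex(code_point))  # 0x1d11e

# Cross-check: a decoder that handles surrogate pairs yields a
# one-character string holding U+1D11E.
decoded = json.loads('"\\uD834\\uDD1E"')
print(decoded == chr(code_point))  # True
```

A decoder that handles the two \u escapes independently would instead produce two lone surrogates, which is the behaviour this report describes.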
In GNU Emacs 26.0.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 3.14.5)
of 2016-10-24 built on caladan
Repository revision: 26ccd19269c040ad5960a7567aa5fc88f142c709
Windowing system distributor 'The X.Org Foundation', version 11.0.11604000
System Description: Debian GNU/Linux 8.5 (jessie)
Configured using:
'configure --with-xpm=no --with-jpeg=no --with-gif=no --with-tiff=no'
Configured features:
PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS LIBXML2 FREETYPE XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11
Important settings:
value of $LANG: C.UTF-8
locale-coding-system: utf-8-unix