GNU bug report logs - #24784
26.0.50; JSON strings with utf-16 escape codes
Reported by: Helmut Eller <eller.helmut <at> gmail.com>
Date: Mon, 24 Oct 2016 18:07:01 UTC
Severity: normal
Found in version 26.0.50
Done: Philipp Stephani <p.stephani2 <at> gmail.com>
Bug is archived. No further changes may be made.
Message #14 received at 24784 <at> debbugs.gnu.org:
On Tue, Oct 25 2016, Dmitry Gutov wrote:
> On 24.10.2016 22:57, Philipp Stephani wrote:
>
>> +(defsubst json--decode-utf-16-surrogates (high low)
>
> IIRC, there might be no actual benefit from making it a defsubst. If
> someone could benchmark it, I'd like to see the result.
I guess it doesn't hurt, but I also doubt that it makes a measurable
difference, as UTF-16 surrogates are rarely needed.
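The body of `json--decode-utf-16-surrogates` is not quoted in this message. For reference, a minimal sketch of the standard surrogate-pair arithmetic (RFC 2781, section 2.2) might look like the following; the name `my-utf16-decode-surrogates` is hypothetical and is not claimed to match the patch's actual code.

```elisp
;; Hypothetical helper, NOT the patch's json--decode-utf-16-surrogates.
(defun my-utf16-decode-surrogates (high low)
  "Combine HIGH (#xD800-#xDBFF) and LOW (#xDC00-#xDFFF) into a code point."
  ;; HIGH contributes the upper 10 bits and LOW the lower 10 bits of
  ;; the offset from #x10000, the first supplementary code point.
  (+ #x10000
     (ash (- high #xD800) 10)
     (- low #xDC00)))
```

For example, `(my-utf16-decode-surrogates #xD83D #xDE00)` returns `#x1F600` (printed as 128512).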
>
>> + ;; Special-case UTF-16 surrogate pairs,
>> + ;; cf. https://tools.ietf.org/html/rfc7159#section-7
>> + ((looking-at
>> + (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
>> + "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))
>> + (json-advance 10)
>> + (json--decode-utf-16-surrogates
>> + (string-to-number (match-string 1) 16)
>> + (string-to-number (match-string 2) 16)))
>
> Shouldn't this go below the UTF-8 case, as the less-frequent one?
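To see what the quoted `rx` form accepts, it can be exercised against a string with `string-match` instead of against the buffer with `looking-at`. The input is the text following the first `\u`, which is where point sits when the clause runs (hence `json-advance 10`: four hex digits, `\u`, four more). The names `my-surrogate-pair-re` and `my-match-surrogate-pair` are hypothetical; only the `rx` form is copied from the patch.

```elisp
(require 'rx)

;; Hypothetical names; the rx form is copied from the quoted patch.
;; Group 1: high surrogate D800-DBFF, then a literal "\u",
;; then group 2: low surrogate DC00-DFFF.
(defconst my-surrogate-pair-re
  (rx (group (any "Dd") (any "89ABab") (= 2 (any "0-9A-Fa-f")))
      "\\u" (group (any "Dd") (any "C-Fc-f") (= 2 (any "0-9A-Fa-f")))))

(defun my-match-surrogate-pair (s)
  "Return (HIGH . LOW) if S starts with a surrogate pair, else nil."
  ;; Anchor at the start of S, mirroring `looking-at' at point.
  (when (string-match (concat "\\`" my-surrogate-pair-re) s)
    (cons (string-to-number (match-string 1 s) 16)
          (string-to-number (match-string 2 s) 16))))
```

Here `(my-match-surrogate-pair "D83D\\uDE00")` yields `(#xD83D . #xDE00)`, while `"D83D\\u0041"` yields nil because `0041` is not a low surrogate.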
There's also an opportunity to detect unpaired surrogates, e.g.:
(defun json-read-escaped-char ()
  "Read the JSON string escaped character at point."
  ;; Skip over the '\'
  (json-advance)
  (let* ((char (json-pop))
         (special (assq char json-special-chars)))
    (cond
     (special (cdr special))
     ((not (eq char ?u)) char)
     ((looking-at "[0-9A-Fa-f]\\{4\\}")
      (let* ((code (string-to-number (match-string 0) 16)))
        (json-advance 4)
        (cond ((<= #xD800 code #xDBFF) ; UTF-16 high surrogate
               (cond ((looking-at "\\\\u\\([Dd][C-Fc-f][0-9A-Fa-f]\\{2\\}\\)")
                      (let ((low (string-to-number (match-string 1) 16)))
                        (json-advance 6)
                        (json--decode-utf-16-surrogates code low)))
                     (t
                      ;; Expected low surrogate missing
                      (signal 'json-string-escape (list (point))))))
              ((<= #xDC00 code #xDFFF)
               ;; Unexpected low surrogate
               (signal 'json-string-escape (list (point))))
              (t
               code))))
     (t
      (signal 'json-string-escape (list (point)))))))
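The same classification can be illustrated without json.el's buffer machinery. This sketch walks a list of UTF-16 code units and, like the proposal above, combines a high surrogate with a following low surrogate, and signals an error for a high surrogate lacking a low partner or for a lone low surrogate. The error symbol `my-unpaired-surrogate` and the function name are hypothetical; the proposal itself signals `json-string-escape`.

```elisp
;; Hypothetical error symbol for this sketch only.
(define-error 'my-unpaired-surrogate "Unpaired UTF-16 surrogate")

(defun my-utf16-units-to-code-points (units)
  "Decode the list of UTF-16 code UNITS into a list of code points."
  (let (result)
    (while units
      (let ((u (pop units)))
        (cond
         ;; High surrogate: the next unit must be a low surrogate.
         ((<= #xD800 u #xDBFF)
          (let ((low (car units)))
            (unless (and low (<= #xDC00 low #xDFFF))
              ;; Expected low surrogate missing.
              (signal 'my-unpaired-surrogate (list u)))
            (pop units)
            (push (+ #x10000 (ash (- u #xD800) 10) (- low #xDC00)) result)))
         ;; Low surrogate with no preceding high surrogate.
         ((<= #xDC00 u #xDFFF)
          (signal 'my-unpaired-surrogate (list u)))
         ;; Any other unit is a BMP code point on its own.
         (t (push u result)))))
    (nreverse result)))
```

For example, `(my-utf16-units-to-code-points '(#x41 #xD83D #xDE00))` returns `(65 128512)`, whereas `'(#xD83D #x41)` or `'(#xDE00)` signals `my-unpaired-surrogate`.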
This bug report was last modified 8 years and 145 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.