GNU bug report logs - #50391
28.0.50; json-read non-ascii data results in malformed string
Reported by: Zhiwei Chen <condy0919 <at> gmail.com>
Date: Sun, 5 Sep 2021 04:21:02 UTC
Severity: normal
Tags: notabug
Found in version 28.0.50
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
When fetching JSON from youdao (a dictionary service in China):
#+begin_src elisp
(url-retrieve
 "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
 (lambda (_status)
   (goto-char (1+ url-http-end-of-headers))
   (write-region (point) (point-max) "/tmp/acc1.json")))
#+end_src
Then C-x C-f "/tmp/acc1.json", the file is correctly encoded without
But if the data goes through `json-read' and then `json-insert', the
file is malformed, even though uchardet reports the file's encoding as utf-8.
#+begin_src elisp
(url-retrieve
 "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
 (lambda (_status)
   (goto-char (1+ url-http-end-of-headers))
   (let ((j (json-read)))
     (with-temp-buffer
       (json-insert j)
       (write-region (point-min) (point-max) "/tmp/acc2.json")))))
#+end_src
#+begin_src shell
diff -u <(hexdump -C /tmp/acc1.json | head -n10) <(hexdump -C /tmp/acc2.json | head -n10) | diff-so-fancy
#+end_src
Screenshot: https://pb.nichi.co/jazz-estate-brave
The diff shows that the first word, "累积", is encoded incorrectly in
"/tmp/acc2.json" (it uses `c3 a7 c2 b4 c2 af').
Actually, the output of
#+begin_src shell
echo -n "累积" | hexdump -C
#+end_src
should be `e7 b4 af e7 a7 af' in utf-8, where "累" is represented by
`e7 b4 af' and "积" is represented by `e7 a7 af'.
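The six bytes quoted from "/tmp/acc2.json" are consistent with the three
UTF-8 bytes of "累" each being treated as a Latin-1 character and then
encoded to UTF-8 a second time. The following is only a sketch of that
byte arithmetic outside Emacs, assuming `iconv' and `od' are available;
the helper name `hex' is made up for this illustration and says nothing
about what `json-read' does internally.

```shell
# UTF-8 bytes of "累" (e7 b4 af), written as octal escapes so the
# script does not depend on the terminal or file encoding.
lei_utf8='\347\264\257'

# Hypothetical helper: print stdin as space-separated hex bytes.
hex() { od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

# The correct UTF-8 bytes.
printf "$lei_utf8" | hex
# prints: e7 b4 af

# Treat each byte as a Latin-1 character and encode to UTF-8 again.
printf "$lei_utf8" | iconv -f ISO-8859-1 -t UTF-8 | hex
# prints: c3 a7 c2 b4 c2 af
```

The second result matches the bytes seen in "/tmp/acc2.json" for the
first character, which suggests the response bytes were never decoded
as utf-8 before being parsed.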
The environment variable LANG is `en_US.UTF-8'; everything was tested with `emacs -Q'.
--
Zhiwei Chen
This bug report was last modified 3 years and 319 days ago.