GNU bug report logs - #50391
28.0.50; json-read non-ascii data results in malformed string


Package: emacs

Reported by: Zhiwei Chen <condy0919 <at> gmail.com>

Date: Sun, 5 Sep 2021 04:21:02 UTC

Severity: normal

Tags: notabug

Found in version 28.0.50

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


From: Zhiwei Chen <condy0919 <at> gmail.com>
To: 50391 <at> debbugs.gnu.org
Subject: bug#50391: 28.0.50; json-read non-ascii data results in malformed string
Date: Sun, 05 Sep 2021 12:19:56 +0800

When fetching JSON from Youdao (a dictionary service in China):

#+begin_src elisp
(url-retrieve
  "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
  (lambda (_status)
    (goto-char (1+ url-http-end-of-headers))
    (write-region (point) (point-max) "/tmp/acc1.json")))
#+end_src

Then C-x C-f "/tmp/acc1.json", the file is correctly encoded without 

But if the data goes through `json-read' and then `json-insert', the
file is malformed, even though uchardet reports its encoding as UTF-8.

#+begin_src elisp
(url-retrieve
  "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
  (lambda (_status)
    (goto-char (1+ url-http-end-of-headers))
    (let ((j (json-read)))
      (with-temp-buffer
        (json-insert j)
        (write-region (point-min) (point-max) "/tmp/acc2.json")))))
#+end_src

#+begin_src shell
diff -u <(hexdump -C /tmp/acc1.json | head -n10) <(hexdump -C /tmp/acc2.json | head -n10) | diff-so-fancy
#+end_src

Screenshot: https://pb.nichi.co/jazz-estate-brave

The diff shows that the first word, "累积", is encoded incorrectly in
"/tmp/acc2.json" (it uses `c3 a7 c2 b4 c2 af').

Actually,

#+begin_src shell
echo -n "累积" | hexdump -C
#+end_src

should be `e7 b4 af e7 a7 af' in UTF-8, where "累" is represented by
`e7 b4 af' and "积" is represented by `e7 a7 af'.
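
The bad bytes are consistent with each UTF-8 byte having been treated as
a Latin-1 character and then encoded to UTF-8 a second time. This is only
a guess at the mechanism; nothing in this report confirms it. A minimal
sketch that reproduces the pattern outside Emacs, assuming `iconv', `od'
and `xargs' are available (the function name `double_encode' is made up
for this illustration):

#+begin_src shell
# Hypothetical helper: reinterpret the bytes on stdin as Latin-1
# characters, encode those characters as UTF-8, and print the result
# as space-separated hex bytes on one line.
double_encode() {
  # xargs with no command runs echo, which collapses od's padding.
  iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | xargs
}

# The six UTF-8 bytes of "累积" (e7 b4 af e7 a7 af), written as octal
# escapes so the script does not depend on its own file encoding.
printf '\347\264\257\347\247\257' | double_encode
#+end_src

The first six bytes printed should be `c3 a7 c2 b4 c2 af', the same
sequence seen in "/tmp/acc2.json".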

The environment variable LANG is `en_US.UTF-8'; everything was tested
with `emacs -Q'.

-- 
Zhiwei Chen




This bug report was last modified 3 years and 319 days ago.
