GNU bug report logs - #50391
28.0.50; json-read non-ascii data results in malformed string


Package: emacs;

Reported by: Zhiwei Chen <condy0919 <at> gmail.com>

Date: Sun, 5 Sep 2021 04:21:02 UTC

Severity: normal

Tags: notabug

Found in version 28.0.50

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #11 received at 50391 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Zhiwei Chen <condy0919 <at> gmail.com>
Cc: 50391 <at> debbugs.gnu.org
Subject: Re: bug#50391: 28.0.50; json-read non-ascii data results in
 malformed string
Date: Sun, 05 Sep 2021 10:08:35 +0200
Zhiwei Chen <condy0919 <at> gmail.com> writes:

> When fetching JSON from youdao (a dictionary service in China):
>
> #+begin_src elisp
> (url-retrieve
>   "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
>   (lambda (_status)
>     (goto-char (1+ url-http-end-of-headers))
>     (write-region (point) (point-max) "/tmp/acc1.json")))
> #+end_src
>
> Then C-x C-f "/tmp/acc1.json", the file is correctly encoded without 
>
> But if I `json-read' and then `json-insert', the file is malformed, even
> though uchardet shows that the file's encoding is utf-8.

When you do the `write-region', Emacs writes the octets you received
from the web server to a file.  When Emacs loads that file in again, it
guesses that it's utf-8 and decodes it that way, so that's why that
works correctly.
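A small sketch of the difference, with no network involved. The string "累积" merely stands in for the non-ASCII part of the server's reply, and `my/octets` is a hypothetical name. The Latin-1 line is only an approximation of the failure: it shows what you get when each UTF-8 octet is taken as a character of its own, which is roughly what happens when text is read from a buffer that was never decoded.

```elisp
;; Hypothetical stand-in for the server's reply: the raw UTF-8 octets
;; of "累积", held in a unibyte string.
(defvar my/octets (encode-coding-string "累积" 'utf-8))

(multibyte-string-p my/octets)            ; => nil (raw octets)
(length my/octets)                        ; => 6 (3 octets per character)

;; Decoding as utf-8, which is what Emacs guesses when visiting the file:
(decode-coding-string my/octets 'utf-8)   ; => "累积"

;; Taking each octet as a character (here via latin-1) instead:
(decode-coding-string my/octets 'latin-1) ; => "ç´¯ç§¯"
```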

> #+begin_src elisp
> (url-retrieve
>   "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
>   (lambda (_status)
>     (goto-char (1+ url-http-end-of-headers))
>     (let ((j (json-read)))
>       (with-temp-buffer
>         (json-insert j)
>         (write-region (point-min) (point-max) "/tmp/acc2.json")))))
> #+end_src

But here you're asking Emacs to use json-read on a buffer that hasn't
been decoded.  The http buffer at this point looks like this:

[Image: screenshot of the undecoded http buffer]
You have to say (decode-coding-region (point) (point-max) 'utf-8) first
for that to work.  I.e.,

  (require 'json)

  (url-retrieve
   "https://dict.youdao.com/suggest?q=accumulate&le=eng&num=80&doctype=json"
   (lambda (_status)
     (let ((buf (current-buffer))
           (body-start (1+ url-http-end-of-headers)))
       (with-temp-buffer
         ;; Copy the still-undecoded response body into a scratch buffer.
         (insert-buffer-substring buf body-start)
         ;; Decode the raw octets into characters before parsing.
         (decode-coding-region (point-min) (point-max) 'utf-8)
         (goto-char (point-min))
         (let ((j (json-read)))
           (erase-buffer)
           (json-insert j)
           (write-region (point-min) (point-max) "/tmp/acc2.json"))))))
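A network-free sketch of the same recipe, handy for checking it in isolation. `my/json-read-utf-8` is a hypothetical helper, and the unibyte string passed to it stands in for the body of the http buffer; like the rest of this report, it assumes that body is UTF-8.

```elisp
(require 'json)

(defun my/json-read-utf-8 (octets)
  "Parse OCTETS, a unibyte string holding UTF-8 encoded JSON.
Hypothetical helper: OCTETS stands in for the http response body."
  (with-temp-buffer
    ;; Inserting a unibyte string into this multibyte buffer keeps the
    ;; non-ASCII octets as raw bytes; nothing is decoded yet.
    (insert octets)
    ;; Turn the raw bytes into characters first.
    (decode-coding-region (point-min) (point-max) 'utf-8)
    (goto-char (point-min))
    (json-read)))

(my/json-read-utf-8 (encode-coding-string "{\"q\":\"累积\"}" 'utf-8))
;; => ((q . "累积"))
```

Inside the callback, the argument would be the response body, for example (buffer-substring (1+ url-http-end-of-headers) (point-max)) evaluated in the http buffer.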


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.