GNU bug report logs - #23750
25.0.95; bug in url-retrieve or json.el

Package: emacs;

Reported by: Leo Liu <sdl.web <at> gmail.com>

Date: Sun, 12 Jun 2016 02:24:02 UTC

Severity: normal

Found in version 25.0.95

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.


Message #77 received at 23750 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 23750 <at> debbugs.gnu.org, monnier <at> IRO.UMontreal.CA, sdl.web <at> gmail.com
Subject: Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date: Mon, 20 Jun 2016 17:54:23 +0300

On 06/20/2016 05:38 PM, Eli Zaretskii wrote:

> This all sounds like my response is not welcome, but in that case why
> did you ask the question?

I was kind of hoping for "yes, let's get it into 25.1!"? :)

> No, the bug is where the invalid input is generated in the first
> place.  Each API has its contract; if you violate the contract, you
> invoke undefined behavior.

It's a bug in the API, or a bad API, if you will. It needs a stricter
contract, and the submitted patch added it.

Or to look at it another way, the current contract allows url-http-data 
to be multibyte, because the requirement to the contrary is not 
documented anywhere that I can see. The variable is simply undocumented.

>>     If this is what you need, why not simply test the payload for being a
>>     unibyte string?  There is a function, multibyte-string-p, for that.
>>
>> There are a lot of variables to test (see the comment above the mapconcat call).
>
> Looks like mapc will be able to deal with that.  Or just use concat,
> and test the result with multibyte-string-p before sending.  Or encode
> it with UTF-8, if it is not unibyte already.

I don't know if we want to be so permissive that we silently encode to
UTF-8.

> Btw, I don't think the comment which explains why we started using
> mapconcat is accurate these days.  It was written before the move to
> Unicode in Emacs 23, but we stopped converting raw bytes into Latin-1
> characters in Emacs 23 and later.  So maybe we should just go back to
> using concat (with erroring out, if the result is multibyte, and/or
> maybe with replacing 'length' with 'string-bytes').

Better error out: the payload's encoding is something only the caller 
should be concerned with. Unless we're fine with the users assuming that 
Emacs's internal encoding is close enough to UTF-8.

> Bottom line: like I said, there should be no reason to use
> string-*-unibyte in modern Emacs code on the url-http level or higher
> (maybe not at all).  Its use is a sign of some basic misunderstanding,
> or a bug elsewhere, or remnant of old problems that no longer exist.
> So I think we should reconsider the solution on master as well.

I don't mind. Would you advocate for having this fix on emacs-25 if I 
implement it the way you described?

>> And you'll have to come up with the error message(s).
>
> Are you saying you like the error message from string-to-unibyte?
>
>   Cannot convert 123th character to unibyte

It's an order of magnitude better than what we had before (no error and
silent corruption), but yes, there is room for improvement.




This bug report was last modified 9 years and 47 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.