GNU bug report logs - #24117
25.1; url-http-create-request: Multibyte text in HTTP request
Reported by: Sho Takemori <stakemorii <at> gmail.com>
Date: Sun, 31 Jul 2016 08:28:02 UTC
Severity: normal
Found in version 25.1
Done: Dmitry Gutov <dgutov <at> yandex.ru>
Bug is archived. No further changes may be made.
On 08/10/2016 05:35 PM, Eli Zaretskii wrote:
> Are you saying that url-generic-parse-url performs this encoding, and
> that using a unibyte buffer causes that to fail?
No, url-generic-parse-url contains logic that distinguishes between the
domain and the path parts of a URL. So apparently it might have to work
on multibyte URLs.
That's not strictly necessary, however, given how url-encode-url uses it
currently (it performs encode-coding-string and decode-coding-string on
the URL string).
That approach seems flawed to me, but either way, someone will have to
choose how url-encode-url should use url-generic-parse-url. If we intend
to leave it as-is, then the proposed patch using set-buffer-multibyte
actually works fine, even on master, with multibyte URLs.
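To illustrate the host/path distinction mentioned above, here is a minimal sketch. The helper name my-url-host-and-path and the example URL are made up for illustration; only url-generic-parse-url, url-host and url-filename come from the url library. It uses an ASCII URL on purpose, so it says nothing about how multibyte input behaves.

```elisp
;; Sketch only: `my-url-host-and-path' is a hypothetical helper,
;; not part of url.el or of either proposed patch.
(require 'url-parse)

(defun my-url-host-and-path (url)
  "Return a list (HOST PATH) for the URL string URL."
  (let ((parsed (url-generic-parse-url url)))
    ;; `url-filename' holds the path together with any query string.
    (list (url-host parsed)
          (url-filename parsed))))

(my-url-host-and-path "http://example.com/some/path?q=1")
```

The two parts come back separately, which is what lets a caller apply a different encoding to each.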
>> So I think the encoding of the URL parts should be performed inside
>> url-http-create-request.
>
> Fine with me, but when I suggested that, you didn't like the
> suggestion. If you changed your mind, let's do that.
See below. But yes, I'm more inclined toward this approach now, after
Lars's objection, and after looking at the code in master.
>> On the master branch, host is passed through IDNA encoding, but
>> real-fname is untouched. On emacs-25, I think we should convert both
>> to unibyte.
>
> Not sure I understand why there should be a difference between the two
> branches. Encoding an ASCII string doesn't do any harm.
Since it's ASCII, using utf-8 there seems misleading to me. It's a
question of readability. As a bonus, using us-ascii will validate that
the strings indeed do not contain any unexpected characters.
>> (Why doesn't (encode-coding-string "aaaa" 'ascii) work?)
>
> It's 'us-ascii, not 'ascii.
Thanks. Attaching a patch, it seems to work well enough.
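For readers following along, a rough sketch of the kind of conversion being discussed. The helper name my-url--to-unibyte is hypothetical and this is not the attached patch; it only shows that encoding an ASCII-only multibyte string with us-ascii yields a unibyte string with the same text.

```elisp
;; Sketch only: `my-url--to-unibyte' is a hypothetical name,
;; not the function from the attached patch.
(defun my-url--to-unibyte (s)
  "Return S as a unibyte string, encoding with `us-ascii' if S is multibyte."
  (if (multibyte-string-p s)
      (encode-coding-string s 'us-ascii)
    s))

(let ((mb (string-to-multibyte "aaaa")))
  (list (multibyte-string-p mb)                       ; before encoding
        (multibyte-string-p (my-url--to-unibyte mb))  ; after encoding
        (coding-system-p 'us-ascii)))                 ; the name is `us-ascii'
```

A string that is already unibyte is returned unchanged, so the helper is safe to call on every part of the request.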
I'd like to wait for Lars's response now, but someone will have to make
an executive decision. Both patches (this one and the set-buffer-multibyte
one) work in the cases I've tested.
This one seems more conservative, but it'll require a manual merge to
master. The other one is trivial and will merge automatically, but
might cause problems for potentially less careful uses of
url-generic-parse-url.
[url-http--encode-string.diff (text/x-patch, attachment)]
This bug report was last modified 8 years and 12 days ago.