#24117 - 25.1; url-http-create-request: Multibyte text in HTTP request

GNU bug report logs - #24117
25.1; url-http-create-request: Multibyte text in HTTP request

Package: emacs;

Reported by: Sho Takemori <stakemorii <at> gmail.com>

Date: Sun, 31 Jul 2016 08:28:02 UTC

Severity: normal

Found in version 25.1

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.

From: Ted Zlatanov <tzz <at> lifelogs.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: stakemorii <at> gmail.com, Lars Ingebrigtsen <larsi <at> gnus.org>, schwab <at> linux-m68k.org, 24117 <at> debbugs.gnu.org
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 08:57:50 -0400

On Thu, 11 Aug 2016 15:31:11 +0300 Dmitry Gutov <dgutov <at> yandex.ru> wrote:

DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>> Could you add to your patch the cases you've tested? There's a specific
>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>> would help everyone.

DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
DG> other one). Maybe the tests should be in a separate patch then.

Neither your Russian example nor Lars' example have a parallel in the
tests AFAICS. I'd also add the example hostname that Katsumi Yamaoka
gave from the w3m source.

Somewhat related: it would be nice if the URL parser also listed the
non-ASCII scripts used in the domain name. Then eww and other programs
could do one of the typical defenses: either ensure only one script is
used; or allow only scripts that match the user's locale; or catch any
non-ASCII domain names. Typically they'd use Punycode to display such
suspicious domain names:

https://en.wikipedia.org/wiki/IDN_homograph_attack

I bring it up since explicitly allowing non-ASCII domain names
automatically opens up these security concerns, and it's a bit hard to
collect the confusables externally:

https://elpa.gnu.org/packages/uni-confusables.html

On Thu, 11 Aug 2016 13:05:12 +0200 Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

LI> Yes, the fix here should be in url-http-create-request, not in the URL
LI> parsing functions. The main issue here is that the URL request buffer
LI> is a multibyte buffer and (as with all network connection buffers), it
LI> shouldn't be. (Or, rather, that function just creates a string instead
LI> of a buffer, but the same principle applies.)

I think this is correct: the URL parsing should not care about the
provenance or potential use of that URL to make a HTTP request or
otherwise.

But maybe the URL parsing can be smart enough to return both the IDNA
version and the original domain name, plus some parsing information like
the list of scripts I suggested above, to save user agents from doing
that extra work?

Ted
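
As an illustration of the suggestion above (return the original host name, its IDNA form, and the scripts used), here is a minimal sketch in Python rather than Emacs Lisp. It is not the Emacs url library's API. The names `scripts_in_label` and `describe_host` are hypothetical. Script detection is only approximated from the first word of each character's Unicode name, since the Python standard library does not expose the Unicode Script property, and the built-in `idna` codec implements the older IDNA 2003 rules.

```python
import unicodedata


def scripts_in_label(label):
    """Return the approximate set of scripts used in one DNS label.

    Assumption: the first word of a character's Unicode name ("LATIN",
    "CYRILLIC", ...) is used as a rough stand-in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isascii():
            # ASCII letters count as Latin; digits and hyphens carry no script.
            if ch.isalpha():
                scripts.add("LATIN")
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split()[0] if name else "UNKNOWN")
    return scripts


def describe_host(host):
    """Return the original host, its ASCII (IDNA) form, and script information."""
    per_label = [scripts_in_label(label) for label in host.split(".")]
    return {
        "original": host,
        # The built-in "idna" codec converts each label to its "xn--" form.
        "ascii": host.encode("idna").decode("ascii"),
        "scripts": sorted(set().union(*per_label)),
        # A single label that mixes scripts is a common homograph signal.
        "mixed_label": any(len(s) > 1 for s in per_label),
        "non_ascii": not host.isascii(),
    }


if __name__ == "__main__":
    # "p\u0430ypal.com" uses CYRILLIC SMALL LETTER A in place of Latin "a".
    for host in ("b\u00fccher.example", "p\u0430ypal.com"):
        print(describe_host(host))
```

A user agent could apply any of the three defenses listed above to such a result, for example by displaying the `ascii` form whenever `mixed_label` is true or when the scripts do not match the user's locale. Real confusable detection would need more than this heuristic, such as the confusables data linked above.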

This bug report was last modified 8 years and 65 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
