GNU bug report logs - #24117
25.1; url-http-create-request: Multibyte text in HTTP request


Package: emacs;

Reported by: Sho Takemori <stakemorii <at> gmail.com>

Date: Sun, 31 Jul 2016 08:28:02 UTC

Severity: normal

Found in version 25.1

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.


From: Ted Zlatanov <tzz <at> lifelogs.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: stakemorii <at> gmail.com, Lars Ingebrigtsen <larsi <at> gnus.org>, schwab <at> linux-m68k.org, 24117 <at> debbugs.gnu.org
Subject: bug#24117: 25.1; url-http-create-request: Multibyte text in HTTP request
Date: Thu, 11 Aug 2016 08:57:50 -0400
On Thu, 11 Aug 2016 15:31:11 +0300 Dmitry Gutov <dgutov <at> yandex.ru> wrote: 

DG> On 08/11/2016 11:53 AM, Ted Zlatanov wrote:
>> Could you add to your patch the cases you've tested? There's a specific
>> place for URL parsing tests in test/lisp/url/url-parse-tests.el that
>> would help everyone.

DG> Sure, but only one of the patches affects URL parsing (and Lars prefers the
DG> other one).

Maybe the tests should be in a separate patch then. Neither your Russian
example nor Lars' example has a parallel in the tests AFAICS. I'd also
add the example hostname that Katsumi Yamaoka gave from the w3m source.

Somewhat related: it would be nice if the URL parser also listed the
non-ASCII scripts used in the domain name. Then eww and other programs
could do one of the typical defenses: either ensure only one script is
used; or allow only scripts that match the user's locale; or catch any
non-ASCII domain names. Typically they'd use Punycode to display such
suspicious domain names:
https://en.wikipedia.org/wiki/IDN_homograph_attack

I bring it up since explicitly allowing non-ASCII domain names
automatically opens up these security concerns, and it's a bit hard to
collect the confusables externally:
https://elpa.gnu.org/packages/uni-confusables.html
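
To illustrate the "only one script per name" defense mentioned above, here is a minimal sketch in Python rather than Emacs Lisp. It is not part of any patch in this thread. The function names are hypothetical, and the script of a character is approximated by the first word of its Unicode character name, which is a heuristic and not the real Unicode Script property.

```python
import unicodedata


def scripts_in_label(label):
    """Return the sorted script names used by the letters of one DNS label.

    Assumption: the first word of the Unicode character name
    (e.g. "CYRILLIC" in "CYRILLIC SMALL LETTER A") stands in for the script.
    """
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens carry no script information here
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return sorted(scripts)


def mixed_script_labels(domain):
    """Return the labels of `domain` that mix more than one script."""
    return [
        label
        for label in domain.split(".")
        if len(scripts_in_label(label)) > 1
    ]


# "p\u0430ypal" contains CYRILLIC SMALL LETTER A in place of Latin "a".
suspicious = mixed_script_labels("p\u0430ypal.com")
```

Checking each label separately keeps a fully Cyrillic label under a Latin top-level domain from being flagged, while still catching a single label that mixes scripts.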

On Thu, 11 Aug 2016 13:05:12 +0200 Lars Ingebrigtsen <larsi <at> gnus.org> wrote: 

LI> Yes, the fix here should be in url-http-create-request, not in the URL
LI> parsing functions.  The main issue here is that the URL request buffer
LI> is a multibyte buffer and (as with all network connection buffers), it
LI> shouldn't be.  (Or, rather, that function just creates a string instead
LI> of a buffer, but the same principle applies.)

I think this is correct: the URL parsing should not care about the
provenance or potential use of that URL to make an HTTP request or
otherwise. But maybe the URL parsing can be smart enough to return both
the IDNA version and the original domain name, plus some parsing
information like the list of scripts I suggested above, to save user
agents from doing that extra work?
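
As a rough illustration of a parser that returns both forms of the host, the following Python sketch uses the standard library's `urlsplit` and built-in `idna` codec (which implements the older IDNA 2003 rules). The function name `parse_host` and the keys of the returned dictionary are hypothetical; this is not how Emacs's URL library behaves.

```python
from urllib.parse import urlsplit


def parse_host(url):
    """Return the host of `url` in its original and IDNA (ASCII) forms.

    Assumption: Python's built-in "idna" codec (IDNA 2003) is good enough
    for illustration; a real user agent may need newer IDNA rules.
    """
    host = urlsplit(url).hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    return {
        "original": host,              # what the user typed (lowercased)
        "idna": ascii_host,            # what goes on the wire
        "non_ascii": not host.isascii(),
    }


info = parse_host("http://\u043f\u0440\u0438\u043c\u0435\u0440.example/path")
```

A caller could send `info["idna"]` in the request while choosing which form to display based on `info["non_ascii"]` and its own policy.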

Ted