GNU bug report logs - #6252
Emacs does not implement URL (aka "percent") decoding correctly.
Reported by: José A. Romero L. <escherdragon <at> gmail.com>
Date: Sun, 23 May 2010 00:52:02 UTC
Severity: normal
Tags: fixed
Fixed in version 24.2
Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
>>>>> On Sun, 23 May 2010 01:46:54 +0200, José A. Romero L. <escherdragon <at> gmail.com> said:
> Seems that RFC 3986 has not been implemented correctly in
> Emacs. IMHO that is an important hole you have found there. The
> standard requires that all unreserved characters be encoded/decoded
> as UTF8 bytes.
If you are referring to the following part of RFC 3986, it says nothing
about existing URI schemes (as opposed to "a new URI scheme"), about
schemes defining a component that does NOT represent textual data, or,
even for textual data, about data NOT consisting of characters from the
Universal Character Set.
    When a new URI scheme defines a component that represents textual
    data consisting of characters from the Universal Character Set
    [UCS], the data should first be encoded as octets according to the
    UTF-8 character encoding [STD63]; then only those octets that do not
    correspond to characters in the unreserved set should be
    percent-encoded.
(See also http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00065.html)
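As a rough illustration of the RFC 3986 rule for textual data in a new URI scheme, here is a minimal Python sketch (Python rather than Emacs Lisp, purely for illustration). `percent_encode_text` is a hypothetical helper, not an Emacs or library function. It assumes the unreserved set from RFC 3986 section 2.3 (ALPHA, DIGIT, "-", ".", "_", "~") and emits uppercase hex digits, as section 2.1 recommends.

```python
import string

# RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode("ascii"))


def percent_encode_text(text: str) -> str:
    """Encode TEXT as UTF-8, then percent-encode every octet outside the unreserved set."""
    parts = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            parts.append(chr(octet))  # unreserved octets stay literal
        else:
            parts.append("%{:02X}".format(octet))  # uppercase hex, as RFC 3986 recommends
    return "".join(parts)


print(percent_encode_text("José A."))  # → Jos%C3%A9%20A.
```

For str input on Python 3.7 and later, `urllib.parse.quote(text, safe="")` should give the same result; the point of the sketch is only that the UTF-8 step applies to textual data, which is exactly the scope limitation discussed above.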
Although returning a multibyte string decoded as UTF-8 would be useful
in many cases, I think an "unhex"ing function should also be able to
return a unibyte string (the raw decoded octets).
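To make the unibyte/multibyte distinction concrete, the Python sketch below splits unhexing into two steps: `unhex_to_bytes` returns the raw octets (roughly the analogue of a unibyte string), and `unhex_to_text` additionally decodes them in a caller-chosen charset (roughly the analogue of a multibyte string). Both names are hypothetical. The sketch assumes the input is an ASCII URI string and leaves malformed escapes, such as a trailing "%", untouched.

```python
# Hex digits accepted after '%', as byte values.
HEXDIGITS = frozenset(b"0123456789abcdefABCDEF")


def unhex_to_bytes(s: str) -> bytes:
    """Decode %XX escapes in S and return the raw octets, without assuming any charset."""
    data = s.encode("ascii")  # assumes S is an ASCII URI string
    out = bytearray()
    i = 0
    while i < len(data):
        if (data[i] == ord("%") and i + 2 < len(data)
                and data[i + 1] in HEXDIGITS and data[i + 2] in HEXDIGITS):
            out.append(int(data[i + 1:i + 3], 16))  # one escaped octet
            i += 3
        else:
            out.append(data[i])  # literal octet, or a malformed '%' kept as-is
            i += 1
    return bytes(out)


def unhex_to_text(s: str, encoding: str = "utf-8") -> str:
    """Decode %XX escapes, then interpret the resulting octets in ENCODING."""
    return unhex_to_bytes(s).decode(encoding)


print(unhex_to_bytes("%FF%00"))            # → b'\xff\x00'
print(unhex_to_text("Jos%C3%A9"))          # → José
print(unhex_to_text("caf%E9", "latin-1"))  # → café
```

Only the byte-returning step can represent data such as "%FF%00", which is not valid UTF-8, and only the caller knows whether a component is text and in which charset. Python's standard library makes a similar split with `urllib.parse.unquote_to_bytes` and `urllib.parse.unquote`.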
YAMAMOTO Mitsuharu
mituharu <at> math.s.chiba-u.ac.jp
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.