GNU bug report logs - #6252
Emacs does not implement URL (aka "percent") decoding correctly.
Reported by: José A. Romero L. <escherdragon <at> gmail.com>
Date: Sun, 23 May 2010 00:52:02 UTC
Severity: normal
Tags: fixed
Fixed in version 24.2
Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #14 received at 6252 <at> debbugs.gnu.org (full text, mbox):
José A. Romero L. <escherdragon <at> gmail.com> writes:
> On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:
>
>> is there emacs lisp function that decode the url percent encoding?
>> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
>> should become
>> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
>> that's a EN DASH (unicode 8211, #o20023, #x2013).
>> I know there's a
>> (require 'gnus-util)
>> gnus-url-unhex-string
>> but that just unhex, and generate gibberish if the url contain unicode
>> chars.
> (...)
>
> Seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
> that is an important hole you have found there. The standard requires
> that all unreserved characters be encoded/decoded as UTF8 bytes. Even
> though the encoding part looks OK (in url-util.el), the decoding does
> not go that last mile to interpret the decoded bytes as UTF-8.
I'm not quite sure I understand what the problem is. Do you have a test
case that illustrates what url.el does wrong?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
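
For readers following the thread, the two steps under discussion (undoing the percent-escapes, then interpreting the resulting bytes as UTF-8) can be sketched roughly as below. This is only an illustration, not the fix that was applied for this bug. It assumes a recent Emacs in which `url-unhex-string` returns the raw decoded bytes, and the helper name `my-url-percent-decode` is hypothetical.

```elisp
;; Sketch only: assumes a recent Emacs in which `url-unhex-string'
;; returns the raw decoded bytes rather than decoded characters.
(require 'url-util)

(defun my-url-percent-decode (url)
  "Hypothetical helper: undo percent-escapes in URL, then read the bytes as UTF-8."
  (decode-coding-string (url-unhex-string url) 'utf-8))

(let ((encoded "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem"))
  ;; Step 1 alone leaves the dash as the three bytes #xE2 #x80 #x93.
  (url-unhex-string encoded)
  ;; Step 2 interprets those bytes as UTF-8, giving one EN DASH (U+2013).
  (my-url-percent-decode encoded))
```

Evaluating the two forms inside the `let` separately should show the difference the original poster describes: the first leaves the dash as raw bytes, the second yields the readable URL.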
This bug report was last modified 13 years and 104 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.