GNU bug report logs - #6252
Emacs does not implement URL (aka "percent") decoding correctly.
Reported by: José A. Romero L. <escherdragon <at> gmail.com>
Date: Sun, 23 May 2010 00:52:02 UTC
Severity: normal
Tags: fixed
Fixed in version 24.2
Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
On May 18, 20:14, Xah Lee <xah...@gmail.com> wrote:
> is there emacs lisp function that decode the url percent encoding?
> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
> should become
> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
> that's a EN DASH (unicode 8211, #o20023, #x2013).
> I know there's a
> (require 'gnus-util)
> gnus-url-unhex-string
> but that just unhex, and generate gibberish if the url contain unicode
> chars.
(...)
It seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
that is an important hole you have found there. The standard recommends
that characters outside the unreserved set be converted to UTF-8 bytes
and each byte percent-encoded, so decoding has to reverse both steps.
Even though the encoding part looks OK (in url-util.el), the decoding
does not go that last mile to interpret the decoded bytes as UTF-8.
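To make the byte-level picture concrete, here is a minimal sketch of the encoding direction and of what goes wrong when the decoded bytes are read one at a time. The name `my-percent-encode-char` is a hypothetical helper made up for illustration, not an Emacs function; only `encode-coding-string`, `decode-coding-string`, `mapconcat` and `format` are real.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-percent-encode-char (char)
  "Return CHAR as percent-encoded UTF-8 bytes."
  (mapconcat (lambda (byte) (format "%%%02X" byte))
             (encode-coding-string (string char) 'utf-8)
             ""))

;; EN DASH (U+2013) occupies three bytes in UTF-8, hence three escapes.
(my-percent-encode-char #x2013)   ; => "%E2%80%93"

;; Reading those three bytes back one at a time (here as Latin-1) yields
;; three characters; reading them as UTF-8 yields the single EN DASH.
(let ((bytes (encode-coding-string (string #x2013) 'utf-8)))
  (list (length (decode-coding-string bytes 'latin-1))
        (length (decode-coding-string bytes 'utf-8))))   ; => (3 1)
```

The second form is the "gibberish" case from the quoted question: the unhexed bytes are correct, they are just never reassembled into one character.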
Until a proper implementation is done, I guess you could work around
the problem with something like this:
(decode-coding-string
 (apply 'unibyte-string
        (string-to-list
         (url-unhex-string
          "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")))
 'utf-8)
(yes, it's ugly as hell but hey, it's free ;])
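If this is needed in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `my-url-decode-utf8` is a hypothetical name, and it assumes that `url-unhex-string` hands back the raw decoded bytes, so that `decode-coding-string` can be applied to its result directly. Where that assumption does not hold, the longer `unibyte-string` form above is the safer choice.

```elisp
(require 'url-util)

;; Hypothetical helper, not part of Emacs.
;; Assumption: `url-unhex-string' returns the percent-decoded raw bytes.
(defun my-url-decode-utf8 (url)
  "Percent-decode URL and interpret the resulting bytes as UTF-8."
  (decode-coding-string (url-unhex-string url) 'utf-8))

;; Usage: the %E2%80%93 escape should come back as one EN DASH (U+2013).
(my-url-decode-utf8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
```

Keeping the coding system hard-wired to UTF-8 matches what RFC 3986 recommends, but URLs produced by older software may use other encodings, so a caller that knows better might want to pass the coding system in as an argument instead.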
I've just sent this very message as a bug report to the Emacs team.
Cheers,
--
José A. Romero L.
escherdragon <at> gmail.com
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)
This bug report was last modified 13 years and 123 days ago.