GNU bug report logs - #6252
Emacs does not implement URL (aka "percent") decoding correctly.


Package: emacs

Reported by: José A. Romero L. <escherdragon <at> gmail.com>

Date: Sun, 23 May 2010 00:52:02 UTC

Severity: normal

Tags: fixed

Fixed in version 24.2

Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #14 received at 6252 <at> debbugs.gnu.org (full text, mbox):

From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
To: José A. Romero L. <escherdragon <at> gmail.com>
Cc: 6252 <at> debbugs.gnu.org
Subject: Re: Emacs does not implement URL (aka "percent") decoding correctly.
Date: Wed, 21 Sep 2011 22:17:52 +0200

José A. Romero L. <escherdragon <at> gmail.com> writes:

> On May 18, 20:14, Xah Lee <xah...@gmail.com>  wrote:
>
>> is there emacs lisp function that decode the url percent encoding?
>> e.g. http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem
>> should become
>> http://en.wikipedia.org/wiki/Sylvester–Gallai_theorem
>> that's a EN DASH (unicode 8211, #o20023, #x2013).
>> I know there's a
>>   (require 'gnus-util)
>>  gnus-url-unhex-string
>> but that just unhex, and generate gibberish if the url contain unicode
>> chars.
> (...)
>
> Seems that RFC 3986 has not been implemented correctly in Emacs. IMHO
> that is an important hole you have found there. The standard requires
> that all unreserved characters be encoded/decoded as UTF8 bytes. Even
> though the encoding part looks OK (in url-util.el), the decoding does
> not go that last mile to interpret the decoded bytes as UTF-8.

I'm not quite sure I understand what the problem is.  Do you have a test
case that illustrates what url.el does wrong?
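
A test case of the kind asked for here might look like the sketch
below.  It assumes that `url-unhex-string' (from url-util.el) hands back
the raw decoded bytes, and that interpreting those bytes as UTF-8 is
left to the caller via `decode-coding-string'.  The helper name
`my-url-decode-utf-8' is hypothetical and not part of Emacs; this is
only an illustration of the behaviour under discussion, not the fix
that was applied for this bug.

```elisp
(require 'url-util)

;; Hypothetical helper, not part of Emacs: percent-decode, then
;; interpret the resulting byte sequence as UTF-8.
(defun my-url-decode-utf-8 (encoded)
  "Percent-decode ENCODED and decode the resulting bytes as UTF-8."
  (decode-coding-string (url-unhex-string encoded) 'utf-8))

;; Unhexing alone yields one element per byte: %E2%80%93 becomes
;; three bytes rather than a single EN DASH.
(length (url-unhex-string "%E2%80%93"))

;; With the extra decoding step, the same three bytes should come
;; back as the single character U+2013.
(my-url-decode-utf-8
 "http://en.wikipedia.org/wiki/Sylvester%E2%80%93Gallai_theorem")
```

Comparing the two results should show whether the "gibberish" comes from
the unhexing step itself or from the missing UTF-8 decoding afterwards.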

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




This bug report was last modified 13 years and 104 days ago.


