GNU bug report logs - #75998
[guile-lib] html->sxml does not decode entities in attributes


Package: guile

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sat, 1 Feb 2025 20:11:01 UTC

Severity: normal

Done: Tomas Volf <~@wolfsden.cz>

Bug is archived. No further changes may be made.

Forwarded to oleg@okmij.org


Message #8 received at 75998 <at> debbugs.gnu.org (full text, mbox):

From: <tomas <at> tuxteam.de>
To: Tomas Volf <~@wolfsden.cz>
Cc: 75998 <at> debbugs.gnu.org
Subject: Re: bug#75998: [guile-lib] html->sxml does not decode entities in
 attributes
Date: Sun, 2 Feb 2025 07:47:06 +0100
On Sat, Feb 01, 2025 at 09:10:04PM +0100, Tomas Volf wrote:
> 
> Hello,
> 
> I think I found a bug in the htmlprag module in guile-lib.  When parsing
> attributes, entity references in the values are not decoded:
> 
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> ,use (htmlprag)
> scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
> $1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
> scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
> $2 = (*TOP* (a (@ (href "a&amp;b"))))
> --8<---------------cut here---------------end--------------->8---
> 
> I think that $1 should be "bbb\"ccc'ddd" and $2 should be "a&b".

Ouch. Have you contacted Oleg Kiselyov about it? He's usually pretty
responsive and very friendly.

> The annoying part is that this cannot really be changed now, because
> people (me included) already have workarounds in place, and
> automatically decoding now would lead to double decoding.
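To illustrate the kind of workaround and the double-decoding risk described
above, here is a minimal sketch in plain Guile.  It is not the code from
htmlprag or from anyone's actual workaround: the names %named-entities,
decode-attribute-value and decode-attributes are hypothetical, and the entity
table covers only five named entities plus numeric references.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical table: a small subset of the HTML named entities.
(define %named-entities
  '(("amp" . "&") ("lt" . "<") ("gt" . ">") ("quot" . "\"") ("apos" . "'")))

;; Matches &name;, &#123; and &#x7B; style references.
(define %entity-rx
  (make-regexp "&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"))

(define (code-point->string n)
  ;; Return a one-character string, or #f for invalid code points.
  (and n
       (or (< n #xD800) (< #xDFFF n #x110000))
       (string (integer->char n))))

(define (decode-entity name)
  ;; NAME is the text between "&" and ";".  Return a string or #f.
  (cond ((string-prefix-ci? "#x" name)
         (code-point->string (string->number (substring name 2) 16)))
        ((string-prefix? "#" name)
         (code-point->string (string->number (substring name 1) 10)))
        (else (assoc-ref %named-entities name))))

(define (decode-attribute-value str)
  ;; Decode each reference once; unknown references are left untouched.
  (regexp-substitute/global
   #f %entity-rx str
   'pre
   (lambda (m)
     (or (decode-entity (match:substring m 1))
         (match:substring m 0)))
   'post))

(define (decode-attributes node)
  ;; Walk an SXML tree and decode string values inside (@ ...) lists.
  (cond ((not (pair? node)) node)
        ((eq? (car node) '@)
         (cons '@
               (map (lambda (attr)
                      (if (and (pair? attr) (pair? (cdr attr))
                               (string? (cadr attr)))
                          (list (car attr)
                                (decode-attribute-value (cadr attr)))
                          attr))
                    (cdr node))))
        (else (map decode-attributes node))))

;; Applied to the two results from the report:
(decode-attributes '(*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd")))))
;; => (*TOP* (hr (@ (aaa "bbb\"ccc'ddd"))))
(decode-attributes '(*TOP* (a (@ (href "a&amp;b")))))
;; => (*TOP* (a (@ (href "a&b"))))
```

The double-decoding problem shows up with a value such as "&amp;amp;", which
encodes the literal text "&amp;".  One pass of decode-attribute-value yields
"&amp;" as intended, but a second pass (the parser decoding, then an existing
workaround decoding again) turns it into "&".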
> 
> I see a few ways forward:
> 
> 1. Document the current behavior and keep it as it is.
> 2. Add a keyword argument #:decode-attributes, defaulting to #f, to the
>    relevant procedures, so that people can opt into the fixed behavior.
> 3. Introduce a parameter %decode-attributes, so that people can opt into
>    the fixed behavior.
> 
> I am sure there are also other approaches possible.
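For comparison, options 2 and 3 could look roughly like the sketch below.
Everything in it is a stand-in so that it runs without guile-lib: parse-raw
only pretends to be the current html->sxml (it wraps its input as an undecoded
href value), decode-value handles nothing but "&amp;", and parse/keyword and
parse/parameter are hypothetical names, not proposed htmlprag API.

```scheme
;; Stand-in for today's parser: returns the attribute value undecoded.
(define (parse-raw html)
  `(*TOP* (a (@ (href ,html)))))

;; Toy decoder: replaces "&amp;" with "&" in a single left-to-right pass.
(define (decode-value str)
  (let ((i (string-contains str "&amp;")))
    (if i
        (string-append (substring str 0 i)
                       "&"
                       (decode-value (substring str (+ i 5))))
        str)))

;; Decode string values inside (@ ...) attribute lists of an SXML tree.
(define (decode-tree node)
  (cond ((not (pair? node)) node)
        ((eq? (car node) '@)
         (cons '@
               (map (lambda (attr)
                      (if (and (pair? attr) (pair? (cdr attr))
                               (string? (cadr attr)))
                          (list (car attr) (decode-value (cadr attr)))
                          attr))
                    (cdr node))))
        (else (map decode-tree node))))

;; Option 2: keyword argument, off by default, chosen per call.
(define* (parse/keyword html #:key (decode-attributes #f))
  (let ((tree (parse-raw html)))
    (if decode-attributes (decode-tree tree) tree)))

;; Option 3: parameter, off by default, chosen per dynamic extent.
(define %decode-attributes (make-parameter #f))

(define (parse/parameter html)
  (let ((tree (parse-raw html)))
    (if (%decode-attributes) (decode-tree tree) tree)))

(parse/keyword "a&amp;b")
;; => (*TOP* (a (@ (href "a&amp;b"))))
(parse/keyword "a&amp;b" #:decode-attributes #t)
;; => (*TOP* (a (@ (href "a&b"))))
(parameterize ((%decode-attributes #t))
  (parse/parameter "a&amp;b"))
;; => (*TOP* (a (@ (href "a&b"))))
```

In both variants existing callers see no change.  The keyword keeps the choice
visible at each call site, while the parameter also affects calls made
indirectly by other code within the parameterize form, which may or may not be
wanted when that code carries its own workaround.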

If it were me, I'd take 2.

Cheers
-- 
tomás

This bug report was last modified 95 days ago.
