GNU bug report logs - #75998
[guile-lib] html->sxml does not decode entities in attributes


Package: guile;

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sat, 1 Feb 2025 20:11:01 UTC

Severity: normal

Done: Tomas Volf <~@wolfsden.cz>

Bug is archived. No further changes may be made.

Forwarded to oleg@okmij.org



Message #19 received at 75998 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Tomas Volf <~@wolfsden.cz>
Cc: 75998 <at> debbugs.gnu.org, tomas <at> tuxteam.de
Subject: Re: bug#75998: [guile-lib] html->sxml does not decode entities in
 attributes
Date: Mon, 03 Feb 2025 23:30:55 +0900

Hi Tomas,

Thank you for reporting this issue.

Tomas Volf <~@wolfsden.cz> writes:

> <tomas <at> tuxteam.de> writes:
>
>> On Sat, Feb 01, 2025 at 09:10:04PM +0100, Tomas Volf wrote:
>>> 
>>> Hello,
>>> 
>>> I think I found a bug in the htmlprag module in guile-lib.  When parsing
>>> attributes, the values are not properly decoded:
>>> 
>>> --8<---------------cut here---------------start------------->8---
>>> scheme@(guile-user)> ,use (htmlprag)
>>> scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
>>> $1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
>>> scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
>>> $2 = (*TOP* (a (@ (href "a&amp;b"))))
>>> --8<---------------cut here---------------end--------------->8---
>>> 
>>> I think that $1 should be "bbb\"ccc'ddd" and $2 should be "a&b".
>>
>> Ouch. Have you contacted Oleg Kiselyov about it? He's usually pretty
>> responsive and very friendly.
>
> I did not.  I did not find a "how to report bugs" section on guile-lib's
> website, and on the (htmlprag) documentation section Oleg Kiselyov is
> mentioned only in one sentence as a "Thanks".
>
> I think I have managed to find his email in one Haskell paper of his, so
> I will CC him on the bug report, as suggested.

Thank you also for contacting Oleg.  I hope they can give us their
opinion on whether this is an actual bug or was designed that way.  To
me, it's not clear whether html->sxml should alter the raw value of
attributes in any way.  Users may have different use cases that require
applying different transformations themselves.  If we hard-code a
decoding scheme, don't we force that choice onto users?
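
To illustrate what leaving the choice to users could look like, here is
a rough sketch of a post-processing pass over the SXML that html->sxml
returns.  The names %basic-entities, decode-basic-entities and
decode-attributes are hypothetical and not part of (htmlprag), and the
table covers only four named entities, so please treat this as an
illustration under those assumptions rather than a proposed fix:

```scheme
(use-modules (ice-9 match)
             (ice-9 regex))

;; Hypothetical table: only four named entities.  Real HTML defines
;; many more, plus numeric character references.
(define %basic-entities
  '(("amp" . "&") ("quot" . "\"") ("lt" . "<") ("gt" . ">")))

;; Hypothetical helper: decode the entities above in a single pass, so
;; that "&amp;quot;" becomes "&quot;" and is not decoded twice.
(define (decode-basic-entities str)
  (regexp-substitute/global
   #f "&(amp|quot|lt|gt);" str
   'pre
   (lambda (m) (assoc-ref %basic-entities (match:substring m 1)))
   'post))

;; Hypothetical helper: walk an SXML tree and decode string values
;; inside (@ ...) attribute lists, leaving everything else untouched.
(define (decode-attributes node)
  (match node
    (('@ attrs ...)
     (cons '@
           (map (match-lambda
                  ((name (? string? value))
                   (list name (decode-basic-entities value)))
                  (other other))
                attrs)))
    ((head children ...)
     (cons head (map decode-attributes children)))
    (other other)))

;; Applied to the two results from the original report:
(decode-attributes '(*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd")))))
;; => (*TOP* (hr (@ (aaa "bbb\"ccc'ddd"))))
(decode-attributes '(*TOP* (a (@ (href "a&amp;b")))))
;; => (*TOP* (a (@ (href "a&b"))))
```

A pass like this is opt-in: callers who want the raw attribute text can
skip it, at the cost of every other caller having to know about it.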

-- 
Thanks,
Maxim




This bug report was last modified 96 days ago.


