GNU bug report logs -
#75998
[guile-lib] html->sxml does not decode entities in attributes
Reported by: Tomas Volf <~@wolfsden.cz>
Date: Sat, 1 Feb 2025 20:11:01 UTC
Severity: normal
Done: Tomas Volf <~@wolfsden.cz>
Bug is archived. No further changes may be made.
Forwarded to oleg@okmij.org
Hi Tomas,
[...]
> It already modifies the raw value for regular HTML text:
>
> scheme@(htmlprag)> (html->sxml "a&amp;b")
> $10 = (*TOP* "a&b")
> scheme@(htmlprag)> (sxml->html '(*TOP* "a&b"))
> $13 = "a&amp;b"
>
>
> I now noticed this also affects encoding:
>
> scheme@(htmlprag)> (sxml->html '(*TOP* (a (@ (href "a&b")))))
> $12 = "<a href=\"a&b\"></a>"
>
>
> I am not sure why attributes should be special here.
>
> For what it is worth, (sxml simple) itself decodes even attributes:
>
> scheme@(htmlprag)> (xml->sxml "<a href=\"a&amp;b\"></a>")
> $11 = (*TOP* (a (@ (href "a&b"))))
>
> For comparison, Firefox seems to decode the attributes as well even in
> HTML. That is actually how I discovered this issue, links I extracted
> from <a href=".."> using html->sxml were not working until I ran a
> decoding pass on them.
Good points. Thanks for these.
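
For reference, the kind of decoding pass over attributes described above
could look roughly like the sketch below. It is only an illustration
under stated assumptions, not htmlprag code: the names %entities,
decode-entities and decode-attributes are hypothetical, only the five
predefined entities are handled (numeric references such as &#38; are
left alone), and it relies on core Guile modules only.

```scheme
(use-modules (ice-9 match)
             (ice-9 regex))

;; Hypothetical table: just the five predefined entities.
(define %entities
  '(("amp" . "&") ("lt" . "<") ("gt" . ">")
    ("quot" . "\"") ("apos" . "'")))

;; Replace each known "&name;" in STR in a single pass, so that
;; "&amp;lt;" becomes "&lt;" rather than "<".  Unknown entities are
;; kept verbatim.
(define (decode-entities str)
  (regexp-substitute/global
   #f "&([a-z]+);" str
   'pre
   (lambda (m)
     (or (assoc-ref %entities (match:substring m 1))
         (match:substring m 0)))
   'post))

;; Walk an SXML tree and decode only the string values found inside
;; (@ ...) attribute lists; text nodes are returned untouched.
(define (decode-attributes tree)
  (match tree
    (('@ attrs ...)
     (cons '@
           (map (match-lambda
                  ((name (? string? value))
                   (list name (decode-entities value)))
                  (other other))
                attrs)))
    ((children ...) (map decode-attributes children))
    (leaf leaf)))

(decode-attributes '(*TOP* (a (@ (href "a&amp;b")) "x")))
;; => (*TOP* (a (@ (href "a&b")) "x"))
```

Applied to the result of html->sxml, a pass like this would give the
decoded href values without touching the text nodes, which are already
decoded by the parser.
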
>> Users may have different use cases requiring them to apply different
>> transformations themselves?
>
> I agree in the abstract, but do you have any specific use case in mind
> where you would want to use the raw content of attributes (especially
> since you already cannot get the raw content of text nodes)?
>> If we hard-code a decoding scheme ourselves, then we force that choice
>> onto users, no?
>
> I agree we cannot hard-code or change it now due to compatibility
> concerns, but adding #:decode-attributes to html->sxml,
> #:encode-attributes to sxml->html and possibly %deencode-attributes?
> parameter, in the spirit of %strict-tokenizer? would seem reasonable.
I see this situation and %strict-tokenizer as a bit different; the
htmlprag module was designed to be lenient, so being lenient could not
really be considered a bug :-). But this here could well be considered
a bug. So perhaps what we could do is fix this correctly and bump
at least the minor digit in our version. We're still in an unstable 0
version (the last one was 0.2.8.1), so technically we don't promise
stability yet (perhaps never, as this guile-lib project aims to be a lab
for components that could later be included in Guile). But we should
communicate this change well in the NEWS file.
--
Thanks,
Maxim
This bug report was last modified 95 days ago.