GNU bug report logs - #63125
30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Reported by: Ruijie Yu <ruijie <at> netyu.xyz>
Date: Thu, 27 Apr 2023 16:34:02 UTC
Severity: normal
Found in version 30.0.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
> Date: Fri, 28 Apr 2023 00:19:22 +0800
> From: Ruijie Yu via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> I'm trying out the function `libxml-parse-html-region', as recommended
> in a thread on help-gnu-emacs. However, I discovered that the last
> argument of this function does not help me normalize a relative URL.
>
> Reproducer:
>
> Visit the attached toy HTML file, and assume that it is hosted at
> "https://example.com/good/day".
>
> Run this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)
>      "https://example.com/good/day"))
>
> Compare it with this snippet:
>
> (pp (libxml-parse-html-region
>      (point-min) (point-max)))
>
> Both snippets give me the same result (which my session shows twice:
> once pretty-printed, and once returned as a string):
>
> --8<---------------cut here---------------start------------->8---
> (html nil
>       (body nil "\n "
>             (a
>              ((href . "/hello"))
>              "1")
>             "\n "
>             (a
>              ((href . "../world"))
>              "2")
>             "\n "
>             (a
>              ((href . "good"))
>              "3")
>             "\n "
>             (a
>              ((href . "morning/or/night"))
>              "4")
>             "\n "))
> --8<---------------cut here---------------end--------------->8---
>
> Notice that the href values are not normalized: they are copied
> verbatim from the original HTML file.
>
> If I understand the docstring correctly, the last argument of
> `libxml-parse-html-region', when given as a URL string, should be
> used as the base for resolving relative paths found within the HTML
> document. But the <a href=xxx> paths are currently not resolved.
If you look at xml.c, you will see that we just call a libxml function
passing it this URL. So if anything isn't as expected, the answer is
in libxml, not in Emacs.
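The parse tree in the report shows that the href values come back exactly as written, whatever libxml2 does internally with the base URL. If absolute links are what is wanted, one option is to expand them in Lisp after parsing. The following is a minimal sketch, not part of Emacs: `my-resolve-hrefs' is a hypothetical helper name, and it assumes the `dom' library and `url-expand-file-name' (from url-expand.el) that ship with Emacs. It only rewrites the href of <a> elements; other URL-bearing attributes such as src are left alone.

```elisp
(require 'dom)         ; dom-by-tag, dom-attr, dom-set-attribute
(require 'url-expand)  ; url-expand-file-name

(defun my-resolve-hrefs (dom base)
  "Expand relative href attributes of <a> elements in DOM against BASE.
DOM is a parse tree such as the one returned by
`libxml-parse-html-region'; BASE is the URL the document is assumed
to live at.  DOM is modified in place and returned.
Hypothetical helper for illustration; not part of Emacs."
  (dolist (node (dom-by-tag dom 'a) dom)
    (let ((href (dom-attr node 'href)))
      ;; Skip <a> elements that have no href at all (e.g. named anchors).
      (when href
        ;; url-expand-file-name resolves a relative reference such as
        ;; "../world" against BASE and returns the resulting URL string.
        (dom-set-attribute node 'href (url-expand-file-name href base))))))
```

Applied to the tree from the reproducer with the base "https://example.com/good/day", this should turn "/hello" into "https://example.com/hello" and "morning/or/night" into "https://example.com/good/morning/or/night". Doing the rewrite after parsing keeps `libxml-parse-html-region' a thin wrapper around libxml2, as described above.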
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.