Eli Zaretskii writes:

>> Date: Fri, 28 Apr 2023 00:19:22 +0800
>> From: Ruijie Yu via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors"
>>
>> I'm trying out the function `libxml-parse-html-region' as recommended
>> by a thread in help-gnu-emacs.  However, I discovered that the
>> BASE-URL argument of this function does not help me normalize a
>> relative url.
>>
>> Reproducer:
>>
>> Visit the attached toy html file.  I imagine that it is hosted at
>> "https://example.com/good/day".
>>
>> Run this snippet:
>>
>>     (pp (libxml-parse-html-region
>>          (point-min) (point-max)
>>          "https://example.com/good/day"))
>>
>> Compare it with this snippet:
>>
>>     (pp (libxml-parse-html-region
>>          (point-min) (point-max)))
>>
>> What I get is this result for both snippets (it is shown twice: once
>> pretty-printed, and once as the returned string):
>>
>> --8<---------------cut here---------------start------------->8---
>> (html nil
>>       (body nil "\n "
>>             (a
>>              ((href . "/hello"))
>>              "1")
>>             "\n "
>>             (a
>>              ((href . "../world"))
>>              "2")
>>             "\n "
>>             (a
>>              ((href . "good"))
>>              "3")
>>             "\n "
>>             (a
>>              ((href . "morning/or/night"))
>>              "4")
>>             "\n "))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Notice that the href values are not normalized: they are copied
>> verbatim from the original html file.
>>
>> If I understand the docstring correctly, the BASE-URL argument of
>> `libxml-parse-html-region', when specified as a url string, should be
>> used as the "base point" for resolving relative paths found within
>> the html document.  But the paths are not resolved at the moment.
>
> If you look at xml.c, you will see that we just call a libxml function
> passing it this URL.  So if anything isn't as expected, the answer is
> in libxml, not in Emacs.

Thank you for pointing that out.  I will take a look at the libxml2
source in a day or two.  I am also upgrading libxml2 from 2.10.3-2 to
2.10.4-2, and will see if that changes anything.

If I conclude that it is a libxml2 bug, I'll report it upstream and
link to this bug.  For completeness, attached is the toy html file
that I forgot to include in my initial report.
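Until the cause is clear, one possible workaround is to expand the
hrefs in Lisp after parsing.  This is only a sketch under stated
assumptions: `my-dom-expand-hrefs' is a hypothetical helper, not part
of Emacs; it only touches the href of <a> elements; and it relies on
`dom-by-tag', `dom-attr' and `dom-set-attribute' from dom.el and on
`url-expand-file-name' from url-expand.el, whose handling of every
RFC 3986 edge case I have not verified.

```elisp
(require 'dom)         ; dom-by-tag, dom-attr, dom-set-attribute
(require 'url-expand)  ; url-expand-file-name

;; Hypothetical helper (not part of Emacs): destructively replace the
;; href attribute of every <a> element in DOM with the result of
;; expanding it against BASE, then return DOM.
(defun my-dom-expand-hrefs (dom base)
  "Expand the href of each <a> element in DOM against the URL BASE."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      ;; Skip <a> elements that have no href at all.
      (when href
        (dom-set-attribute node 'href
                           (url-expand-file-name href base)))))
  dom)
```

Evaluated in the buffer visiting the attached file,
(pp (my-dom-expand-hrefs (libxml-parse-html-region (point-min) (point-max))
                         "https://example.com/good/day"))
should then print absolute URLs in place of the relative ones.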