GNU bug report logs - #63125
30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?

Package: emacs;

Reported by: Ruijie Yu <ruijie <at> netyu.xyz>

Date: Thu, 27 Apr 2023 16:34:02 UTC

Severity: normal

Found in version 30.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ruijie Yu <ruijie <at> netyu.xyz>
Subject: bug#63125: closed (Re: bug#63125: 30.0.50; [BUG] last argument of
 libxml-parse-html-region has no effect?)
Date: Sat, 29 Apr 2023 06:40:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 63125 <at> debbugs.gnu.org.

-- 
63125: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=63125
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eli Zaretskii <eliz <at> gnu.org>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: larsi <at> gnus.org, 63125-done <at> debbugs.gnu.org
Subject: Re: bug#63125: 30.0.50; [BUG] last argument of
 libxml-parse-html-region has no effect?
Date: Sat, 29 Apr 2023 09:40:19 +0300
> From: Ruijie Yu <ruijie <at> netyu.xyz>
> Cc: 63125 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Sat, 29 Apr 2023 08:58:03 +0800
> 
> > The response doesn't say much.  What is this "base URL" argument used
> > for, and why is it named "base URL"?  What does "used for error
> > messages" mean?  And where is the up-to-date and accurate documentation
> > of this function, which explains what this argument is for?
> >
> > Without knowing all that, we cannot fix our documentation, let alone
> > code.
> 
> The "base-url" is an argument to the Elisp function
> `libxml-parse-html-region'.  I added Lars to the CC, who originally
> introduced this function according to git-blame, and who may have a
> better idea.
> 
> The following are my impressions, but I'm happy to pass any questions
> you still have on to the libxml2 devs if you want (or you can comment
> directly in the linked issue on GNOME's GitLab instance).
> 
> -----
> 
> As you pointed out, these arguments of the Elisp function are passed,
> with minimal transformation, to the libxml2 function
> `htmlReadMemory()'.  This C function takes an argument `url', which
> receives the string `base-url', or an empty string if `base-url' is nil.
> 
> According to Nick (the libxml2 maintainer), and in my own reading, the
> `url' parameter of the libxml2 function is simply stored in the `URL'
> field of an `xmlDoc' struct, to be used when an error message needs to
> be displayed.  So, in practice, the `url' parameter does nothing for
> us, since we disable all libxml2-level warnings and errors when calling
> `htmlReadMemory()'.
> 
> I linked this URL [1] in the issue, assuming that it is the
> documentation, and Nick had no comment on it.  So this is probably the
> up-to-date, albeit not very detailed, documentation for the function.
> 
> [1]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory

Thanks.  So I've now updated our documentation with this information,
and I'm therefore closing the bug.

[Message part 3 (message/rfc822, inline)]
From: Ruijie Yu <ruijie <at> netyu.xyz>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no
 effect?
Date: Fri, 28 Apr 2023 00:19:22 +0800
[I know I'm running a month-old master.  I will try to reproduce this
issue again within a day with an up-to-date master, and with -Q as well,
unless someone else does it first.]

I'm trying out the function `libxml-parse-html-region', as recommended
in a thread on help-gnu-emacs.  However, I discovered that the BASE-URL
argument of this function does not help me normalize a relative URL.

Reproducer:

Visit the attached toy HTML file, and assume that it is hosted at
"https://example.com/good/day".

Run this snippet:

    (pp (libxml-parse-html-region
         (point-min) (point-max)
         "https://example.com/good/day"))

Compare it with this snippet:

    (pp (libxml-parse-html-region
         (point-min) (point-max)))

I get the following result for both snippets (it is shown twice, once
pretty-printed and once as the returned string):

--8<---------------cut here---------------start------------->8---
(html nil
      (body nil "\n    "
            (a
             ((href . "/hello"))
             "1")
            "\n    "
            (a
             ((href . "../world"))
             "2")
            "\n    "
            (a
             ((href . "good"))
             "3")
            "\n    "
            (a
             ((href . "morning/or/night"))
             "4")
            "\n  "))
--8<---------------cut here---------------end--------------->8---

Notice that the href values are not normalized: they are copied
verbatim from the original HTML file.
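The two snippets above can be folded into one self-checking form.  This
is a minimal sketch, assuming an Emacs built with LIBXML2 support;
`my-toy-html' is a hypothetical stand-in for the attached file
(reconstructed from the output above, since the attachment is not part
of this log), and `my-parse-toy' is a hypothetical helper, not part of
Emacs:

```elisp
(require 'cl-lib)
(require 'dom)

;; Hypothetical stand-in for the attached toy file, reconstructed from
;; the parse result above; the real attachment is not reproduced here.
(defconst my-toy-html
  "<html><body>
    <a href=\"/hello\">1</a>
    <a href=\"../world\">2</a>
    <a href=\"good\">3</a>
    <a href=\"morning/or/night\">4</a>
  </body></html>")

(defun my-parse-toy (&optional base-url)
  "Parse `my-toy-html' with libxml, passing BASE-URL through unchanged.
Hypothetical helper for this reproducer only."
  (with-temp-buffer
    (insert my-toy-html)
    (libxml-parse-html-region (point-min) (point-max) base-url)))

(when (libxml-available-p)
  ;; Passing a base URL does not change the resulting tree...
  (cl-assert (equal (my-parse-toy "https://example.com/good/day")
                    (my-parse-toy)))
  ;; ...and the first href keeps its verbatim, relative value.
  (cl-assert (equal (dom-attr (car (dom-by-tag (my-parse-toy) 'a)) 'href)
                    "/hello")))
```

The `libxml-available-p' guard only skips the check on builds without
LIBXML2; on such builds `libxml-parse-html-region' is not available.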

If I understand the docstring correctly, the BASE-URL argument of
`libxml-parse-html-region', when specified as a URL string, should be
used as the base for resolving relative paths found within the HTML
document.  But the <a href=xxx> paths are not resolved at the moment.
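As the libxml2 maintainer's explanation earlier in this log indicates,
BASE-URL is only stored for error messages, so relative links have to be
resolved after parsing.  A minimal sketch of one way to do that, assuming
`url-expand-file-name' from url-expand.el is an acceptable resolver;
`my-resolve-hrefs' is a hypothetical helper, not an Emacs or libxml2 API:

```elisp
(require 'url-expand)

(defun my-resolve-hrefs (node base-url)
  "Return a copy of NODE with every `href' attribute resolved against BASE-URL.
NODE is a DOM in the format returned by `libxml-parse-html-region'.
Hypothetical helper, not part of Emacs."
  (if (not (consp node))
      ;; Text nodes are plain strings; return them unchanged.
      node
    (let ((tag (car node))
          (attrs (cadr node))
          (children (cddr node)))
      (cons tag
            (cons
             ;; Rebuild the attribute alist, expanding only `href' values.
             (mapcar (lambda (attr)
                       (if (eq (car attr) 'href)
                           (cons 'href
                                 (url-expand-file-name (cdr attr) base-url))
                         attr))
                     attrs)
             ;; Recurse into the children (elements and text strings).
             (mapcar (lambda (child) (my-resolve-hrefs child base-url))
                     children))))))
```

For example, wrapping the first snippet's parse result as
`(my-resolve-hrefs (libxml-parse-html-region (point-min) (point-max))
"https://example.com/good/day")' should turn "../world" into
"https://example.com/world".  Only `href' is rewritten, and a <base
href=...> element inside the document is ignored; `src' and similar
attributes would need the same treatment.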

---

In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.37, cairo version 1.17.8) of 2023-03-25 built on ruijie
Repository revision: db7e95531ac36ae842787b6c5f2859d0642c78cc
Repository branch: makepkg
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
 --with-modules --without-libotf --without-m17n-flt --without-gconf
 --enable-link-time-optimization --with-native-compilation=yes
 --with-xinput2 --with-pgtk --without-xaw3d --with-sound=alsa
 --with-tree-sitter '--program-transform-name=s/\([ec]tags\)/\1.emacs/'
 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
 -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
 -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=fcitx
  locale-coding-system: utf-8-unix

-- 
Best,


RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]





GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.