From unknown Fri Jun 20 19:50:17 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#63125 <63125@debbugs.gnu.org> To: bug#63125 <63125@debbugs.gnu.org> Subject: Status: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? Reply-To: bug#63125 <63125@debbugs.gnu.org> Date: Sat, 21 Jun 2025 02:50:17 +0000 retitle 63125 30.0.50; [BUG] last argument of libxml2-parse-html-region has= no effect? reassign 63125 emacs submitter 63125 Ruijie Yu severity 63125 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 27 12:33:29 2023 Received: (at submit) by debbugs.gnu.org; 27 Apr 2023 16:33:29 +0000 Received: from localhost ([127.0.0.1]:59632 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ps4Yj-000321-7D for submit@debbugs.gnu.org; Thu, 27 Apr 2023 12:33:29 -0400 Received: from lists.gnu.org ([209.51.188.17]:49648) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ps4Yh-00031u-Tp for submit@debbugs.gnu.org; Thu, 27 Apr 2023 12:33:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ps4Yh-0004vK-Mr for bug-gnu-emacs@gnu.org; Thu, 27 Apr 2023 12:33:27 -0400 Received: from netyu.xyz ([152.44.41.246] helo=mail.netyu.xyz) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ps4Yd-0006GS-KD for bug-gnu-emacs@gnu.org; Thu, 27 Apr 2023 12:33:25 -0400 Received: from fw.net.yu.netyu.xyz ( [222.248.4.98]) by netyu.xyz (OpenSMTPD) with ESMTPSA id 9d8a9d77 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO) for ; Thu, 27 Apr 2023 16:33:20 +0000 (UTC) User-agent: mu4e 1.9.22; emacs 30.0.50 From: Ruijie Yu To: bug-gnu-emacs@gnu.org Subject: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? 
Date: Fri, 28 Apr 2023 00:19:22 +0800 Message-ID: MIME-Version: 1.0 Content-Type: text/plain Received-SPF: pass client-ip=152.44.41.246; envelope-from=ruijie@netyu.xyz; helo=mail.netyu.xyz X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 0.6 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.4 (--) [I know I'm running a one-month old master. I will try to reproduce this issue again within a day with an up-to-date master unless someone else does it first. And -Q as well.] I'm trying out the function `libxml2-parse-html-region' as recommended by a thread in help-gnu-emacs. However, I discovered that the last argument of this function does not help me normalize a relative url. Reproducer: Visit the attached toy html file. I imagine that it is hosted at "https://example.com/good/day". Run this snippet: (pp (libxml-parse-html-region (point-min) (point-max) "https://example.com/good/day")) Compare it with this snippet: (pp (libxml-parse-html-region (point-min) (point-max))) What I get is this result for both snippets (which is shown twice, once "pretty-printed", and once returned as a string): --8<---------------cut here---------------start------------->8--- (html nil (body nil "\n " (a ((href . "/hello")) "1") "\n " (a ((href . "../world")) "2") "\n " (a ((href . "good")) "3") "\n " (a ((href . "morning/or/night")) "4") "\n ")) --8<---------------cut here---------------end--------------->8--- Notice, that the href values are not normalized: they are copied verbatim from the original html file. 
If I understand the docstring correctly, the last argument of
`libxml-parse-html-region', when specified as a url string, should be
used as the "base point" for resolving relative paths found within the
html document.  But the paths are not resolved at the moment.

---

In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
3.24.37, cairo version 1.17.8) of 2023-03-25 built on ruijie
Repository revision: db7e95531ac36ae842787b6c5f2859d0642c78cc
Repository branch: makepkg
System Description: Arch Linux

Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
 --with-modules --without-libotf --without-m17n-flt --without-gconf
 --enable-link-time-optimization --with-native-compilation=yes
 --with-xinput2 --with-pgtk --without-xaw3d --with-sound=alsa
 --with-tree-sitter '--program-transform-name=s/\([ec]tags\)/\1.emacs/'
 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
 -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
 -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
PGTK PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=fcitx
  locale-coding-system: utf-8-unix

-- 
Best,

RY

[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]
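For anyone who wants to try the comparison above without the attachment,
here is a self-contained sketch.  The toy HTML is reconstructed from the
parse tree shown in the report, not copied from the original file, and
`my/base-url-changes-parse-p' is a hypothetical helper name.  It assumes
an Emacs built with libxml2 support.

```elisp
;; Sketch only.  The HTML below is an assumption, reconstructed from the
;; parse tree quoted in the report rather than the real attachment.
(defun my/base-url-changes-parse-p (base-url)
  "Return non-nil if passing BASE-URL changes the parsed HTML tree."
  (unless (libxml-available-p)
    (error "This Emacs was built without libxml2 support"))
  (with-temp-buffer
    (insert "<html><body>\n"
            "  <a href=\"/hello\">1</a>\n"
            "  <a href=\"../world\">2</a>\n"
            "  <a href=\"good\">3</a>\n"
            "  <a href=\"morning/or/night\">4</a>\n"
            "</body></html>\n")
    ;; Parse the same buffer twice: once without and once with BASE-URL.
    (let ((without (libxml-parse-html-region (point-min) (point-max)))
          (with (libxml-parse-html-region (point-min) (point-max)
                                          base-url)))
      (not (equal without with)))))
```

Evaluating (my/base-url-changes-parse-p "https://example.com/good/day")
should return nil if the behavior described in this report holds, since
both calls then yield the same tree.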
From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 27 13:08:09 2023 Received: (at 63125) by debbugs.gnu.org; 27 Apr 2023 17:08:09 +0000 Received: from localhost ([127.0.0.1]:59676 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ps56H-000490-A3 for submit@debbugs.gnu.org; Thu, 27 Apr 2023 13:08:09 -0400 Received: from eggs.gnu.org ([209.51.188.92]:49288) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ps56B-00048L-Ip for 63125@debbugs.gnu.org; Thu, 27 Apr 2023 13:08:07 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ps563-0004b1-VF; Thu, 27 Apr 2023 13:07:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=pgHJajaOGx66Y2ffOJ6P7/yAbvWwtm+xRu9kb49zlcQ=; b=MSvuIduF/nxP y6UdAlrZNognhA6hg4upOUo/E4f59bq5WE9l3jE/RebxkStUoe9dCFFNC1gGmiE0VnAaAynSOa2a8 naP5/af0PRRTmpnF4oocYy0AQ/m2hfq3SSkMDxGWl1dQm+Ebq9ku8rLplcTLhmB1ARVfZsG3hfXly O9zn6iZ61hVl5Um5kMz8O37smOPiaoP99leDzjSsA6Rs+q+8NRn/t+VqLPEQVu7fIFDBy7/LOZ4UR Gh/gjAhzKgt8i42alc9pnUKIyJsEEaJLZDDHbysRs4U6A5Zx1nWazme6wXrXEd1FCWJF8brmbdGoV ZF10qVhES6qmssL9jD52Mg==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ps55q-0006sw-9j; Thu, 27 Apr 2023 13:07:55 -0400 Date: Thu, 27 Apr 2023 20:08:14 +0300 Message-Id: <83h6t1s16p.fsf@gnu.org> From: Eli Zaretskii To: Ruijie Yu In-Reply-To: (bug-gnu-emacs@gnu.org) Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? 
References: X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 63125 Cc: 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Fri, 28 Apr 2023 00:19:22 +0800 > From: Ruijie Yu via "Bug reports for GNU Emacs, > the Swiss army knife of text editors" > > I'm trying out the function `libxml2-parse-html-region' as recommended > by a thread in help-gnu-emacs. However, I discovered that the last > argument of this function does not help me normalize a relative url. > > Reproducer: > > Visit the attached toy html file. I imagine that it is hosted at > "https://example.com/good/day". > > Run this snippet: > > (pp (libxml-parse-html-region > (point-min) (point-max) > "https://example.com/good/day")) > > Compare it with this snippet: > > (pp (libxml-parse-html-region > (point-min) (point-max))) > > What I get is this result for both snippets (which is shown twice, once > "pretty-printed", and once returned as a string): > > --8<---------------cut here---------------start------------->8--- > (html nil > (body nil "\n " > (a > ((href . "/hello")) > "1") > "\n " > (a > ((href . "../world")) > "2") > "\n " > (a > ((href . "good")) > "3") > "\n " > (a > ((href . "morning/or/night")) > "4") > "\n ")) > --8<---------------cut here---------------end--------------->8--- > > Notice, that the href values are not normalized: they are copied > verbatim from the original html file. > > If I understand the docstring correctly, the last argument of > `libxml2-parse-html-region', when specified as a url string, should be > used as the "base point" of resolving relative paths found within the > html document. But the paths are not resolved at the > moment. If you look at xml.c, you will see that we just call a libxml function passing it this URL. 
So if anything isn't as expected, the answer is in libxml, not in Emacs. From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 27 21:33:47 2023 Received: (at 63125) by debbugs.gnu.org; 28 Apr 2023 01:33:47 +0000 Received: from localhost ([127.0.0.1]:60141 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psCzb-0006qP-7Q for submit@debbugs.gnu.org; Thu, 27 Apr 2023 21:33:47 -0400 Received: from netyu.xyz ([152.44.41.246]:36074 helo=mail.netyu.xyz) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psCzW-0006q7-QL for 63125@debbugs.gnu.org; Thu, 27 Apr 2023 21:33:46 -0400 Received: from fw.net.yu.netyu.xyz ( [222.248.4.98]) by netyu.xyz (OpenSMTPD) with ESMTPSA id 706b7ef3 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Fri, 28 Apr 2023 01:33:40 +0000 (UTC) References: <83h6t1s16p.fsf@gnu.org> User-agent: mu4e 1.9.22; emacs 30.0.50 From: Ruijie Yu To: Eli Zaretskii Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? Date: Fri, 28 Apr 2023 09:30:30 +0800 In-reply-to: <83h6t1s16p.fsf@gnu.org> Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 63125 Cc: 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) --=-=-= Content-Type: text/plain Eli Zaretskii writes: >> Date: Fri, 28 Apr 2023 00:19:22 +0800 >> From: Ruijie Yu via "Bug reports for GNU Emacs, >> the Swiss army knife of text editors" >> >> I'm trying out the function `libxml2-parse-html-region' as recommended >> by a thread in help-gnu-emacs. However, I discovered that the last >> argument of this function does not help me normalize a relative url. >> >> Reproducer: >> >> Visit the attached toy html file. 
I imagine that it is hosted at >> "https://example.com/good/day". >> >> Run this snippet: >> >> (pp (libxml-parse-html-region >> (point-min) (point-max) >> "https://example.com/good/day")) >> >> Compare it with this snippet: >> >> (pp (libxml-parse-html-region >> (point-min) (point-max))) >> >> What I get is this result for both snippets (which is shown twice, once >> "pretty-printed", and once returned as a string): >> >> --8<---------------cut here---------------start------------->8--- >> (html nil >> (body nil "\n " >> (a >> ((href . "/hello")) >> "1") >> "\n " >> (a >> ((href . "../world")) >> "2") >> "\n " >> (a >> ((href . "good")) >> "3") >> "\n " >> (a >> ((href . "morning/or/night")) >> "4") >> "\n ")) >> --8<---------------cut here---------------end--------------->8--- >> >> Notice, that the href values are not normalized: they are copied >> verbatim from the original html file. >> >> If I understand the docstring correctly, the last argument of >> `libxml2-parse-html-region', when specified as a url string, should be >> used as the "base point" of resolving relative paths found within the >> html document. But the paths are not resolved at the >> moment. > > If you look at xml.c, you will see that we just call a libxml function > passing it this URL. So if anything isn't as expected, the answer is > in libxml, not in Emacs. Thank you for pointing that out. I will take a look at its source in a day or two. I am also upgrading it from 2.10.3-2 to 2.10.4-2, and will see if that changes anything. If I end up deciding that it is a libxml2 bug, I'll file a bug there and link to this bug. For completeness, here attached is the toy html file that I forgot to attach in my initial report. --=-=-= Content-Type: text/html Content-Disposition: attachment; filename=hello.html 1 2 3 4 --=-=-= Content-Type: text/plain -- Best, RY [Please note that this mail might go to spam due to some misconfiguration in my mail server -- still investigating.] 
--=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 28 06:23:26 2023 Received: (at 63125) by debbugs.gnu.org; 28 Apr 2023 10:23:26 +0000 Received: from localhost ([127.0.0.1]:60516 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psLG9-0006Il-T1 for submit@debbugs.gnu.org; Fri, 28 Apr 2023 06:23:26 -0400 Received: from netyu.xyz ([152.44.41.246]:47838 helo=mail.netyu.xyz) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psLG6-0006IY-PF for 63125@debbugs.gnu.org; Fri, 28 Apr 2023 06:23:24 -0400 Received: from fw.net.yu.netyu.xyz ( [222.248.4.98]) by netyu.xyz (OpenSMTPD) with ESMTPSA id 9a72ae21 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Fri, 28 Apr 2023 10:23:21 +0000 (UTC) References: <83h6t1s16p.fsf@gnu.org> User-agent: mu4e 1.9.22; emacs 30.0.50 From: Ruijie Yu To: Ruijie Yu Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect? Date: Fri, 28 Apr 2023 18:18:21 +0800 In-reply-to: Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 63125 Cc: Eli Zaretskii , 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Ruijie Yu writes: >> >> If you look at xml.c, you will see that we just call a libxml function >> passing it this URL. So if anything isn't as expected, the answer is >> in libxml, not in Emacs. > > Thank you for pointing that out. I will take a look at its source in a > day or two. I am also upgrading it from 2.10.3-2 to 2.10.4-2, and will > see if that changes anything. No difference -- as expected. > If I end up deciding that it is a libxml2 bug, I'll file a bug there and > link to this bug. I have filed an issue [1] in libxml2. We'll see what they say about it. 
FTR, [2] is the documentation of the libxml2's htmlReadMemory() function -- though it does not say much. [1]: https://gitlab.gnome.org/GNOME/libxml2/-/issues/525 [2]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory. -- Best, RY [Please note that this mail might go to spam due to some misconfiguration in my mail server -- still investigating.] From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 28 06:51:19 2023 Received: (at 63125) by debbugs.gnu.org; 28 Apr 2023 10:51:19 +0000 Received: from localhost ([127.0.0.1]:60559 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psLh9-0007Qa-43 for submit@debbugs.gnu.org; Fri, 28 Apr 2023 06:51:19 -0400 Received: from netyu.xyz ([152.44.41.246]:38802 helo=mail.netyu.xyz) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psLh4-0007Pv-B5 for 63125@debbugs.gnu.org; Fri, 28 Apr 2023 06:51:17 -0400 Received: from fw.net.yu.netyu.xyz ( [222.248.4.98]) by netyu.xyz (OpenSMTPD) with ESMTPSA id 12868252 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Fri, 28 Apr 2023 10:51:12 +0000 (UTC) References: <83h6t1s16p.fsf@gnu.org> User-agent: mu4e 1.9.22; emacs 30.0.50 From: Ruijie Yu To: Ruijie Yu Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml-parse-html-region has no effect? Date: Fri, 28 Apr 2023 18:40:35 +0800 In-reply-to: Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 63125 Cc: Eli Zaretskii , 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Ruijie Yu writes: > Ruijie Yu writes: >>> >>> If you look at xml.c, you will see that we just call a libxml function >>> passing it this URL. 
So if anything isn't as expected, the answer is >>> in libxml, not in Emacs. >> >> Thank you for pointing that out. I will take a look at its source in a >> day or two. I am also upgrading it from 2.10.3-2 to 2.10.4-2, and will >> see if that changes anything. > > No difference -- as expected. > >> If I end up deciding that it is a libxml2 bug, I'll file a bug there and >> link to this bug. > > I have filed an issue [1] in libxml2. We'll see what they say about it. > > FTR, [2] is the documentation of the libxml2's htmlReadMemory() > function -- though it does not say much. > > [1]: https://gitlab.gnome.org/GNOME/libxml2/-/issues/525 > [2]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory. I just got a response from one of libxml2's maintainers. It seems that the docstring for `libxml-parse-html-region' is wrong: this argument has never served the purpose of resolving relative URLs. It was only used for error messages. So I suggest that we modify the docstring of this function and `libxml-parse-xml-region' to reflect this fact. I also don't know if, based on this new information, you want to mark this parameter obsolete. I see no immediate need, though. Should I send a patch for the documentation change, or will you do it? -- Best, RY [Please note that this mail might go to spam due to some misconfiguration in my mail server -- still investigating.] 
From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 28 07:31:06 2023 Received: (at 63125) by debbugs.gnu.org; 28 Apr 2023 11:31:06 +0000 Received: from localhost ([127.0.0.1]:60658 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psMJd-0002ev-PW for submit@debbugs.gnu.org; Fri, 28 Apr 2023 07:31:06 -0400 Received: from eggs.gnu.org ([209.51.188.92]:43304) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psMJY-0002ZV-Qv for 63125@debbugs.gnu.org; Fri, 28 Apr 2023 07:31:04 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1psMJS-0004bG-AO; Fri, 28 Apr 2023 07:30:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=Mos3TKwbHhYdY4DWNHoWpQX5Ij/41pw5MsIpHBhUTKU=; b=PILBxNhKlons WWHBxsmBRFsGNxnMdL3c3mbkU88wZv9kM9x7lG55/3v+S7d4spTStoyQEtgQz5D6s3ivrR1b3oQzy SaWFWtd/V519+SFsHNPgl3vKYyRIoh8OXkqsMVfd+nqpWUZPiPbL49FUg2cZF/6jvDD+MRyioKkjd bBnwbrIBzu7mveNvMnxoq04Tfu/IdUJvdHuXQtToh9ecIqocF/CBYoXz1w0sBtM7UlGhCXD9RPnX/ 5jLC+wkrtEJYMVtIPD81J+zeH5W0vBo4/579j5DdoghKqpRZrAmylNNoO+a/nFBO0NgeVP3z4CnS4 RvbvvkuZATyYOlAAso7SwA==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1psMJR-0006kR-KW; Fri, 28 Apr 2023 07:30:54 -0400 Date: Fri, 28 Apr 2023 14:31:28 +0300 Message-Id: <83ttx0qm3z.fsf@gnu.org> From: Eli Zaretskii To: Ruijie Yu In-Reply-To: (message from Ruijie Yu on Fri, 28 Apr 2023 18:40:35 +0800) Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml-parse-html-region has no effect? 
References: <83h6t1s16p.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 63125 Cc: 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> From: Ruijie Yu
> Cc: Eli Zaretskii , 63125@debbugs.gnu.org
> Date: Fri, 28 Apr 2023 18:40:35 +0800
>
> > I have filed an issue [1] in libxml2.  We'll see what they say about it.
> >
> > FTR, [2] is the documentation of the libxml2's htmlReadMemory()
> > function -- though it does not say much.
> >
> > [1]: https://gitlab.gnome.org/GNOME/libxml2/-/issues/525
> > [2]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory.
>
> I just got a response from one of libxml2's maintainers.
>
> It seems that the docstring for `libxml-parse-html-region' is wrong:
> this argument has never served the purpose of resolving relative URLs.
> It was only used for error messages.  So I suggest that we modify the
> docstring of this function and `libxml-parse-xml-region' to reflect this
> fact.

The response doesn't say much.  What is this "base URL" argument used
for, and why is it named "base URL"?  What does it mean "used for error
messages"?  And where is the up-to-date and accurate documentation of
this function, which explains what this argument is for?

Without knowing all that, we cannot fix our documentation, let alone
code.
From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 28 21:21:14 2023 Received: (at 63125) by debbugs.gnu.org; 29 Apr 2023 01:21:14 +0000 Received: from localhost ([127.0.0.1]:34760 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psZH0-0000Ly-1w for submit@debbugs.gnu.org; Fri, 28 Apr 2023 21:21:14 -0400 Received: from netyu.xyz ([152.44.41.246]:40170 helo=mail.netyu.xyz) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1psZGx-0000Lp-UU for 63125@debbugs.gnu.org; Fri, 28 Apr 2023 21:21:12 -0400 Received: from fw.net.yu.netyu.xyz ( [222.248.4.98]) by netyu.xyz (OpenSMTPD) with ESMTPSA id bed7e80d (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Sat, 29 Apr 2023 01:21:10 +0000 (UTC) References: <83h6t1s16p.fsf@gnu.org> <83ttx0qm3z.fsf@gnu.org> User-agent: mu4e 1.9.22; emacs 30.0.50 From: Ruijie Yu To: Eli Zaretskii Subject: Re: bug#63125: 30.0.50; [BUG] last argument of libxml-parse-html-region has no effect? Date: Sat, 29 Apr 2023 08:58:03 +0800 In-reply-to: <83ttx0qm3z.fsf@gnu.org> Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 63125 Cc: Lars Ingebrigtsen , 63125@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Eli Zaretskii writes: >> From: Ruijie Yu >> Cc: Eli Zaretskii , 63125@debbugs.gnu.org >> Date: Fri, 28 Apr 2023 18:40:35 +0800 >> >> > I have filed an issue [1] in libxml2. We'll see what they say about it. >> > >> > FTR, [2] is the documentation of the libxml2's htmlReadMemory() >> > function -- though it does not say much. >> > >> > [1]: https://gitlab.gnome.org/GNOME/libxml2/-/issues/525 >> > [2]: >> > https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory. 
>>
>> I just got a response from one of libxml2's maintainers.
>>
>> It seems that the docstring for `libxml-parse-html-region' is wrong:
>> this argument has never served the purpose of resolving relative URLs.
>> It was only used for error messages.  So I suggest that we modify the
>> docstring of this function and `libxml-parse-xml-region' to reflect this
>> fact.
>
> The response doesn't say much.  What is this "base URL" argument used
> for, and why is it named "base URL"?  What does it mean "used for error
> messages"?  And where is the up-to-date and accurate documentation of
> this function, which explains what this argument is for?
>
> Without knowing all that, we cannot fix our documentation, let alone
> code.

The "base-url" is an argument to the Elisp function
`libxml-parse-html-region'.  I added Lars to the CC, who originally
introduced this function according to git-blame, and who may have a
better idea.

The following is my own understanding, but I'm happy to pass any
questions you still have to the libxml2 devs if you want (or you can
comment there directly in the linked issue on gnome's gitlab instance).

-----

As you pointed out, the arguments of the Elisp function are passed
with minimal transformation to the libxml2 function
`htmlReadMemory()'.  This C function takes an argument `url',
which is the string `base-url', or the empty string if `base-url' is nil.

According to Nick (the libxml2 maintainer) and my interpretation, the
`url' parameter of the libxml2 function is simply stored inside the
`url' field of a `xmlDoc' struct, to be used when an error message needs
to be displayed.  So, the `url' parameter practically does nothing for
us, since we disable all libxml2-level warnings and errors when calling
`htmlReadMemory()'.

I posted this url [1] in the issue assuming that it is the
documentation, and Nick did not comment on it.  So this is
probably the up-to-date, albeit not very elaborate, documentation for
the function.
[1]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory -- Best, RY [Please note that this mail might go to spam due to some misconfiguration in my mail server -- still investigating.] From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 29 02:39:53 2023 Received: (at 63125-done) by debbugs.gnu.org; 29 Apr 2023 06:39:53 +0000 Received: from localhost ([127.0.0.1]:35029 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pseFN-0001AE-9p for submit@debbugs.gnu.org; Sat, 29 Apr 2023 02:39:53 -0400 Received: from eggs.gnu.org ([209.51.188.92]:42574) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pseFL-00019x-JY for 63125-done@debbugs.gnu.org; Sat, 29 Apr 2023 02:39:52 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pseFF-0000JO-3d; Sat, 29 Apr 2023 02:39:45 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=wEKDVHoWtYlNqVr3JbVeT9f/Cjz/51gwB533yscLQ3U=; b=fNZZMGocvPDs gB+8iLz30d08Dk+ifwijp7Ja8KpouH6oyg0BWktrEJtr8m+OWIKSwEbzvwKwl3giBPr1UzkOlyO3x U6qNSHBxybmXemI0jzJGP9DAhGhzeRYF/xGNzYFwkwiFkY3W0e/tXiyXRqNZtI5a7Wt4KAHj+RPmc RaX6H3AymKy+du7DVkJD7BSdiwEYxxJ3bH98w+a9eXHJwSDCryBqjQC/odfNQRhE8wuq0AH+rBKv/ 5fDTIRkVIeGrYERlTUXgdVxAQ52wBBFPTCYf8FyIcx7aK5D+rkXxTBwbQeM1kxYIrjr0OO2/LQDFA ZKekHbcr6rE1qWhdiPOyNg==; Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pseFE-0001AG-BR; Sat, 29 Apr 2023 02:39:44 -0400 Date: Sat, 29 Apr 2023 09:40:19 +0300 Message-Id: <83a5yrqjho.fsf@gnu.org> From: Eli Zaretskii To: Ruijie Yu In-Reply-To: (message from Ruijie Yu on Sat, 29 Apr 2023 08:58:03 +0800) Subject: Re: bug#63125: 30.0.50; [BUG] last argument of 
libxml-parse-html-region has no effect? References: <83h6t1s16p.fsf@gnu.org> <83ttx0qm3z.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 63125-done Cc: larsi@gnus.org, 63125-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Ruijie Yu > Cc: 63125@debbugs.gnu.org, Lars Ingebrigtsen > Date: Sat, 29 Apr 2023 08:58:03 +0800 > > > The response doesn't say much. What is this "base URL" argument used > > for, and why is it named "base URL"? What does it mean "used for error > > messages"? And where is the up-to-date and accurate documentation of > > this function, which explains what this argument is for? > > > > Without knowing all that, we cannot fix our documentation, let alone > > code. > > The "base-url" is an argument to the Elisp function > `libxml-parse-html-region'. I added Lars to the CC, who originally > introduced this function according to git-blame, and who may have a > better idea. > > The following is my own understanding, but I'm happy to pass any > questions you still have to the libxml2 devs if you want (or you can > comment there directly in the linked issue on gnome's gitlab instance). > > ----- > > As you pointed out, the arguments of the Elisp function are passed > with minimal transformation to the libxml2 function > `htmlReadMemory()'. This C function takes an argument `url', > which is the string `base-url', or the empty string if `base-url' is nil. > > According to Nick (the libxml2 maintainer) and my interpretation, the > `url' parameter of the libxml2 function is simply stored inside the > `url' field of a `xmlDoc' struct, to be used when an error message needs > to be displayed. 
So, the `url' parameter practically does nothing for > us, since we disable all libxml2-level warnings and errors in calling > `htmlReadMemory()'. > > I put this url [1] to the issue assuming that it is the documentation, > and Nick doesn't have any comment regarding the url. So this is > probably the up-to-date, albeit not very elaborate, documentation for > the function. > > [1]: https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-HTMLparser.html#htmlReadMemory Thanks. So I've now updated our documentation with this information, and I'm therefore closing the bug. From unknown Fri Jun 20 19:50:17 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sat, 27 May 2023 11:24:07 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
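Since the thread concludes that the base-url argument does not resolve
relative links, one possible way to get absolute hrefs is to
post-process the parsed tree.  The following is a minimal sketch, not
something proposed in the thread: it assumes the `dom' and `url-expand'
libraries that ship with Emacs, and `my/resolve-hrefs' is a hypothetical
name.

```elisp
(require 'dom)
(require 'url-expand)

;; Hypothetical helper, not part of Emacs: walk the tree returned by
;; `libxml-parse-html-region' and rewrite each <a href> in place.
(defun my/resolve-hrefs (dom base-url)
  "Expand every href on an `a' element in DOM against BASE-URL.
DOM is modified destructively and returned."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      (when href
        ;; `url-expand-file-name' leaves absolute URLs alone and
        ;; resolves relative ones against BASE-URL.
        (dom-set-attribute node 'href
                           (url-expand-file-name href base-url)))))
  dom)
```

A call might look like
(my/resolve-hrefs (libxml-parse-html-region (point-min) (point-max))
"https://example.com/good/day").  The same loop could be repeated for
other URL-bearing attributes such as `src' on `img' if needed.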