From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 16 08:25:02 2024
From: Eshel Yaron
To: bug-gnu-emacs@gnu.org
Subject: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
Date: Tue, 16 Jan 2024 14:24:40 +0100
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Tags: patch

This makes `dom-print` encode HTML reserved characters that occur in
string elements of the DOM, to ensure the validity of the result.

For example, put the following in `foo.html`:

--8<---------------cut here---------------start------------->8---

Add ‘&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;’ tags around the fontified body.

--8<---------------cut here---------------end--------------->8---
(Fragment from https://www.gnu.org/software/emacs/manual/html_mono/htmlfontify.html)

Open that file in Emacs and say `M-: (require 'dom)` and then
`(dom-print (libxml-parse-html-region))` in the HTML buffer.  This
produces invalid HTML since `libxml-parse-html-region` correctly decodes
HTML entities, but `dom-print` doesn't encode (without this patch).

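The parse-then-print behavior described in the report can be checked with a small round trip. What follows is only a sketch, assuming an Emacs built with libxml2 support; the helper name `my/dom-round-trip-text` and the sample input string are made up for illustration and are not part of the report or of dom.el. It parses an HTML string, prints the DOM with `dom-print`, parses that output again, and returns the text of both DOMs so they can be compared.

```elisp
;; Sketch only; needs an Emacs built with libxml2 support.
(require 'dom)

(defun my/dom-round-trip-text (html)
  "Parse HTML, print it with `dom-print', reparse, and return both texts.
The result is a cons cell (FIRST . SECOND): FIRST is the text of the
DOM parsed from HTML, SECOND is the text of the DOM parsed from the
`dom-print' output.  Hypothetical helper, for illustration only."
  (let* ((dom (with-temp-buffer
                (insert html)
                (libxml-parse-html-region (point-min) (point-max))))
         ;; Serialize the parsed DOM back to HTML text.
         (printed (with-temp-buffer
                    (dom-print dom)
                    (buffer-string)))
         ;; Parse the serialized text a second time.
         (dom2 (with-temp-buffer
                 (insert printed)
                 (libxml-parse-html-region (point-min) (point-max)))))
    (cons (dom-texts dom) (dom-texts dom2))))

;; Made-up input modeled on the fragment in the report.
(my/dom-round-trip-text
 "<samp>Add &lt;div class=&quot;default&quot;&gt; &lt;/div&gt; tags</samp>")
```

If `dom-print` encodes reserved characters, the two strings in the result should match. If it does not, the second parse sees a real `div` element instead of text, so the strings can differ.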
--=-=-=
Content-Type: text/patch
Content-Disposition: attachment; filename=0001-dom-print-Use-HTML-entities-for-reserved-characters.patch

>From 259c0138623c352acc7bcd79a1fda42ec606a0cf Mon Sep 17 00:00:00 2001
From: Eshel Yaron
Date: Fri, 5 Jan 2024 16:40:44 +0100
Subject: [PATCH] ; (dom-print): Use HTML entities for reserved characters.

---
 lisp/dom.el | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lisp/dom.el b/lisp/dom.el
index f7043ba8252..b329379fdc3 100644
--- a/lisp/dom.el
+++ b/lisp/dom.el
@@ -288,7 +288,7 @@ dom-print
         (insert ">")
         (dolist (child children)
           (if (stringp child)
-              (insert child)
+              (insert (url-insert-entities-in-string child))
             (setq non-text t)
             (when pretty
               (insert "\n" (make-string (+ column 2) ?\s)))
-- 
2.42.0

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 16 08:47:51 2024
Date: Tue, 16 Jan 2024 15:47:30 +0200
Message-Id: <837ck9crv1.fsf@gnu.org>
From: Eli Zaretskii
To: Eshel Yaron
Cc: 68508@debbugs.gnu.org
In-Reply-To: (bug-gnu-emacs@gnu.org)
Subject: Re: bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
MIME-version: 1.0
Content-type: text/plain; charset=utf-8

> Date: Tue, 16 Jan 2024 14:24:40 +0100
> From: Eshel Yaron via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
> This makes `dom-print` encode HTML reserved characters that occur in
> string elements of the DOM, to ensure the validity of the result.
>
> For example, put the following in `foo.html`:
>
> --8<---------------cut here---------------start------------->8---
>
> Add ‘&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;’ tags around the fontified body.
>
> --8<---------------cut here---------------end--------------->8---
> (Fragment from https://www.gnu.org/software/emacs/manual/html_mono/htmlfontify.html)
>
> Open that file in Emacs and say `M-: (require 'dom)` and then
> `(dom-print (libxml-parse-html-region))` in the HTML buffer.  This
> produces invalid HTML since `libxml-parse-html-region` correctly decodes
> HTML entities, but `dom-print` doesn't encode (without this patch).

Thanks, but could you please also add tests for this?

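The one-line change in the patch above passes each string child through `url-insert-entities-in-string` before inserting it. As a rough illustration of what that helper does to the text from the report, here is a minimal sketch; the variable name `my/sample-text` is hypothetical, and the second call is an extra example that does not come from the thread.

```elisp
;; Sketch only: demonstrates the helper called by the patched `dom-print'.
(require 'url-util) ; defines `url-insert-entities-in-string'

(defvar my/sample-text "<div class=\"default\"> </div>"
  "Text node from the report, as it looks after parsing (hypothetical name).")

;; The characters &, <, > and \" become entity references;
;; all other text is returned unchanged.
(url-insert-entities-in-string my/sample-text)
;; => "&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;"

(url-insert-entities-in-string "a & b")
;; => "a &amp; b"
```

The helper returns a new string rather than modifying its argument, which is why the patch can wrap the existing `(insert child)` call without touching the DOM itself.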
From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 16 11:29:19 2024
From: Eshel Yaron
To: Eli Zaretskii
Cc: 68508@debbugs.gnu.org
Subject: Re: bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
In-Reply-To: <837ck9crv1.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 16 Jan 2024 15:47:30 +0200")
References: <837ck9crv1.fsf@gnu.org>
Date: Tue, 16 Jan 2024 17:29:12 +0100
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Eli Zaretskii writes:

>> Date: Tue, 16 Jan 2024 14:24:40 +0100
>> From: Eshel Yaron via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors"
>>
>> This makes `dom-print` encode HTML reserved characters that occur in
>> string elements of the DOM, to ensure the validity of the result.
>>
>> For example, put the following in `foo.html`:
>>
>> --8<---------------cut here---------------start------------->8---
>>
>> Add ‘&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;’ tags around the fontified body.
>>
>> --8<---------------cut here---------------end--------------->8---
>> (Fragment from https://www.gnu.org/software/emacs/manual/html_mono/htmlfontify.html)
>>
>> Open that file in Emacs and say `M-: (require 'dom)` and then
>> `(dom-print (libxml-parse-html-region))` in the HTML buffer.  This
>> produces invalid HTML since `libxml-parse-html-region` correctly decodes
>> HTML entities, but `dom-print` doesn't encode (without this patch).
>
> Thanks, but could you please also add tests for this?

Sure, I've added a test to dom-tests.el in the updated patch below.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=v2-0001-Use-HTML-entities-for-reserved-characters-in-dom-.patch

>From 8d60074053ee1ebc04fc3fda417d53ddc5a4fac9 Mon Sep 17 00:00:00 2001
From: Eshel Yaron
Date: Fri, 5 Jan 2024 16:40:44 +0100
Subject: [PATCH v2] ; Use HTML entities for reserved characters in 'dom-print'

* lisp/dom.el (dom-print): Encode HTML reserved characters in strings.
* test/lisp/dom-tests.el (dom-tests-print): New test.  (Bug#68508)
---
 lisp/dom.el            |  2 +-
 test/lisp/dom-tests.el | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lisp/dom.el b/lisp/dom.el
index f7043ba8252..b329379fdc3 100644
--- a/lisp/dom.el
+++ b/lisp/dom.el
@@ -288,7 +288,7 @@ dom-print
         (insert ">")
         (dolist (child children)
           (if (stringp child)
-              (insert child)
+              (insert (url-insert-entities-in-string child))
             (setq non-text t)
             (when pretty
               (insert "\n" (make-string (+ column 2) ?\s)))
diff --git a/test/lisp/dom-tests.el b/test/lisp/dom-tests.el
index 8cbfb9ad9df..a4e913541bf 100644
--- a/test/lisp/dom-tests.el
+++ b/test/lisp/dom-tests.el
@@ -209,6 +209,16 @@ dom-tests-pp
       (dom-pp node t)
       (should (equal (buffer-string) "(\"foo\" nil)")))))
 
+(ert-deftest dom-tests-print ()
+  "Test that `dom-print' correctly encodes HTML reserved characters."
+  (with-temp-buffer
+    (dom-print '(samp ((class . "samp")) "<div class=\"default\"> </div>"))
+    (should (equal
+             (buffer-string)
+             (concat "<samp class=\"samp\">"
+                     "&lt;div class=&quot;default&quot;&gt; &lt;/div&gt;"
+                     "</samp>")))))
+
 (ert-deftest dom-test-search ()
   (let ((dom '(a nil (b nil (c nil)))))
     (should (equal (dom-search dom (lambda (d) (eq (dom-tag d) 'a)))
-- 
2.42.0

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 20 04:42:37 2024
Date: Sat, 20 Jan 2024 11:42:07 +0200
Message-Id: <83frystk7k.fsf@gnu.org>
From: Eli Zaretskii
To: Eshel Yaron
In-Reply-To: (message from Eshel Yaron on Tue, 16 Jan 2024 17:29:12 +0100)
Subject: Re: bug#68508: [PATCH] ; (dom-print): Use HTML entities for reserved characters.
References: <837ck9crv1.fsf@gnu.org>
Cc: 68508-done@debbugs.gnu.org

> From: Eshel Yaron
> Cc: 68508@debbugs.gnu.org
> Date: Tue, 16 Jan 2024 17:29:12 +0100
>
> Eli Zaretskii writes:
>
> > Thanks, but could you please also add tests for this?
>
> Sure, I've added a test to dom-tests.el in the updated patch below.

Thanks, installed on master, and closing the bug.

From unknown Thu Jun 19 14:29:17 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 17 Feb 2024 12:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator