From unknown Sat Aug 16 00:33:50 2025
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#24831 <24831@debbugs.gnu.org>
To: bug#24831 <24831@debbugs.gnu.org>
Subject: Status: shr mangling messages
Reply-To: bug#24831 <24831@debbugs.gnu.org>
Date: Sat, 16 Aug 2025 07:33:50 +0000

retitle 24831 shr mangling messages
reassign 24831 emacs
submitter 24831 積丹尼 Dan Jacobson
severity 24831 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sun Oct 30 22:46:20 2016
From: 積丹尼 Dan Jacobson
To: bug-gnu-emacs
Cc: Katsumi Yamaoka
Subject: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org>
Date: Mon, 31 Oct 2016 10:45:58 +0800
Message-ID: <87shrd6xsp.fsf_-_@jidanni.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Gentlemen, the "shr" program is mangling messages.
It could remove vital words, causing arguments:
"I did include the address!"
"No you didn't."
"Yes I did. Your mail reader probably cut it out."

We're talking data loss here. It may still be on the disk, but not shown
to the user.

True, the HTML might not be perfect, but at least Chromium, Firefox,
etc. show it fine.

>>>>> "KY" == Katsumi Yamaoka writes:

KY> Emacs-w3m renders it as:

KY> http://w
KY> Hi, you have a new email from Catherineme
KY> [25]
KY> View your inbox at http://www.travel-buddies.com/Inbox.aspx

KY> © Travel Buddies 2015 | All rights reserved

Hmmm, w3m -dump on the attachment shows the first URL in full.

KY> However shr renders it as:

KY> Travel Buddies
KY> © Travel Buddies 2015 | All rights reserved
KY> http://www.travel-buddies.com/
KY> *

KY> There lacks the "Hi, you have a new mail" message.  The return
KY> value of `libxml-parse-html-region' contains the message as

KY> (h1 nil (span nil "Hi, you have a new email from") "Catherineme")
KY> (p nil "View your inbox at "
KY>    (a ((href . "http://www.travel-buddies.com/Inbox.aspx"))
KY>       "http://www.travel-buddies.com/Inbox.aspx"))

KY> regardless of whether all style specs are removed[1] or not
KY> (three nil portions above are replaced with style specs if they
KY> are not removed).  So, style specs are not cause of not
KY> displaying some meaningful message in an html mail, I believe.
KY> In that case, making shr display images does not help.

KY> I think there's something wrong in shr.el, and what you should
KY> do would be to send a bug report to the Emacs bug team, i.e.,
KY> M-x report-emacs-bug, with the sample html part (I'm not so
KY> familiar with recent shr, sorry).  Note that a mail containing
KY> html part might be rejected by the server, so putting it in your
KY> web site separately would be better.
KY> [1] I tested it by modifying mm-shr so as to remove style specs. OK I'll send the message, --=-=-= Content-Type: application/gzip Content-Disposition: attachment; filename=SHRcutOFFmessage.gz Content-Transfer-Encoding: base64 Content-Description: message H4sICCX/E1gAA1NIUmN1dE9GRm1lc3NhZ2UApVdtb+JIEv48/hUtRneT3GHT7RgMBqNhgNlFE5Io 5KJdjVaRsRvcM7bb224msFrdb78qA8Fkkkx2L1YQrq7uquept+ajkinJ5FxGGzILNLkMNbE7hHY8 p+nRDrEpaxnXXK9UZl4FOvZIL5OK58nmvVbBN56Y81UUCV5YoUz7oBly8Y1HHlngwYkMgySWhSYn zHYtCg/zGGXO2SmZb8gXEQVZJs6s3RdLqiW5Fzo2CJlMB1dOl9jNrUvgBmEtr0k95hCTwp/xiznL g9Qcxjz8ypV5y1UhZOYRlA6KIigKkREHjZqKtTpO02mats3a5AROa5q0bdLWKZGZ8SYvGKPUtSLF gxTdzUs4ewvnHHB65F/795kO9KrwyIWskyIENnxmtYniv6+E4hG8dIjmhS78n2+m53cA5Kfx3eXF +a93Nq0bb0rhdDybgbg+nUzHd6UEFerXw9vR3eTibjq7mnyCBbtOrv9zPp7djq9nk8sLn7mtlt20 HxwLsuUqWHLwBUSXSixFFiTmjfT21LL3z2Eb8QQCBf6+Tv1RYGOZCp4GIjHTtW0tD+qoTE5QWKpY qPNo9bNN21bHtdiZbdmM/nZqvIFceMYwObmC7wuxPi0Tg4xn05srIiIydNxBxx6yFqUf2dB4s5CK 9H4Eo98lNyteP8oq2vIc17PbkFUupWBvdHP6GC6i4OsSDMbbSjlg/F72ueMAqpblQJo7rR2wH1D1 DLz22BmOBjalg8ExtveVWvm7cAquIPQ2c03mOqbtUKgLamE+mNulwso41OxnULFAxQIVC1R+IzFP pD+a3V5TcM1tbREWqc6pc8RECYbjQkBOxmuRQiG2gRXYcMIzKCeZc7P05aV2AtpABZv/YTsXWPMf ODc/7mL9BB1PsMEcZIMx8m8K+WBguR0aBbOo8RF88EjtpjROPmyN14yXu1ylYqr2jVGgufeU/dbO /mw1/8JD7ZFf5YrEcDAJSMbvScqhXy25MZSZ5pk2bzY5HKT5WjdinSZdEsaBKrj2V4UZFKEQB00V ZMUC2t84C2UksqVHfl9JDWWdK5HpYJ5waA3TXWjMwQpbOKPOWcd1m9WVkYT4ISkQb9rGpU/n05l5 vUq4ORmBfC+Zbn01B6EuWQwTHmT7xUGmxVGPzKSGHgkM8ahOEhHyrODQKgutRLm/uvEqFkVcQvgr u26FQkOfgiKHwH6FMcbDlRJ6Q9oWJZgq5yJbrQkihTVM7zr5ts0BVLGY5dqsTuZBgY0Uw9ZgtAFR pK7nUA+K6G0b093ufGf2AeYQSYBp8FXkOY9A76cMVtGkOZMrFUI0sTwiobz/Nqbbbw2IvLFncwK1 2UPbjNpNdua4dseq9ID3L3aRPhhccl0uKw4kwcCKyvIyUTaXa49MLj5c/mIAFQjyDKKp+MJ7egKX PTsVReixtt05M4weJmG/F/Mg6ve00AnvHxdMr7GV9hpbnfI6MV+GMpHKPxvVfOMtt/GpwX7MSRLy JMmDCDMWFWitlBR5EB4kc6kirnYvvnEvIh3jG1TSP2rV4yuHK/iPSJCIZYYrkDmaq51V3/gbdh+s wqBBLwq9STi+p4GCcWtqmXt2M19399Z/BM83XmPowcxWx2P5mhQygXb4NnR9I3RDt3vA+wI12KY/ 
wIExF8tYlzo2zde+UbGxgGZiLoJUJBtvoEQADacUFeIP7jHL5ulOcF8e4s1lEnWBi9KK93ZR/nXJ DjBc8MBZCg/yAk6KdEkKFaKlWOvcazTu7+8tbKwQkaPW2hApXmcaiVxKK8+WNYikfmJfZRd6kTYq YEou4PYKTnS3oMEhfKmRBiSojvBDVanzjefI2yKrHL6HSB5CXob73jf2EWg2d8H+K1mG3vhG6c9j U+Zcag0zqjS427SXHSUFPL5RJsWB73fP84as5bHUsmjoeJXOC+h4FqO+YWEXumPUanbg885tudaX fPnuwbF3i0QG2kv4Qnd3NaBKljGtuu/6Pd9Aksn/AwaA7DM8Zq/NU8zSXUZC0vsGpD0eAKxnhyN2 Cs0AH1j+WdTJ5ngYl20WL0q+kfYauL1PhvDTh8NA5SmHHscAXf4qtyDBPWp1Kq7tLMNUy7i5z0+r CRq1/q0A8+CNIiKDpk3g91gv8A0SQ69+uQowmhPcY8EYhJo71vSN53V7jQAKIu//AJBvHJh+PaDy 4H3JlQmunk7yhy66TwoUPE5vt4ZHHCq4URbf0xL1XG+sdHC8X5nlqPC2c+KhhZXmCfaxXSiPyICI VMlwmxU2OgE+j9lwUaNaA76xr4Ja/5+hzDdd0oPbjcyW303WnRgvJk3yJxkkCSnrDYDglQgvNNHr SMGpjCMah7lh/A8jgc8k+g8AAA== --=-=-= Content-Type: text/plain here in this bug report about In GNU Emacs 24.5.1 (i686-pc-linux-gnu, GTK+ Version 3.21.5) of 2016-09-06 on x86-csail-01, modified by Debian. --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 31 21:39:25 2016 Received: (at 24831) by debbugs.gnu.org; 1 Nov 2016 01:39:25 +0000 Received: from localhost ([127.0.0.1]:38437 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c1O2v-0004eu-GQ for submit@debbugs.gnu.org; Mon, 31 Oct 2016 21:39:25 -0400 Received: from mail-hampton.hostforweb.net ([205.234.186.191]:52127 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c1O2t-0004eg-EK for 24831@debbugs.gnu.org; Mon, 31 Oct 2016 21:39:23 -0400 Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c1O2l-003C2n-Cc; Mon, 31 Oct 2016 20:39:16 -0500 Date: Tue, 01 Nov 2016 10:39:12 +0900 Message-ID: From: Katsumi Yamaoka To: 24831@debbugs.gnu.org Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Organization: Emacsen advocacy group X-Face: 
#kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&(
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: jidanni@jidanni.org

On Mon, 31 Oct 2016 10:45:58 +0800, Dan Jacobson wrote:

> Gentlemen, the "shr" program is mangling messages.

I found the cause of the problem that shr does not display the
  "Hi, you have a new email..."
statement contained in the example message.  That is, the message
has a table in which the td element is omitted or lost.  Here is
a simplified html form (try `M-x shr-render-region RET' on it):

--8<---------------cut here---------------start------------->8---
Hi, you have a new email
--8<---------------cut here---------------end--------------->8---

> True, the HTML might not be perfect, but at least Chromium, Firefox,
> etc. show it fine.

Yes, what is bad is the html message, but shr should show it.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 05:59:53 2016
Date: Tue, 01 Nov 2016 18:59:39 +0900
From: Katsumi Yamaoka
To: 24831@debbugs.gnu.org
Cc: jidanni@jidanni.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org>
Organization: Emacsen advocacy group
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On Tue, 01 Nov 2016 10:39:12 +0900, Katsumi Yamaoka wrote:

> I found the cause of the problem that shr does not display the
>   "Hi, you have a new email..."
> statement contained in the example message.  That is, the message
> has a table in which the td element is omitted or lost.

I tried fixing it.  A patch is below.  But I feel it somewhat
awkward, so I hope Lars or someone will review it.  My patch
simply adds the missing td tag as follows:

(table nil (tr nil contents))
 ↓
(table nil (tr nil (td nil contents)))

Thanks.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

--- shr.el~	2016-11-01 02:35:57.788777000 +0000
+++ shr.el	2016-11-01 09:51:32.251984400 +0000
@@ -1759,6 +1759,7 @@
 ;; we then render everything again with the new widths, and finally
 ;; insert all these boxes into the main buffer.
 (defun shr-tag-table-1 (dom)
+  (shr-add-missing-td dom)
   (setq dom (or (dom-child-by-tag dom 'tbody) dom))
   (let* ((shr-inhibit-images t)
          (shr-table-depth (1+ shr-table-depth))
@@ -1787,6 +1788,19 @@
       ;; Then render the table again with these new "hard" widths.
       (shr-insert-table (shr-make-table dom sketch-widths t) sketch-widths)))
 
+(defun shr-add-missing-td (dom)
+  "Add missing td tag to table."
+  (let (tr td)
+    (dolist (elem (dom-children dom))
+      (when (eq (car-safe elem) 'tr)
+        (setq tr elem
+              td nil
+              elem (cddr elem))
+        (while (and (not td) elem)
+          (setq td (eq (car-safe (pop elem)) 'td)))
+        (unless td
+          (setcdr (cdr tr) (list (cons 'td (cons nil (cddr tr))))))))))
+
 (defun shr-table-body (dom)
   (let ((tbodies (seq-filter (lambda (child)
                                (eq (dom-tag child) 'tbody))
--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 06:09:00 2016
From: Lars Ingebrigtsen
To: Katsumi Yamaoka
Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org>
Date: Tue, 01 Nov 2016 11:06:17 +0100
In-Reply-To: (Katsumi Yamaoka's message of "Tue, 01 Nov 2016 18:59:39 +0900")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Katsumi Yamaoka writes:

> I tried fixing it.  A patch is below.  But I feel it somewhat
> awkward, so I hope Lars or someone will review it.  My patch
> simply adds the missing td tag as follows:
>
> (table nil (tr nil contents))
>  ↓
> (table nil (tr nil (td nil contents)))

I'm not sure I think it's worth trying to work around invalid HTML to
this extent.

In addition, other browsers do not correct "missing" TDs in this way:
Instead they typically render non-TD/TH nodes before the table, which I
think might be a better idea.

-- 
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 06:15:44 2016
From: Lars Ingebrigtsen
To: Katsumi Yamaoka
Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org>
Date: Tue, 01 Nov 2016 11:12:58 +0100
In-Reply-To: (Lars Ingebrigtsen's message of "Tue, 01 Nov 2016 11:06:17 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Lars Ingebrigtsen writes:

> I'm not sure I think it's worth trying to work around invalid HTML to
> this extent.

Besides, there's often lots of empty space text nodes interspersed,
aren't there?
... will have a node with "\n " before the TD node, I think?  Those
text nodes are supposed to be ignored.

I'd prefer just to close this bug with a WONTFIX.

-- 
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 07:22:57 2016
From: 積丹尼 Dan Jacobson
To: Lars Ingebrigtsen
Cc: Katsumi Yamaoka, 24831@debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org>
Date: Tue, 01 Nov 2016 19:22:47 +0800
Message-ID: <87oa1z5trs.fsf@jidanni.org>
MIME-Version: 1.0
Content-Type: text/plain

Another idea would be first run it through a validator.
If valid, proceed as before.
If invalid, just spew out all the text nodes of the whole document,
separated by spaces.

Anything is better than vital sentences going missing.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 07:24:56 2016
From: 積丹尼 Dan Jacobson
To: Lars Ingebrigtsen
Cc: Katsumi Yamaoka, 24831@debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org>
Date: Tue, 01 Nov 2016 19:24:51 +0800
Message-ID: <87mvhj5toc.fsf@jidanni.org>
MIME-Version: 1.0
Content-Type: text/plain

At least print a warning,
*** Invalid HTML detected, some text might be missing ***
in red, which stays at the top of the message. (Not in the fleeting minibuffer.)

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 13:17:58 2016
From: Richard Stallman
To: 積丹尼 Dan Jacobson
Cc: larsi@gnus.org, yamaoka@jpl.org, 24831@debbugs.gnu.org
In-reply-to: <87oa1z5trs.fsf@jidanni.org>
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org> <87oa1z5trs.fsf@jidanni.org>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Date: Tue, 01 Nov 2016 13:16:52 -0400
Reply-To: rms@gnu.org

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Another idea would be first run it through a validator.
  > If valid, proceed as before.
  > If invalid, just spew out all the text nodes of the whole document,
  > separated by spaces.

Do we have a validator in Emacs Lisp?
Or would we run one as a child?  What program is available?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
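
The fallback half of the proposal quoted above (dump every text node, separated by spaces) does not itself need a validator. A minimal sketch, assuming a DOM in the list form returned by `libxml-parse-html-region'; the names `my-dom-plain-text' and `my-sample-dom' are hypothetical and not part of shr.el, and the sample DOM is only modeled on the structure reported in this thread. Deciding when to fall back is left open here, as it is in the thread.

```elisp
(require 'dom)

(defun my-dom-plain-text (dom)
  "Return all text under DOM as one string, with whitespace runs collapsed.
Hypothetical helper for illustration; not part of shr.el."
  ;; `dom-texts' concatenates every text node below DOM, separated by
  ;; a space.  `split-string' with its default separators drops empty
  ;; pieces, so rejoining with single spaces normalizes the whitespace.
  (mapconcat #'identity (split-string (dom-texts dom)) " "))

;; Assumed sample: a TR whose content is not wrapped in a TD.
(defvar my-sample-dom
  '(table nil
          (tr nil
              "\n  "
              (h1 nil (span nil "Hi, you have a new email from")
                  "Catherineme"))))

(my-dom-plain-text my-sample-dom)
;; => "Hi, you have a new email from Catherineme"
```

This only shows that the text is recoverable from the parsed DOM; it says nothing about layout, links, or images, which a real fallback would also have to consider.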
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 01 14:46:10 2016
From: Lars Ingebrigtsen
To: Katsumi Yamaoka
Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org
Subject: Re: bug#24831: shr mangling messages
References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org>
Date: Tue, 01 Nov 2016 19:43:23 +0100
In-Reply-To: (Lars Ingebrigtsen's message of "Tue, 01 Nov 2016 11:06:17 +0100")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Lars Ingebrigtsen writes:

> In addition, other browsers do not correct "missing" TDs in this way:
> Instead they typically render non-TD/TH nodes before the table, which I
> think might be a better idea.

And thinking about it a bit more, I think that would perhaps be the most
likely solution for shr, too.  That is, `shr-tag-table' could, at the
end there, go through and find all non-blank non-td/th elements and
insert them at the end.

-- 
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 05:50:13 2016 Received: (at 24831) by debbugs.gnu.org; 2 Nov 2016 09:50:14 +0000 Received: from localhost ([127.0.0.1]:39771 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c1sBN-0004ff-6c for submit@debbugs.gnu.org; Wed, 02 Nov 2016 05:50:13 -0400 Received: from mail-hampton.hostforweb.net ([205.234.186.191]:58466 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c1sBH-0004f1-Uw for 24831@debbugs.gnu.org; Wed, 02 Nov 2016 05:50:07 -0400 Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c1sB6-001FXN-Ik; Wed, 02 Nov 2016 04:49:57 -0500 Date: Wed, 02 Nov 2016 18:49:49 +0900 Message-ID: From: Katsumi Yamaoka To: Lars Ingebrigtsen Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Organization: Emacsen advocacy group X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&( User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin) Cancel-Lock: sha1:tkmzNeRnhYwvXSp3zeOh3TKQN+M= MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-OutGoing-Spam-Status: No, score=-2.9 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - hampton.hostforweb.net X-AntiAbuse: Original Domain - debbugs.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - jpl.org X-Get-Message-Sender-Via: hampton.hostforweb.net: authenticated_id: yamaoka/from_h X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org X-Source: X-Source-Args: X-Source-Dir: X-Spam-Score: 
-0.7 (/) X-Debbugs-Envelope-To: 24831 Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote: > Lars Ingebrigtsen writes: >> In addition, other browsers do not correct "missing" TDs in this way: >> Instead they typically render non-TD/TH nodes before the table, which I >> think might be a better idea. > And thinking about it a bit more, I think that would perhaps be the most > likely solution for shr, too. That is, `shr-tag-table' could, at the > end there, go through and find all non-blank non-td/th elements and > insert them at the end. Thanks. I'm trying it but have not succeeded yet, though I think I understand what I should do. The function for it should gather only those extra elements that are part of a table tag whose parent (of the parent ...) table has no TD/TH tag. It's a good brain teaser. 
:) From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 04 03:19:24 2016 Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 07:19:24 +0000 Received: from localhost ([127.0.0.1]:42796 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2Yma-0007Ke-IQ for submit@debbugs.gnu.org; Fri, 04 Nov 2016 03:19:24 -0400 Received: from mail-hampton.hostforweb.net ([205.234.186.191]:50462 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2YmZ-0007KQ-Jk for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 03:19:23 -0400 Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c2YmR-000fRq-76; Fri, 04 Nov 2016 02:19:16 -0500 Date: Fri, 04 Nov 2016 16:19:12 +0900 Message-ID: From: Katsumi Yamaoka To: Lars Ingebrigtsen Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Organization: Emacsen advocacy group X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&( User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin) Cancel-Lock: sha1:T98tGkD7IYAVyQ3D/A0ZMuw+wqM= MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-OutGoing-Spam-Status: No, score=-2.9 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - hampton.hostforweb.net X-AntiAbuse: Original Domain - debbugs.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - jpl.org X-Get-Message-Sender-Via: hampton.hostforweb.net: authenticated_id: yamaoka/from_h X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org X-Source: X-Source-Args: X-Source-Dir: X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 
24831 Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On Wed, 02 Nov 2016 18:49:58 +0900, Katsumi Yamaoka wrote: > On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote: >> And thinking about it a bit more, I think that would perhaps be the most >> likely solution for shr, too. That is, `shr-tag-table' could, at the >> end there, go through and find all non-blank non-td/th elements and >> insert them at the end. > Thanks. I'm trying it but not succeeded yet though,... I did it. A patch is below. Bad things in this version I know at least are: =E3=83=BBIt does not support styles -- font, color, etc. =E3=83=BBNo way to exclude text existing outside of .... Thers is no such problems in the first version I posted. ;-) --=-=-= Content-Type: text/x-patch Content-Disposition: inline --- shr.el~ 2016-11-01 02:35:57.788777000 +0000 +++ shr.el 2016-11-04 07:17:19.789855000 +0000 @@ -1897,11 +1897,48 @@ (when (zerop shr-table-depth) (save-excursion (shr-expand-alignments start (point))) + ;; Insert also non-td/th strings excluding comments and styles. + (save-restriction + (narrow-to-region (point) (point)) + (insert (mapconcat #'identity + (shr-collect-extra-strings-in-table dom) + "\n")) + (shr-fill-lines (point-min) (point-max))) (dolist (elem (dom-by-tag dom 'object)) (shr-tag-object elem)) (dolist (elem (dom-by-tag dom 'img)) (shr-tag-img elem))))) +(defun shr-collect-extra-strings-in-table (dom &optional flags) + "Return extra strings in DOM of which the root is a table clause. +FLAGS is a cons of two flags that control whether to collect strings." 
+ ;; If and only if the cdr is not set, the car will be set to t when + ;; a <td> or a <th> clause is found in the children of DOM, and reset + ;; to nil when a <table> clause is found in the children of DOM. + ;; The cdr will be set to t when a <table>
clause is found if the car + ;; is not set then, and will never be reset. + ;; This function collects strings if the car of FLAGS is not set. + (unless flags (setq flags (cons nil nil))) + (cl-loop for child in (dom-children dom) + if (stringp child) + when (and (not (car flags)) + (string-match "\\(?:[^\t\n\r ]+[\t\n\r ]+\\)*[^\t\n\r ]+" + child)) + collect (match-string 0 child) + end + else + unless (let ((tag (dom-tag child))) + (or (memq tag '(comment style)) + (progn + (cond ((memq tag '(td th)) + (unless (cdr flags) (setcar flags t))) + ((eq tag 'table) + (if (car flags) + (unless (cdr flags) (setcar flags nil)) + (setcdr flags t)))) + nil))) + append (shr-collect-extra-strings-in-table child flags))) + (defun shr-insert-table (table widths) (let* ((collapse (equal (cdr (assq 'border-collapse shr-stylesheet)) "collapse")) --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 04 04:54:38 2016 Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 08:54:38 +0000 Received: from localhost ([127.0.0.1]:43077 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2aGk-00019M-KB for submit@debbugs.gnu.org; Fri, 04 Nov 2016 04:54:38 -0400 Received: from hermes.netfonds.no ([80.91.224.195]:36545) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2aGj-00019F-MO for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 04:54:38 -0400 Received: from cm-84.215.1.64.getinternet.no ([84.215.1.64] helo=stories) by hermes.netfonds.no with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1c2aGb-0001S7-4u; Fri, 04 Nov 2016 09:54:31 +0100 From: Lars Ingebrigtsen To: Katsumi Yamaoka Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAElBMVEV5YUUNCAUiFg9NMyKX hW3XzL7v2mYtAAACZklEQVQ4jWWTwW7jMAxEiVD5AMm7d5ts7yrp3gVEugeo/P+/skM3aYutgThO 
nkWOZigS91pH672P4+dFAO4pwPEfkKrm9Bu4qGql9Asou1ZnwWWu4n+fYGWvkz4c9S7szq9PYEy1 99lb6ij3rQ3NV2r0+XN+95nktvcvSfMHUNv3+aPEfAJo8XHMdn2C/gSy09EcG4GA/tUGYKt+k1wg DWC2x04DrOacc02pE/WGz3gAldVyXmAkXm+VKB3jaABF1HMuAeAlVsQSojBJLZdCeJ86h50nyHFZ gTCGx2SSBmQQn2CRl96mR5ZwCNVcnHKRrJzQ4bjJdbpU5uquWLFld4hJR7NxxzKUE5cAxVYIcogf zV3SWxE7V5Sya2OPDTc3ZSsYBADJ+YWIdQxYhYfVNEIKUKGO/Qr5E+nu7FKilJTm953sepZS31tD eyqFF5FJfHkPcFOHzyCwhOpbwaYur0c/GpvXcYDAmsZbgU32CpOg6gx03Mn3zttShwuGYnzEJF/x cCetnS5lT1YC3L2apwiYpDav29ql1DYGxokAIkEYIiyRi8b+oKJGtJdwV03qTaLHCc7MDUAjacO0 REBUP8cVAGNwQCx0XjtCrHyO3hmtJmQNmTtmAybCgX4Ld7PsN/ITYMuN38fE8gBLfUtMrvs80KWi pVskCF0batjCmEbUSkhzK59guSH/DMvx5TElWz7HR/6wqmNWo5NjJwj1BOXFCkDBucYSbAWzTCWI y+K25bJUHHso32IFbuWyLS54AIZBhkqZ4rbEonhN0ULP/x9AcHwcxypA1MyP5gFxFBBJOGDnAfgH AE/P6kAmZyUAAAAASUVORK5CYII= Date: Fri, 04 Nov 2016 09:51:52 +0100 In-Reply-To: (Katsumi Yamaoka's message of "Fri, 04 Nov 2016 16:19:12 +0900") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24831 Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Katsumi Yamaoka writes: > I did it. A patch is below. Great! Looks good to me. > Bad things in this version I know at least are: > > =E3=83=BBIt does not support styles -- font, color, etc. I don't think that matters very much. The HTML is invalid. > =E3=83=BBNo way to exclude text existing outside of .... Hm... I don't quite follow... --=20 (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 04 06:28:32 2016 Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 10:28:33 +0000 Received: from localhost ([127.0.0.1]:43346 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2bjc-0003R9-PY for submit@debbugs.gnu.org; Fri, 04 Nov 2016 06:28:32 -0400 Received: from mail-hampton.hostforweb.net ([205.234.186.191]:55215 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2bja-0003Qr-M6 for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 06:28:31 -0400 Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c2bjQ-003yMU-Mz; Fri, 04 Nov 2016 05:28:24 -0500 Date: Fri, 04 Nov 2016 19:28:17 +0900 Message-ID: From: Katsumi Yamaoka To: Lars Ingebrigtsen Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Organization: Emacsen advocacy group X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&( User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin) Cancel-Lock: sha1:3vZd236bDyDW/19JUNtnNR9AeMc= MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-OutGoing-Spam-Status: No, score=-2.9 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - hampton.hostforweb.net X-AntiAbuse: Original Domain - debbugs.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - jpl.org X-Get-Message-Sender-Via: hampton.hostforweb.net: authenticated_id: yamaoka/from_h X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org X-Source: 
X-Source-Args: X-Source-Dir: X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 24831 Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Fri, 04 Nov 2016 09:51:52 +0100, Lars Ingebrigtsen wrote: > Katsumi Yamaoka writes: >> I did it. A patch is below. > Great! Looks good to me. Thanks! I'll commit it to master. [...] >> =E3=83=BBNo way to exclude text existing outside of .... > Hm... I don't quite follow... I found it in some mails from amazon.co.jp, but not so many and not so annoying. Here it is: ... --MuLtIpArT_BoUnDaRy-- Well, is this a reasonable operation? (with-temp-buffer (insert "FooBar") (libxml-parse-html-region (point-min) (point-max))) =3D> (html nil (body nil "Foo") (html nil (p nil "Bar"))) From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 04 07:20:01 2016 Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 11:20:01 +0000 Received: from localhost ([127.0.0.1]:43473 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2cXR-0004j8-6b for submit@debbugs.gnu.org; Fri, 04 Nov 2016 07:20:01 -0400 Received: from hermes.netfonds.no ([80.91.224.195]:42348) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2cXO-0004ik-9X for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 07:19:59 -0400 Received: from cm-84.215.1.64.getinternet.no ([84.215.1.64] helo=stories) by hermes.netfonds.no with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1c2cXL-0005Qs-4h; Fri, 04 Nov 2016 12:19:57 +0100 From: Lars Ingebrigtsen To: Katsumi Yamaoka Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAElBMVEU6KjhLOkwlGCNcS1+m 
h40QChKWF2rhAAACZ0lEQVQ4jU2TQZLjIAxFBcR7MJ09Ic3eRO79ePABHLp0/6vMF056WuVUynr+ khAS7bB2zdlm2I2Z90YmxJUGeJTePOVEC8Aj5xR4gD0Em8ik4FNcVJnNCZoJZMj27Ga3FQUpnoCI A7HN9/q5P/qQnCDfWwt7s8lfNAUUJ9jpc9vbjvyhsbcajE7w59JU6GOtXGlWyQlgKwBzrWvzIcsv AOOKZ923W3mD9QU4VN73Lb9B4xM0XvSTlgBGVVw/VAW/HYDyCVpFbUiMSI+hzG8QIjkUZNjPqtgU OAVmnxPznMh6BX+zdD1gm2/7YqO3CefVSHMvA8z4bfa+2GQHaNMJFgr0sdtUxz1ElFCQwhLVGPxl s3kegMz6d7QwKVgMIQxSJHjWLdQAP1UTPNmUxi0gguNYLwRbTAiEGRlFASBLuCGvKgJKpQGyMfDj iYEQx0SPKB0ZinGUIhfniexRjLkKiRxl6rdF5HuReEUZWYqIZFHr90kyrqMXUYCmSRf9oLu5W3iP kjSUHFKe+pbFcSnfxR4TQvkxRomyt5QdhiQaNl7PTkJlfRzkrRxf7ipucnnRHLjhEq8946rlmqQc pcvUUS9yQiOiAD684V/IJ03rMeo0dflyfX5OXzLfqMfsu4oEOZ6a4zlx94KmobvYjmHoFNcxL9gK Riu91vcbVCZMFSQAxuhljOaqAXgNFTw6PcYrvgGWjAIMF6q3VPlHEdQfY0XC+KOoxC/Aqlp/hQIb oSAwugn/FREriXLhX8fyxNc5KnvPfsTRwY0/gKsPPJKMpXJnpAHg9wbHGiCcwBuAWM3oxbmHr9zh H7CesbCxSL6cAAAAAElFTkSuQmCC Date: Fri, 04 Nov 2016 12:17:18 +0100 In-Reply-To: (Katsumi Yamaoka's message of "Fri, 04 Nov 2016 19:28:17 +0900") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 24831 Cc: 24831@debbugs.gnu.org, jidanni@jidanni.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Katsumi Yamaoka writes: > I found it in some mails from amazon.co.jp, but not so many and > not so annoying. Here it is: > > > ... > --MuLtIpArT_BoUnDaRy-- Oh, right... > Well, is this a reasonable operation? > > (with-temp-buffer > (insert "FooBar") > (libxml-parse-html-region (point-min) (point-max))) > => (html nil (body nil "Foo") (html nil (p nil "Bar"))) Yes, it's two elements after each other. In HTML, the start (and end) tags are optional. -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 04 14:18:14 2016 Received: (at 24831) by debbugs.gnu.org; 4 Nov 2016 18:18:14 +0000 Received: from localhost ([127.0.0.1]:44973 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2j49-00031A-Qx for submit@debbugs.gnu.org; Fri, 04 Nov 2016 14:18:13 -0400 Received: from mail-qk0-f169.google.com ([209.85.220.169]:34369) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c2j47-00030v-O4 for 24831@debbugs.gnu.org; Fri, 04 Nov 2016 14:18:12 -0400 Received: by mail-qk0-f169.google.com with SMTP id q130so107895733qke.1 for <24831@debbugs.gnu.org>; Fri, 04 Nov 2016 11:18:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lifelogs.com; s=google; h=from:to:cc:subject:organization:references:mail-copies-to :gmane-reply-to-list:date:in-reply-to:message-id:user-agent :mime-version; bh=wz4AC9v1p8Jn8UmMyqr707wBZ//do26hE44O+v5qZ0s=; b=OjWGIt+hus48Mh3iMag8jVW5242lWjz2PktfScckEY286/RKQxRP2iLeUUHTo/fKVe MwyMPlHXkfgba+lWeRLCLREpz7+n4IK3xlqZkRugGBT8RuL2ften04nj+G4MoHkVTwnG ilZGqXXbhTSxBi+B3QTCn/cHqU/sZcAxnbKEg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:organization:references :mail-copies-to:gmane-reply-to-list:date:in-reply-to:message-id :user-agent:mime-version; bh=wz4AC9v1p8Jn8UmMyqr707wBZ//do26hE44O+v5qZ0s=; b=AECC85sUpc/S4E46FwatR9tugwbvye9PR0zfpw69k2a6TpWTpkOnztwomBtBA/7/mg RBdxAx+gJKumDoZ9QPriRLHSGIGeX5+P2Ew+Wj0QH8caqLxivyfo+hEkmBwXwwOVUlGD +jdogZxOsGyVCKFUTMVX7fUK+3sgArehFaVrLNKtT9K05+qeODR6jdYySW1gJ8fiba+B 7X2WugHWfTAML55lSC1BF4uXHR5qMNgs0fquqOl0Ug9K4q160PNXJ4u3tcpjaPJOAD3H jz6HCDlQyUqaU7JRiPziqrSmGYjJx0xPNDoArTnVyjiFcnAdezykendXGr8O5hlXsR3H ccjg== X-Gm-Message-State: ABUngvfpNt9NXGNpGoyUPTieYQFnmvlyLzWSETayjKLUMv+XVRJkpMbjq4r4SMYIPeu0pQ== X-Received: by 10.55.104.68 with SMTP id d65mr14330559qkc.119.1478283486340; Fri, 04 Nov 2016 
11:18:06 -0700 (PDT) Received: from flea (c-98-229-60-157.hsd1.ma.comcast.net. [98.229.60.157]) by smtp.gmail.com with ESMTPSA id s23sm8240321qka.10.2016.11.04.11.18.05 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 04 Nov 2016 11:18:05 -0700 (PDT) From: Ted Zlatanov To: Richard Stallman Subject: Re: bug#24831: shr mangling messages Organization: =?utf-8?B?0KLQtdC+0LTQvtGAINCX0LvQsNGC0LDQvdC+0LI=?= @ Cienfuegos References: <87shrgvt8y.fsf@jidanni.org> <87oa1z5trs.fsf@jidanni.org> X-Face: bd.DQ~'29fIs`T_%O%C\g%6jW)yi[zuz6; d4V0`@y-~$#3P_Ng{@m+e4o<4P'#(_GJQ%TT= D}[Ep*b!\e,fBZ'j_+#"Ps?s2!4H2-Y"sx" Mail-Copies-To: never Gmane-Reply-To-List: yes Date: Fri, 04 Nov 2016 14:18:03 -0400 In-Reply-To: (Richard Stallman's message of "Tue, 01 Nov 2016 13:16:52 -0400") Message-ID: <87twbn3y90.fsf@lifelogs.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 24831 Cc: larsi@gnus.org, 24831@debbugs.gnu.org, yamaoka@jpl.org, =?utf-8?B?56mN5Li55bC8?= Dan Jacobson X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) On Tue, 01 Nov 2016 13:16:52 -0400 Richard Stallman wrote: >> Another idea would be first run it through a validator. >> If valid, proceed as before. >> If invalid, just spew out all the text nodes of the whole document, >> separated by spaces. RS> Do we have a validator in Emacs Lisp? Or would we run one as a child? RS> What program is available? IMHO validation is not a workable solution, both because of complexity and because real-world HTML authors are incredibly skilled at writing broken HTML that somehow renders in the browsers they support. 
Ted From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 06 18:32:23 2016 Received: (at 24831-done) by debbugs.gnu.org; 6 Nov 2016 23:32:23 +0000 Received: from localhost ([127.0.0.1]:47245 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c3WvH-0001ML-6k for submit@debbugs.gnu.org; Sun, 06 Nov 2016 18:32:23 -0500 Received: from mail-hampton.hostforweb.net ([205.234.186.191]:58077 helo=hampton.hostforweb.net) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c3WvE-0001M6-Lz for 24831-done@debbugs.gnu.org; Sun, 06 Nov 2016 18:32:21 -0500 Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost) by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1c3Wv3-0018X5-5r; Sun, 06 Nov 2016 17:32:12 -0600 Date: Mon, 07 Nov 2016 08:32:06 +0900 Message-ID: From: Katsumi Yamaoka To: Lars Ingebrigtsen Subject: Re: bug#24831: shr mangling messages References: <87shrgvt8y.fsf@jidanni.org> <87shrd6xsp.fsf_-_@jidanni.org> Organization: Emacsen advocacy group X-Face: #kKnN,xUnmKia.'[pp`; Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu; B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw= L&i*6&( User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (i686-pc-cygwin) Cancel-Lock: sha1:q7dbBmimVK5Mkpkp/AfktfjszG8= MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-OutGoing-Spam-Status: No, score=-2.9 X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - hampton.hostforweb.net X-AntiAbuse: Original Domain - debbugs.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - jpl.org X-Get-Message-Sender-Via: hampton.hostforweb.net: authenticated_id: yamaoka/from_h X-Authenticated-Sender: hampton.hostforweb.net: yamaoka@jpl.org X-Source: X-Source-Args: X-Source-Dir: X-Spam-Score: -0.7 (/) 
X-Debbugs-Envelope-To: 24831-done Cc: jidanni@jidanni.org, 24831-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Fri, 04 Nov 2016 12:17:18 +0100, Lars Ingebrigtsen wrote: > Katsumi Yamaoka writes: >> Well, is this a reasonable operation? >> >> (with-temp-buffer >> (insert "FooBar") >> (libxml-parse-html-region (point-min) (point-max))) >> => (html nil (body nil "Foo") (html nil (p nil "Bar"))) > Yes, it's two elements after each other. In HTML, the > start (and end) tags are optional. I see. But I'm sorry for my confusion; that extra text appearing is not due to my change. So, I'm closing this bug. Thanks. From unknown Sat Aug 16 00:33:50 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 05 Dec 2016 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator