GNU bug report logs - #24831
shr mangling messages

Previous Next

Package: emacs;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Mon, 31 Oct 2016 02:47:02 UTC

Severity: normal

Done: Katsumi Yamaoka <yamaoka <at> jpl.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 24831 in the body.
You can then email your comments to 24831 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Mon, 31 Oct 2016 02:47:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 31 Oct 2016 02:47:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <bug-gnu-emacs <at> gnu.org>
Cc: Katsumi Yamaoka <yamaoka <at> jpl.org>
Subject: shr mangling messages
Date: Mon, 31 Oct 2016 10:45:58 +0800
[Message part 1 (text/plain, inline)]
Gentelmen, the "shr" program is mangling messages.

It could remove vital words, causing arguments:

"I did include the address!" "No you didn't." "Yes I did. Your mail
reader probably cut it out."

We're talking data loss here. It may still be on the disk, but not shown
to the user.

True, the HTML might not be perfect, but at least Chromium, Firefox,
etc. show it fine.

>>>>> "KY" == Katsumi Yamaoka <yamaoka <at> jpl.org> writes:

KY> Emacs-w3m renders it as:

KY>  http://w                                                                     
KY>           Hi, you have a new email from Catherineme                           
KY>  [25]                                                                         
KY>           View your inbox at http://www.travel-buddies.com/Inbox.aspx         
                                                                              
KY>  © Travel Buddies 2015 | All rights reserved                                  

Hmmm, w3m -dump on the attachment shows the first URL in full.

KY> However shr renders it as:

KY> Travel Buddies

    
KY>   © Travel Buddies 2015 | All rights reserved    
KY> http://www.travel-buddies.com/ 
KY> * 


KY> There lacks the "Hi, you have a new mail" message.  The return
KY> value of `libxml-parse-html-region' contains the message as

KY> (h1 nil (span nil "Hi, you have a new email from") "Catherineme")
KY> (p nil "View your inbox at "
KY>    (a ((href . "http://www.travel-buddies.com/Inbox.aspx"))
KY>       "http://www.travel-buddies.com/Inbox.aspx"))

KY> regardless of whether all style specs are removed[1] or not
KY> (three nil portions above are replaced with style specs if they
KY>  are not removed).  So, style specs are not cause of not
KY> displaying some meaningful message in an html mail, I believe.
KY> In that case, making shr display images does not help.

KY> I think there's something wrong in shr.el, and what you should
KY> do would be to send a bug report to the Emacs bug team, i.e.,
KY> M-x report-emacs-bug, with the sample html part (I'm not so
KY> familiar with recent shr, sorry).  Note that a mail containing
KY> html part might be rejected by the server, so putting it in your
KY> web site separately would be better.

KY> [1] I tested it by modifying mm-shr so as to remove style specs.

OK I'll send the message,

[SHRcutOFFmessage.gz (application/gzip, attachment)]
[Message part 3 (text/plain, inline)]
here in this bug report about In GNU Emacs 24.5.1 (i686-pc-linux-gnu,
GTK+ Version 3.21.5) of 2016-09-06 on x86-csail-01, modified by Debian.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 01:40:02 GMT) Full text and rfc822 format available.

Message #8 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 24831 <at> debbugs.gnu.org
Cc: jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 10:39:12 +0900
On Mon, 31 Oct 2016 10:45:58 +0800, Dan Jacobson wrote:
> Gentelmen, the "shr" program is mangling messages.

I found the cause of the problem that shr does not display the
"Hi, you have a new email..."
statement contained in the example message.  That is, the message
has a table in which the td element is omitted or lost.  Here is
a simplified html form (try `M-x shr-render-region RET' on it):

--8<---------------cut here---------------start------------->8---
<html>
  <body>
    <table>
      <tr>
<!--td-->
        <table>
          <tr>
            <td>
              Hi, you have a new email
            </td>
          </tr>
        </table>
<!--/td-->
      </tr>
    </table>
  </body>
</html>
--8<---------------cut here---------------end--------------->8---

> True, the HTML might not be perfect, but at least Chromium, Firefox,
> etc. show it fine.

Yes, what is bad is the html message, but shr should show it.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 10:00:02 GMT) Full text and rfc822 format available.

Message #11 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 24831 <at> debbugs.gnu.org
Cc: jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 18:59:39 +0900
[Message part 1 (text/plain, inline)]
On Tue, 01 Nov 2016 10:39:12 +0900, Katsumi Yamaoka wrote:
> I found the cause of the problem that shr does not display the
> "Hi, you have a new email..."
> statement contained in the example message.  That is, the message
> has a table in which the td element is omitted or lost.

I tried fixing it.  A patch is below.  But I feel it somewhat
awkward, so I hope Lars or someone will review it.  My patch
simply adds the missing td tag as follows:

(table nil (tr nil contents))
	↓
(table nil (tr nil (td nil contents)))

Thanks.
[Message part 2 (text/x-patch, inline)]
--- shr.el~	2016-11-01 02:35:57.788777000 +0000
+++ shr.el	2016-11-01 09:51:32.251984400 +0000
@@ -1759,6 +1759,7 @@
 ;; we then render everything again with the new widths, and finally
 ;; insert all these boxes into the main buffer.
 (defun shr-tag-table-1 (dom)
+  (shr-add-missing-td dom)
   (setq dom (or (dom-child-by-tag dom 'tbody) dom))
   (let* ((shr-inhibit-images t)
 	 (shr-table-depth (1+ shr-table-depth))
@@ -1787,6 +1788,19 @@
     ;; Then render the table again with these new "hard" widths.
     (shr-insert-table (shr-make-table dom sketch-widths t) sketch-widths)))
 
+(defun shr-add-missing-td (dom)
+  "Add missing td tag to table."
+  (let (tr td)
+    (dolist (elem (dom-children dom))
+      (when (eq (car-safe elem) 'tr)
+	(setq tr elem
+	      td nil
+	      elem (cddr elem))
+	(while (and (not td) elem)
+	  (setq td (eq (car-safe (pop elem)) 'td)))
+	(unless td
+	  (setcdr (cdr tr) (list (cons 'td (cons nil (cddr tr))))))))))
+
 (defun shr-table-body (dom)
   (let ((tbodies (seq-filter (lambda (child)
                                (eq (dom-tag child) 'tbody))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 10:09:01 GMT) Full text and rfc822 format available.

Message #14 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 11:06:17 +0100
Katsumi Yamaoka <yamaoka <at> jpl.org> writes:

> I tried fixing it.  A patch is below.  But I feel it somewhat
> awkward, so I hope Lars or someone will review it.  My patch
> simply adds the missing td tag as follows:
>
> (table nil (tr nil contents))
> 	↓
> (table nil (tr nil (td nil contents)))

I'm not sure I think it's worth trying to work around invalid HTML to
this extent.

In addition, other browsers do not correct "missing" TDs in this way:
Instead they typically render non-TD/TH nodes before the table, which I
think might be a better idea.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 10:16:01 GMT) Full text and rfc822 format available.

Message #17 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 11:12:58 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> I'm not sure I think it's worth trying to work around invalid HTML to
> this extent.

Besides, there's often lots of empty space text nodes interspersed,
aren't there?

<table>
<tr>
  <td>
  ...

will have a node with "\n  " before the TD node, I think?  Those text
nodes are supposed to be ignored.

I'd prefer just to close this bug with a WONTFIX.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 11:23:01 GMT) Full text and rfc822 format available.

Message #20 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Katsumi Yamaoka <yamaoka <at> jpl.org>, 24831 <at> debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 19:22:47 +0800
Another idea would be first run it through a validator.
If valid, proceed as before.
If invalid, just spew out all the text nodes of the whole document,
separated by spaces.

Anything is better than vital sentences going missing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 11:25:02 GMT) Full text and rfc822 format available.

Message #23 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Katsumi Yamaoka <yamaoka <at> jpl.org>, 24831 <at> debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 19:24:51 +0800
At least print a warning,
*** Invalid HTML detected, some text might be missing ***
in red, which stays at the top of the message. (Not in the fleeting minibuffer.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 17:18:02 GMT) Full text and rfc822 format available.

Message #26 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: larsi <at> gnus.org, yamaoka <at> jpl.org, 24831 <at> debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 13:16:52 -0400
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Another idea would be first run it through a validator.
  > If valid, proceed as before.
  > If invalid, just spew out all the text nodes of the whole document,
  > separated by spaces.

Do we have a validator in Emacs Lisp?  Or would we run one as a child?
What program is available?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Tue, 01 Nov 2016 18:47:01 GMT) Full text and rfc822 format available.

Message #29 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Tue, 01 Nov 2016 19:43:23 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> In addition, other browsers do not correct "missing" TDs in this way:
> Instead they typically render non-TD/TH nodes before the table, which I
> think might be a better idea.

And thinking about it a bit more, I think that would perhaps be the most
likely solution for shr, too.  That is, `shr-tag-table' could, at the
end there, go through and find all non-blank non-td/th elements and
insert them at the end.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Wed, 02 Nov 2016 09:51:02 GMT) Full text and rfc822 format available.

Message #32 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Wed, 02 Nov 2016 18:49:49 +0900
On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote:
> Lars Ingebrigtsen <larsi <at> gnus.org> writes:

>> In addition, other browsers do not correct "missing" TDs in this way:
>> Instead they typically render non-TD/TH nodes before the table, which I
>> think might be a better idea.

> And thinking about it a bit more, I think that would perhaps be the most
> likely solution for shr, too.  That is, `shr-tag-table' could, at the
> end there, go through and find all non-blank non-td/th elements and
> insert them at the end.

Thanks.  I'm trying it but not succeeded yet though, I think I
understand what I should do.  The function for it should gather
only those extra elements, that are parts of a table tag of
which the parent (of the parent ...) table has no TD/TH tag.
It's a good brain teaser. :)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Wed, 02 Nov 2016 09:51:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 07:20:01 GMT) Full text and rfc822 format available.

Message #38 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 16:19:12 +0900
[Message part 1 (text/plain, inline)]
On Wed, 02 Nov 2016 18:49:58 +0900, Katsumi Yamaoka wrote:
> On Tue, 01 Nov 2016 19:43:23 +0100, Lars Ingebrigtsen wrote:
>> And thinking about it a bit more, I think that would perhaps be the most
>> likely solution for shr, too.  That is, `shr-tag-table' could, at the
>> end there, go through and find all non-blank non-td/th elements and
>> insert them at the end.

> Thanks.  I'm trying it but not succeeded yet though,...

I did it.  A patch is below.  Bad things in this version I know
at least are:

・It does not support styles -- font, color, etc.
・No way to exclude text existing outside of <html>...</html>.

Thers is no such problems in the first version I posted. ;-)

[Message part 2 (text/x-patch, inline)]
--- shr.el~	2016-11-01 02:35:57.788777000 +0000
+++ shr.el	2016-11-04 07:17:19.789855000 +0000
@@ -1897,11 +1897,48 @@
     (when (zerop shr-table-depth)
       (save-excursion
 	(shr-expand-alignments start (point)))
+      ;; Insert also non-td/th strings excluding comments and styles.
+      (save-restriction
+	(narrow-to-region (point) (point))
+	(insert (mapconcat #'identity
+			   (shr-collect-extra-strings-in-table dom)
+			   "\n"))
+	(shr-fill-lines (point-min) (point-max)))
       (dolist (elem (dom-by-tag dom 'object))
 	(shr-tag-object elem))
       (dolist (elem (dom-by-tag dom 'img))
 	(shr-tag-img elem)))))
 
+(defun shr-collect-extra-strings-in-table (dom &optional flags)
+  "Return extra strings in DOM of which the root is a table clause.
+FLAGS is a cons of two flags that control whether to collect strings."
+  ;; If and only if the cdr is not set, the car will be set to t when
+  ;; a <td> or a <th> clause is found in the children of DOM, and reset
+  ;; to nil when a <table> clause is found in the children of DOM.
+  ;; The cdr will be set to t when a <table> clause is found if the car
+  ;; is not set then, and will never be reset.
+  ;; This function collects strings if the car of FLAGS is not set.
+  (unless flags (setq flags (cons nil nil)))
+  (cl-loop for child in (dom-children dom)
+	   if (stringp child)
+	     when (and (not (car flags))
+		       (string-match "\\(?:[^\t\n\r ]+[\t\n\r ]+\\)*[^\t\n\r ]+"
+				     child))
+	       collect (match-string 0 child)
+	     end
+	   else
+	     unless (let ((tag (dom-tag child)))
+		      (or (memq tag '(comment style))
+			  (progn
+			    (cond ((memq tag '(td th))
+				   (unless (cdr flags) (setcar flags t)))
+				  ((eq tag 'table)
+				   (if (car flags)
+				       (unless (cdr flags) (setcar flags nil))
+				     (setcdr flags t))))
+			    nil)))
+	       append (shr-collect-extra-strings-in-table child flags)))
+
 (defun shr-insert-table (table widths)
   (let* ((collapse (equal (cdr (assq 'border-collapse shr-stylesheet))
 			  "collapse"))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 07:20:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 08:55:01 GMT) Full text and rfc822 format available.

Message #44 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 09:51:52 +0100
Katsumi Yamaoka <yamaoka <at> jpl.org> writes:

> I did it.  A patch is below.

Great!  Looks good to me.

> Bad things in this version I know at least are:
>
> ・It does not support styles -- font, color, etc.

I don't think that matters very much.  The HTML is invalid.

> ・No way to exclude text existing outside of <html>...</html>.

Hm...  I don't quite follow...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 10:29:02 GMT) Full text and rfc822 format available.

Message #47 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 19:28:17 +0900
On Fri, 04 Nov 2016 09:51:52 +0100, Lars Ingebrigtsen wrote:
> Katsumi Yamaoka <yamaoka <at> jpl.org> writes:
>> I did it.  A patch is below.
> Great!  Looks good to me.

Thanks!  I'll commit it to master.

[...]
>> ・No way to exclude text existing outside of <html>...</html>.
> Hm...  I don't quite follow...

I found it in some mails from amazon.co.jp, but not so many and
not so annoying.  Here it is:

<html>
...
</html>             --MuLtIpArT_BoUnDaRy--

Well, is this a reasonable operation?

(with-temp-buffer
  (insert "<html><body>Foo</body></html>Bar")
  (libxml-parse-html-region (point-min) (point-max)))
 => (html nil (body nil "Foo") (html nil (p nil "Bar")))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 10:29:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 11:20:02 GMT) Full text and rfc822 format available.

Message #53 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 24831 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: Re: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 12:17:18 +0100
Katsumi Yamaoka <yamaoka <at> jpl.org> writes:

> I found it in some mails from amazon.co.jp, but not so many and
> not so annoying.  Here it is:
>
> <html>
> ...
> </html>             --MuLtIpArT_BoUnDaRy--

Oh, right...

> Well, is this a reasonable operation?
>
> (with-temp-buffer
>   (insert "<html><body>Foo</body></html>Bar")
>   (libxml-parse-html-region (point-min) (point-max)))
>  => (html nil (body nil "Foo") (html nil (p nil "Bar")))

Yes, it's two <html> elements after each other.  In HTML, the <html>
start (and end) tags are optional.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Fri, 04 Nov 2016 18:19:01 GMT) Full text and rfc822 format available.

Message #56 received at 24831 <at> debbugs.gnu.org (full text, mbox):

From: Ted Zlatanov <tzz <at> lifelogs.com>
To: Richard Stallman <rms <at> gnu.org>
Cc: larsi <at> gnus.org, 24831 <at> debbugs.gnu.org, yamaoka <at> jpl.org,
 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: Re: bug#24831: shr mangling messages
Date: Fri, 04 Nov 2016 14:18:03 -0400
On Tue, 01 Nov 2016 13:16:52 -0400 Richard Stallman <rms <at> gnu.org> wrote: 

>> Another idea would be first run it through a validator.
>> If valid, proceed as before.
>> If invalid, just spew out all the text nodes of the whole document,
>> separated by spaces.

RS> Do we have a validator in Emacs Lisp?  Or would we run one as a child?
RS> What program is available?

IMHO validation is not a workable solution, both because of complexity
and because real-world HTML authors are incredibly skilled at writing
broken HTML that somehow renders in the browsers they support.

Ted




Reply sent to Katsumi Yamaoka <yamaoka <at> jpl.org>:
You have taken responsibility. (Sun, 06 Nov 2016 23:33:02 GMT) Full text and rfc822 format available.

Notification sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
bug acknowledged by developer. (Sun, 06 Nov 2016 23:33:02 GMT) Full text and rfc822 format available.

Message #61 received at 24831-done <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: jidanni <at> jidanni.org, 24831-done <at> debbugs.gnu.org
Subject: Re: bug#24831: shr mangling messages
Date: Mon, 07 Nov 2016 08:32:06 +0900
On Fri, 04 Nov 2016 12:17:18 +0100, Lars Ingebrigtsen wrote:
> Katsumi Yamaoka <yamaoka <at> jpl.org> writes:
>> Well, is this a reasonable operation?
>>
>> (with-temp-buffer
>>   (insert "<html><body>Foo</body></html>Bar")
>>   (libxml-parse-html-region (point-min) (point-max)))
>>  => (html nil (body nil "Foo") (html nil (p nil "Bar")))

> Yes, it's two <html> elements after each other.  In HTML, the <html>
> start (and end) tags are optional.

I see.  But I'm sorry for my confusion; that extra text appearing
is not due to my change.  So, I'm closing this bug.  Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24831; Package emacs. (Sun, 06 Nov 2016 23:33:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Dec 2016 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 8 years and 254 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.