GNU bug report logs - #37009
EWW Gets Confused on Invalid HTML

Package: emacs

Reported by: Nick Daly <nick.m.daly <at> gmail.com>

Date: Mon, 12 Aug 2019 04:20:01 UTC

Severity: minor

Tags: fixed

Merged with 37397

Found in version 26.2

Fixed in version 27.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Message #11 received at 37009 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 37009 <at> debbugs.gnu.org, nick.m.daly <at> gmail.com
Subject: Re: bug#37009: EWW Gets Confused on Invalid HTML
Date: Tue, 13 Aug 2019 21:13:46 +0300
> From: Noam Postavsky <npostavs <at> gmail.com>
> Date: Tue, 13 Aug 2019 13:55:01 -0400
> Cc: 37009 <at> debbugs.gnu.org
> 
> > Unfortunately, the page does not escape the less-than symbol before "xs"
> > on the second line, so the "<-" (and several more characters) aren't
> > displayed.
> 
> I'm not sure how feasible it will be to fix this at all.  Eww relies on
> libxml for parsing, and it's not as flexible as a typical web browser:
> 
>     (with-temp-buffer
>       (insert "<html>
>       <body>abc <- xyz<body>
>     </html>")
>       (libxml-parse-html-region (point-min) (point-max)))
> 
>     ;=> (html nil (body nil "abc\n"))

Maybe we should report this to libxml developers and hear their
opinion?




This bug report was last modified 5 years and 279 days ago.
