GNU bug report logs - #30789
26.0.91; xml-parse-region works but libxml-parse-html-region doesn't
Reported by: Katsumi Yamaoka <yamaoka <at> jpl.org>
Date: Mon, 12 Mar 2018 23:40:02 UTC
Severity: wishlist
Tags: wontfix
Found in version 26.0.91
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #11 received at 30789 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Tue, 13 Mar 2018 01:44:22 +0100, Lars Ingebrigtsen wrote:
> libxml is more strict about correctness of the input than most other
> HTML parsers. I don't think there's anything we can do about this
> problematic input other than ponder whether Emacs should use a different
> HTML parser, which I think sounds unlikely. :-)
I see. I agree that libxml should be left as it is. Jidanni, how about
trying the following patch yourself if you often get such broken mails?
I'm not sure that it won't cause other problems, but it does fix
at least the mail in question.
[Message part 2 (text/x-patch, inline)]
--- mm-decode.el~ 2018-02-28 02:01:37.897607000 +0000
+++ mm-decode.el 2018-03-13 02:23:04.321753900 +0000
@@ -1810,6 +1810,11 @@
         (when (and (or coding
                        (setq coding (mm-charset-to-coding-system charset nil t)))
                    (not (eq coding 'ascii)))
+          ;; Remove extra bytes in utf-8 encoded data.
+          (when (eq coding 'utf-8)
+            (goto-char (point-min))
+            (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
+              (replace-match "\\1")))
           (insert (prog1
                       (decode-coding-string (buffer-string) coding)
                     (erase-buffer)
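
To try the loop from the patch outside of Gnus, the following is a minimal sketch that runs the same search and replace on a unibyte string. The helper name my-apply-patch-loop is hypothetical and not part of mm-decode.el, and the sketch assumes the buffer is unibyte when the patched code runs.

```elisp
;; Hypothetical helper for illustration; it is not part of mm-decode.el.
;; It runs the same search-and-replace loop as the patch on a unibyte string.
(defun my-apply-patch-loop (bytes)
  "Return the unibyte string BYTES after applying the patch's loop to it."
  (with-temp-buffer
    ;; Work on raw bytes; assumes the real buffer is unibyte at this point.
    (set-buffer-multibyte nil)
    (insert bytes)
    (goto-char (point-min))
    (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
      (replace-match "\\1"))
    (buffer-string)))

;; "ab", then a stray continuation byte #x80, then "c".
(my-apply-patch-loop (unibyte-string ?a ?b #x80 ?c))

;; Well-formed UTF-8 for "é" (#xc3 #xa9) between two ASCII letters.
(my-apply-patch-loop (unibyte-string ?a #xc3 #xa9 ?b))
```

Read this way, the pattern only matches where a continuation byte (#x80 to #xbf) directly follows ASCII, which cannot happen in well-formed UTF-8, so the second call should return its input unchanged. In the first call the match is replaced by group 1, so the ASCII run "ab" is dropped and the byte #x80 is kept. That is worth checking against a copy of the broken mail before relying on the patch.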