GNU bug report logs - #30789
26.0.91; xml-parse-region works but libxml-parse-html-region doesn't


Package: emacs

Reported by: Katsumi Yamaoka <yamaoka <at> jpl.org>

Date: Mon, 12 Mar 2018 23:40:02 UTC

Severity: wishlist

Tags: wontfix

Found in version 26.0.91

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #20 received at 30789 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>, 積丹尼 Dan
 Jacobson <jidanni <at> jidanni.org>
Cc: 30789 <at> debbugs.gnu.org
Subject: Re: bug#30789: 26.0.91;
 xml-parse-region works but libxml-parse-html-region doesn't
Date: Tue, 13 Mar 2018 12:31:09 +0900
[Message part 1 (text/plain, inline)]
On Tue, 13 Mar 2018 11:28:45 +0900, Katsumi Yamaoka wrote:
> +	;; Remove extra bytes in utf-8 encoded data.
> +	(when (eq coding 'utf-8)
> +	  (goto-char (point-min))
> +	  (while (re-search-forward "[\x00-\x7f]+\\([\x80-\xbf]\\)" nil t)
> +	    (replace-match "\\1")))

Corrected:
[Message part 2 (text/x-patch, inline)]
--- mm-decode.el~	2018-02-28 02:01:37.897607000 +0000
+++ mm-decode.el	2018-03-13 03:27:56.885844100 +0000
@@ -1810,6 +1810,13 @@
       (when (and (or coding
 		     (setq coding (mm-charset-to-coding-system charset nil t)))
 		 (not (eq coding 'ascii)))
+	;; Remove extra bytes in utf-8 encoded data.
+	(when (eq coding 'utf-8)
+	  (goto-char (point-min))
+	  (while (re-search-forward
+		  "\\([\xc2-\xf7][\x80-\xbf]?\\)[\x00-\x7f]+\\([\x80-\xbf]\\)"
+		  nil t)
+	    (replace-match "\\1\\2")))
 	(insert (prog1
 		    (decode-coding-string (buffer-string) coding)
 		  (erase-buffer)
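
For readers who want to try the corrected regexp outside of mm-decode.el, here is a minimal sketch. It assumes, as the patch appears to, that the data sits in a unibyte buffer before decoding. The function name `my-strip-stray-ascii-in-utf8` is hypothetical and not part of Gnus or Emacs; only the regexp and the replacement string are taken from the patch.

```elisp
;; Hypothetical helper, for illustration only.  It applies the regexp
;; from the patch to a unibyte string instead of to the current buffer.
(defun my-strip-stray-ascii-in-utf8 (bytes)
  "Return unibyte string BYTES with stray ASCII bytes removed.
An ASCII run is dropped only when it sits between a UTF-8 lead
byte (optionally followed by one continuation byte) and a
following continuation byte."
  (with-temp-buffer
    ;; Work on raw bytes, not on decoded characters.
    (set-buffer-multibyte nil)
    (insert bytes)
    (goto-char (point-min))
    (while (re-search-forward
            "\\([\xc2-\xf7][\x80-\xbf]?\\)[\x00-\x7f]+\\([\x80-\xbf]\\)"
            nil t)
      ;; Keep the bytes on both sides, drop the ASCII run between them.
      (replace-match "\\1\\2"))
    (buffer-string)))

;; Example: the three bytes E3 81 82 encode U+3042, here with a stray
;; space inserted before the last byte.
(decode-coding-string
 (my-strip-stray-ascii-in-utf8 "\xe3\x81 \x82")
 'utf-8)
```

Note that the regexp allows at most one continuation byte before the ASCII run, so in this sketch a sequence broken after its third byte is left untouched, and ASCII-only input is returned unchanged.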

This bug report was last modified 7 years and 39 days ago.