From unknown Sat Jun 21 02:57:58 2025
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
From: bug#7410 <7410@debbugs.gnu.org>
To: bug#7410 <7410@debbugs.gnu.org>
Subject: Status: Impossible multibyte->unibyte conversion
Reply-To: bug#7410 <7410@debbugs.gnu.org>
Date: Sat, 21 Jun 2025 09:57:58 +0000

retitle 7410 Impossible multibyte->unibyte conversion
reassign 7410 emacs,gnus
submitter 7410 Stefan Monnier
severity 7410 normal
tag 7410 fixed
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 15 16:42:12 2010
From: Stefan Monnier
To: bug-gnu-emacs@gnu.org
Subject: Impossible multibyte->unibyte conversion
Date: Mon, 15 Nov 2010 16:46:53 -0500
MIME-Version: 1.0
Content-Type: text/plain

Package: Emacs
Version: 24.0.50

I get incorrect treatment of accents in gnus-article-wash-html in the
trunk.  More specifically, accents from latin-1 HTML email get turned
into \NNN byte chars.
With extra checks, I get that the accented chars are properly decoded
into the *mm*<4> buffer, and then in mm-shr, we do

    (mm-with-part handle
      (when (and charset
                 (setq charset (mm-charset-to-coding-system charset))
                 (not (eq charset 'ascii)))
        (insert (prog1 (mm-decode-coding-string (buffer-string) charset)
                  (erase-buffer)
                  (mm-enable-multibyte))))
      (libxml-parse-html-region (point-min) (point-max)))

where mm-with-part inserts the `handle' part into a unibyte temp buffer,
thus turning those latin-1 accents back into bytes (well, in my own
branch of Emacs this signals an error instead, which is how I caught
it).

It looks like mm-handle-buffer does not consistently return bytes (as
it usually does) but also occasionally returns chars.  Such
inconsistencies will hurt until we get rid of them.


        Stefan


In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2010-11-04 on ceviche
Windowing system distributor `The X.Org Foundation', version 11.0.10707000
configured using `configure 'CFLAGS=-Wall -Wno-pointer-sign -DUSE_LISP_UNION_TYPE -DSYNC_INPUT -DENABLE_CHECKING -DXASSERTS -DFONTSET_DEBUG -g -O1 -I/usr/include/GNUstep' '--enable-maintainer-mode' '--with-x-toolkit=lucid''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: fr_CH.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Article

Minor modes in effect:
  diff-auto-refine-mode: t
  electric-pair-mode: t
  electric-indent-mode: t
  url-handler-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
e ( p
o p t - o - b u f f e r - t o - b u f f e r SPC " SPC * m m * < 4 > > C-e M-< C-s C-w C-w C-a C-e C-c @ C-a M-x r e p o r

Recent messages:
Mark saved where search started
mm-shr
Mark saved where search started [3 times]
Mark set
mm-shr
Entering debugger...
#>
Mark set
Mark saved where search started
Making completion list...

Load-path shadows:
/usr/share/emacs23/site-lisp/bbdb/bbdb-migrate hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-migrate
/usr/share/emacs23/site-lisp/bbdb/bbdb hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb
/usr/share/emacs23/site-lisp/bbdb/bbdb-rmail hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-rmail
/usr/share/emacs23/site-lisp/bbdb/bbdb-gnus hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gnus
/usr/share/emacs23/site-lisp/bbdb/bbdb-w3 hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-w3
/usr/share/emacs23/site-lisp/bbdb/bbdb-com hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-com
/usr/share/emacs23/site-lisp/bbdb/bbdb-merge hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-merge
/usr/share/emacs23/site-lisp/bbdb/bbdb-ftp hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-ftp
/usr/share/emacs23/site-lisp/bbdb/bbdb-sc hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-sc
/usr/share/emacs23/site-lisp/bbdb/bbdb-vm hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-vm
/usr/share/emacs23/site-lisp/bbdb/bbdb-gui hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-gui
/usr/share/emacs23/site-lisp/bbdb/bbdb-print hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-print
/usr/share/emacs23/site-lisp/bbdb/bbdb-hooks hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-hooks
/usr/share/emacs23/site-lisp/bbdb/bbdb-mhe hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-mhe
/usr/share/emacs23/site-lisp/bbdb/bbdb-whois hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-whois
/usr/share/emacs23/site-lisp/bbdb/bbdb-snarf hides /usr/share/emacs/site-lisp/bbdb/lisp/bbdb-snarf

Features:
(emacsbug gnus-topic cl-specs shr url-http url-auth url-gw footnote xscheme warnings trace testcover
scheme unsafep re-builder shadow inf-lisp ielm comint ring elp edebug cust-print vc-bzr filecache find-func dabbrev multi-isearch diff-mode jka-compr rect pp descr-text gnus-fun skeleton canlock sha1 hex-util novice woman tutorial help-macro man assoc info-look info help-at-pt ehelp apropos cus-edit cus-start cus-load gnus-html browse-url xml url-cache mm-url url url-proxy url-privacy url-expand url-methods url-history url-cookie url-util supercite regi flow-fill executable copyright debug gnus-draft gnus-dup mule-util sort smiley ansi-color gnus-cite mail-extr gnus-async gnus-bcklg qp byte-opt bytecomp byte-compile gnus-ml disp-table nnfolder utf-7 nnimap parse-time tls utf7 nndraft nnmh nnagent nnml gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 epg-config mm-view smime password-cache dig mailcap nntp gnus-cache gnus-sum nnoo gnus-group time-date gnus-undo nnmail mail-source format-spec server gnus-start gnus-spec gnus-int gnus-range message sendmail rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus gnus-ems nnheader mail-utils wid-edit noutline outline easy-mmode flyspell ispell eldoc checkdoc regexp-opt thingatpt help-mode easymenu view prog-mode electric url-handlers url-parse auth-source netrc gnus-util url-vars mm-util mail-prsvr reveal autoinsert uniquify advice help-fns advice-preload savehist minibuf-eldef cl cl-loaddefs proof-site proof-autoloads pg-vars bbdb-autoloads agda2 tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image fringe lisp-mode register page newcomment menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer 
loaddefs button faces cus-face files text-properties overlay md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dbusbind dynamic-setting system-font-setting font-render-setting x-toolkit x multi-tty emacs)

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 10 16:54:49 2012
From: Lars Magne Ingebrigtsen
To: Stefan Monnier
Cc: 7410@debbugs.gnu.org
Subject: Re: bug#7410: Impossible multibyte->unibyte conversion
Date: Tue, 10 Apr 2012 22:53:36 +0200
In-Reply-To: (Stefan Monnier's message of "Mon, 15 Nov 2010 16:46:53 -0500")
User-Agent: Gnus/5.130004 (Ma Gnus v0.4) Emacs/24.1.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Stefan Monnier writes:

> I get incorrect treatment of accents in gnus-article-wash-html in
> the trunk.  More specifically, accents from latin-1 HTML email get
> turned into \NNN byte chars.

I was able to reproduce this bug, but the real problem seemed to be that
the article buffer was in unibyte mode after `C-u g', and that made the
actual insertion go wrong.  I've now fixed that.

...

Oh, and now I tested the non `C-u g' case.  That still doesn't work.
Back to the drawing board...

--
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 10 17:00:27 2012
From: Lars Magne Ingebrigtsen
To: Stefan Monnier
Subject: Re: bug#7410: Impossible multibyte->unibyte conversion
Date: Tue, 10 Apr 2012 22:59:11 +0200
In-Reply-To: (Lars Magne Ingebrigtsen's message of "Tue, 10 Apr 2012 22:53:36 +0200")
User-Agent: Gnus/5.130004 (Ma Gnus v0.4) Emacs/24.1.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Cc: 7410@debbugs.gnu.org

Found the real bug.  The `gnus-article-wash-html' had parsed the
displayed article, and not the original one, so it was missing charset
info (and stuff).

Now fixed in No Gnus, so I expect it to show up in Emacs 24.1 soon.

--
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 10 17:00:28 2012
From: Lars Magne Ingebrigtsen
To: control@debbugs.gnu.org
Subject: control message for bug #7410
Date: Tue, 10 Apr 2012 22:59:15 +0200

tags 7410 fixed
close 7410 24.1

From unknown Sat Jun 21 02:57:58 2025
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 09 May 2012 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator