From unknown Sat Sep 13 07:22:32 2025
From: bug#79376 <79376@debbugs.gnu.org>
To: bug#79376 <79376@debbugs.gnu.org>
Subject: Status: [PATCH] [WIP] Fix mm multibyte
Reply-To: bug#79376 <79376@debbugs.gnu.org>
Date: Sat, 13 Sep 2025 14:22:32 +0000

retitle 79376 [PATCH] [WIP] Fix mm multibyte
reassign 79376 emacs
submitter 79376 Manuel Giraud
severity 79376 normal
tag 79376 patch
thanks

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 03 05:33:53 2025
From: Manuel Giraud
To: bug-gnu-emacs@gnu.org
Subject: [PATCH] [WIP] Fix mm multibyte
X-Debbugs-Cc: Lars Magne Ingebrigtsen, MORIOKA Tomohiko
Date: Wed, 03 Sep 2025 11:33:33 +0200
Message-ID: <87qzwnj2xe.fsf@ledu-giraud.fr>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Tags: patch

Hi,

I'm trying to fix an issue in Gnus where some Atom sources (namely
planet.emacslife.com/atom.xml, here) are not rendered correctly.

This seems to be related to multibyte/unibyte buffer.
Here is a minimal example to reproduce what I see:

--8<---------------cut here---------------start------------->8---
(defun my/gen-handle ()
  (with-current-buffer (get-buffer-create " foo")
    (erase-buffer)
    (insert "’…")
    (list (current-buffer) '("text/html"))))

(defun my/test ()
  (let ((handle (my/gen-handle)))
    (mm-with-part handle
      (buffer-string))))
--8<---------------cut here---------------end--------------->8---

When evaluating (my/test), see that the buffer string content does not
have the correct characters.

I get the behaviour I wanted with the attached patch but I don't know
if this is the way to handle this.


In GNU Emacs 31.0.50 (build 36, x86_64-unknown-openbsd7.7) of 2025-09-03
 built on computer
Repository revision: 6762ffca6b387df73b62db1adcec127317328604
Repository branch: mgi/mm-multibyte-wip
Windowing system distributor 'The X.Org Foundation', version 11.0.12101018
System Description: OpenBSD computer 7.7 GENERIC.MP#10 amd64

Configured using:
 'configure CPPFLAGS=-I/usr/local/include LDFLAGS=-L/usr/local/lib
 MAKEINFO=gmakeinfo --prefix=/home/manuel/emacs
 --bindir=/home/manuel/bin --with-x-toolkit=no
 --with-toolkit-scroll-bars=no --without-cairo --without-dbus
 --without-gconf --without-gsettings --without-compress-install'

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0001-WIP-Fix-mm-multibyte.patch

>From 6762ffca6b387df73b62db1adcec127317328604 Mon Sep 17 00:00:00 2001
From: Manuel Giraud
Date: Wed, 3 Sep 2025 10:36:13 +0200
Subject: [PATCH] [WIP] Fix mm multibyte

---
 lisp/gnus/mm-decode.el | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 759d19a047e..ba4e4983ab3 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -1306,12 +1306,13 @@ mm-with-part
   ;; handle-buffer is multibyte for some reason, in which case now is a good
   ;; time to adjust it, since we know at this point that it should
   ;; be unibyte.
-  `(let* ((handle ,handle))
-     (when (and (mm-handle-buffer handle)
-                (buffer-name (mm-handle-buffer handle)))
+  `(let* ((handle ,handle)
+          (handle-buffer (mm-handle-buffer handle)))
+     (when (and handle-buffer (buffer-name handle-buffer))
        (with-temp-buffer
-         (mm-disable-multibyte)
-         (insert-buffer-substring (mm-handle-buffer handle))
+         (unless (with-current-buffer handle-buffer (mm-multibyte-p))
+           (mm-disable-multibyte))
+         (insert-buffer-substring handle-buffer)
          (mm-decode-content-transfer-encoding
           (mm-handle-encoding handle)
           (mm-handle-media-type handle))
-- 
2.51.0

--=-=-=
Content-Type: text/plain

-- 
Manuel Giraud

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 03 08:57:52 2025
Date: Wed, 03 Sep 2025 15:57:36 +0300
Message-Id: <86wm6fk81r.fsf@gnu.org>
From: Eli Zaretskii
To: Manuel Giraud
In-Reply-To: <87qzwnj2xe.fsf@ledu-giraud.fr> (message from Manuel Giraud on Wed, 03 Sep 2025 11:33:33 +0200)
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
References: <87qzwnj2xe.fsf@ledu-giraud.fr>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: larsi@gnus.org, morioka@jaist.ac.jp, 79376@debbugs.gnu.org

> Cc: Lars Magne Ingebrigtsen, MORIOKA Tomohiko
> From: Manuel Giraud
> Date: Wed, 03 Sep 2025 11:33:33 +0200
>
> I'm trying to fix an issue in Gnus where some Atom sources (namely
> planet.emacslife.com/atom.xml, here) are not rendered correctly.
>
> This seems to be related to multibyte/unibyte buffer.  Here is a minimal
> example to reproduce what I see:
>
> --8<---------------cut here---------------start------------->8---
> (defun my/gen-handle ()
>   (with-current-buffer (get-buffer-create " foo")
>     (erase-buffer)
>     (insert "’…")
>     (list (current-buffer) '("text/html"))))
>
> (defun my/test ()
>   (let ((handle (my/gen-handle)))
>     (mm-with-part handle
>       (buffer-string))))
> --8<---------------cut here---------------end--------------->8---
>
> When evaluating (my/test), see that the buffer string content does not
> have the correct characters.

Hmm...  I'm not familiar with this code, but the comment in
mm-with-part says:

  ;; The handle-buffer's content is a sequence of bytes, not a sequence of
  ;; chars, so the buffer should be unibyte.
  ;; It may happen that the
  ;; handle-buffer is multibyte for some reason, in which case now is a good
  ;; time to adjust it, since we know at this point that it should
  ;; be unibyte.

But your test case inserts a multibyte string into the buffer, so
aren't you violating what this macro expects and should handle?

And also, is a call to buffer-string something that this macro's body
is useful for?

Apologies if I'm not making sense.

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 03 09:53:23 2025
From: Manuel Giraud
To: Eli Zaretskii
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
In-Reply-To: <86wm6fk81r.fsf@gnu.org>
References: <87qzwnj2xe.fsf@ledu-giraud.fr> <86wm6fk81r.fsf@gnu.org>
Date: Wed, 03 Sep 2025 15:53:14 +0200
Message-ID: <87plc7hcc5.fsf@ledu-giraud.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: larsi@gnus.org, morioka@jaist.ac.jp, 79376@debbugs.gnu.org

Eli Zaretskii writes:

>> Cc: Lars Magne Ingebrigtsen, MORIOKA Tomohiko
>> From: Manuel Giraud
>> Date: Wed, 03 Sep 2025 11:33:33 +0200
>>
>> I'm trying to fix an issue in Gnus where some Atom sources (namely
>> planet.emacslife.com/atom.xml, here) are not rendered correctly.
>>
>> This seems to be related to multibyte/unibyte buffer.  Here is a minimal
>> example to reproduce what I see:
>>
>> --8<---------------cut here---------------start------------->8---
>> (defun my/gen-handle ()
>>   (with-current-buffer (get-buffer-create " foo")
>>     (erase-buffer)
>>     (insert "’…")
>>     (list (current-buffer) '("text/html"))))
>>
>> (defun my/test ()
>>   (let ((handle (my/gen-handle)))
>>     (mm-with-part handle
>>       (buffer-string))))
>> --8<---------------cut here---------------end--------------->8---
>>
>> When evaluating (my/test), see that the buffer string content does not
>> have the correct characters.
>
> Hmm...  I'm not familiar with this code, but the comment in
> mm-with-part says:
>
>   ;; The handle-buffer's content is a sequence of bytes, not a sequence of
>   ;; chars, so the buffer should be unibyte.
>   ;; It may happen that the
>   ;; handle-buffer is multibyte for some reason, in which case now is a good
>   ;; time to adjust it, since we know at this point that it should
>   ;; be unibyte.
>
> But your test case inserts a multibyte string into the buffer, so
> aren't you violating what this macro expects and should handle?

Yes, I've seen this comment and I do think that I'm violating what is
expected here… but then so does `mm-shr' (in my example trying to read
"planet.emacslife.com/atom.xml").

> And also, is a call to buffer-string something that this macro's body
> is useful for?

In mm-decode.el:1903, there is the following code:

     (decode-coding-string (buffer-string) coding)

> Apologies if I'm not making sense.

No, I think you're perfectly on point.
-- 
Manuel Giraud

From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 04 05:47:57 2025
From: Manuel Giraud
To: 79376@debbugs.gnu.org
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
In-Reply-To: <87qzwnj2xe.fsf@ledu-giraud.fr>
References: <87qzwnj2xe.fsf@ledu-giraud.fr>
Date: Thu, 04 Sep 2025 11:47:48 +0200
Message-ID: <87v7lymtvf.fsf@ledu-giraud.fr>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Lars Magne Ingebrigtsen, Eli Zaretskii, MORIOKA Tomohiko

--=-=-=
Content-Type: text/plain

Hi,

Hopefully, this new patch is a better fix.  AFAIU, with this, the
content of the temporary MIME buffer is preserved as unibyte (as it
should?) and its content is encoded from a possibly multibyte buffer.

FWIW, I did not use `insert-buffer-substring' anymore as this is using
`string-make-unibyte' that does not do TRT.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=0001-Do-preserve-MIME-buffer-as-unibyte.patch

>From 0be5c11bf307c32d0b2ee1e27f20f9a1ef80e030 Mon Sep 17 00:00:00 2001
From: Manuel Giraud
Date: Thu, 4 Sep 2025 11:23:05 +0200
Subject: [PATCH] Do preserve MIME buffer as unibyte

* lisp/gnus/mm-decode.el (mm-copy-to-buffer): Preserve unibyte in MIME
buffer.
---
 lisp/gnus/mm-decode.el | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 759d19a047e..732c8c42837 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -793,17 +793,15 @@ mm-dissect-multipart
 
 (defun mm-copy-to-buffer ()
   "Copy the contents of the current buffer to a fresh buffer."
-  (let ((obuf (current-buffer))
-        (mb enable-multibyte-characters)
-        beg)
+  (let (content)
     (goto-char (point-min))
     (search-forward-regexp "^\n" nil 'move) ;; There might be no body.
-    (setq beg (point))
+    (setq content (buffer-substring (point) (point-max)))
     (with-current-buffer
         (generate-new-buffer " *mm*")
       ;; Preserve the data's unibyteness (for url-insert-file-contents).
-      (set-buffer-multibyte mb)
-      (insert-buffer-substring obuf beg)
+      (set-buffer-multibyte nil)
+      (insert (encode-coding-string content 'undecided))
       (current-buffer))))
 
 (defun mm-display-parts (handle)
-- 
2.51.0

--=-=-=
Content-Type: text/plain

-- 
Manuel Giraud

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 13 04:17:37 2025
Date: Sat, 13 Sep 2025 11:17:21 +0300
Message-Id: <861poasr5a.fsf@gnu.org>
From: Eli Zaretskii
To: Manuel Giraud
In-Reply-To: <87v7lymtvf.fsf@ledu-giraud.fr> (message from Manuel Giraud on Thu, 04 Sep 2025 11:47:48 +0200)
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
References: <87qzwnj2xe.fsf@ledu-giraud.fr> <87v7lymtvf.fsf@ledu-giraud.fr>
Cc: larsi@gnus.org, morioka@jaist.ac.jp, 79376@debbugs.gnu.org

> From: Manuel Giraud
> Cc: Lars Magne Ingebrigtsen, MORIOKA Tomohiko,
>  Eli Zaretskii
> Date: Thu, 04 Sep 2025 11:47:48 +0200
>
> Hopefully, this new patch is a better fix.  AFAIU, with this, the
> content of the temporary MIME buffer is preserved as unibyte (as it
> should?) and its content is encoded from a possibly multibyte buffer.

I'm still not convinced this is the correct fix, see below.

> FWIW, I did not use `insert-buffer-substring' anymore as this is using
> `string-make-unibyte' that does not do TRT.

How is that not TRT, can you tell the details?  (In any case, the doc
string of insert-buffer-substring is misleading, since the function
doesn't call string-make-unibyte, at least not directly.)

I feel that we should take a step back and examine your original
problem in more detail.
In your OP, you said "I'm trying to fix an
issue in Gnus where some Atom sources (namely
planet.emacslife.com/atom.xml, here) are not rendered correctly", but
never told the details.  Can we please see those details?

I'm asking because it is not clear to me that unconditionally making
the buffer returned by mm-copy-to-buffer unibyte is TRT.  And if it
must be unibyte, it isn't clear to me why inserting stuff there
like it does in the existing code base is incorrect.

>  (defun mm-copy-to-buffer ()
>    "Copy the contents of the current buffer to a fresh buffer."
> -  (let ((obuf (current-buffer))
> -        (mb enable-multibyte-characters)
> -        beg)
> +  (let (content)
>      (goto-char (point-min))
>      (search-forward-regexp "^\n" nil 'move) ;; There might be no body.
> -    (setq beg (point))
> +    (setq content (buffer-substring (point) (point-max)))
>      (with-current-buffer
>          (generate-new-buffer " *mm*")
>        ;; Preserve the data's unibyteness (for url-insert-file-contents).
> -      (set-buffer-multibyte mb)
> -      (insert-buffer-substring obuf beg)
> +      (set-buffer-multibyte nil)
> +      (insert (encode-coding-string content 'undecided))
>        (current-buffer))))

The ELisp manual explicitly recommends against using 'undecided' when
encoding, so at the very least this needs to be rethought.

Also, your change has the disadvantage of consing a string, where the
original code doesn't.

But these details should be considered once we have a clear
understanding of the problem which prompted you to make changes
there.

Thanks.
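
A minimal sketch of the point about 'undecided', for illustration only.
It assumes UTF-8 is the intended encoding, which the thread has not
established, and `my/encode-body' and `my/insert-encoded-body' are
hypothetical helpers, not functions in mm-decode.el.  It shows that
encoding with an explicit coding system yields a unibyte byte string
that can be inserted into a unibyte buffer and decoded back unchanged:

```elisp
;; Hypothetical helpers, not part of Gnus.  UTF-8 is an assumption.
(defun my/encode-body (string)
  "Return STRING encoded as a unibyte UTF-8 byte string."
  (encode-coding-string string 'utf-8))

(defun my/insert-encoded-body (string)
  "Insert STRING as UTF-8 bytes into a unibyte temp buffer.
Return the number of bytes the buffer then holds."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (my/encode-body string))
    (buffer-size)))

;; U+2019 and U+2026 are the two characters from the test case.
(let* ((text (string #x2019 #x2026))
       (bytes (my/encode-body text)))
  (list (multibyte-string-p text)      ; source text is multibyte
        (multibyte-string-p bytes)     ; encoded result is unibyte
        (length bytes)                 ; three bytes per character
        (equal (decode-coding-string bytes 'utf-8) text)))
;; => (t nil 6 t)
```

This still conses a string, so it does not address the second
objection above; it only replaces the coding system guess with an
explicit choice.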
From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 13 06:28:00 2025
From: Manuel Giraud
To: Eli Zaretskii
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
In-Reply-To: <861poasr5a.fsf@gnu.org>
References: <87qzwnj2xe.fsf@ledu-giraud.fr> <87v7lymtvf.fsf@ledu-giraud.fr> <861poasr5a.fsf@gnu.org>
Date: Sat, 13 Sep 2025 12:27:50 +0200
Message-ID: <87tt16lk9l.fsf@ledu-giraud.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: larsi@gnus.org, morioka@jaist.ac.jp, 79376@debbugs.gnu.org

Eli Zaretskii writes:

>> From: Manuel Giraud
>> Cc: Lars Magne Ingebrigtsen, MORIOKA Tomohiko,
>>  Eli Zaretskii
>> Date: Thu, 04 Sep 2025 11:47:48 +0200
>>
>> Hopefully, this new patch is a better fix.  AFAIU, with this, the
>> content of the temporary MIME buffer is preserved as unibyte (as it
>> should?) and its content is encoded from a possibly multibyte buffer.
>
> I'm still not convinced this is the correct fix, see below.
>
>> FWIW, I did not use `insert-buffer-substring' anymore as this is using
>> `string-make-unibyte' that does not do TRT.
>
> How is that not TRT, can you tell the details?  (In any case, the doc
> string of insert-buffer-substring is misleading, since the function
> doesn't call string-make-unibyte, at least not directly.)

Ok my assumption was based on the docstring only so never mind.

> I feel that we should take a step back and examine your original
> problem in more detail.  In your OP, you said "I'm trying to fix an
> issue in Gnus where some Atom sources (namely
> planet.emacslife.com/atom.xml, here) are not rendered correctly", but
> never told the details.  Can we please see those details?

Yes of course.  When I want to read an entry from
planet.emacslife.com/atom.xml, the article buffer contains, for example,
the following excerpt:

--8<---------------cut here---------------start------------->8---
Roman Numerals. On the one hand, its hard to understand why anyone cares
anymore.
Some, like the late Rich Stevens considered them an anachronistic
barbarism and labeled his books Volume 1, 2, & rather than the more
conventional Volume I, II, &. Others continue to label volumes with the
conventional Roman numerals and, of course, theres all those buildings with
their erection date labeled, of course, with Roman numerals on their facade.
--8<---------------cut here---------------end--------------->8---

I expect to see: "On the one hand, it’s hard to understand..." and
"books “Volume 1, 2, …” rather".  This is what I'm trying to fix here.

FWIW, I've opened the file which seems to have the content of an Atom
source (here: ~/News/atom/planet.emacslife.com.atom.xml.eld) and this
file is encoded in UTF-8 and such strings are displayed correctly.

> I'm asking because it is not clear to me that unconditionally making
> the buffer returned by mm-copy-to-buffer unibyte is TRT.  And if it
> must be unibyte, it isn't clear to me why inserting stuff there
> like it does in the existing code base is incorrect.
>
>>  (defun mm-copy-to-buffer ()
>>    "Copy the contents of the current buffer to a fresh buffer."
>> -  (let ((obuf (current-buffer))
>> -        (mb enable-multibyte-characters)
>> -        beg)
>> +  (let (content)
>>      (goto-char (point-min))
>>      (search-forward-regexp "^\n" nil 'move) ;; There might be no body.
>> -    (setq beg (point))
>> +    (setq content (buffer-substring (point) (point-max)))
>>      (with-current-buffer
>>          (generate-new-buffer " *mm*")
>>        ;; Preserve the data's unibyteness (for url-insert-file-contents).
>> -      (set-buffer-multibyte mb)
>> -      (insert-buffer-substring obuf beg)
>> +      (set-buffer-multibyte nil)
>> +      (insert (encode-coding-string content 'undecided))
>>        (current-buffer))))
>
> The ELisp manual explicitly recommends against using 'undecided' when
> encoding, so at the very least this needs to be rethought.

Ok I was not aware of this.
> Also, your change has the disadvantage of consing a string, where the
> original code doesn't.

Fair enough.

> But these details should be considered once we have a clear
> understanding of the problem which prompted you to make changes
> there.
>
> Thanks.

-- 
Manuel Giraud

From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 13 07:03:04 2025
Date: Sat, 13 Sep 2025 14:02:50 +0300
Message-Id: <86348qr4x1.fsf@gnu.org>
From: Eli Zaretskii
To: Manuel Giraud
In-Reply-To: <87tt16lk9l.fsf@ledu-giraud.fr> (message from Manuel Giraud on Sat, 13 Sep 2025 12:27:50 +0200)
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
References: <87qzwnj2xe.fsf@ledu-giraud.fr> <87v7lymtvf.fsf@ledu-giraud.fr> <861poasr5a.fsf@gnu.org> <87tt16lk9l.fsf@ledu-giraud.fr>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: larsi@gnus.org, morioka@jaist.ac.jp, 79376@debbugs.gnu.org

> From: Manuel Giraud
> Cc: 79376@debbugs.gnu.org, larsi@gnus.org, morioka@jaist.ac.jp
> Date: Sat, 13 Sep 2025 12:27:50 +0200
>
> Eli Zaretskii writes:
>
> > I feel that we should take a step back and examine your original
> > problem in more detail.  In your OP, you said "I'm trying to fix an
> > issue in Gnus where some Atom sources (namely
> > planet.emacslife.com/atom.xml, here) are not rendered correctly", but
> > never told the details.  Can we please see those details?
>
> Yes of course.  When I want to read an entry from
> planet.emacslife.com/atom.xml, the article buffer contains, for example,
> the following excerpt:
>
> --8<---------------cut here---------------start------------->8---
> Roman Numerals. On the one hand, its hard to understand why anyone cares
> anymore. Some, like the late Rich Stevens considered them an anachronistic
> barbarism and labeled his books Volume 1, 2, & rather than the more
> conventional Volume I, II, &. Others continue to label volumes with the
> conventional Roman numerals and, of course, theres all those buildings with
> their erection date labeled, of course, with Roman numerals on their facade.
> --8<---------------cut here---------------end--------------->8---
>
> I expect to see: "On the one hand, it’s hard to understand..." and
> "books “Volume 1, 2, …” rather".  This is what I'm trying to fix here.
>
> FWIW, I've opened the file which seems to have the content of an Atom
> source (here: ~/News/atom/planet.emacslife.com.atom.xml.eld) and this
> file is encoded in UTF-8 and such strings are displayed correctly.

Thanks, but this is not enough for me to understand the root cause(s).
Could you take me through the code involved in processing that text
until it gets to mm-copy-to-buffer, and tell what should be its
processing afterwards?

(If someone who knows the Gnus code reads this and has suggestions,
please feel free to chime in.  I'm only trying to help Manuel fix this
because no one else chimes in.)
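
One way to gather the requested details might be to record, at each
step of the processing, whether the buffer in question is multibyte and
how many characters and bytes it holds.  The helper below is a
hypothetical debugging aid (`my/buffer-byte-report' is not part of
Gnus) built only on standard buffer primitives; it makes no claim about
where in Gnus the problem lies:

```elisp
;; Hypothetical debugging helper, not part of Gnus.
(defun my/buffer-byte-report (&optional buffer)
  "Return a plist describing BUFFER (default: the current buffer).
:multibyte is the value of `enable-multibyte-characters',
:chars is the buffer size in characters, :bytes its size in bytes."
  (with-current-buffer (or buffer (current-buffer))
    (save-restriction
      (widen)
      (list :multibyte enable-multibyte-characters
            :chars (buffer-size)
            ;; Byte positions are 1-based, hence the decrement.
            :bytes (1- (position-bytes (point-max)))))))

;; The two characters from the test case, held in a multibyte buffer.
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string #x2019 #x2026))
  (my/buffer-byte-report))
;; => (:multibyte t :chars 2 :bytes 6)
```

It could be called, for instance, on `(mm-handle-buffer handle)' just
before `mm-with-part' runs and again inside its body, to see at which
step the character and byte counts stop matching what is expected.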