GNU bug report logs - #44307
27.1; UTF-8 parts transferred as 8bit in multipart messages fail to decode


Packages: gnus, emacs;

Reported by: Thomas Schneider <qsx <at> chaotikum.eu>

Date: Thu, 29 Oct 2020 14:12:01 UTC

Severity: normal

Tags: fixed

Merged with 45657

Found in version 27.1

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #37 received at 44307 <at> debbugs.gnu.org:

From: Alexandre Duret-Lutz <adl <at> lrde.epita.fr>
To: 44307 <at> debbugs.gnu.org
Cc: larsi <at> gnus.org
Subject: Re: bug#44307: 27.1; UTF-8 parts transferred as 8bit in multipart
 messages fail to decode
Date: Mon, 04 Jan 2021 22:54:18 +0100

Alexandre Duret-Lutz <adl <at> lrde.epita.fr> writes:
> Clicking inside this message on the "Attachement: [2. text/plain]"
> button inserts "\344\344\344\344".   I.e., that's
> the Latin-1 version of "ääää".  (M-x describe-char on these say that they
> are "not encodable by coding system utf-8-unix")

Digging into the code, I believe that the unexpected conversion occurs in this macro:

(defmacro mm-with-part (handle &rest forms)
  "Run FORMS in the temp buffer containing the contents of HANDLE."
  ;; The handle-buffer's content is a sequence of bytes, not a sequence of
  ;; chars, so the buffer should be unibyte.  It may happen that the
  ;; handle-buffer is multibyte for some reason, in which case now is a good
  ;; time to adjust it, since we know at this point that it should
  ;; be unibyte.
  `(let* ((handle ,handle))
     (when (and (mm-handle-buffer handle)
		(buffer-name (mm-handle-buffer handle)))
       (with-temp-buffer
	 (mm-disable-multibyte)
	 (insert-buffer-substring (mm-handle-buffer handle))
	 (mm-decode-content-transfer-encoding
	  (mm-handle-encoding handle)
	  (mm-handle-media-type handle))
	 ,@forms))))


In my case the (mm-handle-buffer handle) is multibyte.  This
multibyteness was preserved by mm-copy-to-buffer while creating the
handle buffer, but I did not check the original source of it, since the
comment above the macro suggests that having multibyte parts is OK.

However the 

	 (mm-disable-multibyte)
	 (insert-buffer-substring (mm-handle-buffer handle))

seems to be doing harm.  The documentation of
insert-buffer-substring/insert notes that multibyte strings will be
converted by taking the lowest 8 bits of each multibyte character, not
by splitting those characters into their bytes.

Mimicking it with

(let ((utf8string "ääää")) ; typed as utf8
  (with-temp-buffer
    (mm-disable-multibyte)
    (insert utf8string)
    (print (string-bytes utf8string))
    (print (string-bytes (buffer-string)))
    (buffer-string)))

this prints:

8
4
"\344\344\344\344"
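
For comparison, the two byte counts above line up with two different
encodings of the same text.  A small sketch, using only
encode-coding-string and nothing from the mm- code (the variable name
`s' is just for illustration):

```elisp
;; Illustration only: compare the Latin-1 and UTF-8 encodings of "ääää".
(let ((s "\u00e4\u00e4\u00e4\u00e4"))        ; "ääää" as a multibyte string
  (list (encode-coding-string s 'iso-8859-1) ; one byte per character
        (encode-coding-string s 'utf-8)))    ; two bytes per character
;; => ("\344\344\344\344" "\303\244\303\244\303\244\303\244")
```

The first element is the 4-byte Latin-1 sequence that ended up in the
unibyte buffer; the second is the 8-byte UTF-8 sequence that the part
would need to keep in order to decode correctly.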


So it would seem that (mm-disable-multibyte) should be called *after* the
insertion and not before, in order to preserve all bytes.
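
A minimal sketch of that reordering, outside of Gnus.  It calls
set-buffer-multibyte directly, on the assumption that
mm-disable-multibyte amounts to (set-buffer-multibyte nil);
`demo-byte-count' is a hypothetical helper, not part of mm-decode.el.
It only illustrates the ordering and is not a tested patch for
mm-with-part:

```elisp
;; Hypothetical helper, for illustration only.
(defun demo-byte-count (string unibyte-first)
  "Insert STRING into a temp buffer and return the buffer's byte count.
If UNIBYTE-FIRST is non-nil, make the buffer unibyte before inserting;
otherwise insert first and make the buffer unibyte afterwards."
  (with-temp-buffer
    (when unibyte-first
      (set-buffer-multibyte nil))       ; current order in the macro
    (insert string)
    (unless unibyte-first
      (set-buffer-multibyte nil))       ; proposed order
    (string-bytes (buffer-string))))

(let ((s "\u00e4\u00e4\u00e4\u00e4"))   ; "ääää" as a multibyte string
  (list (demo-byte-count s t)           ; unibyte before the insertion
        (demo-byte-count s nil)))       ; unibyte after the insertion
;; => (4 8)
```

With the flag flipped after the insertion, each character keeps its
full internal byte sequence, so all 8 bytes survive.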

Does this make sense?

-- 
Alexandre Duret-Lutz




This bug report was last modified 4 years and 159 days ago.


