GNU bug report logs - #79376
[PATCH] [WIP] Fix mm multibyte

Package: emacs;

Reported by: Manuel Giraud <manuel <at> ledu-giraud.fr>

Date: Wed, 3 Sep 2025 09:34:02 UTC

Severity: normal

Tags: patch


Message #20 received at 79376 <at> debbugs.gnu.org (full text, mbox):

From: Manuel Giraud <manuel <at> ledu-giraud.fr>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, morioka <at> jaist.ac.jp, 79376 <at> debbugs.gnu.org
Subject: Re: bug#79376: [PATCH] [WIP] Fix mm multibyte
Date: Sat, 13 Sep 2025 12:27:50 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Manuel Giraud <manuel <at> ledu-giraud.fr>
>> Cc: Lars Magne Ingebrigtsen <larsi <at> gnus.org>,  MORIOKA Tomohiko
>>  <morioka <at> jaist.ac.jp>,
>>     Eli Zaretskii <eliz <at> gnu.org>
>> Date: Thu, 04 Sep 2025 11:47:48 +0200
>> 
>> Hopefully, this new patch is a better fix.  AFAIU, with this, the
>> content of the temporary MIME buffer is preserved as unibyte (as it
>> should?) and its content is encoded from a possibly multibyte buffer.
>
> I'm still not convinced this is the correct fix, see below.
>
>> FWIW, I did not use `insert-buffer-substring' anymore, as it uses
>> `string-make-unibyte', which does not do TRT.
>
> How is that not TRT, can you tell the details?  (In any case, the doc
> string of insert-buffer-substring is misleading, since the function
> doesn't call string-make-unibyte, at least not directly.)

OK, my assumption was based on the docstring only, so never mind.

> I feel that we should take a step back and examine your original
> problem in more detail.  In your OP, you said "I'm trying to fix an
> issue in Gnus where some Atom sources (namely
> planet.emacslife.com/atom.xml, here) are not rendered correctly", but
> never told the details.  Can we please see those details?

Yes of course.  When I want to read an entry from
planet.emacslife.com/atom.xml, the article buffer contains, for example,
the following excerpt:

--8<---------------cut here---------------start------------->8---
Roman Numerals. On the one hand, its hard to understand why anyone cares
anymore. Some, like the late Rich Stevens considered them an anachronistic
barbarism and labeled his books Volume 1, 2, & rather than the more
conventional Volume I, II, &. Others continue to label volumes with the
conventional Roman numerals and, of course, theres all those buildings with
their erection date labeled, of course, with Roman numerals on their facade. 
--8<---------------cut here---------------end--------------->8---

I expect to see: "On the one hand, it’s hard to understand..." and
"books “Volume 1, 2, …” rather".  This is what I'm trying to fix here.

FWIW, I've opened the file which seems to have the content of an Atom
source (here: ~/News/atom/planet.emacslife.com.atom.xml.eld) and this
file is encoded in UTF-8 and such strings are displayed correctly.
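
The characters missing from the excerpt (’, “, ” and …) are exactly
the non-ASCII ones.  A minimal sketch, assuming the text is UTF-8 as
the .eld file suggests, of how such a character looks before and after
encoding; `my-utf8-roundtrip-demo' is a hypothetical name used only
for illustration, and this does not by itself show where the
characters get lost:

```elisp
;; Hypothetical helper for illustration; not part of Gnus or mm-util.
(defun my-utf8-roundtrip-demo ()
  "Return facts about encoding and decoding a sample string as UTF-8."
  (let* ((text "it\u2019s")             ; "it’s", with U+2019
         (bytes (encode-coding-string text 'utf-8)))
    (list (multibyte-string-p text)     ; decoded text is multibyte
          (multibyte-string-p bytes)    ; encoded text is unibyte
          (length text)                 ; counted in characters
          (length bytes)                ; counted in bytes
          ;; Decoding with the same coding system restores the text.
          (string= text (decode-coding-string bytes 'utf-8)))))
```

The apostrophe alone accounts for three bytes in the encoded form, so
any step that treats those bytes as something other than UTF-8 could
plausibly garble or drop it.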

> I'm asking because it is not clear to me that unconditionally making
> the buffer returned by mm-copy-to-buffer unibyte is TRT.  And if it
> must be unibyte, it isn't clear to me why inserting stuff there
> like it does in the existing code base is incorrect.
>
>>  (defun mm-copy-to-buffer ()
>>    "Copy the contents of the current buffer to a fresh buffer."
>> -  (let ((obuf (current-buffer))
>> -        (mb enable-multibyte-characters)
>> -        beg)
>> +  (let (content)
>>      (goto-char (point-min))
>>      (search-forward-regexp "^\n" nil 'move) ;; There might be no body.
>> -    (setq beg (point))
>> +    (setq content (buffer-substring (point) (point-max)))
>>      (with-current-buffer
>>            (generate-new-buffer " *mm*")
>>        ;; Preserve the data's unibyteness (for url-insert-file-contents).
>> -      (set-buffer-multibyte mb)
>> -      (insert-buffer-substring obuf beg)
>> +      (set-buffer-multibyte nil)
>> +      (insert (encode-coding-string content 'undecided))
>>        (current-buffer))))
>
> The ELisp manual explicitly recommends against using 'undecided' when
> encoding, so at the very least this needs to be rethought.  

OK, I was not aware of this.
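
For reference, a sketch of the same encoding step with an explicit
coding system instead of `undecided'.  The helper name and the `utf-8'
default are assumptions made for illustration; which coding system is
actually right for the MIME buffer is still an open question here:

```elisp
;; Hypothetical helper; the `utf-8' default is an assumption.
(defun my-encode-body (content &optional coding-system)
  "Encode CONTENT with CODING-SYSTEM (default `utf-8').
The result is a unibyte string."
  (encode-coding-string content (or coding-system 'utf-8)))
```

Calling (my-encode-body content) would then replace the
(encode-coding-string content 'undecided) form in the patch above.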

> Also, your change has the disadvantage of consing a string, where the
> original code doesn't.

Fair enough.
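
One way to avoid the intermediate string, sketched here only under the
assumptions that the source buffer is multibyte and that the target
must end up unibyte (neither is settled in this discussion), would be
to copy first and encode in place.  `my-copy-body-encoded' is a
hypothetical name and not the patch under review:

```elisp
;; Hypothetical variant for illustration; not the patch under review.
(defun my-copy-body-encoded (coding-system)
  "Copy the current buffer's body to a fresh unibyte buffer.
The body is encoded with CODING-SYSTEM inside the new buffer, so no
intermediate Lisp string is created.  Return the new buffer."
  (let ((obuf (current-buffer))
        beg)
    (goto-char (point-min))
    (search-forward-regexp "^\n" nil 'move) ; There might be no body.
    (setq beg (point))
    (with-current-buffer (generate-new-buffer " *mm*")
      ;; Copy as characters first, then encode the region in place.
      (set-buffer-multibyte t)
      (insert-buffer-substring obuf beg)
      (encode-coding-region (point-min) (point-max) coding-system)
      ;; The encoded bytes become the unibyte buffer contents.
      (set-buffer-multibyte nil)
      (current-buffer))))
```

The caller has to pass the coding system explicitly, for example
(my-copy-body-encoded 'utf-8), and is responsible for killing the
returned buffer.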

> But these details should be considered once we have a clear
> understanding of the problem which prompted you to make changes
> there.
>
> Thanks.
>
>
-- 
Manuel Giraud



