GNU bug report logs - #35507
Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird


Packages: gnus, emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 30 Apr 2019 19:22:02 UTC

Severity: minor

Tags: fixed

Found in version 27

Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: 35507 <at> debbugs.gnu.org, andrewjmoreton <at> gmail.com
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Thu, 02 May 2019 14:04:26 +0300
On May 2, 2019 10:17:51 AM GMT+03:00, Andy Moreton <andrewjmoreton <at> gmail.com> wrote:
> On Wed 01 May 2019, Noam Postavsky wrote:
> 
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> >>> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> >>> Date: Wed, 01 May 2019 17:42:18 +0100
> >>> 
> >>> +		     (mm-decode-string text 'utf-8))))
> >>
> >> As I said, I'm not sure we should do this, let alone unconditionally
> >> force UTF-8 here, but if we must, why not use decode-coding-string?
> >> Do we really need the mm-* stuff?
> >
> > As far as I can tell, the mm-* version is useful for handling stuff like
> > "UTF-8" as the charset argument (which might be useful if we extract it
> > from the "Content-Type: text/plain; charset=UTF-8" header).  If passing
> > 'utf-8, then it's just the same as calling decode-coding-string.
> 
> OK, in that case we could indeed just call decode-coding-string.
> 
> > For a default if we don't find a charset header, I guess `undecided'
> > would make more sense, right?  After all, Emacs already has the coding
> > detection machinery, may as well use it.
> 
> Please re-read the original bug report: the problem is with malformed
> messages that do not contain a charset field in the Content-Type
> header.
> 
> The one-liner patch changes the default for inline display in the
> Gnus article buffer to assume UTF-8 when nothing is specified, rather
> than just inserting the text without decoding it.
> 
> That should result in text that actually is UTF-8 being displayed
> correctly, and no change to plain ASCII. For anything else, the user can
> use the `gnus-mime-view-part-as-charset' command to override the
> default.
> 
>     AndyM

Using 'undecided' doesn't disable decoding, it just means Emacs will try to detect the correct encoding by looking at the text (not at the charset header).  In a UTF-8 locale, we will guess UTF-8 anyway, unless we see invalid sequences.

So yes, I think Noam is right, and 'undecided' is a better alternative here.
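
To make the three options discussed above concrete, here is a minimal sketch, not the actual Gnus code.  The `my-' names and the sample string are hypothetical and exist only for illustration; the only library calls used are decode-coding-string, encode-coding-string and mm-decode-string.

```elisp
(require 'mm-util)  ; provides `mm-decode-string'

;; Hypothetical sample: the UTF-8 bytes of "café", as they would appear
;; in an attachment whose Content-Type header has no charset parameter.
(defvar my-sample-bytes (encode-coding-string "caf\u00e9" 'utf-8))

(defun my-decode-forced (bytes)
  "Decode BYTES as UTF-8 unconditionally, as in the proposed one-liner."
  (decode-coding-string bytes 'utf-8))

(defun my-decode-detected (bytes)
  "Decode BYTES with `undecided', letting Emacs guess from the bytes."
  (decode-coding-string bytes 'undecided))

(defun my-decode-by-charset (bytes charset)
  "Decode BYTES using CHARSET as spelled in a MIME header, e.g. \"UTF-8\"."
  (mm-decode-string bytes charset))
```

Forcing 'utf-8 leaves pure ASCII input unchanged and decodes valid UTF-8 correctly.  What my-decode-detected returns depends on the user's locale and coding-system priorities, which is exactly the point of using 'undecided'; in a UTF-8 locale it should normally agree with the forced variant for valid UTF-8 input.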




This bug report was last modified 6 years and 81 days ago.


