GNU bug report logs - #35507
Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird


Packages: emacs, gnus

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 30 Apr 2019 19:22:02 UTC

Severity: minor

Tags: fixed

Found in version 27

Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Bug is archived. No further changes may be made.



Message #32 received at 35507 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 35507 <at> debbugs.gnu.org
Subject: Re: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from
 Thunderbird
Date: Wed, 1 May 2019 11:26:35 -0700

On 5/1/19 10:32 AM, Eli Zaretskii wrote:
> Is text/x-patch a "new media type" or not?

It's not a registered media type so strictly speaking the RFCs' SHOULD
statements do not apply (and they are SHOULDs not MUSTs so they could be
disregarded for good reason). That being said, the ordinary and usual
intent is for the x- media types to follow these recommendations and my
bug report was filed under that assumption.

> my reading of the RFC is that we should not define
> or expect any defaults, which means this bug is squarely in
> Thunderbird's yard

Ah, sorry, I see that my bug report misstated a point. This particular
patch clearly identifies its own encoding because its header says
"Content-Type: text/plain; charset=UTF-8". (I think Git-generated
patches always specify an encoding unless it's ASCII.) So in this
particular case the RFC's recommendation seems to be respected by the
sender.

Gnus could look for a Content-Type: header in text bodies that do not
specify charsets; this would follow the Internet's robustness principle
better.
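
A minimal Python sketch of this suggestion, for illustration only (it is
not Gnus code). The names sniff_body_charset and decode_body are
hypothetical, and the sketch assumes the embedded header sits on a single
unfolded line near the top of the body:

```python
import re
from typing import Optional

# Matches an embedded "Content-Type: ...; charset=..." line inside a text body,
# such as the one at the top of a Git-generated patch.
_CHARSET_RE = re.compile(
    rb'^Content-Type:[^\r\n]*?charset="?([A-Za-z0-9._-]+)"?',
    re.IGNORECASE | re.MULTILINE,
)


def sniff_body_charset(body: bytes, max_bytes: int = 4096) -> Optional[str]:
    """Return the charset named by an embedded Content-Type line, if any."""
    match = _CHARSET_RE.search(body[:max_bytes])
    return match.group(1).decode("ascii").lower() if match else None


def decode_body(body: bytes, declared: Optional[str] = None) -> str:
    """Decode a text part, preferring the MIME-declared charset when present."""
    charset = declared or sniff_body_charset(body) or "us-ascii"
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to ASCII with replacement characters.
        return body.decode("us-ascii", errors="replace")


patch = b"From: A\nContent-Type: text/plain; charset=UTF-8\n\nIt\xe2\x80\x99s\n"
text = decode_body(patch)
```

Here the MIME part itself declares no charset, so the decoder falls back
to the charset that the patch announces in its own header.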

> I don't see why we should
> change Gnus in this regard, certainly not unconditionally assuming
> UTF-8.

Gnus is mishandling emails sent from Thunderbird right now, so it would
be a practical benefit for Gnus users if it did a better job of decoding
these admittedly-iffy messages.

These days, UTF-8 is by far the most common encoding specified for
non-ASCII text in email and its popularity is growing, so it's the best
choice for a default if Gnus is to have one - certainly better than the
confusing behavior that Robert Pluim observed in his Gnus session.
Gnus's current behavior may have been a good idea in 1996 when RFC 2046
said US-ASCII was the default, but it stopped being a good idea in 2012
when RFC 6657 came out and said that UTF-8 should be the default if
there is a default.

Another possibility is that Gnus could ask the user which encoding to
use when the email headers don't specify one and when the text is not
ASCII; even that would be better than Gnus's current behavior of forcing
US-ASCII and displaying something like "\xe2\x80\x99" when it encounters
a non-ASCII character.
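
To make the difference concrete, here is a small Python sketch (again an
illustration, not Gnus code; decode_undeclared_text is a hypothetical
name). It assumes a fallback order of US-ASCII, then UTF-8, then escaped
raw bytes for text whose headers name no charset:

```python
def decode_undeclared_text(raw: bytes) -> str:
    """Decode text with no declared charset: ASCII, then UTF-8, then escapes."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 either: keep the offending bytes visible as \xNN.
        return raw.decode("ascii", errors="backslashreplace")


sample = b"don\xe2\x80\x99t"
decoded = decode_undeclared_text(sample)
```

The three bytes E2 80 99 are the UTF-8 encoding of U+2019 (right single
quotation mark), so with a UTF-8 fallback the sample decodes to a single
apostrophe-like character instead of three escaped bytes.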





This bug report was last modified 6 years and 81 days ago.


