GNU bug report logs - #35507
Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Tue, 30 Apr 2019 19:22:02 UTC
Severity: minor
Tags: fixed
Found in version 27
Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Bug is archived. No further changes may be made.
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 30 Apr 2019 12:20:58 -0700
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
(You meant RFC 6657, I believe.)
That's not exactly my reading of the RFC language. First, it sounds
like the text there is primarily intended for the sending MUA, not for
the receiving MUA. And second, this text:
   In order to improve interoperability with deployed agents, "text/*"
   media type registrations SHOULD either

   a. specify that the "charset" parameter is not used for the defined
      subtype, because the charset information is transported inside
      the payload (such as in "text/xml"), or

   b. require explicit unconditional inclusion of the "charset"
      parameter, eliminating the need for a default value.

   In accordance with option (a) above, registrations for "text/*" media
   types that can transport charset information inside the corresponding
   payloads (such as "text/html" and "text/xml") SHOULD NOT specify the
   use of a "charset" parameter, nor any default value, in order to
   avoid conflicting interpretations should the "charset" parameter
   value and the value specified in the payload disagree.

   Thus, new subtypes of the "text" media type SHOULD NOT define a
   default "charset" value. If there is a strong reason to do so
   despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
   the default.

   Regardless of what approach is chosen, all new "text/*" registrations
   MUST clearly specify how the charset is determined; relying on the
   default defined in Section 4.1.2 of [RFC2046] is no longer permitted.
   However, existing "text/*" registrations that fail to specify how the
   charset is determined still default to US-ASCII.
seems to say that:
  . it is preferable, for new types of text/* media, not to have any
    default charset, unless there's a strong reason to the contrary
  . all new text/* registrations must specify how the charset is
    determined, and not rely on the default from RFC 2046

Is text/x-patch a "new media type" or not? If it is not new, then
where is it defined? I couldn't find it on the IANA site.

If it _is_ "new", my reading of the RFC is that we should not define
or expect any defaults, which means this bug is squarely in
Thunderbird's yard, and we shouldn't change Gnus to arbitrarily assume
UTF-8 as the default.
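
To make the missing parameter concrete, here is a minimal Python sketch
(not Gnus code) of what a receiving agent sees for such a part. The
headers, the file name fix.patch, and the helper name declared_charset
are made up for illustration and only modeled on the description in this
report; nothing beyond the standard library's email package is used.

```python
from email import message_from_bytes

# Hypothetical attachment, modeled on the report: text/x-patch with
# UTF-8 bytes in the body but no charset parameter in Content-Type.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/x-patch; name="fix.patch"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"-- Author: J\xc3\xbcrgen\r\n"
)


def declared_charset(raw):
    """Return the charset the sender declared, or None if there is none."""
    part = message_from_bytes(raw)
    # get_content_charset() returns None when the parameter is absent.
    return part.get_content_charset()


part = message_from_bytes(RAW)
print(part.get_content_type())
print(declared_charset(RAW))
```

When the result is None, the receiver has nothing to go on except a
default or a guess, which is exactly the situation being argued about
here.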
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.
Does Gnus have a command to re-decode an already decoded MIME part?
If not, it should. But other than that, I don't see why we should
change Gnus in this regard, certainly not unconditionally assuming
UTF-8.
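
For illustration, the effect of such a re-decode command can be sketched
in Python (again, not Gnus code). The sample string and the helper name
redecode are hypothetical, and Latin-1 is assumed as the wrongly applied
charset; this report does not say which charset Gnus actually used.

```python
def redecode(text, wrong, right):
    """Undo a decode done with charset `wrong`, then decode with `right`."""
    return text.encode(wrong).decode(right)


# Hypothetical sample text containing non-ASCII characters.
original = "na\u00efve caf\u00e9"
raw = original.encode("utf-8")        # bytes as sent by the MUA

# Assumption: the receiver decoded the UTF-8 bytes as Latin-1.
garbled = raw.decode("latin-1")       # this is the mojibake
print(garbled)

# Re-decoding recovers the text once the right charset is supplied.
print(redecode(garbled, "latin-1", "utf-8"))
```

The round trip works here only because Latin-1 maps every byte to a
character, so the wrong decode loses nothing. With a lossy wrong decode
the original bytes would have to be kept around instead.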
This bug report was last modified 6 years and 81 days ago.