GNU bug report logs -
#35507
Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Tue, 30 Apr 2019 19:22:02 UTC
Severity: minor
Tags: fixed
Found in version 27
Done: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Bug is archived. No further changes may be made.
On Tue 30 Apr 2019, Paul Eggert wrote:
> The attachment has a text/* media type but it has no charset parameter.
> The patch itself (output by git format-patch) says its charset is UTF-8.
> Unfortunately, Gnus doesn't recognize the patch as UTF-8 and so
> mishandles the non-ASCII characters in the attachment. To reproduce the
> problem, read this email with Gnus; the full attachment is attached to
> this email in the Thunderbird way.
>
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6657 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.
>
> Unfortunately Gnus apparently doesn't default to UTF-8 for such
> attachments, which means that sending a text/x-patch attachment from
> Thunderbird to Gnus messes up if the attachment contains non-ASCII
> characters. This has been causing problems on the Emacs mailing list for
> years and it bit a correspondent of mine again today; see
> <https://debbugs.gnu.org/cgi/bugreport.cgi?bug=35502#35>.
>
> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.
After a bit of experimenting, this minimal patch appears to fix things.
Should it also let the user choose the charset when none is specified,
or just hardwire it to utf-8?
diff --git a/lisp/gnus/mm-decode.el b/lisp/gnus/mm-decode.el
index 3f255419e7..a99d52a7e7 100644
--- a/lisp/gnus/mm-decode.el
+++ b/lisp/gnus/mm-decode.el
@@ -665,6 +665,9 @@ mm-dissect-buffer
         (setq type (split-string (car ctl) "/"))
         (setq subtype (cadr type)
               type (car type))
+        ;; Fix missing charset in Thunderbird
+        (unless (assq 'charset (cdr ctl))
+          (push '(charset . utf-8) (cdr ctl)))
         (setq
          result
          (cond
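
If a user option is preferred over hardwiring, the idea could be sketched
roughly as follows. This is only an illustration, not a tested patch: the
names my-mm-fallback-charset and my-mm-add-fallback-charset are
hypothetical and do not exist in Gnus, the default mirrors the utf-8
symbol used in the patch above, and the sketch assumes ctl has the
(TYPE . PARAMETERS) shape that the patch already relies on.

```elisp
;; Hypothetical user option; this name is not part of Gnus.
(defcustom my-mm-fallback-charset 'utf-8
  "Charset to assume for MIME parts whose Content-Type lacks a charset.
A value of nil means do not add a charset parameter."
  :type '(choice (const :tag "Do not assume a charset" nil)
                 (symbol :tag "Charset"))
  :group 'mime-display)

;; Hypothetical helper; it returns a new list instead of modifying CTL,
;; so quoted constants passed in by callers are never mutated.
(defun my-mm-add-fallback-charset (ctl)
  "Return CTL with a fallback charset parameter if it has none.
CTL is assumed to have the form (TYPE . PARAMETERS)."
  (if (or (null my-mm-fallback-charset)
          (assq 'charset (cdr ctl)))
      ctl
    (cons (car ctl)
          (cons (cons 'charset my-mm-fallback-charset)
                (cdr ctl)))))

;; Example call on a Content-Type that has no charset parameter.
(my-mm-add-fallback-charset '("text/x-patch" (name . "0001-fix.patch")))
```

With something like this, the three added lines in mm-dissect-buffer would
become a single (setq ctl (my-mm-add-fallback-charset ctl)), and users
who want the old behavior could set the option to nil.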