GNU bug report logs - #74624
29.4.50; Gnus cannot parse some filenames (UTF-8) in an attachment
Reported by: Konstantin <reich-cv <at> yandex.ru>
Date: Sat, 30 Nov 2024 16:00:02 UTC
Severity: normal
Found in version 29.4.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Visuwesh <visuweshm <at> gmail.com> writes:
> [Saturday November 30, 2024] Eli Zaretskii wrote:
>
>>> From: Konstantin <reich-cv <at> yandex.ru>
>>> Date: Sat, 30 Nov 2024 18:59:25 +0300
>>>
>>> From time to time I get emails with attachments from my colleagues, which they send from
>>> the Roundcube web interface.
>>>
>>> Often I cannot open these attachments with =RET= (gnus-article-press-button)
>>> or save them under the correct name with =o= (gnus-mime-save-part).
>>> (Interestingly, =X-m= (gnus-summary-save-parts) works correctly.)
>>>
>>> The reason is that Gnus cannot correctly parse some attachment filenames.
>>>
>>> Here is an example of such an attachment (taken from gnus-summary-show-raw-article):
>>>
>>> --=_d38c0abddd645077f401d42fa430d9d5
>>> Content-Transfer-Encoding: base64
>>> Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
>>> name="=?UTF-8?Q?=D0=9E=D0=B1=D0=B7=D0=BE=D1=80_2024_=28=D0=BD=D0=B0_=2Ed?=
>>> =?UTF-8?Q?ocx?="
>>> Content-Disposition: attachment;
>>> filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
>>> filename*1*=%B0%20.docx;
>>> size=10
>>>
>>> c2Rmc2FmYXNmCg==
>>> --=_d38c0abddd645077f401d42fa430d9d5--
>>>
>>> I have tried to find the cause. As far as I can see,
>>> the gnus-data for such an attachment is formed incorrectly:
>>>
>>> (#<buffer *mm*-480444>
>>> ("application/vnd.openxmlformats-officedocument.word..."
>>> (name . "Обзор 2024 (на .docx"))
>>> base64 nil
>>> ("attachment" (size . "10")
>>> (filename . "Обзор 2024 (н\320")) nil nil nil)
>>>
>>> One can see that the filename is broken.
>>> It should be "Обзор 2024 (на .docx", the same as the name.
>>
>> It looks like Gnus fails to decipher the file name when it is split in
>> the middle of a UTF-8 sequence.
>>
>> I don't know Gnus. If you can help me by showing where the value of the
>> 'gnus-data property is calculated, I might be able to find the bug and
>> suggest a fix.
>
> The filename in the Content-Disposition header is decoded in
> mm-dissect-buffer, which calls mail-header-parse-content-disposition,
> and specifically in rfc2231-parse-string. The following patch fixes the
> issue on my end:
>
> diff --git a/lisp/mail/rfc2231.el b/lisp/mail/rfc2231.el
> index 33324cafb5b..632e270a922 100644
> --- a/lisp/mail/rfc2231.el
> +++ b/lisp/mail/rfc2231.el
> @@ -193,7 +193,7 @@ rfc2231-parse-string
> (push (list attribute value encoded) cparams))
> ;; Repetition of a part; do nothing.
> ((and elem
> - (null number))
> + (null part))
> )
> ;; Concatenate continuation parts.
> (t
>
> NUMBER is the variable used during the parsing portion of the function,
> in the big condition-case form above the cl-loop form that the patch
> modifies. In the header below
>
> Content-Disposition: attachment;
> filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
> filename*1*=%B0%20.docx;
> size=10
>
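> the function first parses filename*0*, where NUMBER is 0, then
> filename*1*, where NUMBER is 1. By the time it finishes parsing size,
> NUMBER is set to nil. The loop should use the value of NUMBER pushed to
> PARAMETERS as the 3rd element (referred to as `part' by the cl-loop
> form) instead of whatever value NUMBER happened to have when we parsed
> the last element. With NUMBER nil, the "repetition of a part" clause
> matches filename*1* and discards it, so only filename*0* is decoded; its
> value ends in a lone %D0 byte, which is the \320 in the broken filename.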
Thank you;
indeed, the patch fixes this bug.
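Eli's reading that the split falls inside a UTF-8 sequence can be checked outside Emacs. The sketch below uses Python's standard library as a stand-in for Gnus; it is not Gnus code. It shows that the two percent-encoded continuation segments have to be joined before decoding. Segment 0 ends with the lead byte %D0 and segment 1 starts with its continuation byte %B0, and together they encode "а". Decoding segment 0 alone leaves a stray byte, the counterpart of the \320 in the gnus-data above.

The names SEGMENTS, RAW, decode_joined and decode_first_only are hypothetical. RAW re-folds the header with leading whitespace on its continuation lines, as header folding requires.

```python
from email import message_from_string
from urllib.parse import unquote_to_bytes

# Continuation segments from the report, with the "UTF-8''" charset and
# (empty) language prefix removed from segment 0.
SEGMENTS = [
    "%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0",
    "%B0%20.docx",
]

# The same MIME part, folded with leading whitespace on continuation lines.
RAW = (
    "Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document\n"
    "Content-Disposition: attachment;\n"
    " filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;\n"
    " filename*1*=%B0%20.docx;\n"
    " size=10\n"
    "\n"
    "c2Rmc2FmYXNmCg==\n"
)


def decode_joined(segments: list[str]) -> str:
    """Join the percent-encoded segments first, then decode the bytes once."""
    return unquote_to_bytes("".join(segments)).decode("utf-8")


def decode_first_only(segments: list[str]) -> str:
    """Decode segment 0 alone: its trailing lead byte 0xD0 has no
    continuation byte, so it is replaced with U+FFFD."""
    return unquote_to_bytes(segments[0]).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_joined(SEGMENTS))      # Обзор 2024 (на .docx
    print(decode_first_only(SEGMENTS))  # Обзор 2024 (н followed by U+FFFD
    # Python's email package collects RFC 2231 continuations and joins
    # their bytes before applying the declared charset.
    print(message_from_string(RAW).get_filename())
```

Python substitutes U+FFFD for the stray byte, whereas Emacs keeps it as the raw byte \320. Either way, the last character of "на" is lost.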
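The stale-variable problem that the patch addresses can be modelled in a few lines. This is a simplified, hypothetical Python model of the continuation-collecting loop in rfc2231-parse-string, not a translation of rfc2231.el. Each entry mirrors the (attribute value part encoded) list that the parsing stage pushes to PARAMETERS, and `number` stands for the value NUMBER is left with after size is parsed.

With fixed=False, the check consults that stale value, as the original `(null number)` does. With fixed=True, it consults the entry's own `part`, as the patched `(null part)` does. The names PARAMS, collect and decode are illustrative only.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

# Hypothetical model (not the Emacs code): one entry per parsed parameter,
# mirroring (attribute value part encoded), in the order they were parsed.
Param = tuple[str, str, Optional[int], bool]

PARAMS: list[Param] = [
    ("filename", "UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0", 0, True),
    ("filename", "%B0%20.docx", 1, True),
    ("size", "10", None, False),
]


def collect(params: list[Param], fixed: bool) -> dict[str, str]:
    """Concatenate continuation parts per attribute."""
    # NUMBER keeps whatever the last parsed parameter ("size") set it to: None.
    number = params[-1][2]
    cparams: dict[str, str] = {}
    # Like the Lisp sort, treat a missing part number as 0 (the sort is stable).
    for attribute, value, part, _encoded in sorted(params, key=lambda p: p[2] or 0):
        if attribute not in cparams or part == 0:
            cparams[attribute] = value        # first part
        elif (part if fixed else number) is None:
            pass                              # "repetition of a part; do nothing"
        else:
            cparams[attribute] += value       # continuation part
    return cparams


def decode(value: str) -> str:
    """Decode an RFC 2231 extended value of the form charset'lang'%XX..."""
    charset, _lang, text = value.split("'", 2)
    return unquote_to_bytes(text).decode(charset, errors="replace")


if __name__ == "__main__":
    print(decode(collect(PARAMS, fixed=False)["filename"]))  # Обзор 2024 (н followed by U+FFFD
    print(decode(collect(PARAMS, fixed=True)["filename"]))   # Обзор 2024 (на .docx
```

The model makes the dependency on parameter order visible. In the old check, whether filename*1* survives depends on the last parameter parsed, not on the continuation itself. If the header had ended with a numbered parameter, the bug would not have shown up.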
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.