From unknown Mon Jun 16 23:47:01 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#15984 <15984@debbugs.gnu.org> To: bug#15984 <15984@debbugs.gnu.org> Subject: Status: 24.3; Problem with combining characters in attachment filename Reply-To: bug#15984 <15984@debbugs.gnu.org> Date: Tue, 17 Jun 2025 06:47:01 +0000

retitle 15984 24.3; Problem with combining characters in attachment filename
reassign 15984 emacs
submitter 15984 nisse@lysator.liu.se (Niels Möller)
severity 15984 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 28 03:32:53 2013 Received: (at submit) by debbugs.gnu.org; 28 Nov 2013 08:32:53 +0000 Received: from localhost ([127.0.0.1]:48347 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vlx1n-0008Iy-Qw for submit@debbugs.gnu.org; Thu, 28 Nov 2013 03:32:52 -0500 Received: from eggs.gnu.org ([208.118.235.92]:35254) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vlx1k-0008Ih-O6 for submit@debbugs.gnu.org; Thu, 28 Nov 2013 03:32:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Vlx1X-0005iL-Cs for submit@debbugs.gnu.org; Thu, 28 Nov 2013 03:32:43 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:42975) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vlx1X-0005iF-9p for submit@debbugs.gnu.org; Thu, 28 Nov 2013 03:32:35 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36870) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vlx1S-0006DY-48 for bug-gnu-emacs@gnu.org; Thu, 28 Nov 2013 03:32:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id
1Vlx1P-0005g2-8B for bug-gnu-emacs@gnu.org; Thu, 28 Nov 2013 03:32:30 -0500 Received: from vindbrygga.lysator.liu.se ([2001:6b0:17:f0a0::de]:48239 helo=bacon.lysator.liu.se) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Vlx1O-0005dz-T7 for bug-gnu-emacs@gnu.org; Thu, 28 Nov 2013 03:32:27 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rAS88uGW012358 for ; Thu, 28 Nov 2013 09:08:56 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rAS88tej012357; Thu, 28 Nov 2013 09:08:55 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: bug-gnu-emacs@gnu.org Subject: 24.3; Problem with combining characters in attachment filename Date: Thu, 28 Nov 2013 09:08:54 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by bacon.lysator.liu.se id rAS88uGW012358 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.3 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.3 (----) I'm reading email with Gnus. 
I received an email with an attachment containing the headers

  Content-Type: application/pdf;
   name="Brev =?UTF-8?B?YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg==?="
  Content-Transfer-Encoding: base64
  Content-Disposition: attachment;
   filename*0*=UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31;
   filename*1*=%33%31%31%32%37%2E%70%64%66

Apparently sent by a Mac user,

  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.1.1

The attachment was displayed in the *Article* buffer as

  [2. application/pdf; Brev aktiea?gar 131127.pdf]...

I was running emacs-24.3 in a tty, in a latin-1 locale, on a sparc Solaris system. (In a latin-1 tty, emacs ought to display "ä" instead of "a?", but that's a less severe and possibly unrelated problem.)

  SunOS bacon 5.10 Generic_147147-26 sun4u sparc SUNW,Sun-Fire-15000

When I tried to save the attachment by pressing "o" on that button (gnus-mime-save-part), emacs immediately crashed with a segmentation violation signal. Since emacs very rarely crashes, I was a bit surprised. I just restarted emacs and Gnus and tried again, and it crashed again. So at least for me, the problem is reproducible.

And a crash triggered by untrusted data in a received email is always scary. After fixing the bug, exploit possibilities ought to be analyzed.
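The Content-Disposition header above uses RFC 2231 parameter continuations: `filename*0*` and `filename*1*` are pieces of a single value, the trailing `*` marks each piece as percent-encoded, and only the first piece carries the `charset'language'` prefix (`UTF-8''`). The following is a minimal Python sketch of how such a value reassembles, assuming every section is percent-encoded as in this message; `decode_rfc2231_filename` is a hypothetical name, and this is not Gnus's own rfc2231 code.

```python
from urllib.parse import unquote


def decode_rfc2231_filename(params):
    """Reassemble an RFC 2231 continued, percent-encoded filename parameter.

    Assumes every section is an extended (percent-encoded) section, i.e. its
    name ends in '*', as with filename*0*=... and filename*1*=... above.
    """
    sections = []
    for key, value in params.items():
        if key.startswith("filename*"):
            # "filename*0*" -> section number 0
            index = int(key[len("filename*"):].rstrip("*"))
            sections.append((index, value))
    # Concatenate in section order *before* percent-decoding, so a UTF-8
    # sequence split across two sections still decodes correctly.
    joined = "".join(value for _, value in sorted(sections))
    # Only the first section carries the charset'language' prefix.
    charset, _language, encoded = joined.split("'", 2)
    return unquote(encoded, encoding=charset)


params = {
    "filename*0*": "UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31",
    "filename*1*": "%33%31%31%32%37%2E%70%64%66",
}
name = decode_rfc2231_filename(params)
print(ascii(name))  # 'Brev aktiea\u0308gar 131127.pdf'
```

The decoded name keeps the decomposed form: a plain "a" followed by U+0308 COMBINING DIAERESIS, which is the sequence the rest of this report is about.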
The gdb backtrace, based on the generated core file, looks like this:

(gdb) bt
#0  0xfec4ebd4 in _lwp_kill () from /lib/libc.so.1
#1  0xfebe7bb8 in raise () from /lib/libc.so.1
#2  0x000e7f78 in terminate_due_to_signal ()
#3  0x00103d04 in handle_fatal_signal ()
#4  0x001037d0 in deliver_thread_signal ()
#5  0xfec4b014 in __sighndlr () from /lib/libc.so.1
#6  0xfec3f6c4 in call_user_handler () from /lib/libc.so.1
#7  <signal handler called>
#8  0x000b5748 in char_table_ref ()
#9  0x001ad54c in composition_compute_stop_pos ()
#10 0x001266ec in scan_for_column ()
#11 0x00127328 in current_column ()
#12 0x00114cec in read_minibuf ()
#13 0x00115688 in Fread_from_minibuffer ()
#14 0x0015c538 in Ffuncall ()
#15 0x00190de0 in exec_byte_code ()
#16 0x0015c368 in Ffuncall ()
#17 0x001158a0 in Fcompleting_read ()
#18 0x0015c4e4 in Ffuncall ()
#19 0x00190de0 in exec_byte_code ()
#20 0x0015c368 in Ffuncall ()
#21 0x00190de0 in exec_byte_code ()
#22 0x0015c368 in Ffuncall ()
#23 0x00190de0 in exec_byte_code ()
#24 0x0015bf18 in funcall_lambda ()
#25 0x0015c368 in Ffuncall ()
#26 0x00190de0 in exec_byte_code ()
#27 0x0015bf18 in funcall_lambda ()
#28 0x0015c368 in Ffuncall ()
#29 0x0015cbf0 in apply1 ()
#30 0x001573b4 in Fcall_interactively ()
#31 0x0015c574 in Ffuncall ()
#32 0x0015c77c in call3 ()
#33 0x000f0ac0 in Fcommand_execute ()
#34 0x000f829c in command_loop_1 ()
#35 0x001591dc in internal_condition_case ()
#36 0x000ea2a0 in command_loop_2 ()
#37 0x001590c0 in internal_catch ()
#38 0x000ea11c in recursive_edit_1 ()
#39 0x000ea264 in Frecursive_edit ()
#40 0x000e9b28 in main ()

The emacs binary I use appears to have been stripped, so bt full gives no additional information, and xbacktrace fails with

  No symbol "CHECK_LISP_OBJECT_TYPE" in current context.

If I decode the base-64 part of the Content-type "name" value, I get

$ od -tx1c fname.txt
0000000  61  6b  74  69  65  61  cc  88  67  61  72  20  31  33  31  31
          a   k   t   i   e   a 314 210   g   a   r       1   3   1   1
0000020  32  37  2e  70  64  66
          2   7   .   p   d   f
0000026

So it appears to contain the character "ä" (a with two dots), coded as "a" followed by a unicode combining character. All in utf-8. If I run

  cat fname.txt

in xterm with a utf-8 locale, it displays the string as "aktieägar 131127.pdf", which seems correct.

I don't understand the meaning of the Content-disposition: header, but I guess it's possible that Content-type: ...; name= *is* processed correctly, and it's the code processing Content-disposition which crashes. But looking at the backtrace, it looks like the problem is related to handling of combining characters.

Below is the info generated by report-emacs-bug, except that I deleted recent input and recent messages, since the problem was in the emacs process which crashed, not in this one where I'm composing this message. Environment should otherwise be identical (same emacs, same Gnus, same machine, same tty).

Regards,
/Niels

In GNU Emacs 24.3.1 (sparc-sun-solaris2.10, X toolkit, Xaw scroll bars)
 of 2013-03-15 on stalhein
Configured using:
 `configure '--prefix=/pkg/emacs/sparc-sol10/24.3' '--with-gif=no'
 '--with-jpeg=no' '--with-tiff=no' '--with-png=no' '--with-dbus=no'
 '--with-gsettings=no' '--with-gnutls=no' 'CC=gcc' 'CFLAGS=-O2 -mcpu=v9'
 'LDFLAGS=-L/usr/local/lib -R/usr/local/lib'
 'CPPFLAGS=-I/usr/local/include''

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: sv_SE.ISO8859-1
  value of $LC_MESSAGES: C
  value of $LC_MONETARY: en_US.ISO8859-1
  value of $LC_NUMERIC: en_US.ISO8859-1
  value of $LC_TIME: en_US.ISO8859-1
  locale-coding-system: iso-latin-1-unix
  default enable-multibyte-characters: t

Major mode: Summary

Minor modes in effect:
  type-break-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
[omitted]
Recent messages:
[omitted]

Features:
(shadow emacsbug help-mode sort ansi-color gnus-cite flow-fill mm-archive mail-extr gnus-async gnus-bcklg qp parse-time gnus-ml disp-table misearch multi-isearch gnus-topic byte-opt bytecomp byte-compile cconv nndraft nnmh nnml gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 epg-config mm-view mml-smime smime password-cache dig mailcap nntp gnus-cache gnus-sum nnoo gnus-group gnus-undo nnmail mail-source gnus-start gnus-spec gnus-int gnus-range message sendmail format-spec rfc822 mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mailabbrev gmm-utils mailheader gnus-win gnus gnus-ems nnheader gnus-util mail-utils mm-util mail-prsvr wid-edit bbdb-autoloads package cl-macs gv bookmark pp recurse cl time-date type-break uniquify advice help-fns cl-lib advice-preload info easymenu tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dynamic-setting x-toolkit x multi-tty emacs)

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
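The `od` dump in the report can be cross-checked without a file: decoding the base64 payload of the RFC 2047 encoded-word in the `name=` parameter yields the same 22 (octal 26) bytes, including the two-byte UTF-8 sequence `cc 88` for U+0308 COMBINING DIAERESIS. This small Python sketch handles only the base64 ("B") form of a single encoded-word; `decode_b_word` is an illustrative name, not part of any library.

```python
import base64


def decode_b_word(word):
    """Decode one RFC 2047 encoded-word of the form =?charset?B?payload?=."""
    # Drop the leading "=?" and trailing "?=", then split the three fields.
    charset, encoding, payload = word[2:-2].split("?")
    if encoding.upper() != "B":
        raise ValueError("this sketch only handles the base64 ('B') encoding")
    raw = base64.b64decode(payload)
    return raw, raw.decode(charset)


raw, text = decode_b_word("=?UTF-8?B?YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg==?=")

# Octal offsets and hex bytes, like the hex rows of `od -tx1c`.
for offset in range(0, len(raw), 16):
    row = " ".join(f"{b:02x}" for b in raw[offset:offset + 16])
    print(f"{offset:07o} {row}")
print(f"{len(raw):07o}")
```

Run as-is, it prints `0000000 61 6b 74 69 65 61 cc 88 67 61 72 20 31 33 31 31`, then `0000020 32 37 2e 70 64 66`, then `0000026`, matching the hex columns of the dump above.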
From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 28 15:25:28 2013 Received: (at 15984) by debbugs.gnu.org; 28 Nov 2013 20:25:28 +0000 Received: from localhost ([127.0.0.1]:49208 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vm89P-0003EO-9S for submit@debbugs.gnu.org; Thu, 28 Nov 2013 15:25:27 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:52605) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vm89J-0003E5-HF for 15984@debbugs.gnu.org; Thu, 28 Nov 2013 15:25:23 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0MWZ00C00PTJEU00@a-mtaout22.012.net.il> for 15984@debbugs.gnu.org; Thu, 28 Nov 2013 22:25:15 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MWZ00CP1Q215LA0@a-mtaout22.012.net.il>; Thu, 28 Nov 2013 22:25:13 +0200 (IST) Date: Thu, 28 Nov 2013 22:25:01 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <83iovc8eaq.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: nisse@lysator.liu.se (Niels Möller) > Date: Thu, 28 Nov 2013 09:08:54 +0100 > > I'm reading email with Gnus. 
I received an email with an attachment
> containing the headers
>
>   Content-Type: application/pdf;
>    name="Brev =?UTF-8?B?YWt0aWVhzIhnYXIgMTMxMTI3LnBkZg==?="
>   Content-Transfer-Encoding: base64
>   Content-Disposition: attachment;
>    filename*0*=UTF-8''%42%72%65%76%20%61%6B%74%69%65%61%CC%88%67%61%72%20%31;
>    filename*1*=%33%31%31%32%37%2E%70%64%66
>
> Apparently sent by a Mac user,
>
>   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:24.0) Gecko/20100101 Thunderbird/24.1.1
>
> The attachement was displayed in the *Article* buffer as
>
>   [2. application/pdf; Brev aktiea?gar 131127.pdf]...
>
> I was running emacs-24.3 in a tty, in a latin-1 locale, on a sparc
> Solaris system. (In a latin-1 tty, emacs ought to display "ä" instead of
> "a?", but that's a less severe and possibly unrelated problem).

If ä was supposed to be produced by character compositions, then Emacs
cannot do that on a TTY, because compositions require drawing one glyph
over the other (with certain offsets).

If you expected Emacs to perform normalization in this case, then I
don't think we do this automatically (or at all).

> When I tried to save the attachment by pressing "o" on that button
> (gnus-mime-save-part), emacs immediately crashed with a segmentation
> violation signal. Since emacs very rarely crashes, I was a bit
> surprised. I just restarted emacs and Gnus and tried again, and it
> crashed again. So at least for me, the problem is reproducible.

Can you send that message as a binary attachment?

> And a crash triggered by untrusted data in a received email is always
> scary. After fixing the bug, exploit possibilities ought to be analyzed.

I suggest trying a recent development trunk; several similar crashes
were fixed a few months ago.  If that doesn't help, please reproduce
the problem in a non-optimized, non-stripped build, and show the
variables from char_table_ref that are involved in the crash.  (I'm
guessing char_table_ref got a bogus character code.)
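A rough way to see where "a?" can come from: the base letter "a" exists in Latin-1, but U+0308 COMBINING DIAERESIS does not, whereas the NFC-composed form is the single character U+00E4, which Latin-1 does contain. This Python sketch uses a codec with `errors="replace"` as a stand-in for a Latin-1 terminal encoder; it is an analogy only, not a description of how Emacs's terminal coding system actually substitutes characters.

```python
import unicodedata

filename = "aktiea\u0308gar 131127.pdf"  # 'a' followed by U+0308 COMBINING DIAERESIS

# Encode for a Latin-1 terminal, substituting '?' for characters Latin-1 lacks.
as_received = filename.encode("latin-1", errors="replace")

# NFC composes 'a' + U+0308 into U+00E4, which Latin-1 can represent directly.
composed = unicodedata.normalize("NFC", filename)
after_nfc = composed.encode("latin-1")

print(as_received)  # b'aktiea?gar 131127.pdf'
print(after_nfc)    # b'aktie\xe4gar 131127.pdf'
```

Whether Emacs should apply such normalization for display is exactly the open question in the replies that follow.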
From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 28 17:17:12 2013 Received: (at 15984) by debbugs.gnu.org; 28 Nov 2013 22:17:12 +0000 Received: from localhost ([127.0.0.1]:49291 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vm9tY-0006Ar-32 for submit@debbugs.gnu.org; Thu, 28 Nov 2013 17:17:12 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:56544) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Vm9tV-0006Ad-2z for 15984@debbugs.gnu.org; Thu, 28 Nov 2013 17:17:10 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rASMH7bn006570; Thu, 28 Nov 2013 23:17:07 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rASMH6tF006569; Thu, 28 Nov 2013 23:17:06 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> Date: Thu, 28 Nov 2013 23:17:06 +0100 In-Reply-To: <83iovc8eaq.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 28 Nov 2013 22:25:01 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) Eli Zaretskii writes: > If you expected Emacs to perform normalization in this case, then I > don't think we do this automatically (or at all). 
I think for display, normalizing is definitely the right thing to do (the unicode spec, as I understand it, requires that a "compliant" implementation treat different ways to code "ä" equivalently). But I understand if emacs currently doesn't do that.

(Digression: I think a text processor supporting unicode really ought to represent "characters" as interned strings of unicode (or utf-8) code points. These characters can have relations such as "normalized to", and glyphs should usually be associated only with the normalized form. One could also have configurable rules for character boundaries, as is described in the unicode book, or at least was in the version which was current when I tried to read up on this some years ago.)

> Can you send that message as a binary attachment?

It's not very sensitive (it's about shares and options for a company I used to be employed by), but I'd prefer it not to be posted publicly on the bugtracker, or widely distributed among emacs hackers.

I'll try to send you a private mail with the bulk of the message with the body of the attachment replaced (the base64 text in the raw message; if the problem really is with the attachment headers, that shouldn't matter); if that's for some reason not usable, I'll send you the complete message.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
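In Unicode terms, "different ways to code ä" that must be treated alike are called canonically equivalent, and a common way to test for that is to compare normalization forms. A minimal Python sketch using the standard `unicodedata` module; `canonically_equivalent` is an illustrative helper name, and nothing here implies Emacs does this.

```python
import unicodedata


def canonically_equivalent(a, b):
    """True if a and b are canonically equivalent, i.e. have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"  # LATIN SMALL LETTER A + COMBINING DIAERESIS

print(precomposed == decomposed)                        # False
print(canonically_equivalent(precomposed, decomposed))  # True
```

Plain string comparison sees two different code-point sequences; only after normalization do they compare equal.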
From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 28 17:46:37 2013 Received: (at 15984) by debbugs.gnu.org; 28 Nov 2013 22:46:37 +0000 Received: from localhost ([127.0.0.1]:49326 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmAM1-0006vK-5B for submit@debbugs.gnu.org; Thu, 28 Nov 2013 17:46:37 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:57374) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmALx-0006vB-V6 for 15984@debbugs.gnu.org; Thu, 28 Nov 2013 17:46:35 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rASMkWaG007357; Thu, 28 Nov 2013 23:46:32 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rASMkWtU007356; Thu, 28 Nov 2013 23:46:32 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> Date: Thu, 28 Nov 2013 23:46:31 +0100 In-Reply-To: ("Niels =?iso-8859-1?Q?M=F6ller=22's?= message of "Thu, 28 Nov 2013 23:17:06 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

nisse@lysator.liu.se (Niels Möller) writes:

>> Can you send that message as a binary attachment?
>
> It's not very sensitive (it's about shares and options for a company I
> used to be employed by), but I'd prefer it not to be posted publicly on
> the bugtracker, or widely distributed among emacs hackers.

I've now created a smaller and anonymized example. I tried to mail it to myself with sendmail -t, to confirm it still crashes emacs. Mailing for some reason didn't work, but the bounce I got back is a good enough example: It is displayed by Gnus with a button looking like

  [5. application/pdf; Brev aktieägar 131127.pdf]...

and pressing "o" on that makes emacs crash, just as with the original message.

Attached in gzip form. I hope emacs doesn't automagically unpack and display the buttons for the embedded attachment when you read this in emacs, but if it does, be careful.

Regards,
/Niels

--=-=-=
Content-Type: application/octet-stream
Content-Disposition: attachment; filename=bounce.gz
Content-Transfer-Encoding: base64
Content-Description: Compressed problem message

H4sICHPEl1IAA2JvdW5jZQC9WGtz2joa/lz/Ck1naJOeyNjmalO3IUBadkOSBXJpO52MsGXQxkis LEM4v3d/yL6yCUkonKY97SaZgGRdnue9v+5TlUqOz4maeOjtO+Man0k2ZpzEeCg8xFmS0MN4mRAl pBmz1Eyo0aYxm1NJw91L+jSgsCb00GiJpoTF5tMFaO9cJCpid/vGCxaiSrVuOWXLspoNNJykB8ip o1MxR45ll5BT8hzXs130h2VbFtprdYb7Rpso6j1z7bEUUw/1mt2TTh+3m53e2enhJpweYEQrYks0 WCaKTveNQTr6Nw2Uhy54eE8aZUv7mdxgpAQaUHgqjZ3CaKZKYDhqypTSIiF6LOksZjQ0et1eB19S mTDBPWSbltESXFGu8HA5A4rTNFZsRqQqwg4hVQPln1jBY3+FaokTRVSaNIwXI5HykMil/7Lstps1 EKpTM+1SvVKt1WoVt7hFGS8frpSEJxGVuMMDETI+9lB9xBRYxQeeJlgzxwORykADg0HIpFecE5md WszIF3v5fJHThdGjSULGFHeB9VutINt26o7juLZrPuj8cAumd8YJ4zQBidiOcS1pBJZEAsE37Sjb GsG/VMJqp1azDGM4YQmCP4Iy4VIekFmSxmAxIZrmiEzDwPjZAlrLp02TQLKZynR1KhSLWED0aENp it6p4iwmjDdQMCEyocoH+ZEkYOwBn5rk+FGSmRsiCk3AK7b5C+Dtvp6iREiwTjC5CZlT/cl4JOQU LUUKp8F++CLvOaJApHGIuFDGiKIH+4VtglMkYKGQFMwpYDMG2BMTddVrEJtSJJjAwhGNxQJuPoal USoBrkQkSRjYGg/oAZrFlIAKEjD/nAecPAMCUwJ0pIYcZchCAcDXyxkP4jQE9FoKMylGMVDPbdpE n2B5QLgBaKmiOR2x4JlAUQR+nAltjVDee+GDWtG3P8OncjaMt9vc9F1u0ggEuiBSGz+KhZjp8Xa3
/vsWtI43Of1Nz885FTd83IDgqlcDQNwbNj0U8qSxzWbAa1dBFv8rpSl4YdtDD4gfPc4DmIdkFIB3 NnZEMSnZHNLCj0Re4zjLJP17E/vOFevU8+2G7ZCCXI7a/yGUDjL5eKhils2q0WZkzIEgC3BLhAB5 zbfxf1f0k/SRq3WHsnO63w3IRv9J3t5qz4+ScOY5W9NwLAISZ1Hniw3kLPi1v0JS/l7aRgumJqgz 6A3PESTwR2b1QstxO6LnJvcN4CMCKTAwmVqj+GbmS/d8XvUcy7K96sjy7JoXWVbJ89zaT5AptSqt 4ywz/QYyFGsoEYshRFolM0k5VRmYXQ82qNU9x4NRacXrW9nsIOV0LPfYPa5Uj36Lhrak5r054+FI Lsdjsvlom7qI5XkhXdHaKaW6aZfNUnH10aYjRjh2zfImXdmEb93jK8t2nVLZeLE3z0ssf3gymNvF weBkXkIQZCCn+e2PHdwfNHGzM3AqVTz42ETgY4kPAwTbWLT0T8/2f05s9VxszxPYXzjj9g2ZGCp/ DFKeS6SyVQwfm2eWVa1XKz9HoXav+d5Tze9lTFaH3SP/IbTFvCjef0Dqfs6QlqGQfR6cX96v/PZQ 2So1j8sQXezvRJcrGgL/2iP+tmeXvJLzxBGv8SWTUFoOoGriGn/uEohMyZwlIYYqXNeVGuB4YTvg STLCCkoqEzwKdg9mZAqboRD0kGPqY9+82UdfhmSst1VM2/qKhjf985tec9j62Bnc9FuXbQ8Defvg iCxpsmeZdceFqGRa+/fnndA5jT20Pn6Vl0+hCEz0VT7sr9ZdpMh4TMMbMhJzmHNdFwqh/6QMNOVD HlE0ASf80jw5uRn2LwbDTtvHtlkuH6BBs9+5OTs6vriBh75l1ir216yrggpTcj9kCRnpguBvty1Q ML2tOG614lRapmvVLdeyDukdmUItawo5frfqQ3foyq6uvD/vQN9GQjzdfZFAam+Os0KnJ/5kcUyK FVPbNgkYVyKZNFAXKoEYms4AnQ3QNbKBLvSAc88pg8TRBxrciqKjb7EtW/uwLuVGTIZFWGCb9pYG c6cXrNtdALqzDZ2yOxo2DPTQaOJHP1bFKoOcLPhfhd8yoII2z6q8BGVk9oLPoeT3UG41aC8zPeQc QO9wS3mSud92dAdoNcYwxgmFZBERQAXzG+MReTw6POyfD0+1bc6sCHehYfKQSPwrSE5ikaDTIbql 0EPE+hx+63d0mwO+kfdHIbQL17hFeFfhDxR2spm/LW01dLPFFUhj0LnfcN4/A9OXlAChDejb+KE9 xuFqSDx5DNrBdmP4mOt+fjNTmcclmflao38OoUO9KyOMolG9StzKaFQPIhje9+Owq8kVS7SrQr3J uO4soOcbU7CEiVKzxCsWA33uOg/nw+LInE1m75m/vuTV1H98ySvl31/yKvCjrRedCq4nfulNfOtN g199TbI2jaYMJpA+cCtOdevrofOo16fF+T+ueux60XM+dT9Wji47d/yCL3RUzIM1Plp6KDsA7S0W C2QiKYgEqDPKxynjMBGI6ZYY34rJtHmJ0kT3Lnqgc8vj9x6Zt2LtruuXAXCcjgS6y5kSZRoYP8dt n/Feozs4w/V6xcV2Y3W4H8ViAeH3u03MBxGi2/l/4/jAMH4OEJnN4tUrmOIsjCAscTKl/ssjSefI f38xPMb190fvP10pi1xdTv7sTvin6+64N+zdQXlYOuFHt5/Hvv/e/8t3YNrJquWH1g4MSyQsb+3y dxFTmIfLoWClGsAb642f3f36daHsFGpOoVop1KoFxypU7UL1qFArF6qunoRhq1Wo1wvVmv4OK2FN yX58lv3GL5RKMJn9OYVSreB0CjU4Cg6pGsbnqTtvAYlnShDjH2poYfn/ABxFyX8kFgAA --=-=-= Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 
8bit -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 02:17:06 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 07:17:06 +0000 Received: from localhost ([127.0.0.1]:49624 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmIK1-00048z-GU for submit@debbugs.gnu.org; Fri, 29 Nov 2013 02:17:06 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:49168) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmIJx-00048D-0q for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 02:17:02 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0MX000H00K575B00@a-mtaout22.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 09:16:54 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX000GO0K86WA70@a-mtaout22.012.net.il>; Fri, 29 Nov 2013 09:16:54 +0200 (IST) Date: Fri, 29 Nov 2013 09:16:44 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <83a9gn8yoz.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: <83iovc8eaq.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: nisse@lysator.liu.se (Niels Möller) > Cc: 15984@debbugs.gnu.org > Date: Thu, 28 Nov 2013 23:17:06 +0100 > > Eli 
Zaretskii writes: > > > If you expected Emacs to perform normalization in this case, then I > > don't think we do this automatically (or at all). > > I think for display, normalizing is definitely the right thing to do > (the unicode spec, as I understand it, require that a "compliant" > implementation treats different ways to code "ä" equivalently). > But I understand if emacs currenty doesn't do that. Someone(TM) should write the code to do that. > (Digression: I think text-processor supporting unicode really ought to > represent "characters" as interned strings of unicode (or utf-8) code > points. That's what Emacs does since v23.1 (except that we extend the range of Unicode codepoints to represent some non-unified characters and binary raw bytes). > These characters can have relations such as "normalized to" This part requires incorporation of tables and supporting code, which needs to be written. > glyphs should usually be associated only with the normalized form. Here I disagree. There are definitely situations where this is not TRT, and they aren't "unusual". > I'll try to send you a private mail with the bulk of the message with > the body of the attachment replaced (the base64 text in the raw message; > if the problem really is with the attachment headers, that shouldn't > matter); if that's for some reason not usable, I'll send you the > complete message. Thanks. I'd also need instructions to display that message in Gnus after saving it to a file, starting with "emacs -Q", as I don't have Gnus set up and don't use it in my day-to-day work. 
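To make the disagreement concrete: the three sequences below all look like "Å" and are canonically equivalent, yet they are different code points, and a tool that inspects text (in the spirit of what `C-u C-x =` shows in Emacs) needs to tell them apart. This Python sketch lists each code point with its Unicode name before and after NFC; `describe` is a hypothetical helper, not an Emacs function.

```python
import unicodedata

# Three code-point sequences that render as "Å" and are canonically equivalent.
forms = {
    "precomposed": "\u00c5",
    "A + combining ring": "A\u030a",
    "Angstrom sign": "\u212b",
}


def describe(text):
    """Return (U+XXXX, Unicode name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, text in forms.items():
    nfc = unicodedata.normalize("NFC", text)
    print(f"{label}: {describe(text)} -> NFC {describe(nfc)}")
```

All three normalize to U+00C5, so associating glyphs only with the normalized form would erase the distinction that the per-code-point listing preserves.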
From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 03:49:20 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 08:49:21 +0000 Received: from localhost ([127.0.0.1]:49690 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmJlI-0006di-Is for submit@debbugs.gnu.org; Fri, 29 Nov 2013 03:49:20 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:32795) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmJlF-0006dU-IN for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 03:49:18 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rAT8nFfY023112; Fri, 29 Nov 2013 09:49:15 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rAT8nF31023111; Fri, 29 Nov 2013 09:49:15 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> Date: Fri, 29 Nov 2013 09:49:15 +0100 In-Reply-To: <83a9gn8yoz.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Nov 2013 09:16:44 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) Eli Zaretskii writes: >> (Digression: I think text-processor supporting unicode really ought to >> represent "characters" as interned strings of unicode (or utf-8) code 
>> points. > > That's what Emacs does since v23.1 (except that we extend the range of > Unicode codepoints to represent some non-unified characters and binary > raw bytes). Good! I thought emacs used a simpler mapping character <-> a single unicode value. > > glyphs should usually be associated only with the normalized form. > > Here I disagree. There are definitely situations where this is not > TRT, and they aren't "unusual". Ok. What's the typical use case where you'd want to have different glyphs for "Å", "A" + ring above combining char, and Ångström unit sign? > Thanks. I'd also need instructions to display that message in Gnus > after saving it to a file, starting with "emacs -Q", as I don't have > Gnus set up and don't use it in my day-to-day work. I'm also not sure how to do that, but I'll try to figure out. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 04:00:38 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 09:00:38 +0000 Received: from localhost ([127.0.0.1]:49722 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmJwC-0006wd-OQ for submit@debbugs.gnu.org; Fri, 29 Nov 2013 04:00:37 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:34328) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmJw8-0006wL-2V for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 04:00:33 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MX000400P0IK500@a-mtaout20.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 11:00:25 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX0004FXP0PCI90@a-mtaout20.012.net.il>; Fri, 29 Nov 2013 11:00:25 +0200 (IST) Date: Fri, 29 Nov 2013 11:00:15 +0200 From: Eli Zaretskii Subject: Re: 
bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <831u1z8twg.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: nisse@lysator.liu.se (Niels Möller) > Cc: 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 09:49:15 +0100 > > Eli Zaretskii writes: > > >> (Digression: I think text-processor supporting unicode really ought to > >> represent "characters" as interned strings of unicode (or utf-8) code > >> points. > > > > That's what Emacs does since v23.1 (except that we extend the range of > > Unicode codepoints to represent some non-unified characters and binary > > raw bytes). > > Good! I thought emacs used a simpler mapping character <-> a single > unicode value. Maybe I misunderstood you: what's the difference between those two alternatives? > > > glyphs should usually be associated only with the normalized form. > > > > Here I disagree. There are definitely situations where this is not > > TRT, and they aren't "unusual". > > Ok. What's the typical use case where you'd want to have different > glyphs for "Å", "A" + ring above combining char, and Ångström unit sign? MacOS file names, I think. Also, display in "C-u C-x =", which is very important for understanding and debugging Emacs display features. > > Thanks. 
I'd also need instructions to display that message in Gnus > > after saving it to a file, starting with "emacs -Q", as I don't have > > Gnus set up and don't use it in my day-to-day work. > > I'm also not sure how to do that, but I'll try to figure out. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 05:43:50 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 10:43:51 +0000 Received: from localhost ([127.0.0.1]:49780 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmLY6-000134-FE for submit@debbugs.gnu.org; Fri, 29 Nov 2013 05:43:50 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:33929) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmLY3-00012r-Cj for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 05:43:49 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rATAhjB8025950; Fri, 29 Nov 2013 11:43:45 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rATAhjxE025949; Fri, 29 Nov 2013 11:43:45 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> Date: Fri, 29 Nov 2013 11:43:45 +0100 In-Reply-To: <831u1z8twg.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Nov 2013 11:00:15 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) Eli Zaretskii writes: >> Good! I thought emacs used a simpler mapping character <-> a single >> unicode value. > > Maybe I misunderstood you: what's the difference between those two > alternatives? What I think is the right thing, is to allow a sequence of unicode values, e.g., "A" + combining character, or "A" + any random sequence of combining characters, intern this string, and treat this as a single "character". The idea is that this character object should correspond to what the user thinks of as a single character. E.g, one glyph per character, and treated as a unit by forward-char, and regexp matching with "." and character sets. When reading text files, the character boundaries may be configurable. E.g, there could be a mode which makes each and every unicode value a single character, which will then be displayed as separate glyphs, separate characters for regexp matching, etc. >> > Thanks. I'd also need instructions to display that message in Gnus >> > after saving it to a file, starting with "emacs -Q", as I don't have >> > Gnus set up and don't use it in my day-to-day work. Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*). Create a spool-like directory, e.g, "~/tmp/mail". Copy the file to "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer, press G d to create a directory group, enter ~/tmp/mail. You should now be able to enter that group, and select the message in the *Summary* buffer. To mimic my setup, do this in an xterm running in a latin-1 locale. (I have to send this off now, I'll try later to really see if this recipe reproduces the problem for me). I also tried to reproduce the problem on another machine, with debian gnu/linux and emacs-23.4. This version worked fine, no crash. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 06:27:12 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 11:27:12 +0000 Received: from localhost ([127.0.0.1]:49831 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmME3-00029P-1w for submit@debbugs.gnu.org; Fri, 29 Nov 2013 06:27:11 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:37696) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmMDz-00028v-9Z for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 06:27:08 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MX000500VIGKD00@a-mtaout20.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 13:27:01 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX00059PVT061C0@a-mtaout20.012.net.il>; Fri, 29 Nov 2013 13:27:01 +0200 (IST) Date: Fri, 29 Nov 2013 13:26:50 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <83r49z78jp.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: nisse@lysator.liu.se (Niels Möller) > Cc: 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 11:43:45 +0100 > > Eli Zaretskii writes: > > >> Good! 
I thought emacs used a simpler mapping character <-> a single > >> unicode value. > > > > Maybe I misunderstood you: what's the difference between those two > > alternatives? > > What I think is the right thing, is to allow a sequence of unicode > values, e.g., "A" + combining character, or "A" + any random sequence of > combining characters, intern this string, and treat this as a single > "character". That's not how Emacs represents and treats characters. The composition happens only at display time, and normalization, as it's currently implemented, happens when text is read into a buffer. Thereafter, each Unicode character is a single character, and there's no combining of them for any purpose except display. > The idea is that this character object should correspond to what the > user thinks of as a single character. E.g, one glyph per character, and > treated as a unit by forward-char, and regexp matching with "." and > character sets. What gets displayed as a single unit is a "grapheme cluster", not a single glyph. Whether a grapheme cluster that corresponds to "A" + any random sequence of combining characters maps to a single glyph depends on the font being used, which is something the user should not need to worry about. However, we do want to give the user a way to delete only one or more of the combining characters, so forcing the entire combination to be a single indivisible entity would not be TRT for users. Cursor motion does consider the entire thing as a single entity and moves across all of it, but that requires special code. IOW, things are not that simple, and I think the design you are suggesting is problematic in that it will remove several important features, or make them harder to implement. > When reading text files, the character boundaries may be configurable. The important question is what to do by default, as many users will not be happy if asked too many questions or requested to specify too many parameters for reading text.
Compare this with the need to specify the encoding in too many cases in the early days of multilingual Emacs -- there was a user outcry about that. > E.g, there could be a mode which makes each and every unicode value a > single character, which will then be displayed as separate glyphs, > separate characters for regexp matching, etc. You are mixing display issues with editing issues and with how characters are represented internally in an Emacs buffer. These all are separate, and do not necessarily need to handle characters in the same rigid way. > Move away any gnus-related configuration files (~/.gnus, ~/.newsrc*). > > Create a spool-like directory, e.g, "~/tmp/mail". Copy the file to > "~/tmp/mail/1". Start emacs -Q -nw -f gnus-no-server. In the *Group* buffer, > press G d to create a directory group, enter ~/tmp/mail. You should now > be able to enter that group, and select the message in the *Summary* > buffer. > > To mimic my setup, do this in an xterm running in a latin-1 locale. (I > have to send this off now, I'll try later to really see if this recipe > reproduces the problem for me). Thanks, I will try that. 
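A rough sketch of the split Eli describes, in which cursor motion steps over a base character and its combining marks as one unit while deletion still removes a single code point. This is a simplified model, not Emacs's implementation (in Emacs, composition is decided at display time). The function names are hypothetical, and "combining" is approximated by the Combining Diacritical Marks block U+0300..U+036F only, whereas real grapheme-cluster rules (Unicode UAX #29) cover far more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification (assumption): only U+0300..U+036F counts as combining.
   Real grapheme-cluster segmentation (UAX #29) is much richer. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Cursor motion: from code-point index pos, skip one base character
   plus every combining mark attached to it; return the new index. */
static size_t forward_cluster(const uint32_t *text, size_t len, size_t pos)
{
    if (pos >= len)
        return len;
    pos++;                                  /* the base character */
    while (pos < len && is_combining_mark(text[pos]))
        pos++;                              /* its combining marks */
    return pos;
}

/* Deletion: remove exactly one code point before index pos, so a single
   diacritic can be dropped without retyping the base letter.
   Shifts the tail left and returns the new length. */
static size_t delete_backward_codepoint(uint32_t *text, size_t len, size_t pos)
{
    assert(pos <= len);
    if (pos == 0)
        return len;
    for (size_t i = pos; i < len; i++)
        text[i - 1] = text[i];
    return len - 1;
}
```

For the code points a, U+0308, b, forward_cluster moves from index 0 straight to index 2, past the whole "ä". Calling delete_backward_codepoint at index 2 removes only the diaeresis and leaves "ab".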
From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 07:41:12 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 12:41:12 +0000 Received: from localhost ([127.0.0.1]:49876 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmNNf-00055U-By for submit@debbugs.gnu.org; Fri, 29 Nov 2013 07:41:12 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:35044) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmNNX-000553-Bb for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 07:41:07 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rATCf1lA029775; Fri, 29 Nov 2013 13:41:01 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rATCf1Cc029774; Fri, 29 Nov 2013 13:41:01 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> <83r49z78jp.fsf@gnu.org> Date: Fri, 29 Nov 2013 13:41:01 +0100 In-Reply-To: <83r49z78jp.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Nov 2013 13:26:50 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) Eli Zaretskii writes: > However, we do want to give the user a way to > delete only one or more of the combining 
characters, so forcing the > entire combination to be a single indivisible entity would not be TRT > for users. Good question, how to handle this. Today, to remove the dots from an "ä" character, I'll have to delete the complete "ä" character and insert a new "a" character. Or similarly for the reverse edit. I think this "atomic" handling is the desired behaviour in many cases. And I don't think it should behave differently depending on the representation of "ä" in the original file. But if you have a complex sequence of unicode combining characters, I agree there's some need to be able to edit it. Maybe put point on the character and invoke edit-char to go in some special mode which explodes the usually "atomic" character into smaller pieces. And such a character edit mode might be useful for more things than unicode composing characters, e.g., manipulating the different sub-parts of a Chinese character. Anyway, this user interface is not intimately tied to the internal character representation; its overall effect on the buffer will be the same as replacing any substring. >> When reading text files, the character boundaries may be configurable. > > The important question is what to do by default, I'm pretty sure the default should be that a sequence of one unicode base char and all following unicode combining chars is interned as a single "emacs character". (I think the detailed rules for this are spelled out in the unicode book). With some arbitrary limit to prevent a GByte file with only unicode combining characters from being read as a single emacs character; say at most 10 combining characters. > You are mixing display issues with editing issues and with how > characters are represented internally in an Emacs buffer. I think it's confusing for users if the units of text which forward-char skips over, do not correspond to the units matched by "." in isearch-forward-regexp.
My suggested internal representation seems to be a natural way to get this correspondence right, at the cost of some memory (or lots of complexity in reducing memory usage). I'm sure there are other ways, and maybe also a lot better ways, to implement the same thing. > Thanks, I will try that. Now I've also reproduced it on the same machine, without my normal Gnus setup getting in the way. I start emacs with $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el where bug.el contains (setq gnus-init-file nil) (setq gnus-nntp-server nil) (gnus-no-server) Then create the group with G d, pointing out the spool-like directory, enter the group (RET), view the message (RET), try to write out the attachment ("o" on the attachment button). Still crashes for me. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 08:12:00 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 13:12:00 +0000 Received: from localhost ([127.0.0.1]:49898 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmNrT-0005q9-Uc for submit@debbugs.gnu.org; Fri, 29 Nov 2013 08:12:00 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:41999 ident=Debian-exim) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmNrQ-0005q0-JV for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 08:11:57 -0500 Received: from fl1-110-233-32-186.iba.mesh.ad.jp ([110.233.32.186]:39418 helo=shatin) by fencepost.gnu.org with esmtpsa (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1VmNrO-0006ju-Se; Fri, 29 Nov 2013 08:11:55 -0500 Received: from handa by shatin with local (Exim 4.76) (envelope-from ) id 1VmNrE-0001LK-N9; Fri, 29 Nov 2013 22:11:44 +0900 From: Kenichi Handa To: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-Reply-To: 
<83iovc8eaq.fsf@gnu.org> (message from Eli Zaretskii on Thu, 28 Nov 2013 22:25:01 +0200) Date: Fri, 29 Nov 2013 22:11:44 +0900 Message-ID: <8738mfjqsv.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) In article <83iovc8eaq.fsf@gnu.org>, Eli Zaretskii writes: > If you expected Emacs to perform normalization in this case, then I > don't think we do this automatically (or at all). The library "ucs-normalize" (under lisp/international/) provides the coding system utf-8-hfs which may be appropriate for file-name-coding-system on Mac. --- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 09:51:04 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 14:51:04 +0000 Received: from localhost ([127.0.0.1]:49935 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPPM-0008H4-59 for submit@debbugs.gnu.org; Fri, 29 Nov 2013 09:51:04 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:43772) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPPJ-0008GU-7T for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 09:51:02 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0MX100K0053LRM00@a-mtaout22.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 16:50:54 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX100KBN58TR710@a-mtaout22.012.net.il>; Fri, 29 Nov 2013 16:50:54 +0200 (IST) Date: Fri, 29 Nov 2013 16:50:44 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with
combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <83haav6z3v.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> <83r49z78jp.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: nisse@lysator.liu.se (Niels Möller) > Cc: 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 13:41:01 +0100 > > Today, to remove the dots from an "ä" character, I'll have to delete the > complete "ä" character and insert a new "a" character. Not if they were originally two or more characters which were composed into one. In that case, we let the user edit them separately. > I think this "atomic" handling is the desired behaviour in many > cases. For "ä", this is arguable. For more complex scripts, this is definitely wrong: users want to be able to edit each component separately. > But if you have a complex sequence of unicode combining characters, > I agree there's some need to be able to edit it. Maybe put point on > the character and invoke edit-char to go in some special mode which > explodes the usually "atomic" character into smaller pieces. We already do that, but if the characters were combined, and Emacs doesn't even know they were separate to begin with, it cannot do that, can it? > > You are mixing display issues with editing issues and with how > > characters are represented internally in an Emacs buffer.
> > I think it's confusing for users if the units of text which forward-char > skips over, do not correspond to the units matched by "." in > isearch-forward-regexp. What happens under the hood with matching and what is shown to the user doesn't have to be identical. In fact, it cannot be identical. Again, please don't mix internal implementation and UI; they cannot possibly be identical anyway, because there are conflicting user requirements in different situations. > My suggested internal representation seems to be a natural way to get > this correspondence right, at the cost of some memory (or lots of > complexity in reducing memory usage). It only seems to be that. Real life is much more messy, and defeats such simplicity on many levels. > Now I've also reproduced it on the same machine, without my normal Gnus > setup getting in the way. I start emacs with > > $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el > > where bug.el contains > > (setq gnus-init-file nil) > (setq gnus-nntp-server nil) > (gnus-no-server) > > Then create the group with G d, pointing out the spool-like directory, > enter the group (RET), view the message (RET), try to write out the > attachment ("o" on the attachment button). Still crashes for me. Thanks.
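Eli's point (Emacs cannot let the user edit components it never knew were separate) and Handa-san's pointer to ucs-normalize and utf-8-hfs both come down to canonical composition (NFC). Under NFC, a decomposed sequence such as "o" + U+0308 is replaced by the single code point U+00F6, and after that the mark is no longer a separate unit. The C sketch below only illustrates that step, under strong simplifications. The composition table is a small hand-picked excerpt, only one mark per base is handled, there is no canonical reordering, and the function names are hypothetical. It is not how ucs-normalize is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A tiny excerpt of Unicode canonical composition pairs; a real
   normalizer derives these from the full Unicode character database. */
struct pair { uint32_t base, mark, composed; };
static const struct pair compositions[] = {
    { 0x0041, 0x030A, 0x00C5 },  /* A + ring above -> A with ring above */
    { 0x0061, 0x030A, 0x00E5 },  /* a + ring above -> a with ring above */
    { 0x0041, 0x0308, 0x00C4 },  /* A + diaeresis  -> A with diaeresis  */
    { 0x0061, 0x0308, 0x00E4 },  /* a + diaeresis  -> a with diaeresis  */
    { 0x004F, 0x0308, 0x00D6 },  /* O + diaeresis  -> O with diaeresis  */
    { 0x006F, 0x0308, 0x00F6 },  /* o + diaeresis  -> o with diaeresis  */
};

/* Return the precomposed code point for base+mark, or 0 if the
   excerpt above has no entry for the pair. */
static uint32_t compose_pair(uint32_t base, uint32_t mark)
{
    for (size_t i = 0; i < sizeof compositions / sizeof compositions[0]; i++)
        if (compositions[i].base == base && compositions[i].mark == mark)
            return compositions[i].composed;
    return 0;
}

/* Compose adjacent base+mark pairs in place and return the new length.
   Simplified: one mark per base, no canonical reordering. */
static size_t compose_simple(uint32_t *text, size_t n)
{
    assert(text != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0) {
            uint32_t c = compose_pair(text[out - 1], text[i]);
            if (c != 0) {
                text[out - 1] = c;   /* the mark is absorbed here */
                continue;
            }
        }
        text[out++] = text[i];
    }
    return out;
}
```

Run on the decomposed spelling of "Möller" (M, o, U+0308, l, l, e, r), the sketch returns six code points with U+00F6 in second position. Nothing in the result records that the "ö" was once two code points.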
From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 10:04:13 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 15:04:13 +0000 Received: from localhost ([127.0.0.1]:50405 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPc4-0000Fb-Ow for submit@debbugs.gnu.org; Fri, 29 Nov 2013 10:04:13 -0500 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:43352) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPc3-0000FP-3B for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 10:04:11 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av8EABK/CFHO+KEh/2dsb2JhbABEhke4Rxdzgh4BAQQBIzMjBQsLGgIYDgICFBgNiEIGrl+SToEjjlSBEwOIYZwZgV6DFQ X-IPAS-Result: Av8EABK/CFHO+KEh/2dsb2JhbABEhke4Rxdzgh4BAQQBIzMjBQsLGgIYDgICFBgNiEIGrl+SToEjjlSBEwOIYZwZgV6DFQ X-IronPort-AV: E=Sophos;i="4.84,565,1355115600"; d="scan'208";a="40685477" Received: from 206-248-161-33.dsl.teksavvy.com (HELO pastel.home) ([206.248.161.33]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 29 Nov 2013 10:04:05 -0500 Received: by pastel.home (Postfix, from userid 20848) id 31B3760EFA; Fri, 29 Nov 2013 10:04:05 -0500 (EST) From: Stefan Monnier To: nisse@lysator.liu.se (Niels =?windows-1252?Q?M=F6ller?=) Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename Message-ID: References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> Date: Fri, 29 Nov 2013 10:04:04 -0500 In-Reply-To: ("Niels =?windows-1252?Q?M=F6ller=22's?= message of "Fri, 29 Nov 2013 11:43:45 +0100") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org, Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/) > What I think is the right thing, is to allow a sequence of unicode > values, e.g., "A" + combining character, or "A" + any random sequence of > combining characters, intern this string, and treat this as a single > "character". For the Lisp-level notion of "character", I think this would require too many deep changes. > The idea is that this character object should correspond to what the > user thinks of as a single character. E.g, one glyph per character, and > treated as a unit by forward-char, and regexp matching with "." and > character sets. For forward-char, we do try to fake that behavior (e.g. a `forward-char' command will skip over the whole A+ring combo) but not faithfully (e.g. `C-u 2 forward-char' will also just skip that combo, and not the subsequent char). It's not perfect, but it seems "close enough" that it hasn't proved problematic. Adjusting . in regexps would indeed help solve some unexpected behaviors. We would probably want to keep the ability to match a single "code point", so we'd need to introduce a new regexp operator. Maybe we could follow the lead of the POSIX collation thingy, IIRC, where [=CF=90] in case-folding mode wants to be able to match SS in a German locale. So maybe [[:any:]] could match A+ring. > E.g, there could be a mode which makes each and every unicode value a > single character, which will then be displayed as separate glyphs, > separate characters for regexp matching, etc. I think we wouldn't want to use different modes (too coarse) but different commands instead. In any case, a first step would be to find a name for that notion of "multi character character". "Grapheme cluster" doesn't sound too good if we want to expose the concept to the end user. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 10:27:26 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 15:27:26 +0000 Received: from localhost ([127.0.0.1]:50414 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPyX-0000oK-8f for submit@debbugs.gnu.org; Fri, 29 Nov 2013 10:27:25 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:53227) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmPyT-0000o3-K0 for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 10:27:22 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0MX100L006QS0V00@a-mtaout22.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 17:27:14 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX100KTZ6XER780@a-mtaout22.012.net.il>; Fri, 29 Nov 2013 17:27:14 +0200 (IST) Date: Fri, 29 Nov 2013 17:27:04 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: Stefan Monnier Message-id: <83d2lj6xfb.fsf@gnu.org> References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: Stefan Monnier > Cc: Eli Zaretskii , 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 10:04:04 -0500 > > In any case, a first step would be to find a name for that notion of "multi > character character". "Grapheme cluster" doesn't sound too good if we > want to expose the concept to the end user. 
Why should we invent terminology where one already exists and is widely accepted and used? It sounds like a waste of energy. Explain the term well enough, and users will have no difficulty. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 29 11:18:23 2013 Received: (at 15984) by debbugs.gnu.org; 29 Nov 2013 16:18:23 +0000 Received: from localhost ([127.0.0.1]:50442 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmQlq-00025B-QG for submit@debbugs.gnu.org; Fri, 29 Nov 2013 11:18:23 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:37724) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmQln-00024q-R3 for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 11:18:21 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MX10070098ONI00@a-mtaout20.012.net.il> for 15984@debbugs.gnu.org; Fri, 29 Nov 2013 18:18:09 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MX1007TR9A9N100@a-mtaout20.012.net.il>; Fri, 29 Nov 2013 18:18:09 +0200 (IST) Date: Fri, 29 Nov 2013 18:18:00 +0200 From: Eli Zaretskii Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename In-reply-to: X-012-Sender: halo1@inter.net.il To: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) Message-id: <83a9gn6v2f.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: 8BIT References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> <83r49z78jp.fsf@gnu.org> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) >
From: nisse@lysator.liu.se (Niels Möller) > Cc: 15984@debbugs.gnu.org > Date: Fri, 29 Nov 2013 13:41:01 +0100 > > $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el > > where bug.el contains > > (setq gnus-init-file nil) > (setq gnus-nntp-server nil) > (gnus-no-server) > > Then create the group with G d, pointing out the spool-like directory, > enter the group (RET), view the message (RET), try to write out the > attachment ("o" on the attachment button). Still crashes for me. It crashes in the current development trunk as well, but only if the locale is set to Latin-1, like yours. I'm looking at this. From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 30 03:53:57 2013 Received: (at 15984) by debbugs.gnu.org; 30 Nov 2013 08:53:57 +0000 Received: from localhost ([127.0.0.1]:51103 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmgJI-0004LG-DW for submit@debbugs.gnu.org; Sat, 30 Nov 2013 03:53:56 -0500 Received: from bacon.lysator.liu.se ([130.236.254.206]:48441) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VmgJE-0004L3-1Q for 15984@debbugs.gnu.org; Sat, 30 Nov 2013 03:53:53 -0500 Received: from bacon.lysator.liu.se (localhost [127.0.0.1]) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5) with ESMTP id rAU8roAg007776; Sat, 30 Nov 2013 09:53:50 +0100 (MET) Received: (from nisse@localhost) by bacon.lysator.liu.se (8.14.5+Sun/8.14.5/Submit) id rAU8rmek007775; Sat, 30 Nov 2013 09:53:48 +0100 (MET) X-Authentication-Warning: bacon.lysator.liu.se: nisse set sender to nisse@lysator.liu.se using -f From: nisse@lysator.liu.se (Niels =?iso-8859-1?Q?M=F6ller?=) To: Stefan Monnier Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename References: <83iovc8eaq.fsf@gnu.org> <83a9gn8yoz.fsf@gnu.org> <831u1z8twg.fsf@gnu.org> Date: Sat, 30 Nov 2013 09:53:48 +0100 In-Reply-To: (Stefan Monnier's message of "Fri, 29 Nov 2013 10:04:04 -0500") Message-ID: User-Agent: 
Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 15984 Cc: 15984@debbugs.gnu.org, Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.7 (/) Stefan Monnier writes: >> What I think is the right thing, is to allow a sequence of unicode >> values, e.g., "A" + combining character, or "A" + any random sequence of >> combining characters, intern this string, and treat this as a single >> "character". > > For the Lisp-level notion of "character", I think this would require too > many deep changes. I can understand that. I'm actually impressed by the move from MULE encodings to unicode, which to a user appeared to be very smooth. But I still think that type of "character" abstraction is the right thing for unicode text processing in general. > For forward-char, we do try to fake that behavior (e.g. a `forward-char' > command will skip over the whole A+ring combo) but not faithfully > (e.g. `C-u 2 forward-char' will also just skip that combo, and not the > subsequent char). It's not perfect, but it seems "close enough" that it > hasn't proved problematic. Didn't know, that's a bit weird. I just tried, as Eli suggested, editing text with "ä" represented as "a" plus a combining character. In emacs-23.4, pressing DEL after the "ä" deletes the dots only. I now understand why, but it's not what I had expected, and I think deleting the entire A + dots would be preferable. Plain C-x = on the "a" shows just "Char: a (97, #o141, #x61) point=443 of 455 (97%) column=1", but C-u C-x = also shows the combining char.
However, emacs-24.3 behaves differently: the 'a' and the '"' get displayed differently, and are not combined at all for display. The buffer shows 'a"', and according to C-u C-x = the '"' is a "COMBINING DIAERESIS". These tests were done in an X11 frame, so maybe they're just picking up different fonts? >> E.g, there could be a mode which makes each and every unicode value a >> single character, which will then be displayed as separate glyphs, >> separate characters for regexp matching, etc. > > I think we wouldn't want to use different modes (too coarse) but > different commands instead. I didn't mean an emacs major or minor mode. It would be more like a special coding system, applied when reading the text from file. > In any case, a first step would be to find a name for that notion of "multi > character character". "Grapheme cluster" doesn't sound too good if we > want to expose the concept to the end user. I think "character" is the right word; the main source of confusion is that unicode code points are often referred to as "characters". Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance.
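Niels's proposal can be sketched in C as follows. As stated in this thread, it means interning a base code point plus all following combining code points as one "character", with a cap of about 10 marks so that a file consisting only of combining characters does not become a single character. This is a hypothetical illustration of the idea, not Emacs code and not an existing API. The combining test is simplified to U+0300..U+036F, and the table size and function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MARKS    10               /* cap on combining marks, as suggested */
#define MAX_CLUSTER  (1 + MAX_MARKS)  /* base + marks */
#define MAX_INTERNED 256              /* arbitrary table size for this sketch */

/* Simplification (assumption): only U+0300..U+036F counts as combining. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* One interned "character": a base code point followed by its marks. */
struct cluster {
    size_t len;
    uint32_t cps[MAX_CLUSTER];
};

static struct cluster table[MAX_INTERNED];
static size_t table_size;

/* Return the id of this code point sequence, adding it if it is new. */
static int intern_cluster(const uint32_t *cps, size_t len)
{
    assert(len >= 1 && len <= MAX_CLUSTER);
    for (size_t i = 0; i < table_size; i++)
        if (table[i].len == len
            && memcmp(table[i].cps, cps, len * sizeof *cps) == 0)
            return (int)i;
    assert(table_size < MAX_INTERNED);
    table[table_size].len = len;
    memcpy(table[table_size].cps, cps, len * sizeof *cps);
    return (int)table_size++;
}

/* Convert n code points into interned ids, one id per cluster, and
   return the number of ids written (ids needs room for n entries).
   A mark beyond the MAX_MARKS cap starts a new cluster of its own. */
static size_t intern_text(const uint32_t *text, size_t n, int *ids)
{
    size_t count = 0, i = 0;
    while (i < n) {
        size_t start = i++;                       /* base code point */
        while (i < n && i - start < MAX_CLUSTER && is_combining_mark(text[i]))
            i++;                                  /* attached marks */
        ids[count++] = intern_cluster(text + start, i - start);
    }
    return count;
}
```

Both occurrences of "a" + U+0308 get the same id, so anything operating on ids (motion, a regexp ".") would see one unit per interned character. A base followed by eleven marks yields two ids, because the cap closes the first cluster after the tenth mark.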
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 30 08:20:33 2013
From: Eli Zaretskii
To: Kenichi Handa
Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 15:20:13 +0200
Message-ID: <83siue58mq.fsf@gnu.org>
In-Reply-To: <83a9gn6v2f.fsf@gnu.org>

> From: Eli Zaretskii
> Cc: 15984@debbugs.gnu.org
>
> > From: nisse@lysator.liu.se (Niels Möller)
> > Cc: 15984@debbugs.gnu.org
> > Date: Fri, 29 Nov 2013 13:41:01 +0100
> >
> > $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
> >
> > where bug.el contains
> >
> > (setq gnus-init-file nil)
> > (setq gnus-nntp-server nil)
> > (gnus-no-server)
> >
> > Then create the group with G d, pointing out the spool-like directory,
> > enter the group (RET), view the message (RET), try to write out the
> > attachment ("o" on the attachment button). Still crashes for me.
>
> It crashes in the current development trunk as well, but only if the
> locale is set to Latin-1, like yours.
>
> I'm looking at this.

There's something strange going on here; I'm CC'ing Handa-san, because
the problem is related to processing character compositions on a TTY.

The reason for the crash is simple: the following code from
indent.c:scan_for_column

      /* Check composition sequence.  */
      if (cmp_it.id >= 0
          || (scan == cmp_it.stop_pos
              && composition_reseat_it (&cmp_it, scan, scan_byte, end,
                                        w, NULL, Qnil)))
        composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
        {
          scan += cmp_it.nchars;
          scan_byte += cmp_it.nbytes;
          if (scan <= end)
            col += cmp_it.width;
          if (cmp_it.to == cmp_it.nglyphs)
            {
              cmp_it.id = -1;
              composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
                                            Qnil);
            }
          else
            cmp_it.from = cmp_it.to;
          continue;
        }

incorrectly steps into the middle of a multibyte sequence #xCC #x88
for the character u+0308, the Combining Diaeresis, because
cmp_it.nbytes is computed as 1 instead of 2. The question is why it
does so.

>From stepping through composition_reseat_it and composition_update_it,
it looks like the code contradicts itself: it thinks that 'a' and the
combining diaeresis should be composed, but then acts as if no
composition should happen.
As a result, this code in composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
        {
          c = XINT (LGSTRING_CHAR (gstring, i));
          cmp_it->nbytes += CHAR_BYTES (c);
          cmp_it->width += CHAR_WIDTH (c);
        }

always considers only 'a', never the diaeresis, and so cmp_it->nbytes
is always computed as 1. So scan_for_column advances only 1 byte,
instead of 2, and finds itself in the middle of a multibyte sequence.
>From there, it's a sure way to a crash.

I hope Handa-san will be able to find the problem. The crash is 100%
reproducible with the steps described above and a mail message that
Niels can send you off-list.

TIA

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 30 09:25:18 2013
From: Kenichi Handa
To: Eli Zaretskii
Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 23:25:06 +0900
Message-ID: <87siue6k71.fsf@gnu.org>
In-Reply-To: <83siue58mq.fsf@gnu.org>

In article <83siue58mq.fsf@gnu.org>, Eli Zaretskii writes:

> There's something strange going on here; I'm CC'ing Handa-san, because
> the problem is related to processing character compositions on a TTY.
[...]
> I hope Handa-san will be able to find the problem. The crash is 100%
> reproducible with the steps described above and a mail message that
> Niels can send you off-list.

Thank you for tracking down the bug. I'll investigate the cause of
the problem.

---
Kenichi Handa
handa@m17n.org

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 30 10:50:41 2013
From: nisse@lysator.liu.se (Niels Möller)
To: Eli Zaretskii
Cc: 15984@debbugs.gnu.org, Kenichi Handa
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 16:50:31 +0100
In-Reply-To: <83siue58mq.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (usg-unix-v)

Eli Zaretskii writes:

> I hope Handa-san will be able to find the problem. The crash is 100%
> reproducible with the steps described above and a mail message that
> Niels can send you off-list.

I ended up sending an anonymized example message to the list, see
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;filename=bounce.gz;att=1;bug=15984

Thanks for looking into this.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
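Eli's diagnosis rests on two facts. First, U+0308 occupies two bytes in UTF-8 (#xCC #x88). Second, the quoted loop in composition_update_it reads LGSTRING_CHAR (gstring, i) starting at index 0, so it sees 'a' rather than the diaeresis. The following C sketch is a toy model, not Emacs code. The array stands in for the glyph string, and utf8_len stands in for CHAR_BYTES. It also assumes, which the thread does not spell out, that the diaeresis forms its own cluster starting at glyph-string index 1. The hypothetical `offset_by_from` flag switches to indexing from `from + i`, which is the form Handa's patch later in this thread adopts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-8 length of code point CP; plays the role of CHAR_BYTES here. */
static int utf8_len(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Toy version of the nbytes loop: sum the byte lengths of the NCHARS
   characters of a cluster that starts at index FROM of CHARS.  With
   OFFSET_BY_FROM == 0 it indexes from 0, like the code Eli quoted;
   otherwise it indexes from FROM + i. */
static size_t cluster_nbytes(const uint32_t *chars, size_t len,
                             size_t from, size_t nchars, int offset_by_from)
{
    size_t nbytes = 0;
    assert(from + nchars <= len);
    /* Count down from NCHARS - 1 to 0, as the original loop does. */
    for (size_t i = nchars; i-- > 0;) {
        uint32_t c = chars[offset_by_from ? from + i : i];
        nbytes += (size_t) utf8_len(c);
    }
    return nbytes;
}

/* Nonzero if B is a UTF-8 continuation byte (10xxxxxx), i.e. a byte
   in the middle of a multibyte sequence. */
static int is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With chars = {'a', 0x308}, from = 1 and nchars = 1, the index-from-0 variant returns 1 and the offset variant returns 2. Now apply this to the buffer bytes 61 CC 88. A scan positioned at the diaeresis (byte offset 1) that advances by 1 lands on 0x88, a continuation byte. That is the "middle of a multibyte sequence" Eli describes. Advancing by 2 reaches the end of the text cleanly.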
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 30 11:09:46 2013
From: Eli Zaretskii
To: Kenichi Handa
Cc: 15984@debbugs.gnu.org, nisse@lysator.liu.se
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 18:09:30 +0200
Message-ID: <83ppph6fd1.fsf@gnu.org>
In-Reply-To: <87siue6k71.fsf@gnu.org>

> From: Kenichi Handa
> Cc: nisse@lysator.liu.se, 15984@debbugs.gnu.org
> Date: Sat, 30 Nov 2013 23:25:06 +0900
>
> > I hope Handa-san will be able to find the problem. The crash is 100%
> > reproducible with the steps described above and a mail message that
> > Niels can send you off-list.
>
> Thank you for tracking down the bug. I'll investigate
> the cause of the problem.

Thanks. To save you some time, the problem only happens in a Latin-1
locale, so I used this command to invoke Emacs:

  HOME=$HOME/tmp LC_CTYPE=sv_SE.ISO8859-1 src/emacs -Q -l bug.el

From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 17 08:30:23 2014
From: handa@gnu.org (K. Handa)
To: handa@gnu.org (K. Handa)
Cc: 15984@debbugs.gnu.org, eliz@gnu.org, nisse@lysator.liu.se
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 17 Jan 2014 22:30:05 +0900
Message-ID: <874n52pwgy.fsf@gnu.org>
In-Reply-To: <87eh574qmm.fsf@gnu.org>

In article <87eh574qmm.fsf@gnu.org>, handa@gnu.org (K. Handa) writes:

> I'll keep trying to find why the trunk doesn't crash with
> your recipe, and once I find the whole story, I'll install a
> proper patch (which may be the same as what I sent) to the
> trunk.

I couldn't reproduce that bug with the trunk code. I rewound back to 2013-03-11, the day 24.3 was released, and I can reproduce the bug with 24.3. So, I am now very puzzled.

Anyway, I installed that fix to the trunk because the previous code was apparently wrong.

---
Kenichi Handa
handa@gnu.org

PS. I've just noticed that recent mails exchanged on this matter were not CC:ed to 15984@debbugs.gnu.org. So, to provide the context, I attach some key mails here.

-1--------------------------------------------------------------------
From: nisse@lysator.liu.se (Niels Möller)
To: handa@gnu.org (K. Handa)
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

handa@gnu.org (K. Handa) writes:

> In article <83siue58mq.fsf@gnu.org>, Eli Zaretskii writes:
>
>> I hope Handa-san will be able to find the problem. The crash is 100%
>> reproducible with the steps described above and a mail message that
>> Niels can send you off-list.
>
> Could you please send me that mail message? I'll delete it
> as soon as I can find a fix.

I believe the smaller bounce message I posted in the bug tracker exhibits the problem. That's the same file Eli was using when reproducing the problem.

Described at
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;bug=15984

Actual message (gzipped):
http://debbugs.gnu.org/cgi/bugreport.cgi?msg=14;filename=bounce.gz;att=1;bug=15984

Steps to reproduce the problem (this info is spread out in the bug thread):

1. Create a new directory, say mail-tmp. Copy the message (uncompressed) into that directory, with filename "1".

2. Start emacs in tty mode, with a latin-1 locale, like

     HOME=$HOME/tmp LC_CTYPE=sv_SE.ISO8859-1 src/emacs -Q -l bug.el

   with bug.el containing

     (setq gnus-init-file nil)
     (setq gnus-nntp-server nil)
     (gnus-no-server)

3. Then, in Gnus' *Group* buffer, create the group with G d, pointing out the mail-tmp directory, enter the group (RET), view the message (RET), try to write out the attachment ("o" on the attachment button). Still crashes for me.

Let me know if you need any further info.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

-2--------------------------------------------------------------------
From: Eli Zaretskii
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
To: handa@gnu.org (K. Handa)
Cc: nisse@lysator.liu.se, handa@gnu.org

> From: handa@gnu.org (K. Handa)
> Cc: eliz@gnu.org, handa@gnu.org
> Date: Fri, 13 Dec 2013 23:15:00 +0900
>
> In article , nisse@lysator.liu.se (Niels Möller) writes:
>
> > And tty mode, no X frame (I used an xterm, started in a latin-1 locale).
>
> Yes. I surely added the "-nw" argument, and I tried the recipe
> with xterm and lxterminal.

I cannot reproduce this either, with today's trunk. Perhaps you could try with the trunk as it was on Nov 30, or with Emacs 24.3?

> By the way, I noticed that buffer-file-coding-system of
> Gnus's message buffer (the buffer showing that bounce mail)
> is raw-text-unix. Is it the same with you?

Yes. This might be part of the problem, or it could be the trigger for the crash.

-3--------------------------------------------------------------------
From: handa@gnu.org (K. Handa)
To: nisse@lysator.liu.se (Niels Möller)
Cc: eliz@gnu.org, handa@gnu.org
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

In article , nisse@lysator.liu.se (Niels Möller) writes:

> And tty mode, no X frame (I used an xterm, started in a latin-1 locale).

Yes. I surely added the "-nw" argument, and I tried the recipe with xterm and lxterminal.

By the way, I noticed that buffer-file-coding-system of Gnus's message buffer (the buffer showing that bounce mail) is raw-text-unix. Is it the same with you?

---
Kenichi Handa
handa@gnu.org

-4--------------------------------------------------------------------
From: nisse@lysator.liu.se (Niels Möller)
To: handa@gnu.org (K. Handa)
Cc: eliz@gnu.org
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

handa@gnu.org (K. Handa) writes:

> By the way, I noticed that buffer-file-coding-system of
> Gnus's message buffer (the buffer showing that bounce mail)
> is raw-text-unix. Is it the same with you?

Yes. It probably wasn't in the original mail (if you like, I can look into that further, but I don't want to crash the emacs I'm writing this in right now).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

-5--------------------------------------------------------------------
From: handa@gnu.org (K. Handa)
To: Eli Zaretskii
Cc: nisse@lysator.liu.se, handa@gnu.org
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

In article <838uvo6cjx.fsf@gnu.org>, Eli Zaretskii writes:

> I cannot reproduce this either, with today's trunk. Perhaps you
> could try with the trunk as it was on Nov 30, or with Emacs 24.3?

With Emacs 24.3, I could reproduce the bug, and the patch attached at the tail seems to fix it. Could you please try it? It is applicable to the latest code too.
But, with the trunk, I have not yet succeeded in reproducing the bug. I tried from the revision of Nov 30 and went back to April, one month at a time.

> > By the way, I noticed that buffer-file-coding-system of
> > Gnus's message buffer (the buffer showing that bounce mail)
> > is raw-text-unix. Is it the same with you?

> Yes. This might be part of the problem, or it could be the trigger
> for the crash.

With Emacs 24.3, the bug can be reproduced with a multibyte buffer.

---
Kenichi Handa
handa@gnu.org

=== modified file 'src/composite.c'
--- src/composite.c	2013-01-01 09:11:05 +0000
+++ src/composite.c	2013-12-19 13:49:53 +0000
@@ -1426,7 +1426,7 @@
 	cmp_it->width = 0;
 	for (i = cmp_it->nchars - 1; i >= 0; i--)
 	  {
-	    c = XINT (LGSTRING_CHAR (gstring, i));
+	    c = XINT (LGSTRING_CHAR (gstring, cmp_it->from + i));
 	    cmp_it->nbytes += CHAR_BYTES (c);
 	    cmp_it->width += CHAR_WIDTH (c);
 	  }

-6--------------------------------------------------------------------
From: nisse@lysator.liu.se (Niels Möller)
To: handa@gnu.org (K. Handa)
Cc: Eli Zaretskii
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

handa@gnu.org (K. Handa) writes:

> With Emacs 24.3, I could reproduce the bug and the patch
> attached at the tail seems to fix it. Could you please try
> it? It is applicable to the latest code too.

I compiled 24.3.1 with the patch applied. It no longer crashes. Great!

The behavior is that on saving the attachment, the default filename is displayed as "Brev aktiea?gar 131127.pdf", where the question mark really is a COMBINING DIAERESIS (according to C-u C-x =). When I press enter, the file is saved under the file name "Brev aktiea gar 131127.pdf", with the combining diaeresis replaced by a SPC character (checked with GNU ls -N | od -tx1c).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
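Niels's observation that the combining diaeresis became SPC is consistent with Handa's guess in the next message. Latin-1 covers only U+0000..U+00FF, so U+0308 has no byte to map to, while a precomposed ä (U+00E4) would have encoded as the single byte 0xE4. The C sketch below models that guess under stated assumptions. It is not Gnus's or Emacs's encoder, `encode_latin1` is a hypothetical name, and using SPC as the replacement byte simply mirrors what Niels observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode N code points into ISO-8859-1, one byte each.  Latin-1 maps
   U+0000..U+00FF directly to bytes 0x00..0xFF; anything above has no
   representation and is replaced by REPLACEMENT.  Returns how many code
   points were replaced.  OUT must have room for N bytes. */
static size_t encode_latin1(const uint32_t *cps, size_t n,
                            unsigned char *out, unsigned char replacement)
{
    size_t replaced = 0;
    assert(n == 0 || (cps != NULL && out != NULL));
    for (size_t i = 0; i < n; i++) {
        if (cps[i] <= 0xFF) {
            out[i] = (unsigned char) cps[i];
        } else {
            out[i] = replacement;  /* information is lost here */
            replaced++;
        }
    }
    return replaced;
}
```

Encoding "aktieägar" with a decomposed ä yields "aktiea gar" and reports one replacement. The precomposed spelling encodes losslessly. The warning or prompt Handa suggests would amount to checking for a nonzero return before writing the file.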
-7--------------------------------------------------------------------
From: handa@gnu.org (K. Handa)
To: nisse@lysator.liu.se (Niels Möller)
Cc: eliz@gnu.org, handa@gnu.org
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename

In article , nisse@lysator.liu.se (Niels Möller) writes:

> handa@gnu.org (K. Handa) writes:
> > With Emacs 24.3, I could reproduce the bug and the patch
> > attached at the tail seems to fix it. Could you please try
> > it? It is applicable to the latest code too.

> I compiled 24.3.1 with the patch applied. It no longer crashes. Great!

Thank you for testing that.

> Behavior is that on saving the attachment, the default filename is
> displayed as "Brev aktiea?gar 131127.pdf", where the question mark
> really is a COMBINING DIAERESIS (according to C-u C-x =). When I press
> enter, the file is saved under the file name "Brev aktiea gar
> 131127.pdf", with the combining diaeresis replaced by a SPC character
> (checked with GNU ls -N | od -tx1c).

This is just my guess, but as long as you are in an ISO-8859-1 locale, there's no way to encode that combining diaeresis, so Gnus uses SPC as a replacement character. Perhaps Gnus should warn you about that and ask you how to encode the file name. Anyway, that is a completely different matter from bug#15984.

I'll keep trying to find why the trunk doesn't crash with your recipe, and once I find the whole story, I'll install a proper patch (which may be the same as what I sent) to the trunk.
---
Kenichi Handa
handa@gnu.org

From debbugs-submit-bounces@debbugs.gnu.org Fri Feb 07 21:21:43 2014
From: Glenn Morris
Subject: control message for bug 15984
Date: Fri, 07 Feb 2014 21:21:41 -0500
X-Mailer: mail (GNU Mailutils 2.1)

close 15984 24.4

From unknown Mon Jun 16 23:47:01 2025
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 08 Mar 2014 12:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator