#15984 - 24.3; Problem with combining characters in attachment filename

GNU bug report logs - #15984
24.3; Problem with combining characters in attachment filename

Package: emacs;

Reported by: nisse <at> lysator.liu.se (Niels Möller)

Date: Thu, 28 Nov 2013 08:33:01 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.4

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

Message #32 received at 15984 <at> debbugs.gnu.org (full text, mbox):

From: nisse <at> lysator.liu.se (Niels Möller)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 15984 <at> debbugs.gnu.org
Subject: Re: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 13:41:01 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

> However, we do want to give the user a way to
> delete only one or more of the combining characters, so forcing the
> entire combination to be a single indivisible entity would not be TRT
> for users.

Good question, how to handle this. Today, to remove the dots from an
"ä" character, I have to delete the complete "ä" character and insert
a new "a" character, and similarly for the reverse edit. I think this
"atomic" handling is the desired behaviour in many cases. And I don't
think it should behave differently depending on the representation of
"ä" in the original file.

But if you have a complex sequence of Unicode combining characters, I
agree there's some need to be able to edit it. Maybe put point on the
character and invoke edit-char to enter some special mode which
explodes the usually "atomic" character into smaller pieces. Such a
character edit mode might be useful for more things than Unicode
combining characters, e.g., manipulating the different sub-parts of a
Chinese character.

Anyway, this user interface is not intimately tied to the internal
character representation; its overall effect on the buffer will be the
same as replacing any substring.

>> When reading text files, the character boundaries may be configurable.
>
> The important question is what to do by default,

I'm pretty sure the default should be that a sequence of one Unicode
base char and all following Unicode combining chars is interned as a
single "emacs character". (I think the detailed rules for this are
spelled out in the Unicode book.) With some arbitrary limit to prevent
a GByte file containing only Unicode combining characters from being
read as a single emacs character; say at most 10 combining characters.

> You are mixing display issues with editing issues and with how
> characters are represented internally in an Emacs buffer.

I think it's confusing for users if the units of text which
forward-char skips over do not correspond to the units matched by "."
in isearch-forward-regexp. My suggested internal representation seems
to be a natural way to get this correspondence right, at the cost of
some memory (or lots of complexity in reducing memory usage). I'm sure
there are other ways, and maybe also much better ways, to implement
the same thing.

> Thanks, I will try that.

Now I've also reproduced it on the same machine, without my normal
Gnus setup getting in the way. I start emacs with

  $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el

where bug.el contains

  (setq gnus-init-file nil)
  (setq gnus-nntp-server nil)
  (gnus-no-server)

Then create the group with G d, pointing out the spool-like directory,
enter the group (RET), view the message (RET), and try to write out the
attachment ("o" on the attachment button). Still crashes for me.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
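The default proposed in the message above (one base character plus all following combining characters, capped at 10) can be illustrated with a rough sketch. This is Python rather than Emacs code, and it is not how Emacs represents characters; the function name `clusters` and the constant `MAX_COMBINING` are made up for illustration. It also assumes that "combining character" means any code point whose Unicode general category starts with "M", which is a simplification of the full segmentation rules.

```python
import unicodedata

# Arbitrary cap taken from the message: "say at most 10 combining characters".
MAX_COMBINING = 10


def clusters(text, max_combining=MAX_COMBINING):
    """Split text into units of one base character plus following marks.

    Assumption: a combining character is any code point whose general
    category starts with "M" (Mn, Mc, Me). Real segmentation rules are
    more involved.
    """
    units = []
    count = 0  # number of marks already attached to the current unit
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if units and is_mark and count < max_combining:
            # Attach the mark to the unit in progress.
            units[-1] += ch
            count += 1
        else:
            # A base character, a leading mark, or a mark over the cap
            # starts a new unit.
            units.append(ch)
            count = 0
    return units


# Precomposed and decomposed "ä" both come out as a single unit.
precomposed = "\u00e4"
decomposed = unicodedata.normalize("NFD", precomposed)  # "a" + U+0308
print(len(clusters(precomposed)), len(clusters(decomposed)))
```

With this grouping, both representations of "ä" count as one unit, which matches the point that behaviour should not depend on the representation in the original file. The cap means a long run of combining marks is split into several units instead of growing without bound.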

This bug report was last modified 11 years and 158 days ago.

