#15984 - 24.3; Problem with combining characters in attachment filename

GNU bug report logs - #15984
24.3; Problem with combining characters in attachment filename

Package: emacs;

Reported by: nisse <at> lysator.liu.se (Niels Möller)

Date: Thu, 28 Nov 2013 08:33:01 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.4

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.


From: nisse <at> lysator.liu.se (Niels Möller)
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 15984 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 09:53:48 +0100

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> What I think is the right thing, is to allow a sequence of unicode
>> values, e.g., "A" + combining character, or "A" + any random sequence of
>> combining characters, intern this string, and treat this as a single
>> "character".
>
> For the Lisp-level notion of "character", I think this would require too
> many deep changes.

I can understand that. I'm actually impressed by the move from MULE
encodings to unicode, which to a user appeared to be very smooth. But I
still think that type of "character" abstraction is the right thing for
unicode text processing in general.

> For forward-char, we do try to fake that behavior (e.g. a `forward-char'
> command will skip over the whole A+ring combo) but not faithfully
> (e.g. `C-u 2 forward-char' will also just skip that combo, and not the
> subsequent char). It's not perfect, but it seems "close enough" that it
> hasn't proved problematic.

Didn't know, that's a bit weird.

I just tried, as Eli suggested, editing text with "ä" represented as an
"a" followed by a combining character. In emacs-23.4, pressing DEL after
the "ä" deletes the dots only. I now understand why, but it's not what I
had expected, and I think deleting the entire A + dots would be
preferable. Plain C-x = on the "a" shows just "Char: a (97, #o141, #x61)
point=443 of 455 (97%) column=1", but C-u C-x = also shows the combining
char.

However, emacs-24.3 behaves differently: the 'a' and the '"' get
displayed differently, and are not combined at all for display. The
buffer shows 'a"', and according to C-u C-x = the '"' is a "COMBINING
DIAERESIS". These tests were done in an X11 frame, so maybe they're just
picking up different fonts?

>> E.g, there could be a mode which makes each and every unicode value a
>> single character, which will then be displayed as separate glyphs,
>> separate characters for regexp matching, etc.
>
> I think we wouldn't want to use different modes (too coarse) but
> different commands instead.

I didn't mean an emacs major or minor mode. It would be more like a
special coding system, applied when reading the text from file.

> In any case, a first step would be to find a name for that notion of "multi
> character character". "Grapheme cluster" doesn't sound too good if we
> want to expose the concept to the end user.

I think "character" is the right word; the main source of confusion is
that unicode code points are often referred to as "characters".

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
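
For readers following the thread, the code point versus "character" distinction discussed above can be illustrated outside Emacs. The following is only a rough sketch in Python, not how Emacs implements anything: the helper names `clusters`, `delete_last_code_point` and `delete_last_cluster` are made up for this illustration, and the grouping rule (attach every code point with a non-zero combining class to the preceding base) is a simplification of real grapheme cluster segmentation.

```python
import unicodedata


def clusters(text):
    """Group each base code point with the combining marks that follow it.

    Simplified assumption: a code point belongs to the previous cluster
    whenever unicodedata.combining() is non-zero. Full grapheme cluster
    rules (Unicode UAX #29) cover more cases than this.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def delete_last_code_point(text):
    """Remove one code point, like the DEL behaviour described for emacs-23.4."""
    return text[:-1]


def delete_last_cluster(text):
    """Remove the whole base + combining sequence, the behaviour Niels prefers."""
    return "".join(clusters(text)[:-1])


decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E4

print(len(decomposed), len(composed))            # code point counts differ
print(unicodedata.name(decomposed[1]))           # name of the combining mark
print(clusters("x" + decomposed + "y"))          # three user-visible "characters"
print(repr(delete_last_code_point(decomposed)))  # only the dots are removed
print(repr(delete_last_cluster(decomposed)))     # base and dots removed together
```

In this sketch, deleting by code point leaves the bare "a" behind, while deleting by cluster removes the letter together with its diaeresis, which is the difference between the two behaviours debated in the message.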

This bug report was last modified 11 years and 103 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
