GNU bug report logs - #15984
24.3; Problem with combining characters in attachment filename


Package: emacs;

Reported by: nisse <at> lysator.liu.se (Niels Möller)

Date: Thu, 28 Nov 2013 08:33:01 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.4

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.




From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: nisse <at> lysator.liu.se (Niels Möller)
Cc: 15984 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Fri, 29 Nov 2013 10:04:04 -0500
> What I think is the right thing is to allow a sequence of Unicode
> values, e.g., "A" + combining character, or "A" + any random sequence of
> combining characters, intern this string, and treat this as a single
> "character".

For the Lisp-level notion of "character", I think this would require too
many deep changes.

> The idea is that this character object should correspond to what the
> user thinks of as a single character. E.g., one glyph per character, and
> treated as a unit by forward-char, and regexp matching with "." and
> character sets.
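The following minimal Python sketch makes the distinction concrete. It groups a base code point with any following code points that have a nonzero canonical combining class. The helper name `clusters` is hypothetical, and this is not how Emacs works internally. The rule only approximates the grapheme clusters defined in Unicode UAX #29: spacing marks, Hangul jamo, emoji sequences and similar cases are ignored.

```python
import unicodedata

def clusters(text):
    """Split TEXT into base characters plus their trailing combining marks.

    Hypothetical helper: a simplified approximation of grapheme clusters
    that only attaches code points with a nonzero canonical combining
    class to the preceding character (the full UAX #29 rules are ignored).
    """
    result = []
    for ch in text:
        if result and unicodedata.combining(ch):
            result[-1] += ch  # attach the mark to the previous cluster
        else:
            result.append(ch)  # start a new cluster
    return result

s = "A\u030aB"  # "A" + COMBINING RING ABOVE + "B"
print(len(s))            # 3 code points
print(len(clusters(s)))  # 2 user-perceived characters
```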

For forward-char, we do try to fake that behavior (e.g. a `forward-char'
command will skip over the whole A+ring combo) but not faithfully
(e.g. `C-u 2 forward-char' will also just skip that combo, and not the
subsequent char).  It's not perfect, but it seems "close enough" that it
hasn't proved problematic.
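The count-respecting motion described above can be sketched in Python. Under it, `C-u 2 forward-char' would skip the A+ring combo and then the next character as well. The sketch uses a hypothetical helper, `forward_clusters`, that works on code point indices. It treats a cluster as a base code point plus any following code points with a nonzero combining class, which simplifies UAX #29. It illustrates the intended semantics only and is not Emacs's implementation.

```python
import unicodedata

def forward_clusters(text, pos, n):
    """Return the index reached by moving N clusters forward from POS.

    Hypothetical helper: a cluster is a base code point followed by any
    code points with a nonzero combining class (simplified, not UAX #29).
    """
    for _ in range(n):
        if pos >= len(text):
            break  # already at the end of the text
        pos += 1  # step over the base character
        while pos < len(text) and unicodedata.combining(text[pos]):
            pos += 1  # step over attached combining marks
    return pos

s = "A\u030aBC"  # "A" + COMBINING RING ABOVE + "B" + "C"
print(forward_clusters(s, 0, 1))  # 2: past the A+ring combo
print(forward_clusters(s, 0, 2))  # 3: past "B" as well
```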

Adjusting . in regexps would indeed help solve some
unexpected behaviors.  We would probably want to keep the ability to match
a single "code point", so we'd need to introduce a new regexp operator.

Maybe we could follow the lead of the POSIX collation thingy, IIRC,
where [ß] in case-folding mode wants to be able to match SS in
a German locale.  So maybe [[:any:]] could match A+ring.
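Here is a rough Python illustration of the regexp point. In Python's `re`, "." matches a single code point, so matching A+ring as a unit needs a separate pattern. `ANY_CLUSTER` below is a hypothetical stand-in for the proposed [[:any:]] class, and it only recognizes the Combining Diacritical Marks block (U+0300 to U+036F). The last lines show the ß/SS case: full case folding (`str.casefold`) maps one character to two, which `re`'s per-character IGNORECASE matching does not do.

```python
import re

# "." matches exactly one code point, so "A" + COMBINING RING ABOVE
# yields two separate matches.
CODE_POINT = re.compile(".")

# Hypothetical stand-in for an "any user-perceived character" class:
# one code point outside the Combining Diacritical Marks block
# (U+0300-U+036F), followed by any number of marks from that block.
# Other combining blocks and the full UAX #29 rules are not handled.
ANY_CLUSTER = re.compile("[^\u0300-\u036f][\u0300-\u036f]*")

s = "A\u030aB"
print(len(CODE_POINT.findall(s)))   # 3
print(len(ANY_CLUSTER.findall(s)))  # 2

# One-to-many case folding: full folding maps "ß" to "ss", but re's
# per-character IGNORECASE cannot match one pattern char against two.
print("ß".casefold() == "SS".casefold())               # True
print(re.fullmatch("ß", "SS", re.IGNORECASE) is None)  # True
```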

> E.g., there could be a mode which makes each and every Unicode value a
> single character, which will then be displayed as separate glyphs,
> separate characters for regexp matching, etc.

I think we wouldn't want to use different modes (too coarse) but
different commands instead.

In any case, a first step would be to find a name for that notion of "multi
character character".  "Grapheme cluster" doesn't sound too good if we
want to expose the concept to the end user.


        Stefan






