#14461 - 24.3.50; bad display for 'space' + (U+0336) unicode combination

GNU bug report logs - #14461
24.3.50; bad display for 'space' + (U+0336) unicode combination

Package: emacs;

Reported by: Cédric Chépied <cedric.chepied <at> gmail.com>

Date: Fri, 24 May 2013 15:48:01 UTC

Severity: normal

Found in version 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Berman <stephen.berman <at> gmx.net>, Kenichi Handa <handa <at> gnu.org>
Cc: 14461 <at> debbugs.gnu.org, larsi <at> gnus.org, cedric.chepied <at> gmail.com
Subject: bug#14461: 24.3.50; bad display for 'space' + (U+0336) unicode combination
Date: Sat, 17 Aug 2019 15:00:18 +0300

> From: Stephen Berman <stephen.berman <at> gmx.net>
> Date: Thu, 15 Aug 2019 14:29:08 +0200
> Cc: 14461 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>
>
> On Thu, 15 Aug 2019 12:02:21 +0200 Cédric Chépied <cedric.chepied <at> gmail.com> wrote:
>
> ...
> I assume combining characters are always displayed after a space
> instead of over it -- at least that's what I see with e.g. U+0301
> (COMBINING ACUTE ACCENT) and U+0302 (COMBINING CIRCUMFLEX ACCENT).

Indeed, we reject base characters of certain general categories, including those whose general category is Zs (space separator). In composite.el:compose-gstring-for-graphic we have:

    ;; This sequence doesn't start with a proper base character.
    ((memq (get-char-code-property (lgstring-char gstring 0)
                                   'general-category)
           '(Mn Mc Me Zs Zl Zp Cc Cf Cs))
     nil)

> That makes sense to me (otherwise, you couldn't visually distinguish
> e.g. the sequence 'aU+0301U+0302' from the sequence 'aU+0301 U+0302')

I don't see why: the former should be displayed as a single grapheme cluster, with both diacritics on top of a, whereas the latter should be displayed as 2 grapheme clusters, with U+0302 on top of the SPC character instead of on top of a.

> and I would guess some Unicode standard prescribes it.

Actually, the Unicode Standard prescribes the opposite. It says (section 3.6):

  D50 Graphic character: A character with the General Category of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
  ...
  D51 Base character: Any graphic character except for those with the General Category of Combining Mark (M).
    • Most Unicode characters are base characters. In terms of General Category values, a base character is any code point that has one of the following categories: Letter (L), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
  ...
  D52 Combining character: A character with the General Category of Combining Mark (M).

and (in section 2.11):

  All combining characters can be applied to any base character and can, in principle, be used with any script.

So I don't think we are right when we exclude space separators from the base characters eligible for character composition; I think it's a mistake. Perhaps Handa-san (CC'ed) could comment on why we do that.
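The disagreement above can be checked mechanically. Below is a minimal sketch in Python, not Emacs Lisp; the language was chosen only for illustration. It uses the standard `unicodedata` module and makes three comparisons:

- It tests a character against the rejection list quoted from `compose-gstring-for-graphic`.
- It tests the same character against the D50/D51 definition of a base character.
- It groups text into clusters by attaching each combining mark (category M*) to the preceding character.

Some caveats apply. The function names are hypothetical. The clustering is a deliberate simplification of UAX #29 text segmentation, not Emacs's composition logic. Python's Unicode data version may also differ from the one Emacs was built with.

```python
import unicodedata

# Categories rejected as the first character of a composition sequence,
# copied from the compose-gstring-for-graphic excerpt quoted above.
EMACS_REJECTED = {"Mn", "Mc", "Me", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs"}


def emacs_accepts_as_base(ch: str) -> bool:
    """Hypothetical mirror of the memq test: True if Emacs would try to compose on ch."""
    return unicodedata.category(ch) not in EMACS_REJECTED


def unicode_is_base(ch: str) -> bool:
    """Base character per D51: category L, N, P, S, or Zs (graphic, but not a mark)."""
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"


def naive_clusters(text: str) -> list[str]:
    """Attach each combining mark (M*) to the preceding character.

    A simplification for illustration only, not the full UAX #29 algorithm.
    """
    clusters: list[str] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and clusters:
            clusters[-1] += ch  # mark joins the cluster of the previous character
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


if __name__ == "__main__":
    for ch in ["a", " ", "\u00a0"]:
        print(repr(ch), unicodedata.category(ch),
              emacs_accepts_as_base(ch), unicode_is_base(ch))
    print(naive_clusters("a\u0301\u0302"))   # one cluster
    print(naive_clusters("a\u0301 \u0302"))  # two clusters
    print(naive_clusters(" \u0336"))         # the sequence from this bug report
```

Running the sketch shows how the two definitions differ. SPC and NO-BREAK SPACE (both category Zs) fail the Emacs test but qualify as base characters under D51. `naive_clusters` yields a single cluster for 'aU+0301U+0302' and two clusters for 'aU+0301 U+0302'. That matches the display described above. The reported sequence, SPC followed by U+0336, forms one cluster.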

This bug report was last modified 5 years and 315 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
