#11073 - 24.0.94; BIDI-related crash in redisplay with certain byte sequences

GNU bug report logs - #11073
24.0.94; BIDI-related crash in redisplay with certain byte sequences

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Fri, 23 Mar 2012 11:27:02 UTC

Severity: normal

Found in version 24.0.94

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.


From: Kenichi Handa <handa <at> m17n.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 11073 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Mon, 26 Mar 2012 16:45:56 +0900

In article <837gybupdf.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > Why do we need this unification?  Or rather, why do we need multiple
> > codepoints, which then forces us to unify them?

> That's something Handa-san (CC'ed) will be able to explain much better
> than I ever could.

It's a long story.

When I designed emacs-unicode (the version before it was merged into the trunk, more than 10 years ago), the unification maps of CJK charsets to Unicode were not stable. In addition, there were various conflicting policies on which character to unify with which character. One reason for this confusion was that Unicode itself didn't define mappings to/from such CJK charsets (JIS, GB, KSC).

The unification problem is not limited to Ideographic characters. Many CJK charsets contain, for instance, full-width versions of Greek characters, but Unicode doesn't distinguish them from single-width versions (though Unicode has full-width versions of 'A'..'Z', etc). There were people who wanted to distinguish full-width Greek chars from single-width chars.

There were also people who had iso-2022-7bit text files that distinguish characters of the GB charset from those of the JIS charset. To edit such a file and write it back as the original one, one has to disable unification of one of GB and JIS (or both of them).

So, I decided at that time to give each CJK charset a unique code space (above #x110000) in Emacs, and to allow users to freely unify/disunify them to the Unicode code space (below #x110000) by providing the function unify-charset.

FYI, http://www.unicode.org/reports/tr38/ describes some of the difficulties of these mappings.

> AFAIU, there are good reasons to have some CJK
> characters on separate codepoints, because they need to be treated
> differently from their Unicode codepoints (perhaps a different choice
> of font to display them?)

That was one reason, but the current code pays attention to the `charset' text property of each character to select a proper font.

---
Kenichi Handa
handa <at> m17n.org
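As an illustration of the design described above, here is a minimal Python sketch of per-charset private code space with optional unification. It is a toy model, not Emacs's actual implementation: the class `CharsetRegistry`, its methods, and the charsets `toy-jis` and `toy-gb` are all hypothetical names invented for this example. Only the boundary #x110000 and the idea of a user-controlled unify/disunify switch come from the message.

```python
UNICODE_LIMIT = 0x110000  # code points below this belong to Unicode


class CharsetRegistry:
    """Toy model: every charset gets its own range above UNICODE_LIMIT."""

    def __init__(self):
        self._next_base = UNICODE_LIMIT
        self._charsets = {}

    def define(self, name, size, to_unicode):
        # Reserve a private, non-overlapping range for this charset.
        self._charsets[name] = {
            "base": self._next_base,
            "size": size,
            "map": dict(to_unicode),  # charset code -> Unicode code point
            "unified": False,
        }
        self._next_base += size

    def unify(self, name, deunify=False):
        # Hypothetical analogue of a unify/disunify switch for one charset.
        self._charsets[name]["unified"] = not deunify

    def decode(self, name, code):
        cs = self._charsets[name]
        if not 0 <= code < cs["size"]:
            raise ValueError("code outside charset")
        if cs["unified"] and code in cs["map"]:
            return cs["map"][code]   # unified: use the Unicode code point
        return cs["base"] + code     # not unified: private code point


# Two invented charsets that both contain a Greek capital alpha (U+0391).
registry = CharsetRegistry()
registry.define("toy-jis", 0x100, {0x01: 0x0391})
registry.define("toy-gb", 0x100, {0x01: 0x0391})

registry.unify("toy-jis")
print(hex(registry.decode("toy-jis", 0x01)))  # Unicode code point
print(hex(registry.decode("toy-gb", 0x01)))   # still a private code point
```

With only one of the two charsets unified, the same logical character decodes to two different internal code points, which is what lets a file mixing both charsets be written back unchanged.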

This bug report was last modified 12 years and 94 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
