GNU bug report logs - #11073
24.0.94; BIDI-related crash in redisplay with certain byte sequences

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Fri, 23 Mar 2012 11:27:02 UTC

Severity: normal

Found in version 24.0.94

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: eliz <at> gnu.org, 11073 <at> debbugs.gnu.org
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Tue, 03 Apr 2012 00:22:32 -0400
>> > Usually, yes.  But as long as there is a code space in the
>> > high area for a CJK charset, it is unavoidable to have a
>> > buffer/string that contains a character represented by a
>> > byte sequence in that high area, as in the test case of
>> > Bug#11073.  And, as "unification" means treating such a
>> > character the same way as the unified character, I thought
>> > they would both have the same character code.

>> Since there are two internal byte-sequence representations, I don't see
>> any good reason why we shouldn't have 2 internal int representations.
>> I.e. if unification failed for the byte-sequence (which might be the
>> result of a bug, for all I know), we may as well keep them non-unified
>> in the int representation.

> Please note that not all characters in the code-space of a
> CJK charset are unified.  For instance, Big5 has its own
> PUA (private use area), and characters in PUA are not
> unified by default.  So, if Emacs reads a Big5 file that
> contains PUA chars, those chars stay in the high area.  Then,
> one can provide one's own unification map that also maps PUA
> chars to some Unicode chars, like this:
>   (unify-charset 'big5 "MyBig5.map")
> After this, I thought that previously read PUA chars staying
> in the high-area should be treated as the corresponding
> Unicode chars (in displaying, search, etc).

But again, this unification takes place during decoding.  Whereas what
I'm talking about takes place when reading the internal utf-8
representation, which should be already unified.


        Stefan
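
The disagreement above can be illustrated with a toy model.  This is
only a sketch under stated assumptions, not Emacs's actual code: the
names decode, is_high_area and resolve_late, the HIGH_AREA_BASE offset,
and the sample Big5 PUA code and its Unicode target are all
hypothetical and chosen for illustration.  It contrasts unification
applied at decoding time with a late lookup applied to characters that
are already stored in the high area.

```python
# Toy model only: these names and constants are illustrative assumptions,
# not Emacs internals.
UNICODE_MAX = 0x10FFFF
HIGH_AREA_BASE = 0x110000  # hypothetical start of the non-unified "high area"


def decode(code, unify_map):
    """Decode-time unification: map to Unicode if the map covers the code,
    otherwise park the character in the high area."""
    if code in unify_map:
        return unify_map[code]
    return HIGH_AREA_BASE + code


def is_high_area(ch):
    """True if the internal character code lies above the Unicode range."""
    return ch > UNICODE_MAX


def resolve_late(ch, unify_map):
    """Late lookup: treat an already-stored high-area character as its
    unified counterpart if a map now covers it."""
    if is_high_area(ch):
        return unify_map.get(ch - HIGH_AREA_BASE, ch)
    return ch


pua_code = 0xFA40                  # sample Big5 user-defined code (illustrative)
buffer = [decode(pua_code, {})]    # read before any PUA map exists
my_map = {pua_code: 0x4E00}        # map supplied later (illustrative target)

print(hex(buffer[0]))                          # still in the high area
print(hex(resolve_late(buffer[0], my_map)))    # unified only by the late lookup
print(hex(decode(pua_code, my_map)))           # unified when decoding afresh
```

In this model the character read before the map was supplied keeps its
high-area code; only a fresh decode, or an explicit late lookup, yields
the Unicode code.  Whether such a late lookup belongs in the code that
reads the internal representation is the point under discussion.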




This bug report was last modified 12 years and 95 days ago.
