From unknown Tue Jun 17 20:11:34 2025
From: bug#11073 <11073@debbugs.gnu.org>
To: bug#11073 <11073@debbugs.gnu.org>
Subject: Status: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Wed, 18 Jun 2025 03:11:34 +0000

retitle 11073 24.0.94; BIDI-related crash in redisplay with certain byte sequences
reassign 11073 emacs
submitter 11073 Eli Zaretskii
severity 11073 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 07:26:33 2012
Date: Fri, 23 Mar 2012 12:55:19 +0200
From: Eli Zaretskii
Subject: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
To: bug-gnu-emacs@gnu.org
Message-id: <83sjgzvb6w.fsf@gnu.org>

The person who reported this to me in private email won't go public,
for whatever reasons, so I'm reporting this for them.

The recipe:

  emacs -Q
  C-x C-f bidicrash.txt RET

where the file bidicrash.txt was created with this shell command:

  echo -e "\0365\0205\0264\0225"

(On Windows, use the port of GNU `echo' rather than the built-in shell
command.)

Emacs crashes; the backtrace is below.

I'm working on fixing this.

Breakpoint 1, w32_abort () at w32fns.c:7196
7196      button = MessageBox (NULL,
(gdb) bt
#0  w32_abort () at w32fns.c:7196
#1  0x012f2e49 in bidi_get_type (ch=4195533, override=NEUTRAL_DIR)
    at bidi.c:108
#2  0x012f4120 in bidi_resolve_explicit_1 (bidi_it=0x82cff8) at bidi.c:1400
#3  0x012f44a8 in bidi_resolve_explicit (bidi_it=0x82cff8) at bidi.c:1529
#4  0x012f4a2f in bidi_resolve_weak (bidi_it=0x82cff8) at bidi.c:1614
#5  0x012f5110 in bidi_resolve_neutral (bidi_it=0x82cff8) at bidi.c:1850
#6  0x012f5a49 in bidi_type_of_next_char (bidi_it=0x82cff8) at bidi.c:2020
#7  0x012f5d6f in bidi_level_of_next_char (bidi_it=0x82cff8) at bidi.c:2133
#8  0x012f630e in bidi_move_to_visually_next (bidi_it=0x82cff8)
    at bidi.c:2342
#9  0x0116aded in set_iterator_to_next (it=0x82ca40, reseat_p=1)
    at xdisp.c:6898
#10 0x011941c1 in display_line (it=0x82ca40) at xdisp.c:19341
#11 0x0118917a in try_window (window=55991301, pos=..., flags=1)
    at xdisp.c:15977
#12 0x01186a32 in redisplay_window (window=55991301, just_this_one_p=0)
    at xdisp.c:15502
#13 0x011800b8 in redisplay_window_0 (window=55991301) at xdisp.c:13625
#14 0x01033d1b in internal_condition_case_1 (
    bfun=0x1180086 , arg=55991301, handlers=53234414,
    hfun=0x1180065 ) at eval.c:1553
#15 0x01180055 in redisplay_windows (window=55991301) at xdisp.c:13605
#16 0x0117dff8 in redisplay_internal () at xdisp.c:13182
#17 0x0117b2ea in redisplay () at xdisp.c:12405
#18 0x010087fb in read_char (commandflag=1, nmaps=2, maps=0x82fa30,
    prev_event=53250074, used_mouse_menu=0x82fb5c, end_time=0x0)
    at keyboard.c:2446
#19 0x0101c246 in read_key_sequence (keybuf=0x82fc60, bufsize=30,
    prompt=53250074, dont_downcase_last=0, can_return_switch_frame=1,
    fix_current_buffer=1) at keyboard.c:9326
#20 0x01005aa8 in command_loop_1 () at keyboard.c:1448
#21 0x01033c0b in internal_condition_case (bfun=0x10054b6 ,
    handlers=53307802, hfun=0x1004ce0 ) at eval.c:1515
#22 0x0100511c in command_loop_2 (ignore=53250074) at keyboard.c:1159
#23 0x010335cb in internal_catch
    (tag=53305826, func=0x10050f9 , arg=53250074) at eval.c:1272
#24 0x010050d4 in command_loop () at keyboard.c:1138
#25 0x0100469e in recursive_edit_1 () at keyboard.c:758
#26 0x010049c0 in Frecursive_edit () at keyboard.c:822
#27 0x010027c8 in main (argc=2, argv=0xa32880) at emacs.c:1715
(gdb) up
#1  0x012f2e49 in bidi_get_type (ch=4195533, override=NEUTRAL_DIR)
    at bidi.c:108
108         abort ();
(gdb) up
#2  0x012f4120 in bidi_resolve_explicit_1 (bidi_it=0x82cff8) at bidi.c:1400
1400      type = bidi_get_type (curchar, NEUTRAL_DIR);
(gdb) p bidi_it->charpos
$1 = 2
(gdb) p bidi_it->bytepos
$2 = 4
(gdb) p bidi_it->ch_len
$3 = 2
(gdb) p bidi_it->ch
$4 = 4195533
(gdb) p/x bidi_it->ch
$5 = 0x4004cd
(gdb)

This is on Windows.  On GNU/Linux, or if you change the EOL format of
the file to be Unix-style LF, the last command prints 0x4004ca
instead.  Evidently, Emacs is trying to produce a Unicode codepoint
from bytes that include the newline sequence.

In GNU Emacs 24.0.94.1 (i386-mingw-nt5.1.2600)
 of 2012-02-27 on HOME-C4E4A596F7
Windowing system distributor `Microsoft Corp.', version 5.1.2600
Configured using:
 `configure --with-gcc (3.4)'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: ENU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1255
  default enable-multibyte-characters: t

Major mode: Mail

Minor modes in effect:
  diff-auto-refine-mode: t
  flyspell-mode: t
  desktop-save-mode: t
  show-paren-mode: t
  display-time-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  temp-buffer-resize-mode: t
  line-number-mode: t
  abbrev-mode: t

Recent input:
a l SPC D E F A U L T _ F A C E _ I D S-SPC i n t a c t .
) S o SPC t h i s SPC b u g SPC h a s SPC r a h e t h e r SPC l o w
SPC p r i o r i t y SPC a t SPC t h i s SPC t i m e , SPC a s SPC
i t ' s SPC n o t SPC a SPC r e g r e s s i o n SPC w r t SPC
E m a c s SPC 2 3 . SPC SPC N e v e r t h e l e s s , I S-SPC
w i l l SPC a t SPC t h e SPC v e r y SPC l e a s t SPC t r y SPC
t o SPC f i g u r e SPC o u t SPC w h a t SPC c h a n g e s SPC
a r e SPC n e e d e d SPC t o SPC m a k e SPC t h i s SPC w o r k
SPC a s SPC e x p e c t e d . T i m e SPC p e r m i t t i n g , SPC
M-q C-c C-s n n n n p p M-x e m a c s - r e r e p o r t

Recent messages:
Mark set [4 times]
Auto-saving...done
byte-code: End of buffer
Auto-saving...done
Mark set
Sending...
Added to d:/usr/eli/rmail/SENT.MAIL
Sending email
Sending email done
Sending...done

Load-path shadows:
None found.

Features:
(shadow emacsbug etags cc-awk network-stream starttls tls smtpmail
auth-source eieio assoc gnus-util password-cache mailalias sendmail
multi-isearch find-func help-mode view rmailout dabbrev ld-script
dired-x dired tcl nxml-uchnm rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt
rng-util rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap nxml-util
nxml-glyph nxml-enc xmltok sgml-mode org-wl org-w3m org-vm org-rmail
org-mhe org-mew org-irc org-jsinfo org-infojs org-html org-exp ob-exp
org-exp-blocks org-agenda org-info org-gnus org-docview org-bibtex
bibtex org-bbdb org byte-opt warnings bytecomp byte-compile cconv
macroexp advice help-fns advice-preload ob-emacs-lisp ob-tangle ob-ref
ob-lob ob-table org-footnote org-src ob-comint ob-keys ob ob-eval
org-pcomplete pcomplete org-list org-faces org-compat org-entities
org-macs cal-menu calendar cal-loaddefs noutline outline arc-mode
archive-mode diff-mode conf-mode newcomment parse-time sh-script
executable gud easy-mmode comint ansi-color ring generic jka-compr
make-mode flyspell ispell vc-cvs autorevert info vc-bzr cc-mode
cc-fonts cc-guess cc-menus cc-cmds
cc-styles cc-align cc-engine cc-vars cc-defs regexp-opt qp rmailsum
rmailmm message format-spec rfc822 mml mml-sec mm-decode mm-bodies
mm-encode mailabbrev gmm-utils mailheader mail-parse rfc2231 rmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils desktop
server filecache mairix cus-edit easymenu cus-start cus-load wid-edit
saveplace midnight generic-x paren battery time time-date tooltip
ediff-hook vc-hooks lisp-float-type mwheel dos-w32 disp-table ls-lisp
w32-win w32-vars tool-bar dnd fontset image fringe lisp-mode register
page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core frame cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew
greek romanian slovak czech european ethiopic indian cyrillic chinese
case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer
loaddefs button faces cus-face files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process multi-tty
emacs)

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 09:06:26 2012
Date: Fri, 23 Mar 2012 14:35:28 +0200
From: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
In-reply-to: <83sjgzvb6w.fsf@gnu.org>
To: 11073@debbugs.gnu.org
Message-id: <83mx77v6jz.fsf@gnu.org>
References: <83sjgzvb6w.fsf@gnu.org>

> Date: Fri, 23 Mar 2012 12:55:19 +0200
> From: Eli Zaretskii
>
>   emacs -Q
>   C-x C-f bidicrash.txt RET
>
> where the file bidicrash.txt was created with this shell command:
>
>   echo -e "\0365\0205\0264\0225"
>
> (On Windows, use the port of GNU `echo' rather than the built-in shell
> command.)
>
> Emacs crashes; the backtrace is below.
>
> I'm working on fixing this.

Fixed in revision 107665 on the trunk.  It was a pretty basic blunder.

(Repeat after me: FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES is not
always equivalent to STRING_CHAR_AND_LENGTH.)

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 10:58:53 2012
From: Stefan Monnier
To: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
References: <83sjgzvb6w.fsf@gnu.org> <83mx77v6jz.fsf@gnu.org>
Date: Fri, 23 Mar 2012 10:27:39 -0400
In-Reply-To: <83mx77v6jz.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 23 Mar 2012 14:35:28 +0200")
Cc: 11073@debbugs.gnu.org

> (Repeat after me: FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES is not
> always equivalent to STRING_CHAR_AND_LENGTH.)

Do we really absolutely have to have such a trap?
I mean: is there a good reason why they're not always equivalent?


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 12:29:54 2012
Date: Fri, 23 Mar 2012 17:58:25 +0200
From: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
To: Stefan Monnier
Message-id: <83fwczux5q.fsf@gnu.org>
References: <83sjgzvb6w.fsf@gnu.org> <83mx77v6jz.fsf@gnu.org>
Cc: 11073@debbugs.gnu.org

> From: Stefan Monnier
> Cc: 11073@debbugs.gnu.org
> Date: Fri, 23 Mar 2012 10:27:39 -0400
>
> > (Repeat after me: FETCH_MULTIBYTE_CHAR followed by CHAR_BYTES is not
> > always equivalent to STRING_CHAR_AND_LENGTH.)
>
> Do we really absolutely have to have such a trap?
> I mean: is there a good reason why they're not always equivalent?

They are not equivalent when conversion of the multibyte form into a
character unifies a CJK character that is represented by a codepoint
from one of the private use areas.  This unification is done in
char_string, via a call to MAYBE_UNIFY_CHAR, which converts the
private codepoint into the equivalent codepoint in one of the "normal"
planes.  The UTF-8 encoding of the unified character can be shorter or
longer than the original multibyte sequence.

The problem with the code I had in bidi.c, viz.:

    character = FETCH_MULTIBYTE_CHAR (bytepos);
    char_len = CHAR_BYTES (character);

is that the value in `character' is not guaranteed to correspond to
the multibyte sequence consumed by FETCH_MULTIBYTE_CHAR, and therefore
that character's length as returned by CHAR_BYTES is not the right
instrument to advance to the next character.

So, I'd say that FETCH_MULTIBYTE_CHAR should only be used for fetching
a single character; if one wants to advance, one should either use
FETCH_CHAR_ADVANCE or (if they are paranoiac about speed, like I am)
use

    character = STRING_CHAR_AND_LENGTH (BYTE_POS_ADDR (bytepos), length);

which returns the length of the consumed sequence, and use that to
advance to the next character position.

And note the other gotcha: that the length returned by
STRING_CHAR_AND_LENGTH is not necessarily the length of the UTF-8
encoding of the character it returns, but rather the length of the
multibyte sequence which was converted to the character.
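
The trap described above can be reproduced with a small self-contained
C model.  This is only a sketch under stated assumptions, not the
Emacs source: the decoder is a simplified stand-in for the real
macros, 5-byte forms are left out, and UNIFIED_CODE (0x4E00) is a
hypothetical placeholder, since the thread does not say which
character the private code was unified to.  The buffer is assumed to
hold the four bytes from the recipe followed by CR LF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified model, NOT Emacs source.  PRIVATE_CODE is what the bytes
   F5 85 B4 95 decode to in this model; UNIFIED_CODE is a hypothetical
   placeholder for the character it gets unified to.  */
enum { PRIVATE_CODE = 0x145D15, UNIFIED_CODE = 0x4E00 };

/* Stand-in for the unification step (MAYBE_UNIFY_CHAR in the thread).  */
static int
unify (int c)
{
  return c == PRIVATE_CODE ? UNIFIED_CODE : c;
}

/* Bytes needed to encode C in the internal UTF-8-like form.  Codes
   above 0x3FFF7F model raw bytes, which take 2 bytes (assumption).  */
static int
char_bytes (int c)
{
  return c <= 0x7F ? 1 : c <= 0x7FF ? 2 : c <= 0xFFFF ? 3
         : c <= 0x1FFFFF ? 4 : c <= 0x3FFF7F ? 5 : 2;
}

/* Decode the character at P, store the number of bytes consumed in
   *LEN, and return the (possibly unified) code.  */
static int
string_char_and_length (const unsigned char *p, int *len)
{
  if (!(p[0] & 0x80))
    {
      *len = 1;
      return p[0];
    }
  if (!(p[0] & 0x20))
    {
      int c = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
      *len = 2;
      /* Lead bytes below 0xC2 are mapped into the raw-byte range.  */
      return p[0] < 0xC2 ? c + 0x3FFF80 : c;
    }
  if (!(p[0] & 0x10))
    {
      *len = 3;
      return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }
  *len = 4;
  return unify (((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
                | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F));
}

/* The trap: advance by the encoded length of the returned character.  */
static size_t
next_pos_buggy (const unsigned char *buf, size_t pos)
{
  int len;
  return pos + (size_t) char_bytes (string_char_and_length (buf + pos, &len));
}

/* The fix: advance by the number of bytes actually consumed.  */
static size_t
next_pos_fixed (const unsigned char *buf, size_t pos)
{
  int len;
  (void) string_char_and_length (buf + pos, &len);
  return pos + (size_t) len;
}

#ifdef DEMO_MAIN
int
main (void)
{
  const unsigned char buf[] = { 0xF5, 0x85, 0xB4, 0x95, 0x0D, 0x0A };
  int len;
  size_t bad = next_pos_buggy (buf, 0);
  size_t good = next_pos_fixed (buf, 0);

  assert (bad == 3 && good == 4);
  printf ("buggy: offset %zu, ch 0x%x\n", bad,
          (unsigned) string_char_and_length (buf + bad, &len));
  printf ("fixed: offset %zu, ch 0x%x\n", good,
          (unsigned) string_char_and_length (buf + good, &len));
  return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, the buggy advance stops at 0-based offset 3
(bytepos 4 in the 1-based gdb output), in the middle of the 4-byte
sequence.  Decoding the stray trailing byte 0x95 together with the CR
that follows yields 0x4004cd with a length of 2, which matches the
values of bidi_it->ch and bidi_it->ch_len in the gdb session; with LF
in place of CR the same arithmetic gives 0x4004ca.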

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 14:06:00 2012
From: Stefan Monnier
To: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
References: <83sjgzvb6w.fsf@gnu.org> <83mx77v6jz.fsf@gnu.org> <83fwczux5q.fsf@gnu.org>
Date: Fri, 23 Mar 2012 13:34:45 -0400
In-Reply-To: <83fwczux5q.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 23 Mar 2012 17:58:25 +0200")
Cc: 11073@debbugs.gnu.org

> They are not equivalent when conversion of the multibyte form into a
> character unifies a CJK character that is represented by a codepoint
> from one of the private use areas.

Why do we need this unification?  Or rather, why do we need multiple
codepoints, which then forces us to unify them?


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 23 15:17:46 2012
Date: Fri, 23 Mar 2012 20:46:36 +0200
From: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
To: Stefan Monnier, Kenichi Handa
Message-id: <837gybupdf.fsf@gnu.org>
References: <83sjgzvb6w.fsf@gnu.org> <83mx77v6jz.fsf@gnu.org> <83fwczux5q.fsf@gnu.org>
Cc: 11073@debbugs.gnu.org

> From: Stefan Monnier
> Cc: 11073@debbugs.gnu.org
> Date: Fri, 23 Mar 2012 13:34:45 -0400
>
> > They are not equivalent when conversion of the multibyte form into a
> > character unifies a CJK character that is represented by a codepoint
> > from one of the private use areas.
>
> Why do we need this unification?  Or rather, why do we need multiple
> codepoints, which then forces us to unify them?

That's something Handa-san (CC'ed) will be able to explain much better
than I ever could.  AFAIU, there are good reasons to have some CJK
characters on separate codepoints, because they need to be treated
differently from their Unicode codepoints (perhaps a different choice
of font to display them?)

From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 26 04:17:27 2012
From: Kenichi Handa
To: Eli Zaretskii
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
In-Reply-To: <837gybupdf.fsf@gnu.org> (message from Eli Zaretskii on Fri, 23 Mar 2012 20:46:36 +0200)
Date: Mon, 26 Mar 2012 16:45:56 +0900
Cc: 11073@debbugs.gnu.org, monnier@iro.umontreal.ca

In article <837gybupdf.fsf@gnu.org>, Eli Zaretskii writes:

> > Why do we need this unification?  Or rather, why do we need multiple
> > codepoints, which then forces us to unify them?

> That's something Handa-san (CC'ed) will be able to explain much better
> than I ever could.

It's a long story.

When I designed emacs-unicode (the version before it was merged to
the trunk, more than 10 years ago), the unification maps of CJK
charsets to Unicode were not stable.  In addition, there were
various conflicting policies on which character to unify to which
character.  One reason for this confusion was that Unicode itself
didn't define mappings to/from such CJK charsets (JIS, GB, KSC).

The unification problem is not only for Ideographic characters.
Many CJK charsets contain, for instance, full-width versions of
Greek characters, but Unicode doesn't distinguish them from the
single-width versions (though Unicode has full-width versions of
'A'..'Z', etc).  There were people who wanted to distinguish
full-width Greek chars from single-width chars.

There were also people who had an iso-2022-7bit text file that
distinguishes characters of the GB charset and the JIS charset.
To edit such a file and write it back as the original one, one
has to disable unification of one of GB and JIS (or both of them).

So, I decided at that time to give each CJK charset unique
code space (above #x110000) in Emacs, and allow users to
freely unify/disunify them to Unicode code space (below
#x110000) by giving the function unify-charset.

FYI, http://www.unicode.org/reports/tr38/ describes some of the
difficulties with the mappings.
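
To make the split between the two code spaces concrete, here is a
minimal C sketch that classifies an internal character code.  Only the
#x110000 boundary comes from this thread; the upper limits (0x3FFF7F
for ordinary characters, 0x3FFF80..0x3FFFFF for raw bytes) are an
assumption about the Emacs internal character space, and the names are
illustrative rather than taken from the Emacs source.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative zones of the internal character space.  Only the
   0x110000 boundary is stated in the thread; the other limits are
   assumptions.  */
enum code_zone
{
  ZONE_UNICODE,        /* 0 .. 0x10FFFF */
  ZONE_ABOVE_UNICODE,  /* 0x110000 .. 0x3FFF7F, e.g. CJK charset space */
  ZONE_RAW_BYTE,       /* 0x3FFF80 .. 0x3FFFFF */
  ZONE_INVALID         /* negative or above 0x3FFFFF */
};

static enum code_zone
classify (long c)
{
  if (c < 0)
    return ZONE_INVALID;
  if (c <= 0x10FFFF)
    return ZONE_UNICODE;
  if (c <= 0x3FFF7F)
    return ZONE_ABOVE_UNICODE;
  if (c <= 0x3FFFFF)
    return ZONE_RAW_BYTE;
  return ZONE_INVALID;
}

static const char *
zone_name (enum code_zone z)
{
  switch (z)
    {
    case ZONE_UNICODE:       return "Unicode";
    case ZONE_ABOVE_UNICODE: return "above Unicode";
    case ZONE_RAW_BYTE:      return "raw byte";
    default:                 return "invalid";
    }
}

#ifdef DEMO_MAIN
int
main (void)
{
  /* 0x145D15 is the value of the bytes F5 85 B4 95 from the recipe
     when read as a 4-byte sequence; 0x4004CD is the value printed by
     gdb in the original report.  */
  const long samples[] = { 0x41, 0x10FFFF, 0x110000, 0x145D15,
                           0x3FFF80, 0x4004CD };
  size_t i;

  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    printf ("0x%lx: %s\n", (unsigned long) samples[i],
            zone_name (classify (samples[i])));
  return 0;
}
#endif
```

Under these assumptions the test-case bytes land in the space above
#x110000, while the 0x4004cd seen in the debugger falls outside every
valid zone, which is consistent with the abort in bidi_get_type.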

> AFAIU, there are good reasons to have some CJK
> characters on separate codepoints, because they need to be treated
> differently from their Unicode codepoints (perhaps a different choice
> of font to display them?)

That was one reason, but the current code pays attention to the
`charset' text property of each character to select a proper
font.

---
Kenichi Handa
handa@m17n.org

From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 26 08:55:27 2012
From: Stefan Monnier
To: Kenichi Handa
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Mon, 26 Mar 2012 08:23:58 -0400
In-Reply-To: (Kenichi Handa's message of "Mon, 26 Mar 2012 16:45:56 +0900")
Cc: Eli Zaretskii, 11073@debbugs.gnu.org

> So, I decided at that time to give each CJK charset unique
> code space (above #x110000) in Emacs, and allow users to
> freely unify/disunify them to Unicode code space (below
> #x110000) by giving the function unify-charset.

I understand this part.  The part I don't understand is why we do
unification when reading a char from the buffer's text.  That is: why
unify chars in `int' (or Lisp_Object) form but not in the
internal-utf-8 representation?
I would expect the unification to happen during encoding/decoding
only, and not during internal conversions from byte-sequence to int.


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 29 01:51:56 2012
From: Kenichi Handa
To: Stefan Monnier
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
In-Reply-To: (message from Stefan Monnier on Mon, 26 Mar 2012 08:23:58 -0400)
Date: Thu, 29 Mar 2012 14:19:50 +0900
Cc: eliz@gnu.org, 11073@debbugs.gnu.org

In article , Stefan Monnier writes:

> I understand this part.  The part I don't understand is why we do
> unification when reading a char from the buffer's text.  That is: why
> unify chars in `int' (or Lisp_Object) form but not in the
> internal-utf-8 representation?
> I would expect the unification to happen during encoding/decoding

Usually, yes.  But as far as there is a code space in high
area for a CJK charset, it is unavoidable to have a
buffer/string that contains a character represented by a
byte sequence in that high area as the test case of
Bug#11073.  And, as "unification" means to treat such a
character the same way as the unified character, I thought
they both have the same character code.

---
Kenichi Handa
handa@m17n.org

From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 29 12:35:53 2012
From: Stefan Monnier
To: Kenichi Handa
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Thu, 29 Mar 2012 12:04:22 -0400
In-Reply-To: (Kenichi Handa's message of "Thu, 29 Mar 2012 14:19:50 +0900")
Cc: eliz@gnu.org, 11073@debbugs.gnu.org

>> I understand this part.  The part I don't understand is why we do
>> unification when reading a char from the buffer's text.  That is: why
>> unify chars in `int' (or Lisp_Object) form but not in the
>> internal-utf-8 representation?
>> I would expect the unification to happen during encoding/decoding

> Usually, yes.  But as far as there is a code space in high
> area for a CJK charset, it is unavoidable to have a
> buffer/string that contains a character represented by a
> byte sequence in that high area as the test case of
> Bug#11073.  And, as "unification" means to treat such a
> character the same way as the unified character, I thought
> they both have the same character code.

Since there are two internal byte-sequence representations, I don't see
any good reason why we shouldn't have 2 internal int representations.
I.e. if unification failed for the byte-sequence (which might be the
result of a bug, for all I know), we may as well keep them non-unified
in the int representation.


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 02 22:22:51 2012
From: Kenichi Handa
To: Stefan Monnier
Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
In-Reply-To: (message from Stefan Monnier on Thu, 29 Mar 2012 12:04:22 -0400) Date: Tue, 03 Apr 2012 11:22:23 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

In article , Stefan Monnier writes:

> > Usually, yes. But as far as there is a code space in high
> > area for a CJK charset, it is unavoidable to have a
> > buffer/string that contains a character represented by a
> > byte sequence in that high area as the test case of
> > Bug#11073. And, as "unification" means to treat such a
> > character the same way as the unified character, I thought
> > they both have the same character code.

> Since there are two internal byte-sequence representation, I don't see
> any good reason why we shouldn't have 2 internal int representations.
> I.e. if unification failed for the byte-sequence (which might be the
> result of a bug, for all I know), we may as well keep them non-unified
> in the int representation.

Please note that not all characters in the code-space of a CJK charset are unified. For instance, Big5 has its own PUA (private use area), and characters in the PUA are not unified by default. So, if Emacs reads a Big5 file that contains PUA chars, those chars stay in the high area. Then, one can provide his own unification map that also maps PUA chars to some Unicode chars, like this:

  (unify-charset 'big5 "MyBig5.map")

After this, I thought that previously read PUA chars staying in the high area should be treated as the corresponding Unicode chars (in display, search, etc.). One may find some bug in his map or find that another map is better.
Then he can do this again: (unify-charset 'big5 "MyNewBig5.map") The current design was to enable such a scenario. Of course, there will be an opinion that such a functionality is too much for Emacs, and when one changes any unification map, he must re-read a file, process-output, mail etc. --- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 03 00:22:55 2012 Received: (at 11073) by debbugs.gnu.org; 3 Apr 2012 04:22:55 +0000 Received: from localhost ([127.0.0.1]:37301 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SEvGh-0004Od-Ez for submit@debbugs.gnu.org; Tue, 03 Apr 2012 00:22:55 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.183]:41171) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SEvGf-0004OV-Cs for 11073@debbugs.gnu.org; Tue, 03 Apr 2012 00:22:54 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AicFAKU/KE9soXt6/2dsb2JhbACBX5x7eYhwnhmGGQSbGYQJ X-IronPort-AV: E=Sophos;i="4.73,1,1325480400"; d="scan'208";a="171485558" Received: from 108-161-123-122.dsl.teksavvy.com (HELO pastel.home) ([108.161.123.122]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 03 Apr 2012 00:22:32 -0400 Received: by pastel.home (Postfix, from userid 20848) id 92EB559322; Tue, 3 Apr 2012 00:22:32 -0400 (EDT) From: Stefan Monnier To: Kenichi Handa Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences Message-ID: References: Date: Tue, 03 Apr 2012 00:22:32 -0400 In-Reply-To: (Kenichi Handa's message of "Tue, 03 Apr 2012 11:22:23 +0900") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.94 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: 
debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) >> > Usually, yes. But as far as there is a code space in high >> > area for a CJK charset, it is unavoidable to have a >> > buffer/string that contains a character represented by a >> > byte sequence in that high area as the test case of >> > Bug#11073. And, as "unification" means to treat such a >> > character the same way as the unified character, I thought >> > they both have the same character code. >> Since there are two internal byte-sequence representation, I don't see >> any good reason why we shouldn't have 2 internal int representations. >> I.e. if unification failed for the byte-sequence (which might be the >> result of a bug, for all I know), we may as well keep them non-unified >> in the int representation. > Please note that not all characters in the code-space of a > CJK charset are unified. For instance, Big5 has it's own > PUA (private use area), and characters in PUA are not > unified by default. So, if Emacs reads a Big5 file that > contains PUA chars, those chars stay in high-area. Then, > one can provide his own unification map that also maps PUA > chars to some Unicode chars as this: > (unify-charset 'big5 "MyBig5.map") > After this, I thought that previously read PUA chars staying > in the high-area should be treated as the corresponding > Unicode chars (in displaying, search, etc). But again, this unification takes place during decoding. Whereas what I'm talking about takes place when reading the internal utf-8 representation, which should be already unified. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 03 01:55:39 2012 Received: (at 11073) by debbugs.gnu.org; 3 Apr 2012 05:55:39 +0000 Received: from localhost ([127.0.0.1]:37356 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SEwiQ-0006bk-EC for submit@debbugs.gnu.org; Tue, 03 Apr 2012 01:55:39 -0400 Received: from mx1.aist.go.jp ([150.29.246.133]:34558) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SEwiN-0006bc-F1 for 11073@debbugs.gnu.org; Tue, 03 Apr 2012 01:55:37 -0400 Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123]) by mx1.aist.go.jp with ESMTP id q335tDPE010924; Tue, 3 Apr 2012 14:55:13 +0900 (JST) env-from (handa@m17n.org) Received: from smtp4.aist.go.jp by rqsmtp2.aist.go.jp with ESMTP id q335tCK0025726; Tue, 3 Apr 2012 14:55:12 +0900 (JST) env-from (handa@m17n.org) Received: by smtp4.aist.go.jp with ESMTP id q335tBFn026094; Tue, 3 Apr 2012 14:55:11 +0900 (JST) env-from (handa@m17n.org) From: Kenichi Handa To: Stefan Monnier Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences In-Reply-To: (message from Stefan Monnier on Tue, 03 Apr 2012 00:22:32 -0400) Date: Tue, 03 Apr 2012 14:55:11 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) In article , Stefan Monnier writes: > > Please note that not all characters in the code-space of a > > CJK charset are unified. For instance, Big5 has it's own > > PUA (private use area), and characters in PUA are not > > unified by default. 
So, if Emacs reads a Big5 file that > > contains PUA chars, those chars stay in high-area. Then, > > one can provide his own unification map that also maps PUA > > chars to some Unicode chars as this: > > (unify-charset 'big5 "MyBig5.map") > > After this, I thought that previously read PUA chars staying > > in the high-area should be treated as the corresponding > > Unicode chars (in displaying, search, etc). > But again, this unification takes place during decoding. No. In the above scenario, PUA chars read before the call of unify-charset are not unified. The unification should take place after the call of unify-charset. > Whereas what > I'm talking about takes place when reading the internal utf-8 > representation, which should be already unified. I'm talking about exactly that case. --- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 03 09:03:19 2012 Received: (at 11073) by debbugs.gnu.org; 3 Apr 2012 13:03:20 +0000 Received: from localhost ([127.0.0.1]:37797 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SF3OI-0000XN-UN for submit@debbugs.gnu.org; Tue, 03 Apr 2012 09:03:19 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:55011) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SF3OF-0000XF-QW for 11073@debbugs.gnu.org; Tue, 03 Apr 2012 09:03:16 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AicFAKU/KE9soXt6/2dsb2JhbACBX5x7eYhwnhmGGQSbGYQJ X-IronPort-AV: E=Sophos;i="4.73,1,1325480400"; d="scan'208";a="171657392" Received: from 108-161-123-122.dsl.teksavvy.com (HELO pastel.home) ([108.161.123.122]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 03 Apr 2012 09:02:53 -0400 Received: by pastel.home (Postfix, from userid 20848) id A457359322; Tue, 3 Apr 2012 09:02:52 -0400 (EDT) From: Stefan Monnier To: Kenichi Handa Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences Message-ID: 
References: Date: Tue, 03 Apr 2012 09:02:52 -0400 In-Reply-To: (Kenichi Handa's message of "Tue, 03 Apr 2012 14:55:11 +0900") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.94 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) >> > Please note that not all characters in the code-space of a >> > CJK charset are unified. For instance, Big5 has it's own >> > PUA (private use area), and characters in PUA are not >> > unified by default. So, if Emacs reads a Big5 file that >> > contains PUA chars, those chars stay in high-area. Then, >> > one can provide his own unification map that also maps PUA >> > chars to some Unicode chars as this: >> > (unify-charset 'big5 "MyBig5.map") >> > After this, I thought that previously read PUA chars staying >> > in the high-area should be treated as the corresponding >> > Unicode chars (in displaying, search, etc). > No. In the above scenario, PUA chars read before the call > of unify-charset are not unified. The unification should > take place after the call of unify-charset. But isn't this (unify-charset 'big5 "MyBig5.map") performed in the .emacs? Is it really important to support adding unification rules after decoding took place? If so, why? And also, what about removing unification rules after decoding? 
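The scenario under debate, a unification map that is installed or replaced after text has already been decoded, can be sketched in C as follows. Everything here is hypothetical: the demo_ names, the table layout, and the linear search are illustrative assumptions and do not reflect how Emacs stores or consults its unification maps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical unification map: a code in the high
   area and the Unicode character it should be treated as.  */
struct demo_map_entry
{
  uint32_t high;
  uint32_t unicode;
};

/* The currently installed map; empty until the user provides one.  */
static const struct demo_map_entry *demo_map;
static size_t demo_map_len;

/* Stand-in for a call such as (unify-charset 'big5 "MyBig5.map"):
   install a map, or replace the one installed earlier.  */
static void
demo_install_map (const struct demo_map_entry *map, size_t len)
{
  assert (map != NULL || len == 0);
  demo_map = map;
  demo_map_len = len;
}

/* Read-time lookup: consult whatever map is installed right now, so
   a code stored before the map existed still resolves through it.  */
static uint32_t
demo_read_char (uint32_t stored)
{
  for (size_t i = 0; i < demo_map_len; i++)
    if (demo_map[i].high == stored)
      return demo_map[i].unicode;
  return stored;
}
```

With no map installed, demo_read_char returns the stored high-area code unchanged; after demo_install_map, the same stored code resolves to the mapped Unicode character, and installing a second map changes the result again without re-reading the text. A decode-time-only design would instead fix the result when the file is read, which is the trade-off being discussed.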
Stefan From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 03 20:07:35 2012 Received: (at 11073) by debbugs.gnu.org; 4 Apr 2012 00:07:35 +0000 Received: from localhost ([127.0.0.1]:38613 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFDl8-00087n-ON for submit@debbugs.gnu.org; Tue, 03 Apr 2012 20:07:35 -0400 Received: from mx1.aist.go.jp ([150.29.246.133]:60188) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFDl5-00087c-FU for 11073@debbugs.gnu.org; Tue, 03 Apr 2012 20:07:33 -0400 Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115]) by mx1.aist.go.jp with ESMTP id q34074tw028555; Wed, 4 Apr 2012 09:07:04 +0900 (JST) env-from (handa@m17n.org) Received: from smtp1.aist.go.jp by rqsmtp1.aist.go.jp with ESMTP id q34074Sn009430; Wed, 4 Apr 2012 09:07:04 +0900 (JST) env-from (handa@m17n.org) Received: by smtp1.aist.go.jp with ESMTP id q34073Mq013628; Wed, 4 Apr 2012 09:07:03 +0900 (JST) env-from (handa@m17n.org) From: Kenichi Handa To: Stefan Monnier Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences In-Reply-To: (message from Stefan Monnier on Tue, 03 Apr 2012 09:02:52 -0400) Date: Wed, 04 Apr 2012 09:07:02 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) In article , Stefan Monnier writes: > But isn't this (unify-charset 'big5 "MyBig5.map") performed in the > .emacs? Usually yes. But, in that case, if .emacs is encoded in Big5 and it contains some Big5 PUA chars, they are not unified while loading .emacs. 
> Is it really important to support adding unification rules > after decoding took place? If so, why? As I wrote, I can't tell how important it is. It may be very important for those (but I guess very few) who need the above operation, but not important for the majority. I'm ok to remove such a feature if the maintainers decide that. > And also, what about removing unification rules after > decoding? When one tells Emacs to unify some chars, and then reads a file containing those chars, there's no way to dis-unify them. --- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 03 21:17:47 2012 Received: (at 11073) by debbugs.gnu.org; 4 Apr 2012 01:17:47 +0000 Received: from localhost ([127.0.0.1]:38638 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFEr5-0001HR-Fh for submit@debbugs.gnu.org; Tue, 03 Apr 2012 21:17:47 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.183]:44112) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFEr0-0001HH-KO for 11073@debbugs.gnu.org; Tue, 03 Apr 2012 21:17:46 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AicFAKU/KE9FpZV7/2dsb2JhbACBX5x7eYhwnhmGGQSbGYQJ X-IronPort-AV: E=Sophos;i="4.73,1,1325480400"; d="scan'208";a="171906012" Received: from 69-165-149-123.dsl.teksavvy.com (HELO pastel.home) ([69.165.149.123]) by ironport2-out.teksavvy.com with ESMTP/TLS/ADH-AES256-SHA; 03 Apr 2012 21:17:17 -0400 Received: by pastel.home (Postfix, from userid 20848) id E129959388; Tue, 3 Apr 2012 21:17:16 -0400 (EDT) From: Stefan Monnier To: Kenichi Handa Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences Message-ID: References: Date: Tue, 03 Apr 2012 21:17:16 -0400 In-Reply-To: (Kenichi Handa's message of "Wed, 04 Apr 2012 09:07:02 +0900") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.94 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 
Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) >> But isn't this (unify-charset 'big5 "MyBig5.map") performed in the .emacs? > Usually yes. But, in that case, if .emacs is encoded in > Big5 and it contains some Big5 PUA chars, they are not > unified while loading .emacs. Hmm... that doesn't sound like it would be a very common problem, but it's not completely hypothetical either. Would this problem also come up in a BIG5 locale? If not, then I think we can ignore this problem. >> Is it really important to support adding unification rules >> after decoding took place? If so, why? > As I wrote, I can't tell how important it is. It may be very > important for those (but I guess very few) who need the above > operation, but not important for the majority. > I'm ok to remove such a feature if the maintainers decide that. The problem with it is that it costs all the time for everyone, and it makes the behavior of some macros subtly more complex/different and hence adds a nasty complexity. So if at all possible, I'd rather find a way to remove it (not for 24.1, obviously). >> And also, what about removing unification rules after decoding? > When one tells Emacs to unify some chars, and then reads a file > containing those chars, there's no way to dis-unify them. But I guess this problem is even much less common. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 05 21:13:56 2012 Received: (at 11073) by debbugs.gnu.org; 6 Apr 2012 01:13:56 +0000 Received: from localhost ([127.0.0.1]:41893 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFxkS-0000sI-8r for submit@debbugs.gnu.org; Thu, 05 Apr 2012 21:13:56 -0400 Received: from mx1.aist.go.jp ([150.29.246.133]:56823) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SFxkO-0000s7-T2 for 11073@debbugs.gnu.org; Thu, 05 Apr 2012 21:13:55 -0400 Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123]) by mx1.aist.go.jp with ESMTP id q361DECn008100; Fri, 6 Apr 2012 10:13:14 +0900 (JST) env-from (handa@m17n.org) Received: from smtp4.aist.go.jp by rqsmtp2.aist.go.jp with ESMTP id q361DE7w020364; Fri, 6 Apr 2012 10:13:14 +0900 (JST) env-from (handa@m17n.org) Received: by smtp4.aist.go.jp with ESMTP id q361DCXJ021330; Fri, 6 Apr 2012 10:13:12 +0900 (JST) env-from (handa@m17n.org) From: Kenichi Handa To: Stefan Monnier Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences In-Reply-To: (message from Stefan Monnier on Tue, 03 Apr 2012 21:17:16 -0400) Date: Fri, 06 Apr 2012 10:13:12 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 11073 Cc: eliz@gnu.org, 11073@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) In article , Stefan Monnier writes: >>> But isn't this (unify-charset 'big5 "MyBig5.map") performed in the .emacs? > > Usually yes. But, in that case, if .emacs is encoded in > > Big5 and it contains some Big5 PUA chars, they are not > > unified while loading .emacs. > Hmm... 
that doesn't sound like it would be a very common problem, but
> it's not completely hypothetical either. Would this problem also come
> up in a BIG5 locale? If not, then I think we can ignore this problem.

If it ever comes up, it is mostly for people in a BIG5 locale. But please note that I used BIG5 as an example only because that charset name is short. Almost all CJK charsets have a PUA (officially or just by convention).

>>> Is it really important to support adding unification rules
>>> after decoding took place? If so, why?

> > As I wrote, I can't tell how important it is. It may be very
> > important for those (but I guess very few) who need the above
> > operation, but not important for the majority.
> > I'm ok to remove such a feature if the maintainers decide that.

> The problem with it is that it costs all the time for everyone, and it

I believe the extra cost is almost negligible, because such (dynamic) unification happens only for characters that are greater than MAX_UNICODE_CHAR.

> makes the behavior of some macros subtly more complex/different and
> hence adds a nasty complexity.

That's mostly because I didn't write proper comments on the relevant macros, and didn't provide better macros for a case such as Eli's.

> So if at all possible, I'd rather find a way to remove it (not for
> 24.1, obviously).

I myself think that it doesn't cause much of a problem even if we keep this functionality, but I also don't strongly object to removing it for 24.2.

>>> And also, what about removing unification rules after decoding?

> > When one tells Emacs to unify some chars, and then reads a file
> > containing those chars, there's no way to dis-unify them.

> But I guess this problem is even much less common.

Yes. That's why I didn't implement such a feature.
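The cost argument above, that dynamic unification is attempted only for characters greater than MAX_UNICODE_CHAR, amounts to a single comparison on the common path. The following C sketch illustrates that shape under stated assumptions: the guard_ names are hypothetical, the slow path is a placeholder that only counts calls, and none of this is the actual Emacs macro.

```c
#include <assert.h>
#include <stdint.h>

/* Unicode maximum, assumed equal to the MAX_UNICODE_CHAR named above.  */
#define GUARD_MAX_UNICODE_CHAR 0x10FFFF

/* Counts how often the expensive path runs (for illustration only).  */
static unsigned long guard_slow_lookups;

/* Placeholder for a table lookup in a unification map.  It returns C
   unchanged, as if no map entry existed for it.  */
static uint32_t
guard_slow_unify (uint32_t c)
{
  assert (c > GUARD_MAX_UNICODE_CHAR);
  guard_slow_lookups++;
  return c;
}

/* Characters inside the Unicode range pay for one comparison only;
   the lookup is reached solely for codes in the high area.  */
static uint32_t
guard_maybe_unify (uint32_t c)
{
  if (c <= GUARD_MAX_UNICODE_CHAR)
    return c;
  return guard_slow_unify (c);
}
```

Passing ordinary characters such as 'a' or 0x10FFFF through guard_maybe_unify leaves guard_slow_lookups at zero; only a high-area code such as 0x145D15 reaches the slow path. The comparison itself is still executed for every character read, which is the residual cost the other side of the discussion refers to.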
--- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 06 09:15:37 2012 Received: (at 11073) by debbugs.gnu.org; 6 Apr 2012 13:15:38 +0000 Received: from localhost ([127.0.0.1]:42358 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SG90r-0001DN-G7 for submit@debbugs.gnu.org; Fri, 06 Apr 2012 09:15:37 -0400 Received: from mtaout21.012.net.il ([80.179.55.169]:54438) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SG90f-0001D2-Fy for 11073@debbugs.gnu.org; Fri, 06 Apr 2012 09:15:35 -0400 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0M2200F007963100@a-mtaout21.012.net.il> for 11073@debbugs.gnu.org; Fri, 06 Apr 2012 16:14:42 +0300 (IDT) Received: from HOME-C4E4A596F7 ([84.229.252.114]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0M2200FOP7GI2P10@a-mtaout21.012.net.il>; Fri, 06 Apr 2012 16:14:42 +0300 (IDT) Date: Fri, 06 Apr 2012 16:13:33 +0300 From: Eli Zaretskii Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences In-reply-to: X-012-Sender: halo1@inter.net.il To: Kenichi Handa Message-id: <837gxtatqa.fsf@gnu.org> References: X-Spam-Score: -1.2 (-) X-Debbugs-Envelope-To: 11073 Cc: 11073@debbugs.gnu.org, monnier@iro.umontreal.ca X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > From: Kenichi Handa > Cc: eliz@gnu.org, 11073@debbugs.gnu.org > Date: Fri, 06 Apr 2012 10:13:12 +0900 > > > makes the behavior of some macros subtly more complex/different and > > hence adds a nasty complexity. 
> > That's mostly because I didn't write a proper comments on > the relavant macros, and didn't provide a better macros for > such a case as Eli's. I added comments to the relevant macros (as trunk revision 107781) to warn about these subtleties. From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 09 01:41:51 2012 Received: (at 11073) by debbugs.gnu.org; 9 Apr 2012 05:41:51 +0000 Received: from localhost ([127.0.0.1]:45902 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SH7MK-0003jR-9V for submit@debbugs.gnu.org; Mon, 09 Apr 2012 01:41:51 -0400 Received: from na3sys010aog107.obsmtp.com ([74.125.245.82]:59997) by debbugs.gnu.org with smtp (Exim 4.72) (envelope-from ) id 1SH60z-0001jC-Gd for 11073@debbugs.gnu.org; Mon, 09 Apr 2012 00:15:44 -0400 Received: from mail-pb0-f52.google.com ([209.85.160.52]) (using TLSv1) by na3sys010aob107.postini.com ([74.125.244.12]) with SMTP ID DSNKT4JiNtWCmF2viOpPFt5El5lNFNny7QoN@postini.com; Sun, 08 Apr 2012 21:14:47 PDT Received: by pbcuo15 with SMTP id uo15so5940596pbc.11 for <11073@debbugs.gnu.org>; Sun, 08 Apr 2012 21:14:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aist.go.jp; s=google; h=from:to:cc:subject:in-reply-to:date:message-id:mime-version :content-type; bh=/6CbIFJmLUtSYobQiQnJ/qExZQU2q5iV+YJHqHBimCw=; b=LFS/oeTTTmCrwvSG9pWYwsUWa/UVXQDWxoHxXXHXT4jsLdc/bxvhOf81ktEY/Rb8jb OhozX7jwsgiepm3EPs1nl6aW4XC9gQhAdCU3mGA3qDZn0CZbwdE6aDR0E60WrD1wMGvr gRIL4Xaf1ouHfgibwRkrMtTanWEJjNrxmGcYk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:cc:subject:in-reply-to:date:message-id:mime-version :content-type:x-gm-message-state; bh=/6CbIFJmLUtSYobQiQnJ/qExZQU2q5iV+YJHqHBimCw=; b=c8igW3pt2y78YyBy8PWUt6dI+XPj+dT0XtPk0hth1nQRctWSI6EIP2pY2fQHTEfONb CLpI1WMhF3vK0DYUJIy/MrLsXz6PwGq7QwhmseXctNCHw0DaD/K+HIpm5JyvAKtmaqjz YBP8vahSNBfQMFs/4gAQxXpbfcaE3HGhwLOskHGvz8QiyFEYf1unmGJl88C+DiyfzlMm 
EVnBY6YNi+T/R4jziTQfra+cer8gtqGrQSXwgJKUTSi2/kp+JmAjrStfz9Dm98TWYgS2 L8zWhBBi0l4Rg4MwgycSmhNYi2+oGGYCsIeNYAAlpKNUn62snEyyZt9pG1wSAYo0Vurp RtZA== Received: by 10.68.225.39 with SMTP id rh7mr15855564pbc.104.1333944886430; Sun, 08 Apr 2012 21:14:46 -0700 (PDT) Received: from etlken ([150.29.148.131]) by mx.google.com with ESMTPS id tz1sm1968479pbc.45.2012.04.08.21.14.45 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 08 Apr 2012 21:14:45 -0700 (PDT) From: Kenichi Handa To: Eli Zaretskii Subject: Re: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences In-Reply-To: <837gxtatqa.fsf@gnu.org> (message from Eli Zaretskii on Fri, 06 Apr 2012 16:13:33 +0300) Date: Mon, 09 Apr 2012 13:14:43 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain X-Gm-Message-State: ALoCoQlhF093mquodHTQVxxBEbVaOpHsRPStqxrOqrvgC4FtLVIbeg4CgTWpKa0wh7grhMND2sEK X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 11073 X-Mailman-Approved-At: Mon, 09 Apr 2012 01:41:46 -0400 Cc: 11073@debbugs.gnu.org, monnier@iro.umontreal.ca X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) In article <837gxtatqa.fsf@gnu.org>, Eli Zaretskii writes: > > That's mostly because I didn't write a proper comments on > > the relavant macros, and didn't provide a better macros for > > such a case as Eli's. > I added comments to the relevant macros (as trunk revision 107781) to > warn about these subtleties. Thank you!! 
--- Kenichi Handa handa@m17n.org From debbugs-submit-bounces@debbugs.gnu.org Sat Feb 16 22:24:09 2013 Received: (at control) by debbugs.gnu.org; 17 Feb 2013 03:24:09 +0000 Received: from localhost ([127.0.0.1]:59845 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1U6urI-0006df-KY for submit@debbugs.gnu.org; Sat, 16 Feb 2013 22:24:08 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:59933) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1U6urH-0006dY-EQ for control@debbugs.gnu.org; Sat, 16 Feb 2013 22:24:07 -0500 Received: from rgm by fencepost.gnu.org with local (Exim 4.71) (envelope-from ) id 1U6uqR-0004QF-MW for control@debbugs.gnu.org; Sat, 16 Feb 2013 22:23:15 -0500 Date: Sat, 16 Feb 2013 22:23:15 -0500 Message-Id: Subject: control message for bug 11073 To: X-Mailer: mail (GNU Mailutils 2.1) From: Glenn Morris X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) close 11073 From unknown Tue Jun 17 20:11:34 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 17 Mar 2013 11:24:12 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator