From unknown Fri Jun 20 07:19:09 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#3745 <3745@debbugs.gnu.org> To: bug#3745 <3745@debbugs.gnu.org> Subject: Status: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment Reply-To: bug#3745 <3745@debbugs.gnu.org> Date: Fri, 20 Jun 2025 14:19:09 +0000 retitle 3745 23.0.95; emacs-23.0.95: unibyte-display-via-language-environme= nt reassign 3745 emacs submitter 3745 Jay Berkenbilt severity 3745 normal thanks From ejb@ql.org Thu Jul 2 18:40:19 2009 Received: (at submit) by emacsbugs.donarmstrong.com; 3 Jul 2009 01:40:20 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=0.1 required=4.0 tests=FOURLA autolearn=no version=3.2.5-bugs.debian.org_2005_01_02 Received: from fencepost.gnu.org (fencepost.gnu.org [140.186.70.10]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n631e9k2026774 for ; Thu, 2 Jul 2009 18:40:12 -0700 Received: from mx10.gnu.org ([199.232.76.166]:47778) by fencepost.gnu.org with esmtp (Exim 4.67) (envelope-from ) id 1MMXl3-0004vp-0k for emacs-pretest-bug@gnu.org; Thu, 02 Jul 2009 21:40:09 -0400 Received: from Debian-exim by monty-python.gnu.org with spam-scanned (Exim 4.60) (envelope-from ) id 1MMXkx-0007l1-3O for emacs-pretest-bug@gnu.org; Thu, 02 Jul 2009 21:40:06 -0400 Received: from hermes.mail.tigertech.net ([64.62.209.72]:51025) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1MMXkv-0007kO-V1 for emacs-pretest-bug@gnu.org; Thu, 02 Jul 2009 21:40:02 -0400 Received: from localhost (localhost [127.0.0.1]) by hermes.tigertech.net (Postfix) with ESMTP id 99123437E20 for ; Thu, 2 Jul 2009 18:39:59 -0700 (PDT) X-Virus-Scanned: Debian 
amavisd-new at hermes.tigertech.net Received: from glerbl (ip68-100-225-145.dc.dc.cox.net [68.100.225.145]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by hermes.tigertech.net (Postfix) with ESMTP id 5E328434029 for ; Thu, 2 Jul 2009 18:39:59 -0700 (PDT) Received: from soup ([10.160.59.17] helo=soup.q.qbilt.org) by glerbl with esmtp (Exim 4.69) (envelope-from ) id 1MMXks-0001H2-EA for emacs-pretest-bug@gnu.org; Thu, 02 Jul 2009 21:39:58 -0400 Received: from ejb by soup.q.qbilt.org with local (Exim 4.69) (envelope-from ) id 1MMXks-0004Fy-Di; Thu, 02 Jul 2009 21:39:58 -0400 From: Jay Berkenbilt To: emacs-pretest-bug@gnu.org Subject: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment Message-ID: <20090702213958.0458148346.qww314159@soup.q.qbilt.org> Date: Thu, 02 Jul 2009 21:39:58 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: ejb@q.qbilt.org X-SA-Exim-Scanned: No (on soup.q.qbilt.org); SAEximRunCond expanded to false X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 2) I have this habit of editing binary files in emacs. I notice a change in behavior in 23.0.95 (which is the first 23 pretest I've run) relative to what I've seen in emacs 22. Specifically, I no longer see most characters in unibyte mode. I'll be specific. xrdb -load /dev/null emacs-22 -q M-x set-variable unibyte-display-via-language-environment RET t RET M-x set-language-environment RET Latin-1 RET M-x find-file-literally RET /bin/ls RET In this case, I see ^x for characters between 0 and \037, the ASCII character for \040-\177, \ooo for (unprintable) characters between \200 and \237, and the ISO-Latin-1 character for \240 through \377, as expected. 
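
[Editor's sketch] The four display classes described in the paragraph above can be written as a small classifier. This is standalone C for illustration only, not Emacs source; the enum and function names are hypothetical, and the ranges are taken as stated in the report (which groups \177 with the ASCII range).

```c
#include <assert.h>

/* Hypothetical names; ranges follow the bug report, not the Emacs sources.  */
enum byte_display
{
  DISPLAY_CARET,   /* ^x notation:   \000-\037 */
  DISPLAY_ASCII,   /* ASCII glyph:   \040-\177 (as grouped in the report) */
  DISPLAY_OCTAL,   /* \ooo escape:   \200-\237 */
  DISPLAY_LATIN1   /* Latin-1 glyph: \240-\377 */
};

/* Classify one byte of a unibyte buffer into its expected display form.  */
static enum byte_display
classify_byte (unsigned int byte)
{
  assert (byte <= 0377);
  if (byte <= 037)
    return DISPLAY_CARET;
  if (byte <= 0177)
    return DISPLAY_ASCII;
  if (byte <= 0237)
    return DISPLAY_OCTAL;
  return DISPLAY_LATIN1;
}
```

Under these assumptions, a byte such as \351 falls in the last class and would be shown as a Latin-1 glyph rather than as an escape.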
With the same commands under emacs-23.0.95, I see ^x for \0 to \037, and I see some normal 7-bit ASCII characters, but for other ASCII characters and for everything \200 or above, I see various rectangles of various widths. I can still see the buffer the way I want to by doing C-x RET c iso-latin-1-unix RET C-x C-f /bin/ls which is, I suppose, pretty much the same thing, but it seems like the old behavior is right and the new behavior is probably a bug. Please let me know if there's any other information I should supply. ---------------------------------------------------------------------- In GNU Emacs 23.0.95.1 (i686-pc-linux-gnu, GTK+ Version 2.16.1) of 2009-06-25 on soup Windowing system distributor `The X.Org Foundation', version 11.0.10402000 Important settings: value of $LC_ALL: nil value of $LC_COLLATE: nil value of $LC_CTYPE: nil value of $LC_MESSAGES: nil value of $LC_MONETARY: nil value of $LC_NUMERIC: nil value of $LC_TIME: nil value of $LANG: en_US.UTF-8 value of $XMODIFIERS: nil locale-coding-system: utf-8-unix default-enable-multibyte-characters: t Major mode: Emacs-Lisp Minor modes in effect: which-function-mode: t tooltip-mode: t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t global-auto-composition-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t column-number-mode: t line-number-mode: t Recent input: C-h v s e t SPC l a n C-g C-h f s e t SPC a l a n e n C-x b C-s s e t - l a n g C-s C-g C-g C-x 1 C-l C-v C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-z C-o C-n C-o C-o ( s e t - l a n g M-/ SPC " L a t i n - 1 " ) C-x C-e C-x C-s C-x b l s C-x C-v C-x b C-x C-e C-x b C-x C-v C-x k C-x C-f / b i n l / l s M-x u n i b C-v C-v C-v C-v C-v M-> M-< C-n C-x u C-x u C-f C-f C-z C-v C-f C-f C-f C-f C-f C-b C-z C-v C-x b C-s s e t - l a n C-g C-g C-x C-f ~ / e l q f X r e f 
o n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n M-f M-f M-f C-f C-SPC C-e M-w C-x C-x M-w C-x C-f / b i n / l s C-v C-v C-v C-v C-v C-v C-v C-v C-v C-x k C-h f u n i b C-x o C-e M-b C-x m q C-g C-x k y e s M-x r e p o r t SPC e m SPC b SPC Recent messages: U+0020 U+0028 Quit Note: file is write protected Mark set Making completion list... Type C-x 1 to delete the help window. Note: file is write protected Quit Scanning for dabbrevs...100% -- Jay Berkenbilt From handa@m17n.org Thu Jul 2 23:42:32 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 3 Jul 2009 06:42:32 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-3.3 required=4.0 tests=AWL,HAS_BUG_NUMBER, SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n636gPk2014324 for <3745@emacsbugs.donarmstrong.com>; Thu, 2 Jul 2009 23:42:28 -0700 Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115]) by mx1.aist.go.jp with ESMTP id n636gOOC021398; Fri, 3 Jul 2009 15:42:24 +0900 (JST) env-from (handa@m17n.org) Received: from smtp1.aist.go.jp by rqsmtp1.aist.go.jp with ESMTP id n636gO8X011392; Fri, 3 Jul 2009 15:42:24 +0900 (JST) env-from (handa@m17n.org) Received: by smtp1.aist.go.jp with ESMTP id n636gNmX012610; Fri, 3 Jul 2009 15:42:23 +0900 (JST) env-from (handa@m17n.org) Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1MMcTX-0003QK-HN; Fri, 03 Jul 2009 15:42:23 +0900 From: Kenichi Handa To: Jay Berkenbilt , 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment In-Reply-To: <20090702213958.0458148346.qww314159@soup.q.qbilt.org> (message from Jay Berkenbilt on Thu, 02 Jul 2009 
21:39:58 -0400) References: <20090702213958.0458148346.qww314159@soup.q.qbilt.org> Date: Fri, 03 Jul 2009 15:42:23 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In article <20090702213958.0458148346.qww314159@soup.q.qbilt.org>, Jay Berkenbilt writes: > I have this habit of editing binary files in emacs. I notice a change > in behavior in 23.0.95 (which is the first 23 pretest I've run) relative > to what I've seen in emacs 22. Specifically, I no longer see most > characters in unibyte mode. I'll be specific. > xrdb -load /dev/null > emacs-22 -q > M-x set-variable unibyte-display-via-language-environment RET t RET > M-x set-language-environment RET Latin-1 RET > M-x find-file-literally RET /bin/ls RET > In this case, I see ^x for characters between 0 and \037, the ASCII > character for \040-\177, \ooo for (unprintable) characters between \200 > and \237, and the ISO-Latin-1 character for \240 through \377, as > expected. I confirmed the bug. The problem is that unibyte_char_to_multibyte now always returns an eight-bit multibyte-character. Now `charset_unibyte' is always 0 (i.e. the same as `charset_ascii'). So, unibyte->multibyte conversion always results in an eight-bit multibyte character. To fix the above problem, I propose these changes for 23.1 and the trunk. (1) Fix all codes accessing charset_unibyte (e.g. Funibyte_char_to_multibyte) not to refer to it. (2) Setup charset_unibyte correctly in Fset_charset_priority. (3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte, it->c) instead of unibyte_char_to_multibyte (it->c). Those changes are surely very safe. --- Kenichi Handa handa@m17n.org From cyd@stupidchicken.com Fri Jul 3 11:13:04 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 3 Jul 2009 18:13:04 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. 
hammytokens:Tokens not available. X-Spam-Status: No, score=-2.0 required=4.0 tests=AWL,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from pantheon-po32.its.yale.edu (pantheon-po32.its.yale.edu [130.132.50.88]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n63ID0Na005266 for <3745@emacsbugs.donarmstrong.com>; Fri, 3 Jul 2009 11:13:01 -0700 Received: from furry (dhcp128036014241.central.yale.edu [128.36.14.241]) (authenticated bits=0) by pantheon-po32.its.yale.edu (8.12.11.20060308/8.12.11) with ESMTP id n63ICs5f026665 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Fri, 3 Jul 2009 14:12:54 -0400 Received: by furry (Postfix, from userid 1000) id 9852CC0E1; Fri, 3 Jul 2009 10:26:15 -0400 (EDT) From: Chong Yidong To: Kenichi Handa Cc: 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment Date: Fri, 03 Jul 2009 10:26:15 -0400 Message-ID: <87bpo13ks8.fsf@stupidchicken.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-YaleITSMailFilter: Version 1.2c (attachment(s) not renamed) > Now `charset_unibyte' is always 0 (i.e. the same as `charset_ascii'). Is this variable obsolete, then? From cyd@stupidchicken.com Fri Jul 3 12:06:38 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 3 Jul 2009 19:06:39 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-2.0 required=4.0 tests=AWL,FOURLA,HAS_BUG_NUMBER autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from pantheon-po31.its.yale.edu (pantheon-po31.its.yale.edu [130.132.50.82]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n63J6Y1X016499 for <3745@emacsbugs.donarmstrong.com>; Fri, 3 Jul 2009 12:06:35 -0700 Received: from furry (dhcp128036014241.central.yale.edu [128.36.14.241]) (authenticated bits=0) by pantheon-po31.its.yale.edu (8.12.11.20060308/8.12.11) with ESMTP id n63J6Sew026204 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Fri, 3 Jul 2009 15:06:28 -0400 Received: by furry (Postfix, from userid 1000) id 8007EC09B; Fri, 3 Jul 2009 15:06:28 -0400 (EDT) From: Chong Yidong To: Kenichi Handa Cc: Jay Berkenbilt , 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment Date: Fri, 03 Jul 2009 15:06:28 -0400 Message-ID: <87y6r560y3.fsf@stupidchicken.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-YaleITSMailFilter: Version 1.2c (attachment(s) not renamed) > Now `charset_unibyte' is always 0 (i.e. the same as > `charset_ascii'). So, unibyte->multibyte conversion always > results in an eight-bit multibyte character. Looking through the code, I see that the variable `charset_unibyte' is not initialized properly. That's the only reason it's 0. We have to fix this for sure. > To fix the above problem, I propose these changes for 23.1 > and the trunk. > > (1) Fix all codes accessing charset_unibyte > (e.g. Funibyte_char_to_multibyte) not to refer to it. Can we use charset_iso_8859_1 instead of charset_unibyte, or add a line that says charset_unibyte = define_charset_internal (...); in syms_of_charset? > (2) Setup charset_unibyte correctly in Fset_charset_priority. > > (3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte, > it->c) instead of unibyte_char_to_multibyte (it->c). Number 3 is not a trivial change. 
IIUC, unibyte_char_to_multibyte is very fast. Changing it to use DECODE_CHAR may lead to a performance hit. From handa@m17n.org Sun Jul 5 17:52:06 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 6 Jul 2009 00:52:07 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-2.8 required=4.0 tests=AWL,FOURLA,HAS_BUG_NUMBER, IMPRONONCABLE_2,SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n660q17S022598 for <3745@emacsbugs.donarmstrong.com>; Sun, 5 Jul 2009 17:52:02 -0700 Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115]) by mx1.aist.go.jp with ESMTP id n660pxMt011161; Mon, 6 Jul 2009 09:51:59 +0900 (JST) env-from (handa@m17n.org) Received: from smtp3.aist.go.jp by rqsmtp1.aist.go.jp with ESMTP id n660pxHT028898; Mon, 6 Jul 2009 09:51:59 +0900 (JST) env-from (handa@m17n.org) Received: by smtp3.aist.go.jp with ESMTP id n660pwUN013444; Mon, 6 Jul 2009 09:51:58 +0900 (JST) env-from (handa@m17n.org) Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1MNcR4-00033N-7V; Mon, 06 Jul 2009 09:51:58 +0900 From: Kenichi Handa To: Chong Yidong Cc: 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment In-Reply-To: <87bpo13ks8.fsf@stupidchicken.com> (message from Chong Yidong on Fri, 03 Jul 2009 10:26:15 -0400) References: <87bpo13ks8.fsf@stupidchicken.com> Date: Mon, 06 Jul 2009 09:51:58 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In article <87bpo13ks8.fsf@stupidchicken.com>, Chong Yidong writes: > > Now `charset_unibyte' is always 0 (i.e. the same as `charset_ascii'). > Is this variable obsolete, then? 
Yes, at the moment.  But, I'd like to use it for
unibyte-display-via-language-environment.

In article <87y6r560y3.fsf@stupidchicken.com>, Chong Yidong writes:

> > Now `charset_unibyte' is always 0 (i.e. the same as
> > `charset_ascii').  So, unibyte->multibyte conversion always
> > results in an eight-bit multibyte character.

> Looking through the code, I see that the variable `charset_unibyte' is
> not initialized properly.  That's the only reason it's 0.  We have to
> fix this for sure.

Yes.

> > To fix the above problem, I propose these changes for 23.1
> > and the trunk.
> >
> > (1) Fix all codes accessing charset_unibyte
> >     (e.g. Funibyte_char_to_multibyte) not to refer to it.

> Can we use charset_iso_8859_1 instead of charset_unibyte, or add a line
> that says

>   charset_unibyte
>     = define_charset_internal (...);

> in syms_of_charset?

No.  Stefan's change was to make unibyte-char-to-multibyte (and
unibyte_char_to_multibyte) always return an 8-bit char for an
8-bit byte.  To do that, charset_unibyte must be the same as
charset_ascii, but, first of all, we don't have to use
charset_unibyte in such an operation.  We can simply use
BYTE8_TO_CHAR.

> > (2) Setup charset_unibyte correctly in Fset_charset_priority.
> >
> > (3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte,
> >     it->c) instead of unibyte_char_to_multibyte (it->c).

> Number 3 is not a trivial change.  IIUC, unibyte_char_to_multibyte is
> very fast.  Changing it to use DECODE_CHAR may lead to a performance
> hit.

But, using unibyte_char_to_multibyte here is a clear bug.
If the overhead of DECODE_CHAR is intolerable (I don't
believe it is), we can do this:

(1) Modify unibyte_char_to_multibyte to use BYTE8_TO_CHAR
    instead of the table unibyte_to_multibyte_table.

(2) Set up unibyte_to_multibyte_table for unibyte_charset.

(3) Just look up that table in x_produce_glyphs.
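
[Editor's sketch] The three-step alternative listed above can be illustrated in standalone C. This is not Emacs source: the function and table names are hypothetical, the eight-bit offset 0x3FFF00 is assumed to mirror BYTE8_TO_CHAR in character.h, and ISO-8859-1 is assumed as the unibyte charset, so that each byte decodes to the code point of the same value.

```c
#include <assert.h>

/* Assumed to mirror Emacs's BYTE8_TO_CHAR: raw bytes 0x80..0xFF map to
   "eight-bit" characters at this offset.  */
#define BYTE8_OFFSET 0x3FFF00

/* Step (1): plain unibyte->multibyte conversion consults no table.  */
static int
unibyte_to_eight_bit (int byte)
{
  assert (byte >= 0 && byte < 256);
  return byte < 0x80 ? byte : byte + BYTE8_OFFSET;
}

/* Step (2): a hypothetical per-charset table used only for display.  */
static int display_table[256];

/* Fill the table for ISO-8859-1, whose codes equal their code points.  */
static void
setup_display_table_latin1 (void)
{
  int i;

  for (i = 0; i < 256; i++)
    display_table[i] = i;
}

/* Step (3): the display code only indexes the table.  */
static int
char_to_display (int byte)
{
  assert (byte >= 0 && byte < 256);
  return display_table[byte];
}
```

The point of the split is that conversion and display stop sharing one table: byte 0xE9 converts to an eight-bit character, while the display path can still show it as U+00E9.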
--- Kenichi Handa handa@m17n.org From handa@m17n.org Sun Jul 5 23:51:07 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 6 Jul 2009 06:51:07 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-2.7 required=4.0 tests=AWL,FOURLA,FVGT_m_MULTI_ODD, HAS_BUG_NUMBER,IMPRONONCABLE_2,MURPHY_DRUGS_REL8,SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n666p0va019072 for <3745@emacsbugs.donarmstrong.com>; Sun, 5 Jul 2009 23:51:02 -0700 Received: from rqsmtp1.aist.go.jp (rqsmtp1.aist.go.jp [150.29.254.115]) by mx1.aist.go.jp with ESMTP id n666oxHd006859; Mon, 6 Jul 2009 15:50:59 +0900 (JST) env-from (handa@m17n.org) Received: from smtp3.aist.go.jp by rqsmtp1.aist.go.jp with ESMTP id n666oxwb011002; Mon, 6 Jul 2009 15:50:59 +0900 (JST) env-from (handa@m17n.org) Received: by smtp3.aist.go.jp with ESMTP id n666owei004456; Mon, 6 Jul 2009 15:50:58 +0900 (JST) env-from (handa@m17n.org) Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1MNi2U-0003xo-Pz; Mon, 06 Jul 2009 15:50:58 +0900 From: Kenichi Handa To: 3745@debbugs.gnu.org Cc: cyd@stupidchicken.com, 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment In-Reply-To: (message from Kenichi Handa on Mon, 06 Jul 2009 09:51:58 +0900) References: <87bpo13ks8.fsf@stupidchicken.com> Date: Mon, 06 Jul 2009 15:50:58 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In article , Kenichi Handa writes: > But, using unibyte_char_to_multibyte here is a clear bug. 
> If the overhead of DECODE_CHAR is intolerable (I don't
> believe it is), we can do this:

> (1) Modify unibyte_char_to_multibyte to use BYTE8_TO_CHAR
>     instead of the table unibyte_to_multibyte_table.

> (2) Set up unibyte_to_multibyte_table for unibyte_charset.

> (3) Just look up that table in x_produce_glyphs.

To minimize the changes, I made the attached patch.  It
doesn't touch unibyte_to_multibyte_table, but introduces
charset_unibyte_decoder[128].  I confirmed it didn't make
the display code slow.

---
Kenichi Handa
handa@m17n.org

Index: character.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/character.c,v
retrieving revision 1.24
diff -u -r1.24 character.c
--- character.c	5 Feb 2009 08:46:52 -0000	1.24
+++ character.c	6 Jul 2009 06:42:31 -0000
@@ -90,9 +90,9 @@
 /* Mapping table from unibyte chars to multibyte chars.  */
 int unibyte_to_multibyte_table[256];
 
-/* Nth element is 1 iff unibyte char N can be mapped to a multibyte
-   char.  */
-char unibyte_has_multibyte_table[256];
+/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
+   Nth element is for the code (N-0x80).  */
+int charset_unibyte_decoder[128];
 
 
 
@@ -270,9 +270,8 @@
   return c;
 }
 
-/* Convert the multibyte character C to unibyte 8-bit character based
-   on the current value of charset_unibyte.  If dimension of
-   charset_unibyte is more than one, return (C & 0xFF).
+/* Convert ASCII or 8-bit character C to unibyte.  If C is none of
+   them, return (C & 0xFF).
 
    The argument REV_TBL is now ignored.  It will be removed in the
    future.  */
@@ -282,14 +281,11 @@
      int c;
      Lisp_Object rev_tbl;
 {
-  struct charset *charset;
-  unsigned c1;
-
+  if (c < 0x80)
+    return c;
   if (CHAR_BYTE8_P (c))
     return CHAR_TO_BYTE8 (c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c1 = ENCODE_CHAR (charset, c);
-  return ((c1 != CHARSET_INVALID_CODE (charset)) ? c1 : c & 0xFF);
+  return (c & 0xFF);
 }
 
 /* Like multibyte_char_to_unibyte, but return -1 if C is not supported
@@ -302,11 +298,11 @@
   struct charset *charset;
   unsigned c1;
 
+  if (c < 0x80)
+    return c;
   if (CHAR_BYTE8_P (c))
     return CHAR_TO_BYTE8 (c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c1 = ENCODE_CHAR (charset, c);
-  return ((c1 != CHARSET_INVALID_CODE (charset)) ? c1 : -1);
+  return -1;
 }
 
 DEFUN ("characterp", Fcharacterp, Scharacterp, 1, 2, 0,
@@ -337,10 +333,8 @@
   c = XFASTINT (ch);
   if (c >= 0400)
     error ("Invalid unibyte character: %d", c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c = DECODE_CHAR (charset, c);
-  if (c < 0)
-    c = BYTE8_TO_CHAR (XFASTINT (ch));
+  if (c >= 0x80)
+    c = BYTE8_TO_CHAR (c);
   return make_number (c);
 }
 
Index: character.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/character.h,v
retrieving revision 1.15
diff -u -r1.15 character.h
--- character.h	8 Jan 2009 03:15:27 -0000	1.15
+++ character.h	6 Jul 2009 06:42:31 -0000
@@ -87,11 +87,15 @@
 #define unibyte_char_to_multibyte(c)	\
   ((c) < 256 ? unibyte_to_multibyte_table[(c)] : (c))
 
-/* Nth element is 1 iff unibyte char N can be mapped to a multibyte
-   char.  */
-extern char unibyte_has_multibyte_table[256];
-
-#define UNIBYTE_CHAR_HAS_MULTIBYTE_P(c) (unibyte_has_multibyte_table[(c)])
+/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
+   Nth element is for the code (N-0x80).  */
+extern int charset_unibyte_decoder[128];
+
+/* Return a character corresponding to the code BYTE of
+   charset_unibyte.  BYTE must be a byte; i.e. less than 0x100.  If
+   BYTE is not a valid code of charset_unibyte, return -1.  */
+#define DECODE_UNIBYTE(BYTE)	\
+  ((BYTE) < 0x80 ? (int) (BYTE) : charset_unibyte_decoder[(BYTE) - 0x80])
 
 /* If C is not ASCII, make it unibyte. */
 #define MAKE_CHAR_UNIBYTE(c)	\
Index: charset.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/charset.c,v
retrieving revision 1.179
diff -u -r1.179 charset.c
--- charset.c	9 Jun 2009 02:53:07 -0000	1.179
+++ charset.c	6 Jul 2009 06:42:32 -0000
@@ -2260,6 +2260,7 @@
   Vcharset_ordered_list = Fnconc (2, arglist);
   charset_ordered_list_tick++;
 
+  charset_unibyte = -1;
   for (old_list = Vcharset_ordered_list, list_2022 = list_emacs_mule = Qnil;
        CONSP (old_list); old_list = XCDR (old_list))
     {
@@ -2267,9 +2268,25 @@
 	list_2022 = Fcons (XCAR (old_list), list_2022);
       if (! NILP (Fmemq (XCAR (old_list), Vemacs_mule_charset_list)))
 	list_emacs_mule = Fcons (XCAR (old_list), list_emacs_mule);
+      if (charset_unibyte < 0)
+	{
+	  struct charset *charset = CHARSET_FROM_ID (XINT (XCAR (old_list)));
+
+	  if (CHARSET_DIMENSION (charset) == 1
+	      && CHARSET_ASCII_COMPATIBLE_P (charset)
+	      && CHARSET_MAX_CHAR (charset) >= 0x80)
+	    charset_unibyte = CHARSET_ID (charset);
+	}
     }
   Viso_2022_charset_list = Fnreverse (list_2022);
   Vemacs_mule_charset_list = Fnreverse (list_emacs_mule);
+  if (charset_unibyte < 0)
+    charset_unibyte = charset_iso_8859_1;
+  {
+    struct charset *charset = CHARSET_FROM_ID (charset_unibyte);
+    for (i = 128; i < 256; i++)
+      charset_unibyte_decoder[i - 128] = DECODE_CHAR (charset, i);
+  }
 
   return Qnil;
 }
@@ -2328,6 +2345,10 @@
     unibyte_to_multibyte_table[i] = i;
   for (; i < 256; i++)
     unibyte_to_multibyte_table[i] = BYTE8_TO_CHAR (i);
+  for (i = 0; i < 32; i++)
+    charset_unibyte_decoder[i] = -1;
+  for (; i < 128; i++)
+    charset_unibyte_decoder[i] = 128 + i;
 }
 
 #ifdef emacs
@@ -2429,6 +2450,7 @@
     = define_charset_internal (Qeight_bit, 1, "\x80\xFF\x00\x00\x00\x00",
 			       128, 255, -1, 0, -1, 0, 1,
 			       MAX_5_BYTE_CHAR + 1);
+  charset_unibyte = charset_iso_8859_1;
 }
 
 #endif /* emacs */
Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1288
diff -u -r1.1288 xdisp.c
--- xdisp.c	18 Jun 2009 09:49:07 -0000	1.1288
+++ xdisp.c	6 Jul 2009 06:42:34 -0000
@@ -5743,7 +5743,7 @@
 			   || it->c == 0xAD /* SOFT HYPHEN */)))
 	       : (it->c >= 127
 		  && (! unibyte_display_via_language_environment
-		      || (UNIBYTE_CHAR_HAS_MULTIBYTE_P (it->c)))))))
+		      || (DECODE_UNIBYTE (it->c) <= 0xA0))))))
 	    {
 	      /* IT->c is a control character which must be displayed
 		 either as '\003' or as `^C' where the '\\' and '^'
@@ -21196,9 +21196,8 @@
 	{
 	  if (SINGLE_BYTE_CHAR_P (it->c)
 	      && unibyte_display_via_language_environment)
-	    it->char_to_display = unibyte_char_to_multibyte (it->c);
-	  if (! SINGLE_BYTE_CHAR_P (it->char_to_display))
 	    {
+	      it->char_to_display = DECODE_UNIBYTE (it->c);
 	      it->multibyte_p = 1;
 	      it->face_id = FACE_FOR_CHAR (it->f, face,
 					   it->char_to_display, -1, Qnil);

From cyd@stupidchicken.com Mon Jul 6 07:04:22 2009
Received: (at 3745) by emacsbugs.donarmstrong.com; 6 Jul 2009 14:04:22 +0000
X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available.
X-Spam-Status: No, score=-1.9 required=4.0 tests=AWL,FOURLA,HAS_BUG_NUMBER, MURPHY_DRUGS_REL8 autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from pantheon-po34.its.yale.edu (pantheon-po34.its.yale.edu [130.132.50.80]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n66E4IC9002754 for <3745@emacsbugs.donarmstrong.com>; Mon, 6 Jul 2009 07:04:20 -0700 Received: from furry (dhcp128036014241.central.yale.edu [128.36.14.241]) (authenticated bits=0) by pantheon-po34.its.yale.edu (8.12.11.20060308/8.12.11) with ESMTP id n66E42AK030721 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Mon, 6 Jul 2009 10:04:08 -0400 Received: by furry (Postfix, from userid 1000) id F3B23C09B; Mon, 6 Jul 2009 10:03:58 -0400 (EDT) From: Chong Yidong To: Kenichi Handa Cc: 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment References: <87bpo13ks8.fsf@stupidchicken.com> Date: Mon, 06 Jul 2009 10:03:58 -0400 In-Reply-To: (Kenichi Handa's message of "Mon, 06 Jul 2009 15:50:58 +0900") Message-ID: <87my7h6h81.fsf@stupidchicken.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-YaleITSMailFilter: Version 1.2c (attachment(s) not renamed) Kenichi Handa writes: > To minimize the changes, I made the attached patch. It > doesn't touch unibyte_to_multibyte_table, but introduced > charset_unibyte_decoder[128]. I confirmed it didn't make > the display code slow. > @@ -302,11 +298,11 @@ > struct charset *charset; > unsigned c1; > > + if (c < 0x80) > + return c; > if (CHAR_BYTE8_P (c)) > return CHAR_TO_BYTE8 (c); You should also delete the unused `charset' and `c1' variables in this block. Other than that, these changes look good. Thanks very much for making this patch, and please install on the branch ASAP. For the trunk, I agree that we should try using use DECODE_CHAR in x_produce_glyphs. 
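
[Editor's sketch] With the unused `charset' and `c1' variables removed as the review above asks, the "safe" conversion reduces to three cases. The standalone C below only illustrates that control flow; it is not the Emacs function. The constants are assumed to mirror MAX_CHAR, MAX_5_BYTE_CHAR and the eight-bit offset in character.h, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed to mirror Emacs's character.h.  */
#define SKETCH_MAX_CHAR        0x3FFFFF
#define SKETCH_MAX_5_BYTE_CHAR 0x3FFF7F
#define SKETCH_BYTE8_OFFSET    0x3FFF00

/* Nonzero if C is an "eight-bit" character, i.e. a raw byte 0x80..0xFF.  */
static int
sketch_char_byte8_p (int c)
{
  return c > SKETCH_MAX_5_BYTE_CHAR;
}

/* Convert multibyte character C to a byte, or return -1 if C is
   neither ASCII nor an eight-bit character.  No local variables are
   needed once the charset lookup is gone.  */
static int
sketch_multibyte_to_unibyte_safe (int c)
{
  assert (c >= 0 && c <= SKETCH_MAX_CHAR);
  if (c < 0x80)
    return c;
  if (sketch_char_byte8_p (c))
    return c - SKETCH_BYTE8_OFFSET;
  return -1;
}
```

Note that under this logic U+00E9 itself yields -1: only ASCII and raw eight-bit characters have a unibyte form after the change.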
From handa@m17n.org Mon Jul 6 23:28:17 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 7 Jul 2009 06:28:17 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. X-Spam-Status: No, score=-2.7 required=4.0 tests=AWL,FOURLA,HAS_BUG_NUMBER, IMPRONONCABLE_2,MURPHY_DRUGS_REL8,SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n676SC1Q016216 for <3745@emacsbugs.donarmstrong.com>; Mon, 6 Jul 2009 23:28:14 -0700 Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123]) by mx1.aist.go.jp with ESMTP id n676SAh4027848; Tue, 7 Jul 2009 15:28:11 +0900 (JST) env-from (handa@m17n.org) Received: from smtp2.aist.go.jp by rqsmtp2.aist.go.jp with ESMTP id n676SAwp021304; Tue, 7 Jul 2009 15:28:10 +0900 (JST) env-from (handa@m17n.org) Received: by smtp2.aist.go.jp with ESMTP id n676SAa6009545; Tue, 7 Jul 2009 15:28:10 +0900 (JST) env-from (handa@m17n.org) Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1MO49y-0007M7-Ax; Tue, 07 Jul 2009 15:28:10 +0900 From: Kenichi Handa To: Chong Yidong Cc: 3745@debbugs.gnu.org Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment In-Reply-To: <87my7h6h81.fsf@stupidchicken.com> (message from Chong Yidong on Mon, 06 Jul 2009 10:03:58 -0400) References: <87bpo13ks8.fsf@stupidchicken.com> <87my7h6h81.fsf@stupidchicken.com> Date: Tue, 07 Jul 2009 15:28:10 +0900 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In article <87my7h6h81.fsf@stupidchicken.com>, Chong Yidong writes: > Kenichi Handa writes: > > To minimize the changes, I made the attached patch. 
It > > doesn't touch unibyte_to_multibyte_table, but introduced > > charset_unibyte_decoder[128]. I confirmed it didn't make > > the display code slow. > > @@ -302,11 +298,11 @@ > > struct charset *charset; > > unsigned c1; > > > > + if (c < 0x80) > > + return c; > > if (CHAR_BYTE8_P (c)) > > return CHAR_TO_BYTE8 (c); > You should also delete the unused `charset' and `c1' variables in this > block. Ah, yes. > Other than that, these changes look good. Thanks very much for making > this patch, and please install on the branch ASAP. > For the trunk, I agree that we should try using use DECODE_CHAR in > x_produce_glyphs. Ok, done. I also installed this change of reset-language-environment for completion. --- mule-cmds.el 8 Apr 2009 18:03:17 -0000 1.360 +++ mule-cmds.el 7 Jul 2009 05:59:18 -0000 1.360.2.1 @@ -1794,6 +1794,11 @@ (coding-system-error 'iso-latin-1)))) (setq default-process-coding-system (cons output-coding input-coding))) + ;; Put the highest priority to the charset iso-8859-1 to prefer the + ;; registry iso8859-1 over iso8859-2 in font selection. It also + ;; makes unibyte-display-via-language-environment to use iso-8859-1 + ;; as the unibyte charset. + (set-charset-priority 'iso-8859-1) ;; Don't alter the terminal and keyboard coding systems here. ;; The terminal still supports the same coding system --- Kenichi Handa handa@m17n.org From schwab@linux-m68k.org Tue Jul 7 05:33:38 2009 Received: (at 3745) by emacsbugs.donarmstrong.com; 7 Jul 2009 12:33:38 +0000 X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu X-Spam-Level: X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available. 
X-Spam-Status: No, score=-3.0 required=4.0 tests=HAS_BUG_NUMBER,SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02 Received: from mx2.redhat.com (mx2.redhat.com [66.187.237.31]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n67CXWGu016819 for <3745@emacsbugs.donarmstrong.com>; Tue, 7 Jul 2009 05:33:34 -0700 Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26]) by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n67CXVxK012355; Tue, 7 Jul 2009 08:33:31 -0400 Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199]) by int-mx2.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n67CXUlU025104; Tue, 7 Jul 2009 08:33:31 -0400 Received: from hase.home (dhcp-64-164.muc.redhat.com [10.32.64.164] (may be forged)) by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id n67CXP6u015724; Tue, 7 Jul 2009 08:33:27 -0400 From: Andreas Schwab To: Kenichi Handa Cc: 3745@debbugs.gnu.org, cyd@stupidchicken.com Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment References: <87bpo13ks8.fsf@stupidchicken.com> X-Yow: Here I am in the POSTERIOR OLFACTORY LOBULE but I don't see CARL SAGAN anywhere!! Date: Tue, 07 Jul 2009 14:33:24 +0200 In-Reply-To: (Kenichi Handa's message of "Mon, 06 Jul 2009 15:50:58 +0900") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.95 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Scanned-By: MIMEDefang 2.58 on 172.16.27.26 Kenichi Handa writes: > +/* Decoding table for 8-bit byte codes of the charset charset_unibyte. > + Nth element is for the code (N-0x80). */ You probably mean (N+0x80). > +int charset_unibyte_decoder[128]; Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." 
From handa@m17n.org Tue Jul 7 05:45:08 2009
Received: (at 3745) by emacsbugs.donarmstrong.com; 7 Jul 2009 12:45:08 +0000
X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available.
X-Spam-Status: No, score=-3.3 required=4.0 tests=AWL,HAS_BUG_NUMBER, SPF_HELO_PASS autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02
Received: from mx1.aist.go.jp (mx1.aist.go.jp [150.29.246.133]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n67Cj28o018595 for <3745@emacsbugs.donarmstrong.com>; Tue, 7 Jul 2009 05:45:04 -0700
Received: from rqsmtp2.aist.go.jp (rqsmtp2.aist.go.jp [150.29.254.123]) by mx1.aist.go.jp with ESMTP id n67Cj1GJ006654; Tue, 7 Jul 2009 21:45:01 +0900 (JST) env-from (handa@m17n.org)
Received: from smtp4.aist.go.jp by rqsmtp2.aist.go.jp with ESMTP id n67Cj19J005305; Tue, 7 Jul 2009 21:45:01 +0900 (JST) env-from (handa@m17n.org)
Received: by smtp4.aist.go.jp with ESMTP id n67Cj0Yq017690; Tue, 7 Jul 2009 21:45:00 +0900 (JST) env-from (handa@m17n.org)
Received: from handa by etlken with local (Exim 4.69) (envelope-from ) id 1MOA2e-0008F5-Cn; Tue, 07 Jul 2009 21:45:00 +0900
From: Kenichi Handa
To: Andreas Schwab
Cc: 3745@debbugs.gnu.org, cyd@stupidchicken.com
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
In-Reply-To: (message from Andreas Schwab on Tue, 07 Jul 2009 14:33:24 +0200)
References: <87bpo13ks8.fsf@stupidchicken.com>
Date: Tue, 07 Jul 2009 21:45:00 +0900
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

In article , Andreas Schwab writes:

> Kenichi Handa writes:
> > +/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
> > +   Nth element is for the code (N-0x80).  */

> You probably mean (N+0x80).

Yes!  Just fixed, thank you.
---
Kenichi Handa
handa@m17n.org

From cyd@stupidchicken.com Wed Jul 8 07:01:46 2009
Received: (at control) by emacsbugs.donarmstrong.com; 8 Jul 2009 14:01:46 +0000
X-Spam-Checker-Version: SpamAssassin 3.2.5-bugs.debian.org_2005_01_02 (2008-06-10) on rzlab.ucr.edu
X-Spam-Level:
X-Spam-Bayes: score:0.5 Bayes not run. spammytokens:Tokens not available. hammytokens:Tokens not available.
X-Spam-Status: No, score=-0.4 required=4.0 tests=AWL autolearn=ham version=3.2.5-bugs.debian.org_2005_01_02
Received: from pantheon-po31.its.yale.edu (pantheon-po31.its.yale.edu [130.132.50.82]) by rzlab.ucr.edu (8.14.3/8.14.3/Debian-5) with ESMTP id n68E1gHq029331 for ; Wed, 8 Jul 2009 07:01:44 -0700
Received: from furry (dhcp128036014241.central.yale.edu [128.36.14.241]) (authenticated bits=0) by pantheon-po31.its.yale.edu (8.12.11.20060308/8.12.11) with ESMTP id n68E1bJP002915 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 8 Jul 2009 10:01:37 -0400
Received: by furry (Postfix, from userid 1000) id 84B71C09B; Wed, 8 Jul 2009 10:01:37 -0400 (EDT)
From: Chong Yidong
To: control@debbugs.gnu.org
Subject: close 3745
Date: Wed, 08 Jul 2009 10:01:37 -0400
Message-ID: <87my7f6zpa.fsf@stupidchicken.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-YaleITSMailFilter: Version 1.2c (attachment(s) not renamed)

close 3745
thanks

From unknown Fri Jun 20 07:19:09 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: $requester
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 05 Aug 2009 14:24:10 +0000
User-Agent: Fakemail v42.6.9

# A New Hope
# A log time ago, in a galaxy far, far away
# something happened.
#
# Magically this resulted in the following
# action being taken, but this fake control
# message doesn't tell you why it happened
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator