GNU bug report logs - #3745
23.0.95; emacs-23.0.95: unibyte-display-via-language-environment

Package: emacs

Reported by: Jay Berkenbilt <ejb <at> ql.org>

Date: Fri, 3 Jul 2009 01:45:04 UTC

Severity: normal

Done: Chong Yidong <cyd <at> stupidchicken.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 3745 in the body.
You can then email your comments to 3745 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3745; Package emacs. (Fri, 03 Jul 2009 01:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jay Berkenbilt <ejb <at> ql.org>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 03 Jul 2009 01:45:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Jay Berkenbilt <ejb <at> ql.org>
To: emacs-pretest-bug <at> gnu.org
Subject: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Thu, 02 Jul 2009 21:39:58 -0400

I have this habit of editing binary files in emacs.  I notice a change
in behavior in 23.0.95 (which is the first 23 pretest I've run) relative
to what I've seen in emacs 22.  Specifically, I no longer see most
characters in unibyte mode.  I'll be specific.

xrdb -load /dev/null

emacs-22 -q
M-x set-variable unibyte-display-via-language-environment RET t RET
M-x set-language-environment RET Latin-1 RET
M-x find-file-literally RET /bin/ls RET

In this case, I see ^x for characters between 0 and \037, the ASCII
character for \040-\177, \ooo for (unprintable) characters between \200
and \237, and the ISO-Latin-1 character for \240 through \377, as
expected.
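As an illustration only, the four display classes described above can be sketched in C. This is not Emacs code: `classify_byte`, `format_byte`, and the `DISPLAY_*` names are hypothetical, and the ranges are taken literally from the report (so \177 is grouped with the ASCII range, as the report does). The Latin-1 case prints the code point rather than a glyph, relying on the fact that ISO-8859-1 byte values equal their Unicode code points.

```c
#include <stdio.h>

/* Hypothetical names; the ranges follow the report above, not Emacs source. */
enum byte_display {
  DISPLAY_CARET,   /* \000-\037: shown as ^x                  */
  DISPLAY_ASCII,   /* \040-\177: shown as the ASCII character */
  DISPLAY_OCTAL,   /* \200-\237: shown as \ooo                */
  DISPLAY_LATIN1   /* \240-\377: shown as the Latin-1 glyph   */
};

static enum byte_display classify_byte(unsigned char byte)
{
  if (byte < 040)
    return DISPLAY_CARET;
  if (byte < 0200)
    return DISPLAY_ASCII;
  if (byte < 0240)
    return DISPLAY_OCTAL;
  return DISPLAY_LATIN1;
}

/* Write a textual stand-in for BYTE into BUF (SIZE should be at least 8). */
static void format_byte(unsigned char byte, char *buf, size_t size)
{
  switch (classify_byte(byte))
    {
    case DISPLAY_CARET:
      /* Control character N is written as ^ followed by N + 0100.  */
      snprintf(buf, size, "^%c", byte ^ 0100);
      break;
    case DISPLAY_ASCII:
      snprintf(buf, size, "%c", byte);
      break;
    case DISPLAY_OCTAL:
      snprintf(buf, size, "\\%03o", (unsigned) byte);
      break;
    case DISPLAY_LATIN1:
      /* In ISO-8859-1 the byte value is also the Unicode code point.  */
      snprintf(buf, size, "U+%04X", (unsigned) byte);
      break;
    }
}
```

Calling `format_byte (3, buf, sizeof buf)` leaves `^C` in `buf`; byte 0351 falls in the Latin-1 class and is rendered as its code point.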

With the same commands under emacs-23.0.95, I see ^x for \0 to \037, and
I see some normal 7-bit ASCII characters, but for other ASCII characters
and for everything \200 or above, I see various rectangles of various
widths.  I can still see the buffer the way I want to by doing

C-x RET c iso-latin-1-unix RET C-x C-f /bin/ls

which is, I suppose, pretty much the same thing, but it seems like the
old behavior is right and the new behavior is probably a bug.  Please
let me know if there's any other information I should supply.

----------------------------------------------------------------------

In GNU Emacs 23.0.95.1 (i686-pc-linux-gnu, GTK+ Version 2.16.1)
 of 2009-06-25 on soup
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Emacs-Lisp

Minor modes in effect:
  which-function-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t

Recent input:
C-h v s e t SPC l a n <tab> C-g C-h f s e t SPC a <backspace> 
l <tab> a n <tab> e n <tab> <return> C-x b <return> 
C-s s e t - l a n g C-s C-g C-g C-x 1 C-l C-v C-n C-n 
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n 
C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n C-n 
C-n C-n C-n C-n C-n C-n C-z C-o C-n C-o C-o <tab> ( 
s e t - l a n g M-/ <backspace> SPC " L a t i n - 1 
" ) C-x C-e C-x C-s C-x b l s <tab> <return> C-x C-v 
<return> C-x b <return> C-x C-e C-x b <return> C-x 
C-v <return> C-x k <return> C-x C-f / b i n l <backspace> 
/ l s <tab> <return> M-x u n i <tab> b <tab> <return> 
C-v C-v C-v C-v C-v M-> M-< C-n C-x u C-x u C-f C-f 
C-z C-v C-f C-f C-f C-f C-f C-b C-z C-v C-x b <return> 
C-s s e t - l a n C-g C-g C-x C-f ~ / e l <tab> q f 
<tab> <M-backspace> <M-backspace> X r e <tab> f o n 
<tab> <return> C-n C-n C-n C-n C-n C-n C-n C-n C-n 
C-n C-n C-n C-n M-f M-f M-f C-f C-SPC C-e M-w C-x C-x 
M-w C-x C-f / b i n / l s <return> C-v C-v C-v C-v 
C-v C-v C-v C-v C-v C-x k <return> C-h f u n i <tab> 
b <tab> <return> C-x o C-e M-b <return> C-x m q <tab> 
C-g C-x k <return> y e s <return> M-x r e p o r t SPC 
e m SPC b SPC <return>

Recent messages:
U+0020
U+0028
Quit
Note: file is write protected
Mark set
Making completion list...
Type C-x 1 to delete the help window.
Note: file is write protected
Quit
Scanning for dabbrevs...100%

-- 
Jay Berkenbilt <ejb <at> ql.org>

Message #10 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Jay Berkenbilt <ejb <at> ql.org>, 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95;
	emacs-23.0.95: unibyte-display-via-language-environment
Date: Fri, 03 Jul 2009 15:42:23 +0900

In article <20090702213958.0458148346.qww314159 <at> soup.q.qbilt.org>, Jay Berkenbilt <ejb <at> ql.org> writes:

> I have this habit of editing binary files in emacs.  I notice a change
> in behavior in 23.0.95 (which is the first 23 pretest I've run) relative
> to what I've seen in emacs 22.  Specifically, I no longer see most
> characters in unibyte mode.  I'll be specific.

> xrdb -load /dev/null

> emacs-22 -q
> M-x set-variable unibyte-display-via-language-environment RET t RET
> M-x set-language-environment RET Latin-1 RET
> M-x find-file-literally RET /bin/ls RET

> In this case, I see ^x for characters between 0 and \037, the ASCII
> character for \040-\177, \ooo for (unprintable) characters between \200
> and \237, and the ISO-Latin-1 character for \240 through \377, as
> expected.

I confirmed the bug.  The problem is that
unibyte_char_to_multibyte now always returns an eight-bit
multibyte-character.

Now `charset_unibyte' is always 0 (i.e. the same as
`charset_ascii').  So, unibyte->multibyte conversion always
results in an eight-bit multibyte character.

To fix the above problem, I propose these changes for 23.1
and the trunk.

(1) Fix all codes accessing charset_unibyte
(e.g. Funibyte_char_to_multibyte) not to refer to it.

(2) Setup charset_unibyte correctly in Fset_charset_priority.

(3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte,
    it->c) instead of unibyte_char_to_multibyte (it->c).

Those changes are surely very safe.

---
Kenichi Handa
handa <at> m17n.org

Message #15 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Fri, 03 Jul 2009 10:26:15 -0400

> Now `charset_unibyte' is always 0 (i.e. the same as `charset_ascii').

Is this variable obsolete, then?

Message #20 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa <handa <at> m17n.org>
Cc: Jay Berkenbilt <ejb <at> ql.org>, 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Fri, 03 Jul 2009 15:06:28 -0400

> Now `charset_unibyte' is always 0 (i.e. the same as
> `charset_ascii').  So, unibyte->multibyte conversion always
> results in an eight-bit multibyte character.

Looking through the code, I see that the variable `charset_unibyte' is
not initialized properly.  That's the only reason it's 0.  We have to
fix this for sure.

> To fix the above problem, I propose these changes for 23.1
> and the trunk.
>
> (1) Fix all codes accessing charset_unibyte
> (e.g. Funibyte_char_to_multibyte) not to refer to it.

Can we use charset_iso_8859_1 instead of charset_unibyte, or add a line
that says

  charset_unibyte
    = define_charset_internal (...);

in syms_of_charset?

> (2) Setup charset_unibyte correctly in Fset_charset_priority.
>
> (3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte,
>     it->c) instead of unibyte_char_to_multibyte (it->c).

Number 3 is not a trivial change.  IIUC, unibyte_char_to_multibyte is
very fast.  Changing it to use DECODE_CHAR may lead to a performance
hit.

Message #25 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Mon, 06 Jul 2009 09:51:58 +0900

In article <87bpo13ks8.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:

> > Now `charset_unibyte' is always 0 (i.e. the same as `charset_ascii').
> Is this variable obsolete, then?

Yes, at the moment.  But, I'd like to use it for
unibyte-display-via-language-environment.

In article <87y6r560y3.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:

> > Now `charset_unibyte' is always 0 (i.e. the same as
> > `charset_ascii').  So, unibyte->multibyte conversion always
> > results in an eight-bit multibyte character.

> Looking through the code, I see that the variable `charset_unibyte' is
> not initialized properly.  That's the only reason it's 0.  We have to
> fix this for sure.

Yes.

> > To fix the above problem, I propose these changes for 23.1
> > and the trunk.
> >
> > (1) Fix all codes accessing charset_unibyte
> > (e.g. Funibyte_char_to_multibyte) not to refer to it.

> Can we use charset_iso_8859_1 instead of charset_unibyte, or add a line
> that says

>   charset_unibyte
>     = define_charset_internal (...);

> in syms_of_charset?

No.  Stefan's change made unibyte-char-to-multibyte (and
unibyte_char_to_multibyte) always return an 8-bit char for
an 8-bit byte.  For that, charset_unibyte would have to be
the same as charset_ascii; but such an operation does not
need charset_unibyte in the first place.  We can simply use
BYTE8_TO_CHAR.
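A minimal sketch of the conversion described above, for readers unfamiliar with eight-bit characters. The `SKETCH_*` macros and `sketch_*` functions are hypothetical stand-ins; the numeric constants are an assumption modelled on how character.h in Emacs 23 is believed to lay out raw bytes (above the largest 5-byte character) and should be checked against the actual source.

```c
/* Assumed layout: raw bytes 0x80..0xFF occupy the character codes just
   above the largest 5-byte character.  Verify against character.h.  */
#define SKETCH_MAX_5_BYTE_CHAR 0x3FFF7F
#define SKETCH_BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)
#define SKETCH_CHAR_BYTE8_P(c) ((c) > SKETCH_MAX_5_BYTE_CHAR)
#define SKETCH_CHAR_TO_BYTE8(c) ((c) - 0x3FFF00)

/* Unibyte -> multibyte without consulting any charset: ASCII maps to
   itself, every other byte becomes an eight-bit character.  */
static int sketch_unibyte_to_multibyte(int byte)
{
  return byte < 0x80 ? byte : SKETCH_BYTE8_TO_CHAR(byte);
}

/* Multibyte -> unibyte: only ASCII and eight-bit characters have a
   byte value; anything else yields -1.  */
static int sketch_multibyte_to_unibyte(int c)
{
  if (c < 0x80)
    return c;
  if (SKETCH_CHAR_BYTE8_P(c))
    return SKETCH_CHAR_TO_BYTE8(c);
  return -1;
}
```

Under these assumptions the two functions round-trip every byte, while a real Latin-1 character such as U+00E9 has no unibyte form. Display via the language environment therefore needs a separate decoding step.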

> > (2) Setup charset_unibyte correctly in Fset_charset_priority.
> >
> > (3) Fix x_produce_glyphs to do DECODE_CHAR (charset_unibyte,
> >     it->c) instead of unibyte_char_to_multibyte (it->c).

> Number 3 is not a trivial change.  IIUC, unibyte_char_to_multibyte is
> very fast.  Changing it to use DECODE_CHAR may lead to a performance
> hit.

But, using unibyte_char_to_multibyte here is a clear bug.
If the overhead of DECODE_CHAR is intolerable (I don't
believe it is), we can do this:

(1) modify unibyte_char_to_multibyte to use BYTE8_TO_CHAR
    instead of the table unibyte_to_multibyte_table.
(2) Setup unibyte_to_multibyte_table for unibyte_charset.
(3) Just lookup that table in x_produce_glyphs.

---
Kenichi Handa
handa <at> m17n.org

Message #30 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: 3745 <at> debbugs.gnu.org
Cc: cyd <at> stupidchicken.com, 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95;
	emacs-23.0.95: unibyte-display-via-language-environment
Date: Mon, 06 Jul 2009 15:50:58 +0900

In article <tl74otqk501.fsf <at> m17n.org>, Kenichi Handa <handa <at> m17n.org> writes:

> But, using unibyte_char_to_multibyte here is a clear bug.
> If the overhead of DECODE_CHAR is intolerable (I don't
> believe it is), we can do this:

> (1) modify unibyte_char_to_multibyte to use BYTE8_TO_CHAR
>     instead of the table unibyte_to_multibyte_table.
> (2) Setup unibyte_to_multibyte_table for unibyte_charset.
> (3) Just lookup that table in x_produce_glyphs.

To minimize the changes, I made the attached patch.  It
doesn't touch unibyte_to_multibyte_table, but introduced
charset_unibyte_decoder[128].  I confirmed it didn't make
the display code slow.
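The idea of a precomputed 128-entry decoding table can be sketched as follows. This is an illustration, not the attached patch: all `sketch_*` names are hypothetical, the decoder is passed in as a plain function pointer rather than an Emacs charset, and the second decoder is a toy that differs from Latin-1 only at 0xA4 (real ISO-8859-15 differs at more positions).

```c
/* Entry N holds the character for byte N + 0x80.  */
static int sketch_decoder[128];

/* Rebuild the table once, whenever the preferred unibyte charset
   changes.  DECODE stands in for decoding one byte in that charset
   and may return -1 for bytes with no character.  */
static void sketch_fill_decoder(int (*decode)(int byte))
{
  for (int byte = 0x80; byte < 0x100; byte++)
    sketch_decoder[byte - 0x80] = decode(byte);
}

/* Per-character lookup on the display path: one comparison and one
   array access, with no charset machinery involved.  */
static int sketch_decode_unibyte(int byte)
{
  return byte < 0x80 ? byte : sketch_decoder[byte - 0x80];
}

/* ISO-8859-1: every byte value is its own code point.  */
static int sketch_decode_latin1(int byte)
{
  return byte;
}

/* Toy decoder: Latin-1 except that 0xA4 is the euro sign, as in
   ISO-8859-15.  */
static int sketch_decode_toy(int byte)
{
  return byte == 0xA4 ? 0x20AC : byte;
}
```

After `sketch_fill_decoder (sketch_decode_latin1)`, `sketch_decode_unibyte (0xE9)` returns 0xE9; after refilling with `sketch_decode_toy`, byte 0xA4 decodes to 0x20AC. The cost of decoding is paid when the table is rebuilt, not per displayed character.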

---
Kenichi Handa
handa <at> m17n.org

Index: character.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/character.c,v
retrieving revision 1.24
diff -u -r1.24 character.c
--- character.c	5 Feb 2009 08:46:52 -0000	1.24
+++ character.c	6 Jul 2009 06:42:31 -0000
@@ -90,9 +90,9 @@
 /* Mapping table from unibyte chars to multibyte chars.  */
 int unibyte_to_multibyte_table[256];
 
-/* Nth element is 1 iff unibyte char N can be mapped to a multibyte
-   char.  */
-char unibyte_has_multibyte_table[256];
+/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
+   Nth element is for the code (N-0x80).  */
+int charset_unibyte_decoder[128];
 
 
 
@@ -270,9 +270,8 @@
   return c;
 }
 
-/* Convert the multibyte character C to unibyte 8-bit character based
-   on the current value of charset_unibyte.  If dimension of
-   charset_unibyte is more than one, return (C & 0xFF).
+/* Convert ASCII or 8-bit character C to unibyte.  If C is none of
+   them, return (C & 0xFF).
 
    The argument REV_TBL is now ignored.  It will be removed in the
    future.  */
@@ -282,14 +281,11 @@
      int c;
      Lisp_Object rev_tbl;
 {
-  struct charset *charset;
-  unsigned c1;
-
+  if (c < 0x80)
+    return c;
   if (CHAR_BYTE8_P (c))
     return CHAR_TO_BYTE8 (c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c1 = ENCODE_CHAR (charset, c);
-  return ((c1 != CHARSET_INVALID_CODE (charset)) ? c1 : c & 0xFF);
+  return (c & 0xFF);
 }
 
 /* Like multibyte_char_to_unibyte, but return -1 if C is not supported
@@ -302,11 +298,11 @@
   struct charset *charset;
   unsigned c1;
 
+  if (c < 0x80)
+    return c;
   if (CHAR_BYTE8_P (c))
     return CHAR_TO_BYTE8 (c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c1 = ENCODE_CHAR (charset, c);
-  return ((c1 != CHARSET_INVALID_CODE (charset)) ? c1 : -1);
+  return -1;
 }
 
 DEFUN ("characterp", Fcharacterp, Scharacterp, 1, 2, 0,
@@ -337,10 +333,8 @@
   c = XFASTINT (ch);
   if (c >= 0400)
     error ("Invalid unibyte character: %d", c);
-  charset = CHARSET_FROM_ID (charset_unibyte);
-  c = DECODE_CHAR (charset, c);
-  if (c < 0)
-    c = BYTE8_TO_CHAR (XFASTINT (ch));
+  if (c >= 0x80)
+    c = BYTE8_TO_CHAR (c);
   return make_number (c);
 }
 
Index: character.h
===================================================================
RCS file: /cvsroot/emacs/emacs/src/character.h,v
retrieving revision 1.15
diff -u -r1.15 character.h
--- character.h	8 Jan 2009 03:15:27 -0000	1.15
+++ character.h	6 Jul 2009 06:42:31 -0000
@@ -87,11 +87,15 @@
 #define unibyte_char_to_multibyte(c)	\
   ((c) < 256 ? unibyte_to_multibyte_table[(c)] : (c))
 
-/* Nth element is 1 iff unibyte char N can be mapped to a multibyte
-   char.  */
-extern char unibyte_has_multibyte_table[256];
-
-#define UNIBYTE_CHAR_HAS_MULTIBYTE_P(c) (unibyte_has_multibyte_table[(c)])
+/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
+   Nth element is for the code (N-0x80).  */
+extern int charset_unibyte_decoder[128];
+
+/* Return a character corresponding to the code BYTE of
+   charset_unibyte.  BYTE must be a byte; i.e. less than 0x100.  If
+   BYTE is not a valid code of charset_unibyte, return -1.  */
+#define DECODE_UNIBYTE(BYTE)	\
+  ((BYTE) < 0x80 ? (int) (BYTE) : charset_unibyte_decoder[(BYTE) - 0x80])
 
 /* If C is not ASCII, make it unibyte. */
 #define MAKE_CHAR_UNIBYTE(c)	\
Index: charset.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/charset.c,v
retrieving revision 1.179
diff -u -r1.179 charset.c
--- charset.c	9 Jun 2009 02:53:07 -0000	1.179
+++ charset.c	6 Jul 2009 06:42:32 -0000
@@ -2260,6 +2260,7 @@
   Vcharset_ordered_list = Fnconc (2, arglist);
   charset_ordered_list_tick++;
 
+  charset_unibyte = -1;
   for (old_list = Vcharset_ordered_list, list_2022 = list_emacs_mule = Qnil;
        CONSP (old_list); old_list = XCDR (old_list))
     {
@@ -2267,9 +2268,25 @@
 	list_2022 = Fcons (XCAR (old_list), list_2022);
       if (! NILP (Fmemq (XCAR (old_list), Vemacs_mule_charset_list)))
 	list_emacs_mule = Fcons (XCAR (old_list), list_emacs_mule);
+      if (charset_unibyte < 0)
+	{
+	  struct charset *charset = CHARSET_FROM_ID (XINT (XCAR (old_list)));
+
+	  if (CHARSET_DIMENSION (charset) == 1
+	      && CHARSET_ASCII_COMPATIBLE_P (charset)
+	      && CHARSET_MAX_CHAR (charset) >= 0x80)
+	    charset_unibyte = CHARSET_ID (charset);
+	}
     }
   Viso_2022_charset_list = Fnreverse (list_2022);
   Vemacs_mule_charset_list = Fnreverse (list_emacs_mule);
+  if (charset_unibyte < 0)
+    charset_unibyte = charset_iso_8859_1;
+  {
+    struct charset *charset = CHARSET_FROM_ID (charset_unibyte);
+    for (i = 128; i < 256; i++)
+      charset_unibyte_decoder[i - 128] = DECODE_CHAR (charset, i);
+  }
 
   return Qnil;
 }
@@ -2328,6 +2345,10 @@
     unibyte_to_multibyte_table[i] = i;
   for (; i < 256; i++)
     unibyte_to_multibyte_table[i] = BYTE8_TO_CHAR (i);
+  for (i = 0; i < 32; i++)
+    charset_unibyte_decoder[i] = -1;
+  for (; i < 128; i++)
+    charset_unibyte_decoder[i] = 128 + i;
 }
 
 #ifdef emacs
@@ -2429,6 +2450,7 @@
     = define_charset_internal (Qeight_bit, 1, "\x80\xFF\x00\x00\x00\x00",
 			       128, 255, -1, 0, -1, 0, 1,
 			       MAX_5_BYTE_CHAR + 1);
+  charset_unibyte = charset_iso_8859_1;
 }
 
 #endif /* emacs */
Index: xdisp.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/xdisp.c,v
retrieving revision 1.1288
diff -u -r1.1288 xdisp.c
--- xdisp.c	18 Jun 2009 09:49:07 -0000	1.1288
+++ xdisp.c	6 Jul 2009 06:42:34 -0000
@@ -5743,7 +5743,7 @@
 				  || it->c == 0xAD /* SOFT HYPHEN */)))
 		       : (it->c >= 127
 			  && (! unibyte_display_via_language_environment
-			      || (UNIBYTE_CHAR_HAS_MULTIBYTE_P (it->c)))))))
+			      || (DECODE_UNIBYTE (it->c) <= 0xA0))))))
 	    {
 	      /* IT->c is a control character which must be displayed
 		 either as '\003' or as `^C' where the '\\' and '^'
@@ -21196,9 +21196,8 @@
 	{
 	  if (SINGLE_BYTE_CHAR_P (it->c)
 	      && unibyte_display_via_language_environment)
-	    it->char_to_display = unibyte_char_to_multibyte (it->c);
-	  if (! SINGLE_BYTE_CHAR_P (it->char_to_display))
 	    {
+	      it->char_to_display = DECODE_UNIBYTE (it->c);
 	      it->multibyte_p = 1;
 	      it->face_id = FACE_FOR_CHAR (it->f, face, it->char_to_display,
 					   -1, Qnil);

Message #35 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Mon, 06 Jul 2009 10:03:58 -0400

Kenichi Handa <handa <at> m17n.org> writes:

> To minimize the changes, I made the attached patch.  It
> doesn't touch unibyte_to_multibyte_table, but introduced
> charset_unibyte_decoder[128].  I confirmed it didn't make
> the display code slow.

> @@ -302,11 +298,11 @@
>    struct charset *charset;
>    unsigned c1;
>  
> +  if (c < 0x80)
> +    return c;
>    if (CHAR_BYTE8_P (c))
>      return CHAR_TO_BYTE8 (c);

You should also delete the unused `charset' and `c1' variables in this
block.

Other than that, these changes look good.  Thanks very much for making
this patch, and please install on the branch ASAP.

For the trunk, I agree that we should try using DECODE_CHAR in
x_produce_glyphs.

Message #40 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 3745 <at> debbugs.gnu.org
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Tue, 07 Jul 2009 15:28:10 +0900

In article <87my7h6h81.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:

> Kenichi Handa <handa <at> m17n.org> writes:
> > To minimize the changes, I made the attached patch.  It
> > doesn't touch unibyte_to_multibyte_table, but introduced
> > charset_unibyte_decoder[128].  I confirmed it didn't make
> > the display code slow.

> > @@ -302,11 +298,11 @@
> >    struct charset *charset;
> >    unsigned c1;
> >  
> > +  if (c < 0x80)
> > +    return c;
> >    if (CHAR_BYTE8_P (c))
> >      return CHAR_TO_BYTE8 (c);

> You should also delete the unused `charset' and `c1' variables in this
> block.

Ah, yes.

> Other than that, these changes look good.  Thanks very much for making
> this patch, and please install on the branch ASAP.

> For the trunk, I agree that we should try using DECODE_CHAR in
> x_produce_glyphs.

Ok, done.  I also installed this change to
reset-language-environment for completeness.

--- mule-cmds.el	8 Apr 2009 18:03:17 -0000	1.360
+++ mule-cmds.el	7 Jul 2009 05:59:18 -0000	1.360.2.1
@@ -1794,6 +1794,11 @@
 	   (coding-system-error 'iso-latin-1))))
     (setq default-process-coding-system
 	  (cons output-coding input-coding)))
+  ;; Put the highest priority to the charset iso-8859-1 to prefer the
+  ;; registry iso8859-1 over iso8859-2 in font selection.  It also
+  ;; makes unibyte-display-via-language-environment to use iso-8859-1
+  ;; as the unibyte charset.
+  (set-charset-priority 'iso-8859-1)
 
   ;; Don't alter the terminal and keyboard coding systems here.
   ;; The terminal still supports the same coding system



---
Kenichi Handa
handa <at> m17n.org

Message #45 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3745 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Tue, 07 Jul 2009 14:33:24 +0200

Kenichi Handa <handa <at> m17n.org> writes:

> +/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
> +   Nth element is for the code (N-0x80).  */

You probably mean (N+0x80).

> +int charset_unibyte_decoder[128];

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Message #50 received at 3745 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 3745 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#3745: 23.0.95; emacs-23.0.95: unibyte-display-via-language-environment
Date: Tue, 07 Jul 2009 21:45:00 +0900

In article <m3bpnwisff.fsf <at> hase.home>, Andreas Schwab <schwab <at> linux-m68k.org> writes:

> Kenichi Handa <handa <at> m17n.org> writes:
> > +/* Decoding table for 8-bit byte codes of the charset charset_unibyte.
> > +   Nth element is for the code (N-0x80).  */

> You probably mean (N+0x80).

Yes!  Just fixed, thank you.

---
Kenichi Handa
handa <at> m17n.org

Bug closed; send any further explanations to Jay Berkenbilt <ejb <at> ql.org>. Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> emacsbugs.donarmstrong.com (Wed, 08 Jul 2009 14:05:11 GMT).

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> emacsbugs.donarmstrong.com (Wed, 05 Aug 2009 14:24:10 GMT).
