GNU bug report logs - #970
23.0.60; Non-ASCII display problems on a tty

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> fencepost.gnu.org>

Date: Fri, 12 Sep 2008 10:25:04 UTC

Severity: normal

Done: Chong Yidong <cyd <at> stupidchicken.com>

Bug is archived. No further changes may be made.


Message #25 received at submit <at> emacsbugs.donarmstrong.com:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> m17n.org>, 970 <at> debbugs.gnu.org
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#970: 23.0.60; Non-ASCII display problems on a tty
Date: Sat, 27 Sep 2008 17:48:02 +0300

I have some more info about this bug.

The analysis below is based on displaying a file that is encoded in
iso-2022-7bit-unix and has a single line that is a copy of line 20
from etc/HELLO, which is the entry for the Bengali language.

To produce this file, copy line 20 of HELLO, paste it into a new file,
type "C-x RET f iso-2022-7bit-unix RET" and save the file.

The display problems for this line are directly caused by the fact
that tty_write_glyphs is called with its last argument len=22, which
means the display engine expects 22 characters to be displayed.
tty_write_glyphs therefore moves the cursor by 22 positions to account
for that.

However, encode_terminal_code returns a string whose length is only 13
characters, and the difference between 13 and 22 is the immediate
cause for display problems: the displayed string looks as if it were
padded by whitespace, but typing "C-x =" on these ``whitespace''
characters reveals that they are not spaces at all.
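
A minimal sketch of this mismatch, assuming a much simplified model
of a terminal row.  The names tty_row, row_init and write_run are
hypothetical and do not exist in Emacs; the only facts taken from the
report are the two numbers, 22 and 13.

```c
#include <assert.h>
#include <string.h>

#define ROW_WIDTH 80

/* Hypothetical model: one terminal row plus two notions of "cursor".  */
struct tty_row
{
  char cells[ROW_WIDTH + 1];  /* text that actually reached the row */
  int physical_col;           /* cells really written so far */
  int assumed_col;            /* where the display engine thinks it is */
};

void
row_init (struct tty_row *r)
{
  memset (r->cells, ' ', ROW_WIDTH);
  r->cells[ROW_WIDTH] = '\0';
  r->physical_col = 0;
  r->assumed_col = 0;
}

/* Write ENCODED, but advance the assumed cursor by GLYPH_LEN, the way
   the report describes tty_write_glyphs doing.  Return the drift
   between the assumed and the physical column.  */
int
write_run (struct tty_row *r, const char *encoded, int glyph_len)
{
  int n = (int) strlen (encoded);

  assert (glyph_len >= 0 && r->assumed_col + glyph_len <= ROW_WIDTH);
  assert (r->physical_col + n <= ROW_WIDTH);

  memcpy (r->cells + r->physical_col, encoded, (size_t) n);
  r->physical_col += n;
  r->assumed_col += glyph_len;
  return r->assumed_col - r->physical_col;
}
```

Calling write_run on a fresh row with any 13-character string and
glyph_len 22 returns a drift of 9, which in this model corresponds to
the columns that look like padding on the screen.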

Looking inside encode_terminal_code, I see that the problem is somehow
related to composite characters.  The first group of non-ASCII
characters (in parentheses) are composite characters whose
u.cmp.automatic flag is set.  The Lisp object returned by
composition_gstring_from_id for this group of characters is a Lisp
vector:

  [[nil 2476 2494 2434 2482 2494] 0
   [0 0 2476 2476 1 0 1 1 0 nil]
   [1 1 2494 2494 1 0 1 1 0 nil]
   [2 2 2434 2434 1 0 1 1 0 nil]
   [3 3 2482 2482 1 0 1 1 0 nil]
   [4 4 2494 2494 1 0 1 1 0 nil]]

When this code:

	  if (src->u.cmp.automatic)
	    for (i = src->u.cmp.from; i < src->u.cmp.to; i++)
	      {
		Lisp_Object g = LGSTRING_GLYPH (gstring, i);
		int c = LGLYPH_CHAR (g);

		if (! char_charset (c, charset_list, NULL))
		  break;
		buf += CHAR_STRING (c, buf);
		nchars++;
	      }

walks this Lisp vector, it immediately finds that the 1st character
cannot be encoded by the current terminal's encoding, and breaks out
of the loop.  Then the `?' character gets stored in the buffer that is
being prepared for encoding:

	  if (i == 0)
	    {
	      /* The first character of the composition is not encodable.  */
	      *buf++ = '?';
	      nchars++;
	    }

This is all as expected, but because of the "if (i == 0)" clause
above, the `?' character gets stored only for the first character in
this composition, whose codepoint is 2476.  For the other characters,
the loop starts at a u.cmp.from value greater than 0, so i is never 0
and `?' is not stored for them.
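
The effect can be reproduced with a small stand-alone sketch.  It
assumes that the composition reaches the encoder as five glyphs, each
covering one character, and that none of the characters is encodable;
the report does not state the actual u.cmp.from and u.cmp.to values,
so those ranges are an assumption.  The names cmp_glyph, encodable and
encode_composition are hypothetical stand-ins, not Emacs functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the u.cmp.from / u.cmp.to pair of one
   glyph; TO is exclusive, as in the quoted loop.  */
struct cmp_glyph
{
  int from;
  int to;
};

/* Assumption: the terminal's coding system cannot encode any of the
   characters, as is the case for Bengali in the report.  */
int
encodable (int c)
{
  (void) c;
  return 0;
}

/* Mimic the quoted logic for every glyph of one composition and
   return the number of characters stored in OUT.  */
int
encode_composition (const int *chars, const struct cmp_glyph *glyphs,
                    int nglyphs, char *out, size_t cap)
{
  int nchars = 0;

  for (int g = 0; g < nglyphs; g++)
    {
      int i;

      for (i = glyphs[g].from; i < glyphs[g].to; i++)
        {
          if (! encodable (chars[i]))
            break;
          assert ((size_t) nchars + 1 < cap);
          out[nchars++] = '.';  /* placeholder for an encoded character */
        }
      if (i == 0)
        {
          /* Reached only by the glyph whose FROM is 0.  */
          assert ((size_t) nchars + 1 < cap);
          out[nchars++] = '?';
        }
    }
  out[nchars] = '\0';
  return nchars;
}
```

With the characters 2476 2494 2434 2482 2494 and the glyph ranges
{0,1} {1,2} {2,3} {3,4} {4,5}, this stores a single `?' for five
glyphs, since only the first glyph leaves the loop with i equal to 0.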

By contrast, on a graphics terminal, the 5 characters inside the
parentheses are displayed as 2 visible glyphs, one (codepoint 2476)
for buffer position 10, the other (codepoint 2482) for buffer position
13.  Thus, I would expect to see two `?' question marks inside
parentheses, not one.

A similar problem happens with the second group of non-ASCII characters
on this line, the characters that follow the TAB character.  Here's
the Lisp object returned by composition_gstring_from_id:

  [[nil 2472 2478 2488 2509 2453 2494 2480] 1
   [0 0 2472 2472 1 0 1 1 0 nil]
   [1 1 2478 2478 1 0 1 1 0 nil]
   [2 3 2488 2488 1 0 1 1 0 nil]
   [2 3 2509 2509 0 0 0 1 0 nil]
   [4 4 2453 2453 1 0 1 1 0 nil]
   [5 5 2494 2494 1 0 1 1 0 nil]
   [6 6 2480 2480 1 0 1 1 0 nil]]

(Note that in this case, there are elements in this vector whose
FROM-IDX and TO-IDX values are not identical, and also the WIDTH value
is zero for one of them.)  This group of characters is displayed as 4
visible glyphs on a graphics terminal: respectively, for buffer
positions 17 (code 2472), 18 (code 2478), 19 (code 2488), and 23
(2480).  On a TTY, only one `?' is shown, again for the same reason as
described above: the "if (i == 0)" test.

My first suspicion would be that the object returned by
composition_gstring_from_id gives incorrect data for FROM-IDX and
TO-IDX, but I'm not sure I understand the composition machinery well
enough to draw a definitive conclusion.  It is not even clear to me
how we want to display these characters: do we want the number of `?'s
to be identical to the number of glyphs displayed by a graphics
terminal, or do we want something else?
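
To make the question concrete, here is a sketch of one possible
policy: store one `?' per distinct FROM-IDX/TO-IDX cluster.  This is
not what Emacs does and not a proposed fix.  The struct lglyph and the
three functions are hypothetical, and only the FROM-IDX, TO-IDX, C and
WIDTH fields of each glyph element are copied from the vectors above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C view of one glyph element of the gstring.  */
struct lglyph
{
  int from_idx;  /* FROM-IDX, inclusive */
  int to_idx;    /* TO-IDX, inclusive */
  int ch;        /* character code */
  int width;     /* WIDTH */
};

/* Data copied from the two vectors shown above.  */
const struct lglyph first_group[] = {
  {0, 0, 2476, 1}, {1, 1, 2494, 1}, {2, 2, 2434, 1},
  {3, 3, 2482, 1}, {4, 4, 2494, 1}
};

const struct lglyph second_group[] = {
  {0, 0, 2472, 1}, {1, 1, 2478, 1}, {2, 3, 2488, 1}, {2, 3, 2509, 0},
  {4, 4, 2453, 1}, {5, 5, 2494, 1}, {6, 6, 2480, 1}
};

/* Count runs of adjacent glyphs sharing the same FROM-IDX/TO-IDX.  */
int
count_clusters (const struct lglyph *g, size_t n)
{
  int clusters = 0;

  for (size_t i = 0; i < n; i++)
    if (i == 0
        || g[i].from_idx != g[i - 1].from_idx
        || g[i].to_idx != g[i - 1].to_idx)
      clusters++;
  return clusters;
}

/* Sum of the WIDTH fields, i.e. the columns the glyphs claim.  */
int
total_width (const struct lglyph *g, size_t n)
{
  int width = 0;

  for (size_t i = 0; i < n; i++)
    width += g[i].width;
  return width;
}

/* Store one `?' per cluster in BUF; return the number stored.  */
int
placeholders_for (const struct lglyph *g, size_t n, char *buf, size_t cap)
{
  int count = count_clusters (g, n);

  assert ((size_t) count < cap);
  for (int i = 0; i < count; i++)
    buf[i] = '?';
  buf[count] = '\0';
  return count;
}
```

With the data above, this policy yields 5 placeholders for the first
group and 6 for the second, and the WIDTH fields sum to the same
values.  Neither count matches the 2 and 4 visible glyphs seen on the
graphics terminal, so this policy would not make the two displays
agree.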

Handa-san, can you please comment on these findings?





This bug report was last modified 16 years and 129 days ago.
