GNU bug report logs - #970
23.0.60; Non-ASCII display problems on a tty
Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#970; Package emacs.
Acknowledgement sent to Eli Zaretskii <eliz <at> fencepost.gnu.org>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):
emacs -Q
C-h H
Type C-n several times, and you will see some very strange behavior:
for example, some lines are skipped and point never enters them.
Also, some non-ASCII characters are displayed incorrectly. For
example, the "Bengali" line has only 1 "?" character in the
parentheses following the language name, whereas 2 characters are
displayed on a graphics display (I tried MS-Windows). On the same
line, under "HELLO", there are 2 "?" characters instead of 4, and
they are not aligned with the rest of greetings; moving point with
C-f skips those "?"s and lands on what is displayed as space,
but "C-x =" shows that there are non-ASCII characters in the buffer
at those "blank" positions.
Etc., etc. It looks like the tty display of non-ASCII characters that
cannot be displayed by the current terminal-coding-system is very
much screwed up.
Here's what "locale" reports, in case it's important:
eliz <at> fencepost:~/emacs.cvs/emacs$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
In GNU Emacs 23.0.60.63 (x86_64-unknown-linux-gnu, X toolkit)
of 2008-09-12 on fencepost
configured using `configure '--with-jpeg=no' '--with-png=no' '--with-gif=no' '--with-tiff=no''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: nil
value of $XMODIFIERS: nil
locale-coding-system: nil
default-enable-multibyte-characters: t
Major mode: Fundamental
Minor modes in effect:
tooltip-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
global-auto-composition-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
view-mode: t
Recent input:
ESC [ > 0 ; 1 3 6 ; 0 c C-h H ESC O B ESC O B ESC O
B ESC O B ESC O B ESC O B ESC O B ESC O B C-n C-n C-n
C-n C-n C-n ESC x r e p o r t - e m a TAB TAB RET
Recent messages:
("./src/emacs" "-Q")
For information about GNU Emacs and the GNU system, type C-h C-a.
Loading vc-cvs...done
View mode: type C-h for help, h for commands, q to quit.
Acknowledgement sent to Chong Yidong <cyd <at> stupidchicken.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
Message #10 received at 970 <at> emacsbugs.donarmstrong.com (full text, mbox):
> emacs -Q
> C-h H
>
> Type C-n several times, and you will see some very strange behavior:
> for example, some lines are skipped and point never enters them.
I think Kenichi Handa's latest composition changes should have fixed
this. Can you verify?
Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
Message #15 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):
> From: Chong Yidong <cyd <at> stupidchicken.com>
> Date: Thu, 18 Sep 2008 14:32:00 -0400
> Cc: 970 <at> emacsbugs.donarmstrong.com
>
> > emacs -Q
> > C-h H
> >
> > Type C-n several times, and you will see some very strange behavior:
> > for example, some lines are skipped and point never enters them.
>
> I think Kenichi Handa's latest composition changes should have fixed
> this. Can you verify?
The ``some lines are skipped'' part is indeed solved. But the other
problems mentioned in my bug report are still there. For example,
compare the "South Asia" and "Bengali" lines with a graphics display:
the number and screen position of the `?' question marks displayed
on a tty instead of non-ASCII characters do not match those displayed
on a graphics terminal.
Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
Message #25 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):
I have some more info about this bug.
The following is based on displaying a file that is encoded in
iso-2022-7bit-unix, and has a single line that is a copy of line 20
from etc/HELLO, which is the entry for the Bengali language.
To produce this file, copy line 20 of HELLO, paste it into a new file,
type "C-x RET f iso-2022-7bit-unix RET" and save the file.
The display problems for this line are directly caused by the fact
that tty_write_glyphs is called with its last argument len=22, which
means the display engine expects 22 characters to be displayed.
tty_write_glyphs therefore moves the cursor by 22 positions to account
for that.
However, encode_terminal_code returns a string whose length is only 13
characters, and the difference between 13 and 22 is the immediate
cause for display problems: the displayed string looks as if it were
padded by whitespace, but typing "C-x =" on these ``whitespace''
characters reveals that they are not spaces at all.
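The arithmetic behind this mismatch can be sketched in a few lines of C. This is only an illustration, not code from Emacs: the struct and function names are hypothetical, and the sketch assumes that every encoded character occupies exactly one terminal column.

```c
/* Hypothetical record of the two cursor columns after one write.  */
struct cursor_cols
{
  int believed;  /* column the display engine thinks the cursor is at */
  int actual;    /* column the terminal cursor has really reached */
};

/* Illustrative only: the display engine advances by GLYPH_LEN, while
   the terminal advances by the ENCODED_LEN characters actually sent
   (assumed to be one column each).  */
static struct cursor_cols
write_glyphs_sketch (int start_col, int glyph_len, int encoded_len)
{
  struct cursor_cols cols;

  cols.believed = start_col + glyph_len;
  cols.actual = start_col + encoded_len;
  return cols;
}
```

With the numbers reported above, write_glyphs_sketch (0, 22, 13) leaves the two columns 9 positions apart, which corresponds to the cells that look like padding on the screen.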
Looking inside encode_terminal_code, I see that the problem is somehow
related to composite characters. The first group of non-ASCII
characters (in parentheses) are composite characters whose
u.cmp.automatic flag is set. The Lisp object returned by
composition_gstring_from_id for this group of characters is a Lisp
vector:
  [[nil 2476 2494 2434 2482 2494] 0
   [0 0 2476 2476 1 0 1 1 0 nil]
   [1 1 2494 2494 1 0 1 1 0 nil]
   [2 2 2434 2434 1 0 1 1 0 nil]
   [3 3 2482 2482 1 0 1 1 0 nil]
   [4 4 2494 2494 1 0 1 1 0 nil]]
When this code:
  if (src->u.cmp.automatic)
    for (i = src->u.cmp.from; i < src->u.cmp.to; i++)
      {
        Lisp_Object g = LGSTRING_GLYPH (gstring, i);
        int c = LGLYPH_CHAR (g);

        if (! char_charset (c, charset_list, NULL))
          break;
        buf += CHAR_STRING (c, buf);
        nchars++;
      }
walks this Lisp vector, it immediately finds that the 1st character
cannot be encoded by the current terminal's encoding, and breaks out
of the loop. Then the `?' character gets stored in the buffer that is
being prepared for encoding:
  if (i == 0)
    {
      /* The first character of the composition is not encodable.  */
      *buf++ = '?';
      nchars++;
    }
This is all as expected, but because of the "if (i == 0)" clause
above, the `?' character gets stored only for the first character in
this composition, whose codepoint is 2476. For other characters, the
u.cmp.from value is greater than 0, so `?' is not stored for them.
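To make the effect of the "if (i == 0)" test concrete, here is a minimal, self-contained sketch of the logic quoted above. It is not the code from Emacs: encodable() is a hypothetical stand-in for char_charset that accepts only ASCII, and giving each character its own glyph with from = k and to = k + 1 is an illustrative assumption, based on the observation that u.cmp.from is greater than 0 for the later characters.

```c
#include <stddef.h>

/* Hypothetical stand-in for char_charset (): pretend the terminal can
   encode only ASCII.  */
static int
encodable (int c)
{
  return c >= 0 && c < 128;
}

/* Mimic the quoted loop for one glyph covering indices [FROM, TO) of
   CHARS.  Stores into OUT and returns the number of characters
   stored.  */
static size_t
encode_glyph_sketch (const int *chars, int from, int to, char *out)
{
  size_t nchars = 0;
  int i;

  for (i = from; i < to; i++)
    {
      if (! encodable (chars[i]))
        break;
      out[nchars++] = (char) chars[i];
    }
  if (i == 0)
    /* Only reached when the glyph starts at index 0.  */
    out[nchars++] = '?';
  return nchars;
}

/* Assumption for illustration: one glyph per character, with
   from = k and to = k + 1.  */
static size_t
encode_composition_sketch (const int *chars, int n, char *out)
{
  size_t total = 0;

  for (int k = 0; k < n; k++)
    total += encode_glyph_sketch (chars, k, k + 1, out + total);
  return total;
}
```

Under these assumptions, feeding the five codepoints 2476 2494 2434 2482 2494 to encode_composition_sketch stores a single `?', because only the glyph that starts at index 0 passes the final test.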
By contrast, on a graphics terminal, the 5 characters inside the
parentheses are displayed as 2 visible glyphs, one (codepoint 2476)
for buffer position 10, the other (codepoint 2482) for buffer position
13. Thus, I would expect to see two `?' question marks inside
parentheses, not one.
A similar problem happens with the second group of non-ASCII characters
on this line, the characters that follow the TAB character. Here's
the Lisp object returned by composition_gstring_from_id:
  [[nil 2472 2478 2488 2509 2453 2494 2480] 1
   [0 0 2472 2472 1 0 1 1 0 nil]
   [1 1 2478 2478 1 0 1 1 0 nil]
   [2 3 2488 2488 1 0 1 1 0 nil]
   [2 3 2509 2509 0 0 0 1 0 nil]
   [4 4 2453 2453 1 0 1 1 0 nil]
   [5 5 2494 2494 1 0 1 1 0 nil]
   [6 6 2480 2480 1 0 1 1 0 nil]]
(Note that in this case, there are elements in this vector whose
FROM-IDX and TO-IDX values are not identical, and also the WIDTH value
is zero for one of them.) This group of characters is displayed as 4
visible glyphs on a graphics terminal: respectively, for buffer
positions 17 (code 2472), 18 (code 2478), 19 (code 2488), and 23
(code 2480). On a TTY, only one `?' is shown, again for the same reason as
described above: the "if (i == 0)" test.
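One way to read the two vectors is to count their clusters (runs of glyphs that share a FROM-IDX) and to sum their WIDTH fields. The sketch below does that; it is only an illustration, and struct glyph_sketch is a hypothetical type holding the first five fields of each glyph vector (FROM-IDX, TO-IDX, C, CODE, WIDTH), not an Emacs data structure.

```c
#include <stddef.h>

/* Hypothetical stand-in for the first five fields of a glyph vector:
   FROM-IDX, TO-IDX, C, CODE, WIDTH.  */
struct glyph_sketch
{
  int from_idx, to_idx, c, code, width;
};

/* Count clusters, i.e. runs of consecutive glyphs that share the
   same FROM-IDX.  */
static size_t
count_clusters (const struct glyph_sketch *g, size_t n)
{
  size_t clusters = 0;

  for (size_t i = 0; i < n; i++)
    if (i == 0 || g[i].from_idx != g[i - 1].from_idx)
      clusters++;
  return clusters;
}

/* Sum the WIDTH fields of all glyphs.  */
static int
total_width (const struct glyph_sketch *g, size_t n)
{
  int width = 0;

  for (size_t i = 0; i < n; i++)
    width += g[i].width;
  return width;
}
```

Applied to the vectors quoted above, this gives 5 clusters for the first group and 6 clusters (total width 6) for the second, whereas the graphics display shows 2 and 4 visible glyphs respectively. Neither count matches the single `?' shown on the TTY.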
My first suspicion would be that the object returned by
composition_gstring_from_id gives incorrect data for FROM-IDX and
TO-IDX, but I'm not sure I understand the composition machinery well
enough to draw a definitive conclusion. It is not even clear to me how
we want to display these characters: do we want the number of `?'s to
be identical to the number of glyphs displayed by a graphics terminal,
or do we want something else?
Handa-san, can you please comment on these findings?
Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
(Fri, 06 Feb 2009 16:00:03 GMT)
Message #35 received at 970 <at> emacsbugs.donarmstrong.com (full text, mbox):
> From: Kenichi Handa <handa <at> m17n.org>
> CC: eliz <at> gnu.org, cyd <at> stupidchicken.com, emacs-devel <at> gnu.org
> Date: Wed, 04 Feb 2009 11:49:19 +0900
>
> > > Bug #970 is still not fixed, as of today's CVS. Is someone working on
> > > it? I don't think we can release Emacs 23 with this problem.
>
> > I've just started to work on Bug #970.
>
> I've just installed fixes.
Thank you, I confirm that most of the problems with compositions seem
to be solved, at least in the HELLO file display.
There are still a few strange phenomena with terminal display,
although they seem unrelated to compositions.
For example, after typing "C-h H", go to the line that begins with
"CJK variety", and type "C-f": you will see that the cursor jumps past
some of the characters inside parentheses. Is this a bug?
Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
(Tue, 10 Feb 2009 00:50:03 GMT)
Message #40 received at 970 <at> emacsbugs.donarmstrong.com (full text, mbox):
In article <uljsjtvjh.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> There are still a few strange phenomena with terminal display,
> although they seem unrelated to compositions.
> For example, after typing "C-h H", go to the line that begins with
> "CJK variety", and type "C-f": you will see that the cursor jumps past
> some of the characters inside parentheses. Is this a bug?
No. Those CJK characters have width 2, and if they are not
supported by the terminal coding system,
encode_terminal_code produces two '?'s for each of them.
---
Kenichi Handa
handa <at> m17n.org
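The rule described in the reply above (one `?' per display column of an unsupported character) can be sketched as follows. This is a hedged illustration rather than the code from encode_terminal_code: tty_can_encode() is a hypothetical ASCII-only stand-in for the terminal coding system, and the character width is passed in by the caller instead of being looked up.

```c
#include <stddef.h>

/* Hypothetical stand-in for the terminal coding system: ASCII only.  */
static int
tty_can_encode (int c)
{
  return c >= 0 && c < 128;
}

/* Store character C, which occupies WIDTH columns, into OUT.  An
   unencodable character becomes WIDTH copies of '?', so the number
   of cells written matches the width the display engine assumed.
   Returns the number of bytes stored.  */
static size_t
put_char_or_placeholder (char *out, int c, int width)
{
  if (tty_can_encode (c))
    {
      out[0] = (char) c;
      return 1;
    }
  for (int k = 0; k < width; k++)
    out[k] = '?';
  return (size_t) width;
}
```

For a width-2 CJK character such as U+4E2D, the sketch stores two `?'s, so the cursor appears to skip a cell when moving over it with C-f, which is the behavior described as expected here.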
bug closed, send any further explanations to Eli Zaretskii <eliz <at> fencepost.gnu.org>
Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> emacsbugs.donarmstrong.com.
(Sun, 15 Mar 2009 16:10:07 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> emacsbugs.donarmstrong.com.
(Mon, 13 Apr 2009 14:24:11 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.