GNU bug report logs - #25288
25.1; term, ansi-term, broken output of utf8 text
Reported by: Vjacheslav <fvamail <at> gmail.com>
Date: Wed, 28 Dec 2016 16:58:02 UTC
Severity: normal
Tags: confirmed, fixed, patch
Found in versions 24.5, 25.1
Fixed in version 26.1
Done: npostavs <at> users.sourceforge.net
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 25288 in the body.
You can then email your comments to 25288 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Wed, 28 Dec 2016 16:58:02 GMT)
Acknowledgement sent to Vjacheslav <fvamail <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 28 Dec 2016 16:58:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Running this command from a terminal buffer running bash:
[fva <at> localhost ~]$ python -c 'print "ш"*5000'
produces garbage (шшш\321\210шшш) in the output, and the terminal then needs a
reset. Possibly this is the bug that was seen in very old Linux (multibyte
characters broken at buffer boundaries).
default-process-coding-system is OK:
default-process-coding-system is a variable defined in ‘C source code’.
Its value is (utf-8-unix . utf-8-unix)
In GNU Emacs 25.1.1 (x86_64-redhat-linux-gnu, GTK+ Version 3.22.4)
of 2016-12-15 built on buildvm-30.phx2.fedoraproject.org
Windowing system distributor 'Fedora Project', version 11.0.11900000
Configured using:
'configure --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
--libexecdir=/usr/libexec --localstatedir=/var
--sharedstatedir=/var/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
--with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
--with-gpm=no --with-xwidgets build_alias=x86_64-redhat-linux-gnu
host_alias=x86_64-redhat-linux-gnu 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g
-pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-m64 -mtune=generic' LDFLAGS=-Wl,-z,relro
PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XWIDGETS
Important settings:
value of $LANG: ru_RU.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
Major mode: Term
Minor modes in effect:
show-paren-mode: t
recentf-mode: t
delete-selection-mode: t
global-auto-complete-mode: t
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent messages:
Checking 120 files in /usr/share/emacs/25.1/lisp/obsolete...
Checking for load-path shadows...done
Auto-saving...
next-line: End of buffer [2 times]
previous-line: Beginning of buffer [7 times]
Quit
funcall-interactively: End of buffer [4 times]
previous-line: Beginning of buffer [2 times]
mwheel-scroll: Beginning of buffer [2 times]
Making completion list... [2 times]
Load-path shadows:
None found.
Features:
(pp shadow sort mail-extr emacsbug message idna dired format-spec rfc822
mml mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils thingatpt
help-fns help-mode term disp-table ehelp easy-mmode ropemacs ring pymacs
advice paren recentf tree-widget wid-edit easymenu delsel cus-start
cus-load erlang-start auto-complete-config auto-complete edmacro kmacro
cl-loaddefs pcase cl-lib popup time-date mule-util cyril-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help
simple abbrev minibuffer cl-preloaded nadvice loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
dbusbind inotify dynamic-setting system-font-setting font-render-setting
xwidget-internal move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)
Memory information:
((conses 16 118333 17341)
(symbols 48 23114 0)
(miscs 40 145 285)
(strings 32 22117 5473)
(string-bytes 1 586321)
(vectors 16 15669)
(vector-slots 8 490744 11337)
(floats 8 203 310)
(intervals 56 965 1)
(buffers 976 25))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Wed, 28 Dec 2016 19:10:02 GMT)
Message #8 received at 25288 <at> debbugs.gnu.org (full text, mbox):
found 25288 24.5
tags 25288 confirmed
quit
Vjacheslav <fvamail <at> gmail.com> writes:
> Running this command from a terminal buffer running bash:
>
> [fva <at> localhost ~]$ python -c 'print "ш"*5000'
>
> produces garbage (шшш\321\210шшш) in the output, and the terminal then
> needs a reset. Possibly this is the bug that was seen in very old Linux
> (multibyte characters broken at buffer boundaries).
>
> default-process-coding-system is OK:
>
> default-process-coding-system is a variable defined in ‘C source code’.
> Its value is (utf-8-unix . utf-8-unix)
It looks like the problem is that the process filter function,
term-emulate-terminal, receives the output in chunks of 4096 bytes[1]. The
ш character is encoded in 2 bytes, which means it can be split across
chunks.
Is there a way to recognize incomplete decoding from lisp? I can't see
any.
[1]: It's getting bytes rather than characters because in term-exec-1 we
have:
;; The process's output contains not just chars but also binary
;; escape codes, so we need to see the raw output. We will have to
;; do the decoding by hand on the parts that are made of chars.
(coding-system-for-read 'binary))
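The chunk-boundary split described in this message can be reproduced outside Emacs. The following is a minimal Python sketch, not term.el code. It assumes the 4096-byte chunk size mentioned above and a single ASCII byte (a stand-in for prompt or echo output) ahead of the Cyrillic text, so that each two-byte "ш" starts at an odd offset.

```python
# Illustration only: Python, not term.el code.
CHUNK_SIZE = 4096  # chunk size reported in the message above

# One ASCII byte ahead of the text (an assumed stand-in for earlier output)
# shifts every two-byte "ш" to an odd offset, so the 4096-byte boundary
# falls between its lead byte and its continuation byte.
payload = ("$" + "ш" * 5000).encode("utf-8")
chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]

first = chunks[0]
# 0xD1 0x88 is the UTF-8 encoding of "ш"; in octal that is \321\210,
# the same bytes that show up as garbage in the report.
print(first[-1:], chunks[1][:1])

try:
    first.decode("utf-8")
    split_detected = False
except UnicodeDecodeError as err:
    # The chunk ends in the middle of a character, so it is not valid
    # UTF-8 on its own.
    split_detected = True
    print("cannot decode first chunk alone:", err.reason)
```

Decoding each chunk independently is what goes wrong here; the concatenated byte stream is still valid UTF-8.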
Bug marked as found in version 24.5. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Wed, 28 Dec 2016 19:10:02 GMT)
Added tag(s) confirmed. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Wed, 28 Dec 2016 19:10:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Wed, 28 Dec 2016 19:32:01 GMT)
Message #15 received at 25288 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Date: Wed, 28 Dec 2016 14:10:30 -0500
> Cc: 25288 <at> debbugs.gnu.org
>
> Is there a way to recognize incomplete decoding from lisp? I can't see
> any.
If you know the encoding of the byte stream (and term.el must, since
it evidently decodes it later on), then you could probably use
char-charset, after decoding: if you get 'eight-bit, then you've got
an incomplete byte sequence. But I didn't try that.
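The check suggested here (decode first, then look for characters that are really undecoded raw bytes) has a rough analogue in Python, sketched below. This is not Emacs code: `split_incomplete_tail` is a hypothetical helper, and Python's `surrogateescape` error handler stands in for Emacs's 'eight-bit charset by mapping each undecodable byte to a lone surrogate in U+DC80..U+DCFF.

```python
def split_incomplete_tail(chunk: bytes, encoding: str = "utf-8"):
    """Decode CHUNK and return (text, leftover_bytes).

    Hypothetical helper for illustration. Bytes that fail to decode come
    back as lone surrogates U+DC80..U+DCFF under errors="surrogateescape".
    Trailing ones are treated as the start of a character whose remaining
    bytes have not arrived yet.
    """
    decoded = chunk.decode(encoding, errors="surrogateescape")
    cut = len(decoded)
    # Walk back over trailing undecoded bytes. A UTF-8 character is at most
    # 4 bytes long, so an incomplete tail is at most 3 bytes.
    while (cut > 0 and len(decoded) - cut < 3
           and "\udc80" <= decoded[cut - 1] <= "\udcff"):
        cut -= 1
    # Turn the held-back surrogates into the original raw bytes again.
    leftover = decoded[cut:].encode(encoding, errors="surrogateescape")
    return decoded[:cut], leftover


# A chunk that ends after the first byte of the second "ш".
text, rest = split_incomplete_tail("шш".encode("utf-8")[:3])
print(repr(text), rest)
```

This is a heuristic: a genuinely invalid byte at the very end of a chunk is held back as well, and would have to be flushed once the next chunk shows it does not start a valid character.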
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Thu, 29 Dec 2016 02:37:01 GMT)
Message #18 received at 25288 <at> debbugs.gnu.org (full text, mbox):
tags 25288 patch
quit
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Date: Wed, 28 Dec 2016 14:10:30 -0500
>> Cc: 25288 <at> debbugs.gnu.org
>>
>> Is there a way to recognize incomplete decoding from lisp? I can't see
>> any.
>
> If you know the encoding of the byte stream (and term.el must, since
> it evidently decodes it later on), then you could probably use
> char-charset, after decoding: if you get 'eight-bit, then you've got
> an incomplete byte sequence. But I didn't try that.
That should work at least for encodings like utf-8 for which undecoded
bytes are not ascii. I guess parsing of escape codes would only work on
such encodings anyway, so it should be fine. Patch attached.
[v1-0001-Handle-multibyte-chars-spanning-chunks-in-term.el.patch (text/plain, attachment)]
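The attached patch is not reproduced in this log. As an illustration of the general idea only (hold back an incomplete trailing sequence and prepend it to the next chunk), here is a Python sketch using the standard library's incremental decoder. The function name `decode_stream` is hypothetical, and nothing here is taken from term.el.

```python
import codecs


def decode_stream(chunks, encoding="utf-8"):
    """Yield decoded text for each chunk, carrying split characters over.

    Hypothetical helper: the incremental decoder buffers an incomplete
    trailing byte sequence and completes it when the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)  # final=False keeps a partial tail
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush anything still pending
    if tail:
        yield tail


# Every byte of a multibyte UTF-8 sequence is >= 0x80, so held-back bytes
# can never be mistaken for ASCII control bytes such as ESC (0x1B).
assert all(b >= 0x80 for b in "ш".encode("utf-8"))

payload = ("$" + "ш" * 5000).encode("utf-8")
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]
result = "".join(decode_stream(chunks))
print(result == "$" + "ш" * 5000)
```

The top-level assertion reflects the point made in the message: the approach relies on an encoding whose undecoded bytes are never ASCII, which holds for UTF-8 but not for every encoding.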
Added tag(s) patch. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Thu, 29 Dec 2016 02:37:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Thu, 29 Dec 2016 16:07:02 GMT)
Message #23 received at 25288 <at> debbugs.gnu.org (full text, mbox):
> From: npostavs <at> users.sourceforge.net
> Cc: 25288 <at> debbugs.gnu.org, fvamail <at> gmail.com
> Date: Wed, 28 Dec 2016 21:37:19 -0500
>
> > If you know the encoding of the byte stream (and term.el must, since
> > it evidently decodes it later on), then you could probably use
> > char-charset, after decoding: if you get 'eight-bit, then you've got
> > an incomplete byte sequence. But I didn't try that.
>
> That should work at least for encodings like utf-8 for which undecoded
> bytes are not ascii. I guess parsing of escape codes would only work on
> such encodings anyway, so it should be fine. Patch attached.
LGTM, thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25288; Package emacs. (Tue, 03 Jan 2017 14:06:01 GMT)
Message #26 received at 25288 <at> debbugs.gnu.org (full text, mbox):
tags 25288 fixed
close 25288 26.1
quit
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: npostavs <at> users.sourceforge.net
>> Cc: 25288 <at> debbugs.gnu.org, fvamail <at> gmail.com
>> Date: Wed, 28 Dec 2016 21:37:19 -0500
>>
>> > If you know the encoding of the byte stream (and term.el must, since
>> > it evidently decodes it later on), then you could probably use
>> > char-charset, after decoding: if you get 'eight-bit, then you've got
>> > an incomplete byte sequence. But I didn't try that.
>>
>> That should work at least for encodings like utf-8 for which undecoded
>> bytes are not ascii. I guess parsing of escape codes would only work on
>> such encodings anyway, so it should be fine. Patch attached.
>
> LGTM, thanks.
Pushed as 134e86b360ca.
Added tag(s) fixed. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Tue, 03 Jan 2017 14:06:02 GMT)
Bug marked as fixed in version 26.1, send any further explanations to 25288 <at> debbugs.gnu.org and Vjacheslav <fvamail <at> gmail.com>. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Tue, 03 Jan 2017 14:06:02 GMT)
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 01 Feb 2017 12:24:04 GMT)
This bug report was last modified 8 years and 196 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.