GNU bug report logs - #25288
25.1; term, ansi-term, broken output of utf8 text
Reported by: Vjacheslav <fvamail <at> gmail.com>
Date: Wed, 28 Dec 2016 16:58:02 UTC
Severity: normal
Tags: confirmed, fixed, patch
Found in versions 24.5, 25.1
Fixed in version 26.1
Done: npostavs <at> users.sourceforge.net
Bug is archived. No further changes may be made.
Message #10 received at control <at> debbugs.gnu.org:
found 25288 24.5
tags 25288 confirmed
quit
Vjacheslav <fvamail <at> gmail.com> writes:
> Running this command from a terminal running bash:
>
> [fva <at> localhost ~]$ python -c 'print "ш"*5000'
>
> produces garbage (шшш\321\210шшш) in the output. The terminal then
> needs a reset. Possibly this is a bug that was seen in very old Linux
> versions (multibyte characters broken at buffer boundaries).
>
> default-process-coding-system is OK:
>
> default-process-coding-system is a variable defined in ‘C source code’.
> Its value is (utf-8-unix . utf-8-unix)
It looks like the problem is that the process filter function,
term-emulate-terminal, receives the output in chunks of 4096 bytes[1]. The
ш character is encoded in 2 bytes, which means it can be split across
chunks.
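
To illustrate the splitting described above, here is a minimal sketch in Python (the language of the reporter's one-liner, not Emacs Lisp and not term.el code). It assumes a reader that hands over raw bytes in 4096-byte pieces and decodes each piece on its own. The helper names and the single leading "$" byte, which stands in for any earlier output that shifts the alignment, are made up for the example. The bytes \321\210 in the report are the octal form of 0xD1 0x88, the UTF-8 encoding of ш.

```python
CHUNK_SIZE = 4096  # chunk size mentioned in the message above


def split_chunks(data, size=CHUNK_SIZE):
    """Cut a byte string into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_each_chunk(chunks):
    """Decode every chunk in isolation (the failure mode being illustrated)."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


# Hypothetical input: one ASCII byte of earlier output ("$") followed by the
# 5000 two-byte characters from the report. The odd offset makes a character
# straddle each 4096-byte boundary.
text = "$" + "ш" * 5000
chunks = split_chunks(text.encode("utf-8"))
naive = decode_each_chunk(chunks)

print(len(chunks))                    # 3
print(chunks[0][-1:], chunks[1][:1])  # b'\xd1' b'\x88'
print(naive == text)                  # False
```

The first chunk ends with the lead byte 0xD1 and the second begins with the continuation byte 0x88, so neither half decodes to ш when each chunk is handled separately.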
Is there a way to recognize incomplete decoding from Lisp? I can't see
one.
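
One generic way to handle this, sketched here in Python rather than Lisp, is to inspect the last few bytes of each chunk, hold back any unfinished UTF-8 sequence, and prepend it to the next chunk before decoding. This is only an illustration of the idea under that assumption. It is not the change that went into Emacs, and the function names are hypothetical.

```python
def incomplete_tail_length(data):
    """Return how many trailing bytes of `data` form an unfinished UTF-8 sequence."""
    # An unfinished sequence is at most 3 bytes long (a 4-byte character
    # missing its last byte), so looking back 3 bytes is enough.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if byte >= 0xF0:
            needed = 4
        elif byte >= 0xE0:
            needed = 3
        elif byte >= 0xC0:
            needed = 2
        else:
            needed = 1  # plain ASCII
        return back if needed > back else 0
    return 0


def decode_stream(chunks):
    """Decode chunks in order, carrying unfinished sequences into the next chunk."""
    pending = b""
    pieces = []
    for chunk in chunks:
        data = pending + chunk
        cut = len(data) - incomplete_tail_length(data)
        pieces.append(data[:cut].decode("utf-8", errors="replace"))
        pending = data[cut:]
    # Whatever is still pending at the end can never be completed.
    pieces.append(pending.decode("utf-8", errors="replace"))
    return "".join(pieces)


# Same hypothetical input as a 4096-byte reader would see it.
text = "$" + "ш" * 5000
raw = text.encode("utf-8")
chunks = [raw[i:i + 4096] for i in range(0, len(raw), 4096)]
print(decode_stream(chunks) == text)  # True
```

In Python itself the same buffering is available through `codecs.getincrementaldecoder("utf-8")`. The hand-written version is shown only because it makes the byte-level check explicit.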
[1]: It's getting bytes rather than characters because in term-exec-1 we
have:
;; The process's output contains not just chars but also binary
;; escape codes, so we need to see the raw output. We will have to
;; do the decoding by hand on the parts that are made of chars.
(coding-system-for-read 'binary))
This bug report was last modified 8 years and 196 days ago.