From unknown Thu Aug 14 12:25:25 2025
From: bug#25288 <25288@debbugs.gnu.org>
To: bug#25288 <25288@debbugs.gnu.org>
Subject: Status: 25.1; term, ansi-term, broken output of utf8 text
Reply-To: bug#25288 <25288@debbugs.gnu.org>
Date: Thu, 14 Aug 2025 19:25:25 +0000

retitle 25288 25.1; term, ansi-term, broken output of utf8 text
reassign 25288 emacs
submitter 25288 Vjacheslav
severity 25288 normal
tag 25288 confirmed fixed patch
thanks

From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 11:57:24 2016
From: Vjacheslav
Subject: 25.1; term, ansi-term, broken output of utf8 text
To: bug-gnu-emacs@gnu.org
Date: Wed, 28 Dec 2016 13:41:55 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Running this command from a terminal (term or ansi-term) running bash:

[fva@localhost ~]$ python -c 'print "ш"*5000'

produces garbage (шшш\321\210шшш) in the output. The terminal then needs a
reset. Possibly this is a bug that was seen in very old Linux (it breaks
multibyte characters on buffer borders).

default-process-coding-system is OK:

default-process-coding-system is a variable defined in ‘C source code’.
Its value is (utf-8-unix . utf-8-unix)

In GNU Emacs 25.1.1 (x86_64-redhat-linux-gnu, GTK+ Version 3.22.4)
 of 2016-12-15 built on buildvm-30.phx2.fedoraproject.org
Windowing system distributor 'Fedora Project', version 11.0.11900000
Configured using:
 'configure --build=x86_64-redhat-linux-gnu
 --host=x86_64-redhat-linux-gnu --program-prefix=
 --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
 --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
 --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
 --libexecdir=/usr/libexec --localstatedir=/var
 --sharedstatedir=/var/lib --mandir=/usr/share/man
 --infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
 --with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
 --with-gpm=no --with-xwidgets build_alias=x86_64-redhat-linux-gnu
 host_alias=x86_64-redhat-linux-gnu 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g
 -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
 -m64 -mtune=generic' LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 XWIDGETS

Important settings:
  value of $LANG: ru_RU.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Term

Minor modes in effect:
  show-paren-mode: t
  recentf-mode: t
  delete-selection-mode: t
  global-auto-complete-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
Checking 120 files in /usr/share/emacs/25.1/lisp/obsolete...
Checking for load-path shadows...done
Auto-saving...
next-line: End of buffer [2 times]
previous-line: Beginning of buffer [7 times]
Quit
funcall-interactively: End of buffer [4 times]
previous-line: Beginning of buffer [2 times]
mwheel-scroll: Beginning of buffer [2 times]
Making completion list... [2 times]

Load-path shadows:
None found.

Features:
(pp shadow sort mail-extr emacsbug message idna dired format-spec rfc822
mml mml-sec password-cache epg epg-config gnus-util mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils thingatpt
help-fns help-mode term disp-table ehelp easy-mmode ropemacs ring pymacs
advice paren recentf tree-widget wid-edit easymenu delsel cus-start
cus-load erlang-start auto-complete-config auto-complete edmacro kmacro
cl-loaddefs pcase cl-lib popup time-date mule-util cyril-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core frame cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese charscript case-table epa-hook jka-cmpr-hook help
simple abbrev minibuffer cl-preloaded nadvice loaddefs button faces
cus-face macroexp files text-properties overlay sha1 md5 base64 format
env code-pages mule custom widget hashtable-print-readable backquote
dbusbind inotify dynamic-setting system-font-setting font-render-setting
xwidget-internal move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 118333 17341)
 (symbols 48 23114 0)
 (miscs 40 145 285)
 (strings 32 22117 5473)
 (string-bytes 1 586321)
 (vectors 16 15669)
 (vector-slots 8 490744 11337)
 (floats 8 203 310)
 (intervals 56 965 1)
 (buffers 976 25))

From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 14:09:43 2016
From: npostavs@users.sourceforge.net
To: Vjacheslav
Cc: 25288@debbugs.gnu.org
Subject: Re: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
Date: Wed, 28 Dec 2016 14:10:30 -0500
In-Reply-To: (Vjacheslav's message of "Wed, 28 Dec 2016 13:41:55 +0300")
Message-ID: <87r34r98ex.fsf@users.sourceforge.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

found 25288 24.5
tags 25288 confirmed
quit

Vjacheslav writes:

> Running this command from a terminal (term or ansi-term) running bash:
>
> [fva@localhost ~]$ python -c 'print "ш"*5000'
>
> produces garbage (шшш\321\210шшш) in the output. The terminal then needs a
> reset. Possibly this is a bug that was seen in very old Linux (it breaks
> multibyte characters on buffer borders).
>
> default-process-coding-system is OK:
>
> default-process-coding-system is a variable defined in ‘C source code’.
> Its value is (utf-8-unix . utf-8-unix)

It looks like the problem is that the process filter function,
term-emulate-terminal, receives the output in chunks of 4096 bytes[1].
The ш character is encoded in 2 bytes, which means it can be split
across chunks.
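
The split described above can be reproduced outside Emacs. What follows is a minimal Python 3 sketch, not term.el code: `decode_per_chunk` is a hypothetical helper, and the 5-byte chunk size merely stands in for the 4096-byte chunks so that a 2-byte character straddles a chunk boundary.

```python
def decode_per_chunk(data: bytes, chunk_size: int) -> str:
    """Decode each fixed-size chunk on its own, as a naive filter would."""
    pieces = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # backslashreplace keeps undecodable bytes visible instead of raising.
        pieces.append(chunk.decode("utf-8", errors="backslashreplace"))
    return "".join(pieces)


data = ("ш" * 5).encode("utf-8")    # 10 bytes: each "ш" is b"\xd1\x88"

# Chunk boundaries fall between characters: the text survives intact.
clean = decode_per_chunk(data, 4)

# A boundary falls inside a character: b"\xd1" ends one chunk and b"\x88"
# starts the next, so neither chunk can decode its half.
broken = decode_per_chunk(data, 5)
```

Here `clean` is the original five characters, while `broken` holds two characters, the escaped stray bytes `\xd1` and `\x88`, and two more characters. That is the same shape as the `\321\210` (octal for the same two bytes) in the report.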

Is there a way to recognize incomplete decoding from lisp?  I can't see
any.

[1]: It's getting bytes rather than characters because in term-exec-1 we
have:

	  ;; The process's output contains not just chars but also binary
	  ;; escape codes, so we need to see the raw output.  We will have to
	  ;; do the decoding by hand on the parts that are made of chars.
	  (coding-system-for-read 'binary))

From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 14:31:51 2016
Date: Wed, 28 Dec 2016 21:31:14 +0200
Message-Id: <83h95nvojh.fsf@gnu.org>
From: Eli Zaretskii
To: npostavs@users.sourceforge.net
Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
In-reply-to: <87r34r98ex.fsf@users.sourceforge.net> (npostavs@users.sourceforge.net)
Subject: Re: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
References: <87r34r98ex.fsf@users.sourceforge.net>
Reply-To: Eli Zaretskii

> From: npostavs@users.sourceforge.net
> Date: Wed, 28 Dec 2016 14:10:30 -0500
> Cc: 25288@debbugs.gnu.org
>
> Is there a way to recognize incomplete decoding from lisp?  I can't see
> any.

If you know the encoding of the byte stream (and term.el must, since
it evidently decodes it later on), then you could probably use
char-charset, after decoding: if you get 'eight-bit, then you've got
incomplete byte sequence.  But I didn't try that.

From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 21:36:24 2016
From: npostavs@users.sourceforge.net
To: Eli Zaretskii
Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
Subject: Re: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
References: <87r34r98ex.fsf@users.sourceforge.net> <83h95nvojh.fsf@gnu.org>
Date: Wed, 28 Dec 2016 21:37:19 -0500
In-Reply-To: <83h95nvojh.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 28 Dec 2016 21:31:14 +0200")
Message-ID: <87inq38nq8.fsf@users.sourceforge.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

tags 25288 patch
quit

Eli Zaretskii writes:

>> From: npostavs@users.sourceforge.net
>> Date: Wed, 28 Dec 2016 14:10:30 -0500
>> Cc: 25288@debbugs.gnu.org
>>
>> Is there a way to recognize incomplete decoding from lisp?  I can't see
>> any.
>
> If you know the encoding of the byte stream (and term.el must, since
> it evidently decodes it later on), then you could probably use
> char-charset, after decoding: if you get 'eight-bit, then you've got
> incomplete byte sequence.  But I didn't try that.

That should work at least for encodings like utf-8 for which undecoded
bytes are not ascii.  I guess parsing of escape codes would only work on
such encodings anyway, so it should be fine.  Patch attached.

--=-=-=
Content-Type: text/plain
Content-Disposition: attachment;
 filename=v1-0001-Handle-multibyte-chars-spanning-chunks-in-term.el.patch
Content-Description: patch

>From 6b052065c60406df5b4cd54f698f78594a010922 Mon Sep 17 00:00:00 2001
From: Noam Postavsky
Date: Wed, 28 Dec 2016 20:13:20 -0500
Subject: [PATCH v1] Handle multibyte chars spanning chunks in term.el

* lisp/term.el (term-terminal-undecoded-bytes): New variable.
(term-mode): Make it buffer local.  Don't make
`term-terminal-parameter' buffer-local twice.
(term-emulate-terminal): Check for bytes of incompletely decoded
characters, and save them until the next call when they can be fully
decoded (Bug#25288).
---
 lisp/term.el | 39 +++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index d3d6390..696e39f 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -341,6 +341,7 @@ (defconst term-protocol-version "0.96")
 (eval-when-compile (require 'ange-ftp))
+(eval-when-compile (require 'cl-lib))
 (require 'ring)
 (require 'ehelp)
@@ -404,6 +405,7 @@ term-terminal-state
 (defvar term-kill-echo-list nil
   "A queue of strings whose echo we want suppressed.")
 (defvar term-terminal-parameter)
+(defvar term-terminal-undecoded-bytes nil)
 (defvar term-terminal-previous-parameter)
 (defvar term-current-face 'term)
 (defvar term-scroll-start 0 "Top-most line (inclusive) of scrolling region.")
@@ -1015,7 +1017,6 @@ term-mode
   ;; These local variables are set to their local values:
   (make-local-variable 'term-saved-home-marker)
-  (make-local-variable 'term-terminal-parameter)
   (make-local-variable 'term-saved-cursor)
   (make-local-variable 'term-prompt-regexp)
   (make-local-variable 'term-input-ring-size)
@@ -1052,6 +1053,7 @@ term-mode
   (make-local-variable 'term-ansi-current-invisible)
   (make-local-variable 'term-terminal-parameter)
+  (make-local-variable 'term-terminal-undecoded-bytes)
   (make-local-variable 'term-terminal-previous-parameter)
   (make-local-variable 'term-terminal-previous-parameter-2)
   (make-local-variable 'term-terminal-previous-parameter-3)
@@ -2748,6 +2750,10 @@ term-emulate-terminal
           (when term-log-buffer
             (princ str term-log-buffer))
+          (when term-terminal-undecoded-bytes
+            (setq str (concat term-terminal-undecoded-bytes str))
+            (setq str-length (length str))
+            (setq term-terminal-undecoded-bytes nil))
           (cond ((eq term-terminal-state 4) ;; Have saved pending output.
                  (setq str (concat term-terminal-parameter str))
                  (setq term-terminal-parameter nil)
@@ -2763,13 +2769,6 @@ term-emulate-terminal
                                         str i))
                    (when (not funny) (setq funny str-length))
                    (cond ((> funny i)
-                          ;; Decode the string before counting
-                          ;; characters, to avoid garbling of certain
-                          ;; multibyte characters (bug#1006).
-                          (setq decoded-substring
-                                (decode-coding-string
-                                 (substring str i funny)
-                                 locale-coding-system))
                           (cond ((eq term-terminal-state 1)
                                  ;; We are in state 1, we need to wrap
                                  ;; around.  Go to the beginning of
@@ -2778,7 +2777,31 @@ term-emulate-terminal
                                  (term-down 1 t)
                                  (term-move-columns (- (term-current-column)))
                                  (setq term-terminal-state 0)))
+                          ;; Decode the string before counting
+                          ;; characters, to avoid garbling of certain
+                          ;; multibyte characters (bug#1006).
+                          (setq decoded-substring
+                                (decode-coding-string
+                                 (substring str i funny)
+                                 locale-coding-system))
                           (setq count (length decoded-substring))
+                          ;; Check for multibyte characters that ends
+                          ;; before end of string, and save it for
+                          ;; next time.
+                          (when (= funny str-length)
+                            (let ((partial 0))
+                              (while (eq (char-charset (aref decoded-substring
+                                                             (- count 1 partial)))
+                                         'eight-bit)
+                                (cl-incf partial))
+                              (when (> partial 0)
+                                (setq term-terminal-undecoded-bytes
+                                      (substring decoded-substring (- partial)))
+                                (setq decoded-substring
+                                      (substring decoded-substring 0 (- partial)))
+                                (cl-decf str-length partial)
+                                (cl-decf count partial)
+                                (cl-decf funny partial))))
                           (setq temp (- (+ (term-horizontal-column) count)
                                         term-width))
                           (cond ((or term-suppress-hard-newline (<= temp 0)))
-- 
2.9.3

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 29 11:06:45 2016
Date: Thu, 29 Dec 2016 18:06:27 +0200
Message-Id: <834m1mvhx8.fsf@gnu.org>
From: Eli Zaretskii
To: npostavs@users.sourceforge.net
Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
In-reply-to: <87inq38nq8.fsf@users.sourceforge.net> (npostavs@users.sourceforge.net)
Subject: Re: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
References: <87r34r98ex.fsf@users.sourceforge.net> <83h95nvojh.fsf@gnu.org> <87inq38nq8.fsf@users.sourceforge.net>
Reply-To: Eli Zaretskii

> From: npostavs@users.sourceforge.net
> Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
> Date: Wed, 28 Dec 2016 21:37:19 -0500
>
> > If you know the encoding of the byte stream (and term.el must, since
> > it evidently decodes it later on), then you could probably use
> > char-charset, after decoding: if you get 'eight-bit, then you've got
> > incomplete byte sequence.  But I didn't try that.
>
> That should work at least for encodings like utf-8 for which undecoded
> bytes are not ascii.  I guess parsing of escape codes would only work on
> such encodings anyway, so it should be fine.  Patch attached.

LGTM, thanks.
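
The idea behind the patch (decode, strip any trailing undecoded bytes, and prepend them to the next chunk) can be mimicked outside Emacs. The following is a minimal Python 3 sketch under stated assumptions, not the term.el code: it assumes UTF-8 input, uses Python's `surrogateescape` error handler as a stand-in for Emacs's `eight-bit` charset, and the names `decode_chunk` and `pending` are hypothetical.

```python
def decode_chunk(chunk: bytes, pending: bytes = b""):
    """Decode one chunk, holding back a trailing incomplete sequence.

    Returns (text, new_pending); pass new_pending to the next call.
    """
    data = pending + chunk
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF; in this
    # sketch they play the role of `eight-bit' characters.
    decoded = data.decode("utf-8", errors="surrogateescape")

    # Count undecoded bytes at the very end of the decoded string.
    partial = 0
    while (partial < len(decoded)
           and "\udc80" <= decoded[-1 - partial] <= "\udcff"):
        partial += 1
    if partial == 0:
        return decoded, b""

    # Turn the held-back surrogates into raw bytes again for the next call.
    tail = decoded[-partial:].encode("utf-8", errors="surrogateescape")
    return decoded[:-partial], tail


# Feed 20 bytes of "ш" in 7-byte chunks, so characters straddle boundaries.
stream = ("ш" * 10).encode("utf-8")
pieces, pending = [], b""
for start in range(0, len(stream), 7):
    text, pending = decode_chunk(stream[start:start + 7], pending)
    pieces.append(text)
result = "".join(pieces)
```

After the loop, `result` equals the original ten characters and `pending` is empty. As with the patch, bytes that never become decodable would be held back at the end of every chunk; the sketch does not try to handle that case.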

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 03 09:05:03 2017
From: npostavs@users.sourceforge.net
To: Eli Zaretskii
Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
Subject: Re: bug#25288: 25.1; term, ansi-term, broken output of utf8 text
References: <87r34r98ex.fsf@users.sourceforge.net> <83h95nvojh.fsf@gnu.org> <87inq38nq8.fsf@users.sourceforge.net> <834m1mvhx8.fsf@gnu.org>
Date: Tue, 03 Jan 2017 09:05:57 -0500
In-Reply-To: <834m1mvhx8.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 29 Dec 2016 18:06:27 +0200")
Message-ID: <87ful05jcq.fsf@users.sourceforge.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

tags 25288 fixed
close 25288 26.1
quit

Eli Zaretskii writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 25288@debbugs.gnu.org, fvamail@gmail.com
>> Date: Wed, 28 Dec 2016 21:37:19 -0500
>>
>> > If you know the encoding of the byte stream (and term.el must, since
>> > it evidently decodes it later on), then you could probably use
>> > char-charset, after decoding: if you get 'eight-bit, then you've got
>> > incomplete byte sequence.  But I didn't try that.
>>
>> That should work at least for encodings like utf-8 for which undecoded
>> bytes are not ascii.  I guess parsing of escape codes would only work on
>> such encodings anyway, so it should be fine.  Patch attached.
>
> LGTM, thanks.

Pushed as 134e86b360ca.

From unknown Thu Aug 14 12:25:25 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 01 Feb 2017 12:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator