From: Stephane Zermatten
To: 77410@debbugs.gnu.org
Date: Mon, 31 Mar 2025 17:18:35 +0300
Subject: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars

Tags: patch

When I run a shell with M-x term and a very Unicode-heavy prompt
(fish 3.6 + tide), the Unicode characters are sometimes printed
undecoded.

One possible cause is unfortunate chunking in the middle of a
character, which the attached patch fixes.

Without the patch, if I type this into M-x term running /usr/bin/bash:

  for j in $(seq 0 3); do for i in $(seq 0 30); do printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80'; done; echo; done

I get:

  \360\237\203\022\360\...

instead of:

  😀😀😀😀😀😀😀...

With the patch included, I get the correct output.

The issue comes from an incorrect check, (> count partial 0), which
should really be (and (>= count partial) (> partial 0)).  I simplified
that to (> partial 0) in the patch, because the while loop already
guarantees (>= count partial).  With the old check, a chunk made up
only of part of a multibyte character (count = partial) was never
saved for the next chunk, so its bytes were displayed undecoded.
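A minimal Emacs Lisp sketch of the chunk handling discussed above, for illustration only; it is not the term.el code. It assumes UTF-8 input, decodes each chunk with decode-coding-string, counts trailing raw (eight-bit) characters the way the while loop in term-emulate-terminal does, and holds those bytes back for the next chunk. The names my-term-trailing-raw-bytes and my-term-feed are hypothetical, and carrying the raw input bytes (rather than the decoded eight-bit characters that term.el keeps in term-terminal-undecoded-bytes) is a simplification.

```elisp
;;; -*- lexical-binding: t; -*-
;; Simplified model of the trailing-byte handling in
;; `term-emulate-terminal'.  Hypothetical names; not the term.el code.

(defun my-term-trailing-raw-bytes (decoded)
  "Return the number of trailing raw (eight-bit) chars in DECODED.
This mirrors the while loop that computes `partial' in term.el."
  (let ((count (length decoded))
        (partial 0))
    (while (and (< partial count)
                (eq (char-charset (aref decoded (- count 1 partial)))
                    'eight-bit))
      (setq partial (1+ partial)))
    partial))

(defun my-term-feed (chunks &optional old-check)
  "Decode CHUNKS, a list of unibyte strings, and return the shown text.
Trailing raw bytes of each chunk are held back and prepended to the
next one.  If OLD-CHECK is non-nil, use the pre-fix condition
\(> count partial 0) instead of (> partial 0)."
  (let ((carry "")
        (shown ""))
    (dolist (chunk chunks)
      (let* ((bytes (concat carry chunk))
             (decoded (decode-coding-string bytes 'utf-8-unix))
             (count (length decoded))
             (partial (my-term-trailing-raw-bytes decoded)))
        (setq carry "")
        (when (if old-check (> count partial 0) (> partial 0))
          ;; Each trailing eight-bit char stands for one trailing
          ;; input byte, so keep those bytes for the next chunk.
          (setq carry (substring bytes (- (length bytes) partial)))
          (setq decoded (substring decoded 0 (- partial))))
        ;; Whatever is left, including raw bytes the check let
        ;; through, is what the terminal would display.
        (setq shown (concat shown decoded))))
    shown))
```

Called with OLD-CHECK non-nil, (my-term-feed '("\321" "\210\321\210\321\210") t) displays both bytes of the split character raw, because the first chunk has count = partial = 1; with the fixed check the result is the three characters. Splits such as '("\321\210\321" "\210\321\210"), where the first chunk also holds a complete character, behave the same under both checks, which is why the previous test (splitting seven characters in half) passed despite the bug.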
I rewrote the existing test to cover this case and to try out multiple
different combinations of chunks.

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.


In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''

Attachment: 0001-Fix-issue-with-very-short-multibyte-character-chunk.patch

From 2bb6cec8f4f72009bcde1edab367f90ab82e5e2a Mon Sep 17 00:00:00 2001
From: Stephane Zermatten
Date: Mon, 31 Mar 2025 16:41:08 +0300
Subject: [PATCH] Fix issue with very short multibyte character chunk.

Before this change, a chunk containing only a part
of a multibyte character would be discarded and
displayed undecoded on the terminal.

* lisp/term.el
---
 lisp/term.el            |  2 +-
 test/lisp/term-tests.el | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index 862103d88e6..a971300c055 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -3116,7 +3116,7 @@ term-emulate-terminal
                                                           (- count 1 partial)))
                                       'eight-bit))
                         (incf partial))
-                      (when (> count partial 0)
+                      (when (> partial 0)
                         (setq term-terminal-undecoded-bytes
                               (substring decoded-substring (- partial)))
                         (setq decoded-substring
diff --git a/test/lisp/term-tests.el b/test/lisp/term-tests.el
index 5ef8c1174df..aad84e171b2 100644
--- a/test/lisp/term-tests.el
+++ b/test/lisp/term-tests.el
@@ -402,13 +402,14 @@ term-to-margin
 (ert-deftest term-decode-partial () ;; Bug#25288.
   "Test multibyte characters sent into multiple chunks."
   ;; Set `locale-coding-system' so test will be deterministic.
-  (let* ((locale-coding-system 'utf-8-unix)
-         (string (make-string 7 ?ш))
-         (bytes (encode-coding-string string locale-coding-system)))
-    (should (equal string
-                   (term-test-screen-from-input
-                    40 1 `(,(substring bytes 0 (/ (length bytes) 2))
-                           ,(substring bytes (/ (length bytes) 2))))))))
+  (let ((locale-coding-system 'utf-8-unix))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321" "\210\321\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321" "\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321\210\321" "\210"))))))
+
 (ert-deftest term-undecodable-input () ;; Bug#29918.
   "Undecodable bytes should be passed through without error."
   (let* ((locale-coding-system 'utf-8-unix) ; As above.
-- 
2.47.0

From: Stephane Zermatten
To: 77410@debbugs.gnu.org
Date: Wed, 2 Apr 2025 15:18:03 +0300
Subject: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars

Update: the new version of the patch attached to this e-mail fixes the
test term-undecodable-input.

When undecodable input is sent, some decodable input must follow it so
that term flushes the buffer it uses to keep undecoded multibyte
characters.  It seems that term-undecodable-input was relying on the
issue fixed by this bug.
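The flushing behaviour described in this update can be sketched in Emacs Lisp as well. The single-step helper below, my-term-step, is hypothetical and not part of term.el; it assumes UTF-8 input and applies the fixed (> partial 0) check. A chunk made only of undecodable bytes such as "\376\340\360\370" is held back entirely, and it is displayed only once the input ends in a decodable character, which is why the updated test appends ".".

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical single-step model of the fixed check in
;; `term-emulate-terminal'; not the term.el code.

(defun my-term-step (carry chunk)
  "Decode CARRY followed by CHUNK and return (DISPLAYED . NEW-CARRY).
CARRY and CHUNK are unibyte strings.  Trailing raw (eight-bit) chars
are not displayed; their bytes are returned in NEW-CARRY instead."
  (let* ((bytes (concat carry chunk))
         (decoded (decode-coding-string bytes 'utf-8-unix))
         (count (length decoded))
         (partial 0))
    ;; Count trailing raw bytes, as term.el's while loop does.
    (while (and (< partial count)
                (eq (char-charset (aref decoded (- count 1 partial)))
                    'eight-bit))
      (setq partial (1+ partial)))
    (if (> partial 0)                   ; the fixed condition
        (cons (substring decoded 0 (- partial))
              (substring bytes (- (length bytes) partial)))
      (cons decoded ""))))
```

For example, (my-term-step "" "\376\340\360\370") displays nothing and carries all four bytes; passing that carry back with the chunk "." then displays the four raw bytes followed by the period and leaves nothing behind.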

Attachment: 0001-Fix-issue-with-very-short-multibyte-character-chunk.patch

From 6310243032cfebd4af65ac3eb344913bbfd92e64 Mon Sep 17 00:00:00 2001
From: Stephane Zermatten <szermatt@gmx.net>
Date: Mon, 31 Mar 2025 16:41:08 +0300
Subject: [PATCH] Fix issue with very short multibyte character chunk.

Before this change, a chunk containing only a part
of a multibyte character would be discarded and
displayed undecoded on the terminal.

* lisp/term.el
---
 lisp/term.el            |  2 +-
 test/lisp/term-tests.el | 17 +++++++++--------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/lisp/term.el b/lisp/term.el
index 862103d88e6..a971300c055 100644
--- a/lisp/term.el
+++ b/lisp/term.el
@@ -3116,7 +3116,7 @@ term-emulate-terminal
                                                           (- count 1 partial)))
                                       'eight-bit))
                         (incf partial))
-                      (when (> count partial 0)
+                      (when (> partial 0)
                         (setq term-terminal-undecoded-bytes
                               (substring decoded-substring (- partial)))
                         (setq decoded-substring
diff --git a/test/lisp/term-tests.el b/test/lisp/term-tests.el
index 5ef8c1174df..ffb341f3b52 100644
--- a/test/lisp/term-tests.el
+++ b/test/lisp/term-tests.el
@@ -402,17 +402,18 @@ term-to-margin
 (ert-deftest term-decode-partial () ;; Bug#25288.
   "Test multibyte characters sent into multiple chunks."
   ;; Set `locale-coding-system' so test will be deterministic.
-  (let* ((locale-coding-system 'utf-8-unix)
-         (string (make-string 7 ?ш))
-         (bytes (encode-coding-string string locale-coding-system)))
-    (should (equal string
-                   (term-test-screen-from-input
-                    40 1 `(,(substring bytes 0 (/ (length bytes) 2))
-                           ,(substring bytes (/ (length bytes) 2))))))))
+  (let ((locale-coding-system 'utf-8-unix))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321" "\210\321\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321" "\210\321\210"))))
+    (should (equal "шшш" (term-test-screen-from-input
+                          40 1 '("\321\210\321\210\321" "\210"))))))
+
 (ert-deftest term-undecodable-input () ;; Bug#29918.
   "Undecodable bytes should be passed through without error."
   (let* ((locale-coding-system 'utf-8-unix) ; As above.
-         (bytes "\376\340\360\370")
+         (bytes "\376\340\360\370.")
          (string (decode-coding-string bytes locale-coding-system)))
     (should (equal string
                    (term-test-screen-from-input
-- 
2.47.0


From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Stephane Zermatten
Date: Sun, 13 Apr 2025 08:09:04 +0000
Subject: bug#77410: closed (Re: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars)

Your bug report

#77410: term.el sometimes prints undecoded multibyte UTF-8 chars

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 77410@debbugs.gnu.org.

-- 
77410: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77410
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems


From: Eli Zaretskii
To: Stephane Zermatten
Cc: 77410-done@debbugs.gnu.org
Date: Sun, 13 Apr 2025 11:08:40 +0300
Subject: Re: bug#77410: term.el sometimes prints undecoded multibyte UTF-8 chars

> Date: Wed, 2 Apr 2025 15:18:03 +0300
> From: Stephane Zermatten via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
> Update: The new version of the patch attached to this e-mail fixes the test term-undecodable-input.
>
> Some decodable input is necessary when sending undecodable input for term to flush the buffer it uses to
> keep undecoded multibyte characters. It seems that term-undecodable-input was relying on the issue fixed by
> this bug.

Thanks, installed on the master branch, and closing the bug.