GNU bug report logs - #77410
term.el sometimes prints undecoded multibyte UTF-8 chars

Package: emacs;

Reported by: Stephane Zermatten <szermatt <at> gmx.net>

Date: Mon, 31 Mar 2025 17:46:02 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77410 in the body.
You can then email your comments to 77410 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#77410; Package emacs. (Mon, 31 Mar 2025 17:46:02 GMT)

Acknowledgement sent to Stephane Zermatten <szermatt <at> gmx.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 31 Mar 2025 17:46:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stephane Zermatten <szermatt <at> gmx.net>
To: bug-gnu-emacs <at> gnu.org
Cc: szermatt <at> gmail.com
Subject: term.el sometimes prints undecoded multibyte UTF-8 chars
Date: Mon, 31 Mar 2025 17:18:35 +0300
Tags: patch

When I run a shell in M-x term with a very Unicode-heavy prompt
(fish 3.6 + tide), some Unicode characters are occasionally printed
undecoded.

One possible cause is the process output being split into chunks in the
middle of a multibyte character; the attached patch fixes that case.

Without the patch, if I type this in M-x term /usr/bin/bash:

for j in $(seq 0 3); do
  for i in $(seq 0 30); do
    printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80';
  done;
  echo;
done

I get:
 \360\237\230\200\360\...

Instead of:
 😀😀😀😀😀😀😀...

With the patch included, I get the correct output.
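
For reference, the backslash escapes in the broken output are octal byte
values, so \360\237 are the same bytes as \xf0\x9f in the reproducer. A
minimal sketch to check the correspondence, assuming a POSIX shell with od
and tr available (the variable names hex and oct are only illustrative):

```shell
# The four UTF-8 bytes of U+1F600, written as octal escapes because
# printf '\x..' is not portable to every POSIX shell.
bytes='\360\237\230\200'

# Dump the same bytes in hex and in octal; od's column padding differs
# between implementations, so whitespace is normalised with tr.
hex=$(printf "$bytes" | od -An -tx1 | tr -d ' \t\n')
oct=$(printf "$bytes" | od -An -to1 | tr -d '\t\n' | tr -s ' ')

echo "hex:   $hex"
echo "octal:$oct"
```

The hex dump should match the \xf0\x9f\x98\x80 sequence that the reproducer
sends in two halves.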

The issue comes from an incorrect check (> count partial 0), which is
false whenever count equals partial, that is, when the whole chunk is an
incomplete character. It should really be
(and (>= count partial) (> partial 0)), but I simplified that to
(> partial 0) in the patch, because the while loop guarantees
(>= count partial).
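
The difference between the two checks can be sketched outside Emacs. The
following is only a model in shell, not the term.el code: it assumes count
is the number of bytes in the chunk being processed and partial is the
number of trailing bytes that belong to an incomplete character, and the
function names old_check and new_check are made up for illustration.

```shell
# Hypothetical helpers; $1 = count, $2 = partial.
# Old check, equivalent to (> count partial 0): count > partial AND partial > 0.
old_check() { [ "$1" -gt "$2" ] && [ "$2" -gt 0 ]; }
# New check, equivalent to (> partial 0).
new_check() { [ "$2" -gt 0 ]; }

# Try a chunk ending in a partial character, a chunk that is nothing but
# a partial character, and a chunk with no partial character at all.
for pair in "4 2" "2 2" "4 0"; do
  set -- $pair
  old=no
  new=no
  if old_check "$1" "$2"; then old=yes; fi
  if new_check "$1" "$2"; then new=yes; fi
  echo "count=$1 partial=$2 old=$old new=$new"
done
# prints:
# count=4 partial=2 old=yes new=yes
# count=2 partial=2 old=no new=yes
# count=4 partial=0 old=no new=no
```

The middle case is presumably the one the reproducer triggers: a chunk that
holds only the first two bytes of a four-byte character is entirely partial,
so count equals partial and the old check does not fire.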

I rewrote the existing test to cover this case and to try out several
different combinations of chunks.

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.

In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''

[0001-Fix-issue-with-very-short-multibyte-character-chunk.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77410; Package emacs. (Wed, 02 Apr 2025 12:19:01 GMT)

Message #8 received at 77410 <at> debbugs.gnu.org:

From: Stephane Zermatten <szermatt <at> gmx.net>
To: 77410 <at> debbugs.gnu.org
Subject: Re: term.el sometimes prints undecoded multibyte UTF-8 chars
Date: Wed, 2 Apr 2025 15:18:03 +0300
Update: The new version of the patch attached to this e-mail fixes the test
term-undecodable-input.

When undecodable input is sent, some decodable input must follow it so
that term flushes the buffer in which it keeps undecoded multibyte
characters. It seems that term-undecodable-input was relying on the issue
fixed by this patch.
[0001-Fix-issue-with-very-short-multibyte-character-chunk.patch (application/octet-stream, attachment)]

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 13 Apr 2025 08:09:03 GMT)

Notification sent to Stephane Zermatten <szermatt <at> gmx.net>:
bug acknowledged by developer. (Sun, 13 Apr 2025 08:09:04 GMT)

Message #13 received at 77410-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephane Zermatten <szermatt <at> gmx.net>
Cc: 77410-done <at> debbugs.gnu.org
Subject: Re: bug#77410: term.el sometimes prints undecoded multibyte UTF-8
 chars
Date: Sun, 13 Apr 2025 11:08:40 +0300
> Date: Wed, 2 Apr 2025 15:18:03 +0300
> From:  Stephane Zermatten via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> Update: The new version of the patch attached to this e-mail fixes the test term-undecodable-input. 
> 
> Some decodable input is necessary when sending undecodable input for term to flush the buffer it uses to
> keep undecoded multibyte characters. It seems that term-undecodable-input was relying on the issue fixed by
> this bug. 

Thanks, installed on the master branch, and closing the bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 11 May 2025 11:24:05 GMT)
