GNU bug report logs -
#77410
term.el sometimes prints undecoded multibyte UTF-8 chars
[Message part 1 (text/plain, inline)]
Your bug report
#77410: term.el sometimes prints undecoded multibyte UTF-8 chars
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 77410 <at> debbugs.gnu.org.
--
77410: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77410
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
> Date: Wed, 2 Apr 2025 15:18:03 +0300
> From: Stephane Zermatten via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> Update: The new version of the patch attached to this e-mail fixes the test term-undecodable-input.
>
> Some decodable input is necessary when sending undecodable input for term to flush the buffer it uses to
> keep undecoded multibyte characters. It seems that term-undecodable-input was relying on the issue fixed by
> this bug.
Thanks, installed on the master branch, and closing the bug.
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
Tags: patch
If I run a shell in a terminal with M-x term, with a very Unicode-heavy
prompt (fish 3.6 + tide), the Unicode characters are sometimes printed
undecoded.
One possible cause is that the output gets split into chunks in the
middle of a multibyte character, which the attached patch fixes.
Without the patch, if I type this in M-x term /usr/bin/bash:
  for j in $(seq 0 3); do
    for i in $(seq 0 30); do
      printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80';
    done;
    echo;
  done
I get
\360\237\203\022\360\...
Instead of:
😀😀😀😀😀😀😀...
With the patch included, I get the correct output.
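As an illustration of the chunking problem only (this is not the term.el code), the following Python sketch feeds the same two byte chunks that the shell loop above emits for one emoji to a decoder. Decoding each chunk on its own fails on both halves, while carrying the incomplete tail over to the next chunk yields the character. The variable names are assumptions made for this sketch.

```python
import codecs

# The two chunks the shell loop emits for a single emoji (U+1F600).
chunks = [b"\xf0\x9f", b"\x98\x80"]

# Decoding each chunk independently: neither half is valid UTF-8 alone,
# so every byte ends up as an escape instead of a character.
naive = "".join(c.decode("utf-8", errors="backslashreplace") for c in chunks)

# An incremental decoder keeps the incomplete trailing bytes of one chunk
# and prepends them to the next chunk before decoding.
decoder = codecs.getincrementaldecoder("utf-8")()
buffered = "".join(decoder.decode(c) for c in chunks)

print(naive)     # escaped bytes
print(buffered)  # the decoded emoji
```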
The issue comes from an incorrect check (> count partial 0), which
should really be (and (>= count partial) (> partial 0)), but I
simplified that to (> partial 0) in the patch, because the while loop
guarantees (>= count partial).
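To make the difference between the two checks concrete, here is a small Python sketch, assuming that count is the length of the chunk and partial is the number of trailing bytes that form an incomplete character. The helper names are hypothetical, and measuring count in bytes is a simplification for this sketch. Python's chained comparison count > partial > 0 has the same meaning as the Lisp form (> count partial 0).

```python
import codecs

def trailing_partial(chunk: bytes) -> int:
    """Number of trailing bytes that form an incomplete UTF-8 sequence."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    decoder.decode(chunk)
    pending, _flags = decoder.getstate()  # bytes held back by the decoder
    return len(pending)

def old_check(count: int, partial: int) -> bool:
    # Mirrors (> count partial 0): requires count to be strictly larger.
    return count > partial > 0

def new_check(count: int, partial: int) -> bool:
    # Mirrors (> partial 0); partial <= count is assumed to hold already.
    return partial > 0

# A chunk made up only of the first half of a character.
short_chunk = b"\xf0\x9f"
count, partial = len(short_chunk), trailing_partial(short_chunk)
print(count, partial, old_check(count, partial), new_check(count, partial))

# A chunk with some complete text before the incomplete character.
mixed_chunk = b"ab\xf0\x9f"
count2, partial2 = len(mixed_chunk), trailing_partial(mixed_chunk)
print(count2, partial2, old_check(count2, partial2), new_check(count2, partial2))
```

When the whole chunk is an incomplete character, count equals partial, so the old check is false and the bytes are not held back. The new check catches this case.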
I rewrote the existing test to cover this case and to try out several
different combinations of chunks.
I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.
In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description: macOS 14.7.4
Configured using:
'configure --disable-dependency-tracking --disable-silent-rules
--enable-locallisppath=/usr/local/share/emacs/site-lisp
--infodir=/usr/local/Cellar/emacs-plus <at> 30/30.1/share/info/emacs
--prefix=/usr/local/Cellar/emacs-plus <at> 30/30.1
--with-native-compilation=aot --with-xml2 --with-gnutls
--without-compress-install --without-dbus --without-imagemagick
--with-modules --with-rsvg --with-webp --with-ns
--disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
-DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
-I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
-I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''
[0001-Fix-issue-with-very-short-multibyte-character-chunk.patch (text/patch, attachment)]
This bug report was last modified 35 days ago.