GNU bug report logs - #77410
term.el sometimes prints undecoded multibyte UTF-8 chars

Package: emacs;

Reported by: Stephane Zermatten <szermatt <at> gmx.net>

Date: Mon, 31 Mar 2025 17:46:02 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org:

From: Stephane Zermatten <szermatt <at> gmx.net>
To: bug-gnu-emacs <at> gnu.org
Cc: szermatt <at> gmail.com
Subject: term.el sometimes prints undecoded multibyte UTF-8 chars
Date: Mon, 31 Mar 2025 17:18:35 +0300
Tags: patch

If I run a shell in a terminal with M-x term, using a very Unicode-heavy
prompt (fish 3.6 + tide), the Unicode characters are sometimes printed
undecoded.

One possible cause is a chunk of process output ending in the middle
of a multibyte character, which the attached patch fixes.

Without the patch, if I type this in M-x term /usr/bin/bash:

for j in $(seq 0 3); do
  for i in $(seq 0 30); do
    printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80';
  done;
  echo;
done

I get:
 \360\237\203\022\360\...

Instead of:
 😀😀😀😀😀😀😀...

With the patch included, I get the correct output.

The issue comes from an incorrect check (> count partial 0), which
should really be (and (>= count partial) (> partial 0)), but I
simplified that to (> partial 0) in the patch, because the while loop
guarantees (>= count partial).
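
A rough sketch of the difference between the two checks, written in
Python rather than Emacs Lisp. It assumes that count is the length of
the decoded chunk and partial is the number of trailing undecoded bytes,
which is a reading of the description above, not a quote from term.el.
The function names are hypothetical.

```python
def old_check(count: int, partial: int) -> bool:
    # Mirrors (> count partial 0), a chained comparison:
    # true only when count > partial AND partial > 0.
    return count > partial > 0


def new_check(count: int, partial: int) -> bool:
    # Mirrors (> partial 0); relies on the caller guaranteeing
    # count >= partial, as the while loop is said to do.
    return partial > 0


# A chunk that holds only the first two bytes of a four-byte character
# (printf '\xf0\x9f'): every element is an undecoded trailing byte,
# so count == partial == 2.
count, partial = 2, 2
print(old_check(count, partial))  # False: the partial bytes are not held back
print(new_check(count, partial))  # True: the partial bytes are held back
```

When the chunk also contains decoded text before the partial bytes
(count > partial), both checks agree, which would explain why the
problem only shows up with very short chunks.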

I rewrote the existing test to cover this case and to try multiple
combinations of chunks.
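
The idea of trying many chunk combinations can be sketched outside
Emacs as follows. This is a Python illustration only, not the test in
the patch: it uses Python's incremental UTF-8 decoder as a stand-in for
the terminal's decoding, and decode_in_chunks is a hypothetical helper.

```python
import codecs
from itertools import combinations


def decode_in_chunks(data: bytes, cut_points) -> str:
    """Decode DATA piece by piece, splitting it at each byte offset in CUT_POINTS."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    start = 0
    for end in list(cut_points) + [len(data)]:
        # The incremental decoder buffers an incomplete trailing
        # sequence and completes it when the next chunk arrives.
        pieces.append(decoder.decode(data[start:end]))
        start = end
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


# One ASCII character, one four-byte character (U+1F600), one ASCII character.
data = "a\U0001F600b".encode("utf-8")
expected = data.decode("utf-8")

# Try every way of cutting the byte string at up to two positions,
# including cuts that fall inside the four-byte character.
for n_cuts in range(3):
    for cuts in combinations(range(1, len(data)), n_cuts):
        assert decode_in_chunks(data, cuts) == expected, cuts
```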

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.

In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus@30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus@30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''

[0001-Fix-issue-with-very-short-multibyte-character-chunk.patch (text/patch, attachment)]

This bug report was last modified 65 days ago.
