GNU bug report logs - #77410
term.el sometimes prints undecoded multibyte UTF-8 chars

Package: emacs

Reported by: Stephane Zermatten <szermatt <at> gmx.net>

Date: Mon, 31 Mar 2025 17:46:02 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Stephane Zermatten <szermatt <at> gmx.net>
Subject: bug#77410: closed (Re: bug#77410: term.el sometimes prints
 undecoded multibyte UTF-8 chars)
Date: Sun, 13 Apr 2025 08:09:04 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#77410: term.el sometimes prints undecoded multibyte UTF-8 chars

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 77410 <at> debbugs.gnu.org.

-- 
77410: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77410
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephane Zermatten <szermatt <at> gmx.net>
Cc: 77410-done <at> debbugs.gnu.org
Subject: Re: bug#77410: term.el sometimes prints undecoded multibyte UTF-8
 chars
Date: Sun, 13 Apr 2025 11:08:40 +0300
> Date: Wed, 2 Apr 2025 15:18:03 +0300
> From:  Stephane Zermatten via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> Update: The new version of the patch attached to this e-mail fixes the test term-undecodable-input. 
> 
> Some decodable input is necessary when sending undecodable input for term to flush the buffer it uses to
> keep undecoded multibyte characters. It seems that term-undecodable-input was relying on the issue fixed by
> this bug. 

Thanks, installed on the master branch, and closing the bug.

[Message part 3 (message/rfc822, inline)]
From: Stephane Zermatten <szermatt <at> gmx.net>
To: bug-gnu-emacs <at> gnu.org
Cc: szermatt <at> gmail.com
Subject: term.el sometimes prints undecoded multibyte UTF-8 chars
Date: Mon, 31 Mar 2025 17:18:35 +0300
[Message part 4 (text/plain, inline)]
Tags: patch

If I run a shell in a terminal with M-x term, using a prompt that is
heavy on Unicode (fish 3.6 + tide), the Unicode characters are sometimes
printed undecoded.

One possible cause is unfortunate chunking: the output is split in the
middle of a multibyte character. The attached patch fixes that case.

Without the patch, if I type this in M-x term /usr/bin/bash:

for j in $(seq 0 3); do
  for i in $(seq 0 30); do
    printf '\xf0\x9f'; sleep 0.1; printf '\x98\x80';
  done;
  echo;
done

I get:
 \360\237\203\022\360\...

instead of:
 😀😀😀😀😀😀😀...

With the patch included, I get the correct output.

The issue comes from an incorrect check (> count partial 0), which
should really be (and (>= count partial) (> partial 0)), but I
simplified that to (> partial 0) in the patch, because the while loop
guarantees (>= count partial).
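
To illustrate the general technique (holding back an incomplete
trailing multibyte sequence until the next chunk arrives), here is a
minimal Python sketch. It is not the term.el code and not the attached
patch; the names split_incomplete_tail, expected_length and
ChunkDecoder are made up for this illustration, and it assumes the
input is meant to be UTF-8.

```python
def expected_length(lead: int) -> int:
    """Length of the UTF-8 sequence introduced by LEAD, or 0 if LEAD
    is a continuation byte or not a valid lead byte."""
    if lead & 0x80 == 0x00:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:
        return 4
    return 0


def split_incomplete_tail(data: bytes) -> tuple[bytes, bytes]:
    """Split DATA into (bytes safe to decode now, trailing partial bytes)."""
    count = len(data)
    # A partial character is at most 3 bytes long, so only the last
    # 3 bytes need to be inspected.
    for back in range(1, min(3, count) + 1):
        byte = data[count - back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte: keep looking for the lead byte
        if expected_length(byte) > back:
            # The lead byte announces more bytes than have arrived.
            return data[:count - back], data[count - back:]
        break  # complete character (or invalid byte): nothing to hold back
    return data, b""


class ChunkDecoder:
    """Hypothetical helper: decodes chunks, carrying partial bytes over."""

    def __init__(self) -> None:
        self.pending = b""

    def feed(self, chunk: bytes) -> str:
        data = self.pending + chunk
        complete, self.pending = split_incomplete_tail(data)
        # Bytes that really are undecodable become U+FFFD here.
        return complete.decode("utf-8", errors="replace")


if __name__ == "__main__":
    decoder = ChunkDecoder()
    # Same split as in the shell loop above: 2 bytes, then 2 bytes.
    print(repr(decoder.feed(b"\xf0\x9f")))
    print(repr(decoder.feed(b"\x98\x80")))
```

The first call returns an empty string because both bytes are held
back; the second call returns the complete character. Python's
standard library offers the same behavior through
codecs.getincrementaldecoder("utf-8"), which would normally be
preferred over hand-written code like this.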

I rewrote the existing test to cover this case and to try out multiple
different combinations of chunks.

I'm still looking into other causes of the issue, but this, at least,
seems like an easy fix.

In GNU Emacs 30.1 (build 2, x86_64-apple-darwin23.6.0, NS appkit-2487.70
 Version 14.7.4 (Build 23H420)) of 2025-03-24 built on boomer.zia
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.4

Configured using:
 'configure --disable-dependency-tracking --disable-silent-rules
 --enable-locallisppath=/usr/local/share/emacs/site-lisp
 --infodir=/usr/local/Cellar/emacs-plus <at> 30/30.1/share/info/emacs
 --prefix=/usr/local/Cellar/emacs-plus <at> 30/30.1
 --with-native-compilation=aot --with-xml2 --with-gnutls
 --without-compress-install --without-dbus --without-imagemagick
 --with-modules --with-rsvg --with-webp --with-ns
 --disable-ns-self-contained 'CFLAGS=-O2 -DFD_SETSIZE=10000
 -DDARWIN_UNLIMITED_SELECT -I/usr/local/opt/sqlite/include
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include'
 'LDFLAGS=-L/usr/local/opt/sqlite/lib -L/usr/local/lib/gcc/14
 -I/usr/local/opt/gcc/include -I/usr/local/opt/libgccjit/include''

[0001-Fix-issue-with-very-short-multibyte-character-chunk.patch (text/patch, attachment)]

This bug report was last modified 35 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.