GNU bug report logs - #42904
[PATCH] Non-Unicode frame title crashes Emacs on macOS


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Mon, 17 Aug 2020 14:13:02 UTC

Severity: normal

Tags: patch

Merged with 41184

Found in version 28.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 42904 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
Date: Fri, 21 Aug 2020 16:26:11 +0300
> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Fri, 21 Aug 2020 11:39:30 +0200
> Cc: alan <at> idiocy.org, 42904 <at> debbugs.gnu.org
> 
> Basically, it's about how the bytes end up in mode_line_noprop_buf in the first place, since currently the information of whether it should be interpreted as unibyte or multibyte gets lost as soon as data from the strings it is composed of (like the buffer name for %b, file name for %f etc) is added to it. Then make_string tries to restore that information by looking at the bytes, and it is not always accurate.

make_string was written to work on byte sequences that don't begin as
the payload of a Lisp string.  So it doesn't handle the information
you say is being lost, because it doesn't expect such information to
be available to begin with.

Which is basically just another way of saying "you want something
other than make_string" here.
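
To make the loss concrete, here is a minimal Python sketch of the situation under discussion. It is a rough model, not Emacs code: `LispString`, `append_payload` and `guess_from_bytes` are hypothetical names, and the guessing rule only approximates what make_string does. It shows that two strings that differ only in their unibyte/multibyte flag leave identical bytes in the buffer, so no byte-based guess can tell them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LispString:
    """Hypothetical stand-in for a Lisp string: payload bytes plus a flag."""
    data: bytes
    multibyte: bool


def append_payload(buf: bytearray, s: LispString) -> None:
    """Copy only the payload into the buffer; the multibyte flag is dropped."""
    buf.extend(s.data)


def guess_from_bytes(raw: bytes) -> LispString:
    """Rough model of guessing the flag from the bytes alone (an assumption,
    not the real make_string logic): non-ASCII bytes that decode as valid
    UTF-8 are taken to be multibyte text."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return LispString(raw, multibyte=False)
    return LispString(raw, multibyte=any(b >= 0x80 for b in raw))


# Two raw bytes (unibyte) versus one character (multibyte): same payload.
unibyte = LispString(b"\xc3\xa9", multibyte=False)
multibyte = LispString("\u00e9".encode("utf-8"), multibyte=True)

buf_a, buf_b = bytearray(), bytearray()
append_payload(buf_a, unibyte)
append_payload(buf_b, multibyte)

guess_a = guess_from_bytes(bytes(buf_a))
guess_b = guess_from_bytes(bytes(buf_b))
```

Both buffers hold the same two bytes, so both guesses come out multibyte, which is wrong for the unibyte input.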

> One way of doing this is to make sure that the input strings (buffer name, file name, frame-title-format etc.) are always in multibyte form.

That's what I thought I was suggesting.
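
A minimal Python sketch of that idea, under stated assumptions: `to_multibyte` and `build_title` are hypothetical helpers, and Python's surrogate escapes (U+DC80..U+DCFF) stand in for Emacs raw-byte characters. Each input is normalised while its unibyte/multibyte flag is still known, in the spirit of the Lisp function string-to-multibyte, so nothing has to be guessed afterwards.

```python
def to_multibyte(data: bytes, multibyte: bool) -> str:
    """Normalise one input string while its flag is still available."""
    if multibyte:
        # Payload is already encoded text: decode it as characters.
        return data.decode("utf-8")
    # Unibyte payload: ASCII stays as is, and every byte >= 0x80 becomes
    # its own escaped raw byte, even if neighbours look like UTF-8.
    return data.decode("ascii", errors="surrogateescape")


def build_title(parts: list[tuple[bytes, bool]]) -> str:
    """Concatenate (payload, multibyte flag) pairs after normalising each."""
    return "".join(to_multibyte(data, flag) for data, flag in parts)


# The same two bytes, once as a unibyte string and once as multibyte text.
title = build_title([
    (b"\xc3\xa9", False),
    ("\u00e9".encode("utf-8"), True),
])
```

Here the unibyte input contributes two raw bytes and the multibyte input contributes one character, so the distinction survives concatenation.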

> > Again, what would you like to have instead?  Would calling
> > str_as_multibyte do what you want?
> 
> No, I don't think so -- once the unibyte/multibyte bit is lost, it can only be restored imperfectly if all we have is the sequence of bytes.

That is true, but str_as_multibyte simply interprets any valid UTF-8
sequence as a character, and any invalid sequence as raw bytes.  I
thought this was precisely what you wanted for this use case, no?
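
The interpretation described above can be sketched in Python as an analogy only. `as_multibyte_sketch` is a hypothetical name, and Python's strict UTF-8 decoder is assumed here as a stand-in for the Emacs internal encoding, which differs in detail. Valid sequences become characters, and each byte of an invalid sequence is kept as a raw byte.

```python
def as_multibyte_sketch(raw: bytes) -> list[tuple[str, object]]:
    """Classify a byte sequence into characters and raw bytes.

    Uses Python's "surrogateescape" error handler, which maps each
    undecodable byte 0xNN to the lone surrogate U+DCNN.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    result: list[tuple[str, object]] = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Escaped byte: recover the original byte value.
            result.append(("raw-byte", code - 0xDC00))
        else:
            result.append(("char", ch))
    return result


# "a", a valid two-byte sequence, then a byte that is never valid UTF-8.
parsed = as_multibyte_sketch(b"a\xc3\xa9\xff")
```

Note that a unibyte string whose bytes happen to form valid UTF-8 is still read as characters here, which is the imperfect restoration the quoted text objects to.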

> If we wrote Emacs from scratch we likely wouldn't have unibyte strings at all: they are only there for compatibility and various niche uses and performance hacks. I don't think it's unreasonable to start normalising strings to multibyte where it matters.

Emacs (like any other old editor) started with only unibyte strings, so
that's history for you.  Some modern text-handling environments solve
this conundrum by not supporting raw bytes at all, but Emacs knows
better.




This bug report was last modified 4 years and 270 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.