GNU bug report logs - #42904
[PATCH] Non-Unicode frame title crashes Emacs on macOS


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Mon, 17 Aug 2020 14:13:02 UTC

Severity: normal

Tags: patch

Merged with 41184

Found in version 28.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.



Message #83 received at 42904 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42904 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#42904: [PATCH] Non-Unicode frame title crashes Emacs on macOS
Date: Fri, 21 Aug 2020 11:39:30 +0200

On 20 Aug 2020, at 21:13, Eli Zaretskii <eliz <at> gnu.org> wrote:

> I don't think I understand.  mode_line_noprop_buf gets the bytes, and
> then we call make_string on it, so the result is the same as the one
> you'd like to avoid.  Or am I missing something?
> 
> By "settling on multibyte representation", do you mean that we should
> convert raw bytes to their multibyte form?  Or do you mean something
> else?

No, I think we are talking about the same thing. Basically, it's about how the bytes end up in mode_line_noprop_buf in the first place: currently, the information about whether they should be interpreted as unibyte or multibyte is lost as soon as data from the component strings (the buffer name for %b, the file name for %f, etc.) is added to the buffer. make_string then tries to restore that information by looking at the bytes, and its guess is not always accurate.

One way of fixing this is to ensure that the input strings (buffer name, file name, frame-title-format, etc.) are always in multibyte form. Another would be to convert them to multibyte as they are used, presumably in decode_mode_spec. You know this code a lot better than I do, but the former may be slightly more workable (and efficient).
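
To make the second option concrete, here is a rough sketch of "convert to multibyte as the strings are used". It is not the actual xdisp.c code: `struct str` and `append_normalized` are names made up for illustration, and the two-byte encoding of raw bytes (0xC0 or 0xC1 followed by a continuation byte) is my understanding of the internal representation, so treat it as an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string: the bytes plus the flag
   that says how to read them.  Not the real Emacs data structure.  */
struct str
{
  const unsigned char *data;
  size_t nbytes;
  bool multibyte;
};

/* Append S to BUF (capacity CAP, current length LEN) so that BUF
   always holds multibyte text.  Bytes of a multibyte string and ASCII
   bytes are copied as they are; a non-ASCII byte of a unibyte string
   is written as a two-byte raw-byte sequence (assumed encoding).
   Return the new length, or (size_t) -1 if BUF is too small, in which
   case BUF may have been partially written.  */
size_t
append_normalized (unsigned char *buf, size_t cap, size_t len,
                   struct str s)
{
  for (size_t i = 0; i < s.nbytes; i++)
    {
      unsigned char b = s.data[i];
      if (s.multibyte || b < 0x80)
        {
          if (len + 1 > cap)
            return (size_t) -1;
          buf[len++] = b;
        }
      else
        {
          if (len + 2 > cap)
            return (size_t) -1;
          buf[len++] = 0xC0 | ((b >> 6) & 1);
          buf[len++] = 0x80 | (b & 0x3F);
        }
    }
  return len;
}
```

With something like this, the buffer would never mix the two representations, so whoever turns it into a string afterwards would not need to guess.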

> Again, what would you like to have instead?  Would calling
> str_as_multibyte do what you want?

No, I don't think so -- once the unibyte/multibyte bit is lost, it can only be restored imperfectly if all we have is the sequence of bytes. In mathematical terms, the function that maps an arbitrary string object to its bytes has no inverse. (Consider the unibyte string "\xc3\xa5" -- should the bytes {c3, a5} be recreated as that unibyte string, or as the multibyte string "å"?)
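
A small model of the problem, in case it helps. Everything here is hypothetical (`struct str`, `to_bytes`, `guess_multibyte`), and the heuristic is a deliberately simplified stand-in, not what make_string actually does. The point is only that two different string objects project to the same bytes, so any function that sees just the bytes must give the same answer for both.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp string: bytes plus the
   unibyte/multibyte flag.  */
struct str
{
  const unsigned char *data;
  size_t nbytes;
  bool multibyte;
};

/* The lossy projection: copy only the bytes into OUT and drop the
   flag.  Return the number of bytes copied.  */
size_t
to_bytes (struct str s, unsigned char *out)
{
  memcpy (out, s.data, s.nbytes);
  return s.nbytes;
}

/* Simplified guess at the lost flag: answer "multibyte" if the bytes
   consist of ASCII plus at least one well-formed two-byte sequence,
   and "unibyte" otherwise.  */
bool
guess_multibyte (const unsigned char *p, size_t n)
{
  bool saw_sequence = false;
  for (size_t i = 0; i < n; i++)
    {
      if (p[i] < 0x80)
        continue;
      if (p[i] >= 0xC2 && p[i] <= 0xDF
          && i + 1 < n && (p[i + 1] & 0xC0) == 0x80)
        {
          saw_sequence = true;
          i++;  /* Skip the continuation byte.  */
        }
      else
        return false;
    }
  return saw_sequence;
}
```

Feed it the unibyte string "\xc3\xa5" and the multibyte string "å": both give the bytes {c3, a5}, the guess is "multibyte" in both cases, and so the unibyte one is reconstructed wrongly. A cleverer heuristic would only move the error to a different input.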

Again, we are talking about trivialities here, but perhaps the same syndrome will arise in other contexts where it matters more. If we wrote Emacs from scratch, we likely wouldn't have unibyte strings at all: they are only there for compatibility, various niche uses, and performance hacks. I don't think it's unreasonable to start normalising strings to multibyte where it matters.

Thanks for your patience!





This bug report was last modified 4 years and 269 days ago.


