GNU bug report logs - #75581
30.0.93; Crash when copy-sequence on clear-string'ed multibyte strings with text properties

Previous Next

Package: emacs;

Reported by: Kanakana <gudzpoz <at> gmail.com>

Date: Wed, 15 Jan 2025 14:37:02 UTC

Severity: normal

Found in version 30.0.93

Fixed in version 31.1

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 75581 in the body.
You can then email your comments to 75581 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 14:37:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kanakana <gudzpoz <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 15 Jan 2025 14:37:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Kanakana <gudzpoz <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.93; Crash when copy-sequence on clear-string'ed multibyte
 strings with text properties
Date: Wed, 15 Jan 2025 18:48:30 +0800

** Reproducing Steps

Run:

    emacs -Q --batch --eval '(let ((s #("🤗" 0 1 (prop value)))) 
(clear-string s) (copy-sequence s))'


** Expected Behavior

I don't know. Are text properties expected to handle string length
change via clear-string?

** Observed Symptoms

Emacs crashes, with stack trace saying:

#+begin_verse
#0 0x00005555557e2a58 in copy_intervals ()
#1 0x000055555576dfc4 in Fcopy_sequence ()
#2 0x0000555555765c65 in eval_sub ()
#3 0x0000555555769080 in Flet ()
#4 0x0000555555765b0a in eval_sub ()
#5 0x0000555555766353 in Feval ()
#6 0x0000555555766eee in Ffuncall ()
#7 0x00007fffeed925b6 in F636f6d6d616e642d6c696e652d31_command_line_1_0 ()
from 
/usr/bin/../lib/emacs/30.0.93/native-lisp/30.0.93-3fae9fb4/preloaded/startup-bbc6ea72-bc20aae4.eln
#8 0x0000555555766eee in Ffuncall ()
#9 0x00007fffeed890a7 in F636f6d6d616e642d6c696e65_command_line_0 ()
from 
/usr/bin/../lib/emacs/30.0.93/native-lisp/30.0.93-3fae9fb4/preloaded/startup-bbc6ea72-bc20aae4.eln
#10 0x0000555555766eee in Ffuncall ()
#11 0x00007fffeed85500 in 
F6e6f726d616c2d746f702d6c6576656c_normal_top_level_0 ()
from 
/usr/bin/../lib/emacs/30.0.93/native-lisp/30.0.93-3fae9fb4/preloaded/startup-bbc6ea72-bc20aae4.eln
#12 0x0000555555765aff in eval_sub ()
#13 0x0000555555766353 in Feval ()
#14 0x00005555556cbe3e in top_level_2 ()
#15 0x0000555555762127 in internal_condition_case ()
#16 0x00005555556cc97b in top_level_1 ()
#17 0x000055555576206a in internal_catch ()
#18 0x00005555556cbd46 in command_loop ()
#19 0x00005555556d3cad in recursive_edit_1 ()
#20 0x00005555556d4066 in Frecursive_edit ()
#21 0x00005555555e2428 in main ()
#+end_verse

(Also reproduced in an Emacs 29.1 AppImage from
https://github.com/blahgeek/emacs-appimage/releases/20240102-1200 .)

In GNU Emacs 30.0.93 (build 2, x86_64-pc-linux-gnu, GTK+ Version
3.24.43, cairo version 1.18.2) of 2025-01-15 built on lil-end
Repository revision: b981889e9ee0a37f1bc8e2c9b90a5d154c1d032e
Repository branch: makepkg
System Description: EndeavourOS

Configured using:
'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
--localstatedir=/var --mandir=/usr/share/man --with-gameuser=:games
--with-modules --without-m17n-flt --without-gconf
--with-native-compilation=yes --with-xinput2 --with-pgtk
--without-xaw3d --with-sound=no --with-tree-sitter --without-gpm
--without-compress-install
'--program-transform-name=s/\([ec]tags\)/\1.emacs/'
'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
-Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
-fstack-clash-protection -fcf-protection'
LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now
'CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
-Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security
-fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG
LCMS2 LIBOTF LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY
PDUMPER PGTK PNG RSVG SECCOMP SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM GTK3 ZLIB

Important settings:
value of $LC_MONETARY: en_US.UTF-8
value of $LC_NUMERIC: en_US.UTF-8
value of $LC_TIME: en_US.UTF-8
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: fcitx
locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
minibuffer-regexp-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/pgtk-win pgtk-win term/common-win touch-screen pgtk-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo gtk pgtk
lcms2 multi-tty move-toolbar make-network-process native-compile emacs)

Memory information:
((conses 16 49610 11031) (symbols 48 5334 0) (strings 32 13417 2300)
(string-bytes 1 392723) (vectors 16 8817)
(vector-slots 8 124703 9243) (floats 8 22 12) (intervals 56 397 0)
(buffers 992 11))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 19:46:01 GMT) Full text and rfc822 format available.

Message #8 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Kanakana <gudzpoz <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Wed, 15 Jan 2025 20:44:56 +0100

On Jan 15 2025, Kanakana wrote:

> I don't know. Are text properties expected to handle string length
> change via clear-string?

clear-string should probably remove any text properties.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 20:16:02 GMT) Full text and rfc822 format available.

Message #11 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>, Kanakana <gudzpoz <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Wed, 15 Jan 2025 15:15:06 -0500

Andreas Schwab <schwab <at> linux-m68k.org> writes:

> On Jan 15 2025, Kanakana wrote:
>
>> I don't know. Are text properties expected to handle string length
>> change via clear-string?
>
> clear-string should probably remove any text properties.

Agreed.  This patch fixes the crash for me:

diff --git a/src/fns.c b/src/fns.c
index 191154c651a..c48cdf879e4 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -3315,6 +3315,8 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
 {
   CHECK_STRING (string);
   ptrdiff_t len = SBYTES (string);
+  Fset_text_properties (make_fixnum (0), make_fixnum (SCHARS (string)),
+			Qnil, string);
   if (len != 0 || STRING_MULTIBYTE (string))
     {
       CHECK_IMPURE (string, XSTRING (string));

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 20:36:02 GMT) Full text and rfc822 format available.

Message #14 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Wed, 15 Jan 2025 20:35:07 +0000

"Stefan Kangas" <stefankangas <at> gmail.com> writes:

> Andreas Schwab <schwab <at> linux-m68k.org> writes:
>
>> On Jan 15 2025, Kanakana wrote:
>>
>>> I don't know. Are text properties expected to handle string length
>>> change via clear-string?
>>
>> clear-string should probably remove any text properties.
>
> Agreed.  This patch fixes the crash for me:
>
> diff --git a/src/fns.c b/src/fns.c
> index 191154c651a..c48cdf879e4 100644
> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -3315,6 +3315,8 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
>  {
>    CHECK_STRING (string);
>    ptrdiff_t len = SBYTES (string);
> +  Fset_text_properties (make_fixnum (0), make_fixnum (SCHARS (string)),
> +			Qnil, string);
>    if (len != 0 || STRING_MULTIBYTE (string))
>      {
>        CHECK_IMPURE (string, XSTRING (string));

Looks good to me.  However, on the scratch/no-purespace branch, we need
to ensure that this function cannot be called on the empty multibyte
string, turning it into a second empty unibyte string!  I'll push a fix
to that branch in a bit.

However, if this function is meant to clear a string from memory
to prevent leaving confidential data in memory, that won't work with
MPS.  (I suppose it's reasonably clear that no protection whatsoever
applies to string properties?)

The problem is MPS allocates string data in a "copying" pool, so several
copies of the string will usually be present.

alloc.c string compaction may also result in recently-used short strings
remaining in memory, IIUC, but only if an alloc.c GC cycle happens
between string creation and the clear-string call.  (But if it does
happen, it's very likely that a copy of the string is created!)

How important is this feature?  What we have now is better that nothing,
but if people start relying on it in any way, that's worse than nothing.
A stern warning in the docstring may help reduce the problem.

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 20:50:01 GMT) Full text and rfc822 format available.

Message #17 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Wed, 15 Jan 2025 22:48:52 +0200

> Cc: 75581 <at> debbugs.gnu.org
> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Wed, 15 Jan 2025 15:15:06 -0500
> 
> Andreas Schwab <schwab <at> linux-m68k.org> writes:
> 
> > On Jan 15 2025, Kanakana wrote:
> >
> >> I don't know. Are text properties expected to handle string length
> >> change via clear-string?
> >
> > clear-string should probably remove any text properties.
> 
> Agreed.  This patch fixes the crash for me:
> 
> diff --git a/src/fns.c b/src/fns.c
> index 191154c651a..c48cdf879e4 100644
> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -3315,6 +3315,8 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
>  {
>    CHECK_STRING (string);
>    ptrdiff_t len = SBYTES (string);
> +  Fset_text_properties (make_fixnum (0), make_fixnum (SCHARS (string)),
> +			Qnil, string);
>    if (len != 0 || STRING_MULTIBYTE (string))
>      {
>        CHECK_IMPURE (string, XSTRING (string));

Thanks, but please also document this aspect of the function's
behavior in both the doc string and in the ELisp manual.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 20:52:02 GMT) Full text and rfc822 format available.

Message #20 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Wed, 15 Jan 2025 20:50:48 +0000

Pip Cet <pipcet <at> protonmail.com> writes:

> "Stefan Kangas" <stefankangas <at> gmail.com> writes:
>
>> Andreas Schwab <schwab <at> linux-m68k.org> writes:
>>
>>> On Jan 15 2025, Kanakana wrote:
>>>
>>>> I don't know. Are text properties expected to handle string length
>>>> change via clear-string?
>>>
>>> clear-string should probably remove any text properties.
>>
>> Agreed.  This patch fixes the crash for me:
>>
>> diff --git a/src/fns.c b/src/fns.c
>> index 191154c651a..c48cdf879e4 100644
>> --- a/src/fns.c
>> +++ b/src/fns.c
>> @@ -3315,6 +3315,8 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
>>  {
>>    CHECK_STRING (string);
>>    ptrdiff_t len = SBYTES (string);
>> +  Fset_text_properties (make_fixnum (0), make_fixnum (SCHARS (string)),
>> +			Qnil, string);
>>    if (len != 0 || STRING_MULTIBYTE (string))
>>      {
>>        CHECK_IMPURE (string, XSTRING (string));
>
> Looks good to me.  However, on the scratch/no-purespace branch, we need
> to ensure that this function cannot be called on the empty multibyte
> string, turning it into a second empty unibyte string!  I'll push a fix
> to that branch in a bit.

Sorry, false alarm.  STRING_SET_UNIBYTE modifies its argument in this
case, replacing it by the appropriate empty string (which is unused).

The only difference is that (clear-string "") no longer throws an error,
but I don't see a problem with either behavior in that case.

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 22:03:01 GMT) Full text and rfc822 format available.

Message #23 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Wed, 15 Jan 2025 17:02:27 -0500

Pip Cet <pipcet <at> protonmail.com> writes:

> Looks good to me.  However, on the scratch/no-purespace branch, we need
> to ensure that this function cannot be called on the empty multibyte
> string, turning it into a second empty unibyte string!  I'll push a fix
> to that branch in a bit.

Please do, thanks.

> However, if this function is meant to clear a string from memory
> to prevent leaving confidential data in memory, that won't work with
> MPS.  (I suppose it's reasonably clear that no protection whatsoever
> applies to string properties?)
>
> The problem is MPS allocates string data in a "copying" pool, so several
> copies of the string will usually be present.
>
> alloc.c string compaction may also result in recently-used short strings
> remaining in memory, IIUC, but only if an alloc.c GC cycle happens
> between string creation and the clear-string call.  (But if it does
> happen, it's very likely that a copy of the string is created!)
>
> How important is this feature?  What we have now is better that nothing,
> but if people start relying on it in any way, that's worse than nothing.
> A stern warning in the docstring may help reduce the problem.

I think a stern warning in the docstring makes sense.  This is
potentially security-critical, and we should better make sure that we
can make no guarantees here.

It would be nice if we could have a guaranteed secure way of allocating
and deallocating strings for passwords from ELisp.  Could something like
that be done?  I can think of several dumb and hacky ways to do it, but
none that are clean or smart.

Reply sent to Stefan Kangas <stefankangas <at> gmail.com>:
You have taken responsibility. (Wed, 15 Jan 2025 22:20:02 GMT) Full text and rfc822 format available.

Notification sent to Kanakana <gudzpoz <at> gmail.com>:
bug acknowledged by developer. (Wed, 15 Jan 2025 22:20:02 GMT) Full text and rfc822 format available.

Message #28 received at 75581-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gudzpoz <at> gmail.com, schwab <at> linux-m68k.org, 75581-done <at> debbugs.gnu.org
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Wed, 15 Jan 2025 14:19:06 -0800

Version: 31.1

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: 75581 <at> debbugs.gnu.org
>> From: Stefan Kangas <stefankangas <at> gmail.com>
>> Date: Wed, 15 Jan 2025 15:15:06 -0500
>>
>> Andreas Schwab <schwab <at> linux-m68k.org> writes:
>>
>> > On Jan 15 2025, Kanakana wrote:
>> >
>> >> I don't know. Are text properties expected to handle string length
>> >> change via clear-string?
>> >
>> > clear-string should probably remove any text properties.
>>
>> Agreed.  This patch fixes the crash for me:
>>
>> diff --git a/src/fns.c b/src/fns.c
>> index 191154c651a..c48cdf879e4 100644
>> --- a/src/fns.c
>> +++ b/src/fns.c
>> @@ -3315,6 +3315,8 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
>>  {
>>    CHECK_STRING (string);
>>    ptrdiff_t len = SBYTES (string);
>> +  Fset_text_properties (make_fixnum (0), make_fixnum (SCHARS (string)),
>> +			Qnil, string);
>>    if (len != 0 || STRING_MULTIBYTE (string))
>>      {
>>        CHECK_IMPURE (string, XSTRING (string));
>
> Thanks, but please also document this aspect of the function's
> behavior in both the doc string and in the ELisp manual.

Thanks, done.  However, I pushed to master before I realized that maybe
this belongs on emacs-30?  WDYT?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 22:22:02 GMT) Full text and rfc822 format available.

Message #31 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Wed, 15 Jan 2025 17:21:22 -0500

Pip Cet <pipcet <at> protonmail.com> writes:

> Sorry, false alarm.  STRING_SET_UNIBYTE modifies its argument in this
> case, replacing it by the appropriate empty string (which is unused).
>
> The only difference is that (clear-string "") no longer throws an error,
> but I don't see a problem with either behavior in that case.

Thanks for checking that.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Wed, 15 Jan 2025 23:03:02 GMT) Full text and rfc822 format available.

Message #34 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Wed, 15 Jan 2025 23:02:32 +0000

"Stefan Kangas" <stefankangas <at> gmail.com> writes:

> Pip Cet <pipcet <at> protonmail.com> writes:
>
>> Looks good to me.  However, on the scratch/no-purespace branch, we need
>> to ensure that this function cannot be called on the empty multibyte
>> string, turning it into a second empty unibyte string!  I'll push a fix
>> to that branch in a bit.
>
> Please do, thanks.

No need, I had forgotten the strange way STRING_SET_MULTIBYTE treats its
argument as an lvalue.  It's a minor issue, but I would prefer it if the
argument were provided as a pointer to a Lisp_Object so it's clear it
can be modified.

>> However, if this function is meant to clear a string from memory
>> to prevent leaving confidential data in memory, that won't work with
>> MPS.  (I suppose it's reasonably clear that no protection whatsoever
>> applies to string properties?)
>>
>> The problem is MPS allocates string data in a "copying" pool, so several
>> copies of the string will usually be present.
>>
>> alloc.c string compaction may also result in recently-used short strings
>> remaining in memory, IIUC, but only if an alloc.c GC cycle happens
>> between string creation and the clear-string call.  (But if it does
>> happen, it's very likely that a copy of the string is created!)
>>
>> How important is this feature?  What we have now is better that nothing,
>> but if people start relying on it in any way, that's worse than nothing.
>> A stern warning in the docstring may help reduce the problem.
>
> I think a stern warning in the docstring makes sense.  This is
> potentially security-critical, and we should better make sure that we
> can make no guarantees here.

I'm worried that providing a "security" feature like this one may raise
false expectations that Emacs provides similar "security" features
(usually, in this context, "security" very quickly becomes code for
"ensure that sensitive data is only ever exposed to
manufacturer-provided proprietary software") in other contexts, and I
really don't think we want to do that.

The technical reason is that we simply cannot guarantee a string which
is entered in the minibuffer or in a real buffer won't be visible
somewhere: if it's in a buffer, it may be in the gap, and AFAIK there's
no way to clear that; it's in the undo history; it's in the M-x
view-lossage output.  Physically, clearing memory only makes sense if
it's an entire cache line: a partial cache line memset will leave the
original copy in RAM while creating a write-back entry in the CPU cache:
if that is what you're worried about, the naive approach will actually
make things worse.

The philosophical reason is that it seems harmless to offer an optional
additional security feature with no promises, but we've seen these
"evolve" into requirements which are being abused to harm Free Software,
in particular, in too many cases.  Software signing used to be optional
with all promises made it would be used only to improve security, not to
lock out Free Software, but now many people cannot run Emacs on their
locked-down employer-provided devices, and I fear it's only a matter of
time before that is expanded to most Android phones.

> It would be nice if we could have a guaranteed secure way of allocating
> and deallocating strings for passwords from ELisp.  Could something like
> that be done?  I can think of several dumb and hacky ways to do it, but
> none that are clean or smart.

We could expose pin_string: pinned strings aren't compacted, so clearing
them is safer if the old GC is in use.

But it's not clear to me what this is supposed to protect against.  If
it's physical memory inspection, we need to know a lot about the CPU for
our operation to make sense; pinning a short string and memsetting it
will virtually ensure that the previous contents will remain in RAM for
longer than they would if the string had been garbage-collected and
overwritten.

If it's disk storage, we need to look into madvise(2), mlock(2),
mlock2(), and memfd_secret(2), at least on GNU/Linux (in the case of
memfd_secret, I'm tempted to omit the "GNU/": it seems to me that this
is a Linux kernel attempt to obey security protocols which ultimately
lead to "Trusted Computing", and the GNU position on that is clear).

This isn't a simple matter, and it may be best to remove clear-string
entirely.

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 00:04:02 GMT) Full text and rfc822 format available.

Message #37 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
 Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 00:03:10 +0000

Pip Cet <pipcet <at> protonmail.com> writes:

> "Stefan Kangas" <stefankangas <at> gmail.com> writes:
>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>>> Looks good to me.  However, on the scratch/no-purespace branch, we need
>>> to ensure that this function cannot be called on the empty multibyte
>>> string, turning it into a second empty unibyte string!  I'll push a fix
>>> to that branch in a bit.
>>
>> Please do, thanks.
>
> No need, I had forgotten the strange way STRING_SET_MULTIBYTE treats its
> argument as an lvalue.  It's a minor issue, but I would prefer it if the
> argument were provided as a pointer to a Lisp_Object so it's clear it
> can be modified.

Patch (I've verified that the eassume-in-a-loop thing doesn't generate
any code with current GCC, but it'd be nice to check this for clang):

STRING_SET_UNIBYTE (str[i++]) probably wasn't worth worrying about, but
STRING_SET_UNIBYTE (empty_multibyte_string) seemed like a plausible
mistake to me, and would have had dire results.  Anyway, changing the
API ensures we have a good look at all callers, and everything appears
to be in order (except that I think asetting a single-character string
should always work; right now, sometimes (aset str 0 #x100) will fail to
work when (progn (aset str 0 0) (aset str 0 #x100)) would work, based on
the character supposedly overwritten.  IOW, we can't implement
dead-store elimination for aset because an apparent "dead" store can
have an effect).

commit 9df159043cff1af82848e3a232c9e6cf2ecf5992 (HEAD)
Author: Pip Cet <pipcet <at> protonmail.com>
Date:   Wed Jan 15 23:15:19 2025 +0000

    Align STRING_SET_UNIBYTE and STRING_SET_MULTIBYTE (bug#75581)
    
    Calling convention is now that both identifiers refer to
    ordinary (inline) functions receiving a pointer to a Lisp_Object,
    which is modified to be the unique appropriate empty string if
    necessary.
    
    * src/alloc.c (make_multibyte_string): Drop 'register' hint.
    (make_string_from_bytes): Drop 'register' hint; pass a pointer to the
    Lisp_Object which might be modified.
    (make_specified_string):
    (make_clear_string):
    modified.
    * src/print.c (Fprin1_to_string):
    * src/fns.c (Fdelete):
    (Fclear_string): Pass a pointer to the Lisp_Object which might be
    * src/data.c (Faset): Also drop 'register' hint.
    * src/lisp.h (STRING_SET_UNIBYTE, STRING_SET_MULTIBYTE): Move.  Turn
    both into functions with the same calling convention.  Assert that no
    invalid multibyte string is created in 'STRING_SET_MULTIBYTE'.

diff --git a/src/alloc.c b/src/alloc.c
index 8307c74c106..1e9d3bfd69f 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -2510,7 +2510,7 @@ make_unibyte_string (const char *contents, ptrdiff_t length)
 make_multibyte_string (const char *contents,
 		       ptrdiff_t nchars, ptrdiff_t nbytes)
 {
-  register Lisp_Object val;
+  Lisp_Object val;
   val = make_uninit_multibyte_string (nchars, nbytes);
   memcpy (SDATA (val), contents, nbytes);
   return val;
@@ -2524,11 +2524,11 @@ make_multibyte_string (const char *contents,
 make_string_from_bytes (const char *contents,
 			ptrdiff_t nchars, ptrdiff_t nbytes)
 {
-  register Lisp_Object val;
+  Lisp_Object val;
   val = make_uninit_multibyte_string (nchars, nbytes);
   memcpy (SDATA (val), contents, nbytes);
   if (SBYTES (val) == SCHARS (val))
-    STRING_SET_UNIBYTE (val);
+    STRING_SET_UNIBYTE (&val);
   return val;
 }
 
@@ -2555,7 +2555,7 @@ make_specified_string (const char *contents,
   val = make_uninit_multibyte_string (nchars, nbytes);
   memcpy (SDATA (val), contents, nbytes);
   if (!multibyte)
-    STRING_SET_UNIBYTE (val);
+    STRING_SET_UNIBYTE (&val);
   return val;
 }
 
@@ -2572,7 +2572,7 @@ make_clear_string (EMACS_INT length, bool clearit)
   if (!length)
     return empty_unibyte_string;
   val = make_clear_multibyte_string (length, length, clearit);
-  STRING_SET_UNIBYTE (val);
+  STRING_SET_UNIBYTE (&val);
   return val;
 }
 
diff --git a/src/data.c b/src/data.c
index be85f817014..deaa8c502d4 100644
--- a/src/data.c
+++ b/src/data.c
@@ -2584,9 +2584,9 @@ DEFUN ("aset", Faset, Saset, 3, 3, 0,
        doc: /* Store into the element of ARRAY at index IDX the value NEWELT.
 Return NEWELT.  ARRAY may be a vector, a string, a char-table or a
 bool-vector.  IDX starts at 0.  */)
-  (register Lisp_Object array, Lisp_Object idx, Lisp_Object newelt)
+  (Lisp_Object array, Lisp_Object idx, Lisp_Object newelt)
 {
-  register EMACS_INT idxval;
+  EMACS_INT idxval;
 
   CHECK_FIXNUM (idx);
   idxval = XFIXNUM (idx);
@@ -2646,7 +2646,7 @@ DEFUN ("aset", Faset, Saset, 3, 3, 0,
 	    if (!ASCII_CHAR_P (SREF (array, i)))
 	      args_out_of_range (array, newelt);
 	  /* ARRAY is an ASCII string.  Convert it to a multibyte string.  */
-	  STRING_SET_MULTIBYTE (array);
+	  STRING_SET_MULTIBYTE (&array);
 	  idxval_byte = idxval;
 	  p1 = SDATA (array) + idxval_byte;
 	  prev_bytes = 1;
diff --git a/src/fns.c b/src/fns.c
index 191154c651a..edde21469cb 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -2203,7 +2203,7 @@ DEFUN ("delete", Fdelete, Sdelete, 2, 2, 0,
 
 	  tem = make_uninit_multibyte_string (nchars, nbytes);
 	  if (!STRING_MULTIBYTE (seq))
-	    STRING_SET_UNIBYTE (tem);
+	    STRING_SET_UNIBYTE (&tem);
 
 	  for (i = nchars = nbytes = ibyte = 0;
 	       i < SCHARS (seq);
@@ -3320,7 +3320,7 @@ DEFUN ("clear-string", Fclear_string, Sclear_string,
       CHECK_IMPURE (string, XSTRING (string));
       memset (SDATA (string), 0, len);
       STRING_SET_CHARS (string, len);
-      STRING_SET_UNIBYTE (string);
+      STRING_SET_UNIBYTE (&string);
     }
   return Qnil;
 }
diff --git a/src/lisp.h b/src/lisp.h
index a8fe2e9f6bc..31c087eda73 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -1654,25 +1654,6 @@ STRING_MULTIBYTE (Lisp_Object str)
 #define STRING_BYTES_BOUND  \
   ((ptrdiff_t) min (MOST_POSITIVE_FIXNUM, min (SIZE_MAX, PTRDIFF_MAX) - 1))
 
-/* Mark STR as a unibyte string.  */
-#define STRING_SET_UNIBYTE(STR)				\
-  do {							\
-    if (XSTRING (STR)->u.s.size == 0)			\
-      (STR) = empty_unibyte_string;			\
-    else						\
-      XSTRING (STR)->u.s.size_byte = -1;		\
-  } while (false)
-
-/* Mark STR as a multibyte string.  Assure that STR contains only
-   ASCII characters in advance.  */
-INLINE void
-STRING_SET_MULTIBYTE (Lisp_Object str)
-{
-  /* The 0-length strings are unique&shared so we can't modify them.  */
-  eassert (XSTRING (str)->u.s.size > 0);
-  XSTRING (str)->u.s.size_byte = XSTRING (str)->u.s.size;
-}
-
 /* Convenience functions for dealing with Lisp strings.  */
 
 /* WARNING: Use the 'char *' pointers to string data with care in code
@@ -4605,6 +4586,7 @@ list4i (intmax_t a, intmax_t b, intmax_t c, intmax_t d)
 extern Lisp_Object make_formatted_string (char *, const char *, ...)
   ATTRIBUTE_FORMAT_PRINTF (2, 3);
 extern Lisp_Object make_unibyte_string (const char *, ptrdiff_t);
+
 extern ptrdiff_t vectorlike_nbytes (const union vectorlike_header *hdr);
 
 INLINE ptrdiff_t
@@ -5242,6 +5224,32 @@ fast_c_string_match_ignore_case (Lisp_Object regexp,
 #endif
 extern Lisp_Object decode_env_path (const char *, const char *, bool);
 extern Lisp_Object empty_unibyte_string, empty_multibyte_string;
+
+/* Mark *STRPTR as a unibyte string.  Modify *STRPTR if necessary.  */
+INLINE void
+STRING_SET_UNIBYTE (Lisp_Object *strptr)
+{
+  eassume (STRING_MULTIBYTE (*strptr));
+  if (XSTRING (*strptr)->u.s.size == 0)
+    *strptr = empty_unibyte_string;
+  else
+    XSTRING (*strptr)->u.s.size_byte = -1;
+}
+
+/* Mark *STRPTR as a multibyte string.  Assumes that *STRPTR contains
+   only ASCII characters.  Modify *STRPTR if necessary.  */
+INLINE void
+STRING_SET_MULTIBYTE (Lisp_Object *strptr)
+{
+  eassume (!STRING_MULTIBYTE (*strptr));
+  for (ptrdiff_t i = 0; i < SBYTES (*strptr); i++)
+    eassume (ASCII_CHAR_P (SDATA (*strptr)[i]));
+  if (XSTRING (*strptr)->u.s.size == 0)
+    *strptr = empty_multibyte_string;
+  else
+    XSTRING (*strptr)->u.s.size_byte = XSTRING (*strptr)->u.s.size;
+}
+
 extern AVOID terminate_due_to_signal (int, int);
 #ifdef WINDOWSNT
 extern Lisp_Object Vlibrary_cache;
diff --git a/src/print.c b/src/print.c
index f3814859cb3..62c7780416a 100644
--- a/src/print.c
+++ b/src/print.c
@@ -819,7 +819,7 @@ DEFUN ("prin1-to-string", Fprin1_to_string, Sprin1_to_string, 1, 3, 0,
   set_buffer_internal (XBUFFER (Vprin1_to_string_buffer));
   object = Fbuffer_string ();
   if (SBYTES (object) == SCHARS (object))
-    STRING_SET_UNIBYTE (object);
+    STRING_SET_UNIBYTE (&object);
 
   /* Note that this won't make prepare_to_modify_buffer call
      ask-user-about-supersession-threat because this buffer

(I'd also like to propose that (aset str 0 x) always should work on
single-character strings; right now, if str contains a single unibyte
non-ASCII string and x is >= 0x100, the aset will fail, which seems
weird to me).

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 06:41:02 GMT) Full text and rfc822 format available.

Message #40 received at 75581-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: gudzpoz <at> gmail.com, schwab <at> linux-m68k.org, 75581-done <at> debbugs.gnu.org
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Thu, 16 Jan 2025 08:39:48 +0200

> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Wed, 15 Jan 2025 14:19:06 -0800
> Cc: schwab <at> linux-m68k.org, gudzpoz <at> gmail.com, 75581-done <at> debbugs.gnu.org
> 
> > Thanks, but please also document this aspect of the function's
> > behavior in both the doc string and in the ELisp manual.
> 
> Thanks, done.  However, I pushed to master before I realized that maybe
> this belongs on emacs-30?  WDYT?

No, master is better.  This is not an urgent problem, and the code was
like that since the function was introduced 22 years ago.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 07:06:02 GMT) Full text and rfc822 format available.

Message #43 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 09:05:45 +0200

> Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
>  Andreas Schwab <schwab <at> linux-m68k.org>
> Date: Wed, 15 Jan 2025 23:02:32 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> The technical reason is that we simply cannot guarantee a string which
> is entered in the minibuffer or in a real buffer won't be visible
> somewhere: if it's in a buffer, it may be in the gap, and AFAIK there's
> no way to clear that

compact_buffer does at least part of that job, AFAICT, if it shrinks
the gap.  And it should be easy to add a new function that clears the
gap completely, and expose it to Lisp or call it from clear-string.

> it's in the undo history; it's in the M-x view-lossage output.

We could handle that as well, by clearing the undo-list and
recent-keys.

> Physically, clearing memory only makes sense if
> it's an entire cache line: a partial cache line memset will leave the
> original copy in RAM while creating a write-back entry in the CPU cache:
> if that is what you're worried about, the naive approach will actually
> make things worse.

I think there are APIs to clear memory securely, no?  How do other
applications do this?

In any case, we don't say in our documentation that clear-string is
secure or even intended for secure clearing of string text, so I don't
think we give someone false security promises.

Removing this function sounds too extreme to me, since we have that
for many years.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 07:19:01 GMT) Full text and rfc822 format available.

Message #46 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 09:18:36 +0200

> Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
>  Andreas Schwab <schwab <at> linux-m68k.org>
> Date: Thu, 16 Jan 2025 00:03:10 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> > No need, I had forgotten the strange way STRING_SET_MULTIBYTE treats its
> > argument as an lvalue.  It's a minor issue, but I would prefer it if the
> > argument were provided as a pointer to a Lisp_Object so it's clear it
> > can be modified.
> 
> Patch (I've verified that the eassume-in-a-loop thing doesn't generate
> any code with current GCC, but it'd be nice to check this for clang):

We have many SET_SOMETHING macros that modify the objects passed to
them as arguments.  So I see no compelling reason to make this change,
which confusingly deviates from the other SET macros.  Let's leave
alone this code, which worked for us for many years.

> (I'd also like to propose that (aset str 0 x) always should work on
> single-character strings; right now, if str contains a single unibyte
> non-ASCII string and x is >= 0x100, the aset will fail, which seems
> weird to me).

This is a separate issue with (AFAIR) a long history.  It should be
discussed separately.  And I don't think I understand the situation
you describe; the below recipe doesn't signal an error (or did you
mean a different kind of failure?):

  (setq str "a")
  (multibyte-string-p str)
    => nil
  (aset str 0 #x100)
    => 256
  str
    => "Ā"
  (multibyte-string-p str)
    => t

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 11:18:02 GMT) Full text and rfc822 format available.

Message #49 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 11:17:05 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
>>  Andreas Schwab <schwab <at> linux-m68k.org>
>> Date: Thu, 16 Jan 2025 00:03:10 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> > No need, I had forgotten the strange way STRING_SET_MULTIBYTE treats its
>> > argument as an lvalue.  It's a minor issue, but I would prefer it if the
>> > argument were provided as a pointer to a Lisp_Object so it's clear it
>> > can be modified.
>>
>> Patch (I've verified that the eassume-in-a-loop thing doesn't generate
>> any code with current GCC, but it'd be nice to check this for clang):
>
> We have many SET_SOMETHING macros that modify the objects passed to
> them as arguments.

The problem is treating the arguments as lvalues and turning them into
entirely different objects.  XSET* does that, and there are a few macros
that set structs, but I'm aware of only two macros that do this to Lisp
objects, and the other one is clearly named to indicate it coerces its
argument.

So, no, not many cases like this, and this case is bad for other
reasons: assuming that STRING_SET_MULTIBYTE and STRING_SET_UNIBYTE
behave similarly seems natural to me.

> So I see no compelling reason to make this change,
> which confusingly deviates from the other SET macros.

??? Which other *_SET_* macro assumes it is passed an lval, and turns it
into an object with a different identity?  Note that "STRING_SET_"
doesn't start with "SET" or "XSET", and that distinction is important.

> Let's leave alone this code, which worked for us for many years.

I have no problem with reaching that decision ultimately, but the
premises you used to reach it aren't correct: there are no comparable
macros that do this, AFAIK.

>> (I'd also like to propose that (aset str 0 x) always should work on
>> single-character strings; right now, if str contains a single unibyte
>> non-ASCII string and x is >= 0x100, the aset will fail, which seems
>> weird to me).

Sorry, I meant to say "non-ASCII character".

> This is a separate issue with (AFAIR) a long history.  It should be
> discussed separately.  And I don't think I understand the situation
> you describe; the below recipe doesn't signal an error (or did you
> mean a different kind of failure?):
>
>   (setq str "a")
>   (multibyte-string-p str)
>     => nil
>   (aset str 0 #x100)
>     => 256
>   str
>     => "Ā"
>   (multibyte-string-p str)
>     => t

The situation I described was:

(setq str "a")
(aset str 0 #xff)
(aset str 0 #x100)

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 11:41:01 GMT) Full text and rfc822 format available.

Message #52 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 11:40:16 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Cc: 75581 <at> debbugs.gnu.org, Kanakana <gudzpoz <at> gmail.com>,
>>  Andreas Schwab <schwab <at> linux-m68k.org>
>> Date: Wed, 15 Jan 2025 23:02:32 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> The technical reason is that we simply cannot guarantee a string which
>> is entered in the minibuffer or in a real buffer won't be visible
>> somewhere: if it's in a buffer, it may be in the gap, and AFAIK there's
>> no way to clear that
>
> compact_buffer does at least part of that job, AFAICT, if it shrinks
> the gap.

I don't see how.  Worst case, it calls xrealloc which often leaves an
extra copy in memory.

> And it should be easy to add a new function that clears the
> gap completely, and expose it to Lisp or call it from clear-string.

It's easy to clear the current gap completely, but it's not what people
want, which is to actually remove the sensitive data from memory.  That
is much, much harder.

>> it's in the undo history; it's in the M-x view-lossage output.
>
> We could handle that as well, by clearing the undo-list and
> recent-keys.

So we're up to five necessary changes to do what the callers of this
function apparently assume will be achieved by it:

1. realloc must be hardened to clear the old memory
2. the gap must be cleared
3. undo history must be cleared
4. recent-keys must be cleared
5. string compaction must be hardened not to leave extra copies of
strings around

The cache line problem still remains, of course.

None of this is easy.

>> Physically, clearing memory only makes sense if
>> it's an entire cache line: a partial cache line memset will leave the
>> original copy in RAM while creating a write-back entry in the CPU cache:
>> if that is what you're worried about, the naive approach will actually
>> make things worse.
>
> I think there are APIs to clear memory securely, no?

Probably.  Even getting to the point where that matters would require
too many changes, though.

> Removing this function sounds too extreme to me, since we have that
> for many years.

You're probably right, but it should be clearly documented to prevent
only ordinary Lisp code from accessing the password, and not to do any
of the five things above.

read-passwd, for example, definitely needs more warnings than it
currently has: if I dump core after evaluating (clear-string (prog1
(read-passwd "") (garbage-collect))), I see at least three copies of the
password in the coredump.  (Possibly more that use fixnums).

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 11:53:01 GMT) Full text and rfc822 format available.

Message #55 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 13:52:03 +0200

> Date: Thu, 16 Jan 2025 11:17:05 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> > We have many SET_SOMETHING macros that modify the objects passed to
> > them as arguments.
> 
> The problem is treating the arguments as lvalues and turning them into
> entirely different objects.  XSET* does that, and there are a few macros
> that set structs, but I'm aware of only two macros that do this to Lisp
> objects, and the other one is clearly named to indicate it coerces its
> argument.
> 
> So, no, not many cases like this, and this case is bad for other
> reasons: assuming that STRING_SET_MULTIBYTE and STRING_SET_UNIBYTE
> behave similarly seems natural to me.
> 
> > So I see no compelling reason to make this change,
> > which confusingly deviates from the other SET macros.
> 
> ??? Which other *_SET_* macro assumes it is passed an lval, and turns it
> into an object with a different identity?  Note that "STRING_SET_"
> doesn't start with "SET" or "XSET", and that distinction is important.

I said nothing about lvalues and identities.  I was talking about the
various SET* macros: the XSET* ones, SSET, STRING_SET_CHARS, ASET, and
CHAR_TABLE_SET, to name just a few.  They all create or modify Lisp
objects, and they all accept the objects directly, not as a pointer.

> > Let's leave alone this code, which worked for us for many years.
> 
> I have no problem with reaching that decision ultimately, but the
> premises you used to reach it aren't correct: there are no comparable
> macros that do this, AFAIK.

There are for me: I'm used to these macros from 30 years of hacking
Emacs, and they all receive Lisp objects, not pointers to them.  So
having one or two macros that break this pattern will be confusing,
certainly after so many years of using the same macros as we do now.

> The situation I described was:
> 
> (setq str "a")
> (aset str 0 #xff)
> (aset str 0 #x100)

OK, but why is this weird?  The value #x100 cannot be represented by a
single byte, so we refuse.  Looks understandable to me.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 12:24:02 GMT) Full text and rfc822 format available.

Message #58 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 12:23:34 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Thu, 16 Jan 2025 11:17:05 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> > We have many SET_SOMETHING macros that modify the objects passed to
>> > them as arguments.
>>
>> The problem is treating the arguments as lvalues and turning them into
>> entirely different objects.  XSET* does that, and there are a few macros
>> that set structs, but I'm aware of only two macros that do this to Lisp
>> objects, and the other one is clearly named to indicate it coerces its
>> argument.
>>
>> So, no, not many cases like this, and this case is bad for other
>> reasons: assuming that STRING_SET_MULTIBYTE and STRING_SET_UNIBYTE
>> behave similarly seems natural to me.
>>
>> > So I see no compelling reason to make this change,
>> > which confusingly deviates from the other SET macros.
>>
>> ??? Which other *_SET_* macro assumes it is passed an lval, and turns it
>> into an object with a different identity?  Note that "STRING_SET_"
>> doesn't start with "SET" or "XSET", and that distinction is important.
>
> I said nothing about lvalues and identities.  I was talking about the

That's the point, though:

Lisp_Object string = string2;
STRING_SET_UNIBYTE (string);
eassert (EQ (string, string2));

will fail, sometimes.  The only other macros that behave in this way are
XSET* (which all follow this convention) and CHECK_FIXNUM_COERCE_MARKER,
which clearly indicates in its name that it coerces its argument.

> various SET* macros: the XSET* ones, SSET, STRING_SET_CHARS, ASET, and
> CHAR_TABLE_SET, to name just a few.  They all create or modify Lisp
> objects, and they all accept the objects directly, not as a pointer.

Because they don't change the value of the Lisp_Object: they only use it
as a pointer and change the underlying object.

>> > Let's leave alone this code, which worked for us for many years.
>>
>> I have no problem with reaching that decision ultimately, but the
>> premises you used to reach it aren't correct: there are no comparable
>> macros that do this, AFAIK.
>
> There are for me: I'm used to these macros from 30 years of hacking
> Emacs, and they all receive Lisp objects, not pointers to them.  So
> having one or two macros that break this pattern will be confusing,
> certainly after so many years of using the same macros as we do now.

The macro that breaks the pattern is STRING_SET_UNIBYTE.  It is
confusing.  It confused me.  "STRING_SET_UNIBYTE (string)" does not look
like it will change the value of the Lisp_Object string: it looks like
it treats the Lisp_Object as a pointer and modifies the string object it
points to.  If STRING_SET_UNIBYTE were a function rather than a macro,
it couldn't change the value of string.  STRING_SET_MULTIBYTE is a
function, and doesn't, and that's even more confusing.  If we want to
change string, the universally accepted way of doing that is to pass a
pointer to it.

>> The situation I described was:
>>
>> (setq str "a")
>> (aset str 0 #xff)
>> (aset str 0 #x100)
>
> OK, but why is this weird?  The value #x100 cannot be represented by a
> single byte, so we refuse.

If I replace the #xff by #x7f, it works.  If I replace it by #x100, it
works.  We don't refuse because #x100 requires several bytes and the
previous character didn't, we refuse because the previous character
cannot be converted to a multibyte form; however, as that character
would be overwritten anyway, why do we care?

> Looks understandable to me.

Not to me.

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 12:42:01 GMT) Full text and rfc822 format available.

Message #61 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 14:40:57 +0200

> Date: Thu, 16 Jan 2025 11:40:16 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> The technical reason is that we simply cannot guarantee a string which
> >> is entered in the minibuffer or in a real buffer won't be visible
> >> somewhere: if it's in a buffer, it may be in the gap, and AFAIK there's
> >> no way to clear that
> >
> > compact_buffer does at least part of that job, AFAICT, if it shrinks
> > the gap.
> 
> I don't see how.  Worst case, it calls xrealloc which often leaves an
> extra copy in memory.

make_gap_smaller zeroes out the part the gap which is no longer the
gap.  The rest is moved to the end of the buffer, and then buffer text
is shrunk, which at least in some cases could unmap that part from the
Emacs process's address space.

> > And it should be easy to add a new function that clears the
> > gap completely, and expose it to Lisp or call it from clear-string.
> 
> It's easy to clear the current gap completely, but it's not what people
> want, which is to actually remove the sensitive data from memory.  That
> is much, much harder.

The data in the gap is part of the problem, no?  You said that, and I
agree.  So I'm saying that we can handle at least that part.  It might
not solve the problem completely, but it will make it smaller, no?

> >> it's in the undo history; it's in the M-x view-lossage output.
> >
> > We could handle that as well, by clearing the undo-list and
> > recent-keys.
> 
> So we're up to five necessary changes to do what the callers of this
> function apparently assume will be achieved by it:

Only if we want to.

> > Removing this function sounds too extreme to me, since we have that
> > for many years.
> 
> You're probably right, but it should be clearly documented to prevent
> only ordinary Lisp code from accessing the password, and not to do any
> of the five things above.
> 
> read-passwd, for example, definitely needs more warnings than it
> currently has: if I dump core after evaluating (clear-string (prog1
> (read-passwd "") (garbage-collect))), I see at least three copies of the
> password in the coredump.  (Possibly more that use fixnums).

Yes, documenting these caveats cannot do any harm.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 12:58:02 GMT) Full text and rfc822 format available.

Message #64 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 14:57:34 +0200

> Date: Thu, 16 Jan 2025 12:23:34 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
> 
> >> ??? Which other *_SET_* macro assumes it is passed an lval, and turns it
> >> into an object with a different identity?  Note that "STRING_SET_"
> >> doesn't start with "SET" or "XSET", and that distinction is important.
> >
> > I said nothing about lvalues and identities.  I was talking about the
> 
> That's the point, though:
> 
> Lisp_Object string = string2;
> STRING_SET_UNIBYTE (string);
> eassert (EQ (string, string2));
> 
> will fail, sometimes.  The only other macros that behave in this way are
> XSET* (which all follow this convention) and CHECK_FIXNUM_COERCE_MARKER,
> which clearly indicates in its name that it coerces its argument.
> 
> > various SET* macros: the XSET* ones, SSET, STRING_SET_CHARS, ASET, and
> > CHAR_TABLE_SET, to name just a few.  They all create or modify Lisp
> > objects, and they all accept the objects directly, not as a pointer.
> 
> Because they don't change the value of the Lisp_Object: they only use it
> as a pointer and change the underlying object.
> 
> >> > Let's leave alone this code, which worked for us for many years.
> >>
> >> I have no problem with reaching that decision ultimately, but the
> >> premises you used to reach it aren't correct: there are no comparable
> >> macros that do this, AFAIK.
> >
> > There are for me: I'm used to these macros from 30 years of hacking
> > Emacs, and they all receive Lisp objects, not pointers to them.  So
> > having one or two macros that break this pattern will be confusing,
> > certainly after so many years of using the same macros as we do now.
> 
> The macro that breaks the pattern is STRING_SET_UNIBYTE.  It is
> confusing.  It confused me.  "STRING_SET_UNIBYTE (string)" does not look
> like it will change the value of the Lisp_Object string: it looks like
> it treats the Lisp_Object as a pointer and modifies the string object it
> points to.  If STRING_SET_UNIBYTE were a function rather than a macro,
> it couldn't change the value of string.  STRING_SET_MULTIBYTE is a
> function, and doesn't, and that's even more confusing.  If we want to
> change string, the universally accepted way of doing that is to pass a
> pointer to it.

I understand where you are coming from, but after so many years of
staring at the Emacs code this is very natural to me and doesn't cause
any confusion.  When I think about these macros, I don't care about
the differences in their internal workings, whether they modify the
object or just its guts.

> >> The situation I described was:
> >>
> >> (setq str "a")
> >> (aset str 0 #xff)
> >> (aset str 0 #x100)
> >
> > OK, but why is this weird?  The value #x100 cannot be represented by a
> > single byte, so we refuse.
> 
> If I replace the #xff by #x7f, it works.  If I replace it by #x100, it
> works.  We don't refuse because #x100 requires several bytes and the
> previous character didn't, we refuse because the previous character
> cannot be converted to a multibyte form; however, as that character
> would be overwritten anyway, why do we care?

I'd have to dig deep into the history to recollect why we make the
exception for ASCII.  My guess is because ASCII strings are sometimes
unibyte and sometimes multibyte, and we wanted to avoid an even bigger
surprise where the same character instead of #xff sometimes works and
sometimes doesn't.  Users and Lisp programs cannot easily control when
ASCII strings are unibyte and when they are multibyte.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 14:56:01 GMT) Full text and rfc822 format available.

Message #67 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 14:54:55 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> The macro that breaks the pattern is STRING_SET_UNIBYTE.  It is
>> confusing.  It confused me.  "STRING_SET_UNIBYTE (string)" does not look
>> like it will change the value of the Lisp_Object string: it looks like
>> it treats the Lisp_Object as a pointer and modifies the string object it
>> points to.  If STRING_SET_UNIBYTE were a function rather than a macro,
>> it couldn't change the value of string.  STRING_SET_MULTIBYTE is a
>> function, and doesn't, and that's even more confusing.  If we want to
>> change string, the universally accepted way of doing that is to pass a
>> pointer to it.
>
> I understand where you are coming from, but after so many years of
> staring at the Emacs code this is very natural to me and doesn't cause
> any confusion.  When I think about these macros, I don't care about
> the differences in their internal workings, whether they modify the
> object or just its guts.

Okay, I think you said that this isn't doesn't cause relevant developers
any confusion.  I disagree, but ultimately it's your decision whom you
consider relevant, so thanks for clarifying that.

I'm not sure about the XSET* macros, FWIW.  They're not type safe
(fixable, to the extent XSETVECTOR can be type-safe):

+#define XSETCONS(a, b) ({ struct Lisp_Cons *c = (b); ((a) = make_lisp_ptr (c, Lisp_Cons)); })
+#define XSETVECTOR(a, b) ({ union vectorlike_header *d = &(b)->header; struct Lisp_Vector *c = (struct Lisp_Vector *)(d); ((a) = make_lisp_ptr (c, Lisp_Vectorlike)); })

That doesn't reveal any bugs, but I don't really see the reason for
their existence at this point.  I suspect the reason was originally to
avoid struct assignment / struct return values, but that's no longer an
issue.  (Another reason may have been avoiding intermediate Lisp_Object
values because they couldn't be GCPRO'd).

If I have a struct Lisp_Cons *, and I want a Lisp_Object, it would feel
natural to me to write

Lisp_Object x = make_lisp_cons (c);

No void *, no lvalue macro arguments, just an ordinary C expression.

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Thu, 16 Jan 2025 15:12:02 GMT) Full text and rfc822 format available.

Message #70 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>, Richard Stallman <rms <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, stefankangas <at> gmail.com,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Thu, 16 Jan 2025 17:11:01 +0200

> Date: Thu, 16 Jan 2025 14:54:55 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
> 
> I'm not sure about the XSET* macros, FWIW.  They're not type safe
> (fixable, to the extent XSETVECTOR can be type-safe):
> 
> +#define XSETCONS(a, b) ({ struct Lisp_Cons *c = (b); ((a) = make_lisp_ptr (c, Lisp_Cons)); })
> +#define XSETVECTOR(a, b) ({ union vectorlike_header *d = &(b)->header; struct Lisp_Vector *c = (struct Lisp_Vector *)(d); ((a) = make_lisp_ptr (c, Lisp_Vectorlike)); })
> 
> That doesn't reveal any bugs, but I don't really see the reason for
> their existence at this point.  I suspect the reason was originally to
> avoid struct assignment / struct return values, but that's no longer an
> issue.  (Another reason may have been avoiding intermediate Lisp_Object
> values because they couldn't be GCPRO'd).
> 
> If I have a struct Lisp_Cons *, and I want a Lisp_Object, it would feel
> natural to me to write
> 
> Lisp_Object x = make_lisp_cons (c);
> 
> No void *, no lvalue macro arguments, just an ordinary C expression.

I don't know why these macros were invented, that was long before my
time.  I won't be surprised to hear that they come from some ancient
prototype of Emacs or somesuch.  Perhaps Richard knows.

If we were developing Emacs from scratch today, we would probably use
something simpler.  But we've had those macros for so long, and for
each kind of Lisp object in Emacs, that removing them now would need
to have a very good reason.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Sun, 19 Jan 2025 01:15:02 GMT) Full text and rfc822 format available.

Message #73 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Pip Cet <pipcet <at> protonmail.com>, 75581 <at> debbugs.gnu.org,
 schwab <at> linux-m68k.org, Richard Stallman <rms <at> gnu.org>, gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Sat, 18 Jan 2025 19:14:39 -0600

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: stefankangas <at> gmail.com, 75581 <at> debbugs.gnu.org, gudzpoz <at> gmail.com, schwab <at> linux-m68k.org
>>
>> I'm not sure about the XSET* macros, FWIW.  They're not type safe
>> (fixable, to the extent XSETVECTOR can be type-safe):
>>
>> +#define XSETCONS(a, b) ({ struct Lisp_Cons *c = (b); ((a) = make_lisp_ptr (c, Lisp_Cons)); })
>> +#define XSETVECTOR(a, b) ({ union vectorlike_header *d = &(b)->header; struct
>> Lisp_Vector *c = (struct Lisp_Vector *)(d); ((a) = make_lisp_ptr (c,
>> Lisp_Vectorlike)); })
>>
>> That doesn't reveal any bugs, but I don't really see the reason for
>> their existence at this point.  I suspect the reason was originally to
>> avoid struct assignment / struct return values, but that's no longer an
>> issue.  (Another reason may have been avoiding intermediate Lisp_Object
>> values because they couldn't be GCPRO'd).
>>
>> If I have a struct Lisp_Cons *, and I want a Lisp_Object, it would feel
>> natural to me to write
>>
>> Lisp_Object x = make_lisp_cons (c);
>>
>> No void *, no lvalue macro arguments, just an ordinary C expression.
>
> I don't know why these macros were invented, that was long before my
> time.  I won't be surprised to hear that they come from some ancient
> prototype of Emacs or somesuch.  Perhaps Richard knows.
>
> If we were developing Emacs from scratch today, we would probably use
> something simpler.  But we've had those macros for so long, and for
> each kind of Lisp object in Emacs, that removing them now would need
> to have a very good reason.

FTR, I agree that most of these macros should be removed.  It would be a
useful simplification of our code.

This is not a new idea:
https://lists.gnu.org/r/emacs-devel/2024-02/msg00806.html
https://lists.gnu.org/r/emacs-devel/2024-03/msg00064.html

The counter-argument is "muscle memory" for old-time Emacs hackers.
This disregards the complexity and confusion these macros mean for
everyone that do not share said muscle memory.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Sun, 19 Jan 2025 06:36:02 GMT) Full text and rfc822 format available.

Message #76 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: pipcet <at> protonmail.com, 75581 <at> debbugs.gnu.org, schwab <at> linux-m68k.org,
 rms <at> gnu.org, gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93; Crash when copy-sequence on clear-string'ed
 multibyte strings with text properties
Date: Sun, 19 Jan 2025 08:35:30 +0200

> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Sat, 18 Jan 2025 19:14:39 -0600
> Cc: Pip Cet <pipcet <at> protonmail.com>, Richard Stallman <rms <at> gnu.org>, 75581 <at> debbugs.gnu.org, 
> 	schwab <at> linux-m68k.org, gudzpoz <at> gmail.com
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > If we were developing Emacs from scratch today, we would probably use
> > something simpler.  But we've had those macros for so long, and for
> > each kind of Lisp object in Emacs, that removing them now would need
> > to have a very good reason.
> 
> FTR, I agree that most of these macros should be removed.  It would be a
> useful simplification of our code.
> 
> This is not a new idea:
> https://lists.gnu.org/r/emacs-devel/2024-02/msg00806.html
> https://lists.gnu.org/r/emacs-devel/2024-03/msg00064.html
> 
> The counter-argument is "muscle memory" for old-time Emacs hackers.
> This disregards the complexity and confusion these macros mean for
> everyone that do not share said muscle memory.

This is a known tension for which there's no good solution except
asking the newcomers to get used to those macros.  The advantage of
remembering them is that one can instantly know where to look for some
problem or aspect of how Emacs internals work.  Removing them is a
terrible blow; I'm still recovering from the massive overhaul of
macros in lisp.h done years ago, and have to follow the chain of
macros and inline functions there too frequently, where previously I
just _knew_ how a macro works.  By contrast, newcomers will need to
learn the internals anyway, even if these macros weer removed, it's
just that their learning curve could perhaps be somewhat less steep,
because these macros are not hard to understand and the mental model
of their workings is quite simple.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75581; Package emacs. (Sun, 19 Jan 2025 11:16:02 GMT) Full text and rfc822 format available.

Message #79 received at 75581 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 75581 <at> debbugs.gnu.org, rms <at> gnu.org, schwab <at> linux-m68k.org,
 Stefan Kangas <stefankangas <at> gmail.com>, Daniel Colascione <dancol <at> dancol.org>,
 gudzpoz <at> gmail.com
Subject: Re: bug#75581: 30.0.93;
 Crash when copy-sequence on clear-string'ed multibyte strings with
 text properties
Date: Sun, 19 Jan 2025 11:14:46 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> From: Stefan Kangas <stefankangas <at> gmail.com>
>> Date: Sat, 18 Jan 2025 19:14:39 -0600
>> Cc: Pip Cet <pipcet <at> protonmail.com>, Richard Stallman <rms <at> gnu.org>, 75581 <at> debbugs.gnu.org,
>> 	schwab <at> linux-m68k.org, gudzpoz <at> gmail.com
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> > If we were developing Emacs from scratch today, we would probably use
>> > something simpler.  But we've had those macros for so long, and for
>> > each kind of Lisp object in Emacs, that removing them now would need
>> > to have a very good reason.
>>
>> FTR, I agree that most of these macros should be removed.  It would be a
>> useful simplification of our code.
>>
>> This is not a new idea:
>> https://lists.gnu.org/r/emacs-devel/2024-02/msg00806.html
>> https://lists.gnu.org/r/emacs-devel/2024-03/msg00064.html
>>
>> The counter-argument is "muscle memory" for old-time Emacs hackers.
>> This disregards the complexity and confusion these macros mean for
>> everyone that do not share said muscle memory.
>
> This is a known tension for which there's no good solution except
> asking the newcomers to get used to those macros.  The advantage of
> remembering them is that one can instantly know where to look for some
> problem or aspect of how Emacs internals work.  Removing them is a
> terrible blow; I'm still recovering from the massive overhaul of
> macros in lisp.h done years ago, and have to follow the chain of
> macros and inline functions there too frequently, where previously I
> just _knew_ how a macro works.  By contrast, newcomers will need to
> learn the internals anyway, even if these macros weer removed, it's
> just that their learning curve could perhaps be somewhat less steep,
> because these macros are not hard to understand and the mental model
> of their workings is quite simple.

If there is more than a tiny advantage to removing the macros for new
developers, we should do it: otherwise, we're giving the impression that
the future of Emacs isn't long or significant enough for minor
improvements to be worth it.  Even if we aren't actually convinced this
is true, working on a project is best done in a state of unreasonable
optimism.

I don't think replacing XSETCONS(x,c) by "x = make_lisp_ptr (c,
Lisp_Cons)" is worth it.  Both fail to be type safe and simply convert
any pointer to look like a Lisp_Cons (if you're lucky, and
--enable-checking is in use; if you're not, and you pass an unaligned
char *, you'll get an object with a random Lisp tag; worst case, a
fixnum tag, which would produce a valid but incorrect Lisp_Object).

Who'd ever pass an unaligned char * to XSETCONS, you ask?  alloc.c does
precisely that.  The code depends on !USE_LSB_TAG, so this isn't a bug,
but it's not obvious (to me) that this code will cause an assertion
violation if we run it while USE_LSB_TAG is in use:

  if (val && type != MEM_TYPE_NON_LISP)
    {
      Lisp_Object tem;
      XSETCONS (tem, (char *) val + nbytes - 1);
      if ((char *) XCONS (tem) != (char *) val + nbytes - 1)
	{
	  lisp_malloc_loser = val;
	  free (val);
	  val = 0;
	}
    }

I'd be in favor of:

Lisp_Object lisp_cons (struct Lisp_Cons *)
Lisp_Object lisp_vectorlike (union vectorlike_header *)

While the individual changes (use lowercase; don't modify lval
arguments; drop the "make_" because nothing is made; type safety) aren't
worth it individually, all four of them combined constitute a
significant advantage.

(Maybe we could find a better prefix than lisp_ to indicate that we're
turning a C object into a Lisp_Object)

There is one feature of this that some might consider a problem: type
safety means you cannot pass a "const" pointer to make_lisp_ptr or the
new functions.  This would mean we might have to simplify pdumper.c to
avoid some "const" attributes, and the corresponding casts we use to
remove the attributes.

I think that's *good* because, once no-purespace lands, there are no
more "read-only" Lisp_Objects: there's no flag to say that a cons or
string cannot be written to.  There are plenty of objects you shouldn't
modify, of course, but we never throw an error because we think we
detected such a case. (Two exceptions: hash table mutability and trapped
symbol writes).

Pip

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 16 Feb 2025 12:24:13 GMT) Full text and rfc822 format available.

This bug report was last modified 184 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #75581 30.0.93; Crash when copy-sequence on clear-string'ed multibyte strings with text properties

GNU bug report logs - #75581
30.0.93; Crash when copy-sequence on clear-string'ed multibyte strings with text properties