#75322 - SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)

GNU bug report logs - #75322
SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Fri, 3 Jan 2025 17:21:02 UTC

Severity: normal

View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org> To: Daniel Colascione <dancol <at> dancol.org> Cc: gerd.moellmann <at> gmail.com, pipcet <at> protonmail.com, 75322 <at> debbugs.gnu.org Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Mon, 06 Jan 2025 14:59:07 +0200

> Date: Sun, 05 Jan 2025 16:15:47 -0500 > From: Daniel Colascione <dancol <at> dancol.org> > CC: pipcet <at> protonmail.com, 75322 <at> debbugs.gnu.org > > > . an automatic variable > > . a static variable that is protected by someone > > . a global variable that is protected by someone > > . a result of dereferencing a pointer that is somehow protected > > > >etc. etc., where "protected by someone" means that it is a descendant > >of some staticpro, or of some root, or... > > Well, yeah. Every other GC program does this. Emacs can too. There's no great burden: all Lisp objects get traced automatically. Everything on the stack or in a register gets traced automatically, and, because the scanning is conservative, pinned. You only have to take extra steps to tell the GC about something when you're going out of your way to go around the GC. > > It's simply not true that to adopt a modern GC every line of code has to change. I wrote a moving GC for Emacs myself years ago. Worked fine. No rewrite. The opinions in this thread are that changes in the code _are_ needed. Maybe that's not a "rewrite" you had in mind, but if we need to make such substantial changes in many places, that's a "rewrite" in my book. > >And if we cannot prove to ourselves that one of the above happens, > >then we'd need to force a copy of the variable to be on the stack? > > > >Does this sound practical? > > > >If this is the price of using MPS, and I'm not missing something > >obvious, then it sounds like we should run away from MPS, fast. > >Because we will sooner or later have to rewrite every single line of > >code we ever wrote. > > No, you do it by adopting a rule that when a function receives a pointer, the caller guarantees the validity of the pointer for the duration of the call. This way, only the level of the stack that spills the array to the heap has to take on the responsibility of keeping the referenced objects alive, and making the spilled array a pointer to the pinned guts of a Lisp vector is an adequate way to do this. We are talking about code such as this one: SAFE_NALLOCA (args2, 1, nargs + 1); args2[0] = Qcall_process; for (i = 0; i < nargs; i++) args2[i + 1] = args[i]; coding_systems = Ffind_operation_coding_system (nargs + 1, args2); val = CONSP (coding_systems) ? XCDR (coding_systems) : Qnil; "Look, ma: no pointers!" The args[] array is fine: it comes from the caller and is valid. The problem being discussed here is the args2[] array, in the case where SAFE_NALLOCA decides args2[] is too large for the stack and instead malloc's it. In that case, args2[] stores on the heap copies of the objects in args[] (still no pointers!), and the issue is that when GC happens (say, at some point inside Ffind_operation_coding_system), the objects in args[] are updated by GC, but their copies in args2[] are not. So this code needs to be changed. And if you look around, we have quite a lot of these in many places. > "Oh, but won't that kill performance?" That wasn't my primary bother. The primary bother is the need to modify many places. Which is not necessarily a purely mechanical refactoring, either. Consider: static Lisp_Object last_coding; Lisp_Object coding = Vcoding_system_for_write; [...] last_coding = coding; call_some_function (); if (NILP (last_coding)) do_something (); If call_some_function can trigger GC, the value of 'coding' after the call is still valid, but the value of 'last_coding' could be garbage (unless 'last_coding' is staticpro'd). We have almost 200 static Lisp_Object variables, probably not all of them staticpro'd (8 of them inside functions, like the above example, so definitely not staticpro'd). So now we need to examine the uses of all of them and either staticpro them or do something else (like move the assignment to 'last_coding' to after call_some_function). And we will probably find other usage patterns which trip the same wire. And the amazing part is not the above, it''s the fact that with all that "unsafe" code people are using the branch for production, and most of them report a stable Emacs. What does that tell us? that most, if not all, of those places are perfectly valid code, and we are haunted by the proverbial shadow of a dwarf? Or maybe it just makes the job harder, because we now need to identify the places where these bad things can _really_ happen, and fix only them? Or something else? This is what I'm trying to establish. The problem is not theoretical, it's entirely practical: where to steer our efforts of stabilizing the branch and preparing it for landing, given the available resources. > The *existing* code is broken, and you don't see it because we use alloca to allocate on the stack almost all the time. If we allocate on the stack almost all the time, we will keep allocating on the stack almost all the time. But anyway, if you are now saying that the code is broken, then a rewrite, and a significant one at that, _is_ needed, right?

This bug report was last modified 243 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #75322 SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)

GNU bug report logs - #75322
SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)