Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 3 Jan 2025 17:21:02 UTC
Severity: normal
View this message in rfc822 format
From: Eli Zaretskii <eliz <at> gnu.org> To: Pip Cet <pipcet <at> protonmail.com> Cc: gerd.moellmann <at> gmail.com, 75322 <at> debbugs.gnu.org Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Sat, 04 Jan 2025 22:31:48 +0200
> Date: Sat, 04 Jan 2025 19:32:01 +0000 > From: Pip Cet <pipcet <at> protonmail.com> > Cc: gerd.moellmann <at> gmail.com, 75322 <at> debbugs.gnu.org > > > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one > > such call: > > > > path = ENCODE_FILE (path); > > > > We could simply move this to before the loop that sets up new_argv[]. > > Fixing the current code for this super-rare case would be good, but my > point was that we cannot prove the current code to be correct; it isn't, > technically speaking, even though it's battle-worn. I want to be pragmatic and solve practical problems in my life time. Proving that the code is correct I leave to the next generation. > >> >> Yes, make_environment_block does say its callers can't run GC, but > >> >> call_process doesn't indicate when and how it establishes a no-GC > >> >> assumption. > >> > > >> > What would be needed to indicate that? > >> > >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but > >> there is no obvious best solution here. > > > > Sorry, I don't understand: why macros? Do we use something like that > > anywhere else in Emacs? > > How is it different from other easserts? emacs_spawn has > > eassert (input_blocked_p ()); How do you eassert something that should not happen during some code fragment? If you have something specific in mind, do show the code. > >> My proposal would be to remove most (ideally, all, and then we're done) > >> no-GC assumptions, and put the few ones that must remain into separate > >> function bodies for the no-GC / possible-GC cases. Then we can put the > >> no-GC function bodies into a separate section and prohibit references > >> from no-GC to possible-GC functions, and have the linker check that. > > > > First, the techniques that rely on separate segments are non-portable, > > so not all platforms will support them. More importantly, I'm afraid > > Absolutely, debug builds on the main platforms can, though. But then important code that is #ifdef SOME-PLATFORM cannot be handled like that, and we are back at square one, because users of "non-main platforms" still report bugs and we try to investigate and fix them, so we still need to be able to solve this when separate segments are not available. > > the amount of code where we currently don't expect GC is much more > > than you seem to imagine, so I don't think the above is practical. > > That's why I want to remove most non-GC assumptions first. I gave up > the last time I tried doing this, FWIW, for the reason you describe. > > > In any case, the starting point is to audit all the places where GC > > can happen, and that needs to be done anyway if we want to do it > > thoroughly and proactively (as opposed to only when someone reports a > > bug). > > I don't see how putting a few macros in while we're there could hurt :-) We neither saw any macros yet nor have any idea how many of them will be needed. So I don't know why you think it's just a few macros. We are still discussing a tiny static function and haven't arrived at an agreed and safe solution. Let's be a bit more practical here, okay? > > The safe thing is to re-initialize the pointer from the string > > whenever we think GC could happen, and otherwise make sure GC cannot > > happen. > > For the old GC, yes. For the new GC, the safe thing would be to ensure > MPS has removed all memory barriers, take the arena lock, then call the > syscall and return. Or use xstrdup. If this is indeed so (and I don't think it is), then we need to discuss it very thoroughly, because it basically means we cannot do anything with Lisp strings in C. For example, the display iterator has a member that is a Lisp string, which is used when we produce glyphs for display from a string (such as a display property or an overlay before/after string) -- what you say here basically means that we cannot carry that string around, right? If not, how is that different? > >> If a pointer to "old" data is ever exposed to Emacs, we lose, because > >> MPS will reuse the memory for new data, which might be behind a barrier. > >> > >> If we ever do: > >> > >> static Lisp_Object unmarked; > ^^^^^^ > >> unmarked = string; > >> ... trigger GC here ... > >> puts (SDATA (unmarked); > >> > >> the most likely outcome (thanks to Gerd for the hint) is that > >> nonsensical data is printed > > > > Are you sure? > > With the static keyword, yes. (Assuming we don't staticpro the static > variable, of course). What does static have to do with this? What matters is the value, not the storage. The value comes from 'string', a different variable. It points to string data, and it's that string data that is of interest to us in the above snippet. > > The below is indeed unsafe: > > > > char *p = SDATA (unmarked); > > ... trigger GC here ... > > puts (p); > > Right now, that's safe for MPS, but not for the old GC, correct? If GC moves string data, then it is not safe, period. Does MPS move string data? > >> > To clarify, I was trying to understand whether the error message > >> > reported by Ihor in another thread could have happened because of GC > >> > in this are of the code. > >> > >> I currently think that Ihor's test case calls execve () with nonsensical > >> "environment" strings a lot, and once in a while they'll even be behind > >> the barrier which causes an EFAULT. > > > > Before we agree that this could be the reason, we'd need to find the > > places where GC could reuse the memory of a live string, while that > > string appears in some live data structure, and as long as no GC can > > happen between the time we put the SDATA pointers into the environment > > array and the time posix_spawn returns. > > Calling igc_collect () right before the spawn results in corrupt data: But the code doesn't call igc_collect, so how is this relevant? Please see how I described the situation in the paragraph above. > So the only mystery left is what causes GC to be called on current > scratch/igc after the environment block has been created; That's the _only_ mystery, and only if that in fact happens. Someone should explain how GC _can_ happen in that case on the igc branch. If you can, please do, and let's stop arguing about theoretical possibilities. > My current theory is that igc_on_grow_specpdl (called indirectly from > here: > > /* Do the unwind-protect now, even though the pid is not known, so > that no storage allocation is done in the critical section. > The actual PID will be filled in during the critical section. */ > record_unwind_protect (call_process_cleanup, Fcurrent_buffer ()); > > ) releases the arena, and MPS uses that apparent opportunity to call > ArenaPoll which might do GC things. Evidence, please.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.