GNU bug report logs - #75322
SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Fri, 3 Jan 2025 17:21:02 UTC

Severity: normal

Full log


Message #86 received at 75322 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: gerd.moellmann <at> gmail.com, 75322 <at> debbugs.gnu.org
Subject: Re: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
Date: Sat, 04 Jan 2025 22:31:48 +0200
> Date: Sat, 04 Jan 2025 19:32:01 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: gerd.moellmann <at> gmail.com, 75322 <at> debbugs.gnu.org
> 
> > ENCODE_FILE can indeed trigger GC in rare cases, but I see only one
> > such call:
> >
> >   path = ENCODE_FILE (path);
> >
> > We could simply move this to before the loop that sets up new_argv[].
> 
> Fixing the current code for this super-rare case would be good, but my
> point was that we cannot prove the current code to be correct; it isn't,
> technically speaking, even though it's battle-worn.

I want to be pragmatic and solve practical problems in my life time.
Proving that the code is correct I leave to the next generation.

> >> >> Yes, make_environment_block does say its callers can't run GC, but
> >> >> call_process doesn't indicate when and how it establishes a no-GC
> >> >> assumption.
> >> >
> >> > What would be needed to indicate that?
> >>
> >> I'd prefer a pair of macros (no-ops in regular builds) to comments, but
> >> there is no obvious best solution here.
> >
> > Sorry, I don't understand: why macros?  Do we use something like that
> > anywhere else in Emacs?
> 
> How is it different from other easserts?  emacs_spawn has
> 
>   eassert (input_blocked_p ());

How do you eassert something that should not happen during some code
fragment?  If you have something specific in mind, do show the code.

> >> My proposal would be to remove most (ideally, all, and then we're done)
> >> no-GC assumptions, and put the few ones that must remain into separate
> >> function bodies for the no-GC / possible-GC cases.  Then we can put the
> >> no-GC function bodies into a separate section and prohibit references
> >> from no-GC to possible-GC functions, and have the linker check that.
> >
> > First, the techniques that rely on separate segments are non-portable,
> > so not all platforms will support them.  More importantly, I'm afraid
> 
> Absolutely, debug builds on the main platforms can, though.

But then important code that is #ifdef SOME-PLATFORM cannot be handled
like that, and we are back at square one, because users of "non-main
platforms" still report bugs and we try to investigate and fix them,
so we still need to be able to solve this when separate segments are
not available.

> > the amount of code where we currently don't expect GC is much more
> > than you seem to imagine, so I don't think the above is practical.
> 
> That's why I want to remove most non-GC assumptions first.  I gave up
> the last time I tried doing this, FWIW, for the reason you describe.
> 
> > In any case, the starting point is to audit all the places where GC
> > can happen, and that needs to be done anyway if we want to do it
> > thoroughly and proactively (as opposed to only when someone reports a
> > bug).
> 
> I don't see how putting a few macros in while we're there could hurt :-)

We neither saw any macros yet nor have any idea how many of them will
be needed.  So I don't know why you think it's just a few macros.  We
are still discussing a tiny static function and haven't arrived at an
agreed and safe solution.  Let's be a bit more practical here, okay?

> > The safe thing is to re-initialize the pointer from the string
> > whenever we think GC could happen, and otherwise make sure GC cannot
> > happen.
> 
> For the old GC, yes.  For the new GC, the safe thing would be to ensure
> MPS has removed all memory barriers, take the arena lock, then call the
> syscall and return.  Or use xstrdup.

If this is indeed so (and I don't think it is), then we need to
discuss it very thoroughly, because it basically means we cannot do
anything with Lisp strings in C.  For example, the display iterator
has a member that is a Lisp string, which is used when we produce
glyphs for display from a string (such as a display property or an
overlay before/after string) -- what you say here basically means that
we cannot carry that string around, right?  If not, how is that
different?

> >> If a pointer to "old" data is ever exposed to Emacs, we lose, because
> >> MPS will reuse the memory for new data, which might be behind a barrier.
> >>
> >> If we ever do:
> >>
> >>    static Lisp_Object unmarked;
>       ^^^^^^
> >>    unmarked = string;
> >>    ... trigger GC here ...
> >>    puts (SDATA (unmarked);
> >>
> >> the most likely outcome (thanks to Gerd for the hint) is that
> >> nonsensical data is printed
> >
> > Are you sure?
> 
> With the static keyword, yes.  (Assuming we don't staticpro the static
> variable, of course).

What does static have to do with this?  What matters is the value, not
the storage.  The value comes from 'string', a different variable.  It
points to string data, and it's that string data that is of interest
to us in the above snippet.

> > The below is indeed unsafe:
> >
> >   char *p = SDATA (unmarked);
> >   ... trigger GC here ...
> >   puts (p);
> 
> Right now, that's safe for MPS, but not for the old GC, correct?

If GC moves string data, then it is not safe, period.  Does MPS move
string data?

> >> > To clarify, I was trying to understand whether the error message
> >> > reported by Ihor in another thread could have happened because of GC
> >> > in this are of the code.
> >>
> >> I currently think that Ihor's test case calls execve () with nonsensical
> >> "environment" strings a lot, and once in a while they'll even be behind
> >> the barrier which causes an EFAULT.
> >
> > Before we agree that this could be the reason, we'd need to find the
> > places where GC could reuse the memory of a live string, while that
> > string appears in some live data structure, and as long as no GC can
> > happen between the time we put the SDATA pointers into the environment
> > array and the time posix_spawn returns.
> 
> Calling igc_collect () right before the spawn results in corrupt data:

But the code doesn't call igc_collect, so how is this relevant?
Please see how I described the situation in the paragraph above.

> So the only mystery left is what causes GC to be called on current
> scratch/igc after the environment block has been created;

That's the _only_ mystery, and only if that in fact happens.  Someone
should explain how GC _can_ happen in that case on the igc branch.  If
you can, please do, and let's stop arguing about theoretical
possibilities.

> My current theory is that igc_on_grow_specpdl (called indirectly from
> here:
> 
>   /* Do the unwind-protect now, even though the pid is not known, so
>      that no storage allocation is done in the critical section.
>      The actual PID will be filled in during the critical section.  */
>   record_unwind_protect (call_process_cleanup, Fcurrent_buffer ());
> 
> ) releases the arena, and MPS uses that apparent opportunity to call
> ArenaPoll which might do GC things.

Evidence, please.




This bug report was last modified 147 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.