GNU bug report logs -
#75322
SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string)
Previous Next
Full log
Message #65 received at 75322 <at> debbugs.gnu.org (full text, mbox):
"Eli Zaretskii" <eliz <at> gnu.org> writes:
>> Date: Sat, 04 Jan 2025 11:08:50 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, 75322 <at> debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> > The current code in callproc.c assumes that GC cannot run while we are
>> > parked in posix_spawn or vfork.
>>
>> If you're attempting to explain why the current code is safe (if you're
>> saying it is), it assumes much more than that. call_process assumes
>> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the
>> function calls Lisp, which may do anything, including modifying
>> Vprocess_environment.
>
> AFAICT, expand-file-name is called before we start using the SSDATA of
> strings in the args[] array. Or what did I miss?
You're right, thanks. I got confused between args and new_argv.
The next thing I'd look at is the final call to ENCODE_FILE, called
after new_argv is set up; this ends up in encode_coding_object, which
calls safe_funcall in what seems to me to be an unlikely but possible
code path. I assume that's unsafe (the safe_ refers to redisplay, not
GC, IIUC)?
While maybe_quit is "safe" because it inhibits GC, I believe it can
still call the debugger, which might require more memory than is
available until GC is re-enabled.
>> Regardless of what you're saying, such assumptions need to be spelled
>> out. Where they are made, that is, not in a utility function.
>
> I'm okay with spelling out these assumptions.
Excellent. Now we just need to establish what they are :-)
>> Yes, make_environment_block does say its callers can't run GC, but
>> call_process doesn't indicate when and how it establishes a no-GC
>> assumption.
>
> What would be needed to indicate that?
I'd prefer a pair of macros (no-ops in regular builds) to comments, but
there is no obvious best solution here.
My proposal would be to remove most (ideally, all, and then we're done)
no-GC assumptions, and put the few ones that must remain into separate
function bodies for the no-GC / possible-GC cases. Then we can put the
no-GC function bodies into a separate section and prohibit references
from no-GC to possible-GC functions, and have the linker check that.
>> > Is that assumption false with MPS?
>>
>> As we agreed, code should be written to assume GC can strike at any
>> time.
>
> I don't think we agreed to that. At least I didn't, not in this
> general form. It would be a huge job to make all of our code comply
> with this.
I said "That's what you should think: GC can strike at any time", and
you said "The same is true with the old GC". I misread that as
agreement.
>> IIUC, Gerd explained that the old GC can still move the string *data*
>> used in that structure, even if the string metadata stays in place.
>
> If string data is moved, then accessing the old pointer would trigger
> the barrier and MPS will do its thing, not?
Sorry, but I think I'm confused here.
IIUC, MPS doesn't currently use barriers on data that is moved (it
could, so the data is copied lazily, but I don't think that's what you
meant), it uses barriers on data that contains references that may not
have been fixed.
If a pointer to "old" data is ever exposed to Emacs, we lose, because
MPS will reuse the memory for new data, which might be behind a barrier.
If we ever do:
static Lisp_Object unmarked;
unmarked = string;
... trigger GC here ...
puts (SDATA (unmarked);
the most likely outcome (thanks to Gerd for the hint) is that
nonsensical data is printed, because the string data was reused for
another string; less likely, but possibly, the data is reused for a
different pool (with barriers) and we'd get a SIGSEGV; even less likely,
puts() will call write() directly and we get an EFAULT, and glibc does
whatever glibc does in that case (set errno, I hope).
Accessing the old pointer is never okay. MPS can't fix it, and doesn't
make catching such errors easy.
> To clarify, I was trying to understand whether the error message
> reported by Ihor in another thread could have happened because of GC
> in this are of the code.
I currently think that Ihor's test case calls execve () with nonsensical
"environment" strings a lot, and once in a while they'll even be behind
the barrier which causes an EFAULT. GNU/Linux seems relatively
forgiving about strings without "=" in the envp, for whatever reason.
Pip
This bug report was last modified 147 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.