Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 3 Jan 2025 17:21:02 UTC
Severity: normal
View this message in rfc822 format
From: Pip Cet <pipcet <at> protonmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: gerd.moellmann <at> gmail.com, 75322 <at> debbugs.gnu.org Subject: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Sat, 04 Jan 2025 15:26:01 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes: >> Date: Sat, 04 Jan 2025 11:08:50 +0000 >> From: Pip Cet <pipcet <at> protonmail.com> >> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, 75322 <at> debbugs.gnu.org >> >> "Eli Zaretskii" <eliz <at> gnu.org> writes: >> >> > The current code in callproc.c assumes that GC cannot run while we are >> > parked in posix_spawn or vfork. >> >> If you're attempting to explain why the current code is safe (if you're >> saying it is), it assumes much more than that. call_process assumes >> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the >> function calls Lisp, which may do anything, including modifying >> Vprocess_environment. > > AFAICT, expand-file-name is called before we start using the SSDATA of > strings in the args[] array. Or what did I miss? You're right, thanks. I got confused between args and new_argv. The next thing I'd look at is the final call to ENCODE_FILE, called after new_argv is set up; this ends up in encode_coding_object, which calls safe_funcall in what seems to me to be an unlikely but possible code path. I assume that's unsafe (the safe_ refers to redisplay, not GC, IIUC)? While maybe_quit is "safe" because it inhibits GC, I believe it can still call the debugger, which might require more memory than is available until GC is re-enabled. >> Regardless of what you're saying, such assumptions need to be spelled >> out. Where they are made, that is, not in a utility function. > > I'm okay with spelling out these assumptions. Excellent. Now we just need to establish what they are :-) >> Yes, make_environment_block does say its callers can't run GC, but >> call_process doesn't indicate when and how it establishes a no-GC >> assumption. > > What would be needed to indicate that? I'd prefer a pair of macros (no-ops in regular builds) to comments, but there is no obvious best solution here. My proposal would be to remove most (ideally, all, and then we're done) no-GC assumptions, and put the few ones that must remain into separate function bodies for the no-GC / possible-GC cases. Then we can put the no-GC function bodies into a separate section and prohibit references from no-GC to possible-GC functions, and have the linker check that. >> > Is that assumption false with MPS? >> >> As we agreed, code should be written to assume GC can strike at any >> time. > > I don't think we agreed to that. At least I didn't, not in this > general form. It would be a huge job to make all of our code comply > with this. I said "That's what you should think: GC can strike at any time", and you said "The same is true with the old GC". I misread that as agreement. >> IIUC, Gerd explained that the old GC can still move the string *data* >> used in that structure, even if the string metadata stays in place. > > If string data is moved, then accessing the old pointer would trigger > the barrier and MPS will do its thing, not? Sorry, but I think I'm confused here. IIUC, MPS doesn't currently use barriers on data that is moved (it could, so the data is copied lazily, but I don't think that's what you meant), it uses barriers on data that contains references that may not have been fixed. If a pointer to "old" data is ever exposed to Emacs, we lose, because MPS will reuse the memory for new data, which might be behind a barrier. If we ever do: static Lisp_Object unmarked; unmarked = string; ... trigger GC here ... puts (SDATA (unmarked); the most likely outcome (thanks to Gerd for the hint) is that nonsensical data is printed, because the string data was reused for another string; less likely, but possibly, the data is reused for a different pool (with barriers) and we'd get a SIGSEGV; even less likely, puts() will call write() directly and we get an EFAULT, and glibc does whatever glibc does in that case (set errno, I hope). Accessing the old pointer is never okay. MPS can't fix it, and doesn't make catching such errors easy. > To clarify, I was trying to understand whether the error message > reported by Ihor in another thread could have happened because of GC > in this are of the code. I currently think that Ihor's test case calls execve () with nonsensical "environment" strings a lot, and once in a while they'll even be behind the barrier which causes an EFAULT. GNU/Linux seems relatively forgiving about strings without "=" in the envp, for whatever reason. Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.