Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 3 Jan 2025 17:21:02 UTC
Severity: normal
Message #62 received at 75322 <at> debbugs.gnu.org (full text, mbox):
From: Gerd Möllmann <gerd.moellmann <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: Eli Zaretskii <eliz <at> gnu.org>, 75322 <at> debbugs.gnu.org Subject: Re: bug#75322: SAFE_ALLOCA assumed to root Lisp_Objects/SSDATA(string) Date: Sat, 04 Jan 2025 16:34:15 +0100
Pip Cet <pipcet <at> protonmail.com> writes: > "Eli Zaretskii" <eliz <at> gnu.org> writes: > >>> Date: Sat, 04 Jan 2025 11:08:50 +0000 >>> From: Pip Cet <pipcet <at> protonmail.com> >>> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>, 75322 <at> debbugs.gnu.org >>> >>> "Eli Zaretskii" <eliz <at> gnu.org> writes: >>> >>> > The current code in callproc.c assumes that GC cannot run while we are >>> > parked in posix_spawn or vfork. >>> >>> If you're attempting to explain why the current code is safe (if you're >>> saying it is), it assumes much more than that. call_process assumes >>> Fexpand_file_name doesn't GC, for example, which seems unsafe to me: the >>> function calls Lisp, which may do anything, including modifying >>> Vprocess_environment. >> >> AFAICT, expand-file-name is called before we start using the SSDATA of >> strings in the args[] array. Or what did I miss? > > You're right, thanks. I got confused between args and new_argv. > > The next thing I'd look at is the final call to ENCODE_FILE, called > after new_argv is set up; this ends up in encode_coding_object, which > calls safe_funcall in what seems to me to be an unlikely but possible > code path. I assume that's unsafe (the safe_ refers to redisplay, not > GC, IIUC)? > > While maybe_quit is "safe" because it inhibits GC, I believe it can > still call the debugger, which might require more memory than is > available until GC is re-enabled. > >>> Regardless of what you're saying, such assumptions need to be spelled >>> out. Where they are made, that is, not in a utility function. >> >> I'm okay with spelling out these assumptions. > > Excellent. Now we just need to establish what they are :-) > >>> Yes, make_environment_block does say its callers can't run GC, but >>> call_process doesn't indicate when and how it establishes a no-GC >>> assumption. >> >> What would be needed to indicate that? > > I'd prefer a pair of macros (no-ops in regular builds) to comments, but > there is no obvious best solution here. > > My proposal would be to remove most (ideally, all, and then we're done) > no-GC assumptions, and put the few ones that must remain into separate > function bodies for the no-GC / possible-GC cases. Then we can put the > no-GC function bodies into a separate section and prohibit references > from no-GC to possible-GC functions, and have the linker check that. > >>> > Is that assumption false with MPS? >>> >>> As we agreed, code should be written to assume GC can strike at any >>> time. >> >> I don't think we agreed to that. At least I didn't, not in this >> general form. It would be a huge job to make all of our code comply >> with this. > > I said "That's what you should think: GC can strike at any time", and > you said "The same is true with the old GC". I misread that as > agreement. > >>> IIUC, Gerd explained that the old GC can still move the string *data* >>> used in that structure, even if the string metadata stays in place. >> >> If string data is moved, then accessing the old pointer would trigger >> the barrier and MPS will do its thing, not? > > Sorry, but I think I'm confused here. > > IIUC, MPS doesn't currently use barriers on data that is moved (it > could, so the data is copied lazily, but I don't think that's what you > meant), it uses barriers on data that contains references that may not > have been fixed. > > If a pointer to "old" data is ever exposed to Emacs, we lose, because > MPS will reuse the memory for new data, which might be behind a barrier. > > If we ever do: > > static Lisp_Object unmarked; > unmarked = string; > ... trigger GC here ... > puts (SDATA (unmarked); > > the most likely outcome (thanks to Gerd for the hint) is that > nonsensical data is printed, because the string data was reused for > another string; less likely, but possibly, the data is reused for a > different pool (with barriers) and we'd get a SIGSEGV; even less likely, > puts() will call write() directly and we get an EFAULT, and glibc does > whatever glibc does in that case (set errno, I hope). > > Accessing the old pointer is never okay. MPS can't fix it, and doesn't > make catching such errors easy. > >> To clarify, I was trying to understand whether the error message >> reported by Ihor in another thread could have happened because of GC >> in this are of the code. > > I currently think that Ihor's test case calls execve () with nonsensical > "environment" strings a lot, and once in a while they'll even be behind > the barrier which causes an EFAULT. GNU/Linux seems relatively > forgiving about strings without "=" in the envp, for whatever reason. > > Pip I'm entering a state of confusion again, as usual in this discussion. Can we _pretty please_ just do the xstrdup thing, and forget about it?
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.