Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Wed, 22 Jan 2025 10:20:01 UTC
Severity: normal
Done: Pip Cet <pipcet <at> protonmail.com>
Bug is archived. No further changes may be made.
Message #68 received at 75754 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Pip Cet <pipcet <at> protonmail.com> Cc: eggert <at> cs.ucla.edu, michael.albinus <at> gmx.de, 75754 <at> debbugs.gnu.org Subject: Re: bug#75754: styled_format stack usage/GC protection Date: Sat, 25 Jan 2025 09:54:58 +0200
> Date: Fri, 24 Jan 2025 22:35:50 +0000 > From: Pip Cet <pipcet <at> protonmail.com> > > >> I think we should eventually strive to call these macros in all but the > >> most obviously correct places where we use SDATA: > >> > >> memcpy (buf, SDATA (str), SBYTES (str)); > >> > >> is perfectly okay, but the current code in call_process is not, and we > >> can (IF we find bugs; most of my experiments don't, and that's okay) > >> decide where to draw the line between those. > > > > I won't have an opinion until I actually see the patches. > > Understood. > > >> My preference would be to store the "no-gc" nesting level in the specpdl > >> directly rather than allocating a new specpdl entry every time we > >> increase it. We definitely do unwind out of no-gc situations, in any > >> case, which is fine. > >> > >> (The alternative would be to call SDATA_FREE on sdata pointers when > >> we're done with them, but that seems excessive to me at this point, > >> particularly since we'd need to unwind that, too.) > > > > You lost me here. > > We could demand that every SDATA call be balanced by an SDATA_FREE call > or a nonlocal exit. > > At this point, I don't think we should, because there are other no-GC > assertions and we've seen where "technically necessary but usually nop" > macros got us with GCPRO. > > I think we'll have to ultimately have to find a compromise, for the old > GC, between reducing no-GC situations and asserting we're in one when we > need to be. If stack references pin SDATA and we scan SAFE_ALLOCA'd > memory conservatively, that would still leave SDATA pointers in > explicitly xmalloc'd memory, and temporarily invalid objects that would > crash the garbage collector if it were to look at them. Maybe more > situations. > > Note these proposals interact non-trivially. > > However, I think we should try hard to keep this as non-intrusive as > possible. ASSUME_NO_GC (); seems okay to me to mark the start of a > no-GC section, but ASSUME_NO_GC (true) balanced by ASSUME_NO_GC (false) > already feels like I'm doing the compiler's work for it, even if an > xsignal non-local exit implies the right numbers of ASSUME_NO_GC (false) > calls: the vast majority of critical sections begin and end in the same > function, and any early return means we've left the critical section. > > However, if we decided to make "return" imply an ASSUME_NO_GC reset, > that, while possible, would probably introduce system dependencies we > might not be able to accept easily. > > I don't know whether most functions that call SDATA have a critical > section which ends before the function returns. You seem to be discussing some details of the ASSUME_NO_GC macro implementation. I cannot follow this because I don't have a clear enough idea of what you have in mind, and you didn't yet describe the main ideas of the implementation. So let's defer this until you do. > >> >> > Even if the code modifies the string after GC? > >> >> > >> >> If the GC free'd the string, but not the SDATA structure, we cannot > >> >> regenerate the string to call resize_string_data. So, no, this cannot > >> >> happen. > >> >> > >> >> Note that resizing a live string while you have a live SDATA pointer to > >> >> it might result in the SDATA pointer referring to the pre-modification > >> >> SDATA, not the current one. > >> > > >> > Who said anything about resizing? > >> > >> Non-resizing Faset is boring, works as expected. > > > > But it will modify the string whose SDATA pointer is different, no? > > Different how? Different as in: "pointing to a location different from the data of the string that was moved by GC". > There's only one string in this example, and after the > string's SDATA changes, you can access the old SDATA, but it doesn't > belong to a string anymore. That's what I thought. So code that expects the old SDATA pointer to be pointing to the same string after GC will have a bug on the igc branch. It's a different bug than under the old GC, but still a bug. Right? > > Lisp_Object my_string; > > char *p = SDATA (my_string); > > /* trigger GC */ > > Faset (my_string, make_fixnum (0), make_fixnum (97)); > > /* code which expects p[0] to be 'a' */ > > If that is all the code, this is safe with feature/igc. You say "safe", but it sounds like we have different definitions of "safe". Because later you say: > MPS GC will never make pointers that live on the stack invalid. It's > transparent to correct client programs. So, yes, if you're asking about > MPS GC, the sequence is safe and doesn't crash, but may surprise you by > reproducing the old string data. Which seems to mean your "safe" means the code will not crash, whereas what I meant is the correctness of the code. Code which refers to the old string data is incorrect, even if it won't crash. So my conclusion is that even with MPS, referring in code after GC to an SDATA pointer taken before GC leads to incorrect code. IOW, pointers to SDATA of a string cannot be relied upon to be valid (i.e. pointing to the correct string) across GC, so the code must re-initialize such pointer after any fragment that could GC. This was the rule with the old GC, and it should continue to be the rule with MPS. > >> > With the old GC, this is a no-no, but you seem to be saying that on > >> > the igc branch it's okay to do that? > >> > >> Note that none of that is guaranteed. We might poison the old SDATA in > >> Faset, and possibly should. I don't think MPS officially allows you to > >> prematurely free an object, so that's probably not going to happen. > > > > Is that a yes? > > It's currently safe. "Safe" as in "won't crash" or "safe" as in "will point to the correct string data"? I think the former, whereas I was talking about the latter. > One scenario is we decide that no useful code > wants to do this, so we deliberately make it crash if we can. Another > scenario is that we decide this is so useful that we add an > SDATA->struct Lisp_String back pointer (alloc.c goes both ways, giving > you a back pointer but moving it around in memory so it'll be invalid > should you ever need it). As both of these scenarios require > significant work and a lot of thought, it's most likely the current > situation persists on feature/igc for a while, and no great loss either > way. That's not the issue that was bothering me.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.