GNU bug report logs -
#75754
styled_format stack usage/GC protection
Previous Next
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Wed, 22 Jan 2025 10:20:01 UTC
Severity: normal
Done: Pip Cet <pipcet <at> protonmail.com>
Bug is archived. No further changes may be made.
Full log
Message #71 received at 75754 <at> debbugs.gnu.org (full text, mbox):
"Eli Zaretskii" <eliz <at> gnu.org> writes:
>> Date: Fri, 24 Jan 2025 22:35:50 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> >> >> > Even if the code modifies the string after GC?
>> >> >>
>> >> >> If the GC free'd the string, but not the SDATA structure, we cannot
>> >> >> regenerate the string to call resize_string_data. So, no, this cannot
>> >> >> happen.
>> >> >>
>> >> >> Note that resizing a live string while you have a live SDATA pointer to
>> >> >> it might result in the SDATA pointer referring to the pre-modification
>> >> >> SDATA, not the current one.
>> >> >
>> >> > Who said anything about resizing?
>> >>
>> >> Non-resizing Faset is boring, works as expected.
>> >
>> > But it will modify the string whose SDATA pointer is different, no?
>>
>> Different how?
>
> Different as in: "pointing to a location different from the data of
> the string that was moved by GC".
I'm sorry, this may sound like you as though I'm being pedantic, but I
need to be sure that we're talking about the same thing here:
The scenario you describe involves a single string object, but two sdata
areas. The single string points to the first sdata area, but then is
modified to point to the second. Modifying the first sdata area after
the second one has been created will have no effect whatsoever on the
string.
>> There's only one string in this example, and after the
>> string's SDATA changes, you can access the old SDATA, but it doesn't
>> belong to a string anymore.
>
> That's what I thought. So code that expects the old SDATA pointer to
> be pointing to the same string after GC will have a bug on the igc
feature/igc: GC doesn't invalidate SDATA pointers. Resizing string
modification does (the other way is to call pin_string).
> branch. It's a different bug than under the old GC, but still a bug.
> Right?
If there's modification involved, it's a bug which will cause apparently
random behavior but no crashes. If there's no modification, it's a bug
only because the code might be used with alloc.c GC.
>> > Lisp_Object my_string;
>> > char *p = SDATA (my_string);
>> > /* trigger GC */
>> > Faset (my_string, make_fixnum (0), make_fixnum (97));
>> > /* code which expects p[0] to be 'a' */
>>
>> If that is all the code, this is safe with feature/igc.
>
> You say "safe", but it sounds like we have different definitions of
> "safe". Because later you say:
I meant "won't crash, but most likely useless".
>> MPS GC will never make pointers that live on the stack invalid. It's
>> transparent to correct client programs. So, yes, if you're asking about
>> MPS GC, the sequence is safe and doesn't crash, but may surprise you by
>> reproducing the old string data.
>
> Which seems to mean your "safe" means the code will not crash, whereas
> what I meant is the correctness of the code. Code which refers to the
> old string data is incorrect, even if it won't crash.
I believe that's true.
> So my conclusion is that even with MPS, referring in code after GC to
> an SDATA pointer taken before GC leads to incorrect code. IOW,
> pointers to SDATA of a string cannot be relied upon to be valid
> (i.e. pointing to the correct string) across GC, so the code must
> re-initialize such pointer after any fragment that could GC. This was
> the rule with the old GC, and it should continue to be the rule with
> MPS.
No. GC doesn't invalidate SDATA pointers. Only string resizing does
(but that's a technical detail: code should assume all string
modification may resize the string, and always reload any SDATA pointers
they might have).
The only reason such code must be buggy is because it will fail to work
with alloc.c.
In some (rare) cases, we're fine with the undefined behavior: if we call
out to Lisp code which unexpectedly modifies a string we're currently
looking at, we shouldn't crash (this is hard for multibyte strings), but
we shouldn't go out of our way to detect or handle this situation,
either.
(I posted a patch yesterday to fix mapconcat in this case, and it does
precisely that: if we find ourselves in the middle of a multibyte
sequence unexpectedly, that's an error(), not a crash. The patch does
need an additional small change I'd overlooked).
My proposal is to reload the SDATA pointer even in such cases, so it's
more obvious that there is no bug in our code then. If we decide to
establish that rule, we may also want to enforce it.
Pip
This bug report was last modified 162 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.