GNU bug report logs - #75754
styled_format stack usage/GC protection

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Wed, 22 Jan 2025 10:20:01 UTC

Severity: normal

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.

Full log


Message #47 received at 75754 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, Michael Albinus <michael.albinus <at> gmx.de>,
 75754 <at> debbugs.gnu.org
Subject: Re: bug#75754: styled_format stack usage/GC protection
Date: Fri, 24 Jan 2025 12:51:22 +0000
Michael Albinus, could you please C-s ert-how in this message and
comment briefly if appropriate? I think adding a new defstruct may help
improve ert in at least three ways, including this one.  Please indicate
whether this would be *potentially* acceptable; if it is, I'll prepare a
separate bug.  If it isn't, I might have to think of something else.

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Thu, 23 Jan 2025 23:58:41 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: eggert <at> cs.ucla.edu, 75754 <at> debbugs.gnu.org
>>
>> 1. Reload format_start and format (and end, and format0) after every
>> call which might have GC'd.  If you think we should do this, please
>> tell me whether lisp_string_width can GC.
>
> It can, if called with the last argument 'true' (because
> find_automatic_composition calls into Lisp).  Currently, we call it
> with 'false', so it cannot.

Is this documented anywhere?

>> More importantly, assuming it doesn't, document this in every
>> function in the call tree starting at lisp_string_width so we don't
>> accidentally change it.
>>
>> 2. memcpy the format string.  Two-liner, more likely to fix the bug for
>> good than (1), wastes more memory (since sa_avail has been negative
>> since we entered the function, this is xmalloc'd memory).
>>
>> 3. replace format by a ptrdiff_t and all instances of *format by
>> SREF (args[0], index).  Faster than 2, but many changes hurting
>> readability.
>
> I prefer (2), I think.  Assuming it indeed fixes the problem.

I'll let you know when I'm (reasonably) certain.

>
>> > +  format_start = SSDATA (args[0]);
>> >    p = buf;
>> >    nchars = 0;
>> >
>> >
>> > should fix it.
>>
>> Just be clear here: that fixes the specific test.  It does not fix the
>> bug.
>
> Can we have a test for "the rest" of the bug, which is not fixed by
> the above?

Thank you very much for suggesting this: if I hadn't been wrong about
the first test case being useful (it was), I'd probably have refused.
While writing this test didn't uncover new bugs, it was an educational
experience.

New test:

(ert-deftest editfns-tests-styled-print ()
   (let* ((gc-me (copy-sequence "foo"))
          (print-unreadable-function
	   (lambda (&rest args)
             (make-string 10 10 t)
             (garbage-collect)
             (make-string 10 10 t)
             (prog1 (make-string 100 ?Ā t)
               (setq gc-me nil)
               (make-string 10 10 t))))
          (str "\"[1] ĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀ ĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀ ĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀĀ\""))
     (should (string=
              (format "%S" (format (copy-sequence "%S %S %S %S") [1]
                                   (symbol-function '+)
                                   (symbol-function '*)
                                   (symbol-function '-)))
              str))))

The "problem" writing this test is that string compaction is very often
idempotent: the only way a string is moved again in a second GC is when
another string which is *older* than the string moved in the first GC is
freed.  That's 'gc-me', which must be live during the first call to
garbage-collect but unreachable for the second call.

While I figured out how to do this for this specific test, it'd be nicer
to just run make check-gc and stress-test the garbage collector in ways
known to have caused problems.

(Brief proposal relevant to several issues: If we want to test GC bugs
more thoroughly, we need a way to tell ert to run them in a special way.
I proposed extending ert with an ert-how defstruct for the make
benchmark target, indicating "how" a test is to be run, and implemented
it, but I don't think I've sent the patch yet.  Using this structure to
create conditions for testing for GC bugs would be possible, and it
would also be an easier way to make crash tests print a message before
they're run.

Of course, this would in no way imply a decision to use this structure,
or ERT, for "make benchmark".  Using it for GC testing or indicating
crash tests would be sufficient reason to add it, IMHO.

While this needs more thought, if it's obviously unacceptable for a
simple reason I haven't thought of, please let me know so I won't waste
time on it.  Of course the final decision will have to be made based on
an actual patch, but if we can have a quick "no" right now I'd
appreciate it.)

>> Here's (2), which seems most doable and, I think, may fix the bug,
>> unless there's a subtlety I missed.
>>
>> diff --git a/src/editfns.c b/src/editfns.c
>> index 4ba356d627c..72cbbb51c40 100644
>> --- a/src/editfns.c
>> +++ b/src/editfns.c
>> @@ -3442,9 +3442,10 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
>>    } *info;
>>
>>    CHECK_STRING (args[0]);
>> -  char *format_start = SSDATA (args[0]);
>>    bool multibyte_format = STRING_MULTIBYTE (args[0]);
>>    ptrdiff_t formatlen = SBYTES (args[0]);
>> +  char *format_start = SAFE_ALLOCA (formatlen);
>> +  memcpy (format_start, SSDATA (args[0]), formatlen);
>>    bool fmt_props = !!string_intervals (args[0]);
>
> Does this assume anything about formatlen, like that it doesn't exceed
> the alloca limit?  If so, should we have an assertion about that?

No.  We're already past the alloca limit at his point so SAFE_ALLOCA
will always call xmalloc here (that's the "excessive stack usage" part
of this bug, which remains unfixed but might be a performance issue in
rare situations involving very many calls to styled_format which don't
GC).

Calling SAFE_ALLOCA, while unsafe for GC (I'd suggest renaming the
macro, but not right now), is safe wrt large allocations.

>> feature/igc avoids all of these problems by keeping SDATA pointers
>> valid, even when the string is resized.
>
> Unless the string is GC'ed, right?

Even if it is GC'd.

The SDATA will stay around while there are live pointers to it.  The
SDATA doesn't keep the string alive, but there's no way to get a
reference to the now-dead Lisp_String structure from the SDATA, so this
won't cause bugs.

Pip





This bug report was last modified 162 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.