GNU bug report logs -
#76970
31.0.50; master emacs crash with stack overflow
Previous Next
Reported by: Eval Exec <execvy <at> gmail.com>
Date: Wed, 12 Mar 2025 02:45:02 UTC
Severity: normal
Found in version 31.0.50
Done: Pip Cet <pipcet <at> protonmail.com>
Full log
View this message in rfc822 format
"Aaron Zeng" <azeng <at> janestreet.com> writes:
> On Tue, Jun 24, 2025 at 1:46 AM Pip Cet <pipcet <at> protonmail.com> wrote:
>>
>> The original fix was incomplete, please try applying this one, too:
>>
>> diff --git a/src/eval.c b/src/eval.c
>> index 46705dc4543..20782639990 100644
>> --- a/src/eval.c
>> +++ b/src/eval.c
>> @@ -159,7 +159,11 @@ set_backtrace_debug_on_exit (union specbinding *pdl, bool doe)
>>
>> bool
>> backtrace_p (union specbinding *pdl)
>> -{ return specpdl ? pdl >= specpdl : false; }
>> +{
>> + if (current_thread && specpdl && pdl)
>> + return pdl >= specpdl;
>> + return false;
>> +}
>>
>> static bool
>> backtrace_thread_p (struct thread_state *tstate, union specbinding *pdl)
>>
>> > This is from Emacs compiled at revision 991d3ad80a37a1cf8951d2607eb5f7544f968e93.
>> >
>> > It seems it is possible for current_thread to be set to NULL by the dying thread,
>> > during backtrace_top.
>>
>> It is possible, in theory, for that to happen, and this additional race
>> condition also should be fixed, but it's not what happened here. What
>> happened here was that current_thread became NULL, we checked it in
>> backtrace_top, returned a NULL pointer for the pdl, passed that NULL
>> pointer to backtrace_p, dereferenced current_thread again (by evaluating
>> specpdl) and caused a segfault.
>
> Doesn't this new code have the same issue, though? Each time specpdl appears
> in the code, it's dereferencing current_thread anew.
That is an additional issue, but it's not what happened here. But yes,
we should fix that, too, if we can.
Note that current_thread isn't volatile, so the compiler is free not to
dereference it every time; this means that optimization may hide the
race condition you're worried about.
> Maybe I am being overly paranoid, but it seems like the only correct
> way to avoid this race condition altogether is to read current_thread
> only once during the course of the signal handler, and to avoid using
> the macros which may read it a second time, instead dereferencing a
> local copy of the pointer once it has been verified to be not-NULL.
It's not just current_thread that may become invalid, the specpdl itself
may also be concurrently modified (or grow!), and depending on your
precise CPU architecture that may result in nonsensical results: in
fact, the union member which is in use for a specpdl entry might change,
and this would almost always result in a problem, I think.
However, concurrent modifications are much less likely than seeing a
NULL pointer in the rather long window between the termination of one
thread's function and the point when another thread has grabbed the lock
and set current_thread to its own thread state. I'm not saying we
shouldn't worry about them, but if we can avoid the latter for now and
add a comment explaining the former problem that would be good, I think.
Ultimately, it may be best to wait for the next maybe_quit so we can get
a consistent view of the specpdl, but that would distort the backtrace,
too.
Pip
This bug report was last modified 27 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.