#76970 - 31.0.50; master emacs crash with stack overflow

GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow

Package: emacs;

Reported by: Eval Exec <execvy <at> gmail.com>

Date: Wed, 12 Mar 2025 02:45:02 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.

View this message in rfc822 format

From: Pip Cet <pipcet <at> protonmail.com> To: Aaron Zeng <azeng <at> janestreet.com> Cc: Eli Zaretskii <eliz <at> gnu.org>, app-emacs-dev <at> janestreet.com, 76970 <at> debbugs.gnu.org Subject: bug#76970: 31.0.50; master emacs crash with stack overflow Date: Wed, 25 Jun 2025 11:43:34 +0000

"Aaron Zeng" <azeng <at> janestreet.com> writes: > On Tue, Jun 24, 2025 at 1:46 AM Pip Cet <pipcet <at> protonmail.com> wrote: >> >> The original fix was incomplete, please try applying this one, too: >> >> diff --git a/src/eval.c b/src/eval.c >> index 46705dc4543..20782639990 100644 >> --- a/src/eval.c >> +++ b/src/eval.c >> @@ -159,7 +159,11 @@ set_backtrace_debug_on_exit (union specbinding *pdl, bool doe) >> >> bool >> backtrace_p (union specbinding *pdl) >> -{ return specpdl ? pdl >= specpdl : false; } >> +{ >> + if (current_thread && specpdl && pdl) >> + return pdl >= specpdl; >> + return false; >> +} >> >> static bool >> backtrace_thread_p (struct thread_state *tstate, union specbinding *pdl) >> >> > This is from Emacs compiled at revision 991d3ad80a37a1cf8951d2607eb5f7544f968e93. >> > >> > It seems it is possible for current_thread to be set to NULL by the dying thread, >> > during backtrace_top. >> >> It is possible, in theory, for that to happen, and this additional race >> condition also should be fixed, but it's not what happened here. What >> happened here was that current_thread became NULL, we checked it in >> backtrace_top, returned a NULL pointer for the pdl, passed that NULL >> pointer to backtrace_p, dereferenced current_thread again (by evaluating >> specpdl) and caused a segfault. > > Doesn't this new code have the same issue, though? Each time specpdl appears > in the code, it's dereferencing current_thread anew. That is an additional issue, but it's not what happened here. But yes, we should fix that, too, if we can. Note that current_thread isn't volatile, so the compiler is free not to dereference it every time; this means that optimization may hide the race condition you're worried about. > Maybe I am being overly paranoid, but it seems like the only correct > way to avoid this race condition altogether is to read current_thread > only once during the course of the signal handler, and to avoid using > the macros which may read it a second time, instead dereferencing a > local copy of the pointer once it has been verified to be not-NULL. It's not just current_thread that may become invalid, the specpdl itself may also be concurrently modified (or grow!), and depending on your precise CPU architecture that may result in nonsensical results: in fact, the union member which is in use for a specpdl entry might change, and this would almost always result in a problem, I think. However, concurrent modifications are much less likely than seeing a NULL pointer in the rather long window between the termination of one thread's function and the point when another thread has grabbed the lock and set current_thread to its own thread state. I'm not saying we shouldn't worry about them, but if we can avoid the latter for now and add a comment explaining the former problem that would be good, I think. Ultimately, it may be best to wait for the next maybe_quit so we can get a consistent view of the specpdl, but that would distort the backtrace, too. Pip

This bug report was last modified 44 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #76970 31.0.50; master emacs crash with stack overflow

GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow