#76970 - 31.0.50; master emacs crash with stack overflow

GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow

Package: emacs;

Reported by: Eval Exec <execvy <at> gmail.com>

Date: Wed, 12 Mar 2025 02:45:02 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.


From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com, azeng <at> janestreet.com
Subject: bug#76970: 31.0.50; master emacs crash with stack overflow
Date: Sun, 22 Jun 2025 06:15:07 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sat, 21 Jun 2025 09:38:41 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Aaron Zeng <azeng <at> janestreet.com>, 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>> >> (gdb) bt full
>> >> #0 0x00000000005564f7 in stack_overflow (siginfo=0xcbeb30 <sigsegv_stack+62896>) at sysdep.c:1902
>> >>         addr = 0x70 <error: Cannot access memory at address 0x70>
>> >>         bot = <optimized out>
>> >>         top = <optimized out>
>> >>         fatal = false
>> >> #1 0x00000000005564f7 in handle_sigsegv (sig=11, siginfo=0xcbeb30 <sigsegv_stack+62896>, arg=<optimized out>) at sysdep.c:1937
>> >>         fatal = false
>> >> #2 0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
>> >> #3 0x00000000005c3f27 in backtrace_top () at eval.c:4294
>> >>         pdl = <optimized out>
>> >>         pdl = <optimized out>
>>
>> The segfault is the signal delivered while we were in frame #3, with
>> signal number 11.
>>
>> >> #4 0x00000000005c3f27 in backtrace_top_function () at eval.c:4294
>> >>         pdl = <optimized out>
>> >> #5 0x000000000063a0da in add_sample (plog=0xcdf060 <cpu>, count=1436) at lisp.h:1192
>> >> #6 0x0000000000557604 in deliver_process_signal (sig=27, handler=0x63a440 <handle_profiler_signal>) at sysdep.c:1758
>> >>         old_errno = 11
>> >>         on_main_thread = true
>> >> #7 0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
>> >> #8 0x00007fbda481154a in __lll_unlock_wake () at /lib64/libpthread.so.0
>>
>> This is the profiler signal, delivered while we're in frame #8, with
>> signal number 27.
>>
>> >> #9 0x00007fbda480c2e6 in __pthread_mutex_unlock_usercnt () at /lib64/libpthread.so.0
>> >> #10 0x000000000063af2f in release_global_lock () at thread.c:621
>> >>         sa = 0x7ffc6645abd0
>> >>         self = 0xc76300 <main_thread>
>> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
>> >> #11 0x000000000063af2f in really_call_select (arg=0x7ffc6645abd0) at thread.c:621
>> >>         sa = 0x7ffc6645abd0
>> >>         self = 0xc76300 <main_thread>
>> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
>> >
>> > This seems to be a different problem? The segfault is inside
>> > release_global_lock, with self = current_thread = &main_thread, which
>> > is not NULL? Or what did I miss?
>>
>> release_global_lock has released the lock, so any other thread could
>> have set current_thread to point to its thread structure, or set it to
>> NULL if the other thread has exited.
>
> The variable current_thread is a global variable. really_call_select,
> which calls release_global_lock in the backtrace, does this:
>
> static void
> really_call_select (void *arg)
> {
>   struct select_args *sa = arg;
>   struct thread_state *self = current_thread;
>   sigset_t oldset;
>
>   block_interrupt_signal (&oldset);
>   self->not_holding_lock = 1;
>   release_global_lock ();
>
> If we are to believe the backtrace, SIGPROF was delivered when we were
> inside release_global_lock (which doesn't touch current_thread,
> AFAICT). And the backtrace shows:
>
>> #10 0x000000000063af2f in release_global_lock () at thread.c:621
>>         sa = 0x7ffc6645abd0
>>         self = 0xc76300 <main_thread>
>
> Which tells me that current_thread's value is main_thread, since
> that's the value of 'self'. And main_thread is always a valid value.
>
> If release_global_lock caused some other thread to run, then that
> other thread will call post_acquire_global_lock, which never sets
> current_thread to NULL, it only assigns that variable the value of

Most likely the other thread continued running, finished, and set
current_thread to NULL before we got a chance to run the main thread
again. It's very likely we spent some time in release_global_lock
because we were still in that function when SIGPROF, which only happens
once in a while, hit. There may well have been more threads than CPU
cores.

Pip
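
The disagreement above comes down to whether the local 'self' printed by
gdb says anything about the global current_thread at the moment SIGPROF
arrived. The following is only a minimal single-threaded sketch of that
point, not Emacs code: struct thread_state is reduced to one field, and
other_thread_runs_and_exits, struct race_result and simulate_race are
hypothetical names invented for illustration. It assumes, as suggested in
the message, that an exiting thread leaves current_thread set to NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Emacs's struct thread_state; only the
   field used in the quoted snippet is modeled.  */
struct thread_state
{
  int not_holding_lock;
};

static struct thread_state main_thread;
static struct thread_state other_thread;

/* A global, as in the discussion: whoever holds the lock may change it.  */
static struct thread_state *current_thread = &main_thread;

/* Assumption: models another thread that acquires the lock, runs to
   completion, and leaves current_thread as NULL when it exits.  */
static void
other_thread_runs_and_exits (void)
{
  current_thread = &other_thread;
  current_thread = NULL;
}

struct race_result
{
  bool local_self_is_main;  /* what a debugger would print for 'self' */
  bool global_is_null;      /* what a signal handler would read */
};

/* Replays the order of events proposed in the message, sequentially.  */
struct race_result
simulate_race (void)
{
  current_thread = &main_thread;

  /* Snapshot taken before the lock is released.  */
  struct thread_state *self = current_thread;
  assert (self != NULL);
  self->not_holding_lock = 1;

  /* Stands in for the window opened by releasing the global lock.  */
  other_thread_runs_and_exits ();

  struct race_result r = { self == &main_thread, current_thread == NULL };
  return r;
}
```

Calling simulate_race () returns a result in which both fields are true:
the snapshot in 'self' still points at main_thread while the global has
already become NULL, so a handler that reads current_thread at that point
would not see the value the backtrace shows for 'self'.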

This bug report was last modified 44 days ago.

