#76970 - 31.0.50; master emacs crash with stack overflow

GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow

Package: emacs

Reported by: Eval Exec <execvy <at> gmail.com>

Date: Wed, 12 Mar 2025 02:45:02 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com, azeng <at> janestreet.com
Subject: bug#76970: 31.0.50; master emacs crash with stack overflow
Date: Sat, 21 Jun 2025 13:44:25 +0300

> Date: Sat, 21 Jun 2025 09:38:41 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Aaron Zeng <azeng <at> janestreet.com>, 76970 <at> debbugs.gnu.org,
>  app-emacs-dev <at> janestreet.com
>
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>
> >> (gdb) bt full
> >> #0  0x00000000005564f7 in stack_overflow (siginfo=0xcbeb30 <sigsegv_stack+62896>) at sysdep.c:1902
> >>         addr = 0x70 <error: Cannot access memory at address 0x70>
> >>         bot = <optimized out>
> >>         top = <optimized out>
> >>         fatal = false
> >> #1  0x00000000005564f7 in handle_sigsegv (sig=11, siginfo=0xcbeb30 <sigsegv_stack+62896>, arg=<optimized out>) at sysdep.c:1937
> >>         fatal = false
> >> #2  0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
> >> #3  0x00000000005c3f27 in backtrace_top () at eval.c:4294
> >>         pdl = <optimized out>
> >>         pdl = <optimized out>
>
> The segfault is the signal delivered while we were in frame #3, with
> signal number 11.
>
> >> #4  0x00000000005c3f27 in backtrace_top_function () at eval.c:4294
> >>         pdl = <optimized out>
> >> #5  0x000000000063a0da in add_sample (plog=0xcdf060 <cpu>, count=1436) at lisp.h:1192
> >> #6  0x0000000000557604 in deliver_process_signal (sig=27, handler=0x63a440 <handle_profiler_signal>) at sysdep.c:1758
> >>         old_errno = 11
> >>         on_main_thread = true
> >> #7  0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
> >> #8  0x00007fbda481154a in __lll_unlock_wake () at /lib64/libpthread.so.0
>
> This is the profiler signal, delivered while we're in frame #8, with
> signal number 27.
>
> >> #9  0x00007fbda480c2e6 in __pthread_mutex_unlock_usercnt () at /lib64/libpthread.so.0
> >> #10 0x000000000063af2f in release_global_lock () at thread.c:621
> >>         sa = 0x7ffc6645abd0
> >>         self = 0xc76300 <main_thread>
> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
> >> #11 0x000000000063af2f in really_call_select (arg=0x7ffc6645abd0) at thread.c:621
> >>         sa = 0x7ffc6645abd0
> >>         self = 0xc76300 <main_thread>
> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
> >
> > This seems to be a different problem?  The segfault is inside
> > release_global_lock, with self = current_thread = &main_thread, which
> > is not NULL?  Or what did I miss?
>
> release_global_lock has released the lock, so any other thread could
> have set current_thread to point to its thread structure, or set it to
> NULL if the other thread has exited.

The variable current_thread is a global variable.  really_call_select,
which calls release_global_lock in the backtrace, does this:

  static void
  really_call_select (void *arg)
  {
    struct select_args *sa = arg;
    struct thread_state *self = current_thread;
    sigset_t oldset;

    block_interrupt_signal (&oldset);
    self->not_holding_lock = 1;
    release_global_lock ();

If we are to believe the backtrace, SIGPROF was delivered when we were
inside release_global_lock (which doesn't touch current_thread,
AFAICT).  And the backtrace shows:

> #10 0x000000000063af2f in release_global_lock () at thread.c:621
>         sa = 0x7ffc6645abd0
>         self = 0xc76300 <main_thread>

Which tells me that current_thread's value is main_thread, since
that's the value of 'self'.  And main_thread is always a valid value.

If release_global_lock caused some other thread to run, then that other thread will call post_acquire_global_lock, which never sets current_thread to NULL; it only assigns that variable the value of another thread's self. If there's no other thread (i.e., that other thread exited), then release_global_lock will not switch to any other thread and will not set current_thread to NULL. So please elaborate on how this scenario could cause a segfault.
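
For readers following the argument, the pattern under discussion (a local 'self' copied from the global current_thread before the lock is released) can be sketched in isolation. This is a minimal single-threaded illustration, not Emacs code: every name ending in _demo, the struct snapshot type, and the after_release callback are hypothetical stand-ins, and the "other thread" is modelled by a plain function call that, like the behaviour described above for post_acquire_global_lock, assigns the global a non-NULL thread structure. It only shows how the local and the global relate in C; it says nothing about what actually happened in the reported crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real thread structure. */
struct thread_state { int not_holding_lock; };

/* Hypothetical stand-ins for main_thread, another thread, and the
   global current_thread.  */
struct thread_state main_thread_demo;
struct thread_state other_thread_demo;
struct thread_state *current_thread_demo = &main_thread_demo;

/* What the local copy and the global hold once the "release" is over. */
struct snapshot
{
  struct thread_state *self;          /* local copy taken on entry */
  struct thread_state *global_after;  /* the global, read afterwards */
};

/* Mirrors the shape of the quoted prologue: copy the global into a
   local, mark the thread as not holding the lock, then "release".
   AFTER_RELEASE models whatever runs once the lock is free; it may
   be NULL, meaning nothing else runs.  */
struct snapshot
snapshot_then_release (void (*after_release) (void))
{
  struct thread_state *self = current_thread_demo;
  assert (self != NULL);
  self->not_holding_lock = 1;
  if (after_release)
    after_release ();
  struct snapshot s = { self, current_thread_demo };
  return s;
}

/* Models another thread taking over: it stores its own, non-NULL
   thread structure in the global.  */
void
switch_to_other_demo (void)
{
  current_thread_demo = &other_thread_demo;
}

/* Restore the initial state so the sketch can be run repeatedly. */
void
reset_demo (void)
{
  current_thread_demo = &main_thread_demo;
  main_thread_demo.not_holding_lock = 0;
  other_thread_demo.not_holding_lock = 0;
}
```

Calling snapshot_then_release (NULL) leaves both fields pointing at main_thread_demo, while snapshot_then_release (switch_to_other_demo) leaves 'self' pointing at main_thread_demo and the global pointing at other_thread_demo. In neither case does the global become NULL in this model.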

This bug report was last modified 44 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
