GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow


Package: emacs

Reported by: Eval Exec <execvy <at> gmail.com>

Date: Wed, 12 Mar 2025 02:45:02 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com, azeng <at> janestreet.com
Subject: bug#76970: 31.0.50; master emacs crash with stack overflow
Date: Sat, 21 Jun 2025 13:44:25 +0300
> Date: Sat, 21 Jun 2025 09:38:41 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Aaron Zeng <azeng <at> janestreet.com>, 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> >> (gdb) bt full
> >> #0  0x00000000005564f7 in stack_overflow (siginfo=0xcbeb30 <sigsegv_stack+62896>) at sysdep.c:1902
> >>         addr = 0x70 <error: Cannot access memory at address 0x70>
> >>         bot = <optimized out>
> >>         top = <optimized out>
> >>         fatal = false
> >> #1  0x00000000005564f7 in handle_sigsegv (sig=11, siginfo=0xcbeb30 <sigsegv_stack+62896>, arg=<optimized out>) at sysdep.c:1937
> >>         fatal = false
> >> #2  0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
> >> #3  0x00000000005c3f27 in backtrace_top () at eval.c:4294
> >>         pdl = <optimized out>
> >>         pdl = <optimized out>
> 
> The segfault is the signal delivered while we were in frame #3, with
> signal number 11.
> 
> >> #4  0x00000000005c3f27 in backtrace_top_function () at eval.c:4294
> >>         pdl = <optimized out>
> >> #5  0x000000000063a0da in add_sample (plog=0xcdf060 <cpu>, count=1436) at lisp.h:1192
> >> #6  0x0000000000557604 in deliver_process_signal (sig=27, handler=0x63a440 <handle_profiler_signal>) at sysdep.c:1758
> >>         old_errno = 11
> >>         on_main_thread = true
> >> #7  0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
> >> #8  0x00007fbda481154a in __lll_unlock_wake () at /lib64/libpthread.so.0
> 
> This is the profiler signal, delivered while we're in frame #8, with
> signal number 27.
> 
> >> #9  0x00007fbda480c2e6 in __pthread_mutex_unlock_usercnt () at /lib64/libpthread.so.0
> >> #10 0x000000000063af2f in release_global_lock () at thread.c:621
> >>         sa = 0x7ffc6645abd0
> >>         self = 0xc76300 <main_thread>
> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
> >> #11 0x000000000063af2f in really_call_select (arg=0x7ffc6645abd0) at thread.c:621
> >>         sa = 0x7ffc6645abd0
> >>         self = 0xc76300 <main_thread>
> >>         oldset = {__val = {0, 0, 843691369, 843691368, 843691369, 843691368, 0, 837799220, 0, 1, 13385680, 13385744, 0, 0, 13385680, 13385744}}
> >
> > This seems to be a different problem?  The segfault is inside
> > release_global_lock, with self = current_thread = &main_thread, which
> > is not NULL?  Or what did I miss?
> 
> release_global_lock has released the lock, so any other thread could
> have set current_thread to point to its thread structure, or set it to
> NULL if the other thread has exited.

The variable current_thread is a global variable.  really_call_select,
which calls release_global_lock in the backtrace, does this:

  static void
  really_call_select (void *arg)
  {
    struct select_args *sa = arg;
    struct thread_state *self = current_thread;
    sigset_t oldset;

    block_interrupt_signal (&oldset);
    self->not_holding_lock = 1;
    release_global_lock ();

If we are to believe the backtrace, SIGPROF was delivered when we were
inside release_global_lock (which doesn't touch current_thread,
AFAICT).  And the backtrace shows:

> #10 0x000000000063af2f in release_global_lock () at thread.c:621
>         sa = 0x7ffc6645abd0
>         self = 0xc76300 <main_thread>

This tells me that current_thread's value is main_thread, since
that's the value of 'self', and main_thread is always a valid value.
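
To make the relationship between 'self' and current_thread concrete,
here is a minimal C sketch.  It is not Emacs code: every name with a
_demo suffix and both helper functions are hypothetical, and the
"other thread" is simulated by a plain function call so that the
result is deterministic.  It only shows that a local copy taken before
the lock is released keeps its old value whatever is later stored in
the global; it does not show that Emacs ever stores NULL there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for illustration; not the Emacs definition.  */
struct thread_state_demo { int not_holding_lock; };

static struct thread_state_demo main_thread_demo;
static struct thread_state_demo *current_thread_demo = &main_thread_demo;

/* Simulates a store to the global made by some other thread after the
   lock was dropped.  The caller picks the value that gets stored.  */
static void
simulate_other_thread_demo (struct thread_state_demo *new_value)
{
  current_thread_demo = new_value;
}

/* Mirrors the shape of the quoted function: snapshot the global into
   a local, mark the lock as released, then let "another thread" run.
   Returns 1 if the local still equals the global afterwards, else 0.  */
static int
snapshot_matches_global_demo (struct thread_state_demo *new_value)
{
  struct thread_state_demo *self = current_thread_demo; /* snapshot */

  assert (self != NULL);        /* the snapshot itself must be valid */
  self->not_holding_lock = 1;
  simulate_other_thread_demo (new_value); /* window after the release */
  return self == current_thread_demo;
}
```

Under these assumptions, calling snapshot_matches_global_demo with
&main_thread_demo returns 1, and calling it with NULL returns 0 even
though 'self' still points at main_thread_demo.  So the value of
'self' in a backtrace reflects the global at the time of the snapshot;
whether the real global can change in that window is the question
being discussed here.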

If release_global_lock caused some other thread to run, then that
other thread will call post_acquire_global_lock, which never sets
current_thread to NULL; it only assigns that variable the value of
another thread's self.  If there's no other thread (i.e., that other
thread exited), then release_global_lock will not switch to any other
thread and will not set current_thread to NULL.

So please elaborate on how this scenario could cause a segfault.



