GNU bug report logs - #76970
31.0.50; master emacs crash with stack overflow

Package: emacs;

Reported by: Eval Exec <execvy <at> gmail.com>

Date: Wed, 12 Mar 2025 02:45:02 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>


Message #38 received at 76970 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com, azeng <at> janestreet.com
Subject: Re: bug#76970: 31.0.50; master emacs crash with stack overflow
Date: Sun, 22 Jun 2025 10:15:24 +0300

> Date: Sun, 22 Jun 2025 06:15:07 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: azeng <at> janestreet.com, 76970 <at> debbugs.gnu.org, app-emacs-dev <at> janestreet.com
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> release_global_lock has released the lock, so any other thread could
> >> have set current_thread to point to its thread structure, or set it to
> >> NULL if the other thread has exited.
> >
> > The variable current_thread is a global variable.  really_call_select,
> > which calls release_global_lock in the backtrace, does this:
> >
> >   static void
> >   really_call_select (void *arg)
> >   {
> >     struct select_args *sa = arg;
> >     struct thread_state *self = current_thread;
> >     sigset_t oldset;
> >
> >     block_interrupt_signal (&oldset);
> >     self->not_holding_lock = 1;
> >     release_global_lock ();
> >
> > If we are to believe the backtrace, SIGPROF was delivered when we were
> > inside release_global_lock (which doesn't touch current_thread,
> > AFAICT).  And the backtrace shows:
> >
> >> #10 0x000000000063af2f in release_global_lock () at thread.c:621
> >>         sa = 0x7ffc6645abd0
> >>         self = 0xc76300 <main_thread>
> >
> > Which tells me that current_thread's value is main_thread, since
> > that's the value of 'self'.  And main_thread is always a valid value.
> >
> > If release_global_lock caused some other thread to run, then that
> > other thread will call post_acquire_global_lock, which never sets
> > current_thread to NULL, it only assigns that variable the value of
> 
> Most likely the other thread continued running, finished, and set
> current_thread to NULL before we got a chance to run the main thread
> again.

This is possible, but we have no evidence that this is what happened.
Moreover, the main thread had not yet returned from
pthread_mutex_unlock when SIGPROF was delivered:

>> #4  0x00000000005c3f27 in backtrace_top_function () at eval.c:4294
>>         pdl = <optimized out>
>> #5  0x000000000063a0da in add_sample (plog=0xcdf060 <cpu>, count=1436) at lisp.h:1192
>> #6  0x0000000000557604 in deliver_process_signal (sig=27, handler=0x63a440 <handle_profiler_signal>) at sysdep.c:1758
>>         old_errno = 11
>>         on_main_thread = true
>> #7  0x00007fbda4812970 in <signal handler called> () at /lib64/libpthread.so.0
>> #8  0x00007fbda481154a in __lll_unlock_wake () at /lib64/libpthread.so.0
>> #9  0x00007fbda480c2e6 in __pthread_mutex_unlock_usercnt () at /lib64/libpthread.so.0
>> #10 0x000000000063af2f in release_global_lock () at thread.c:621

Is the global lock already released at this point?  Are other threads
allowed to run?  What is __lll_unlock_wake about -- doesn't it wake
some other thread?  If it had not yet done so, the other thread
couldn't have been running.

IOW, we need backtraces from all the threads, not just from the main
thread, to draw any definitive conclusions.
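
For illustration only, here is a toy model of the window being discussed. It is a sketch under stated assumptions, not Emacs code: only the names current_thread, self and thread_state are borrowed from the thread, while global_lock, other_thread, sample_current_thread and demo_window are hypothetical. The pthread_join call forces the second thread to run to completion after the unlock, so the outcome is deterministic here; in a real program the interleaving is up to the scheduler, which is exactly what the backtraces from all threads would have to show.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in; not the Emacs definition.  */
struct thread_state { int id; };

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static struct thread_state *volatile current_thread;

/* What a defensive sampler might do: read the global once, so the
   NULL check and the dereference see the same value.  Returns the
   owner's id, or -1 if no thread is current.  */
int
sample_current_thread (void)
{
  struct thread_state *snapshot = current_thread;
  return snapshot ? snapshot->id : -1;
}

/* A second thread that takes the lock, becomes current, and clears
   the global before it exits (the scenario assumed in this sketch).  */
static void *
other_thread (void *arg)
{
  struct thread_state me = { 2 };
  (void) arg;
  pthread_mutex_lock (&global_lock);
  current_thread = &me;
  current_thread = NULL;
  pthread_mutex_unlock (&global_lock);
  return NULL;
}

/* Stores the id seen through the local copy in *self_id and returns
   what the sampler sees once the other thread has come and gone.  */
int
demo_window (int *self_id)
{
  static struct thread_state main_state = { 1 };
  struct thread_state *self;
  pthread_t t;

  pthread_mutex_lock (&global_lock);
  current_thread = &main_state;
  self = current_thread;        /* local copy, taken before the unlock */
  if (pthread_create (&t, NULL, other_thread, NULL) != 0)
    {
      pthread_mutex_unlock (&global_lock);
      return -2;
    }
  pthread_mutex_unlock (&global_lock);  /* the window opens here */
  pthread_join (t, NULL);       /* let the other thread finish */

  *self_id = self->id;          /* the local copy is unchanged */
  return sample_current_thread ();
}
```

In this model the local copy still identifies the main thread while the global no longer does, so a local such as self in a backtrace reflects the value at the time it was read, not necessarily the value at the time a signal arrived.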




This bug report was last modified 27 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.