GNU bug report logs - #76327
29.4; random segfaults after switch to tree-sitter


Package: emacs;

Reported by: Evgeniy Dushistov <dushistov <at> mail.ru>

Date: Sun, 16 Feb 2025 08:47:01 UTC

Severity: normal

Found in version 29.4


From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 76327 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, Evgeniy Dushistov <dushistov <at> mail.ru>, Mattias Engdegård <mattiasengdegard <at> gmail.com>
Subject: bug#76327: 29.4; random segfaults after switch to tree-sitter
Date: Tue, 18 Feb 2025 17:44:15 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Tue, 18 Feb 2025 19:20:44 +0300
>> From: Evgeniy Dushistov <dushistov <at> mail.ru>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, 76327 <at> debbugs.gnu.org
>>
>> On Tue, Feb 18, 2025 at 03:12:41PM +0000, Pip Cet wrote:
>> > Can we find out precisely which compiler is in use?  Maybe it would even
>> > help to get hold of the emacs binary...
>> >
>>
>> The compiler (gcc) and the emacs binary are from Arch Linux, all the
>> latest available versions.
>>
>> I rebuilt emacs several times while trying to debug this,
>> but it all started with crashes of the official emacs binary.
>>
>> Here is initial bug report in arch linux issue tracker:
>>
>> https://gitlab.archlinux.org/archlinux/packaging/packages/emacs/-/issues/6
>
> This is all very strange.  So much so that I'm inclined to suspect
> some hardware problem on that system.  Did you try building and using
> the same versions on a different box?

I'm pretty sure it's a GCC issue.  Here's the output of 'disass
mark_threads' for the Arch Linux binary:

Dump of assembler code for function mark_threads:
   0x0000555555857740 <+0>:	endbr64
   0x0000555555857744 <+4>:	push   %rbp
   0x0000555555857745 <+5>:	xor    %esi,%esi
   0x0000555555857747 <+7>:	lea    0xf2(%rip),%rdi        # 0x555555857840
   0x000055555585774e <+14>:	mov    %rsp,%rbp
   0x0000555555857751 <+17>:	push   %r15
   0x0000555555857753 <+19>:	push   %r14
   0x0000555555857755 <+21>:	push   %r13
   0x0000555555857757 <+23>:	push   %r12
   0x0000555555857759 <+25>:	push   %rbx
   0x000055555585775a <+26>:	pop    %rbx
   0x000055555585775b <+27>:	pop    %r12
   0x000055555585775d <+29>:	pop    %r13
   0x000055555585775f <+31>:	pop    %r14
   0x0000555555857761 <+33>:	pop    %r15
   0x0000555555857763 <+35>:	pop    %rbp
   0x0000555555857764 <+36>:	jmp    0x555555791410 <flush_stack_call_func1>
   0x0000555555857769 <+41>:	nop

Here's what it should look like:

Dump of assembler code for function mark_threads:
   0x000000000087c0d0 <+0>:	endbr64
   0x000000000087c0d4 <+4>:	push   %rbp
   0x000000000087c0d5 <+5>:	xor    %esi,%esi
   0x000000000087c0d7 <+7>:	lea    -0x28fe(%rip),%rdi        # 0x8797e0 <mark_threads_callback>
   0x000000000087c0de <+14>:	mov    %rsp,%rbp
   0x000000000087c0e1 <+17>:	push   %r15
   0x000000000087c0e3 <+19>:	push   %r14
   0x000000000087c0e5 <+21>:	push   %r13
   0x000000000087c0e7 <+23>:	push   %r12
   0x000000000087c0e9 <+25>:	push   %rbx
   0x000000000087c0ea <+26>:	sub    $0x8,%rsp
   0x000000000087c0ee <+30>:	call   0x691560 <flush_stack_call_func1>
   0x000000000087c0f3 <+35>:	add    $0x8,%rsp
   0x000000000087c0f7 <+39>:	pop    %rbx
   0x000000000087c0f8 <+40>:	pop    %r12
   0x000000000087c0fa <+42>:	pop    %r13
   0x000000000087c0fc <+44>:	pop    %r14
   0x000000000087c0fe <+46>:	pop    %r15
   0x000000000087c100 <+48>:	pop    %rbp
   0x000000000087c101 <+49>:	ret
End of assembler dump.

The difference is crucial: the broken version pushes the call-saved
registers, pops them again immediately afterwards, and then tail-calls
flush_stack_call_func1 with a jmp, so the saved values are no longer on
the stack and can be overwritten in flush_stack_call_func1.  The correct
version keeps the call-saved registers on the stack (note the call
instead of the jmp) while flush_stack_call_func1 runs.
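
Why the pushes have to survive the call: assuming, as in Emacs's conservative stack marking, that flush_stack_call_func1 leads to code treating every word in the live stack region as a possible pointer, a value that has already been popped is simply not there to be seen. The toy sketch below models only that scanning step over a plain array. The struct and function names are hypothetical, and this is not Emacs's actual mark_memory; a slot holding a pointer keeps the object reachable only while it lies inside the scanned range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical heap object that the collector keeps alive if any
   scanned word looks like a pointer into it.  */
struct heap_block
{
  uintptr_t lo;   /* address of the first byte */
  uintptr_t hi;   /* one past the last byte */
};

/* Conservatively scan the words in [START, END): any word whose value
   falls inside BLOCK is treated as a reference to it.  A toy model of
   stack marking, not Emacs's mark_memory.  */
static bool
region_references_block (const void *start, const void *end,
                         const struct heap_block *block)
{
  const unsigned char *p = start;
  const unsigned char *e = end;
  assert (p <= e && block->lo <= block->hi);

  while ((size_t) (e - p) >= sizeof (uintptr_t))
    {
      uintptr_t word;
      /* memcpy sidesteps alignment and aliasing problems.  */
      memcpy (&word, p, sizeof word);
      if (block->lo <= word && word < block->hi)
        return true;
      p += sizeof (uintptr_t);
    }
  return false;
}
```

In the test below, a "saved register" slot is found while it is inside the range, and missed once the range no longer covers it. That is the situation after the pops in the broken dump.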

We need to find out which of the (many) unusual compiler options
causes this miscompilation, and how to avoid it.
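
To narrow down the culprit option, a small standalone file with the same shape as mark_threads might help. Compile it to assembly with and without each suspect option (for example "gcc -O2 -S repro.c", then adding the Arch build flags one at a time). Then check whether mark_threads_like reaches flush_stack_call_func1 through a call with the pushes still in place, as in the good dump, or through a jmp after the pops, as in the broken one. This is a sketch under assumptions. The names are hypothetical stand-ins for the Emacs functions, and the file is not guaranteed to reproduce the miscompilation outside Emacs's own build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mark_threads_callback: counts invocations.  */
int callback_calls;

void
count_call (void *arg)
{
  (void) arg;
  callback_calls++;
}

/* Hypothetical stand-in for Emacs's flush_stack_call_func1.  Keeping it
   out of line forces a real call (or tail jmp) in the caller; for the
   closest match to Emacs, move it to a separate file.  */
__attribute__ ((noinline)) void
flush_stack_call_func1 (void (*func) (void *arg), void *arg)
{
  assert (func != NULL);
  func (arg);
}

/* Same shape as mark_threads with the original inline
   flush_stack_call_func: spill the call-saved registers, then call.
   Inspect the generated assembly for this function.  */
__attribute__ ((noinline)) void
mark_threads_like (void)
{
  __builtin_unwind_init ();
  flush_stack_call_func1 (count_call, NULL);
}
```

Running it only confirms that the callback is reached once. The interesting part is the assembly for mark_threads_like.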

As __builtin_unwind_init isn't really documented, I guess GCC is within
its rights to stop implementing it in the way we rely on.

Paul, Mattias, do you agree with this analysis?

Evgeniy, could you try replacing the definition of
flush_stack_call_func in lisp.h with this one, and recompiling?

INLINE void
flush_stack_call_func (void (*func) (void *arg), void *arg)
{
  volatile bool repeat = true;
  while (repeat)
    {
      __builtin_unwind_init ();
      asm volatile ("" : : : "memory");
      flush_stack_call_func1 (func, arg);
      repeat = false;
    }
}

This attempts to force GCC to make sure the call-saved registers are
still live by the time we call flush_stack_call_func1, by making it
believe that it might have to call __builtin_unwind_init again depending
on the value of a volatile bool variable.

The asm statement is probably unnecessary.

I'll try figuring out which compiler option is to blame now.

Pip








GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.