GNU bug report logs -
#76327
29.4; random segfaults after switch to tree-sitter
Message #68 received at 76327 <at> debbugs.gnu.org (full text, mbox):
"Eli Zaretskii" <eliz <at> gnu.org> writes:
>> Date: Tue, 18 Feb 2025 19:20:44 +0300
>> From: Evgeniy Dushistov <dushistov <at> mail.ru>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, 76327 <at> debbugs.gnu.org
>>
>> On Tue, Feb 18, 2025 at 03:12:41PM +0000, Pip Cet wrote:
>> > Can we find out precisely which compiler is in use? Maybe it would even
>> > help to get hold of the emacs binary...
>> >
>>
>> The compiler (gcc) and the emacs binary are from Arch Linux, all the latest
>> available versions.
>>
>> I rebuilt emacs several times while trying to debug this,
>> but it all started with crashes of the official emacs binary.
>>
>> Here is initial bug report in arch linux issue tracker:
>>
>> https://gitlab.archlinux.org/archlinux/packaging/packages/emacs/-/issues/6
>
> This is all very strange. So much so that I'm inclined to suspect
> some hardware problem on that system. Did you try building and using
> the same versions on a different box?
I'm pretty sure it's a GCC issue. Here's "disass mark_threads" from the
Arch Linux binary:
Dump of assembler code for function mark_threads:
0x0000555555857740 <+0>: endbr64
0x0000555555857744 <+4>: push %rbp
0x0000555555857745 <+5>: xor %esi,%esi
0x0000555555857747 <+7>: lea 0xf2(%rip),%rdi # 0x555555857840
0x000055555585774e <+14>: mov %rsp,%rbp
0x0000555555857751 <+17>: push %r15
0x0000555555857753 <+19>: push %r14
0x0000555555857755 <+21>: push %r13
0x0000555555857757 <+23>: push %r12
0x0000555555857759 <+25>: push %rbx
0x000055555585775a <+26>: pop %rbx
0x000055555585775b <+27>: pop %r12
0x000055555585775d <+29>: pop %r13
0x000055555585775f <+31>: pop %r14
0x0000555555857761 <+33>: pop %r15
0x0000555555857763 <+35>: pop %rbp
0x0000555555857764 <+36>: jmp 0x555555791410 <flush_stack_call_func1>
0x0000555555857769 <+41>: nop
Here's what it should look like:
Dump of assembler code for function mark_threads:
0x000000000087c0d0 <+0>: endbr64
0x000000000087c0d4 <+4>: push %rbp
0x000000000087c0d5 <+5>: xor %esi,%esi
0x000000000087c0d7 <+7>: lea -0x28fe(%rip),%rdi # 0x8797e0 <mark_threads_callback>
0x000000000087c0de <+14>: mov %rsp,%rbp
0x000000000087c0e1 <+17>: push %r15
0x000000000087c0e3 <+19>: push %r14
0x000000000087c0e5 <+21>: push %r13
0x000000000087c0e7 <+23>: push %r12
0x000000000087c0e9 <+25>: push %rbx
0x000000000087c0ea <+26>: sub $0x8,%rsp
0x000000000087c0ee <+30>: call 0x691560 <flush_stack_call_func1>
0x000000000087c0f3 <+35>: add $0x8,%rsp
0x000000000087c0f7 <+39>: pop %rbx
0x000000000087c0f8 <+40>: pop %r12
0x000000000087c0fa <+42>: pop %r13
0x000000000087c0fc <+44>: pop %r14
0x000000000087c0fe <+46>: pop %r15
0x000000000087c100 <+48>: pop %rbp
0x000000000087c101 <+49>: ret
End of assembler dump.
The difference is crucial: the broken version pushes the call-saved
registers, pops them again immediately afterwards, and then jumps to
flush_stack_call_func1 (a jmp, not a call), where they are overwritten.
The correct version keeps the call-saved registers on the stack while
calling flush_stack_call_func1.
We need to find which of the (many) unusual compiler options causes this
miscompilation, and how to avoid it.
As __builtin_unwind_init isn't really documented, I guess it's okay for
GCC to have decided no longer to implement it in the appropriate
fashion.
Paul, Mattias, do you agree with this analysis?
Evgeniy, could you try replacing the definition of
flush_stack_call_func in lisp.h with this definition, and recompiling?
INLINE void
flush_stack_call_func (void (*func) (void *arg), void *arg)
{
  volatile bool repeat = true;
  while (repeat)
    {
      __builtin_unwind_init ();
      asm volatile ("" : : : "memory");
      flush_stack_call_func1 (func, arg);
      repeat = false;
    }
}
This attempts to force GCC to make sure the call-saved registers are
still live by the time we call flush_stack_call_func1, by making it
believe that it might have to call __builtin_unwind_init again depending
on the value of a volatile bool variable.
The asm statement is probably unnecessary.
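To look at what the compiler generates for this workaround without a full
Emacs build, a reduced test case along the following lines might help. It
is only a sketch: every demo_* name is a hypothetical stand-in, the
stand-in for flush_stack_call_func1 does none of the work the real function
does, and it assumes a compiler that provides __builtin_unwind_init (GCC
does).

```c
#include <stdbool.h>

/* Hypothetical stand-in for flush_stack_call_func1.  It only calls
   FUNC; it is kept out of line so the call below stays visible in
   the generated assembly.  */
__attribute__ ((noinline)) static void
demo_flush_stack_call_func1 (void (*func) (void *arg), void *arg)
{
  func (arg);
}

/* Same shape as the proposed replacement for flush_stack_call_func:
   the volatile flag makes the compiler assume the loop body might
   run again, so the call is not the last thing the function does.  */
static inline void
demo_flush_stack_call_func (void (*func) (void *arg), void *arg)
{
  volatile bool repeat = true;
  while (repeat)
    {
      __builtin_unwind_init ();
      __asm__ volatile ("" : : : "memory");
      demo_flush_stack_call_func1 (func, arg);
      repeat = false;
    }
}

/* Hypothetical callback standing in for mark_threads_callback;
   it just counts how often it was invoked.  */
static void
demo_callback (void *arg)
{
  ++*(int *) arg;
}

/* Hypothetical caller standing in for mark_threads.  Returns the
   number of times the callback ran.  */
int
demo_mark_threads (void)
{
  int calls = 0;
  demo_flush_stack_call_func (demo_callback, &calls);
  return calls;
}
```

Compiling this with gcc -O2 -S and reading the code for demo_mark_threads
should show whether the pushes stay in place around a real call
instruction, or whether the call has again become a jmp after the pops.
The result will depend on the GCC version and on the other flags in use.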
I'll try figuring out which compiler option is to blame now.
Pip
This bug report was last modified 116 days ago.