GNU bug report logs - #76427
31.0.50; feature/igc: terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=2147483647) at ./src/emacs.c:425

Package: emacs;

Reported by: Gregor Zattler <telegraph <at> gmx.net>

Date: Wed, 19 Feb 2025 23:20:02 UTC

Severity: normal

Found in version 31.0.50

From: Pip Cet <pipcet <at> protonmail.com>
To: 76427 <at> debbugs.gnu.org, Gregor Zattler <telegraph <at> gmx.net>
Subject: bug#76427: 31.0.50; feature/igc: terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=2147483647) at ./src/emacs.c:425
Date: Sat, 22 Feb 2025 16:22:42 +0000
"Gregor Zattler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:

> Dear Emacs developers, while working
> with org-noter and pdf-tools, Emacs
> crashed.

Do you still have the Emacs binary?  I understand you no longer have the
session or the coredump file, but I have a theory...

>         local_getcjmp = {{
>             __jmpbuf = {140736179668459, 140736179668456, 140737488340880, 93824995807319, 140736306686045, 0, 140737098294339, 56448},
>             __mask_was_saved = -14448,
>             __saved_mask = {
>               __val = {0, 56448, 0, 140736179668459, 140736179668456, 140737488340976, 93824995807319, 56448, 8589920320, 140737488340976, 0, 56448, 140737488341056, 140736568339357, 140737049900221, 140737488341248}
>             }
>           }}
>         save_jump = {{
>             __jmpbuf = {140737049900221, 93824995844760, 140737049900216, 140737488341104, 140737049900216, 0, 0, 93824995067072},
                                                           ^^^^^^^^^^^^^^^                   ^^^^^^^^^^^^^^^

This pointer appears twice in save_jump, but it doesn't appear in
local_getcjmp.  IIUC, that means that it's temporarily stored in
main_thread.s.m_getcjmp, but I don't see how the pointers in m_getcjmp
are protected from GC.
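
The observation above can be checked mechanically. This is only a sketch: the array contents are the words quoted from the backtrace, while the array names and the helper count_word are made up for illustration and are not part of Emacs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Words quoted in the backtrace: __jmpbuf followed by __saved_mask.__val
   of local_getcjmp.  The array name is hypothetical. */
static const uint64_t local_getcjmp_words[] = {
    140736179668459ULL, 140736179668456ULL, 140737488340880ULL,
    93824995807319ULL, 140736306686045ULL, 0ULL, 140737098294339ULL,
    56448ULL,
    0ULL, 56448ULL, 0ULL, 140736179668459ULL, 140736179668456ULL,
    140737488340976ULL, 93824995807319ULL, 56448ULL, 8589920320ULL,
    140737488340976ULL, 0ULL, 56448ULL, 140737488341056ULL,
    140736568339357ULL, 140737049900221ULL, 140737488341248ULL
};

/* Words quoted in the backtrace: __jmpbuf of save_jump. */
static const uint64_t save_jump_words[] = {
    140737049900221ULL, 93824995844760ULL, 140737049900216ULL,
    140737488341104ULL, 140737049900216ULL, 0ULL, 0ULL, 93824995067072ULL
};

/* The pointer underlined in the backtrace. */
static const uint64_t suspect = 140737049900216ULL;

/* Count how many of the n words are exactly equal to needle. */
size_t count_word(const uint64_t *words, size_t n, uint64_t needle)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (words[i] == needle)
            hits++;
    return hits;
}

/* Re-check the claim: twice in save_jump, never in local_getcjmp. */
void check_report_values(void)
{
    size_t n_save = sizeof save_jump_words / sizeof save_jump_words[0];
    size_t n_local = sizeof local_getcjmp_words / sizeof local_getcjmp_words[0];
    assert(count_word(save_jump_words, n_save, suspect) == 2);
    assert(count_word(local_getcjmp_words, n_local, suspect) == 0);
}
```

Note that 140737049900221 (the same address plus 5) does show up in both buffers; the helper only matches exact words, so it is not counted.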

We've changed the other jmp_bufs to be allocated using
igc_xzalloc_ambig, but not this one.  (return_to_command_loop should be
similarly protected, as should xterm.c's x_dnd_disconnect_handler.  The
nativecomp code seems fine as it goes through push_handler).

Maybe I'm missing something, but I think that means some callee-saved
registers can be resurrected from main_thread.s.m_getcjmp but weren't
traced in the meantime.

I think that while everything works fine without any inlining,
aggressive inlining may make GCC reuse the stack space that held the
stack copy of the jump buffer, leaving only the heap copy.
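
To make the hazard concrete, here is a minimal sketch of the save/restore pattern as I understand it. It is loosely modelled on the description above, not copied from the Emacs sources: every name ending in _sketch, and roundtrip_demo, is hypothetical. The point is only that after the "restore" step a byte-for-byte copy of the saved registers lives in storage that is neither stack nor register, so a collector that scans only those would not see it.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical stand-in for the per-thread state; the field plays the
   role that main_thread.s.m_getcjmp plays in the report. */
struct thread_state_sketch {
    jmp_buf getcjmp_copy;
};

static struct thread_state_sketch main_thread_sketch;

/* Copy the thread-state jump buffer out into caller-provided storage. */
void save_getcjmp_sketch(jmp_buf dst)
{
    memcpy(dst, main_thread_sketch.getcjmp_copy, sizeof(jmp_buf));
}

/* Copy caller-provided storage into the thread-state jump buffer. */
void restore_getcjmp_sketch(jmp_buf src)
{
    memcpy(main_thread_sketch.getcjmp_copy, src, sizeof(jmp_buf));
}

/* Returns 1 if the non-stack copy matches the freshly filled local
   buffer byte for byte, i.e. it now holds the saved register values. */
int roundtrip_demo(void)
{
    jmp_buf local, saved;
    int same;

    memset(local, 0, sizeof local);
    memset(saved, 0, sizeof saved);

    if (setjmp(local) != 0)
        return -1;                  /* not reached: nothing longjmps here */

    save_getcjmp_sketch(saved);     /* remember the previous contents */
    restore_getcjmp_sketch(local);  /* install the new buffer */

    /* From here on the thread state holds its own copy of whatever
       setjmp stored, independent of the lifetime of 'local'. */
    same = memcmp(main_thread_sketch.getcjmp_copy, local,
                  sizeof(jmp_buf)) == 0;

    restore_getcjmp_sketch(saved);  /* put the previous contents back */
    return same;
}
```

If the compiler later reuses the stack slot of the local buffer, the copy in the thread state is the only place the saved register values remain, which is the situation the theory depends on.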

This problem should affect MPS builds, Emacs 29 builds, and master
builds equally, and it may very well be related to bug#76327, which has
this suspicious error report in its valgrind logs:

==1884847== Invalid read of size 8
==1884847==    at 0x48536DF: memmove (vg_replace_strmem.c:1414)
==1884847==    by 0x34EC86: do_one_unbind (eval.c:3636)
==1884847==    by 0x34EC86: unbind_to (eval.c:3764)
==1884847==    by 0x2C4376: read_char (keyboard.c:2725)
==1884847==    by 0x2C646C: read_key_sequence (keyboard.c:10084)
==1884847==    by 0x2C8525: command_loop_1 (keyboard.c:1384)
==1884847==    by 0x34D785: internal_condition_case (eval.c:1474)
==1884847==    by 0x2B273E: command_loop_2 (keyboard.c:1133)
==1884847==    by 0x34D6D7: internal_catch (eval.c:1197)
==1884847==    by 0x2B26C4: command_loop (keyboard.c:1111)
==1884847==    by 0x2BA460: recursive_edit_1 (keyboard.c:720)
==1884847==    by 0x2BA83C: Frecursive_edit (keyboard.c:803)
==1884847==    by 0x1720E5: main (emacs.c:2521)
==1884847==  Address 0x1ffeffef70 is on thread 1's stack
==1884847==  136 bytes below stack pointer

indicating that read_char restored a jump buffer from stack space that
was no longer reserved.

> +i reg
> rax            0x42                66
> rbx            0x555555606a60      93824992963168
> rcx            0x0                 0
> rdx            0x0                 0
> rsi            0x7fffffff          2147483647
> rdi            0x6                 6
> rbp            0x7fffffffaf80      0x7fffffffaf80
> rsp            0x7fffffffaf78      0x7fffffffaf78
> r8             0x0                 0
> r9             0x73                115
> r10            0x0                 0
> r11            0x202               514
> r12            0x555555671790      93824993400720
> r13            0x7fffe5ddb4b8      140737049900216
                                     ^^^^^^^^^^^^^^^

And here's the pointer again, restored in a callee-saved register.

Gregor, can you please disassemble these functions in your Emacs binary
(please use disass/s if it works, as that makes reading the code
easier):

disass/s read_char
disass/s prepare_menu_bars
disass/s 0x00005555555a4270,0x00005555555a4470
disass/s 0x00005555555a0ada,0x00005555555a0cda

While the bug is tricky and potentially happens only very rarely, and
only with strange GCC optimization/hardening options, the fix is simple
enough, and should be totally harmless even if it turns out to be
unnecessary (Famous Last Words, I know).

Maybe it's a good idea to look at this for Emacs 30.

The one major side effect of the fix is that the thread structure would
no longer require 16-byte alignment on win64, reducing LISP_ALIGNMENT to
8 rather than 16, and saving 4 bytes on average per vector on that
platform.  However, it's possible some other place or module relies on
the larger LISP_ALIGNMENT on this platform...
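
For what it's worth, here is one way to arrive at the "4 bytes on average" figure. This is a back-of-the-envelope sketch under my own assumptions, not a measurement: it assumes object sizes are whole multiples of 8 bytes and that odd and even word counts are equally likely. The helpers round_up and average_saving are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be nonzero). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Average number of bytes saved per object when the alignment drops
   from 16 to 8, taken over object sizes of 1..max_words 8-byte words. */
double average_saving(size_t max_words)
{
    size_t total = 0;
    for (size_t words = 1; words <= max_words; words++) {
        size_t bytes = words * 8;
        /* Odd word counts need 8 bytes of padding at alignment 16 and
           none at alignment 8; even word counts need none either way. */
        total += round_up(bytes, 16) - round_up(bytes, 8);
    }
    return (double) total / (double) max_words;
}
```

Under those assumptions half of the objects lose 8 bytes of padding and the other half lose none, so average_saving(1000) comes out as 4.0.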

Pip





This bug report was last modified 108 days ago.
