Package: emacs;
Reported by: Dima Kogan <dima <at> secretsauce.net>
Date: Wed, 17 Jul 2024 20:58:01 UTC
Severity: normal
Found in version 31.0.50
Done: Dima Kogan <dima <at> secretsauce.net>
Bug is archived. No further changes may be made.
Message #11 received at 72165 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 72165 <at> debbugs.gnu.org
Subject: Re: bug#72165: 31.0.50; Intermittent crashing with recent emacs build
Date: Thu, 18 Jul 2024 07:58:54 +0300
> From: Dima Kogan <dima <at> secretsauce.net>
> Date: Wed, 17 Jul 2024 13:56:27 -0700
>
> I'm running a bleeding-edge build of emacs. Using packages from:
>
>   https://emacs.secretsauce.net/
>
> Debian GNU/Linux. GTK+. Currently using a build from git as of
> 2024/07/09 (8e46f44ea0e). It is crashing periodically, with an unclear
> cause.
>
> This isn't a brand-new problem; I observed a similar crash with an earlier
> build: 2024/04/30 (d24981d27ce). After that crash I upgraded, and I see
> crashes still.
>
> Anecdotally, the 2024/04/30 build has been very stable. Today I started
> to debug a different issue: something about mu4e modeline updating is
> signalling args-out-of-range. To debug this I'm tweaking functions like
> (truncate-string-to-width), and re-evaluating them. This debugging isn't
> very interesting, but something about it is causing emacs to crash, with
> both builds.

So when you say that "anecdotally, the 2024/04/30 build has been very
stable", what exactly do you mean? It sounds like both that build and
the one from 2024/07/09 crash in the same way, so why do you consider
the April one "very stable"?

> I just made a core. I cannot xbacktrace because (I think) I'm looking at
> a core, and not at a live process. If that would be helpful, I can
> probably get that. And I see the crash every 20min maybe, while
> debugging the mu4e modeline problem. Below is the backtrace. Hopefully
> this speaks to somebody. Thanks!

Thanks, but please always try to supply the information that explains
the crash, not just the backtrace. (In this case, it's a deliberate
abort, not a crash, but still.) That means: look at the source code
where GDB says the problem happens, and print the values of the
variables involved in the crash.
In this case:

> (gdb) bt full
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
>         tid = <optimized out>
>         ret = 0
>         pd = <optimized out>
>         old_mask = {
>           __val = {0}
>         }
>         ret = <optimized out>
> #1  0x00007fc68a4a6b7f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007fc68a4584e2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
>         ret = <optimized out>
> #3  0x0000561d3dcb9798 in terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40) at ./debian/build-x/src/emacs.c:469
> #4  0x0000561d3dcb9d4e in emacs_abort () at ./debian/build-x/src/sysdep.c:2391
> #5  0x0000561d3dcb6c34 in redisplay_window (window=<optimized out>, just_this_one_p=just_this_one_p@entry=false) at ./debian/build-x/src/xdisp.c:20086

The call to emacs_abort seems to be here:

  /* Some sanity checks.  */
  CHECK_WINDOW_END (w);
  if (Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint))
    emacs_abort ();  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Now, your "bt full" doesn't help to understand what went wrong, because
GDB is unable to find the values of many variables:

>         w = 0x561d6bcb2bc8
>         f = <optimized out>
>         buffer = <optimized out>
>         old = <optimized out>
>         lpoint = {
>           charpos = <optimized out>,
>           bytepos = <optimized out>
>         }
>         opoint = {
>           charpos = <optimized out>,
>           bytepos = <optimized out>
>         }

Still, at least Z and Z_BYTE should be available; what are their
values? And regarding opoint, look back in the code a little way to
where it was defined:

  SET_TEXT_POS (opoint, PT, PT_BYTE);

If you look up the definition of SET_TEXT_POS, you will see:

  /* Set character position of POS to CHARPOS, byte position to BYTEPOS.  */

  #define SET_TEXT_POS(POS, CHARPOS, BYTEPOS) \
    ((POS).charpos = (CHARPOS), (POS).bytepos = BYTEPOS)

which means opoint takes its character position from PT and its byte
position from PT_BYTE. So if you print the values of PT and PT_BYTE, we
will know the ("optimized-out") values of opoint.charpos and
opoint.bytepos, and will probably be able to understand why we aborted.
IOW:

  (gdb) frame 5
  (gdb) print Z
  (gdb) print Z_BYTE
  (gdb) print PT
  (gdb) print PT_BYTE

(The "frame 5" command is to get to the callstack frame where we call
emacs_abort, shown as #5 at the left edge of the backtrace line.)

If GDB says it doesn't know about these variables with up-cased names,
like Z and PT_BYTE, it means your Emacs was built without macro
information (the -g3 compiler option), and you will need to type the
macro definitions instead. For example (from buffer.h):

  #define PT (current_buffer->pt + 0)

So instead of "print PT" you will need to say "print current_buffer->pt".
And similarly with the other variables above.

Next question is: what buffer did Emacs try to display? To answer that,
print the name of the buffer that is current in this place in the code:

  (gdb) print current_buffer->name_
  (gdb) xstring

If GDB says it doesn't know what "xstring" is, type:

  (gdb) source /path/to/emacs/src/.gdbinit

and then repeat the above 2 commands.

Once you know which buffer was being displayed, try to describe the
text that was in it, if you can. (If you cannot, I can give
instructions how to find it out using GDB commands.)

Thanks.
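For readers following along, the abort condition may be easier to see with the macros spelled out. What follows is a minimal C sketch, not Emacs code: `struct fake_buffer`, `struct fake_buffer_text`, the `current_buffer` global here and `opoint_check_fails` are hypothetical stand-ins. The Z and Z_BYTE spellings assume that buffer.h reaches the end-of-text positions through `current_buffer->text`; check your own tree before typing those expansions into GDB. The idea being modeled: when Z equals Z_BYTE the buffer holds as many bytes as characters, so every character is one byte and any position's character and byte offsets must agree. A point whose two offsets differ is therefore inconsistent, and that is the case in which redisplay_window aborts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for Emacs's buffer structures.  */
struct fake_buffer_text
{
  ptrdiff_t z;       /* character position of the end of the text */
  ptrdiff_t z_byte;  /* byte position of the end of the text */
};

struct fake_buffer
{
  struct fake_buffer_text *text;
  ptrdiff_t pt;       /* point, as a character position */
  ptrdiff_t pt_byte;  /* point, as a byte position */
};

struct text_pos
{
  ptrdiff_t charpos;
  ptrdiff_t bytepos;
};

static struct fake_buffer *current_buffer;

/* Spellings modeled on buffer.h (assumed layout for Z and Z_BYTE).
   Without -g3, these right-hand sides are what you print in GDB.  */
#define PT      (current_buffer->pt + 0)
#define PT_BYTE (current_buffer->pt_byte + 0)
#define Z       (current_buffer->text->z + 0)
#define Z_BYTE  (current_buffer->text->z_byte + 0)

/* Same shape as the SET_TEXT_POS quoted above, with BYTEPOS parenthesized.  */
#define SET_TEXT_POS(POS, CHARPOS, BYTEPOS) \
  ((POS).charpos = (CHARPOS), (POS).bytepos = (BYTEPOS))
#define CHARPOS(POS) (POS).charpos
#define BYTEPOS(POS) (POS).bytepos

/* Return true when the sanity check guarding emacs_abort () would fire:
   one byte per character (Z == Z_BYTE), yet point's character and byte
   positions differ.  */
static bool
opoint_check_fails (void)
{
  struct text_pos opoint;
  SET_TEXT_POS (opoint, PT, PT_BYTE);
  return Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint);
}
```

For example, a buffer of 10 one-byte characters has Z == Z_BYTE == 11, so point at character 5 must also be at byte 5; a byte position of 7 there trips the check. In a buffer with multibyte characters Z differs from Z_BYTE and the check never fires, which is why the values of Z, Z_BYTE, PT and PT_BYTE printed in frame 5 are enough to tell which case applies.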
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.