GNU bug report logs - #72165
31.0.50; Intermittent crashing with recent emacs build

Package: emacs;

Reported by: Dima Kogan <dima <at> secretsauce.net>

Date: Wed, 17 Jul 2024 20:58:01 UTC

Severity: normal

Found in version 31.0.50

Done: Dima Kogan <dima <at> secretsauce.net>

Bug is archived. No further changes may be made.


Message #11 received at 72165 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 72165 <at> debbugs.gnu.org
Subject: Re: bug#72165: 31.0.50; Intermittent crashing with recent emacs build
Date: Thu, 18 Jul 2024 07:58:54 +0300
> From: Dima Kogan <dima <at> secretsauce.net>
> Date: Wed, 17 Jul 2024 13:56:27 -0700
> 
> I'm running a bleeding-edge build of emacs. Using packages from:
> 
>   https://emacs.secretsauce.net/
> 
> Debian GNU/Linux. GTK+. Currently using a build from git as of
> 2024/07/09 (8e46f44ea0e). It is crashing periodically, with an unclear
> cause.
> 
> This isn't a brand-new problem; I observed a similar crash with an earlier
> build: 2024/04/30 (d24981d27ce). After that crash I upgraded, and I see
> crashes still.
> 
> Anecdotally, the 2024/04/30 build has been very stable. Today I started
> to debug a different issue: something about mu4e modeline updating is
> signalling args-out-of-range. To debug this I'm tweaking functions like
> (truncate-string-to-width), and re-evaluating them. This debugging isn't
> very interesting, but something about it is causing emacs to crash, with
> both builds.

So when you say that "anecdotally, the 2024/04/30 build has been very
stable", what exactly do you mean?  It sounds like both that build and
the one from 2024/07/09 crash in the same way, so why do you consider
the April one "very stable"?

> I just made a core. I cannot xbacktrace because (I think) I'm looking at
> a core, and not at a live process. If that would be helpful, I can
> probably get that. And I see the crash every 20min maybe, while
> debugging the mu4e modeline problem. Below is the backtrace. Hopefully
> this speaks to somebody. Thanks!

Thanks, but please always try to supply the information that explains
the crash, not just the backtrace.  (In this case, it's a deliberate
abort, not a crash, but still.)  That means look at the source code
where GDB says the problem happens and print the values of the
variables involved in the crash.  In this case:

>   (gdb) bt full
>   #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo <at> entry=6, no_tid=no_tid <at> entry=0) at ./nptl/pthread_kill.c:44
>           tid = <optimized out>
>           ret = 0
>           pd = <optimized out>
>           old_mask = {
>             __val = {0}
>           }
>           ret = <optimized out>
>   #1  0x00007fc68a4a6b7f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
>   #2  0x00007fc68a4584e2 in __GI_raise (sig=sig <at> entry=6) at ../sysdeps/posix/raise.c:26
>           ret = <optimized out>
>   #3  0x0000561d3dcb9798 in terminate_due_to_signal (sig=sig <at> entry=6, backtrace_limit=backtrace_limit <at> entry=40) at ./debian/build-x/src/emacs.c:469
>   #4  0x0000561d3dcb9d4e in emacs_abort () at ./debian/build-x/src/sysdep.c:2391
>   #5  0x0000561d3dcb6c34 in redisplay_window (window=<optimized out>, just_this_one_p=just_this_one_p <at> entry=false) at ./debian/build-x/src/xdisp.c:20086

The call to emacs_abort seems to be here:

  /* Some sanity checks.  */
  CHECK_WINDOW_END (w);
  if (Z == Z_BYTE && CHARPOS (opoint) != BYTEPOS (opoint))
    emacs_abort ();  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
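
(To illustrate why this check makes sense: Z and Z_BYTE are the character and byte positions of the end of the buffer, so Z == Z_BYTE means every character in the buffer occupies exactly one byte, and then any character position must equal its byte position.  The following is only a minimal sketch, assuming plain UTF-8 text and 1-based positions; charpos_to_bytepos is a hypothetical helper, not a function from the Emacs sources.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: return the 1-based byte
   position that corresponds to the 1-based character position CHARPOS
   in the UTF-8 text S, which is NBYTES bytes long.  */
static ptrdiff_t
charpos_to_bytepos (const char *s, ptrdiff_t nbytes, ptrdiff_t charpos)
{
  assert (s != NULL && charpos >= 1);
  ptrdiff_t chars_seen = 0;
  ptrdiff_t i = 0;

  /* Step over CHARPOS - 1 whole characters.  */
  while (i < nbytes && chars_seen < charpos - 1)
    {
      /* Skip the lead byte, then any continuation bytes (10xxxxxx).  */
      i++;
      while (i < nbytes && ((unsigned char) s[i] & 0xC0) == 0x80)
        i++;
      chars_seen++;
    }
  return i + 1;
}
```

(With the 3-byte ASCII text "abc", character position 3 maps to byte position 3 and the end of the text is at 4 in both units.  With a 2-byte character followed by "a", the end of the text is at character position 3 but byte position 4, so the two kinds of positions are allowed to differ.)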

Now, your "bt full" doesn't help to understand what went wrong because
GDB is unable to find the values of many variables:

>           w = 0x561d6bcb2bc8
>           f = <optimized out>
>           buffer = <optimized out>
>           old = <optimized out>
>           lpoint = {
>             charpos = <optimized out>,
>             bytepos = <optimized out>
>           }
>           opoint = {
>             charpos = <optimized out>,
>             bytepos = <optimized out>
>           }

Still, at least Z and Z_BYTE should be available; what are their
values?

And regarding opoint, look back in the code a small ways to where it
was defined:

  SET_TEXT_POS (opoint, PT, PT_BYTE);

If you look up the definition of SET_TEXT_POS, you will see:

  /* Set character position of POS to CHARPOS, byte position to BYTEPOS.  */

  #define SET_TEXT_POS(POS, CHARPOS, BYTEPOS) \
       ((POS).charpos = (CHARPOS), (POS).bytepos = BYTEPOS)

which means opoint takes its character position from PT and its byte
position from PT_BYTE.  So if you print the values of PT and PT_BYTE,
we will know the ("optimized-out") values of opoint.charpos and
opoint.bytepos, and will probably be able to understand why we
aborted.  IOW:

  (gdb) frame 5
  (gdb) print Z
  (gdb) print Z_BYTE
  (gdb) print PT
  (gdb) print PT_BYTE

(The "frame 5" command is to get to the callstack frame where we call
emacs_abort, shown as #5 at the left edge of the backtrace line.)
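
(How the four printed values feed the check can be sketched in standalone C.  This is only an illustration under assumptions: struct text_pos and the CHARPOS/BYTEPOS accessors are written here as simple field accessors, which is my understanding of the Emacs definitions, and sanity_check_fails is a hypothetical function that takes the values as arguments instead of reading them from current_buffer.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: a character position paired with a byte position.  */
struct text_pos
{
  ptrdiff_t charpos;
  ptrdiff_t bytepos;
};

/* Assumed to be plain field accessors.  */
#define CHARPOS(POS) ((POS).charpos)
#define BYTEPOS(POS) ((POS).bytepos)

/* Same shape as the definition quoted above.  */
#define SET_TEXT_POS(POS, CHARPOS, BYTEPOS) \
  ((POS).charpos = (CHARPOS), (POS).bytepos = BYTEPOS)

/* Hypothetical function: return true when the condition that guards
   emacs_abort would hold for the given Z, Z_BYTE, PT and PT_BYTE.  */
static bool
sanity_check_fails (ptrdiff_t z, ptrdiff_t z_byte,
                    ptrdiff_t pt, ptrdiff_t pt_byte)
{
  /* A byte position can never be smaller than the character position.  */
  assert (z >= 1 && z_byte >= z);

  struct text_pos opoint;
  SET_TEXT_POS (opoint, pt, pt_byte);
  return z == z_byte && CHARPOS (opoint) != BYTEPOS (opoint);
}
```

(For example, Z = 4, Z_BYTE = 4, PT = 2, PT_BYTE = 3 satisfies the condition, because the buffer claims to hold only single-byte characters while point's two positions disagree.  With Z = 3 and Z_BYTE = 4 the same PT and PT_BYTE are legitimate and the condition is false.)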

If GDB says it doesn't know about these variables with up-cased names,
like Z and PT_BYTE, it means your Emacs was built without macro
information (the -g3 compiler option), and you will need to type the
macro definitions instead.  For example (from buffer.h):

  #define PT (current_buffer->pt + 0)

So instead of "print PT" you will need to say "print current_buffer->pt".
And similarly with other variables above.

Next question is: what buffer did Emacs try to display?  To answer
that, print the name of the buffer that is current in this place in
the code:

  (gdb) print current_buffer->name_
  (gdb) xstring

If GDB says it doesn't know what "xstring" is, type:

  (gdb) source /path/to/emacs/src/.gdbinit

and then repeat the above 2 commands.

Once you know which buffer was being displayed, try to describe the
text that was in it, if you can.  (If you cannot, I can give
instructions how to find it out using GDB commands.)

Thanks.




This bug report was last modified 287 days ago.
