GNU bug report logs - #78444
30.1; Crash in GC (vector_marked_p)


Package: emacs;

Reported by: George P <georgepanagopo <at> gmail.com>

Date: Thu, 15 May 2025 18:46:01 UTC

Severity: normal

Found in version 30.1




From: Pip Cet <pipcet <at> protonmail.com>
To: George P <georgepanagopo <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <acorallo <at> gnu.org>, 78444 <at> debbugs.gnu.org
Subject: bug#78444: 30.1; Crash in GC (vector_marked_p)
Date: Fri, 30 May 2025 08:02:08 +0000
"George P" <georgepanagopo <at> gmail.com> writes:

> I got another crash, again in GC.

Interesting!

> I haven't had a chance to apply Pip's patch, so this is just 30.1 as
> before.

No problem.  This looks a bit different anyway.

> The trace looks a bit different, apologies if it's a different
> issue.

Well, it is an issue, so we should try to get to the bottom of it.  Worst
case we'll end up fixing two bugs ;-)

> I still have the program under gdb.

It's a good idea to use "gcore" to generate a core file from gdb, just
in case the session is destroyed.

> I did not use any org-mode stuff, so it wasn't my original hunch. The only
> unusual (for me) thing I did just before the crash was open up an
> eshell buffer, execute a couple of commands, and close it. I hadn't
> used eshell-mode before that in that session.

Can you check whether eln files were created just before the crash?
Over here, the relevant command would be

$ ls -Strl ~/.emacs.d/eln-cache/*/*

Then look at the last few lines to see whether any files have a
timestamp in the right range (after you started using eshell but
before the crash happened).

> (gdb) set print elements 0
> (gdb) bt full
> ..... REDACTED
> #50 0x0000000000546b5c in emacs_abort () at sysdep.c:2391
> No locals.
> #51 0x00000000005895f3 in process_mark_stack (base_sp=base_sp@entry=0) at alloc.c:7489
>         obj = 0xffdf6e1
>         po = <optimized out>
> #52 0x0000000000589620 in mark_object (obj=<optimized out>) at alloc.c:7503
>         sp = 0
> #53 0x0000000000589719 in mark_maybe_pointer (p=p@entry=0x196922b5, symbol_only=symbol_only@entry=false) at alloc.c:5260
>         obj = <optimized out>
>         m = <optimized out>
> #54 0x000000000058979b in mark_memory (start=<optimized out>, end=end@entry=0x7fffffff7ca0) at alloc.c:5310
>         p = 0x196922b5
>         ip = <optimized out>
>         pp = 0x7fffffffc560 "\265\"i\031"
>         tem = <optimized out>

So the immediate context of the crash was that we found a word on the
stack that contained 0x196922b5; we decided it looked like a pointer,
most likely a tagged pointer to a Lisp vectorlike, and that there was an
object that it might have pointed to.  We tried marking, most likely,
the vectorlike starting at 0x196922b0, and ended up trying to mark
0xffdf6e1, which is not a valid Lisp_Object because its low three bits
hold the unused type tag, 1.

(What confuses me is that pp > end in frame #54, but we're inside a loop
which reads:

  for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
    {
      void *p = *(void *const *) pp;
      mark_maybe_pointer (p, false);
      ...
    }

Debugging information in optimized builds isn't always perfectly
reliable, so it might just be that).

So we should first inspect the memory around 0x196922b0 to find out
whether it looks like a valid vector block, and whether the bad word was
in this block or in the string object marked just before.  I'd suggest
running

    x/32gx 0x196922b0

to look at the memory following the pointer, and

    x/32gx 0x19692200

to get some idea of whether it might be the middle of a vectorlike.

You could try running

    p mem_find (0x196922b5)
    p *$

to see where the vector block (if it is one) was supposed to start.
(There's a slight chance this will crash.)  It may also be worth
printing the entire vector block, which you should be able to do by running

    x/512gx $.start

after that command.

> (gdb) p last_marked_index
> $4 = 28
> (gdb) p last_marked
> $5 = {0x18e5e015, 0x3b0b4753, 0x7699356d, 0x249d0a33, 0x6dc36f23, 0x6dc36f23, 0x15554df37b68, 0x7699356d, 0x15554df3e0a8, 0x7cfa99b5,
> 0x7699356d, 0x15554ec33b75, 0x15554ec33b75, 0xceb350, 0x15554ec33b75, 0x15554df3dfe0, 0x15554ec33b75, 0x15554ec336fd, 0x249d0d13,
> 0x15554ec336fd, 0x15554df3dfe0, 0x7699356d, 
>   0x15554ec3321d, 0x7699356d, 0x196922b5, 0x406, 0x15554efc8fac, 0xffdf6e1, 0x269b51f3, 0x17baf00, 0x269b5203, 0x285dedb4, 0x7, 0x0, 0x7

That gives us some more context.  Before hitting our bad object, we
marked 0x406, then 0x15554efc8fac.  0x406 is the Lisp_Object
representation of 257 (0x0101), and that number is used extensively in
bytecode output: bytecode objects with minargs = maxargs = 1 look like
this:

#[257 "...bytecode..." [constants...] ...]

This means that 0x15554efc8fac is most likely the string containing the
bytecode (its tag, 4, marks a Lisp string, so the struct itself starts
at 0x15554efc8fa8); could you please run

   p *(struct Lisp_String *)0x15554efc8fa8

to see whether this is true, and whether any data is still there?

It would be interesting to know whether we were marking the string or
the bytecode object when the crash happened; please run

   p mark_stk
   p mark_stk.stack[0]
   p mark_stk.stack[1]
   p mark_stk.stack[2]

so we see how deep in the mark stack we were at the time.

Thanks!

Pip







