Package: emacs;
Reported by: George P <georgepanagopo <at> gmail.com>
Date: Thu, 15 May 2025 18:46:01 UTC
Severity: normal
Found in version 30.1
From: Pip Cet <pipcet <at> protonmail.com>
To: George P <georgepanagopo <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <acorallo <at> gnu.org>, 78444 <at> debbugs.gnu.org
Subject: bug#78444: 30.1; Crash in GC (vector_marked_p)
Date: Fri, 30 May 2025 08:02:08 +0000

"George P" <georgepanagopo <at> gmail.com> writes:

> I got another crash, again in GC.

Interesting!

> I haven't had a chance to apply Pip's patch, so this is just 30.1 as
> before.

No problem. This looks a bit different anyway.

> The trace looks a bit different, apologies if it's a different
> issue.

Well, it is an issue so we should try to get to the bottom of it. Worst case we'll end up fixing two bugs ;-)

> I still have the program under gdb.

It's a good idea to use "gcore" to generate a core file from gdb, just in case the session is destroyed.

> I did not use any org-mode stuff, so my original hunch. The only
> unusual (from me) thing I did just before the crash was open up an
> eshell buffer, execute a couple of commands, and close it. I hadn't
> used eshell-mode before that in that session.

Can you check whether eln files were created just before the crash? Over here, the relevant command would be

$ ls -Strl ~/.emacs.d/eln-cache/*/*

and looking at the last few lines to see whether any files have the creation date in the right range (after you started using eshell but before the crash happened).

> (gdb) set print elements 0
> (gdb) bt full
> ..... REDACTED
> #50 0x0000000000546b5c in emacs_abort () at sysdep.c:2391
> No locals.
> #51 0x00000000005895f3 in process_mark_stack (base_sp=base_sp@entry=0) at alloc.c:7489
> obj = 0xffdf6e1
> po = <optimized out>
> #52 0x0000000000589620 in mark_object (obj=<optimized out>) at alloc.c:7503
> sp = 0
> #53 0x0000000000589719 in mark_maybe_pointer (p=p@entry=0x196922b5, symbol_only=symbol_only@entry=false) at alloc.c:5260
> obj = <optimized out>
> m = <optimized out>
> #54 0x000000000058979b in mark_memory (start=<optimized out>, end=end@entry=0x7fffffff7ca0) at alloc.c:5310
> p = 0x196922b5
> ip = <optimized out>
> pp = 0x7fffffffc560 "\265\"i\031"
> tem = <optimized out>

So the immediate context of the crash was that we found a word on the stack that contained 0x196922b5; we decided it looked like a pointer, most likely a tagged pointer to a Lisp vectorlike, and that there was an object that it might have pointed to. We tried marking, most likely, the vectorlike starting at 0x196922b0, and ended up trying to mark 0xffdf6e1, which is not a valid Lisp_Object because it uses the unused tag.

(What confuses me is that pp > end in frame #54, but we're inside a loop which reads:

  for (pp = start; (void const *) pp < end; pp += GC_POINTER_ALIGNMENT)
    {
      void *p = *(void *const *) pp;
      mark_maybe_pointer (p, false);
      ...
    }

Debugging information in optimized builds isn't always perfectly reliable, so it might just be that).

So we first should inspect the memory around 0x196922b0 to find out whether it looks like a valid vector block, and whether the bad word was in this block or in the string object marked just before. I'd suggest running

x/32gx 0x196922b0

to look at the memory following the pointer, and

x/32gx 0x19692200

to get some idea of whether it might be the middle of a vectorlike.

You could try running

p mem_find (0x196922b5)
p *$

to see where the vector block (if it is one) was supposed to start. (There's a slight chance this will crash).
It may be worth it to print the entire vector block, which should be doable by running

x/512gx $.start

after that command.

> (gdb) p last_marked_index
> $4 = 28
> (gdb) p last_marked
> $5 = {0x18e5e015, 0x3b0b4753, 0x7699356d, 0x249d0a33, 0x6dc36f23, 0x6dc36f23, 0x15554df37b68, 0x7699356d, 0x15554df3e0a8, 0x7cfa99b5,
> 0x7699356d, 0x15554ec33b75, 0x15554ec33b75, 0xceb350, 0x15554ec33b75, 0x15554df3dfe0, 0x15554ec33b75, 0x15554ec336fd, 0x249d0d13,
> 0x15554ec336fd, 0x15554df3dfe0, 0x7699356d,
> 0x15554ec3321d, 0x7699356d, 0x196922b5, 0x406, 0x15554efc8fac, 0xffdf6e1, 0x269b51f3, 0x17baf00, 0x269b5203, 0x285dedb4, 0x7, 0x0, 0x7

That gives us some more context. Before hitting our bad object, we marked 0x406, then 0x15554efc8fac. 0x406 is the Lisp_Object representation of 257 (0x0101), and that number is used extensively in bytecode output: bytecode objects with minargs = maxargs = 1 look like this:

#[257 "...bytecode..." [constants...] ...]

This means that 0x15554efc8fac is most likely the string containing the bytecode; could you please run

p *(struct Lisp_String *)0x15554efc8fa8

to see whether this is true, and whether any data is still there?

It would be interesting to know whether we were marking the string or the bytecode object when the crash happened; please run

p mark_stk
p mark_stk.stack[0]
p mark_stk.stack[1]
p mark_stk.stack[2]

so we see how deep in the mark stack we were at the time.

Thanks!

Pip