GNU bug report logs - #76091
31.0.50; feature/igc: buffer.h:829: Emacs fatal error: assertion failed: BUFFERP (a)


Package: emacs

Reported by: Gregor Zattler <telegraph <at> gmx.net>

Date: Thu, 6 Feb 2025 12:51:01 UTC

Severity: normal

Found in version 31.0.50

Done: Pip Cet <pipcet <at> protonmail.com>

Bug is archived. No further changes may be made.



Message #38 received at 76091 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, 76091 <at> debbugs.gnu.org, telegraph <at> gmx.net
Subject: Re: bug#76091: 31.0.50;
 feature/igc: buffer.h:829: Emacs fatal error: assertion failed:
 BUFFERP (a)
Date: Fri, 07 Feb 2025 15:30:45 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Fri, 07 Feb 2025 10:41:18 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Gregor Zattler <telegraph <at> gmx.net>, Eli Zaretskii <eliz <at> gnu.org>, 76091 <at> debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> >> Date: Thu, 06 Feb 2025 13:49:30 +0100
>> >> From:  Gregor Zattler via "Bug reports for GNU Emacs,
>> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>> >>
>> >> Dear Emacs developers, I don't know if
>> >> this failed assertion is due to using
>> >> MPS as GC.
>> >>
>> >> I played along with pdf-tools and
>> >> org-noter when it happened.
>> >>
>> >> This time I built with a current
>> >> checkout of feature/igc.
>> >>
>> >> GDB output even further below.
>> >>
>> >> The crashed session is still in GDB.
>> >> [...]
>> >> Breakpoint 1, terminate_due_to_signal (sig=sig <at> entry=6, backtrace_limit=backtrace_limit <at> entry=2147483647) at ./src/emacs.c:425
>> >> 425	{
>> >> +bt
>> >> #0  terminate_due_to_signal (sig=sig <at> entry=6, backtrace_limit=backtrace_limit <at> entry=2147483647) at ./src/emacs.c:425
>> >> #1  0x00005555555b8f5b in die (msg=msg <at> entry=0x5555559b40d0 "BUFFERP (a)", file=file <at> entry=0x5555559b40c7 "buffer.h", line=line <at> entry=829) at ./src/alloc.c:7683
>> >> #2  0x00005555555a0bda in XBUFFER (a=Python Exception <class 'gdb.error'>: value has been optimized out
>> >> ) at ./src/buffer.h:829
>> >> #3  0x00005555555a4370 in XBUFFER (a=Python Exception <class 'gdb.error'>: value has been optimized out
>> >> ) at ./src/xdisp.c:17024
>> >> #4  prepare_menu_bars () at ./src/xdisp.c:14041
>> >
>> > This is here:
>> >
>> >       FOR_EACH_FRAME (tail, frame)
>> > 	{
>> > 	  struct frame *f = XFRAME (frame);
>> > 	  struct window *w = XWINDOW (FRAME_SELECTED_WINDOW (f));
>> > 	  if (some_windows
>> > 	      && !f->redisplay
>> > 	      && !w->redisplay
>> > 	      && !XBUFFER (w->contents)->text->redisplay)
>> > 	    continue;
>> >
>> > And I don't understand how w->contents of a frame's selected window
>> > could fail the BUFFERP test.
>>
>> Here's my current theory:
>>
>> 1. display_mode_lines calls
>>
>>   record_unwind_protect
>>     (restore_frame_selected_window, XFRAME (new_frame)->selected_window);
>>
>> 2. that stores the selected window in the specpdl
>>
>> 3. the specpdl is then grown
>>
>> 4. we xpalloc the specpdl area, creating a copy of it and freeing the
>> old memory, which is still registered as a root.
>>
>> 5. igc_on_grow_specpdl calls mps_arena_park
>>
>> 6. the first thing mps_arena_park does is to complete the current GC
>> cycle.  This means that it will:
>>
>> * look at the old specpdl area, which has been freed and may now contain
>>   invalid data
>> * *modify* the old specpdl area, which may have been reallocated, and
>>   so cause random memory corruption
>> * move objects by updating their pointers in the *old* specpdl area,
>>   leaving the pointers in the *new* specpdl area invalid
>>
>> 7. in our case, the selected window was supposed to have been moved but
>> the pointer in the *new* specpdl area continues to point to the old
>> memory, which is reused for something else which fails the BUFFERP test.
>>
>> 8. we unwind and restore the invalid selected window pointer.
>>
>> I have a patch, but I'd like to discuss whether this is a plausible
>> theory first.  Gerd, if there's something that prevents this problem
>> from happening, and I missed it, could you briefly yell at me here?
>
> If this is possible, all hell will break loose.

Failing to update moved Lisp_Objects will make all hell break loose no
matter where they are.

> The selected window of any frame must be a leaf window, and its
> contents member must identify a buffer, at all times.  We must ensure
> this is true whenever our code runs.

Absolutely, which is why I think we need to remove the root resizing
code, along with its no-GC assumption.  That assumption turned out to
be unjustified: parking the arena in an attempt to avoid GC instead
triggered GC at a very bad time.

The remaining no-GC assumptions are:

1. we don't allow the first GC to happen until things have been set up
(this seems justifiable)

2. we park the arena when walking the pool for statistics (so they
represent consistent counts)

3. (my code only) we park the arena while implementing which_symbols or
scanning the heap for references to a given object, but those are ultima
ratio debug methods which, while highly desirable, do not count as
ordinary operations.

I'll post about (3) in some more detail when I've decided whether I
would want to propose heap scanning code in time for the feature/igc
merge or not.

Pip





This bug report was last modified 102 days ago.


