GNU bug report logs - #79116
31.0.50; Crash on IGC build


Package: emacs;

Reported by: Sean Devlin <spd <at> toadstyle.org>

Date: Mon, 28 Jul 2025 18:32:01 UTC

Severity: normal

Found in version 31.0.50

To reply to this bug, email your comments to 79116 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Mon, 28 Jul 2025 18:32:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Sean Devlin <spd <at> toadstyle.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 28 Jul 2025 18:32:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sean Devlin <spd <at> toadstyle.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Crash on IGC build
Date: Mon, 28 Jul 2025 13:31:08 -0500
[Message part 1 (text/plain, inline)]
Hi folks,

I left Emacs idle overnight, and I found a crash log when I returned in
the morning. Please see the attached file.

I'm using the IGC branch, and it looks like a related assertion failed.

I have igc-step-interval set to 0.05, but no other configuration related to GC.

I don’t know how to reproduce the error.

Thanks!

[crash.txt (text/plain, attachment)]
[Message part 3 (text/plain, inline)]

In GNU Emacs 31.0.50 (build 1, aarch64-apple-darwin24.6.0, NS
appkit-2575.70 Version 15.6 (Build 24G5065c)) of 2025-07-16 built on
beatrix.local
Repository revision: 382123e69e2c0cae39e44f9b72ca3674eaec2ad1
Repository branch: igc
Windowing system distributor 'Apple', version 10.3.2575
System Description:  macOS 15.6

Configured using:
'configure --with-ns --with-modules --with-native-compilation
--with-libgmp --with-tree-sitter --with-sqlite3 --with-mps=yes
--without-imagemagick --without-dbus CPPFLAGS=-I/opt/homebrew/include
LDFLAGS=-L/opt/homebrew/lib PKG_CONFIG_PATH=/opt/homebrew/lib/pkgconfig
ac_cv_func_posix_spawn_file_actions_addchdir=no'

Configured features:
ACL GIF GMP GNUTLS JPEG LCMS2 LIBXML2 MODULES MPS NATIVE_COMP NOTIFY
KQUEUE NS PDUMPER PNG SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XIM ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Help

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  isearch-fold-quotes-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  buffer-read-only: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
rfc6068 epg-config gnus-util mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils cus-start cus-load autorevert
filenotify thingatpt time-date jka-compr info-look info compile comint
subr-x ansi-osc ansi-color ring comp-run cl-extra shortdoc
text-property-search pp comp-common rx help-fns byte-opt gv bytecomp
byte-compile radix-tree help-mode cl-loaddefs cl-lib
display-line-numbers rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd
fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow
isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads kqueue cocoa ns lcms2
multi-tty make-network-process tty-child-frames native-compile mps
emacs)

Memory information:
((conses 24 0 0) (symbols 56 0 0) (strings 40 0 0) (string-bytes 1 0)
(vectors 24 0) (vector-slots 8 0 0) (floats 24 0 0)
(intervals 64 0 0) (buffers 1072 0))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Tue, 29 Jul 2025 18:19:01 GMT) Full text and rfc822 format available.

Message #8 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Sean Devlin <spd <at> toadstyle.org>
Cc: 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Tue, 29 Jul 2025 18:18:05 +0000
"Sean Devlin" <spd <at> toadstyle.org> writes:

> Hi folks,
>
> I left Emacs idle overnight, and I found a crash log when I returned in
> the morning. Please see the attached file.
>
> I'm using the IGC branch, and it looks like a related assertion failed.
>
> I have igc-step-interval set to 0.05, but no other configuration related to GC.
>
> I don’t know how to reproduce the error.
>
> Thanks!

Thanks for the report. I'm afraid we don't have very much to go on: this
assertion in trace.c in MPS failed:

    AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */

I think that means a memory structure internal to MPS must have been
corrupted; either the ScanState on the stack, or the segment
information.
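
For readers unfamiliar with the check: as far as I understand it, MPS keeps reference sets as conservative bitmasks over address zones, and the assertion requires that the references seen while scanning a segment are covered by the summary recorded for that segment. Below is a minimal sketch of a subset test of that kind. The type and function names are made up for illustration and are not the MPS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a reference set: bit i set means
   "may contain a reference into zone i". */
typedef uint64_t refset_t;

/* True if every zone present in `sub` is also present in `super`. */
static bool refset_is_subset(refset_t sub, refset_t super)
{
    return (sub & ~super) == 0;
}
```

With this model, a failure means the scan saw a reference into a zone that the segment's summary claimed could not occur, which fits the suspicion that one of the two values was stale or corrupted.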

A core dump might help us figure out more, but I don't think there is
one, right?

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Tue, 29 Jul 2025 22:17:02 GMT) Full text and rfc822 format available.

Message #11 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Sean Devlin <spd <at> toadstyle.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Tue, 29 Jul 2025 17:16:35 -0500
[Message part 1 (text/plain, inline)]
> On Jul 29, 2025, at 1:18 PM, Pip Cet <pipcet <at> protonmail.com> wrote:
> 
> "Sean Devlin" <spd <at> toadstyle.org <mailto:spd <at> toadstyle.org>> writes:
> 
>> Hi folks,
>> 
>> I left Emacs idle overnight, and I found a crash log when I returned in
>> the morning. Please see the attached file.
>> 
>> I'm using the IGC branch, and it looks like a related assertion failed.
>> 
>> I have igc-step-interval set to 0.05, but no other configuration related to GC.
>> 
>> I don’t know how to reproduce the error.
>> 
>> Thanks!
> 
> Thanks for the report. I'm afraid we don't have very much to go on: this
> assertion in trace.c in MPS failed:
> 
>    AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
> 
> I think that means a memory structure internal to MPS must have been
> corrupted; either the ScanState on the stack, or the segment
> information.
> 
> A core dump might help us figure out more, but I don't think there is
> one, right?
> 
> Pip

Right, I don’t think there is a core dump. I can look into how to record one in
the future, but I don’t know how to reproduce this issue.

Sean

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 03:44:01 GMT) Full text and rfc822 format available.

Message #14 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: Sean Devlin <spd <at> toadstyle.org>, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 05:43:48 +0200
On Tue, Jul 29 2025, Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> Thanks for the report. I'm afraid we don't have very much to go on: this
> assertion in trace.c in MPS failed:
>
>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */

This may or may not be relevant: the GC literature defines a weak and a
strong tricolor invariant.  I wonder if this code checks those
invariants.
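
For reference, the strong invariant says that no black (fully scanned) object points directly at a white (not yet reached) object; the weak one tolerates such a pointer as long as the white object is still reachable from some grey object. Here is a minimal sketch of a checker for the strong form over a toy object array. The types and names are hypothetical, and this is not a claim about what the MPS assertion above actually tests.

```c
#include <stdbool.h>
#include <stddef.h>

enum color { WHITE, GREY, BLACK };

#define MAX_REFS 4

/* Toy heap object: a color plus a few outgoing references. */
struct obj {
    enum color color;
    struct obj *refs[MAX_REFS];  /* NULL entries are unused slots */
};

/* Strong tricolor invariant: no black object refers directly
   to a white object. */
static bool strong_invariant_holds(const struct obj *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (objs[i].color != BLACK)
            continue;
        for (size_t j = 0; j < MAX_REFS; j++) {
            const struct obj *target = objs[i].refs[j];
            if (target != NULL && target->color == WHITE)
                return false;
        }
    }
    return true;
}
```

A mutator that stores a white pointer into an already-black object without a barrier noticing is exactly the case this check would flag.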

In fix_frame, we do something that may not be kosher: we trace the glyph
pool.  We can access the glyph pool because it lives outside the GC heap
and has no memory barriers (and the world is stopped).  However, MPS may
assume that we scan such regions as roots because there are no memory
barriers on them to enforce the tricolor invariants.  I don't understand
the invariants well enough to tell if what we do is harmless or not.
Perhaps someone who does could think this through.

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 11:25:01 GMT) Full text and rfc822 format available.

Message #17 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Sean Devlin <spd <at> toadstyle.org>
Cc: 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 14:24:23 +0300
> From: Sean Devlin <spd <at> toadstyle.org>
> Date: Mon, 28 Jul 2025 13:31:08 -0500
> 
> I left Emacs idle overnight, and I found a crash log when I returned in
> the morning. Please see the attached file.
> 
> I'm using the IGC branch, and it looks like a related assertion failed.
> 
> I have igc-step-interval set to 0.05, but no other configuration related to GC.
> 
> I don’t know how to reproduce the error.

Can you tell what code your Emacs could be running while idle?  Like,
timers etc.?

> Thread 0::  Dispatch queue: com.apple.main-thread
> 0   Emacs                         	       0x100863214 redisplay_internal + 208
> 1   Emacs                         	       0x10086930c redisplay_preserve_echo_area + 132
> 2   Emacs                         	       0x100914d5c detect_input_pending_run_timers + 144
> 3   Emacs                         	       0x1009f4520 wait_reading_process_output + 4488
> 4   Emacs                         	       0x100912bb4 read_char + 7444
> 5   Emacs                         	       0x10090ee14 read_key_sequence + 1328
> 6   Emacs                         	       0x10090d2c8 command_loop_1 + 864
> 7   Emacs                         	       0x100992c14 internal_condition_case + 228
> 8   Emacs                         	       0x10090cf54 command_loop_2 + 52
> 9   Emacs                         	       0x100992218 internal_catch + 224
> 10  Emacs                         	       0x100aecd94 command_loop.cold.1 + 88
> 11  Emacs                         	       0x10090c784 command_loop + 156
> 12  Emacs                         	       0x10090c618 recursive_edit_1 + 188
> 13  Emacs                         	       0x10090c91c Frecursive_edit + 384
> 14  Emacs                         	       0x10090b73c main + 8644
> 15  dyld                          	       0x183686b98 start + 6076
> 
> Thread 1 Crashed:
> 0   libsystem_kernel.dylib        	       0x1839ee388 __pthread_kill + 8
> 1   libsystem_pthread.dylib       	       0x183a2788c pthread_kill + 296
> 2   libsystem_c.dylib             	       0x1838f8d04 raise + 32
> 3   Emacs                         	       0x100aec8d4 terminate_due_to_signal + 120
> 4   Emacs                         	       0x100aed748 emacs_abort + 20
> 5   Emacs                         	       0x100a4a2a0 ns_term_shutdown + 132
> 6   Emacs                         	       0x10090951c shut_down_emacs + 360
> 7   Emacs                         	       0x100aec938 terminate_due_to_signal + 220
> 8   Emacs                         	       0x100a1d150 igc_assert_fail + 76
> 9   Emacs                         	       0x100ae1b50 mps_lib_assert_fail + 32 (mpsliban.c:87) [inlined]
> 10  Emacs                         	       0x100ae1b50 traceScanSegRes + 532 (trace.c:1229)
> 11  Emacs                         	       0x100aa30e4 traceScanSeg + 40 (trace.c:1267)
> 12  Emacs                         	       0x100aa2f64 TraceSegAccess + 328 (trace.c:1320)
> 13  Emacs                         	       0x100aa84f8 SegWholeAccess + 336 (seg.c:1262)
> 14  Emacs                         	       0x100a96d44 ArenaAccess + 564 (global.c:671)
> 15  Emacs                         	       0x100ae9300 protCatchOne + 192 (protxc.c:242) [inlined]
> 16  Emacs                         	       0x100ae9300 protCatchThread + 304 (protxc.c:284)
> 17  libsystem_pthread.dylib       	       0x183a27c0c _pthread_start + 136
> 18  libsystem_pthread.dylib       	       0x183a22b80 thread_start + 8

This indicates that thread 1 was the thread which got hit by SIGABRT.
Thread 1 is NOT the main thread, which runs Lisp and redisplay.  So
what is thread 1, and how come it calls MPS functions?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 11:50:02 GMT) Full text and rfc822 format available.

Message #20 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 14:48:53 +0300
> Cc: 79116 <at> debbugs.gnu.org
> Date: Tue, 29 Jul 2025 18:18:05 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> "Sean Devlin" <spd <at> toadstyle.org> writes:
> 
> > Hi folks,
> >
> > I left Emacs idle overnight, and I found a crash log when I returned in
> > the morning. Please see the attached file.
> >
> > I'm using the IGC branch, and it looks like a related assertion failed.
> >
> > I have igc-step-interval set to 0.05, but no other configuration related to GC.
> >
> > I don’t know how to reproduce the error.
> >
> > Thanks!
> 
> Thanks for the report. I'm afraid we don't have very much to go on: this
> assertion in trace.c in MPS failed:
> 
>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
> 
> I think that means a memory structure internal to MPS must have been
> corrupted; either the ScanState on the stack, or the segment
> information.

Do you understand how come a non-main thread called MPS in this case?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 12:29:02 GMT) Full text and rfc822 format available.

Message #23 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, pipcet <at> protonmail.com, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 15:28:10 +0300
> Cc: Sean Devlin <spd <at> toadstyle.org>, 79116 <at> debbugs.gnu.org
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Date: Wed, 30 Jul 2025 05:43:48 +0200
> 
> On Tue, Jul 29 2025, Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
> 
> > Thanks for the report. I'm afraid we don't have very much to go on: this
> > assertion in trace.c in MPS failed:
> >
> >     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
> 
> This may or may not be relevant: the GC literature defines a weak and a
> strong tricolor invariant.  I wonder if this code checks those
> invariants.
> 
> In fix_frame, we do something that may not be kosher: we trace the glyph
> pool.  We can access the glyph pool because it lives outside the GC heap
> and has no memory barriers (and the world is stopped).  However, MPS may
> assume that we scan such regions as roots because there are no memory
> barriers on them to enforce the tricolor invariants.  I don't understand
> the invariants well enough to tell if what we do is harmless or not.
> Perhaps someone who does could think this through.

You seem to be talking about the data structures manipulated by the
display engine, but the thread which got hit with SIGABRT was not the
main thread (which indeed was in redisplay), it's some other thread.
And I wonder how come that other thread called MPS.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 12:31:02 GMT) Full text and rfc822 format available.

Message #26 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 12:30:31 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Cc: 79116 <at> debbugs.gnu.org
>> Date: Tue, 29 Jul 2025 18:18:05 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> "Sean Devlin" <spd <at> toadstyle.org> writes:
>>
>> > Hi folks,
>> >
>> > I left Emacs idle overnight, and I found a crash log when I returned in
>> > the morning. Please see the attached file.
>> >
>> > I'm using the IGC branch, and it looks like a related assertion failed.
>> >
>> > I have igc-step-interval set to 0.05, but no other configuration related to GC.
>> >
>> > I don’t know how to reproduce the error.
>> >
>> > Thanks!
>>
>> Thanks for the report. I'm afraid we don't have very much to go on: this
>> assertion in trace.c in MPS failed:
>>
>>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
>>
>> I think that means a memory structure internal to MPS must have been
>> corrupted; either the ScanState on the stack, or the segment
>> information.
>
> Do you understand how come a non-main thread called MPS in this case?

Yes, MPS uses a separate thread to handle Mach exceptions on macOS,
rather than a signal handler. Most likely, the main thread accessed
memory behind a memory barrier and was suspended while the protection
thread attempted to lift the barrier, but that ran into an assertion
violation and aborted instead.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 12:47:02 GMT) Full text and rfc822 format available.

Message #29 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 15:46:45 +0300
> Date: Wed, 30 Jul 2025 12:30:31 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> Cc: 79116 <at> debbugs.gnu.org
> >> Date: Tue, 29 Jul 2025 18:18:05 +0000
> >> From:  Pip Cet via "Bug reports for GNU Emacs,
> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >>
> >> "Sean Devlin" <spd <at> toadstyle.org> writes:
> >>
> >> > Hi folks,
> >> >
> >> > I left Emacs idle overnight, and I found a crash log when I returned in
> >> > the morning. Please see the attached file.
> >> >
> >> > I'm using the IGC branch, and it looks like a related assertion failed.
> >> >
> >> > I have igc-step-interval set to 0.05, but no other configuration related to GC.
> >> >
> >> > I don’t know how to reproduce the error.
> >> >
> >> > Thanks!
> >>
> >> Thanks for the report. I'm afraid we don't have very much to go on: this
> >> assertion in trace.c in MPS failed:
> >>
> >>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
> >>
> >> I think that means a memory structure internal to MPS must have been
> >> corrupted; either the ScanState on the stack, or the segment
> >> information.
> >
> > Do you understand how come a non-main thread called MPS in this case?
> 
> Yes, MPS uses a separate thread to handle Mach exceptions on macOS,
> rather than a signal handler.

Are we sure this is that separate thread?  What are the indications of
that?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 12:52:02 GMT) Full text and rfc822 format available.

Message #32 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: Sean Devlin <spd <at> toadstyle.org>, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 12:51:24 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes:

> On Tue, Jul 29 2025, Pip Cet via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
>
>> Thanks for the report. I'm afraid we don't have very much to go on: this
>> assertion in trace.c in MPS failed:
>>
>>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
>
> This may or may not be relevant: the GC literature defines a weak and a
> strong tricolor invariant.  I wonder if this code checks those
> invariants.
>
> In fix_frame, we do something that may not be kosher: we trace the glyph
> pool.  We can access the glyph pool because it lives outside the GC heap
> and has no memory barriers (and the world is stopped).  However, MPS may
> assume that we scan such regions as roots because there are no memory
> barriers on them to enforce the tricolor invariants.  I don't understand
> the invariants well enough to tell if what we do is harmless or not.
> Perhaps someone who does could think this through.

Good catch!

I think you're absolutely correct. What we do there definitely isn't
correct: if one of the frame's pools is behind a read barrier, there's a
tiny race condition, but a write barrier would effectively be ignored,
which might well cause problems like this one.

But we should fix it even if it didn't cause this particular crash.
igc_xpalloc_ambig shouldn't be hard to do, but it's probably better for
performance to turn glyph pools into their own IGC object type.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 12:59:01 GMT) Full text and rfc822 format available.

Message #35 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 12:57:55 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Wed, 30 Jul 2025 12:30:31 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: spd <at> toadstyle.org, 79116 <at> debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> >> Cc: 79116 <at> debbugs.gnu.org
>> >> Date: Tue, 29 Jul 2025 18:18:05 +0000
>> >> From:  Pip Cet via "Bug reports for GNU Emacs,
>> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>> >>
>> >> "Sean Devlin" <spd <at> toadstyle.org> writes:
>> >>
>> >> > Hi folks,
>> >> >
>> >> > I left Emacs idle overnight, and I found a crash log when I returned in
>> >> > the morning. Please see the attached file.
>> >> >
>> >> > I'm using the IGC branch, and it looks like a related assertion failed.
>> >> >
>> >> > I have igc-step-interval set to 0.05, but no other configuration related to GC.
>> >> >
>> >> > I don’t know how to reproduce the error.
>> >> >
>> >> > Thanks!
>> >>
>> >> Thanks for the report. I'm afraid we don't have very much to go on: this
>> >> assertion in trace.c in MPS failed:
>> >>
>> >>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
>> >>
>> >> I think that means a memory structure internal to MPS must have been
>> >> corrupted; either the ScanState on the stack, or the segment
>> >> information.
>> >
>> > Do you understand how come a non-main thread called MPS in this case?
>>
>> Yes, MPS uses a separate thread to handle Mach exceptions on macOS,
>> rather than a signal handler.
>
> Are we sure this is that separate thread?

Yes, we are.

> What are the indications of that?

The backtrace clearly indicates that thread 1 is the one running code
from protxc.c, and it is thread 1 which crashed.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 16:32:02 GMT) Full text and rfc822 format available.

Message #38 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: spd <at> toadstyle.org, pipcet <at> protonmail.com, 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 18:31:39 +0200
On Wed, Jul 30 2025, Eli Zaretskii wrote:

> You seem to be talking about the data structures manipulated by the
> display engine, but the thread which got hit with SIGABRT was not the
> main thread (which indeed was in redisplay), it's some other thread.
> And I wonder how come that other thread called MPS.

I think that's the usual way page faults are handled on macOS:

  1. the main thread hits a memory barrier (in this case in redisplay)

  2. the Mach kernel stops the main thread and sends a message to
     the process's exception port

  3. there is a dedicated thread that reads the message from the
     exception port.  It's this thread that calls into MPS and
     copies/scans objects as needed.  In this case that triggered the
     assertion.

I'm not sure when or how the main thread is resumed. Gerd would
certainly know.

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Wed, 30 Jul 2025 21:13:02 GMT) Full text and rfc822 format available.

Message #41 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Sean Devlin <spd <at> toadstyle.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Wed, 30 Jul 2025 16:11:57 -0500
Hi Eli,

> On Jul 30, 2025, at 6:24 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Sean Devlin <spd <at> toadstyle.org>
>> Date: Mon, 28 Jul 2025 13:31:08 -0500
>> 
>> I left Emacs idle overnight, and I found a crash log when I returned in
>> the morning. Please see the attached file.
>> 
>> I'm using the IGC branch, and it looks like a related assertion failed.
>> 
>> I have igc-step-interval set to 0.05, but no other configuration related to GC.
>> 
>> I don’t know how to reproduce the error.
> 
> Can you tell what code your Emacs could be running while idle?  Like,
> timers etc.?

Here are the timers I have running:

               5.0s         5.0s auto-revert-buffers
            2m 0.6s           5m savehist-autosave
          22m 14.7s           1h org-persist--refresh-gc-lock
   *           0.1s            t show-paren-function
   *           0.5s            t which-func-update
   *           0.5s            t jit-lock-context--update
   *           0.5s      :repeat blink-cursor-start
   *     4h 0m 0.0s       repeat ef-themes-load-random

The last one might be relevant, since I added it within the last week. Basically, it is choosing a new theme after Emacs has been idle for four hours. This entails disabling the currently enabled theme and then enabling a new one.

> 
>> Thread 0::  Dispatch queue: com.apple.main-thread
>> 0   Emacs                         	       0x100863214 redisplay_internal + 208
>> 1   Emacs                         	       0x10086930c redisplay_preserve_echo_area + 132
>> 2   Emacs                         	       0x100914d5c detect_input_pending_run_timers + 144
>> 3   Emacs                         	       0x1009f4520 wait_reading_process_output + 4488
>> 4   Emacs                         	       0x100912bb4 read_char + 7444
>> 5   Emacs                         	       0x10090ee14 read_key_sequence + 1328
>> 6   Emacs                         	       0x10090d2c8 command_loop_1 + 864
>> 7   Emacs                         	       0x100992c14 internal_condition_case + 228
>> 8   Emacs                         	       0x10090cf54 command_loop_2 + 52
>> 9   Emacs                         	       0x100992218 internal_catch + 224
>> 10  Emacs                         	       0x100aecd94 command_loop.cold.1 + 88
>> 11  Emacs                         	       0x10090c784 command_loop + 156
>> 12  Emacs                         	       0x10090c618 recursive_edit_1 + 188
>> 13  Emacs                         	       0x10090c91c Frecursive_edit + 384
>> 14  Emacs                         	       0x10090b73c main + 8644
>> 15  dyld                          	       0x183686b98 start + 6076
>> 
>> Thread 1 Crashed:
>> 0   libsystem_kernel.dylib        	       0x1839ee388 __pthread_kill + 8
>> 1   libsystem_pthread.dylib       	       0x183a2788c pthread_kill + 296
>> 2   libsystem_c.dylib             	       0x1838f8d04 raise + 32
>> 3   Emacs                         	       0x100aec8d4 terminate_due_to_signal + 120
>> 4   Emacs                         	       0x100aed748 emacs_abort + 20
>> 5   Emacs                         	       0x100a4a2a0 ns_term_shutdown + 132
>> 6   Emacs                         	       0x10090951c shut_down_emacs + 360
>> 7   Emacs                         	       0x100aec938 terminate_due_to_signal + 220
>> 8   Emacs                         	       0x100a1d150 igc_assert_fail + 76
>> 9   Emacs                         	       0x100ae1b50 mps_lib_assert_fail + 32 (mpsliban.c:87) [inlined]
>> 10  Emacs                         	       0x100ae1b50 traceScanSegRes + 532 (trace.c:1229)
>> 11  Emacs                         	       0x100aa30e4 traceScanSeg + 40 (trace.c:1267)
>> 12  Emacs                         	       0x100aa2f64 TraceSegAccess + 328 (trace.c:1320)
>> 13  Emacs                         	       0x100aa84f8 SegWholeAccess + 336 (seg.c:1262)
>> 14  Emacs                         	       0x100a96d44 ArenaAccess + 564 (global.c:671)
>> 15  Emacs                         	       0x100ae9300 protCatchOne + 192 (protxc.c:242) [inlined]
>> 16  Emacs                         	       0x100ae9300 protCatchThread + 304 (protxc.c:284)
>> 17  libsystem_pthread.dylib       	       0x183a27c0c _pthread_start + 136
>> 18  libsystem_pthread.dylib       	       0x183a22b80 thread_start + 8
> 
> This indicates that thread 1 was the thread which got hit by SIGABRT.
> Thread 1 is NOT the main thread, which runs Lisp and redisplay.  So
> what is thread 1, and how come it calls MPS functions?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Thu, 31 Jul 2025 10:08:01 GMT) Full text and rfc822 format available.

Message #44 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Thu, 31 Jul 2025 12:07:29 +0200
Helmut Eller <eller.helmut <at> gmail.com> writes:

> On Wed, Jul 30 2025, Eli Zaretskii wrote:
>
>> You seem to be talking about the data structures manipulated by the
>> display engine, but the thread which got hit with SIGABRT was not the
>> main thread (which indeed was in redisplay), it's some other thread.
>> And I wonder how come that other thread called MPS.
>
> I think that's the usual way how page faults are handled on MacOS:
>
>   1. the main thread hits a memory barrier (in this case in redisplay)
>
>   2. the Mach kernel stops the main thread and sends a message to
>      the process's exception port
>
>   3. there is a dedicated thread that reads the message from the
>      exception port.  It's this thread that calls into MPS and
>      copies/scans objects as needed.  In this case that triggered the
>      assertion.
>
> I'm not sure when or how the main thread is resumed. Gerd would
> certainly know.

(I'm not reading emacs-bugs normally; feel free to CC me if there is
something I can help with.)

The barrier on macOS is a Mach port. A port is a message queue plus a
thread handling messages. When a barrier is hit, the OS stops the thread
that hit the barrier and sends the exception handler a message. The
exception handler's thread does its MPS things and tells the OS to resume
the thread that hit the barrier.
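
The hand-off described above can be modelled with a queue and two
threads. This is only a sketch in Python, not the Mach or MPS API: the
names exception_port, handler and mutator are made up for illustration,
and the "OS" is reduced to a queue plus an event.

```python
import queue
import threading


def run_barrier_demo():
    """Model a mutator thread hitting a barrier serviced by a handler thread."""
    exception_port = queue.Queue()  # stands in for the exception port (assumption)
    log = []

    def handler():
        # Dedicated thread: blocks until a fault message arrives.
        fault_addr, resume = exception_port.get()
        log.append(f"handler: scan segment containing {fault_addr:#x}")
        resume.set()  # tell the "OS" to resume the faulting thread

    def mutator():
        log.append("mutator: touch protected page")
        resume = threading.Event()
        exception_port.put((0x1000, resume))  # the "OS" forwards the fault
        resume.wait()  # the mutator stays stopped until it is resumed
        log.append("mutator: resumed, access completes")

    threads = [threading.Thread(target=handler), threading.Thread(target=mutator)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In this model all collector work happens on the handler thread, so a
failed check there would be reported on that thread and not on the
thread that touched the page, which matches the backtrace in the crash
log.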

BTW, glyph pools are only used for tty frames. Don't know if that plays
a role here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Thu, 31 Jul 2025 16:47:01 GMT) Full text and rfc822 format available.

Message #47 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Thu, 31 Jul 2025 18:46:49 +0200
On Thu, Jul 31 2025, Gerd Möllmann wrote:

> BTW, glyph pools are only used for tty frames. Don't know if that plays
> a role here.

I didn't know that.  And the glyph matrix, is that a tty-only thing too?

Anyway, my concern is a bit more general: I think that objects without
memory barriers, (e.g. structs allocated with malloc) should be scanned
as roots.

E.g. in fix_frame there is code that uses FRAME_FONT to get the
address of some field in a device-dependent struct.  This struct is not
protected by a memory barrier; I think these kinds of structs should be
roots.

In the MPS paper, section "Phase 4: Black Mutator Tracing", they say
that gray and white segments are read protected.  I assume that also
means that, at this point, there are no write barriers.  If the mutator
can read a pointer to a white object from a struct without read barriers
and put the white pointer in a black object, then we have a problem.
That's why I think that structs without memory barriers should be roots.

There might be special circumstances where code like in fix_frame is
sufficient.  E.g. if for some reason all paths to the font field go
through the read-protected frame.  But I think those special conditions
are hard to think about and it would be simpler to just make the thing a
root.
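
A toy tri-color model of the scenario I have in mind, in Python.  It
is a sketch of the hypothesis only and says nothing about what MPS
really does: Obj, shade, scan and the device_struct dictionary are
invented for illustration, and the mutator is allowed to run between
two collector steps without hitting any barrier.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


class Obj:
    """A GC-managed object with a color and outgoing references."""

    def __init__(self, name):
        self.name = name
        self.color = WHITE
        self.refs = []


def shade(obj, gray):
    # Turn a white object gray and queue it for scanning.
    if obj.color == WHITE:
        obj.color = GRAY
        gray.append(obj)


def scan(obj, gray):
    # Shade everything obj points to, then blacken obj.
    for child in obj.refs:
        shade(child, gray)
    obj.color = BLACK


def scenario(struct_is_root):
    frame, font = Obj("frame"), Obj("font")
    device_struct = {"font": font}  # malloc'd: no barrier, not on the GC heap

    gray = []
    shade(frame, gray)  # flip: the frame is reached from a root
    if struct_is_root:
        for ref in device_struct.values():
            shade(ref, gray)  # the struct is scanned as a root

    gray.remove(frame)
    scan(frame, gray)  # collector blackens the frame

    # Mutator: reads a pointer from the unprotected struct and stores
    # it in the black frame; no barrier fires in this model.
    frame.refs.append(device_struct["font"])

    while gray:  # collector finishes the trace
        scan(gray.pop(), gray)
    return font.color  # white objects would be reclaimed
```

Here scenario(False) leaves the font white although the black frame
points to it, and scenario(True) ends with the font black.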

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Thu, 31 Jul 2025 17:17:02 GMT) Full text and rfc822 format available.

Message #50 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Thu, 31 Jul 2025 19:16:27 +0200
Helmut Eller <eller.helmut <at> gmail.com> writes:

> On Thu, Jul 31 2025, Gerd Möllmann wrote:
>
>> BTW, glyph pools are only used for tty frames. Don't know if that plays
>> a role here.
>
> I didn't know that.  And the glyph matrix, is that a tty-only thing
> too?

No, every window has a glyph matrix. On ttys, window glyph matrices are
sub-allocated from a frame matrix, and all matrices are allocated from
glyph pools. On GUIs there are neither frame matrices nor glyph pools,
and windows manage the matrix memory themselves.

See dispnew.c, allocate_matrices_for_window_redisplay, and
allocate_matrices_for_frame_redisplay.

> Anyway, my concern is a bit more general: I think that objects without
> memory barriers, (e.g. structs allocated with malloc) should be scanned
> as roots.
>
> E.g. in fix_frame there is code that uses FRAME_FONT to get the
> address of some field in a device-dependent struct.  This struct is not
> protected by a memory barrier; I think these kinds of structs should be
> roots.
>
> In the MPS paper, section "Phase 4: Black Mutator Tracing", they say
> that gray and white segments are read protected.  I assume that also
> means that, at this point, there are no write barriers.  If the mutator
> can read a pointer to a white object from a struct without read barriers
> and put the white pointer in a black object, then we have a problem.
> That's why I think that structs without memory barriers should be roots.
>
> There might be special circumstances where code like in fix_frame is
> sufficient.  E.g. if for some reason all paths to the font field go
> through the read-protected frame.  But I think those special conditions
> are hard to think about and it would be simpler to just make the thing a
> root.

Hm, could be. And what you say makes sense to me, as a potential
problem. I've never studied the MPS implementation in depth, so I don't
know if they take this into account, and have something for this case.

The only thing I remember from the docs is that one is allowed, while
scanning, in our case in the fix functions, to access non-MPS memory. I
don't think the docs say that one may not scan such non-MPS memory. But
I could be wrong of course, or it's something missing in the docs.

Hm, don't know what's best to do. At least it can't be the reason for
the current case, right?

OTOH, making the pools roots is not a catastrophe either. BTW, the
frame in struct glyph is only used on ttys; it's something for child
frames on ttys. That's why I scan only the pools, from which all
matrices are sub-allocated. 
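
A small Python model of why scanning the pools is enough. It is a
sketch under assumptions, not the dispnew.c or igc.c code: GlyphPool,
Matrix and scan_pool are hypothetical names, and each glyph is reduced
to a dictionary with a single "frame" reference.

```python
class GlyphPool:
    """Owns all glyph storage; each slot models one struct glyph."""

    def __init__(self, nglyphs):
        self.glyphs = [{"frame": None} for _ in range(nglyphs)]


class Matrix:
    """A view into a pool: it owns no glyph storage of its own."""

    def __init__(self, pool, start, length):
        self.pool, self.start, self.length = pool, start, length

    def glyph(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.pool.glyphs[self.start + i]


def scan_pool(pool, fix):
    """Apply fix to every frame reference in the pool; return slots visited."""
    visited = 0
    for g in pool.glyphs:
        if g["frame"] is not None:
            g["frame"] = fix(g["frame"])
        visited += 1
    return visited


def demo():
    pool = GlyphPool(8)
    frame_matrix = Matrix(pool, 0, 8)   # covers the whole pool
    window_matrix = Matrix(pool, 2, 4)  # sub-allocated: shares slots 2..5
    window_matrix.glyph(1)["frame"] = "child-frame"
    visited = scan_pool(pool, fix=lambda ref: ref + "@moved")
    return visited, frame_matrix.glyph(3)["frame"]
```

Each slot is visited once, and the updated reference is seen through
both matrices because they share the pool's storage.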




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 06:00:02 GMT) Full text and rfc822 format available.

Message #53 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 08:59:12 +0300
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  spd <at> toadstyle.org,
>   pipcet <at> protonmail.com,  79116 <at> debbugs.gnu.org
> Date: Thu, 31 Jul 2025 18:46:49 +0200
> 
> Anyway, my concern is a bit more general: I think that objects without
> memory barriers, (e.g. structs allocated with malloc) should be scanned
> as roots.

This is very general, and in that general form quite scary.  Emacs
allocates with malloc all over the place, and most such allocations
have nothing to do with Lisp objects.  So I hope we can have rules
that determine whether a given malloc'ed object needs to be scanned or
not, because otherwise we'll force MPS to scan too much, and lose the
single most important advantage it gives us: that its GC cycles are
fast and don't stop the application code for prolonged periods of
time.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 07:01:02 GMT) Full text and rfc822 format available.

Message #56 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 09:00:40 +0200
On Thu, Jul 31 2025, Gerd Möllmann wrote:

> Hm, don't know what's best to do. At least it can't be the reason for
> the current case, right?

The assertion was:

    AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */

It looks like MPS found a reference to a zone/generation that it was not
expecting to find.  If something is able to by-pass memory barriers it
could create such unexpected references.  Of course, I'm just
speculating.  I don't even know how Pip figured out that this particular
assertion failed.
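
To make my reading of the assertion concrete, here is a toy model in
Python.  It is based only on the names in the AVER and on the
assumption that a summary is a bit set with one bit per zone; the
functions zone_of, refset_add, refset_sub and summary_of, the 64-bit
width and the zone shift of 20 are all illustrative and not taken from
the MPS sources.

```python
WORD_BITS = 64  # assumption: one bit per zone in a machine word


def zone_of(addr, zone_shift):
    """Map an address to a zone number (illustrative only)."""
    return (addr >> zone_shift) % WORD_BITS


def refset_add(refset, addr, zone_shift):
    """Return refset with the zone of addr added."""
    return refset | (1 << zone_of(addr, zone_shift))


def refset_sub(a, b):
    """True if every zone in a is also in b."""
    return (a & ~b) == 0


def summary_of(addrs, zone_shift):
    """Build the summary for a collection of referenced addresses."""
    refset = 0
    for addr in addrs:
        refset = refset_add(refset, addr, zone_shift)
    return refset
```

If the segment's recorded summary was built from references into zones
1 and 3, and a later scan also sees a reference into zone 5, then
refset_sub(scanned, recorded) is false.  That is the kind of situation
in which I would expect the check to fail.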

I would not change anything without a reproducible test case.  Could be
tricky to create the problematic situation: the access must happen after
the flip and not go through read-protected objects.

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 07:12:01 GMT) Full text and rfc822 format available.

Message #59 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 09:11:40 +0200
Helmut Eller <eller.helmut <at> gmail.com> writes:

> On Thu, Jul 31 2025, Gerd Möllmann wrote:
>
>> Hm, don't know what's best to do. At least it can't be the reason for
>> the current case, right?
>
> The assertion was:
>
>     AVER(RefSetSub(ScanStateUnfixedSummary(ss), SegSummary(seg))); /* <design/check/#.common> */
>
> It looks like MPS found a reference to a zone/generation that it was not
> expecting to find.  If something is able to by-pass memory barriers it
> could create such unexpected references.  Of course, I'm just
> speculating.  I don't even know how Pip figured out that this particular
> assertion failed.
>
> I would not change anything without a reproducible test case.  Could be
> tricky to create the problematic situation: the access must happen after
> the flip and not go through read-protected objects.

I think I agree. FWIW, I'm running tty Emacs all the time, so glyph
pools are in use, and I haven't seen anything happening. But who knows...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 07:29:02 GMT) Full text and rfc822 format available.

Message #62 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 09:28:07 +0200
On Fri, Aug 01 2025, Eli Zaretskii wrote:

>> From: Helmut Eller <eller.helmut <at> gmail.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  spd <at> toadstyle.org,
>>   pipcet <at> protonmail.com,  79116 <at> debbugs.gnu.org
>> Date: Thu, 31 Jul 2025 18:46:49 +0200
>> 
>> Anyway, my concern is a bit more general: I think that objects without
>> memory barriers, (e.g. structs allocated with malloc) should be scanned
>> as roots.
>
> This is very general, and in that general form quite scary.  Emacs
> allocates with malloc all over the place, and most such allocations
> have nothing to do with Lisp objects.

If there aren't any Lisp objects involved, then there is no problem.
The problematic cases are (or could be) structs that are malloc'd and
contain references to GC-managed objects.

> So I hope we can have rules
> that determine whether a given malloc'ed object needs to be scanned or
> not, because otherwise we'll force MPS to scan too much, and lose the
> single most important advantage it gives us: that its GC cycles are
> fast and don't stop the application code for prolonged periods of
> time.

Yes, it would require more work during the root scanning phase.

However, performance of igc is already unconvincing: throughput with the
old GC and gc-cons-percentage = 1.0 is better than with igc.  For
latency, we have no benchmarks; so I'm not convinced that igc actually
has lower latency.

Even if igc has lower latency, the current way igc triggers
opportunistic GC doesn't work well: I've often seen "Opportunism: client
predicts plenty of idle time, so start full collection." messages when I
was about to type something.

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 08:14:01 GMT) Full text and rfc822 format available.

Message #65 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 10:12:58 +0200
Helmut Eller <eller.helmut <at> gmail.com> writes:

> However, performance of igc is already unconvincing: throughput with the
> old GC and gc-cons-percentage = 1.0 is better than with igc.  For
> latency, we have no benchmarks; so I'm not convinced that igc actually
> has lower latency.
>
> Even if igc has lower latency, the current way igc triggers
> opportunistic GC doesn't work well: I've often seen "Opportunism: client
> predicts plenty of idle time, so start full collection." messages when I
> was about to type something.

For me, neither throughput nor latency are important. What I care about
is pause times in interactive use.

I've been using igc for > 1 year, and I am now running emacs-mac
for 10 days, with the old GC. GC-wise, emacs-mac is definitely a step
back :-).





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 10:22:02 GMT) Full text and rfc822 format available.

Message #68 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 13:21:31 +0300
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Cc: gerd.moellmann <at> gmail.com,  spd <at> toadstyle.org,  pipcet <at> protonmail.com,
>   79116 <at> debbugs.gnu.org
> Date: Fri, 01 Aug 2025 09:28:07 +0200
> 
> On Fri, Aug 01 2025, Eli Zaretskii wrote:
> 
> >> From: Helmut Eller <eller.helmut <at> gmail.com>
> >> Cc: Eli Zaretskii <eliz <at> gnu.org>,  spd <at> toadstyle.org,
> >>   pipcet <at> protonmail.com,  79116 <at> debbugs.gnu.org
> >> Date: Thu, 31 Jul 2025 18:46:49 +0200
> >> 
> >> Anyway, my concern is a bit more general: I think that objects without
> >> memory barriers, (e.g. structs allocated with malloc) should be scanned
> >> as roots.
> >
> > This is very general, and in that general form quite scary.  Emacs
> > allocates with malloc all over the place, and most such allocations
> > have nothing to do with Lisp objects.
> 
> If there aren't any Lisp objects involved, then there is no problem.
> The problematic cases are (or could be) structs that are malloc'd and
> contain references to GC-managed objects.

What is a "GC-managed object", for this purpose?  How can one
determine whether a given object is or isn't GC-managed?

> > So I hope we can have rules
> > that determine whether a given malloc'ed object needs to be scanned or
> > not, because otherwise we'll force MPS to scan too much, and lose the
> > single most important advantage it gives us: that its GC cycles are
> > fast and don't stop the application code for prolonged periods of
> > time.
> 
> Yes, it would require more work during the root scanning phase.
> 
> However, performance of igc is already unconvincing: throughput with the
> old GC and gc-cons-percentage = 1.0 is better than with igc.

Did you try to run interactively with gc-cons-percentage = 1.0?  If
you did, can you share the experience?

> For latency, we have no benchmarks; so I'm not convinced that igc
> actually has lower latency.

My anecdotal evidence from running the igc branch is unambiguous: it
is significantly less "stuttering" than the master branch.

> Even if igc has lower latency, the current way igc triggers
> opportunistic GC doesn't work well: I've often seen "Opportunism: client
> predicts plenty of idle time, so start full collection." messages when I
> was about to type something.

So maybe some tuning is in order?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 11:45:02 GMT) Full text and rfc822 format available.

Message #71 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 13:44:08 +0200
[Message part 1 (text/plain, inline)]
On Fri, Aug 01 2025, Eli Zaretskii wrote:

[...]
>> If there aren't any Lisp objects involved, then there is no problem.
>> The problematic cases are (or could be) structs that are malloc'd and
>> contain references to GC-managed objects.
>
> What is a "GC-managed object", for this purpose?  How can one
> determine whether a given object is or isn't GC-managed?

Objects that are allocated on the GC heap and automatically freed are
GC-managed.

[...]
>> Yes, it would require more work during the root scanning phase.
>> 
>> However, performance of igc is already unconvincing: throughput with the
>> old GC and gc-cons-percentage = 1.0 is better than with igc.
>
> Did you try to run interactively with gc-cons-percentage = 1.0?  If
> you did, can you share the experience?

No.  I usually run igc with an MPS debug build; it has much longer and
noticeable GC pauses than a regular build.

However, I have a bunch of benchmarks and those are executed inside GNU
screen [*].  I don't claim that the benchmarks are good or relevant or
anything.  For the longest time I didn't even know that batch mode uses
a different gc-cons-percentage.  Doh!  The results, with all its
badness, are:

[real.svg.gz (application/gzip, attachment)]
[rss_max.svg.gz (application/gzip, attachment)]
[Message part 4 (text/plain, inline)]
The master-X versions are for gc-cons-percentage = X.  wgc is a branch
with a half finished Whipped GC.

>> For latency, we have no benchmarks; so I'm not convinced that igc
>> actually has lower latency.
>
> My anecdotal evidence from running the igc branch is unambiguous: it
> is significantly less "stuttering" than the master branch.

Is that with gc-cons-percentage = 0.1?

>> Even if igc has lower latency, the current way igc triggers
>> opportunistic GC doesn't work well: I've often seen "Opportunism: client
>> predicts plenty of idle time, so start full collection." messages when I
>> was about to type something.
>
> So maybe some tuning is in order?

Definitely.

Helmut

[*] https://github.com/ellerh/igc-benchmarks

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 11:53:02 GMT) Full text and rfc822 format available.

Message #74 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 13:52:30 +0200
Helmut Eller <eller.helmut <at> gmail.com> writes:

> The master-X versions are for gc-cons-percentage = X.  wgc is a branch
> with a half finished Whipped GC.

Care to tell more about that Whipped GC? I don't think I've ever heard
that term.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 11:59:02 GMT) Full text and rfc822 format available.

Message #77 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Stéphane Marks <shipmints <at> gmail.com>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 Helmut Eller <eller.helmut <at> gmail.com>, pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 1 Aug 2025 07:58:31 -0400
[Message part 1 (text/plain, inline)]
On Fri, Aug 1, 2025 at 7:54 AM Gerd Möllmann <gerd.moellmann <at> gmail.com>
wrote:

> Helmut Eller <eller.helmut <at> gmail.com> writes:
>
> > The master-X versions are for gc-cons-percentage = X.  wgc is a branch
> > with a half finished Whipped GC.
>
> Care to tell more about that Whipped GC? I don't think I've ever heard
> that term.
>

I'm guessing he means the guile whippet gc https://github.com/wingo/whippet
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 12:17:02 GMT) Full text and rfc822 format available.

Message #80 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stéphane Marks <shipmints <at> gmail.com>
Cc: spd <at> toadstyle.org, Eli Zaretskii <eliz <at> gnu.org>, 79116 <at> debbugs.gnu.org,
 Helmut Eller <eller.helmut <at> gmail.com>, pipcet <at> protonmail.com
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 14:16:03 +0200
Stéphane Marks <shipmints <at> gmail.com> writes:

> On Fri, Aug 1, 2025 at 7:54 AM Gerd Möllmann <gerd.moellmann <at> gmail.com> wrote:
>
>  Helmut Eller <eller.helmut <at> gmail.com> writes:
>
>  > The master-X versions are for gc-cons-percentage = X.  wgc is a branch
>  > with a half finished Whipped GC.
>
>  Care to tell more about that Whipped GC? I don't think I've ever heard
>  that term.
>
> I'm guessing he means the guile whippet gc https://github.com/wingo/whippet

Ah, thanks, that makes sense. Forgot about that one.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 12:30:02 GMT) Full text and rfc822 format available.

Message #83 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 15:28:28 +0300
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Cc: gerd.moellmann <at> gmail.com,  spd <at> toadstyle.org,  pipcet <at> protonmail.com,
>   79116 <at> debbugs.gnu.org
> Date: Fri, 01 Aug 2025 13:44:08 +0200
> 
> >> If there aren't any Lisp objects involved, then there is no problem.
> >> The problematic cases are (or could be) structs that are malloc'd and
> >> contain references to GC-managed objects.
> >
> > What is a "GC-managed object", for this purpose?  How can one
> > determine whether a given object is or isn't GC-managed?
> 
> Objects that are allocated on the GC heap and automatically freed are
> GC-managed.

That answers the first question I asked, but not the second one, which
is what is important in practice.

> >> Yes, it would require more work during the root scanning phase.
> >> 
> >> However, performance of igc is already unconvincing: throughput with the
> >> old GC and gc-cons-percentage = 1.0 is better than with igc.
> >
> > Did you try to run interactively with gc-cons-percentage = 1.0?  If
> > you did, can you share the experience?
> 
> No.  I usually run igc with an MPS debug build; it has much longer and
> noticeable GC pauses than a regular build.
> 
> However, I have a bunch of benchmarks and those are executed inside GNU
> screen [*].  I don't claim that the benchmarks are good or relevant or
> anything.  For the longest time I didn't even know that batch mode uses
> a different gc-cons-percentage.  Doh!  The results, with all its
> badness, are:

The "real" data seems to contradict what both Gerd and myself see in
interactive usage: the pause times, such as they are, in the igc
branch are significantly shorter, almost as if they didn't exist.

> > My anecdotal evidence from running the igc branch is unambiguous: it
> > is significantly less "stuttering" than the master branch.
> 
> Is that with gc-cons-percentage = 0.1?

I compare "emacs -Q", so yes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 13:32:02 GMT) Full text and rfc822 format available.

Message #86 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 15:31:16 +0200
On Fri, Aug 01 2025, Eli Zaretskii wrote:

>> Objects that are allocated on the GC heap and automatically freed are
>> GC-managed.
>
> That answers the first question I asked, but not the second one, which
> is what is important in practice.

Then I don't know the answer.

[...]
>> However, I have a bunch of benchmarks and those are executed inside GNU
>> screen [*].  I don't claim that the benchmarks are good or relevant or
>> anything.  For the longest time I didn't even know that batch mode uses
>> a different gc-cons-percentage.  Doh!  The results, with all its
>> badness, are:
>
> The "real" data seems to contradict what both Gerd and myself see in
> interactive usage: the pause times, such as they are, in the igc
> branch are significantly shorter, almost as if they didn't exist.

Well, you didn't define, let alone quantify, pause times.

>> > My anecdotal evidence from running the igc branch is unambiguous: it
>> > is significantly less "stuttering" than the master branch.
>> 
>> Is that with gc-cons-percentage = 0.1?
>
> I compare "emacs -Q", so yes.

With gc-cons-percentage = 0.1 the GC runs very often.  Bigger values
would likely have better performance.
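
A rough piece of arithmetic, as a Python sketch.  It assumes that the
old GC starts a collection once the bytes allocated since the last one
exceed the larger of gc-cons-threshold and gc-cons-percentage times the
heap size, that the stock defaults are 800000 bytes and 0.1, and that
the heap size stays constant; the function names are made up for this
example.

```python
def bytes_until_next_gc(heap_bytes, gc_cons_threshold=800_000,
                        gc_cons_percentage=0.1):
    """Allocation budget between two collections (simplified model)."""
    return max(gc_cons_threshold, int(heap_bytes * gc_cons_percentage))


def collections_for(alloc_bytes, heap_bytes, **settings):
    """Number of collections triggered while allocating alloc_bytes."""
    return alloc_bytes // bytes_until_next_gc(heap_bytes, **settings)
```

With a 200 MB heap and 1 GB of allocation this model gives 50
collections for 0.1 and 5 for 1.0, and each collection still has to
trace the whole heap.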

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 14:00:03 GMT) Full text and rfc822 format available.

Message #89 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 16:58:18 +0300
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Cc: gerd.moellmann <at> gmail.com,  spd <at> toadstyle.org,  pipcet <at> protonmail.com,
>   79116 <at> debbugs.gnu.org
> Date: Fri, 01 Aug 2025 15:31:16 +0200
> 
> On Fri, Aug 01 2025, Eli Zaretskii wrote:
> 
> >> Objects that are allocated on the GC heap and automatically freed are
> >> GC-managed.
> >
> > That answers the first question I asked, but not the second one, which
> > is what is important in practice.
> 
> Then I don't know the answer.

I think we will have to define these indications.  Or at least we
should try.  It's important for future maintenance.

> >> However, I have a bunch of benchmarks and those are executed inside GNU
> >> screen [*].  I don't claim that the benchmarks are good or relevant or
> >> anything.  For the longest time I didn't even know that batch mode uses
> >> a different gc-cons-percentage.  Doh!  The results, with all its
> >> badness, are:
> >
> > The "real" data seems to contradict what both Gerd and myself see in
> > interactive usage: the pause times, such as they are, in the igc
> > branch are significantly shorter, almost as if they didn't exist.
> 
> > Well, you didn't define, let alone quantify, pause times.

My estimation is that pauses while scrolling through xdisp.c take a
few hundred milliseconds.

> >> > My anecdotal evidence from running the igc branch is unambiguous: it
> >> > is significantly less "stuttering" than the master branch.
> >> 
> >> Is that with gc-cons-percentage = 0.1?
> >
> > I compare "emacs -Q", so yes.
> 
> With gc-cons-percentage = 0.1 the GC runs very often.  Bigger values
> would likely have better performance.

Which might mean our default value is too low.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79116; Package emacs. (Fri, 01 Aug 2025 14:02:02 GMT) Full text and rfc822 format available.

Message #92 received at 79116 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: gerd.moellmann <at> gmail.com, spd <at> toadstyle.org, pipcet <at> protonmail.com,
 79116 <at> debbugs.gnu.org
Subject: Re: bug#79116: 31.0.50; Crash on IGC build
Date: Fri, 01 Aug 2025 17:00:20 +0300
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Cc: gerd.moellmann <at> gmail.com,  spd <at> toadstyle.org,  pipcet <at> protonmail.com,
>   79116 <at> debbugs.gnu.org
> Date: Fri, 01 Aug 2025 15:31:16 +0200
> 
> On Fri, Aug 01 2025, Eli Zaretskii wrote:
> 
> >> Objects that are allocated on the GC heap and automatically freed are
> >> GC-managed.
> >
> > That answers the first question I asked, but not the second one, which
> > is what is important in practice.
> 
> Then I don't know the answer.

I think we will have to define these indications.  Or at least we
should try.  It's important for future maintenance.

> >> > My anecdotal evidence from running the igc branch is unambiguous: it
> >> > is significantly less "stuttering" than the master branch.
> >> 
> >> Is that with gc-cons-percentage = 0.1?
> >
> > I compare "emacs -Q", so yes.
> 
> With gc-cons-percentage = 0.1 the GC runs very often.  Bigger values
> would likely have better performance.

Which might mean our default value is too low.




This bug report was last modified 16 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.