Package: emacs;
Reported by: Óscar Fuentes <oscarfv <at> eclipso.eu>
Date: Mon, 3 Mar 2025 04:33:04 UTC
Severity: normal
Found in version 31.0.50
From: Pip Cet <pipcet <at> protonmail.com>
To: Óscar Fuentes <oscarfv <at> eclipso.eu>, Gerd Möllmann <gerd.moellmann <at> gmail.com>, Helmut Eller <eller.helmut <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 76705 <at> debbugs.gnu.org
Subject: bug#76705: 31.0.50; igc: crash
Date: Tue, 04 Mar 2025 19:04:34 +0000
Gerd, Helmut, Eli:

See https://github.com/Ravenbrook/mps/issues/304 for the description of a Linux-specific MPS issue that currently causes hard crashes when we create "too many" mappings for the Linux kernel (this is purely a kernel issue, so no GNU/ prefix here), which causes 'mprotect' to fail with ENOMEM.

We have to find a workaround for this. Increasing ARENA_GRAIN_SIZE may delay running into the problem, or there might be a way to handle a failed mprotect call and at least recover the Emacs session. Ultimately, though, the maximum safe size of an Emacs session appears to be ARENA_GRAIN_SIZE * /proc/sys/vm/max_map_count, or about 512 MB of MPS memory (minus whatever mappings libraries, malloc, and mmap require). That's obviously not sufficient.

One open question is whether Linux is actually capable of merging adjacent mappings when they're 'mprotect'ed to have the same permissions. Looking at /proc/<emacs PID>/maps shows entries like these:

  7fff7044a000-7fff70462000 rwxp 00000000 00:00 0
  7fff70462000-7fff70463000 ---p 00000000 00:00 0
  7fff70463000-7fff7047a000 rwxp 00000000 00:00 0
  7fff7047a000-7fff7049e000 rwxp 00000000 00:00 0
  7fff7049e000-7fff704b0000 rwxp 00000000 00:00 0

The last three entries are adjacent and have identical permissions but are still listed separately. This appears to indicate that at least some mappings cannot be merged that way. However, calling M-x igc-collect significantly reduces the line count of this /proc file, indicating that mappings are sometimes merged.

I'll have a closer look at the MPS code to come up with a plausible workaround, but any ideas would be appreciated!

Pip Cet <pipcet <at> protonmail.com> writes:

> Óscar Fuentes <oscarfv <at> eclipso.eu> writes:
>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>>>> $ objdump -h dump | grep " load" | wc
>>>> 9237 64659 747221
>>>
>>> Hmm. Is it possible that increases by a factor of 6 during a collection?
>>> How large is the core file?
>>
>> $ ls -lh
>> total 2.8G
>> -rw-rw-r-- 1 oscar oscar 2.8G Mar 3 16:46 dump
>
> That looks potentially large enough for my theory that we ran into
> the max_map_count limit.
>
> MPS_ARGS_ADD (args, MPS_KEY_ARENA_GRAIN_SIZE, 64 * 1024);
>
> wouldn't fix the problem, but by extrapolation we'd only hit it when the
> Emacs session reaches ~20 GB :-)
>
> (I originally added that line to my local tree for performance reasons,
> but I'm not sure about the impact on weak objects in particular.)
>
> If you have the time and inclination, you could reduce
> /proc/sys/vm/max_map_count and see what kind of Emacs crash you get, but
> as it's a system-wide property, that might not be a good idea if other
> important stuff is on the machine...

So I decided to experiment with this on a VM, and I'm now convinced that this is what happens. I've reported this bug on GitHub at https://github.com/Ravenbrook/mps/issues/304. A complete copy follows, because GitHub sometimes refuses to let you even see posts without an account, so we should document things in public forums as well as in GitHub's closed database.

While running out of maps is an unfortunate situation, it is not necessarily unfixable. If Linux coalesces maps that become identical again, we can just eagerly trigger memory barriers until we're back in a situation where not too many maps exist. Some care needs to be taken to ensure we don't call 'malloc' while doing so, because 'malloc' can itself create new mappings.

However, it seems reasonable to ultimately contact the Linux kernel folks about setting this limit to a more reasonable value (and working around whatever optimizations depend on it being so low), or to ask the more popular distros to do the same.

Here's the GitHub report:

(I'm working on the feature/igc branch of GNU Emacs, which adds MPS as a garbage collector.
You haven't heard much from us since things are going very well generally, and it's usually our fault when they don't.)

However, we've now seen several reports which indicate that the `mprotect` call in `protix.c`:`ProtSet` failed. This leads to unreachable code being reached and an Emacs crash:

```
  /* .assume.mprotect.base */
  result = mprotect((void *)base, (size_t)AddrOffset(base, limit), flags);
  if (MAYBE_HARDENED_RUNTIME && result != 0 && errno == EACCES
      && (flags & PROT_WRITE) && (flags & PROT_EXEC))
  {
    /* Apple Hardened Runtime is enabled, so that we cannot have
     * memory that is simultaneously writable and executable. Handle
     * this by dropping the executable part of the request. See
     * <design/prot#impl.xc.prot.exec> for details. */
    prot_all = PROT_READ | PROT_WRITE;
    result = mprotect((void *)base, (size_t)AddrOffset(base, limit),
                      flags & prot_all);
  }
  if (result != 0)
    NOTREACHED;
```

`protix.txt` says this:

```
_`.fun.set.assume.mprotect`: We assume that the call to
``mprotect()`` always succeeds. We should always call the function
with valid arguments (aligned, references to mapped pages, and with an
access that is compatible with the access of the underlying object).
```

Our current theory is that this assumption is violated on Linux. Linux limits the number of memory mappings a process may have to the value in `/proc/sys/vm/max_map_count`, which is 65530 by default. It appears to work like this: when a single page in a large contiguous mapping is `mprotect`ed to have different permissions from its neighbors, that mapping turns into three mappings for the purposes of the max_map_count limit. Eventually, `mprotect` returns `ENOMEM`.

Experiments with lowering the `max_map_count` limit in a virtual machine appear to be consistent with that theory. Lowering it causes Emacs crashes much like the ones reported by our users.

Pip
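The splitting behavior described in the report can be observed without Emacs or MPS. Below is a minimal, Linux-only C sketch. It assumes a standard `/proc` filesystem, it is not MPS or Emacs code, and every function name in it is hypothetical.

The sketch works as follows:

- It maps 16 ordinary pages as one anonymous mapping.
- It makes every other page inaccessible with `mprotect`.
- It counts the lines of `/proc/self/maps` before and after.

It works at page granularity rather than `ARENA_GRAIN_SIZE`. It also computes the worst-case bound `ARENA_GRAIN_SIZE * /proc/sys/vm/max_map_count` from the message, using a grain size supplied by the caller.

```c
/* Linux-only sketch: watch mprotect split one mapping into many.
   Not MPS or Emacs code; all names here are hypothetical. */
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of lines in /proc/self/maps, i.e. the number of mappings the
   kernel currently counts for this process; -1 on error. */
static long count_mappings(void)
{
  FILE *f = fopen("/proc/self/maps", "r");
  if (!f)
    return -1;
  long lines = 0;
  int c;
  while ((c = fgetc(f)) != EOF)
    if (c == '\n')
      lines++;
  fclose(f);
  return lines;
}

/* The per-process mapping limit, or -1 if it cannot be read. */
static long read_max_map_count(void)
{
  FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
  if (!f)
    return -1;
  long value;
  if (fscanf(f, "%ld", &value) != 1)
    value = -1;
  fclose(f);
  return value;
}

/* Worst case from the message: every arena grain in its own mapping.
   GRAIN is a caller-supplied stand-in for ARENA_GRAIN_SIZE. */
static unsigned long long map_limit_bytes(unsigned long long grain,
                                          long max_map_count)
{
  return grain * (unsigned long long) max_map_count;
}

/* Map NPAGES read/write pages as one mapping, then make every other
   page inaccessible.  Returns how many more mappings the process has
   afterwards, or -1 if mmap or mprotect failed (mprotect fails with
   ENOMEM once the max_map_count limit would be exceeded). */
static long extra_mappings_after_alternating_mprotect(size_t npages)
{
  size_t page = (size_t) sysconf(_SC_PAGESIZE);
  long before = count_mappings();
  char *base = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return -1;

  long result = -1;
  size_t i;
  for (i = 1; i < npages; i += 2)
    if (mprotect(base + i * page, page, PROT_NONE) != 0)
      break;                    /* e.g. ENOMEM: too many mappings */

  if (i >= npages && before >= 0)
    {
      long after = count_mappings();
      if (after >= 0)
        result = after - before;
    }
  munmap(base, npages * page);
  return result;
}
```

Called from a `main` on Linux, `extra_mappings_after_alternating_mprotect(16)` should report about 16 additional mappings, one per page. It may report one or two fewer if the kernel merges an end page with a neighboring mapping. This shows how quickly a fragmented protection pattern uses up the limit.

`map_limit_bytes(8192, 65530)` evaluates to 536821760 bytes, just under 512 MiB. The 8 KiB grain is an assumed value for illustration, not one taken from the igc code.

The open question above, whether mappings merge again once their permissions match, could be probed the same way. Restore `PROT_READ | PROT_WRITE` on the odd pages and count again.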
GNU bug tracking system