GNU bug report logs - #76705
31.0.50; igc: crash


Package: emacs;

Reported by: Óscar Fuentes <oscarfv <at> eclipso.eu>

Date: Mon, 3 Mar 2025 04:33:04 UTC

Severity: normal

Found in version 31.0.50


From: Pip Cet <pipcet <at> protonmail.com>
To: Óscar Fuentes <oscarfv <at> eclipso.eu>, Gerd Möllmann <gerd.moellmann <at> gmail.com>, Helmut Eller <eller.helmut <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 76705 <at> debbugs.gnu.org
Subject: bug#76705: 31.0.50; igc: crash
Date: Tue, 04 Mar 2025 19:04:34 +0000

Gerd, Helmut, Eli:

See https://github.com/Ravenbrook/mps/issues/304 for the description of
a Linux-specific MPS issue that currently causes hard crashes when we
create "too many" mappings for the Linux kernel (this is purely a kernel
issue, so no GNU/ prefix here), which causes 'mprotect' to fail with
ENOMEM.

We have to find a workaround for this.

Increasing ARENA_GRAIN_SIZE may delay running into the problem, or there
might be a way to handle a failed mprotect call and at least recover the
Emacs session, but ultimately the maximum safe size of an Emacs session
appears to be ARENA_GRAIN_SIZE * /proc/sys/vm/max_map_count, or about
512 MB of MPS memory (minus whatever mappings libraries, malloc, and
mmap require).  That's obviously not sufficient.
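
For reference, the estimate above can be reproduced with a few lines of
C.  This is only a sketch: the 8 KiB grain size is an assumption
inferred from the "about 512 MB" figure and the default limit of 65530,
and the function names are made up for illustration.

```c
#include <stdio.h>

/* Upper bound on MPS-managed bytes if, in the worst case, every arena
   grain ends up in its own kernel mapping.  */
unsigned long long
worst_case_mps_bytes (unsigned long long grain_size,
                      unsigned long long max_map_count)
{
  return grain_size * max_map_count;
}

/* Read the current limit.  Returns 0 if the file is unavailable
   (for example, on a non-Linux kernel).  */
unsigned long long
read_max_map_count (void)
{
  unsigned long long limit = 0;
  FILE *f = fopen ("/proc/sys/vm/max_map_count", "r");

  if (f == NULL)
    return 0;
  if (fscanf (f, "%llu", &limit) != 1)
    limit = 0;
  fclose (f);
  return limit;
}
```

With those assumed values, worst_case_mps_bytes (8192, 65530) is
536821760 bytes, just under 512 MiB.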

One open question is whether Linux is actually capable of merging
adjacent mappings when they're 'mprotect'ed to have the same
permissions.

Looking at /proc/<emacs PID>/maps shows things like these:

7fff7044a000-7fff70462000 rwxp 00000000 00:00 0
7fff70462000-7fff70463000 ---p 00000000 00:00 0
7fff70463000-7fff7047a000 rwxp 00000000 00:00 0
7fff7047a000-7fff7049e000 rwxp 00000000 00:00 0
7fff7049e000-7fff704b0000 rwxp 00000000 00:00 0

which appears to indicate at least some mappings cannot be merged that
way.  However, the line count of this /proc file is reduced
significantly by calling M-x igc-collect, indicating that sometimes
mappings are merged.
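
To make that observation less dependent on eyeballing the file, here is
a rough sketch that counts adjacent entries which are contiguous and
have identical permission strings, that is, entries that look mergeable
but are listed separately.  The function name is hypothetical, and the
check is only a heuristic: the kernel may keep such mappings apart
because of attributes that this file does not show.

```c
#include <stdio.h>
#include <string.h>

/* Count adjacent pairs of /proc/<pid>/maps lines where the second
   entry starts exactly where the first one ends and both have the
   same permission string.  LINES holds N lines of the file.  */
int
count_unmerged_neighbors (const char *const *lines, int n)
{
  unsigned long long prev_end = 0, start, end;
  char prev_perms[5] = "", perms[5];
  int have_prev = 0, count = 0;

  for (int i = 0; i < n; i++)
    {
      if (sscanf (lines[i], "%llx-%llx %4s", &start, &end, perms) != 3)
        {
          have_prev = 0;        /* unparsable line: restart comparison */
          continue;
        }
      if (have_prev && start == prev_end
          && strcmp (perms, prev_perms) == 0)
        count++;
      prev_end = end;
      strcpy (prev_perms, perms);
      have_prev = 1;
    }
  return count;
}
```

For the five lines quoted above this yields 2: the last three rwxp
entries are contiguous, yet they are reported as separate mappings.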

I'll have a closer look at the MPS code to come up with a plausible
workaround, but any ideas would be appreciated!

Pip Cet <pipcet <at> protonmail.com> writes:

> Óscar Fuentes <oscarfv <at> eclipso.eu> writes:
>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>>>> $ objdump -h dump | grep " load" | wc
>>>>    9237   64659  747221
>>>
>>> Hmm. Is it possible that increases by a factor of 6 during a collection?
>>> How large is the core file?
>>
>> $ ls -lh
>> total 2.8G
>> -rw-rw-r-- 1 oscar oscar 2.8G Mar  3 16:46 dump
>
> That looks potentially large enough for my theory that we ran into
> the max_map_count limit.
>
>     MPS_ARGS_ADD (args, MPS_KEY_ARENA_GRAIN_SIZE, 64 * 1024);
>
> wouldn't fix the problem, but by extrapolation we'd only hit it when the
> Emacs session reaches ~20 GB :-)
>
> (I originally added that line to my local tree for performance reasons,
> but I'm not sure about the impact on weak objects in particular.)
>
> If you have the time and inclination, you could reduce
> /proc/sys/vm/max_map_count and see what kind of Emacs crash you get, but
> as it's a system-wide property, that might not be a good idea if other
> important stuff is on the machine...

So I decided to experiment with this on a VM, and I'm now convinced it's
what happens.  I've reported this bug on GitHub at
https://github.com/Ravenbrook/mps/issues/304 (complete copy follows
because GitHub sometimes will refuse to let you even see posts without
an account, so we should document things in public forums as well as
GitHub's closed database).

While running out of maps is an unfortunate situation, it is not
necessarily unfixable: if Linux coalesces maps that become identical
again, we can just eagerly trigger memory barriers until we're back in a
situation where not too many maps exist.  Some care needs to be taken to
ensure we don't call 'malloc' while doing so, because 'malloc' can
itself create new mappings.
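
A small experiment along these lines might look as follows.  It is a
sketch, not MPS or Emacs code, and both function names are invented.
It counts the lines of /proc/self/maps using only open/read and a stack
buffer (so the counting itself does not call 'malloc'), protects the
middle page of a three-page mapping, and then restores it.  Whether the
last count returns to the first is exactly the coalescing question
raised above, so the sketch only records the numbers.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the lines of /proc/self/maps without calling malloc.
   Returns -1 if the file cannot be read.  */
long
count_self_mappings (void)
{
  char buf[4096];
  long lines = 0;
  ssize_t n;
  int fd = open ("/proc/self/maps", O_RDONLY);

  if (fd < 0)
    return -1;
  while ((n = read (fd, buf, sizeof buf)) > 0)
    for (ssize_t i = 0; i < n; i++)
      if (buf[i] == '\n')
        lines++;
  close (fd);
  return n < 0 ? -1 : lines;
}

/* Map three pages, make the middle one inaccessible, then restore it.
   COUNTS receives the number of mappings before the change, after the
   split, and after the restore.  Returns 0 on success, -1 on error.  */
int
split_and_restore (long counts[3])
{
  long page = sysconf (_SC_PAGESIZE);
  size_t len = 3 * (size_t) page;
  char *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  int ok;

  if (p == MAP_FAILED)
    return -1;
  counts[0] = count_self_mappings ();
  ok = mprotect (p + page, (size_t) page, PROT_NONE) == 0;
  counts[1] = count_self_mappings ();
  ok = ok && mprotect (p + page, (size_t) page,
                       PROT_READ | PROT_WRITE) == 0;
  counts[2] = count_self_mappings ();
  munmap (p, len);
  return ok ? 0 : -1;
}
```

On Linux, counts[1] should be counts[0] + 2, since the one mapping
becomes three.  Comparing counts[2] with counts[0] then shows whether
the kernel merged the pieces again in this simple case.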

However, it seems reasonable to ultimately contact the Linux kernel
folks about setting this limit to a more reasonable value (and working
around whatever optimizations depend on it being so low), or doing the
same thing for the more popular distros.

Here's the GitHub report:

(I'm working on the feature/igc branch of GNU Emacs which adds MPS as a garbage collector. You haven't heard much from us since things are going very well generally, and it's usually our fault when they don't.)

However, we've now seen several reports which indicate the `mprotect` call in `protix.c`:`ProtSet` failed, which leads to unreachable code being reached and an Emacs crash:

```
  /* .assume.mprotect.base */
  result = mprotect((void *)base, (size_t)AddrOffset(base, limit), flags);
  if (MAYBE_HARDENED_RUNTIME && result != 0 && errno == EACCES
      && (flags & PROT_WRITE) && (flags & PROT_EXEC))
  {
    /* Apple Hardened Runtime is enabled, so that we cannot have
     * memory that is simultaneously writable and executable. Handle
     * this by dropping the executable part of the request. See
     * <design/prot#impl.xc.prot.exec> for details. */
    prot_all = PROT_READ | PROT_WRITE;
    result = mprotect((void *)base, (size_t)AddrOffset(base, limit), flags & prot_all);
  }
  if (result != 0)
    NOTREACHED;
```

`protix.txt` says this:

```
_`.fun.set.assume.mprotect`: We assume that the call to ``mprotect()``
always succeeds.  We should always call the function with valid
arguments (aligned, references to mapped pages, and with an access
that is compatible with the access of the underlying object).
```

Our current theory is that this assumption is violated on Linux, which keeps a maximum number of contiguous maps in `/proc/sys/vm/max_map_count`, with the default at 65530. The way it appears to work is that when a single page in a large contiguous map is `mprotect`ed to be different than its neighbors, that map turns into three maps for the purposes of the max_map_count limit, and `mprotect` eventually returns `ENOMEM`.

Experiments with lowering the `max_map_count` limit in a virtual machine appear to be consistent with that theory: doing so causes Emacs crashes much like the ones reported by our users.
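
As a starting point for discussion, a caller could surface the error instead of treating every failure as unreachable. The following is a minimal sketch under that assumption; `try_mprotect` and `may_be_map_limit` are hypothetical helpers and not part of MPS.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not part of MPS: call mprotect and return 0 on
   success or the errno value on failure, so that the caller can decide
   how to react.  */
int
try_mprotect (void *base, size_t size, int prot)
{
  if (mprotect (base, size, prot) == 0)
    return 0;
  return errno;
}

/* Nonzero if ERR is the error this report is about.  Linux also
   returns ENOMEM for ranges that are not mapped, so this cannot by
   itself prove that max_map_count was reached.  */
int
may_be_map_limit (int err)
{
  return err == ENOMEM;
}
```

A caller would still need a recovery strategy once `may_be_map_limit` returns nonzero. The sketch only shows that the condition can be detected before control reaches `NOTREACHED`.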

Pip





This bug report was last modified 162 days ago.
