GNU bug report logs - #74805
30.0.92; Trying to build scratch/igc on Cygwin

Package: emacs;

Reported by: Ken Brown <kbrown <at> cornell.edu>

Date: Wed, 11 Dec 2024 22:56:02 UTC

Severity: normal

Found in version 30.0.92

Done: Ken Brown <kbrown <at> cornell.edu>

Bug is archived. No further changes may be made.

Message #86 received at 74805 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> protonmail.com>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 74805 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Richard Brooksby <rb <at> ravenbrook.com>
Subject: Re: bug#74805: 30.0.92; Trying to build scratch/igc on Cygwin
Date: Mon, 27 Jan 2025 17:40:30 +0000
"Ken Brown" <kbrown <at> cornell.edu> writes:

> On 12/23/2024 6:32 PM, Ken Brown wrote:
>> On 12/22/2024 3:32 AM, Richard Brooksby wrote:
>>> On 2024-12-22 08:24, Richard Brooksby wrote:
>>>> On 2024-12-21 16:56, Ken Brown wrote:
>>>>> On 12/21/2024 2:24 AM, Eli Zaretskii wrote:
>>>>>>> Date: Fri, 20 Dec 2024 18:48:37 -0500
>>>>>>> Cc: 74805 <at> debbugs.gnu.org, Richard Brooksby <rb <at> ravenbrook.com>
>>>>>>> From: Ken Brown <kbrown <at> cornell.edu>
>>>>>>> 3. The "mmap" branch is a straightforward port, mostly imitating the
>>>>>>> FreeBSD port.  It currently (with Cygwin 3.5.5) fails because of a
>>>>>>> limitation of Cygwin's mmap.  But I have a simple patch to Cygwin
>>>>>>> in the
>>>>>>> works that removes that limitation.  With that patch, 37 of the 38
>>>>>>> MPS
>>>>>>> tests pass.  I still need to debug the failing test.  I'm cautiously
>>>>>>> optimistic that I can get this approach to work.  Either way, I
>>>>>>> expect
>>>>>>> the Cygwin patch to soon be available in a test release of Cygwin
>>>>>>> 3.6.0
>>>>>>> so that other Cygwin users can try it.
>>>>>>
>>>>>> Thanks for the update, I think this is very good news.
>>>>>
>>>>> I could use some help from the MPS experts in debugging the failing
>>>>> test, which is arenacv.  I ran the test under strace and didn't see
>>>>> any mmap or munmap failures.  I'm attaching the test log, which
>>>>> doesn't mean a thing to me.  I also built an unoptimized arenacv and
>>>>> can run it under gdb if someone tells me what to look for.
>>>>
>>>> Hello.  I can't offer much direct help just now, but here's where I'd
>>>> start.
>>> ...
>>>  > It's very unlikely that you're actually running out of address
>>> space on a 64-bit system.
>>>
>>> I should add that arenacv is a coverage test that is *trying* to
>>> provoke the ResRESOURCE error paths in some circumstances
>>> https://github.com/Ravenbrook/mps/blob/9fd0577cf1231e61c9801c81499e5d16d0743806/code/arenacv.c#L461
>>> but note that this isn't where the test is failing.  However, it may
>>> be relevant, perhaps if your munmap doesn't successfully free address
>>> space.
>> Thanks!  You've given me some good hints.
> Here's the latest update on my attempts.  The good news is that I've

Sorry for responding to this only now.  I saw Eli's suggestion, thought
that it must have been the reason, and was hoping the problem was fixed.

> been running a build from the igc branch with no noticeable problems.

That is excellent.  Please report any issues!

> The bad news is that I am unable to debug the failing arenacv test.  I
> have no idea how critical this is from the point of view of the igc branch.

I think it would be quite important to find out why it's failing; if it
turns out to be harmless, that's okay.

After thinking about this for a bit, I think we should probably ignore
this test failure for now: most likely there is some fragmentation
issue on Cygwin which causes this unusually aggressive stress test to
fail eventually.  I assume Emacs isn't that aggressive.

> 2. Of the two calls to testAllocAndIterate in testPageTable, it's only
> the second one that fails, provided I apply the following patch:
>
> --- a/code/arenacv.c
> +++ b/code/arenacv.c
> @@ -333,6 +333,7 @@ static void testAllocAndIterate(Arena arena, Pool pool,
>     Count offset, gap, new;
>     ZoneSet zone = (ZoneSet)2;
>     int i;
> +  mps_res_t res;
>
>     LocusPrefInit(&pref);
>
> @@ -353,8 +354,14 @@ static void testAllocAndIterate(Arena arena, Pool pool,
>               "offsetRegion");
>         for(gap = numPerPage+1; gap <= 3 * (numPerPage+1);
>             gap += (numPerPage+1)) {
> -        die(allocator->alloc(&gapRegion, &pref, gap * pageSize, pool),
> -            "gapRegion");
> +       res = allocator->alloc(&gapRegion, &pref, gap * pageSize, pool);
> +       if (res != ResOK) {
> +         fprintf(stdout, "res = %d, gap = %lu, offset = %lu, i = %d\n",
> +                 res, gap, offset, i);
> +         die(res, "gapRegion");
> +       }
> +        /* die(allocator->alloc(&gapRegion, &pref, gap * pageSize, pool), */
> +        /*     "gapRegion"); */
>           die(allocator->alloc(&topRegion, &pref, pageSize, pool),
>               "topRegion");
>           allocator->free(&gapRegion);
> @@ -473,8 +480,8 @@ int main(int argc, char *argv[])
>
>     testlib_init(argc, argv);
>
> -  testPageTable((ArenaClass)mps_arena_class_vm(), TEST_ARENA_SIZE, 0, TRUE);
> -  testPageTable((ArenaClass)mps_arena_class_vm(), TEST_ARENA_SIZE, 0, FALSE);
> +  /* testPageTable((ArenaClass)mps_arena_class_vm(), TEST_ARENA_SIZE, 0, TRUE); */
> +  /* testPageTable((ArenaClass)mps_arena_class_vm(), TEST_ARENA_SIZE, 0, FALSE); */
>
>     block = malloc(TEST_ARENA_SIZE);
>     cdie(block != NULL, "malloc");
>
>
> If I don't apply that patch then the first call fails (and of course the
> second one doesn't get run).  I can't make sense out of that.

So the two calls to testPageTable that succeed but were commented out in
your patch must have left the VM system in a different state?  My theory
is the allocator is running out of VA space or fragmenting it so badly
that it eventually cannot make the (large) allocation.
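
To illustrate what I mean by fragmentation (a standalone toy model,
nothing to do with MPS internals): below, half of a page range is free,
yet a request for four contiguous pages fails, because the free pages
only come in two-page runs.

/* Toy model of address-space fragmentation (not MPS code): half the
   pages end up free, but no four of them are contiguous. */
#include <stdio.h>
#include <string.h>

#define NPAGES 1024
static unsigned char used[NPAGES];

/* First fit: find n contiguous free pages; return the start or -1. */
static int alloc_pages(int n)
{
  for (int start = 0; start + n <= NPAGES; start++) {
    int i;
    for (i = 0; i < n && !used[start + i]; i++)
      ;
    if (i == n) {
      memset(used + start, 1, n);
      return start;
    }
    start += i;  /* jump past the used page we ran into */
  }
  return -1;
}

int main(void)
{
  for (int i = 0; i + 2 <= NPAGES; i += 2)   /* fill with 2-page blocks */
    alloc_pages(2);
  for (int i = 0; i < NPAGES; i += 4)        /* free every other block */
    memset(used + i, 0, 2);
  /* 512 of 1024 pages are free, but this still prints -1: */
  printf("alloc_pages(4) = %d\n", alloc_pages(4));
  return 0;
}

If Cygwin's VM mappings end up laid out like this, a gap request of
thousands of 64 KB pages could fail long before a 64-bit address space
is actually exhausted.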

> 3. After I apply the patch, there are no calls to mmap or munmap
> (according to strace) except an initial call to mmap when the code calls
> malloc.  [This is an internal implementation detail, that Cygwin uses

I *think* that's okay: we're working with the large malloc'd area.

> mmap in malloc.]  If I don't apply that patch, there are lots of calls
> to both, all of which succeed.  I don't understand why the code is not
> calling mmap/munmap when the patch is applied.  How else could it be
> trying to acquire address space?

sbrk, maybe, or it's in that initial malloc...

> 4. When testAllocAndIterate fails, it's at the very first iteration
> through the loops.  In other words, i = 0, offset = 0, and gap =
> numPerPage = 2731.

I don't see how gap can ever equal numPerPage: the loop starts at
numPerPage + 1 and steps by numPerPage + 1, so gap is always a multiple
of numPerPage + 1.

> 5. The failure can be traced to PolicyAlloc() returning ResRESOURCE at
> policy.c:126.  Unfortunately, there are about 7 calls to
> ArenaFreeLandAlloc that have to *all* fail before we reach line 126.  I
> tried stepping through all 7 of them in the failure case, but, being
> completely unfamiliar with the code, I couldn't see what was happening.
>
> 6. Here's the backtrace when we reach policy.c:126 (in a run without the
> patch above):
>
> #0  PolicyAlloc (tractReturn=0x7ffffc760, arena=0x6ffffbff0010,
>      pref=0x7ffffc870, size=178978816, pool=0x6ffffc030248) at policy.c:126

This looks like we ran out of VA space.  Maybe we failed to return
freed VA space to the system, eventually running out of it even on a
64-bit machine?  Can you print the return values in the allocator in
any way?  If you can't, can you make the allocator count successes and
failures and inspect the counters in gdb when it fails?
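
Something like this at the allocation site would do (an untested
sketch; the counter names are my invention, and the alloc call just
mirrors the one in your patch above):

/* Untested sketch: global counters that gdb can inspect after the
   failure, e.g. "print allocCalls" / "print allocFails". */
static unsigned long allocCalls, allocFails;
static mps_res_t lastFailRes;

  /* ... at each allocation site in testAllocAndIterate: */
  res = allocator->alloc(&gapRegion, &pref, gap * pageSize, pool);
  ++allocCalls;
  if (res != ResOK) {
    ++allocFails;
    lastFailRes = res;   /* breakpoint here, then inspect the counters */
  }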

The most unusual thing seems to be that pageSize (the ArenaGrainSize)
is so large: 64 KB.  For what it's worth, that fits the backtrace:
size=178978816 is exactly 2731 pages of 64 KB.  Maybe the large grain
size simply means that TEST_ARENA_SIZE needs to be increased?

Pip