GNU bug report logs -
#40525
inferior process on core-updates crashes: mmap(PROT_NONE) failed
Message #22 received at 40525 <at> debbugs.gnu.org (full text, mbox):
Christopher Baines <mail <at> cbaines.net> writes:
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Hi Christopher,
>>
>> Christopher Baines <mail <at> cbaines.net> writes:
>>
>>> I've attached a script that when run should reproduce the issue. I
>>> extracted the code relating to lint warnings from the Guix Data
>>> Service. The script attached runs this code twice against the inferior,
>>> once will often be enough to cause it to crash, but twice should
>>> reproduce it more reliably.
>>
>> Thanks a lot.
>>
>> Here’s a backtrace from the core dumped by the inferior:
>
> ...
>
>> It could be an unbounded growth of libgc’s finalizer table or our weak
>> tables as we experienced in <https://bugs.gnu.org/28590>.
>>
>> We should be able to reproduce it with something like:
>>
>> guix time-machine --commit=d523eb5c9c2659cbbaf4eeef3691234ae527ee6a -- \
>> lint -c inputs-should-be-native,license,mirror-url,source-file-name,source-unstable-tarball,derivation,patch-file-names,formatting,synopsis
>>
>> In top one can see that heap usage keeps growing, which may well be a
>> bug in Guix proper rather than in Guile… but it doesn’t crash.
>>
>> I would propose three actions here:
>>
>> 1. Run linters under ‘gcprof’ to see what’s eating memory and hopefully
>> find and address the leak. As a start, maybe just start reducing
>> the list of checkers to see if there’s one of them that’s causing
>> it.
>>
>> The ‘derivation’ checker is definitely responsible for a lot of the
>> heap consumption because of the various caches in (guix packages) &
>> co. Perhaps add calls to ‘invalidate-derivation-caches!’ as in
>> (gnu ci).
>>
>> 2. Work around the problem in Guix Data Service by running, say, one
>> inferior per checker instead of one inferior for all checkers for
>> all packages.
>>
>> 3. If #1 didn’t help, let’s see if we can isolate a Guile weak-table
>> bug or something like that.
>>
>> Thoughts?
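
A minimal sketch of the "reduce the list of checkers" idea from action 1, which also mirrors the one-process-per-checker workaround in action 2. It reuses the commit and checker names from the command quoted above. The DRY_RUN switch and the function names are assumptions made for illustration; by default the script only prints the commands it would run, so heap growth can then be watched in top for each checker separately.

```shell
#!/bin/sh
# Sketch only: run every checker from the quoted command in its own
# `guix lint` process so memory growth can be attributed to one checker.
# DRY_RUN is a hypothetical switch (default 1): print commands, run nothing.

COMMIT=d523eb5c9c2659cbbaf4eeef3691234ae527ee6a
CHECKERS="inputs-should-be-native license mirror-url source-file-name source-unstable-tarball derivation patch-file-names formatting synopsis"

# Print the command line that lints all packages with a single checker.
lint_command() {
    printf 'guix time-machine --commit=%s -- lint -c %s\n' "$COMMIT" "$1"
}

# Loop over the checkers, one process per checker.
run_each_checker() {
    for checker in $CHECKERS; do
        cmd=$(lint_command "$checker")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            echo "== $checker ==" >&2
            # Lint output is discarded; only memory behaviour is of interest.
            $cmd > /dev/null
        fi
    done
}

run_each_checker
```

Setting DRY_RUN=0 runs the commands for real. A checker whose process keeps growing while the others stay flat would be the first candidate to look at.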
>
> Thanks, that's useful to know.
>
> I think I've now managed to find a way of reproducing this without the
> inferior getting in the way. I was testing if triggering garbage
> collection in Guile would help avoid the problem, but actually it seems
> to cause it. I guess given the mentions of GC in the above stacktrace,
> and the major version change of libgc, some GC related bug seems quite
> likely here.
>
> I've been testing with a checkout of Guix built with Guix from the
> core-updates branch. I think that provides the same broken Guile that
> the guix repl is using.
>
> When trying to just use a checkout of the core-updates branch, and guile
> built from that branch I get the following odd error:
>
> → ./pre-inst-env /gnu/store/18hp7flyb3yid3yp49i6qcdq0sbi5l1n-guile-3.0.2/bin/guile ./reproduce-core-updates-mmap-PROT_NONE-failed.scm
> guile: warning: failed to install locale
> warning: failed to load '(gnu packages abiword)': Function not implemented
> error: git-fetch: unbound variable
> hint: Did you forget `(use-modules (guix git-download))'?
>
> error: git-version: unbound variable
>
>
>
> No idea what's happening there, but when I ./configure and make with
> packages from core-updates, I seem to end up with a setup that works:
>
> This is the guile I'm using: /gnu/store/18hp7flyb3yid3yp49i6qcdq0sbi5l1n-guile-3.0.2/bin/guile
>
> If you just run the script, you should see:
>
> → ./pre-inst-env guile ./reproduce-core-updates-mmap-PROT_NONE-failed.scm
>
> ;;; ("%package-table-setup" #<hash-table 7f5f329278a0 13275/28099>)
> mmap(PROT_NONE) failed
> Aborted
>
>
> For more information, you can pipe the script to the REPL. What you
> should see is that it's slow to compute the lint warnings the first
> time, but the subsequent times are quick, and it crashes in one of the
> (gc) calls.
>
> I'm going to try to keep looking into this; at least it'll be
> easier to delve into Guile now that I can directly control which guile
> is used.

Following up on this, I've built Guile on core-updates with libgc@7
rather than libgc@8 (which is what's used above), and I can't reproduce
the issue. So I'm increasingly confident that this is a regression
introduced by the libgc upgrade.

Would it be feasible to keep Guile, or at least the Guile that Guix uses,
on libgc@7 for now?
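
One possible way to set up the same comparison is sketched below. It is not necessarily how the builds above were made. It assumes that both libgc@7 and libgc@8 are available as package specifications in the Guix being used, and it relies on the `--with-input` package transformation option of `guix build`. DRY_RUN and the function name are hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: build Guile against a chosen libgc major version using a
# package transformation. Assumes libgc@7 and libgc@8 both resolve to
# packages in the Guix in use.
# DRY_RUN is a hypothetical switch (default 1): print commands, run nothing.

# Print the build command for Guile with libgc replaced by libgc@VERSION.
guile_build_command() {
    printf 'guix build guile --with-input=libgc=libgc@%s\n' "$1"
}

for version in 7 8; do
    cmd=$(guile_build_command "$version")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$cmd"
    else
        # Prints the store path of the resulting Guile on success.
        $cmd
    fi
done
```

Running the reproducer script with each resulting guile would then show whether the crash follows the libgc version.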
This bug report was last modified 4 years and 11 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.