GNU bug report logs - #31925
'guix substitutes' sometimes hangs on glibc 2.27
Reported by: ludo <at> gnu.org (Ludovic Courtès)
Date: Thu, 21 Jun 2018 11:46:01 UTC
Severity: serious
Tags: unreproducible
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #25 received at 31925 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello Mark,
Thanks for chiming in!
Mark H Weaver <mhw <at> netris.org> skribis:
> Does libgc spawn threads that run concurrently with user threads? If
> so, that would be news to me. My understanding was that incremental
> marking occurs within GC allocation calls, and marking threads are only
> spawned after all user threads have been stopped, but I could be wrong.
libgc launches mark threads as soon as it is initialized, I think.
> The first idea that comes to my mind is that perhaps the finalization
> thread is holding the GC allocation lock when 'fork' is called. The
> finalization thread grabs the GC allocation lock every time it calls
> 'GC_invoke_finalizers'. All ports backed by POSIX file descriptors
> (including pipes) register finalizers and therefore spawn the
> finalization thread and make work for it to do.
In 2.2 there’s scm_i_finalizer_pre_fork that takes care of shutting down
the finalization thread right before fork. So the finalization thread
cannot be blamed, AIUI.
> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
> libgc. You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.
That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby taking the alloc lock just at that time.
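To illustrate the hazard discussed above, here is a minimal sketch that uses a plain pthread mutex as a stand-in for the GC allocation lock. Nothing in it comes from libgc or Guile: `demo_lock`, `holder`, and `child_sees_stale_lock` are hypothetical names, and the semaphores exist only to force the unlucky timing deterministically. It assumes Linux/glibc behavior, where the child receives a copy of the mutex in whatever state it had at the time of the fork.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the GC allocation lock.  */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the helper holds demo_lock */
static sem_t may_release;  /* posted when the helper may drop it */

/* Plays the role of a helper thread caught in the middle of an
   allocation while another thread forks.  */
static void *
holder (void *unused)
{
  (void) unused;
  pthread_mutex_lock (&demo_lock);
  sem_post (&lock_taken);
  sem_wait (&may_release);
  pthread_mutex_unlock (&demo_lock);
  return NULL;
}

/* Return 1 if the child found demo_lock held, 0 if it found it free,
   and -1 on error.  */
static int
child_sees_stale_lock (void)
{
  pthread_t thread;
  pid_t pid;
  int status = 0;
  int err;

  sem_init (&lock_taken, 0, 0);
  sem_init (&may_release, 0, 0);
  err = pthread_create (&thread, NULL, holder, NULL);
  assert (err == 0);
  (void) err;

  /* Wait until the helper definitely owns the lock, then fork.  */
  sem_wait (&lock_taken);
  pid = fork ();
  if (pid == 0)
    /* Only the forking thread exists in the child, so the owner of
       demo_lock is gone and nobody will ever unlock it here.  */
    _exit (pthread_mutex_trylock (&demo_lock) == EBUSY ? 1 : 0);

  /* Parent: let the helper finish, then collect the child's verdict.  */
  sem_post (&may_release);
  pthread_join (thread, NULL);
  if (pid == -1 || waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

Calling `child_sees_stale_lock ()` from a test program should report that the child inherited a held lock. A child that used a blocking `pthread_mutex_lock` instead of `pthread_mutex_trylock` would wait forever, which matches the kind of hang described in this report.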
>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.
Here’s a patch:
[Message part 2 (text/x-patch, inline)]
diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
             "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
        " further behavior unspecified. See \"Processes\" in the\n"
        " manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts. Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released. */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);
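The pattern used by the patch can be sketched outside of Guile with a plain pthread mutex. This is only an illustration under stated assumptions: `sketch_lock`, `call_with_lock`, `busy_helper`, and `fork_under_lock` are hypothetical names, and `call_with_lock` merely imitates the calling convention of `GC_call_with_alloc_lock` (a function pointer plus client data); it is not libgc code. The sketch also relies on glibc allowing the surviving thread in the child to unlock a default mutex it locked before the fork.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the GC allocation lock.  */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_helper;

/* Imitates the shape of GC_call_with_alloc_lock: run FN (DATA) while
   holding the lock, then release it.  */
static void *
call_with_lock (void *(*fn) (void *), void *data)
{
  void *result;

  pthread_mutex_lock (&sketch_lock);
  result = fn (data);
  pthread_mutex_unlock (&sketch_lock);
  return result;
}

/* Same idea as do_fork in the patch: store fork's result via PIDP.  */
static void *
do_fork (void *pidp)
{
  * (pid_t *) pidp = fork ();
  return NULL;
}

/* A helper thread that keeps taking and releasing the lock, like a
   thread that allocates from time to time.  */
static void *
busy_helper (void *unused)
{
  (void) unused;
  while (!atomic_load (&stop_helper))
    {
      pthread_mutex_lock (&sketch_lock);
      usleep (50);
      pthread_mutex_unlock (&sketch_lock);
      usleep (50);
    }
  return NULL;
}

/* Return 0 if the child could take the lock after the fork, 1 if it
   could not, and -1 on error.  */
static int
fork_under_lock (void)
{
  pthread_t thread;
  pid_t pid = -1;
  int status = 0;
  int err;

  atomic_store (&stop_helper, 0);
  err = pthread_create (&thread, NULL, busy_helper, NULL);
  assert (err == 0);
  (void) err;

  /* The forking thread owns the lock across fork, so the helper cannot
     own it; call_with_lock then unlocks in both parent and child.  */
  call_with_lock (do_fork, &pid);
  if (pid == 0)
    _exit (pthread_mutex_trylock (&sketch_lock) == 0 ? 0 : 1);

  atomic_store (&stop_helper, 1);
  pthread_join (thread, NULL);
  if (pid == -1 || waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

Because the lock is owned by the thread that calls `fork`, the copy in the child is owned by the one thread that survives there, and that thread releases it as soon as `call_with_lock` returns. The child can therefore take the lock again, regardless of what the helper was doing in the parent.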
[Message part 3 (text/plain, inline)]
Thoughts?
Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today so I can’t tell if this helps (I let it run more
than 5 minutes with the supposedly-buggy Guile and nothing happened…).
Thanks,
Ludo’.
This bug report was last modified 5 years and 216 days ago.