GNU bug report logs - #31925
'guix substitutes' sometimes hangs on glibc 2.27


Package: guix;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Thu, 21 Jun 2018 11:46:01 UTC

Severity: serious

Tags: unreproducible

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.



Message #34 received at 31925 <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> igalia.com>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 31925 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
Date: Thu, 05 Jul 2018 16:04:12 +0200

Hi,

On Thu 05 Jul 2018 12:05, Mark H Weaver <mhw <at> netris.org> writes:

> However, it's also the case that libgc uses 'pthread_atfork' (where
> available) to arrange to grab the GC allocation lock as well as the "mark
> locks" in the case where parallel marking is enabled.  See
> fork_prepare_proc, fork_parent_proc, and fork_child_proc in
> pthread_support.c.

I don't think this is enabled by default.  You have to configure your
libgc this way.  When investigating similar bugs, I proposed enabling it
by default a while ago:

  http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-February/004958.html

I ended up realizing that pthread_atfork was just a bogus interface.
For one, it turns out that POSIX clearly says that if a multithreaded
program forks, the behavior of the child after the fork is undefined if
it calls any non-async-signal-safe function before calling exec():

  https://lists.gnu.org/archive/html/guile-devel/2012-02/msg00157.html

But practically, the only reasonable thing to do with atfork is to grab
all of the locks, then release them after forking, in both child and
parent.  However you can't do this without deadlocking from a library,
as the total lock order is a property of the program and not something a
library can decide.
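
To make the atfork pattern concrete, here is a minimal sketch of a library
that protects a single lock of its own.  The names lib_lock,
lib_install_atfork and lib_fork_and_check are hypothetical and are not taken
from libgc or Guile.  It only shows the "grab before fork, release in parent
and child" shape; it does not solve the lock-ordering problem described
above, which appears as soon as a second library installs handlers for its
own locks.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library lock, standing in for e.g. an allocator lock. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): take the lock so that
   no other thread can hold it at the moment the child is created. */
static void lib_prepare(void) { pthread_mutex_lock(&lib_lock); }

/* Run after fork() in the parent and in the child: release the lock. */
static void lib_parent(void) { pthread_mutex_unlock(&lib_lock); }
static void lib_child(void)  { pthread_mutex_unlock(&lib_lock); }

/* Register the three handlers.  Returns 0 on success. */
int lib_install_atfork(void)
{
  return pthread_atfork(lib_prepare, lib_parent, lib_child);
}

/* Fork, let the child take and release lib_lock, and return the child's
   exit status (or -1 on error).  Using a mutex in the child is only
   reasonable here because the handlers left it unlocked. */
int lib_fork_and_check(void)
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    pthread_mutex_lock(&lib_lock);
    pthread_mutex_unlock(&lib_lock);
    _exit(0);
  }
  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would run lib_install_atfork() once at library initialization.
With two such libraries, the order in which their prepare handlers take
locks is fixed by registration order, not by the program's own lock order,
which is where the deadlock risk comes from.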

There are thus two solutions: either ensure that there are no other
threads when you fork, or only call async-signal-safe functions before
you exec().  open-process does the latter.  fork will warn if the former
is not the case.  When last I looked into this, I concluded that
pthread_atfork doesn't buy us anything, though I could be wrong!
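
As an illustration of the second approach, here is a minimal sketch of a
fork-then-exec helper whose child only calls async-signal-safe functions
(execve and _exit).  The function name spawn_and_wait is hypothetical; this
is not the code of open-process, only the general shape under the
assumption that the caller has prepared path and argv before forking.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Fork and exec PATH with ARGV, then wait for the child.
   Returns the child's exit status, or -1 on error. */
int spawn_and_wait(const char *path, char *const argv[])
{
  pid_t pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0) {
    /* Child: no malloc, no stdio, no locks.  Only async-signal-safe
       calls between fork() and exec. */
    execve(path, argv, environ);
    _exit(127);               /* reached only if execve failed */
  }

  int status;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child never touches a lock that another thread of the parent
might have held at fork time, this stays safe even when the parent is
multithreaded.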

>> Here's the body of primitive-fork fwiw:
>>
>>     {
>>       int pid;
>>       scm_i_finalizer_pre_fork ();
>>       if (scm_ilength (scm_all_threads ()) != 1)
>
> I think there's a race here.  I think it's possible for the finalizer
> thread to be respawned after 'scm_i_finalizer_pre_fork' in two different
> ways:
>
> (1) 'scm_all_threads' performs allocation, which could trigger GC.
>
> (2) another thread could perform heap allocation and trigger GC after
>     'scm_i_finalizer_pre_fork' joins the thread.  It might then shut
>     down before 'scm_all_threads' is called.
>
> However, these are highly unlikely scenarios, and most likely not the
> problem we're seeing here.
>
> Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the
> 'if', to avoid this race.

Good point!  Probably we should use some non-allocating
scm_i_is_multi_threaded() or something.  We can't move the pre-fork
thing though because one of the effects we are looking for is to reduce
the thread count!
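
A non-allocating check could look roughly like the sketch below.  It is an
assumption for illustration only: Guile has no such counter that I am
describing here, and the names live_threads, note_thread_start,
note_thread_exit and is_multi_threaded are hypothetical.  The idea is just
that reading an atomic counter cannot trigger GC, unlike building a list of
threads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of live threads; starts at 1 for the main thread. */
static atomic_int live_threads = 1;

/* To be called by each new thread when it starts and when it exits. */
void note_thread_start(void) { atomic_fetch_add(&live_threads, 1); }
void note_thread_exit(void)  { atomic_fetch_sub(&live_threads, 1); }

/* True if more than one thread is currently registered.
   Performs no heap allocation. */
bool is_multi_threaded(void)
{
  return atomic_load(&live_threads) > 1;
}
```

This would only address case (1) from the quoted message; another thread
could still start between the check and the fork.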

Cheers,

Andy




This bug report was last modified 5 years and 217 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.