From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 21 07:45:30 2018
From: ludo@gnu.org (Ludovic Courtès)
To: bug-guix@gnu.org
Subject: 'guix substitutes' sometimes 
hangs on glibc 2.27
Date: Thu, 21 Jun 2018 13:45:12 +0200
Message-ID: <87bmc4748n.fsf@gnu.org>

Hello Guix!

When downloading a number of substitutes, ‘guix substitute’ sometimes
hangs for me since the switch to glibc 2.27. Anyone else experiencing
this? It’s relatively frequent for me.

The backtrace shows this:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:89
#1  0x00007fbb34bf773c in __pthread_join (threadid=, thread_return=thread_return@entry=0x0) at pthread_join.c:24
#2  0x00007fbb350d7548 in stop_finalization_thread () at finalizers.c:265
#3  0x00007fbb350d7759 in scm_i_finalizer_pre_fork () at finalizers.c:290
#4  0x00007fbb3514f256 in scm_fork () at posix.c:1222
#5  0x00007fbb351477fd in vm_regular_engine (thread=0x7fbb313739d0, vp=0x1569f30, registers=0x52de, resume=884963661) at vm-engine.c:784
#6  0x00007fbb3514ae5a in scm_call_n (proc=0x7fbb355c2030, argv=argv@entry=0x7fff856ae7f8, nargs=nargs@entry=1) at vm.c:1257
#7  0x00007fbb350ceff7 in scm_primitive_eval (exp=exp@entry=0x164fb00) at eval.c:662
#8  0x00007fbb350cf053 in scm_eval (exp=0x164fb00, module_or_state=module_or_state@entry=0x162b140) at eval.c:696
#9  0x00007fbb3511a0b0 in scm_shell (argc=10, argv=0x15f0390) at script.c:454
#10 0x00007fbb350e5add in invoke_main_func (body_data=0x7fff856aed10) at init.c:340
#11 0x00007fbb350c82da in c_body (d=0x7fff856aec50) at continuations.c:422
#12 0x00007fbb351477fd in vm_regular_engine (thread=0x7fbb313739d0, vp=0x1569f30, registers=0x52de, resume=884963661) at vm-engine.c:784
#13 0x00007fbb3514ae5a in scm_call_n (proc=proc@entry=#, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#14 0x00007fbb350cdef9 in scm_call_0 (proc=proc@entry=#) at eval.c:481
#15 0x00007fbb3513a026 in catch (tag=tag@entry=#t, thunk=#, handler=0x15641e0, pre_unwind_handler=0x15641c0) at throw.c:137
#16 0x00007fbb3513a365 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=, handler=, pre_unwind_handler=) at 
throw.c:254
#17 0x00007fbb3513a51f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7fbb350c82d0 , body_data=body_data@entry=0x7fff856aec50, handler=handler@entry=0x7fbb350c8560 , handler_data=handler_data@entry=0x7fff856aec50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 , pre_unwind_handler_data=0x1564b60) at throw.c:377
#18 0x00007fbb350c88c0 in scm_i_with_continuation_barrier (body=body@entry=0x7fbb350c82d0 , body_data=body_data@entry=0x7fff856aec50, handler=handler@entry=0x7fbb350c8560 , handler_data=handler_data@entry=0x7fff856aec50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 , pre_unwind_handler_data=0x1564b60) at continuations.c:360
#19 0x00007fbb350c8955 in scm_c_with_continuation_barrier (func=, data=) at continuations.c:456
#20 0x00007fbb35138c3c in with_guile (base=base@entry=0x7fff856aecb8, data=data@entry=0x7fff856aece0) at threads.c:661
#21 0x00007fbb34e2afb8 in GC_call_with_stack_base (fn=fn@entry=0x7fbb35138bf0 , arg=arg@entry=0x7fff856aece0) at misc.c:1949
#22 0x00007fbb35138fd8 in scm_i_with_guile (dynamic_state=, data=data@entry=0x7fff856aece0, func=func@entry=0x7fbb350e5ac0 ) at threads.c:704
#23 scm_with_guile (func=func@entry=0x7fbb350e5ac0 , data=data@entry=0x7fff856aed10) at threads.c:710
#24 0x00007fbb350e5c72 in scm_boot_guile (argc=argc@entry=7, argv=argv@entry=0x7fff856aee68, main_func=main_func@entry=0x400ce0 , closure=closure@entry=0x0) at init.c:323
#25 0x0000000000400b80 in main (argc=7, argv=0x7fff856aee68) at guile.c:101
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7fbb355cab80 (LWP 21207) "guix substitute" 0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:89
  2    Thread 0x7fbb3342c700 (LWP 
21208) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=, expected=0, futex_word=0x7fbb3504f6ec ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  3    Thread 0x7fbb32c2b700 (LWP 21209) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=, expected=0, futex_word=0x7fbb3504f6ec ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  4    Thread 0x7fbb3242a700 (LWP 21210) ".guix-real" 0x00007fbb34bfc552 in futex_wait_cancelable (private=, expected=0, futex_word=0x7fbb3504f6ec ) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  5    Thread 0x7fbb2eb78700 (LWP 21212) ".guix-real" 0x00007fbb34bffaac in __libc_read (fd=9, buf=buf@entry=0x7fbb2eb77540, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
  6    Thread 0x7fbb31373700 (LWP 21214) "guix substitute" 0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
--8<---------------cut here---------------end--------------->8---

The finalization thread is itself stuck reading from its pipe:

--8<---------------cut here---------------start------------->8---
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fbb31373700 (LWP 21214))]
#0  0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
27      in ../sysdeps/unix/sysv/linux/read.c
(gdb) bt
#0  0x00007fbb34bffaac in __libc_read (fd=5, buf=buf@entry=0x7fbb31372a30, nbytes=nbytes@entry=1) at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x00007fbb350d74d7 in read_finalization_pipe_data (data=0x7fbb31372a30) at finalizers.c:199
#2  0x00007fbb34e30b63 in GC_do_blocking_inner (data=0x7fbb313729f0 "\300t\r5\273\177", context=) at pthread_support.c:1353
#3  0x00007fbb34e25389 in GC_with_callee_saves_pushed (fn=0x7fbb34e30b20 , arg=arg@entry=0x7fbb313729f0 "\300t\r5\273\177") at mach_dep.c:322
#4  0x00007fbb34e2afec in 
GC_do_blocking (fn=fn@entry=0x7fbb350d74c0 , client_data=client_data@entry=0x7fbb31372a30) at misc.c:2061
#5  0x00007fbb3513902a in scm_without_guile (func=0x7fbb350d74c0 , data=0x7fbb31372a30) at threads.c:722
#6  0x00007fbb350d7887 in finalization_thread_proc (unused=) at finalizers.c:212
#7  0x00007fbb350c82da in c_body (d=0x7fbb31372e50) at continuations.c:422
#8  0x00007fbb351477fd in vm_regular_engine (thread=0x5, vp=0x172aea0, registers=0x1, resume=884996780) at vm-engine.c:784
#9  0x00007fbb3514ae5a in scm_call_n (proc=proc@entry=#, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#10 0x00007fbb350cdef9 in scm_call_0 (proc=proc@entry=#) at eval.c:481
#11 0x00007fbb3513a026 in catch (tag=tag@entry=#t, thunk=#, handler=0x1c1ff40, pre_unwind_handler=0x1c1ff00) at throw.c:137
#12 0x00007fbb3513a365 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=, handler=, pre_unwind_handler=) at throw.c:254
#13 0x00007fbb3513a51f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7fbb350c82d0 , body_data=body_data@entry=0x7fbb31372e50, handler=handler@entry=0x7fbb350c8560 , handler_data=handler_data@entry=0x7fbb31372e50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 , pre_unwind_handler_data=0x1564b60) at throw.c:377
#14 0x00007fbb350c88c0 in scm_i_with_continuation_barrier (body=body@entry=0x7fbb350c82d0 , body_data=body_data@entry=0x7fbb31372e50, handler=handler@entry=0x7fbb350c8560 , handler_data=handler_data@entry=0x7fbb31372e50, pre_unwind_handler=pre_unwind_handler@entry=0x7fbb350c83c0 , pre_unwind_handler_data=0x1564b60) at continuations.c:360
#15 0x00007fbb350c8955 in scm_c_with_continuation_barrier (func=, data=) at continuations.c:456
#16 0x00007fbb35138c3c in with_guile (base=base@entry=0x7fbb31372eb8, data=data@entry=0x7fbb31372ee0) at threads.c:661
#17 0x00007fbb34e2afb8 in 
GC_call_with_stack_base (fn=fn@entry=0x7fbb35138bf0 , arg=arg@entry=0x7fbb31372ee0) at misc.c:1949
#18 0x00007fbb35138fd8 in scm_i_with_guile (dynamic_state=, data=, func=) at threads.c:704
#19 scm_with_guile (func=, data=) at threads.c:710
#20 0x00007fbb34bf6567 in start_thread (arg=0x7fbb31373700) at pthread_create.c:463
#21 0x00007fbb3351eeaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
--8<---------------cut here---------------end--------------->8---

The ‘primitive-fork’ call itself presumably comes from
‘decompressed-port’, used in (guix scripts substitute).

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 21 10:11:17 2018
References: <87bmc4748n.fsf@gnu.org>
From: Ricardo Wurmus
To: Ludovic Courtès
Subject: Re: bug#31925: 'guix 
substitutes' sometimes hangs on glibc 2.27
In-reply-to: <87bmc4748n.fsf@gnu.org>
Date: Thu, 21 Jun 2018 16:10:45 +0200
Message-ID: <87tvpws00q.fsf@elephly.net>
Cc: 31925@debbugs.gnu.org

Hi Ludo,

> When downloading a number of substitutes, ‘guix substitute’ sometimes
> hangs for me since the switch to glibc 2.27. Anyone else experiencing
> this? It’s relatively frequent for me.

It has never happened to me.

-- 
Ricardo

From debbugs-submit-bounces@debbugs.gnu.org Mon Jun 25 08:25:44 2018
Date: Mon, 25 Jun 2018 14:25:30 +0200
Message-Id: <87wounqchx.fsf@gnu.org>
To: control@debbugs.gnu.org
From: ludo@gnu.org (Ludovic Courtès)
Subject: control message for bug #31925

severity 31925 serious

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 04 03:04:03 2018
From: ludo@gnu.org (Ludovic Courtès)
To: 31925@debbugs.gnu.org
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
References: <87bmc4748n.fsf@gnu.org>
Date: Wed, 04 Jul 2018 09:03:53 +0200
In-Reply-To: <87bmc4748n.fsf@gnu.org> ("Ludovic Courtès"'s message of "Thu, 21 Jun 2018 13:45:12 +0200")
Message-ID: <874lhffpnq.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

ludo@gnu.org (Ludovic Courtès) skribis:

> #0  0x00007fbb34bf794d in __GI___pthread_timedjoin_ex (threadid=140441961314048, thread_return=thread_return@entry=0x0, abstime=abstime@entry=0x0, block=block@entry=true)
>     at pthread_join_common.c:89
> #1  0x00007fbb34bf773c in __pthread_join (threadid=, thread_return=thread_return@entry=0x0) at pthread_join.c:24
> #2  0x00007fbb350d7548 in stop_finalization_thread () at finalizers.c:265
> #3  0x00007fbb350d7759 in scm_i_finalizer_pre_fork () at finalizers.c:290
> #4  0x00007fbb3514f256 in scm_fork () at posix.c:1222

Here’s a reproducer that works quite well (it hangs within a couple of
minutes):

--=-=-=
Content-Type: text/x-scheme
Content-Disposition: inline; filename=finalization-thread-proc.scm
Content-Description: the code

(use-modules (guix utils)
             (ice-9 ftw)
             (ice-9 match)
             (srfi srfi-1)
             (srfi srfi-26)
             (rnrs io ports))

(define infodir
  (string-append (getenv "HOME") "/.guix-profile/share/info/"))

(define files
  (apply circular-list
         (map (cut string-append infodir <>)
              (scandir infodir
                       (lambda (file)
                         (string-suffix? ".gz" file))))))

(sigaction SIGALRM
  (lambda _
    (alarm 1)))
(alarm 1)

(let loop ((files files)
           (n 0))
  (match files
    ((file . tail)
     (call-with-input-file file
       (lambda (port)
         (call-with-decompressed-port 'gzip port
           (lambda (port)
             (let loop ()
               (unless (eof-object? 
(get-bytevector-n port 777))
                 (loop)))))))
     ;; (pk 'loop n file)
     (display ".")
     (loop tail (+ n 1)))))

--=-=-=
Content-Type: text/plain; charset=utf-8

Ludo’.

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 04 12:59:22 2018
From: ludo@gnu.org (Ludovic Courtès)
To: 31925@debbugs.gnu.org, Andy Wingo
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org>
Date: Wed, 04 Jul 2018 18:58:30 +0200
In-Reply-To: <874lhffpnq.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 04 Jul 2018 09:03:53 +0200")
Message-ID: <87tvpfaqfd.fsf@gnu.org>

(+Cc: Andy as the ultimate authority for all these things. :-))

ludo@gnu.org (Ludovic Courtès) skribis:

> (let loop ((files files)
>            (n 0))
>   (match files
>     ((file . tail)
>      (call-with-input-file file
>        (lambda (port)
>          (call-with-decompressed-port 'gzip port
>            (lambda (port)
>              (let loop ()
>                (unless (eof-object? (get-bytevector-n port 777))
>                  (loop)))))))
>      ;; (pk 'loop n file)
>      (display ".")
>      (loop tail (+ n 1)))))

One problem I’ve noticed is that the child process that
‘call-with-decompressed-port’ spawns would be stuck trying to get the
allocation lock:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 ) at ../nptl/pthread_mutex_lock.c:78
#2  0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (fn=fn@entry=0x7f9fd92b0420 , client_data=client_data@entry=0x7ffe4b9a0d80) at misc.c:1929
#3  0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
#4  weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 , hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
#5  scm_c_weak_set_remove_x (set=set@entry=#, raw_hash=, pred=pred@entry=0x7f9fd92b0440 , closure=closure@entry=0x8850c0) at weak-set.c:791
#6  0x00007f9fd92b13b0 in 
scm_weak_set_remove_x (set=#, obj=obj@entry=#) at weak-set.c:812
#7  0x00007f9fd926f72f in close_port (port=#, explicit=<optimized out>) at ports.c:884
#8  0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
#9  0x00007f9fd92afb37 in scm_call_n (proc=0x7f9fd959b030, argv=argv@entry=0x7ffe4b9a1018, nargs=nargs@entry=1) at vm.c:1257
#10 0x00007f9fd9233017 in scm_primitive_eval (exp=, exp@entry=0x855280) at eval.c:662
#11 0x00007f9fd9233073 in scm_eval (exp=0x855280, module_or_state=module_or_state@entry=0x83d140) at eval.c:696
#12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
#13 0x00007f9fd9249a9d in invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
#14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
#15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
#16 0x00007f9fd92afb37 in scm_call_n (proc=proc@entry=#, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
#17 0x00007f9fd9231e69 in scm_call_0 (proc=proc@entry=#) at eval.c:481
#18 0x00007f9fd929e7b2 in catch (tag=tag@entry=#t, thunk=#, handler=0x7950c0, pre_unwind_handler=0x7950a0) at throw.c:137
#19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=, handler=, pre_unwind_handler=) at throw.c:254
#20 0x00007f9fd929ec5f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f9fd922c280 , body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at throw.c:377
#21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (body=body@entry=0x7f9fd922c280 , 
body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
#22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=, data=) at continuations.c:456
#23 0x00007f9fd929d3ec in with_guile (base=base@entry=0x7ffe4b9a14b8, data=data@entry=0x7ffe4b9a14e0) at threads.c:661
#24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (fn=fn@entry=0x7f9fd929d3a0 , arg=arg@entry=0x7ffe4b9a14e0) at misc.c:1949
#25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=, data=data@entry=0x7ffe4b9a14e0, func=func@entry=0x7f9fd9249a80 ) at threads.c:704
#26 scm_with_guile (func=func@entry=0x7f9fd9249a80 , data=data@entry=0x7ffe4b9a1510) at threads.c:710
#27 0x00007f9fd9249c32 in scm_boot_guile (argc=argc@entry=2, argv=argv@entry=0x7ffe4b9a1668, main_func=main_func@entry=0x400cb0 , closure=closure@entry=0x0) at init.c:323
#28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
--8<---------------cut here---------------end--------------->8---

So it seems quite clear that the thing has the alloc lock taken. I
suppose this can happen if one of the libgc threads runs right when we
call fork and takes the alloc lock, right?

If that is correct, the fix would be to call fork within
‘GC_call_with_alloc_lock’.

How does that sound?

As a workaround on the Guix side, we might achieve the same effect by
calling ‘gc-disable’ right before ‘primitive-fork’.

Ludo’.

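[Illustration] The hazard suspected in the message above can be sketched in plain C, independently of Guile and libgc. This is only a sketch under stated assumptions: `alloc_lock` is an ordinary pthread mutex standing in for the GC allocation lock, and `lock_holder`, `must` and `child_sees_lock_busy` are hypothetical names made up for the example, not functions from libgc or Guile. It shows that a child forked while another thread holds a mutex inherits that mutex in the locked state with no thread left to release it, and that taking the mutex in the forking thread first (roughly what calling fork within ‘GC_call_with_alloc_lock’ would amount to) avoids that state.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the GC allocation lock (assumption: a plain mutex). */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pipes used to sequence the helper thread deterministically. */
static int locked_pipe[2];   /* helper -> main: "I hold the lock" */
static int release_pipe[2];  /* main -> helper: "you may unlock"  */

static void must(int ok) { if (!ok) abort(); }

/* Plays the role of a thread that happens to hold the lock when the
   main thread wants to fork. */
static void *lock_holder(void *unused)
{
  char c = 'x';
  (void) unused;
  must(pthread_mutex_lock(&alloc_lock) == 0);
  must(write(locked_pipe[1], &c, 1) == 1);
  must(read(release_pipe[0], &c, 1) == 1);
  must(pthread_mutex_unlock(&alloc_lock) == 0);
  return NULL;
}

/* Fork and report what the child sees: 1 if alloc_lock was already
   taken in the child (a blocking lock would hang forever), else 0.
   With lock_before_fork set, the forking thread owns the lock across
   the fork and both parent and child release it afterwards. */
static int child_sees_lock_busy(int lock_before_fork)
{
  pthread_t helper;
  char c = 'x';
  int status;
  pid_t pid;

  must(pipe(locked_pipe) == 0 && pipe(release_pipe) == 0);
  must(pthread_create(&helper, NULL, lock_holder, NULL) == 0);
  must(read(locked_pipe[0], &c, 1) == 1);      /* helper holds the lock */

  if (lock_before_fork)
    {
      must(write(release_pipe[1], &c, 1) == 1);   /* let the helper finish */
      must(pthread_mutex_lock(&alloc_lock) == 0); /* waits for its unlock  */
    }

  pid = fork();
  if (pid == 0)
    {
      /* Only the forking thread exists in the child. */
      if (lock_before_fork)
        pthread_mutex_unlock(&alloc_lock);
      _exit(pthread_mutex_trylock(&alloc_lock) == EBUSY ? 1 : 0);
    }
  must(pid > 0);

  if (lock_before_fork)
    must(pthread_mutex_unlock(&alloc_lock) == 0);
  else
    must(write(release_pipe[1], &c, 1) == 1);  /* helper unlocks in parent only */

  must(pthread_join(helper, NULL) == 0);
  must(waitpid(pid, &status, 0) == pid);
  close(locked_pipe[0]);
  close(locked_pipe[1]);
  close(release_pipe[0]);
  close(release_pipe[1]);
  assert(WIFEXITED(status));
  return WEXITSTATUS(status);
}
```

On GNU/Linux with glibc, `child_sees_lock_busy(0)` should return 1 and `child_sees_lock_busy(1)` should return 0. Whether this is what actually happens inside libgc in the hang reported here is exactly the open question of the thread, so the sketch illustrates the hypothesis rather than confirming it.
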
From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 04 23:35:18 2018
From: Mark H Weaver
To: ludo@gnu.org (Ludovic Courtès)
Cc: Andy Wingo, 31925@debbugs.gnu.org
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org>
Date: Wed, 04 Jul 2018 23:33:52 -0400
In-Reply-To: <87tvpfaqfd.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 04 Jul 2018 18:58:30 +0200")
Message-ID: <87efgil5jz.fsf@netris.org>

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:

> (+Cc: Andy as the ultimate authority for all these things. :-))
>
> ludo@gnu.org (Ludovic Courtès) skribis:
>
>> (let loop ((files files)
>>            (n 0))
>>   (match files
>>     ((file . 
tail)
>>      (call-with-input-file file
>>        (lambda (port)
>>          (call-with-decompressed-port 'gzip port
>>            (lambda (port)
>>              (let loop ()
>>                (unless (eof-object? (get-bytevector-n port 777))
>>                  (loop)))))))
>>      ;; (pk 'loop n file)
>>      (display ".")
>>      (loop tail (+ n 1)))))
>
> One problem I’ve noticed is that the child process that
> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
> allocation lock:
>
> (gdb) bt
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x00007f9fd8d5cb25 in __GI___pthread_mutex_lock (mutex=0x7f9fd91b3240 ) at ../nptl/pthread_mutex_lock.c:78
> #2  0x00007f9fd8f8ef8f in GC_call_with_alloc_lock (fn=fn@entry=0x7f9fd92b0420 , client_data=client_data@entry=0x7ffe4b9a0d80) at misc.c:1929
> #3  0x00007f9fd92b1270 in copy_weak_entry (dst=0x7ffe4b9a0d70, src=0x759ed0) at weak-set.c:124
> #4  weak_set_remove_x (closure=0x8850c0, pred=0x7f9fd92b0440 , hash=3944337866184184181, set=0x70cf00) at weak-set.c:615
> #5  scm_c_weak_set_remove_x (set=set@entry=#, raw_hash=, pred=pred@entry=0x7f9fd92b0440 , closure=closure@entry=0x8850c0) at weak-set.c:791
> #6  0x00007f9fd92b13b0 in scm_weak_set_remove_x (set=#, obj=obj@entry=#) at weak-set.c:812
> #7  0x00007f9fd926f72f in close_port (port=#, explicit=) at ports.c:884
> #8  0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #9  0x00007f9fd92afb37 in scm_call_n (proc=0x7f9fd959b030, argv=argv@entry=0x7ffe4b9a1018, nargs=nargs@entry=1) at vm.c:1257
> #10 0x00007f9fd9233017 in scm_primitive_eval (exp=, exp@entry=0x855280) at eval.c:662
> #11 0x00007f9fd9233073 in scm_eval (exp=0x855280, module_or_state=module_or_state@entry=0x83d140) at eval.c:696
> #12 0x00007f9fd927e8d0 in scm_shell (argc=2, argv=0x7ffe4b9a1668) at script.c:454
> #13 0x00007f9fd9249a9d in 
invoke_main_func (body_data=0x7ffe4b9a1510) at init.c:340
> #14 0x00007f9fd922c28a in c_body (d=0x7ffe4b9a1450) at continuations.c:422
> #15 0x00007f9fd92ad307 in vm_regular_engine (thread=0x7f9fd91b3240 , vp=0x7adf30, registers=0x0, resume=-657049556) at vm-engine.c:786
> #16 0x00007f9fd92afb37 in scm_call_n (proc=proc@entry=#, argv=argv@entry=0x0, nargs=nargs@entry=0) at vm.c:1257
> #17 0x00007f9fd9231e69 in scm_call_0 (proc=proc@entry=#) at eval.c:481
> #18 0x00007f9fd929e7b2 in catch (tag=tag@entry=#t, thunk=#, handler=0x7950c0, pre_unwind_handler=0x7950a0) at throw.c:137
> #19 0x00007f9fd929ea95 in scm_catch_with_pre_unwind_handler (key=key@entry=#t, thunk=, handler=, pre_unwind_handler=) at throw.c:254
> #20 0x00007f9fd929ec5f in scm_c_catch (tag=tag@entry=#t, body=body@entry=0x7f9fd922c280 , body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at throw.c:377
> #21 0x00007f9fd922c870 in scm_i_with_continuation_barrier (body=body@entry=0x7f9fd922c280 , body_data=body_data@entry=0x7ffe4b9a1450, handler=handler@entry=0x7f9fd922c510 , handler_data=handler_data@entry=0x7ffe4b9a1450, pre_unwind_handler=pre_unwind_handler@entry=0x7f9fd922c370 , pre_unwind_handler_data=0x7a9bc0) at continuations.c:360
> #22 0x00007f9fd922c905 in scm_c_with_continuation_barrier (func=, data=) at continuations.c:456
> #23 0x00007f9fd929d3ec in with_guile (base=base@entry=0x7ffe4b9a14b8, data=data@entry=0x7ffe4b9a14e0) at threads.c:661
> #24 0x00007f9fd8f8efb8 in GC_call_with_stack_base (fn=fn@entry=0x7f9fd929d3a0 , arg=arg@entry=0x7ffe4b9a14e0) at misc.c:1949
> #25 0x00007f9fd929d708 in scm_i_with_guile (dynamic_state=, data=data@entry=0x7ffe4b9a14e0, 
func=func@entry=0x7f9fd9249a80) at threads.c:704
> #26 scm_with_guile (func=func@entry=0x7f9fd9249a80, data=data@entry=0x7ffe4b9a1510) at threads.c:710
> #27 0x00007f9fd9249c32 in scm_boot_guile (argc=argc@entry=2, argv=argv@entry=0x7ffe4b9a1668, main_func=main_func@entry=0x400cb0, closure=closure@entry=0x0) at init.c:323
> #28 0x0000000000400b70 in main (argc=2, argv=0x7ffe4b9a1668) at guile.c:101
> (gdb) info threads
>   Id   Target Id         Frame
> * 1    Thread 0x7f9fd972eb80 (LWP 15573) "guile" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>
> So it seems quite clear that the thing has the alloc lock taken. I
> suppose this can happen if one of the libgc threads runs right when we
> call fork and takes the alloc lock, right?

Does libgc spawn threads that run concurrently with user threads? If
so, that would be news to me. My understanding was that incremental
marking occurs within GC allocation calls, and marking threads are only
spawned after all user threads have been stopped, but I could be wrong.

The first idea that comes to my mind is that perhaps the finalization
thread is holding the GC allocation lock when 'fork' is called. The
finalization thread grabs the GC allocation lock every time it calls
'GC_invoke_finalizers'. All ports backed by POSIX file descriptors
(including pipes) register finalizers and therefore spawn the
finalization thread and make work for it to do.

Another possibility: both the finalization thread and the signal
delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
which also temporarily grabs the GC allocation lock before calling the
specified function. See 'GC_do_blocking_inner' in pthread_support.c in
libgc. You spawn the signal delivery thread by calling 'sigaction' and
you make work for it to do every second when the SIGALRM is delivered.

> If that is correct, the fix would be to call fork within
> ‘GC_call_with_alloc_lock’.
> > How does that sound? Sure, sounds good to me. > As a workaround on the Guix side, we might achieve the same effect by > calling =E2=80=98gc-disable=E2=80=99 right before =E2=80=98primitive-fork= =E2=80=99. I don't think this would help. Thanks, Mark From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 04:01:04 2018 Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 08:01:04 +0000 Received: from localhost ([127.0.0.1]:47201 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fazCK-00005K-8D for submit@debbugs.gnu.org; Thu, 05 Jul 2018 04:01:04 -0400 Received: from pb-sasl1.pobox.com ([64.147.108.66]:56600 helo=sasl.smtp.pobox.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fazCH-000051-V9 for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 04:01:02 -0400 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id A75C4D8C5C; Thu, 5 Jul 2018 04:01:01 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type:content-transfer-encoding; s=sasl; bh=kT6Q9JJiCmW0 OcaRNNz2fD19Cl0=; b=dN0RWOYsuTT2+/kQy5apPAZG0DAsGfFYr/N82wvmjl82 zpT0T5+XcrlozW+KIepsel5jKwpt5qayVgs7j+DYD6LUxyZJeSJolOISLJqka7h2 O5SL4CrGXUAFJ00D98960nDqi3D3lV24v8nihsv5L7AFl5I1Rz1Mws9Pzs8IOoo= Received: from pb-sasl1.nyi.icgroup.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id 663E9D8C57; Thu, 5 Jul 2018 04:01:01 -0400 (EDT) Received: from sparrow (unknown [88.160.190.192]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by pb-sasl1.pobox.com (Postfix) with ESMTPSA id 2F525D8C54; Thu, 5 Jul 2018 04:00:59 -0400 (EDT) From: Andy Wingo To: Mark H Weaver Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> Date: Thu, 05 
Jul 2018 10:00:52 +0200 In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of "Wed, 04 Jul 2018 23:33:52 -0400") Message-ID: <87lgaqjemj.fsf@igalia.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Pobox-Relay-ID: 8E558238-8029-11E8-AAF2-46F7D6707B88-02397024!pb-sasl1.pobox.com X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 31925 Cc: 31925@debbugs.gnu.org, Ludovic =?utf-8?Q?Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) Hi! On Thu 05 Jul 2018 05:33, Mark H Weaver writes: >> One problem I=E2=80=99ve noticed is that the child process that >> =E2=80=98call-with-decompressed-port=E2=80=99 spawns would be stuck tryi= ng to get the >> allocation lock: >> >> So it seems quite clear that the thing has the alloc lock taken. I >> suppose this can happen if one of the libgc threads runs right when we >> call fork and takes the alloc lock, right? > > Does libgc spawn threads that run concurrently with user threads? If > so, that would be news to me. My understanding was that incremental > marking occurs within GC allocation calls, and marking threads are only > spawned after all user threads have been stopped, but I could be wrong. I think Mark is correct. > The first idea that comes to my mind is that perhaps the finalization > thread is holding the GC allocation lock when 'fork' is called. So of course we agree you're only supposed to "fork" when there are no other threads running, I think. As far as the finalizer thread goes, "primitive-fork" calls "scm_i_finalizer_pre_fork" which should join the finalizer thread, before the fork. 
There could be a bug obviously but the intention is for Guile to shut
down its internal threads.

Here's the body of primitive-fork fwiw:

    {
      int pid;
      scm_i_finalizer_pre_fork ();
      if (scm_ilength (scm_all_threads ()) != 1)
        /* Other threads may be holding on to resources that Guile needs --
           it is not safe to permit one thread to fork while others are
           running.

           In addition, POSIX clearly specifies that if a multi-threaded
           program forks, the child must only call functions that are
           async-signal-safe. We can't guarantee that in general. The best
           we can do is to allow forking only very early, before any call to
           sigaction spawns the signal-handling thread. */
        scm_display (scm_from_latin1_string
                     ("warning: call to primitive-fork while multiple threads are running;\n"
                      " further behavior unspecified. See \"Processes\" in the\n"
                      " manual, for more information.\n"),
                     scm_current_warning_port ());
      pid = fork ();
      if (pid == -1)
        SCM_SYSERROR;
      return scm_from_int (pid);
    }

> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
> libgc. You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

The signal thread is a possibility though in that case you'd get a
warning; the signal-handling thread appears in scm_all_threads. Do you
see a warning? If you do, that is a problem :)

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

I don't think this is necessary. I think the problem is that other
threads are running.
If we solve that, then we solve this issue; if we don't solve that, we don't know what else those threads are doing, so we don't know what mutexes and other state they might have. Andy From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 04:35:33 2018 Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 08:35:33 +0000 Received: from localhost ([127.0.0.1]:47219 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fazjg-0000z4-Uj for submit@debbugs.gnu.org; Thu, 05 Jul 2018 04:35:33 -0400 Received: from eggs.gnu.org ([208.118.235.92]:44722) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fazjf-0000yp-JW for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 04:35:32 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fazjX-0002jx-9w for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 04:35:26 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:52876) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fazir-0000jD-Bk; Thu, 05 Jul 2018 04:34:41 -0400 Received: from [193.50.110.150] (port=34120 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1faziq-0002a2-SO; Thu, 05 Jul 2018 04:34:41 -0400 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: Mark H Weaver Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 17 Messidor an 226 de la =?utf-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: 
Thu, 05 Jul 2018 10:34:38 +0200 In-Reply-To: <87efgil5jz.fsf@netris.org> (Mark H. Weaver's message of "Wed, 04 Jul 2018 23:33:52 -0400") Message-ID: <87lgaqgjxd.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 31925 Cc: Andy Wingo , 31925@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hello Mark, Thanks for chiming in! Mark H Weaver skribis: > Does libgc spawn threads that run concurrently with user threads? If > so, that would be news to me. My understanding was that incremental > marking occurs within GC allocation calls, and marking threads are only > spawned after all user threads have been stopped, but I could be wrong. libgc launches mark threads as soon as it is initialized, I think. > The first idea that comes to my mind is that perhaps the finalization > thread is holding the GC allocation lock when 'fork' is called. The > finalization thread grabs the GC allocation lock every time it calls > 'GC_invoke_finalizers'. All ports backed by POSIX file descriptors > (including pipes) register finalizers and therefore spawn the > finalization thread and make work for it to do. In 2.2 there=E2=80=99s scm_i_finalizer_pre_fork that takes care of shutting= down the finalization thread right before fork. So the finalization thread cannot be blamed, AIUI. 
> Another possibility: both the finalization thread and the signal
> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
> which also temporarily grabs the GC allocation lock before calling the
> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
> libgc. You spawn the signal delivery thread by calling 'sigaction' and
> you make work for it to do every second when the SIGALRM is delivered.

That’s definitely a possibility: the signal thread could be allocating
stuff, and thereby taking the alloc lock just at that time.

>> If that is correct, the fix would be to call fork within
>> ‘GC_call_with_alloc_lock’.
>>
>> How does that sound?
>
> Sure, sounds good to me.

Here’s a patch:

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline

diff --git a/libguile/posix.c b/libguile/posix.c
index b0fcad5fd..088e75631 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1209,6 +1209,13 @@ SCM_DEFINE (scm_execle, "execle", 2, 0, 1,
 #undef FUNC_NAME
 
 #ifdef HAVE_FORK
+static void *
+do_fork (void *pidp)
+{
+  * (int *) pidp = fork ();
+  return NULL;
+}
+
 SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
             (),
             "Creates a new \"child\" process by duplicating the current \"parent\" process.\n"
@@ -1236,7 +1243,13 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
        " further behavior unspecified. See \"Processes\" in the\n"
        " manual, for more information.\n"),
        scm_current_warning_port ());
-  pid = fork ();
+
+  /* Take the alloc lock to make sure it is released when the child
+     process starts. Failing to do that the child process could start
+     in a state where the alloc lock is taken and will never be
+     released. */
+  GC_call_with_alloc_lock (do_fork, &pid);
+
   if (pid == -1)
     SCM_SYSERROR;
   return scm_from_int (pid);

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Thoughts?
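The idea behind the patch can be tried outside of Guile. The following is only a minimal sketch, assuming an ordinary pthread mutex as a stand-in for libgc's allocation lock; 'alloc_lock', 'fork_with_lock_held' and 'child_can_lock_after_fork' are hypothetical names, not part of libgc or Guile. Because the forking thread owns the lock across fork, no other thread can own it at that instant, and both processes release their own copy afterwards.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for libgc's allocation lock.  */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Fork while holding alloc_lock, so that no other thread can own it at
   the instant of the fork.  The unlock runs in the parent and in the
   child (and also when fork fails).  */
pid_t
fork_with_lock_held (void)
{
  pthread_mutex_lock (&alloc_lock);
  pid_t pid = fork ();
  pthread_mutex_unlock (&alloc_lock);
  return pid;
}

/* Return 0 if the child was able to take alloc_lock after the fork,
   -1 if forking or waiting failed.  */
int
child_can_lock_after_fork (void)
{
  pid_t pid = fork_with_lock_held ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      /* Child: the lock is free here, so this does not block.  */
      pthread_mutex_lock (&alloc_lock);
      pthread_mutex_unlock (&alloc_lock);
      _exit (0);
    }

  int status;
  if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

This only covers the one lock that is taken explicitly; any other mutex held by another thread at fork time is still copied into the child in its locked state.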
Unfortunately my ‘call-with-decompressed-port’ reproducer doesn’t seem
to reproduce much today so I can’t tell if this helps (I let it run more
than 5 minutes with the supposedly-buggy Guile and nothing happened…).

Thanks,
Ludo’.

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 06:07:28 2018
Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 10:07:28 +0000
Received: from localhost ([127.0.0.1]:47244 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb1Ac-0003FT-Ki for submit@debbugs.gnu.org; Thu, 05 Jul 2018 06:07:28 -0400
Received: from world.peace.net ([64.112.178.59]:38708) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb1Aa-0003FD-3W for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 06:07:24 -0400
Received: from mhw by world.peace.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1fb1AU-0003IK-37; Thu, 05 Jul 2018 06:07:18 -0400
From: Mark H Weaver
To: Andy Wingo
Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27
References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> <87lgaqjemj.fsf@igalia.com>
Date: Thu, 05 Jul 2018 06:05:59 -0400
In-Reply-To: <87lgaqjemj.fsf@igalia.com> (Andy Wingo's message of "Thu, 05 Jul 2018 10:00:52 +0200")
Message-ID: <87601ukneg.fsf@netris.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31925
Cc: 31925@debbugs.gnu.org, Ludovic =?utf-8?Q?Court=C3=A8s?=
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Hi,

Andy Wingo writes:
> On Thu 05 Jul 2018 05:33, Mark H Weaver writes:
>
>>> One problem I’ve noticed is that the child process that
>>> ‘call-with-decompressed-port’ spawns would be stuck trying to get the
>>> allocation lock:
>>>
>>> So it seems quite clear that the thing has the alloc lock taken. I
>>> suppose this can happen if one of the libgc threads runs right when we
>>> call fork and takes the alloc lock, right?
>>
>> Does libgc spawn threads that run concurrently with user threads? If
>> so, that would be news to me. My understanding was that incremental
>> marking occurs within GC allocation calls, and marking threads are only
>> spawned after all user threads have been stopped, but I could be wrong.
>
> I think Mark is correct.

Actually, looking at the libgc code more closely, it seems that Ludovic
is correct. GC_init calls GC_start_mark_threads_inner if PARALLEL_MARK
is defined at compile time.

However, it's also the case that libgc uses 'pthread_atfork' (where
available) to arrange to grab the GC allocation lock as well as the
"mark locks" in the case where parallel marking is enabled. See
fork_prepare_proc, fork_parent_proc, and fork_child_proc in
pthread_support.c.

As the libgc developers admit in the comment above 'fork_prepare_proc',
they are not strictly meeting the requirements for safe use of 'fork',
but they _are_ grabbing the allocation lock during 'fork', at least on
systems that support 'pthread_atfork'.

It looks like setting the GC_MARKERS environment variable to 1 should
result in 'available_markers_m1' being set to 0, and thus effectively
disable parallel marking. In that case, if I understand the code
correctly, no marker threads will be spawned.

It would be interesting to see if this problem can be reproduced when
GC_MARKERS is set to 1.

>> The first idea that comes to my mind is that perhaps the finalization
>> thread is holding the GC allocation lock when 'fork' is called.
>
> So of course we agree you're only supposed to "fork" when there are no
> other threads running, I think.
>
> As far as the finalizer thread goes, "primitive-fork" calls
> "scm_i_finalizer_pre_fork" which should join the finalizer thread,
> before the fork. There could be a bug obviously but the intention is
> for Guile to shut down its internal threads.

Ah, good! I didn't know this. So, I guess the problem is most likely
elsewhere.

We should probably arrange to join the signal delivery thread at the
same time, and then to respawn it in the parent and child if needed.
What do you think?

> Here's the body of primitive-fork fwiw:
>
> {
>   int pid;
>   scm_i_finalizer_pre_fork ();
>   if (scm_ilength (scm_all_threads ()) != 1)

I think there's a race here. I think it's possible for the finalizer
thread to be respawned after 'scm_i_finalizer_pre_fork' in two different
ways:

(1) 'scm_all_threads' performs allocation, which could trigger GC.

(2) Another thread could perform heap allocation and trigger GC after
    'scm_i_finalizer_pre_fork' joins the thread. It might then shut
    down before 'scm_all_threads' is called.

However, these are highly unlikely scenarios, and most likely not the
problem we're seeing here.

Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the
'if', to avoid this race.

>> Another possibility: both the finalization thread and the signal
>> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking',
>> which also temporarily grabs the GC allocation lock before calling the
>> specified function. See 'GC_do_blocking_inner' in pthread_support.c in
>> libgc. You spawn the signal delivery thread by calling 'sigaction' and
>> you make work for it to do every second when the SIGALRM is delivered.
>
> The signal thread is a possibility though in that case you'd get a
> warning; the signal-handling thread appears in scm_all_threads. Do you
> see a warning? If you do, that is a problem :)

Good point!
>>> If that is correct, the fix would be to call fork within >>> =E2=80=98GC_call_with_alloc_lock=E2=80=99. >>> >>> How does that sound? >> >> Sure, sounds good to me. > > I don't think this is necessary. I think the problem is that other > threads are running. If we solve that, then we solve this issue; if we > don't solve that, we don't know what else those threads are doing, so we > don't know what mutexes and other state they might have. On second thought, I agree with Andy here. Grabbing the allocation lock shouldn't be needed, and moreover it's not sufficient. I think we really need to ensure that no other threads are running, in which case grabbing the allocation lock is pointless. Anyway, it seems that libgc is already arranging to do this, at least on modern GNU/Linux systems. Thanks, Mark From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 08:27:35 2018 Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 12:27:35 +0000 Received: from localhost ([127.0.0.1]:47307 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb3ME-0002NS-Os for submit@debbugs.gnu.org; Thu, 05 Jul 2018 08:27:34 -0400 Received: from eggs.gnu.org ([208.118.235.92]:41344) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb3MD-0002NF-2L for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 08:27:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fb3M3-0008NU-2P for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 08:27:27 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:56074) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fb3M2-0008Mc-U4; Thu, 05 Jul 2018 08:27:23 -0400 Received: from [193.50.110.150] (port=35732 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) 
(envelope-from ) id 1fb3M2-00017K-I9; Thu, 05 Jul 2018 08:27:22 -0400 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: Andy Wingo Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> <87lgaqjemj.fsf@igalia.com> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 17 Messidor an 226 de la =?utf-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Thu, 05 Jul 2018 14:27:20 +0200 In-Reply-To: <87lgaqjemj.fsf@igalia.com> (Andy Wingo's message of "Thu, 05 Jul 2018 10:00:52 +0200") Message-ID: <87po01eul3.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 31925 Cc: Mark H Weaver , 31925@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Hello, Andy Wingo skribis: > On Thu 05 Jul 2018 05:33, Mark H Weaver writes: [...] >> Another possibility: both the finalization thread and the signal >> delivery thread call 'scm_without_guile', which calls 'GC_do_blocking', >> which also temporarily grabs the GC allocation lock before calling the >> specified function. See 'GC_do_blocking_inner' in pthread_support.c in >> libgc. You spawn the signal delivery thread by calling 'sigaction' and >> you make work for it to do every second when the SIGALRM is delivered. 
>
> The signal thread is a possibility though in that case you'd get a
> warning; the signal-handling thread appears in scm_all_threads. Do you
> see a warning? If you do, that is a problem :)

I don’t see a warning.

But as a Guile user, I shouldn’t see a warning just because there’s a
signal thread anyway; it’s not a thread I spawned myself.

The weird thing is that the signal thread always exists, and it’s
surprising IMO that it shows up in ‘scm_all_threads’ because it’s not a
“user thread”. The other surprise is that the warning isn’t triggered:

--8<---------------cut here---------------start------------->8---
$ guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (all-threads)
$1 = (# #)
scheme@(guile-user)> (when (zero? (primitive-fork)) (primitive-_exit 0))
;; no warning
--8<---------------cut here---------------end--------------->8---

Ludo’, surprised.
:-) From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 10:04:25 2018 Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 14:04:25 +0000 Received: from localhost ([127.0.0.1]:47890 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb4rx-0004rU-Ab for submit@debbugs.gnu.org; Thu, 05 Jul 2018 10:04:25 -0400 Received: from pb-sasl1.pobox.com ([64.147.108.66]:58072 helo=sasl.smtp.pobox.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb4ru-0004rM-UQ for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 10:04:24 -0400 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id 2CB5CD9D27; Thu, 5 Jul 2018 10:04:22 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=ftjwDegkJr/IIQ/NV2DiucUwiRw=; b=R2S0sh 5bHdi0ZUBCpUaHk4rNb2uq42xJNoKLMhMEI+2EkHGVyyOesC5ivJ9kSZTNwpjfvv 7Dr/qW6ewQOTCDjR6+uflch9eDBR7M4cpAvpYSln/p8SZWEw/2r0/8eTlQqvOsJj jnXqBxEOOdnpVeL5i8xYQ/7gndUYYMwxOoAKs= Received: from pb-sasl1.nyi.icgroup.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id 23918D9D26; Thu, 5 Jul 2018 10:04:22 -0400 (EDT) Received: from sparrow (unknown [88.160.190.192]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by pb-sasl1.pobox.com (Postfix) with ESMTPSA id 37B88D9D24; Thu, 5 Jul 2018 10:04:21 -0400 (EDT) From: Andy Wingo To: Mark H Weaver Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> <87lgaqjemj.fsf@igalia.com> <87601ukneg.fsf@netris.org> Date: Thu, 05 Jul 2018 16:04:12 +0200 In-Reply-To: <87601ukneg.fsf@netris.org> (Mark H. 
Weaver's message of "Thu, 05 Jul 2018 06:05:59 -0400") Message-ID: <874lhdkcdf.fsf@igalia.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 50B7A2E8-805C-11E8-B318-46F7D6707B88-02397024!pb-sasl1.pobox.com X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 31925 Cc: 31925@debbugs.gnu.org, Ludovic =?utf-8?Q?Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) Hi, On Thu 05 Jul 2018 12:05, Mark H Weaver writes: > However, it's also the case that libgc uses 'pthread_atfork' (where > available) to arrange to grab the GC allocation as well as the "mark > locks" in the case where parallel marking is enabled. See > fork_prepare_proc, fork_parent_proc, and fork_child_proc in > pthread_support.c. I don't think this is enabled by default. You have to configure your libgc this way. When investigating similar bugs, I proposed enabling it by default a while ago: http://www.hpl.hp.com/hosted/linux/mail-archives/gc/2012-February/004958.html I ended up realizing that pthread_atfork was just a bogus interface. For one, it turns out that POSIX clearly says that if a multithreaded program forks, the behavior of the child after the fork is undefined if it calls any non-async-signal-safe function before calling exec(): https://lists.gnu.org/archive/html/guile-devel/2012-02/msg00157.html But practically, the only reasonable thing to do with atfork is to grab all of the locks, then release them after forking, in both child and parent. However you can't do this without deadlocking from a library, as the total lock order is a property of the program and not something a library can decide. 
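For readers unfamiliar with the pattern under discussion, here is a minimal sketch of "grab the lock in the prepare handler, release it in both parent and child". It assumes a single hypothetical library-private mutex ('lib_lock'); the handler and helper names are made up for illustration and are not libgc's. With only one lock there is no ordering problem; the deadlock risk described above appears once several libraries each register handlers for their own locks.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library-private lock; not a libgc or Guile name.  */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork.  */
static void prepare_fork (void) { pthread_mutex_lock (&lib_lock); }

/* Runs after fork, once in the parent and once in the child.  */
static void after_fork (void) { pthread_mutex_unlock (&lib_lock); }

/* Register the handlers once; returns 0 on success.  */
int
install_fork_handlers (void)
{
  return pthread_atfork (prepare_fork, after_fork, after_fork);
}

/* Fork and report how the child found lib_lock:
   0 = free, 1 = still held, -1 = fork or wait failed.  */
int
child_sees_lock_free (void)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    _exit (pthread_mutex_trylock (&lib_lock) == 0 ? 0 : 1);

  int status;
  if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```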
There are thus two solutions: either ensure that there are no other threads when you fork, or only call async-signal-safe functions before you exec(). open-process does the latter. fork will warn if the former is not the case. When last I looked into this, I concluded that pthread_atfork doesn't buy us anything, though I could be wrong! >> Here's the body of primitive-fork fwiw: >> >> { >> int pid; >> scm_i_finalizer_pre_fork (); >> if (scm_ilength (scm_all_threads ()) != 1) > > I think there's a race here. I think it's possible for the finalizer > thread to be respawned after 'scm_i_finalizer_pre_fork' in two different > ways: > > (1) 'scm_all_threads' performs allocation, which could trigger GC. > > (2) another thread could perform heap allocation and trigger GC after > 'scm_i_finalizer_pre_fork' joins the thread. it might then shut > down before 'scm_all_thread' is called. > > However, these are highly unlikely scenarios, and most likely not the > problem we're seeing here. > > Still, I think the 'scm_i_finalizer_pre_fork' should be moved after the > 'if', to avoid this race. Good point! Probably we should use some non-allocating scm_i_is_multi_threaded() or something. We can't move the pre-fork thing though because one of the effects we are looking for is to reduce the thread count! 
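As an illustration of the second approach named earlier in this message (calling only async-signal-safe functions between fork and exec), here is a minimal sketch. It is not the code of open-process; 'run_shell_command' is a hypothetical helper, and it assumes that /bin/sh exists on the system. The child calls nothing but execl and _exit, so it never needs a lock that some other thread might have held at fork time.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "/bin/sh -c COMMAND" and return its exit status, or -1 if
   forking or waiting failed.  Assumes /bin/sh exists.  */
int
run_shell_command (const char *command)
{
  pid_t pid = fork ();
  if (pid == -1)
    return -1;
  if (pid == 0)
    {
      /* Child: only async-signal-safe calls from here on; no
         allocation, no stdio, no locks.  */
      execl ("/bin/sh", "sh", "-c", command, (char *) NULL);
      _exit (127);              /* reached only if exec failed */
    }

  int status;
  if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```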
Cheers, Andy From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 05 10:08:56 2018 Received: (at 31925) by debbugs.gnu.org; 5 Jul 2018 14:08:56 +0000 Received: from localhost ([127.0.0.1]:47894 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb4wJ-0004xR-WA for submit@debbugs.gnu.org; Thu, 05 Jul 2018 10:08:56 -0400 Received: from pb-sasl1.pobox.com ([64.147.108.66]:59398 helo=sasl.smtp.pobox.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fb4wI-0004xK-O0 for 31925@debbugs.gnu.org; Thu, 05 Jul 2018 10:08:55 -0400 Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id 9E134D9D43; Thu, 5 Jul 2018 10:08:54 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type:content-transfer-encoding; s=sasl; bh=XxJoUfXj21I4 KiWLPBbA6k/ehaw=; b=LTTcGFif8fZGYBhjGlx74Dqpl3SAZXnKozgDUYWTrxEe 1tzfvtpAeo4AVRVpEQubon+e0W/NH2WMuIT9KXhJ/s46ikdboQ8wofrVdaH328mw ripzQ6uHGgALpWzY08T6BSRy8OmB0EpdZTjwYEbHj/TYLbv6d/Zh1EuBJsJu42Y= Received: from pb-sasl1.nyi.icgroup.com (unknown [127.0.0.1]) by pb-sasl1.pobox.com (Postfix) with ESMTP id 84700D9D42; Thu, 5 Jul 2018 10:08:54 -0400 (EDT) Received: from sparrow (unknown [88.160.190.192]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by pb-sasl1.pobox.com (Postfix) with ESMTPSA id A886BD9D41; Thu, 5 Jul 2018 10:08:53 -0400 (EDT) From: Andy Wingo To: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> <87lgaqjemj.fsf@igalia.com> <87po01eul3.fsf@gnu.org> Date: Thu, 05 Jul 2018 16:08:45 +0200 In-Reply-To: <87po01eul3.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Thu, 05 Jul 2018 14:27:20 
+0200") Message-ID: <87zhz5ixle.fsf@igalia.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Pobox-Relay-ID: F313B676-805C-11E8-868D-46F7D6707B88-02397024!pb-sasl1.pobox.com X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 31925 Cc: Mark H Weaver , 31925@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) On Thu 05 Jul 2018 14:27, ludo@gnu.org (Ludovic Court=C3=A8s) writes: > Hello, > > Andy Wingo skribis: > >> The signal thread is a possibility though in that case you'd get a >> warning; the signal-handling thread appears in scm_all_threads. Do you >> see a warning? If you do, that is a problem :) > > I don=E2=80=99t see a warning. > > But as a Guile user, I shouldn=E2=80=99t see a warning just because there= =E2=80=99s a > signal thread anyway; it=E2=80=99s not a thread I spawned myself. I understand but it's how it works. If we want to change this, probably we need a similar interface as we have with finalization. > The weird thing is that the signal thread always exists, and it=E2=80=99s > surprising IMO that it shows up in =E2=80=98scm_all_threads=E2=80=99 beca= use it=E2=80=99s not a > =E2=80=9Cuser thread=E2=80=9D. The other surprise is that the warning is= n=E2=80=99t triggered: > > $ guile > GNU Guile 2.2.4 > Copyright (C) 1995-2017 Free Software Foundation, Inc. > > Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'. > This program is free software, and you are welcome to redistribute it > under certain conditions; type `,show c' for details. > > Enter `,help' for help. > scheme@(guile-user)> (all-threads) > $1 =3D (# #) > scheme@(guile-user)> (when (zero? 
(primitive-fork)) (primitive-_exit 0)) > ;; no warning Are you certain that this is the signal-handling thread and not the finalizer thread? I suspect it is the finalizer thread, and that it gets properly shut down before the fork. Regarding seeing the warning: do you make some other binding for the default warning port in Guix? Andy From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 06 11:35:36 2018 Received: (at 31925) by debbugs.gnu.org; 6 Jul 2018 15:35:36 +0000 Received: from localhost ([127.0.0.1]:48783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fbSli-000274-V2 for submit@debbugs.gnu.org; Fri, 06 Jul 2018 11:35:36 -0400 Received: from eggs.gnu.org ([208.118.235.92]:40023) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1fbSlh-00026p-4O for 31925@debbugs.gnu.org; Fri, 06 Jul 2018 11:35:33 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fbSlY-00073V-V5 for 31925@debbugs.gnu.org; Fri, 06 Jul 2018 11:35:28 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:33703) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fbSlY-00073L-Jd; Fri, 06 Jul 2018 11:35:24 -0400 Received: from [193.50.110.150] (port=44084 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1fbSlY-0000zj-10; Fri, 06 Jul 2018 11:35:24 -0400 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: Andy Wingo Subject: Re: bug#31925: 'guix substitutes' sometimes hangs on glibc 2.27 References: <87bmc4748n.fsf@gnu.org> <874lhffpnq.fsf@gnu.org> <87tvpfaqfd.fsf@gnu.org> <87efgil5jz.fsf@netris.org> <87lgaqjemj.fsf@igalia.com> <87po01eul3.fsf@gnu.org> <87zhz5ixle.fsf@igalia.com> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 18 
Messidor an 226 de la =?utf-8?Q?R=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Fri, 06 Jul 2018 17:35:22 +0200 In-Reply-To: <87zhz5ixle.fsf@igalia.com> (Andy Wingo's message of "Thu, 05 Jul 2018 16:08:45 +0200") Message-ID: <87fu0wbcn9.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 31925 Cc: Mark H Weaver , 31925@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) Hello Andy, Andy Wingo skribis: > On Thu 05 Jul 2018 14:27, ludo@gnu.org (Ludovic Courtès) writes: > >> Hello, >> >> Andy Wingo skribis: >> >>> The signal thread is a possibility though in that case you'd get a >>> warning; the signal-handling thread appears in scm_all_threads. Do you >>> see a warning? If you do, that is a problem :) >> >> I don’t see a warning. >> >> But as a Guile user, I shouldn’t see a warning just because there’s a >> signal thread anyway; it’s not a thread I spawned myself. > > I understand but it's how it works. If we want to change this, probably > we need a similar interface as we have with finalization. Right, understood. >> scheme@(guile-user)> (all-threads) >> $1 = (# #) >> scheme@(guile-user)> (when (zero? (primitive-fork)) (primitive-_exit 0)) >> ;; no warning > > Are you certain that this is the signal-handling thread and not the > finalizer thread? 
I suspect it is the finalizer thread, and that it > gets properly shut down before the fork. Oh, you must be right. > Regarding seeing the warning: do you make some other binding for the > default warning port in Guix? No. Thanks, Ludo’. From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 19 16:04:26 2019 Received: (at control) by debbugs.gnu.org; 19 Oct 2019 20:04:26 +0000 Received: from localhost ([127.0.0.1]:53575 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iLuxe-00089U-2i for submit@debbugs.gnu.org; Sat, 19 Oct 2019 16:04:26 -0400 Received: from eggs.gnu.org ([209.51.188.92]:40143) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iLuxc-00089G-8o for control@debbugs.gnu.org; Sat, 19 Oct 2019 16:04:24 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:40741) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1iLuxX-0007Vt-64 for control@debbugs.gnu.org; Sat, 19 Oct 2019 16:04:19 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=36954 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1iLuxW-0001Ep-PP for control@debbugs.gnu.org; Sat, 19 Oct 2019 16:04:19 -0400 Date: Sat, 19 Oct 2019 22:04:17 +0200 Message-Id: <87sgnov6wu.fsf@gnu.org> To: control@debbugs.gnu.org From: =?utf-8?Q?Ludovic_Court=C3=A8s?= Subject: control message for bug #31925 MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) tags 31925 unreproducible close 31925 quit From unknown Fri Jun 20 07:18:34 2025 
Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 17 Nov 2019 12:24:11 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator