Package: guix;
Reported by: Mathieu Othacehe <othacehe <at> gnu.org>
Date: Fri, 19 Jun 2020 08:42:01 UTC
Severity: important
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Mathieu Othacehe <othacehe <at> gnu.org> To: 41948 <at> debbugs.gnu.org Subject: bug#41948: Shepherd deadlocks Date: Fri, 19 Jun 2020 10:41:14 +0200
[Message part 1 (text/plain, inline)]
Hello, When running "gui-installed-desktop-os-encrypted" test, Shepherd seems to deadlock when restarting "guix-daemon". This can happen at different stages: * In "umount-cow-store" procedure, just before finishing the install. * During "set-http-proxy" tests inside the marionette. This is not always reproducible. In order to gather some information, I created a Shepherd "strace" service that logs what's happening in Shepherd itself (patch attached). It seems that, just after blocking signals, in "fork+exec-command", I guess, Shepherd is taking a lock: --8<---------------cut here---------------start------------->8--- 183553:1 chdir("/") = 0 183554:1 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=4387222}) = 0 183555:1 rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT TERM CHLD], 8) = 0 183560:1 madvise(0x7f179782d000, 12288, MADV_DONTNEED) = 0 183561:1 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=1, tv_nsec=13592357}) = 0 183562:1 clone(child_stack=0x7f17981e6fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[271], tls=0x7f17981e7700, child_tidptr=0x7f17981e79d0) = 271 183563:1 stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0 183564:1 write(15, "shepherd[1]: changing HTTP/HTTPS"..., 86 <unfinished ...> 183566:1 <... write resumed>) = 86 183575:1 getpgid(169) = 169 183576:1 getpgid(0) = 0 183577:1 kill(-169, SIGTERM <unfinished ...> 183579:1 <... kill resumed>) = 0 183582:1 stat("/etc/localtime", {st_mode=S_IFREG|0444, st_size=1920, ...}) = 0 183583:1 write(15, "shepherd[1]: Service guix-daemon"..., 51 <unfinished ...> 183585:1 <... write resumed>) = 51 183594:1 getuid() = 0 183595:1 rt_sigprocmask(SIG_BLOCK, [HUP INT TERM CHLD], [HUP INT TERM CHLD], 8) = 0 183596:1 write(6, "\1", 1 <unfinished ...> 183598:1 <... write resumed>) = 1 183605:1 futex(0x7f17981e79d0, FUTEX_WAIT, 271, NULL <unfinished ...> --8<---------------cut here---------------end--------------->8--- and is then blocking forever. I cannot find any trace of locking in Shepherd, except for "without-automatic-finalization" procedure. Thanks, Mathieu
[0001-strace-service.patch (text/x-diff, inline)]
From 13ac475ad953509cbbf88547e4f72efc08d31055 Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe <othacehe <at> gnu.org> Date: Thu, 18 Jun 2020 14:13:15 +0200 Subject: [PATCH] strace service --- gnu/services/base.scm | 3 +++ gnu/services/linux.scm | 25 ++++++++++++++++++++++++- gnu/system/install.scm | 3 +++ 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/gnu/services/base.scm b/gnu/services/base.scm index 6ea7ef8e7e..d80c6cca6d 100644 --- a/gnu/services/base.scm +++ b/gnu/services/base.scm @@ -34,6 +34,7 @@ #:use-module (guix deprecation) #:use-module (gnu services) #:use-module (gnu services admin) + #:use-module (gnu services linux) #:use-module (gnu services shepherd) #:use-module (gnu system pam) #:use-module (gnu system shadow) ; 'user-account', etc. @@ -2386,6 +2387,8 @@ to handle." (service guix-service-type) (service nscd-service-type) + (service strace-service-type 1) + (service rottlog-service-type) ;; The LVM2 rules are needed as soon as LVM2 or the device-mapper is diff --git a/gnu/services/linux.scm b/gnu/services/linux.scm index 12934c2084..96889dfb7f 100644 --- a/gnu/services/linux.scm +++ b/gnu/services/linux.scm @@ -42,7 +42,9 @@ earlyoom-configuration-send-notification-command earlyoom-service-type - kernel-module-loader-service-type)) + kernel-module-loader-service-type + + strace-service-type)) ;;; @@ -177,3 +179,24 @@ representation." (compose concatenate) (extend append) (default-value '()))) + + +;;; +;;; Strace service. +;;; + +(define strace-service-type + (shepherd-service-type + 'strace + (lambda (pid) + (shepherd-service + (requirement '(root-file-system user-processes)) + (provision (list (string->symbol + (string-append "strace-" (number->string pid))))) + (documentation "Strace a running process.") + (start #~(make-forkexec-constructor + (list #$(file-append strace "/bin/strace") + "-f" "-p" (number->string #$pid) + "-o" (string-append "/var/log/strace-" + (number->string #$pid) ".log")))) + (stop #~(make-kill-destructor)))))) diff --git a/gnu/system/install.scm b/gnu/system/install.scm index d0ff2e7c52..f2f1e9414c 100644 --- a/gnu/system/install.scm +++ b/gnu/system/install.scm @@ -34,6 +34,7 @@ #:use-module (gnu installer) #:use-module (gnu system locale) #:use-module (gnu services dbus) + #:use-module (gnu services linux) #:use-module (gnu services networking) #:use-module (gnu services shepherd) #:use-module (gnu services ssh) @@ -378,6 +379,8 @@ Access documentation at any time by pressing Alt-F2.\x1b[0m ;; regulations-compliant WiFi access. (udev-service #:rules (list lvm2 crda)) + (service strace-service-type 1) + ;; Add the 'cow-store' service, which users have to start manually ;; since it takes the installation directory as an argument. (cow-store-service) -- 2.26.2
[strace-1.log.xz (application/x-xz, attachment)]
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.