GNU bug report logs -
#59884
‘gui-installed-desktop-os-encrypted’ test intermittent failures
Previous Next
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Wed, 7 Dec 2022 13:52:01 UTC
Severity: important
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59884 in the body.
You can then email your comments to 59884 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
othacehe <at> gnu.org, bug-guix <at> gnu.org
:
bug#59884
; Package
guix
.
(Wed, 07 Dec 2022 13:52:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
New bug report received and forwarded. Copy sent to
othacehe <at> gnu.org, bug-guix <at> gnu.org
.
(Wed, 07 Dec 2022 13:52:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
The ‘gui-installed-desktop-os-encrypted’ test fails intermittently like
so:
--8<---------------cut here---------------start------------->8---
Dec 7 12:14:01 localhost installer[253]: ^M^[[Kguix system: ^[[1m^[[0mbootloader successfully installed on /dev/vda^M
Dec 7 12:14:01 localhost installer[253]: command ("guix" "system" "init" "--fallback" "--no-grafts" "--no-substitutes" "/mnt/etc/config.scm" "/mnt") succeeded
conversation expecting pattern ((quote installation-complete))
Dec 7 12:14:01 localhost shepherd[1]: Service guix-daemon has been stopped.
Dec 7 12:14:01 localhost shepherd[1]: Service guix-daemon has been started.
Dec 7 12:14:01 localhost installer[253]: crashing due to uncaught exception: system-error ("umount" "~S: ~A" ("/remove" "Device or resource busy") (16))
Dec 7 12:14:01 localhost installer[253]: running form #<newt-form aec220> ("Unexpected problem") with 1 clients
Dec 7 12:14:01 localhost installer[253]: form #<newt-form aec220> ("Unexpected problem"): client 20 replied #t
/gnu/store/2bdm6l1j7siky5vadr62l3ns2z8qz75a-shepherd-marionette.scm:1:1718: ERROR:
1. &pattern-not-matched:
pattern: ((quote installation-complete))
sexp: (contents-dialog (title "Unexpected problem") (text "The installer has encountered an unexpected problem. The backtrace is displayed below. You may choose to exit or create a dump archive.") (content "In ./gnu/installer/steps.scm:\n 150:13 19 (run ((substitutes . #f) (network (select-technology . #<<technology> name: \"Wired\" type: \"ethernet\" powered?: #t connected?: #t>) (power-technology . #<unspecified>) (# . #<<???>) ???) ???) ???)\n 150:13 18 (run ((user #<<user> name: \"root\" real-name: \"\" group: \"users\" password: <secret> home-directory: \"/root\"> #<<user> name: \"alice\" real-name: \"Alice\" group: \"users\" password: <s???> ???) ???) ???)\n 150:13 17 (run ((services #<<system-service> name: \"GNOME\" type: desktop recommended?: #f snippet: ((service gnome-desktop-service-type)) packages: ()> #<<system-service> name: \"Xfce\" ty???> ???) ???) ???)\n 148:23 16 (run ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable?: #t esp?:???> ???) ???) ???)\nIn ./gnu/installer/newt/final.scm:\n 132:11 15 (run-final-page ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable???> ???) ???) ???)\n 105:21 14 (run-install-shell \"en_HK.utf8\" #:users _)\nIn ./gnu/installer/final.scm:\n 191:5 13 (install-system \"en_HK.utf8\" #:users _)\n 113:13 12 (call-with-mnt-container #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()>)\nIn gnu/build/linux-container.scm:\n 265:16 11 (run-container _ _ (mnt) _ #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()> #:guest-uid _ #:guest-gid _)\nIn ./gnu/installer/final.scm:\n 222:13 10 (_)\nIn gnu/build/install.scm:\n 290:5 9 (unmount-cow-store \"/mnt\" \"/tmp/guix-inst\")\nIn guix/build/syscalls.scm:\n 600:10 8 (_ _ _ #:update-mtab? _)\nIn ice-9/boot-9.scm:\n 1685:16 7 (raise-exception _ #:continuable? _)\n 1780:13 6 (_ #<&compound-exception components: (#<&external-error> #<&origin origin: \"umount\"> #<&message message: \"~S: ~A\"> #<&irritants irritants: (\"/remove\" \"Device or resource busy\")> #<&ex???>)\nIn ice-9/eval.scm:\n 619:8 5 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\n 626:19 4 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\nIn ./gnu/installer/dump.scm:\n 58:4 3 (prepare-dump system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16)) #:result _)\nIn ice-9/ports.scm:\n 433:17 2 (call-with-output-file _ _ #:binary _ #:encoding _)\nIn ./gnu/installer/dump.scm:\n 60:27 1 (_ #<output: installer-backtrace 7>)\nIn unknown file:\n 0 (make-stack #t)\n./gnu/installer/dump.scm:62:36: In procedure umount: \"/remove\": Device or resource busy\n"))
Dec 7 12:14:01 localhost installer[198]: unmounting "/mnt/"
Backtrace:
2 (primitive-load "/gnu/store/s6a8v2l4387bgsp0w0pcbg4wjsy?")
In ice-9/eval.scm:
191:35 1 (_ #f)
619:8 0 (_ #(#<directory (guile-user) 7fffeffcfc80> #<variabl?>))
ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
@ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
--8<---------------cut here---------------end--------------->8---
(From <https://ci.guix.gnu.org/build/224930/details>.)
Does that ring a bell, Mathieu?
Thanks,
Ludo’.
Severity set to 'important' from 'normal'
Request was from
Ludovic Courtès <ludo <at> gnu.org>
to
control <at> debbugs.gnu.org
.
(Wed, 07 Dec 2022 17:37:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-guix <at> gnu.org
:
bug#59884
; Package
guix
.
(Thu, 08 Dec 2022 18:03:01 GMT)
Full text and
rfc822 format available.
Message #10 received at 59884 <at> debbugs.gnu.org (full text, mbox):
Hello,
> ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
> builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> @ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
>
> (From <https://ci.guix.gnu.org/build/224930/details>.)
>
> Does that ring a bell, Mathieu?
I spent days on that issue before. It used to show up on all installer
tests, and even on real hardware, then
8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.
The installation is now made in a container to make sure that we are
later on able to umount the store overlay even though some background
processes such as kmscon or udev opened files from the overlay.
Now the issue only shows up on that specific test and is intermittent as
you noticed.
To be honest, that was quite painful to debug and I'm a bit scared to
jump back in. I think I had the marionette produce some lsof reports
back then, or something like that. I very much regret not to have kept
notes somewhere.
Mathieu
Information forwarded
to
bug-guix <at> gnu.org
:
bug#59884
; Package
guix
.
(Fri, 09 Dec 2022 22:33:01 GMT)
Full text and
rfc822 format available.
Message #13 received at 59884 <at> debbugs.gnu.org (full text, mbox):
Hi,
Mathieu Othacehe <othacehe <at> gnu.org> skribis:
> I spent days on that issue before. It used to show up on all installer
> tests, and even on real hardware, then
> 8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.
>
> The installation is now made in a container to make sure that we are
> later on able to umount the store overlay even though some background
> processes such as kmscon or udev opened files from the overlay.
Right.
> Now the issue only shows up on that specific test and is intermittent as
> you noticed.
On this particular test it seems to be frequent.
> To be honest, that was quite painful to debug and I'm a bit scared to
> jump back in. I think I had the marionette produce some lsof reports
> back then, or something like that. I very much regret not to have kept
> notes somewhere.
Yeah. One possibility is timing: (restart-service 'guix-daemon) kills
the daemon’s process group, waits for the group leader to terminate,
then starts the daemon. I think there’s a possibility that other
processes in the group (like a ‘guix substitute’ child) are still alive
at that point and they might be the one keeping the device busy.
Sleeping for a couple of seconds should allow us to kinda verify that
hypothesis. However, the fact that it’s only in the cryptsetup case
suggests this hypothesis might well be completely bogus.
But yeah, invoking ‘lsof’ would tell us.
I guess we’ll leave that as future work though.
Thanks for your feedback!
Ludo’.
Reply sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
You have taken responsibility.
(Sat, 10 Dec 2022 15:36:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
bug acknowledged by developer.
(Sat, 10 Dec 2022 15:36:02 GMT)
Full text and
rfc822 format available.
Message #18 received at 59884-done <at> debbugs.gnu.org (full text, mbox):
Ludovic Courtès <ludo <at> gnu.org> skribis:
> Yeah. One possibility is timing: (restart-service 'guix-daemon) kills
> the daemon’s process group, waits for the group leader to terminate,
> then starts the daemon. I think there’s a possibility that other
> processes in the group (like a ‘guix substitute’ child) are still alive
> at that point and they might be the one keeping the device busy.
> Sleeping for a couple of seconds should allow us to kinda verify that
> hypothesis.
For good measure, I threw a retry loop in commit
61b7e9687757aff013b99e4ab15669a950c8b222. The test passes on my laptop
and it passed on ci.guix as well:
https://ci.guix.gnu.org/eval/31297
I’m considering the case closed and now getting ready for 1.4.0rc2.
Ludo’.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 08 Jan 2023 12:24:07 GMT)
Full text and
rfc822 format available.
This bug report was last modified 2 years and 163 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.