GNU bug report logs - #59884
‘gui-installed-desktop-os-encrypted’ test intermittent failures

Previous Next

Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Wed, 7 Dec 2022 13:52:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59884 in the body.
You can then email your comments to 59884 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to othacehe <at> gnu.org, bug-guix <at> gnu.org:
bug#59884; Package guix. (Wed, 07 Dec 2022 13:52:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to othacehe <at> gnu.org, bug-guix <at> gnu.org. (Wed, 07 Dec 2022 13:52:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: ‘gui-installed-desktop-os-encrypted’
 test intermittent failures
Date: Wed, 07 Dec 2022 14:50:58 +0100
Hi,

The ‘gui-installed-desktop-os-encrypted’ test fails intermittently like
so:

--8<---------------cut here---------------start------------->8---
Dec  7 12:14:01 localhost installer[253]: ^M^[[Kguix system: ^[[1m^[[0mbootloader successfully installed on /dev/vda^M 
Dec  7 12:14:01 localhost installer[253]: command ("guix" "system" "init" "--fallback" "--no-grafts" "--no-substitutes" "/mnt/etc/config.scm" "/mnt") succeeded 
conversation expecting pattern ((quote installation-complete))
Dec  7 12:14:01 localhost shepherd[1]: Service guix-daemon has been stopped. 
Dec  7 12:14:01 localhost shepherd[1]: Service guix-daemon has been started. 
Dec  7 12:14:01 localhost installer[253]: crashing due to uncaught exception: system-error ("umount" "~S: ~A" ("/remove" "Device or resource busy") (16)) 
Dec  7 12:14:01 localhost installer[253]: running form #<newt-form aec220> ("Unexpected problem") with 1 clients 
Dec  7 12:14:01 localhost installer[253]: form #<newt-form aec220> ("Unexpected problem"): client 20 replied #t 
/gnu/store/2bdm6l1j7siky5vadr62l3ns2z8qz75a-shepherd-marionette.scm:1:1718: ERROR:
  1. &pattern-not-matched:
      pattern: ((quote installation-complete))
      sexp: (contents-dialog (title "Unexpected problem") (text "The installer has encountered an unexpected problem. The backtrace is displayed below. You may choose to exit or create a dump archive.") (content "In ./gnu/installer/steps.scm:\n   150:13 19 (run ((substitutes . #f) (network (select-technology . #<<technology> name: \"Wired\" type: \"ethernet\" powered?: #t connected?: #t>) (power-technology . #<unspecified>) (# . #<<???>) ???) ???) ???)\n   150:13 18 (run ((user #<<user> name: \"root\" real-name: \"\" group: \"users\" password: <secret> home-directory: \"/root\"> #<<user> name: \"alice\" real-name: \"Alice\" group: \"users\" password: <s???> ???) ???) ???)\n   150:13 17 (run ((services #<<system-service> name: \"GNOME\" type: desktop recommended?: #f snippet: ((service gnome-desktop-service-type)) packages: ()> #<<system-service> name: \"Xfce\" ty???> ???) ???) ???)\n   148:23 16 (run ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable?: #t esp?:???> ???) ???) ???)\nIn ./gnu/installer/newt/final.scm:\n   132:11 15 (run-final-page ((partition #<<user-partition> name: #f type: normal file-name: \"/dev/vda1\" disk-file-name: \"/dev/vda\" crypt-label: #f crypt-password: #f fs-type: ext4 bootable???> ???) ???) ???)\n   105:21 14 (run-install-shell \"en_HK.utf8\" #:users _)\nIn ./gnu/installer/final.scm:\n    191:5 13 (install-system \"en_HK.utf8\" #:users _)\n   113:13 12 (call-with-mnt-container #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()>)\nIn gnu/build/linux-container.scm:\n   265:16 11 (run-container _ _ (mnt) _ #<procedure 7f0e0b7d2f00 at ./gnu/installer/final.scm:192:7 ()> #:guest-uid _ #:guest-gid _)\nIn ./gnu/installer/final.scm:\n   222:13 10 (_)\nIn gnu/build/install.scm:\n    290:5  9 (unmount-cow-store \"/mnt\" \"/tmp/guix-inst\")\nIn guix/build/syscalls.scm:\n   600:10  8 (_ _ _ #:update-mtab? _)\nIn ice-9/boot-9.scm:\n  1685:16  7 (raise-exception _ #:continuable? _)\n  1780:13  6 (_ #<&compound-exception components: (#<&external-error> #<&origin origin: \"umount\"> #<&message message: \"~S: ~A\"> #<&irritants irritants: (\"/remove\" \"Device or resource busy\")> #<&ex???>)\nIn ice-9/eval.scm:\n    619:8  5 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\n   626:19  4 (_ #(#(#(#<directory (guile-user) 7f0e0cee8c80>) system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16))) #<variable 7f0e08731270 value: #<unspecified>> #<varia???> ???))\nIn ./gnu/installer/dump.scm:\n     58:4  3 (prepare-dump system-error (\"umount\" \"~S: ~A\" (\"/remove\" \"Device or resource busy\") (16)) #:result _)\nIn ice-9/ports.scm:\n   433:17  2 (call-with-output-file _ _ #:binary _ #:encoding _)\nIn ./gnu/installer/dump.scm:\n    60:27  1 (_ #<output: installer-backtrace 7>)\nIn unknown file:\n           0 (make-stack #t)\n./gnu/installer/dump.scm:62:36: In procedure umount: \"/remove\": Device or resource busy\n"))
Dec  7 12:14:01 localhost installer[198]: unmounting "/mnt/" 

Backtrace:
           2 (primitive-load "/gnu/store/s6a8v2l4387bgsp0w0pcbg4wjsy?")
In ice-9/eval.scm:
   191:35  1 (_ #f)
    619:8  0 (_ #(#<directory (guile-user) 7fffeffcfc80> #<variabl?>))

ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
@ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
--8<---------------cut here---------------end--------------->8---

(From <https://ci.guix.gnu.org/build/224930/details>.)

Does that ring a bell, Mathieu?

Thanks,
Ludo’.




Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 07 Dec 2022 17:37:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#59884; Package guix. (Thu, 08 Dec 2022 18:03:01 GMT) Full text and rfc822 format available.

Message #10 received at 59884 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 59884 <at> debbugs.gnu.org
Subject: Re: bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
Date: Thu, 08 Dec 2022 19:02:38 +0100
Hello,

> ice-9/eval.scm:619:8: Throw to key `marionette-eval-failure' with args `((quote (complete-installation installer-socket)))'.
> builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> @ build-failed /gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv - 1 builder for `/gnu/store/wgw64jfyqrrg27afqmlj70a22d1mr5mv-installation.drv' failed with exit code 1
> cannot build derivation `/gnu/store/dr20sisps9rlpbq0vfzncdiyymrd5r0i-gui-installed-desktop-os-encrypted.drv': 1 dependencies couldn't be built
>
> (From <https://ci.guix.gnu.org/build/224930/details>.)
>
> Does that ring a bell, Mathieu?

I spent days on that issue before. It used to show up on all installer
tests, and even on real hardware, then
8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.

The installation is now made in a container to make sure that we are
later on able to umount the store overlay even though some background
processes such as kmscon or udev opened files from the overlay.

Now the issue only shows up on that specific test and is intermittent as
you noticed.

To be honest, that was quite painful to debug and I'm a bit scared to
jump back in. I think I had the marionette produce some lsof reports
back then, or something like that. I very much regret not to have kept
notes somewhere.

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#59884; Package guix. (Fri, 09 Dec 2022 22:33:01 GMT) Full text and rfc822 format available.

Message #13 received at 59884 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 59884 <at> debbugs.gnu.org
Subject: Re: bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
Date: Fri, 09 Dec 2022 23:32:05 +0100
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> I spent days on that issue before. It used to show up on all installer
> tests, and even on real hardware, then
> 8ce6f4dc2879919c12bc76a2f4b01200af97e01 mitigated it.
>
> The installation is now made in a container to make sure that we are
> later on able to umount the store overlay even though some background
> processes such as kmscon or udev opened files from the overlay.

Right.

> Now the issue only shows up on that specific test and is intermittent as
> you noticed.

On this particular test it seems to be frequent.

> To be honest, that was quite painful to debug and I'm a bit scared to
> jump back in. I think I had the marionette produce some lsof reports
> back then, or something like that. I very much regret not to have kept
> notes somewhere.

Yeah.  One possibility is timing: (restart-service 'guix-daemon) kills
the daemon’s process group, waits for the group leader to terminate,
then starts the daemon.  I think there’s a possibility that other
processes in the group (like a ‘guix substitute’ child) are still alive
at that point and they might be the one keeping the device busy.
Sleeping for a couple of seconds should allow us to kinda verify that
hypothesis.  However, the fact that it’s only in the cryptsetup case
suggests this hypothesis might well be completely bogus.

But yeah, invoking ‘lsof’ would tell us.

I guess we’ll leave that as future work though.

Thanks for your feedback!

Ludo’.






Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sat, 10 Dec 2022 15:36:02 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Sat, 10 Dec 2022 15:36:02 GMT) Full text and rfc822 format available.

Message #18 received at 59884-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 59884-done <at> debbugs.gnu.org
Subject: Re: bug#59884: ‘gui-installed-desktop-os-encrypted’ test intermittent failures
Date: Sat, 10 Dec 2022 16:35:35 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Yeah.  One possibility is timing: (restart-service 'guix-daemon) kills
> the daemon’s process group, waits for the group leader to terminate,
> then starts the daemon.  I think there’s a possibility that other
> processes in the group (like a ‘guix substitute’ child) are still alive
> at that point and they might be the one keeping the device busy.
> Sleeping for a couple of seconds should allow us to kinda verify that
> hypothesis.

For good measure, I threw a retry loop in commit
61b7e9687757aff013b99e4ab15669a950c8b222.  The test passes on my laptop
and it passed on ci.guix as well:

  https://ci.guix.gnu.org/eval/31297

I’m considering the case closed and now getting ready for 1.4.0rc2.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 08 Jan 2023 12:24:07 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 163 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.