GNU bug report logs - #59493
cuirass-remote-worker crash

Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Tue, 22 Nov 2022 22:15:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


Message #11 received at 59493 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 59493 <at> debbugs.gnu.org
Subject: Re: bug#59493: cuirass-remote-worker crash
Date: Wed, 23 Nov 2022 16:47:32 +0100
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>> 2022-11-21 14:27:24   1685:16  0 (raise-exception _ #:continuable? _)
>> 2022-11-21 14:27:24
>> 2022-11-21 14:27:24 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> 2022-11-21 14:27:24 Throw to key `match-error' with args `("match" "no matching pattern" (#vu8()))'.
>
> Yes, this is because a new remote-server is running on Berlin, and it
> sends an empty sequence at every connection:
> https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=fc1641381d2a8a0472a71ef5ad2b64361faaaab4

Oh, I see.  It would be nice to avoid backward-incompatible changes to
the protocol so that we can upgrade more smoothly.
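The crash itself comes from a ‘match’ form that has no clause for the empty bytevector ‘#vu8()’, hence the ‘match-error’.  One way to make a worker more tolerant of protocol additions is for its dispatcher to skip empty or unrecognized messages instead of failing.  The following is a minimal sketch in Python rather than Guile; the message kinds (‘build’, ‘no-build’) and the tuple encoding are hypothetical and do not reflect Cuirass’s actual protocol.

```python
import logging

log = logging.getLogger("worker")

def handle_message(message):
    """Dispatch one incoming message, ignoring ones we don't understand.

    MESSAGE is a hypothetical tuple such as ("build", "foo.drv"), or an
    empty value.  Returns a string describing the action taken.
    """
    if not message:
        # An empty frame, like the #vu8() that crashed the worker:
        # skip it instead of raising.
        log.debug("ignoring empty message")
        return "ignored"
    kind, *args = message
    if kind == "build" and args:
        return f"build {args[0]}"
    if kind == "no-build":
        return "idle"
    # Unknown message kind (e.g. from a newer server): log it and carry on.
    log.warning("ignoring unknown message kind %r", kind)
    return "ignored"
```

The trade-off is that a worker silently ignoring messages can hide real protocol mismatches, so logging unknown kinds, as above, matters.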

> All remote-workers must update, and I have deployed Cuirass
> 1.1.0-13.1341725 on all hydra workers + guix9p.
>
> I have been trying to deploy that to overdrive1 for two days, but Berlin
> offloads the builds to kreuzberg, which has some issues, with a lot of
> builds timing out:

Done now!

--8<---------------cut here---------------start------------->8---
ludo <at> overdrive1 ~$ guix system describe
Generation 37   Nov 23 2022 15:58:08    (current)
  file name: /var/guix/profiles/system-37-link
  canonical file name: /gnu/store/62dr875n7i30l375j87flbqfym78kddg-system
  label: GNU with Linux-Libre 6.0.9
  bootloader: grub-efi
  root device: /dev/sda3
  kernel: /gnu/store/p4impcxw8lba8600acrxs21lgzc06xzq-linux-libre-6.0.9/Image
  channels:
    guix:
      repository URL: https://git.savannah.gnu.org/git/guix.git
      commit: 78f03567f44f704dfbc03cb64368aa42a01e78ad
  configuration file: /gnu/store/myvzd1kpw2pfzfj3krl4lzpcbqsdn48x-configuration.scm
--8<---------------cut here---------------end--------------->8---

Running the Shepherd 0.9.3 and all, wonderful.

>> (Stuttering is due to the unprotected use of ‘primitive-fork’: a
>> non-local exit in the child leads it to execute the same code as its
>> parent.  We should fix that, but should we really fork in the first
>> place?  :-))

Fixed in Cuirass commit 9fb6f21d29c5398b35f4c1a77cf6c20f207c9ebb.
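The usual protection is to guarantee that the child can never return into the parent’s code: whatever happens in the child, it ends with an immediate exit.  Here is a minimal Python sketch of that pattern, where Python exceptions play the role of Guile’s non-local exits; ‘spawn’ is a hypothetical helper, not the code from the commit above.

```python
import os
import sys
import traceback

def spawn(child_thunk):
    """Fork and run CHILD_THUNK in the child; return the child's PID.

    Whether CHILD_THUNK returns normally or raises, the child ends in
    os._exit, so it can never unwind back into the parent's code.
    """
    pid = os.fork()
    if pid == 0:
        status = 1  # assume failure unless the thunk returns normally
        try:
            child_thunk()
            status = 0
        except BaseException:
            # Report the error, but do not let it propagate upwards.
            traceback.print_exc()
        finally:
            try:
                # os._exit does not flush stdio buffers, so flush first.
                sys.stdout.flush()
                sys.stderr.flush()
            finally:
                os._exit(status)
    return pid
```

The parent still has to reap the child afterwards, for instance with ‘os.waitpid(pid, 0)’.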

> Right, this is problematic. I can't remember why I chose to fork.

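One concern is that, in the Avahi case, we create at least one thread
before forking, and as we know that doesn’t work (as in: it might work
sometimes): only the forking thread exists in the child, so any lock
held by another thread at that moment stays locked forever.  ZMQ may
also create threads behind our back.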

The parent doesn’t call ‘waitpid’ on its children, which isn’t great:
terminated children linger as zombies until they are reaped.
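Reaping can be done without blocking the parent by polling ‘waitpid’ with WNOHANG, either periodically or from a SIGCHLD handler.  A minimal Python sketch, with ‘reap_children’ as a hypothetical helper:

```python
import os

def reap_children():
    """Collect the exit status of every child that has already exited.

    Uses WNOHANG so it never blocks.  Returns a list of
    (pid, exit_code) pairs; a negative exit code means the child was
    killed by that signal.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped
```

Calling it regularly from the parent’s main loop is enough to keep the process table clean; ‘os.waitstatus_to_exitcode’ requires Python 3.9 or later.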

To me, ideally this would be either multi-threaded or Fiberized.  The
latter would be more fruitful, but integrating guile-simple-zmq with
Fibers might be difficult (or maybe not: ‘zmq_getsockopt’ with ZMQ_FD
gives us the file descriptor of a socket, which Fibers could then wait
on).
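The general shape of such an integration is an event loop that waits for the descriptor to become readable and then drains the socket.  According to the ZeroMQ documentation, the ZMQ_FD descriptor is edge-triggered and its readability does not by itself mean a message is available, so after each wakeup one has to check ZMQ_EVENTS and keep reading until nothing is left.  This Python sketch uses a plain ‘socketpair’ and the standard ‘selectors’ module as stand-ins for a ZMQ socket and Fibers; ‘drain’ and ‘run_once’ are hypothetical helpers.

```python
import selectors
import socket

def drain(sock):
    """Read everything currently available on SOCK without blocking.

    ZMQ_FD is edge-triggered, so after each wakeup the caller must keep
    reading until nothing is left; this loop mimics that discipline.
    """
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            break  # nothing more for now; wait for the next wakeup
        if not data:
            break  # peer closed the connection
        chunks.append(data)
    return b"".join(chunks)

def run_once(sel, timeout=1.0):
    """Wait for readiness on registered descriptors and drain each one.

    Each registration's data field holds the handler to call.
    """
    received = []
    for key, _events in sel.select(timeout):
        received.append(key.data(key.fileobj))
    return received
```

The default selector is level-triggered, so draining fully is not strictly needed here, but it is what ZMQ_FD requires.  With pyzmq, the registered object would roughly be the integer from ‘sock.getsockopt(zmq.FD)’, and the drain loop would continue while ‘sock.getsockopt(zmq.EVENTS) & zmq.POLLIN’ is non-zero.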

Something to consider…

Thanks,
Ludo’.



