GNU bug report logs - #24496
offloading should fall back to local build after n tries


Package: guix;

Reported by: ng0 <ngillmann <at> runbox.com>

Date: Wed, 21 Sep 2016 15:41:02 UTC

Severity: normal


From: ludo <at> gnu.org (Ludovic Courtès)
To: ng0 <ngillmann <at> runbox.com>
Cc: 24496 <at> debbugs.gnu.org
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Wed, 05 Oct 2016 13:36:20 +0200
ng0 <ngillmann <at> runbox.com> writes:

> Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

>> Like you say, on Hydra-style setup this could be a problem: the
>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>> builds on its own.
>>
>> So I guess we would need a command-line option to select a different
>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>> option.
>
> Could the daemon run with --enable-hydra-style or --disable-hydra-style,
> where --disable-hydra-style would allow falling back to a local build if
> the machine did not reply within a defined time (keeping slow
> connections in mind)?

That would be too ad-hoc IMO, and the problem mentioned above remains.

>> In the meantime, you could also hack up your machines.scm so that it
>> returns a list from which unreachable machines have been filtered out.
>
> How can I achieve this?

Something like:

  (define the-machine (build-machine …))

  (if (managed-to-connect-timely the-machine)
      (list the-machine)
      '())

… where ‘managed-to-connect-timely’ would try to connect to the
machine with a timeout.
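For illustration, here is one possible way to write such a predicate in Guile Scheme. This is a minimal sketch and not part of Guix. `managed-to-connect-timely` and `connect-within` are hypothetical helpers. The sketch assumes that reaching the SSH port (22 by default) over TCP is a good enough sign that a machine is up. It also assumes the standard Guile socket procedures (`getaddrinfo`, `socket`, `connect`, `select`, `getsockopt`) behave as in Guile 2.0 and later.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper (not part of Guix): try to connect socket S to
;; SOCKADDR, giving up after TIMEOUT seconds.  Return #t on success,
;; #f on timeout; other errors are raised to the caller.
(define (connect-within s sockaddr timeout)
  ;; Make the socket non-blocking so that 'connect' returns immediately.
  (fcntl s F_SETFL (logior O_NONBLOCK (fcntl s F_GETFL)))
  (let ((done? (catch 'system-error
                 (lambda ()
                   ;; Guile >= 2.2 returns #f while the connection is in
                   ;; progress; a true value means it completed at once.
                   (connect s sockaddr))
                 (lambda args
                   ;; Guile 2.0 raises EINPROGRESS instead.
                   (if (= (system-error-errno args) EINPROGRESS)
                       #f
                       (apply throw args))))))
    (if done?
        #t
        ;; Wait until S becomes writable or TIMEOUT elapses.
        (match (select '() (list s) '() timeout)
          ((_ () _) #f)                         ;timed out
          (_                                    ;finished: success or error?
           (zero? (getsockopt s SOL_SOCKET SO_ERROR)))))))

;; Hypothetical predicate for machines.scm: return #t if HOST accepts a
;; TCP connection on PORT (22, the SSH port, by default) within TIMEOUT
;; seconds on any of its addresses.  Any error (unknown host, connection
;; refused, ...) yields #f.
(define* (managed-to-connect-timely host #:key (port 22) (timeout 5))
  (catch #t
    (lambda ()
      (let loop ((candidates (getaddrinfo host (number->string port))))
        (match candidates
          (() #f)
          ((ai rest ...)
           (let* ((s  (socket (addrinfo:fam ai) SOCK_STREAM 0))
                  (ok (catch #t
                        (lambda ()
                          (connect-within s (addrinfo:addr ai) timeout))
                        (lambda _ #f))))
             (close-port s)
             (or ok (loop rest)))))))
    (lambda _ #f)))
```

In machines.scm, the test above would then become something like `(managed-to-connect-timely (build-machine-name the-machine))`, assuming the `build-machine-name` record accessor. The check runs only when machines.scm is evaluated. Passing it shows that the port is reachable, not that the SSH login or the build will succeed.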

> And to add to this bug: it seems to me that offloading requires one
> lsh key for each build machine.

The main machine needs to be able to connect to each build machine over
SSH, so indeed, that requires proper SSH key registration (host keys and
authorized user keys).

> (https://lists.gnu.org/archive/html/help-guix/2016-10/msg00007.html),
> and that you cannot address them directly. Say I want to build some
> system on machine 1 AND machine 2: having two x86_64 machines in
> machines.scm only selects one of them (if both were working; see the
> linked thread) and builds on the one that is accessible first. If,
> however, the first machine is somehow blocked and fails, thereby
> terminating the lsh connection, the build does not happen at all.

The code that selects machines is in (guix scripts offload),
specifically ‘choose-build-machine’.  It tries to choose the “best”
machine, which means, roughly, the fastest and least loaded one.
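To make "fastest and least loaded" concrete, here is a hypothetical scoring sketch in Guile Scheme. It is not the code of ‘choose-build-machine’. The `<machine>` record, `machine-score`, `choose-best-machine`, and the speed / (1 + load) formula are all assumptions chosen for illustration.

```scheme
(use-modules (srfi srfi-1)
             (srfi srfi-9))

;; Hypothetical stand-in for a build machine: NAME, relative SPEED (in
;; the spirit of the 'speed' field of machines.scm), and LOAD-AVERAGE
;; (e.g. the load reported by the machine when it is queried).
(define-record-type <machine>
  (machine name speed load-average)
  machine?
  (name         machine-name)
  (speed        machine-speed)
  (load-average machine-load-average))

;; Illustrative score: faster machines score higher, loaded ones lower.
(define (machine-score m)
  (/ (machine-speed m) (+ 1 (machine-load-average m))))

;; Return the machine with the highest score, or #f if MACHINES is empty.
(define (choose-best-machine machines)
  (and (pair? machines)
       (fold (lambda (m best)
               (if (> (machine-score m) (machine-score best)) m best))
             (car machines)
             (cdr machines))))
```

For example, with machines "a" (speed 1.0, load 0.0, score 1.0), "b" (speed 2.0, load 3.0, score 0.5) and "c" (speed 2.0, load 0.5, score about 1.33), `(machine-name (choose-best-machine ...))` returns "c". A selection like this yields a single machine per build request.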

HTH,
Ludo’.




This bug report was last modified 3 years and 173 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.