GNU bug report logs - #24496
offloading should fall back to local build after n tries

Package: guix;

Reported by: ng0 <ngillmann <at> runbox.com>

Date: Wed, 21 Sep 2016 15:41:02 UTC

Severity: normal

Done: Simon Tournier <zimon.toutoune <at> gmail.com>

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: ng0 <ngillmann <at> runbox.com>, 24496 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: bug#24496: offloading should fall back to local build after n tries
Date: Fri, 17 Dec 2021 16:57:33 -0500

Hello Ludovic,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> zimoun <zimon.toutoune <at> gmail.com> skribis:
>
>> I am just hitting this old bug#24496 [1].
>>
>> On Mon, 26 Sep 2016 at 18:20, ludo <at> gnu.org (Ludovic Courtès) wrote:
>>> ng0 <ngillmann <at> runbox.com> skribis:
>>>
>>>> When I forgot that my build machine was offline and did not pass
>>>> --no-build-hook, the offloading kept trying forever until I had to
>>>> cancel the build, boot the build machine, and start the build again.
>>
>> [...]
>>
>>> Like you say, on Hydra-style setup this could be a problem: the
>>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>>> builds on its own.
>>>
>>> So I guess we would need a command-line option to select a different
>>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>>> option.
>>
>> When the build machine used to offload is offline and the master daemon
>> runs with --max-jobs=0, I would expect X tries (each leading to a
>> timeout) and then a failure with a hint, where X is defined by the
>> user.  WDYT?
>>
>>
>>> In the meantime, you could also hack up your machines.scm: it would
>>> return a list where unreachable machines have been filtered out.
>>
>> Maybe, this could be done by “guix offload”.
>
> Prior to commit efbf5fdd01817ea75de369e3dd2761a85f8f7dd5, this was the
> case: an unreachable machine would have ‘machine-load’ return +inf.0,
> and so it would be discarded from the list of candidates.
>
> However, I think this behavior was unintentionally lost in
> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5.  Maxim, WDYT?

I just reviewed this commit and don't see anywhere the behavior would
have changed.  The discarding happens here:

--8<---------------cut here---------------start------------->8---
-         (if (and node (< load 2.) (>= space %minimum-disk-space))
+         (if (and node
+                  (or (not threshold) (< load threshold))
+                  (>= space %minimum-disk-space))
--8<---------------cut here---------------end--------------->8---

Previously, load could be set to +inf.0.  Now it is a float between 0.0
and 1.0, with threshold defaulting to 0.6.
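
To make the comparison concrete, here is a minimal sketch of the two
conditions as standalone predicates.  It is an illustration only, not the
actual 'guix offload' code: the procedure names 'old-candidate?' and
'new-candidate?' are made up for this example, and the value given to
'%minimum-disk-space' is a placeholder.

```scheme
;; Illustrative sketch; not the code from (guix scripts offload).
;; NODE, LOAD, SPACE and THRESHOLD mirror the variables in the diff.
(define %minimum-disk-space (* 100 (expt 2 20)))   ;placeholder value

(define (old-candidate? node load space)
  ;; Condition before the commit: hard-coded load limit of 2.
  (and node
       (< load 2.)
       (>= space %minimum-disk-space)))

(define (new-candidate? node load space threshold)
  ;; Condition after the commit: a THRESHOLD of #f skips the load check.
  (and node
       (or (not threshold) (< load threshold))
       (>= space %minimum-disk-space)))

;; A load of +inf.0 fails the '<' test in both forms when a threshold is set.
(old-candidate? 'node +inf.0 (expt 2 30))        ;=> #f
(new-candidate? 'node +inf.0 (expt 2 30) 0.6)    ;=> #f
(new-candidate? 'node 0.3 (expt 2 30) 0.6)       ;=> #t
```

In this sketch, only a threshold of #f would let a machine with a load of
+inf.0 pass the load check.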

As far as I remember, this has always been a problem for me (busy
offload machines being forever retried with no fallback to the local
machine).
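
For the sake of discussion, the behavior requested in this bug could be
sketched roughly as follows.  This is a hypothetical outline, not existing
Guix code: 'build-with-fallback', 'try-offload', 'build-locally' and
'max-tries' are all names invented for the example.

```scheme
;; Hypothetical sketch of "fall back after N tries"; not part of Guix.
;; TRY-OFFLOAD is a thunk returning a result, or #f when no machine
;; accepted the build.  BUILD-LOCALLY is a thunk used as the fallback.
(define (build-with-fallback try-offload build-locally max-tries)
  (let loop ((attempt 1))
    (if (> attempt max-tries)
        (build-locally)                 ;give up on offloading
        (or (try-offload)               ;success: return its result
            (loop (+ attempt 1))))))    ;failure: try again

;; Example: offloading always fails, so after 3 tries we build locally.
(build-with-fallback (lambda () #f)
                     (lambda () 'built-locally)
                     3)
```

On a front-end running with --max-jobs=0, the 'build-locally' thunk in
such a scheme would presumably have to report an error with a hint
instead of building.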

Thanks,

Maxim




This bug report was last modified 22 days ago.
