GNU bug report logs - #24496
offloading should fall back to local build after n tries


Package: guix

Reported by: ng0 <ngillmann <at> runbox.com>

Date: Wed, 21 Sep 2016 15:41:02 UTC

Severity: normal



Message #23 received at 24496 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: ng0 <ngillmann <at> runbox.com>, 24496 <at> debbugs.gnu.org,
 zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: bug#24496: offloading should fall back to local build after n
 tries
Date: Fri, 17 Dec 2021 16:57:33 -0500

Hello Ludovic,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> zimoun <zimon.toutoune <at> gmail.com> skribis:
>
>> I am just hitting this old bug#24496 [1].
>>
>> On Mon, 26 Sep 2016 at 18:20, ludo <at> gnu.org (Ludovic Courtès) wrote:
>>> ng0 <ngillmann <at> runbox.com> skribis:
>>>
>>>> When I forgot that my build machine is offline and I did not pass
>>>> --no-build-hook, the offloading keeps trying forever until I had to
>>>> cancel the build, boot the build-machine and started the build again.
>>
>> [...]
>>
>>> Like you say, on Hydra-style setup this could be a problem: the
>>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>>> builds on its own.
>>>
>>> So I guess we would need a command-line option to select a different
>>> behavior.  I’m not sure how to do that because ‘guix offload’ is
>>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>>> option.
>>
>> When the build machine used to offload is offline and the master daemon
>> is --max-jobs=0, I expect X tries (leading to timeout) and then just
>> fails with a hint, where X is defined by user.  WDYT?
>>
>>
>>> In the meantime, you could also hack up your machines.scm: it would
>>> return a list where unreachable machines have been filtered out.
>>
>> Maybe, this could be done by “guix offload”.
>
> Prior to commit efbf5fdd01817ea75de369e3dd2761a85f8f7dd5, this was the
> case: an unreachable machine would have ‘machine-load’ return +inf.0,
> and so it would be discarded from the list of candidates.
>
> However, I think this behavior was unintentionally lost in
> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5.  Maxim, WDYT?

I just reviewed this commit, and don't see anywhere where the behavior
would have changed.  The discarding happens here:

--8<---------------cut here---------------start------------->8---
-         (if (and node (< load 2.) (>= space %minimum-disk-space))
+         (if (and node
+                  (or (not threshold) (< load threshold))
+                  (>= space %minimum-disk-space))
--8<---------------cut here---------------end--------------->8---

Previously, load could be set to +inf.0.  Now it is a float between 0.0
and 1.0, with threshold defaulting to 0.6.
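
To make the comparison concrete, here is a minimal sketch of the two
conditions as stand-alone predicates.  The procedure names, the 'node
placeholder, and the value given to %minimum-disk-space are assumptions
for illustration only, not the actual code in 'guix offload':

--8<---------------cut here---------------start------------->8---
;; Illustrative sketch; 'old-candidate?' and 'new-candidate?' are
;; hypothetical names, not procedures from (guix scripts offload).
(define %minimum-disk-space (* 100 (expt 2 20))) ;assumed value, in bytes

(define (old-candidate? node load space)
  ;; Condition before the commit: hard-coded load limit of 2.
  (and node (< load 2.) (>= space %minimum-disk-space)))

(define (new-candidate? node load space threshold)
  ;; Condition after the commit: the limit is THRESHOLD, and a
  ;; THRESHOLD of #f skips the load check entirely.
  (and node
       (or (not threshold) (< load threshold))
       (>= space %minimum-disk-space)))

(define plenty (* 2 %minimum-disk-space)) ;more space than required

(display (old-candidate? 'node +inf.0 plenty))     (newline) ;#f
(display (new-candidate? 'node +inf.0 plenty 0.6)) (newline) ;#f
(display (new-candidate? 'node 0.3 plenty 0.6))    (newline) ;#t
(display (new-candidate? 'node +inf.0 plenty #f))  (newline) ;#t
--8<---------------cut here---------------end--------------->8---

With a numeric threshold, both conditions reject a load of +inf.0 in
the same way.  Only when threshold is #f would such a value pass the
load check, assuming the load could still be infinite at that point.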

As far as I remember, this has always been a problem for me (busy
offload machines being forever retried with no fallback to the local
machine).

Thanks,

Maxim






