GNU bug report logs -
#24496
offloading should fall back to local build after n tries
Previous Next
Full log
Message #26 received at 24496 <at> debbugs.gnu.org (full text, mbox):
Hi,
I have not checked all the details, since the code of “guix offload” is
run by root, IIUC and so it is not as friendly as usual to debug. :-)
On Fri, 17 Dec 2021 at 16:57, Maxim Cournoyer <maxim.cournoyer <at> gmail.com> wrote:
>> However, I think this behavior was unintentionally lost in
>> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5. Maxim, WDYT?
>
> I just reviewed this commit, and don't see anywhere where the behavior
> would have changed. The discarding happens here:
[...]
> previously load could be set to +inf.0. Now it is a float between 0.0
> and 1.0, with threshold defaulting to 0.6.
My /etc/guix/machines.scm contains only one machine and --max-jobs=0.
Because the machine is unreachable, IIUC, ’node’ is (or should be) false
and ’load’ is thus not involved, I guess. Indeed, ’report-load’
displays nothing, and instead I get:
--8<---------------cut here---------------start------------->8---
The following derivation will be built:
/gnu/store/c1qicg17ygn1a0biq0q4mkprzy4p2x74-hello-2.10.drv
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
waiting for locks or build slots...
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
guix offload: error: failed to connect to 'x.x.x.x': Timeout connecting to x.x.x.x
process 75621 acquired build slot '/var/guix/offload/x.x.x.x:22/0'
C-c C-c
--8<---------------cut here---------------end--------------->8---
Well, if the machine is not reachable, then ’session’ is false, right?
--8<---------------cut here---------------start------------->8---
@@ -472,11 +480,15 @@ (define (machine-faster? m1 m2)
(let* ((session (false-if-exception (open-ssh-session best
%short-timeout)))
(node (and session (remote-inferior session)))
- (load (and node (normalized-load best (node-load node))))
+ (load (and node (node-load node)))
+ (threshold (build-machine-overload-threshold best))
(space (and node (node-free-disk-space node))))
+ (when load (report-load best load))
(when node (close-inferior node))
(when session (disconnect! session))
- (if (and node (< load 2.) (>= space %minimum-disk-space))
+ (if (and node
+ (or (not threshold) (< load threshold))
+ (>= space %minimum-disk-space))
[...]
(begin
;; BEST is unsuitable, so try the next one.
(when (and space (< space %minimum-disk-space))
(format (current-error-port)
"skipping machine '~a' because it is low \
on disk space (~,2f MiB free)~%"
(build-machine-name best)
(/ space (expt 2 20) 1.)))
(release-build-slot slot)
(loop others)))))
--8<---------------cut here---------------end--------------->8---
Therefore, the ’else’ branch goes and so the codes does ’(loop others)’.
However, I miss why ’others’ is not empty (only one machine in
/etc/guix/machines.scm). Well, the message «waiting for locks or build
slots...» suggests that something is restarted and it is not that ’loop’
we are observing but another one.
On daemon side, I do not know what this ’waitingForAWhile’ and
’lastWokenUp’ mean.
--8<---------------cut here---------------start------------->8---
/* If we are polling goals that are waiting for a lock, then wake
up after a few seconds at most. */
if (!waitingForAWhile.empty()) {
useTimeout = true;
if (lastWokenUp == 0)
printMsg(lvlError, "waiting for locks or build slots...");
if (lastWokenUp == 0 || lastWokenUp > before) lastWokenUp = before;
timeout.tv_sec = std::max((time_t) 1, (time_t) (lastWokenUp + settings.pollInterval - before));
} else lastWokenUp = 0;
--8<---------------cut here---------------end--------------->8---
Bah it requires more investigations and I agree with Maxim that
efbf5fdd01817ea75de369e3dd2761a85f8f7dd5 is probably not the issue
there.
Cheers,
simon
This bug report was last modified 22 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.