GNU bug report logs - #67988
[Cuirass] ‘request-work’ responses received by several workers


Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Sat, 23 Dec 2023 09:14:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.



Message #11 received at 67988 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 67988 <at> debbugs.gnu.org
Subject: Re: bug#67988: [Cuirass] ‘request-work’
 responses received by several workers
Date: Fri, 31 May 2024 21:55:16 +0200
Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:

> I’m under the impression that sometimes, when the server replies to
> ‘worker-request-work’ messages, its reply is received by more than just
> the target worker, leading to builds being performed twice:

On closer inspection, the theory of the message being received by two
different peers doesn’t hold.

Instead, I believe ‘db-get-pending-build’ would return the same build at
two different points in time, typically while the first one is still
running.

That’s normally not possible because the build’s status is changed to
‘submitted’ once it’s been picked up.  Turns out that, due to slowness
of the query in ‘db-get-pending-build’ (fixed in
17338588d4862b04e9e405c1244a2ea703b50d98), ‘remote-server’ would
sometimes fail to see worker pings in a timely fashion.  Thus, it would
call ‘db-remove-unresponsive-workers’, which would reschedule builds
that were being carried out by said worker(s).  And that’s how we would
end up with multiple concurrent builds of the same derivation.
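The race described above can be illustrated with a toy, in-memory model of the server. This is a minimal sketch under loudly stated assumptions: the `Server` and `Build` classes, the `ping`, `get_pending_build` and `remove_unresponsive_workers` methods, the worker names, the timeout and the timings are all hypothetical stand-ins for what ‘remote-server’, ‘db-get-pending-build’ and ‘db-remove-unresponsive-workers’ do. They are not Cuirass's actual code or database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified model of the server-side state; names are
# illustrative only and do not mirror Cuirass's real API or schema.

@dataclass
class Build:
    id: int
    derivation: str
    status: str = "scheduled"      # becomes "submitted" once picked up
    worker: Optional[str] = None   # worker currently carrying out the build

@dataclass
class Server:
    builds: List[Build]
    last_ping: Dict[str, float] = field(default_factory=dict)
    started: List[Tuple[str, str]] = field(default_factory=list)  # (worker, drv)

    def ping(self, worker: str, now: float) -> None:
        """Record a worker ping, as actually seen (processed) by the server."""
        self.last_ping[worker] = now

    def get_pending_build(self, worker: str) -> Optional[Build]:
        """Hand out the first scheduled build and mark it 'submitted'."""
        for build in self.builds:
            if build.status == "scheduled":
                build.status = "submitted"
                build.worker = worker
                self.started.append((worker, build.derivation))
                return build
        return None

    def remove_unresponsive_workers(self, now: float, timeout: float) -> None:
        """Drop workers whose last seen ping is too old; reschedule their builds."""
        stale = [w for w, t in self.last_ping.items() if now - t > timeout]
        for worker in stale:
            del self.last_ping[worker]
            for build in self.builds:
                if build.worker == worker and build.status == "submitted":
                    # The worker may in fact still be building this!
                    build.status = "scheduled"
                    build.worker = None


server = Server(builds=[Build(1, "hello.drv")])
server.ping("worker-a", now=0)
server.get_pending_build("worker-a")        # worker-a starts building hello.drv

# worker-a keeps pinging at t=5 and t=10, but the server is stuck in a slow
# query and never processes those pings, so its last ping stays at t=0.

server.ping("worker-b", now=11)
server.remove_unresponsive_workers(now=12, timeout=10)  # worker-a deemed gone
server.get_pending_build("worker-b")        # hello.drv is handed out again

print(server.started)
# [('worker-a', 'hello.drv'), ('worker-b', 'hello.drv')]
```

In this model, nothing is wrong with the "scheduled" to "submitted" transition itself; the only ingredient needed for the same derivation to be built twice is that pings go unprocessed long enough for the worker to be declared unresponsive.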

I added logging in c2061ca845d05694ebeb88935a6ff2254711beb2, which
should give us a hint if that happens again.

Ludo’.




