GNU bug report logs - #35181
Hydra offloads often get stuck while exporting build requisites


Package: guix;

Reported by: Mark H Weaver <mhw <at> netris.org>

Date: Sun, 7 Apr 2019 16:44:02 UTC

Severity: normal

Merged with 34157

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Bug is archived. No further changes may be made.


From: Ludovic Courtès <ludo <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 35181 <at> debbugs.gnu.org
Subject: bug#35181: Hydra offloads often get stuck while exporting build requisites
Date: Tue, 09 Apr 2019 12:54:20 +0200
Hi Mark,

Mark H Weaver <mhw <at> netris.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Mark H Weaver <mhw <at> netris.org> skribis:
>>
>>> The source checkout currently being transferred for build 3432472
>>> (/gnu/store/…-font-google-material-design-icons-3.0.1-checkout) is 176
>>> megabytes uncompressed, as measured by "du -s --si", which is not
>>> precisely the same as the NAR size, but hopefully close enough for a rough
>>> estimate.  As I write this, build 3432472 has been stuck here for 24 hours
>>> 15 minutes.  Even if the average transfer rate were 4 kilobytes per
>>> second, it should have been done in half that time.
>>
>> This is weird, could it be that data transfers get stuck somehow?
>
> As far as I can tell, that's what seems to happen.
>
>> Did you try to check the status of the ‘nix-store’ and ‘guix offload’
>> processes on the head node?
>
> Here are the corresponding 'guix offload' processes:
>
> hydra <at> 20121227-hydra:~$ ps auxwwf | head -1; ps auxwwf | egrep -B1 'off()load'

[...]

> root     14769  0.0  0.2 145668 10912 ?        SLsl Apr07   0:16  |       |   \_ /gnu/store/yihvhxv3xyyvl1m2cy1lnf1lyi9h76fk-guile-2.2.2/bin/guile --no-auto-compile /gnu/store/fkkjhida23k612naa9d4q6avqj5v3b28-guix-0.13.0-8.357ab93/bin/.guix-real offload x86_64-linux 3600 1 72000

The problem is that this is an ancient Guix.  In the meantime,
offloading has seen relevant changes, in particular things like commit
ed7b44370f71126087eb953f36aad8dc4c44109f, which addresses stability issues
with the Guile-SSH module (ssh dist node) that was previously used.

I think we should upgrade Guix on hydra.gnu.org; otherwise we’re likely
to end up chasing old bugs.

> The 'nix-store' processes seem to be stuck sleeping in 'read', if I'm
> interpreting the 'strace' output correctly:
>
> root <at> 20121227-hydra:~# strace -p 8983
> Process 8983 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 8983 detached
> root <at> 20121227-hydra:~# strace -p 14767
> Process 14767 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 14767 detached
>
>
> "netstat --inet --program" shows that the SSH connections are still
> open:
>
> root <at> 20121227-hydra:~# netstat --inet --program | grep 'hydra\.net\.in\.tum\.'
> tcp        0      0 20121227-hydra.gn:53216 hydra.net.in.tum.de:ssh ESTABLISHED 14769/guile     
> tcp        0      0 20121227-hydra.gn:52434 hydra.net.in.tum.de:ssh ESTABLISHED 8985/guile      
> tcp        0      0 20121227-hydra.gnu.:www hydra.net.in.tum.:52104 TIME_WAIT   -               
> tcp        0      0 20121227-hydra.gnu.:www hydra.net.in.tum.:52103 TIME_WAIT   -               

This could be the kind of issue that we had with (ssh dist node).  It’s
hard to tell.

> I could easily believe that this problem is specific to
> hydra.gnunet.org, but even if that's the case, it would be good if
> offloading would reliably time out before days have passed.

That’s the case with commit a708de151c255712071e42e5c8284756b51768cd,
but again, the Guix installation on hydra may predate that.  :-/
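
For illustration, the idea of a transfer that gives up instead of
sleeping in 'read' forever can be sketched as follows.  This is a minimal
Python sketch and not how Guix or Guile-SSH implements its timeouts; the
helper name read_with_timeout and the 0.2-second limit are hypothetical,
and the pipe merely stands in for the SSH connection.  It assumes a POSIX
system, where select() works on pipe descriptors.

```python
import os
import select


def read_with_timeout(fd, nbytes, timeout_s):
    """Read up to nbytes from fd, or raise TimeoutError after timeout_s
    seconds without any data becoming available (hypothetical helper)."""
    # select() returns an empty "ready" list when the timeout expires.
    ready, _, _ = select.select([fd], [], [], timeout_s)
    if not ready:
        raise TimeoutError(f"no data on fd {fd} within {timeout_s} s")
    return os.read(fd, nbytes)


if __name__ == "__main__":
    # A pipe with a silent writer plays the role of the stalled peer.
    reader, writer = os.pipe()
    try:
        read_with_timeout(reader, 4096, 0.2)
    except TimeoutError as err:
        print("stalled:", err)

    # Once the peer sends something, the same call returns promptly.
    os.write(writer, b"nar chunk")
    print(read_with_timeout(reader, 4096, 0.2))
```

A plain blocking read(), by contrast, would wait indefinitely in the
first case, which matches what the 'strace' output above shows.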

Thanks,
Ludo’.
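
As a quick check of the estimate quoted at the top of this message (176
megabytes at 4 kilobytes per second versus 24 hours 15 minutes of being
stuck), the arithmetic can be sketched as below.  The figures come from
the report; the SI interpretation of the units follows from "du --si",
and the function names are hypothetical.

```python
def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


def format_hm(seconds):
    """Render a duration as whole hours and minutes."""
    hours, rest = divmod(int(seconds), 3600)
    return f"{hours} h {rest // 60} min"


SIZE_BYTES = 176 * 10**6               # 176 MB, SI units as with "du --si"
RATE_BYTES_PER_S = 4 * 10**3           # the pessimistic 4 kB/s from the report
STUCK_SECONDS = 24 * 3600 + 15 * 60    # 24 hours 15 minutes

expected = transfer_seconds(SIZE_BYTES, RATE_BYTES_PER_S)
print("expected at 4 kB/s:", format_hm(expected))
print("time stuck so far: ", format_hm(STUCK_SECONDS))
print("ratio:", round(expected / STUCK_SECONDS, 2))
```

At that rate the transfer would take about 12 hours 13 minutes, roughly
half of the time the build had already been stuck, which agrees with the
rough estimate in the report.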




This bug report was last modified 2 years and 38 days ago.
