GNU bug report logs - #35181
Hydra offloads often get stuck while exporting build requisites
Reported by: Mark H Weaver <mhw <at> netris.org>
Date: Sun, 7 Apr 2019 16:44:02 UTC
Severity: normal
Merged with 34157
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Hi Mark,
Mark H Weaver <mhw <at> netris.org> writes:
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>>
>>> The source checkout currently being transferred for build 3432472
>>> (/gnu/store/…-font-google-material-design-icons-3.0.1-checkout) is 176
>>> megabytes uncompressed, as measured by "du -s --si", which is not
>>> precisely the same as the NAR size, but hopefully close enough for a rough
>>> estimate. As I write this, build 3432472 has been stuck here for 24 hours
>>> 15 minutes. Even if the average transfer rate were 4 kilobytes per
>>> second, it should have been done in half that time.
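The estimate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes "176 megabytes" and "4 kilobytes" are SI units (consistent with "du -s --si"), and it treats the "du" figure as a stand-in for the NAR size, as the message itself does. The variable and function names are made up for illustration.

```python
# Rough check of the transfer-time estimate; all figures come from the message.
SIZE_BYTES = 176 * 10**6             # 176 MB, as reported by "du -s --si" (SI units assumed)
RATE_BYTES_PER_S = 4 * 10**3         # the hypothetical 4 kB/s average rate from the message
STUCK_SECONDS = 24 * 3600 + 15 * 60  # 24 hours 15 minutes


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Return the time needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


expected = transfer_seconds(SIZE_BYTES, RATE_BYTES_PER_S)
print(f"expected transfer time: {expected:.0f} s")
print(f"time stuck so far:      {STUCK_SECONDS} s")
print(f"expected / stuck:       {expected / STUCK_SECONDS:.3f}")
```

Under those assumptions the transfer would take 44000 seconds, a little over 12 hours, which is about half of the 24 hours 15 minutes the build had been stuck.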
>>
>> This is weird, could it be that data transfers get stuck somehow?
>
> As far as I can tell, that's what seems to happen.
>
>> Did you try to check the status of the ‘nix-store’ and ‘guix offload’
>> processes on the head node?
>
> Here are the corresponding 'guix offload' processes:
>
> hydra <at> 20121227-hydra:~$ ps auxwwf | head -1; ps auxwwf | egrep -B1 'off()load'
[...]
> root 14769 0.0 0.2 145668 10912 ? SLsl Apr07 0:16 | | \_ /gnu/store/yihvhxv3xyyvl1m2cy1lnf1lyi9h76fk-guile-2.2.2/bin/guile --no-auto-compile /gnu/store/fkkjhida23k612naa9d4q6avqj5v3b28-guix-0.13.0-8.357ab93/bin/.guix-real offload x86_64-linux 3600 1 72000
The problem is that this is an ancient Guix. In the meantime,
offloading has seen relevant changes, in particular things like commit
ed7b44370f71126087eb953f36aad8dc4c44109f which address stability issues
with Guile-SSH (ssh dist node) that was previously used.
I think we should upgrade Guix on hydra.gnu.org; otherwise we’re likely
to end up chasing old bugs.
> The 'nix-store' processes seem to be stuck sleeping in 'read', if I'm
> interpreting the 'strace' output correctly:
>
> root <at> 20121227-hydra:~# strace -p 8983
> Process 8983 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 8983 detached
> root <at> 20121227-hydra:~# strace -p 14767
> Process 14767 attached - interrupt to quit
> read(3, ^C <unfinished ...>
> Process 14767 detached
>
>
> "netstat --inet --program" shows that the SSH connections are still
> open:
>
> root <at> 20121227-hydra:~# netstat --inet --program | grep 'hydra\.net\.in\.tum\.'
> tcp 0 0 20121227-hydra.gn:53216 hydra.net.in.tum.de:ssh ESTABLISHED 14769/guile
> tcp 0 0 20121227-hydra.gn:52434 hydra.net.in.tum.de:ssh ESTABLISHED 8985/guile
> tcp 0 0 20121227-hydra.gnu.:www hydra.net.in.tum.:52104 TIME_WAIT -
> tcp 0 0 20121227-hydra.gnu.:www hydra.net.in.tum.:52103 TIME_WAIT -
This could be the kind of issue that we had with (ssh dist node). It’s
hard to tell.
> I could easily believe that this problem is specific to
> hydra.gnunet.org, but even if that's the case, it would be good if
> offloading would reliably time out before days have passed.
That’s the case with commit a708de151c255712071e42e5c8284756b51768cd,
but again, the Guix installation on hydra may predate that. :-/
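For illustration only, here is a minimal sketch of the general idea of bounding a blocking 'read' with a timeout, so that a stalled peer produces an error instead of a process that sleeps forever. It is not the code from the commit mentioned above and not how 'guix offload' is implemented; the function name read_with_timeout is hypothetical, and the sketch assumes a Unix-like system where select() works on pipe file descriptors.

```python
import os
import select


def read_with_timeout(fd, nbytes, timeout_s):
    """Read up to nbytes from fd, raising TimeoutError if no data arrives in time."""
    # select() returns once fd is readable or timeout_s seconds have elapsed.
    ready, _, _ = select.select([fd], [], [], timeout_s)
    if not ready:
        raise TimeoutError(f"no data on fd {fd} after {timeout_s} s")
    return os.read(fd, nbytes)


# Demonstration with a local pipe standing in for the stalled connection.
read_end, write_end = os.pipe()

try:
    read_with_timeout(read_end, 4096, 0.2)   # nothing has been written yet
except TimeoutError as err:
    print("timed out:", err)

os.write(write_end, b"some data")
print(read_with_timeout(read_end, 4096, 0.2))  # data is available, returns at once

os.close(read_end)
os.close(write_end)
```

A plain read(2) on the same descriptor would block indefinitely in the first case, which matches what the 'strace' output above shows.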
Thanks,
Ludo’.
This bug report was last modified 2 years and 38 days ago.