GNU bug report logs - #41625: Sporadic guix-offload crashes due to EOF errors
Reported by: Marius Bakke <marius <at> gnu.org>
Date: Sun, 31 May 2020 09:52:01 UTC
Severity: normal
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Message #61 received at 41625-done <at> debbugs.gnu.org (full text, mbox):
Hello,
Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
> Hi Marius,
>
> Marius Bakke <marius <at> gnu.org> writes:
>
>> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>>
>>>> Is running ‘guix offload test /etc/guix/machines.scm overdrive1’ on
>>>> berlin enough to reproduce the issue? If so, we could monitor/strace
>>>> sshd on overdrive1 to get a better understanding of what’s going on.
>>>
>>> It's actually difficult to trigger it; it seems to happen mostly on the
>>> first try after a long time without connecting to the machine; on the
>>> 2nd and later tries, everything is smooth. Waiting a few minutes is not
>>> enough to re-trigger the problem.
>>>
>>> I've managed to see the problem a few lucky times with:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> while true; do guix offload test /etc/guix/machines.scm overdrive1; done
>>> --8<---------------cut here---------------end--------------->8---
>>
>> I used to be able to reproduce it by inducing a high load on the target
>> machine and just let Guix keep trying to connect. But now I did that,
>> and set overload threshold to 0.0 for good measure, and Guix has been
>> waiting patiently for two hours without failure.
>>
>> So AFAICT this bug has been fixed. Perhaps Berlin or the Overdrive
>> simply needs to be updated?
>
> Ah! Do you have root access to overdrive1? It'd be interesting to
> reconfigure it to update the guix-daemon and see if the problem
> vanishes.
Good news, this seems resolved with the newer Guile-SSH 0.15.1, where
long delays in returning output no longer trigger an EOF response
(the client now keeps waiting instead). I believe it was fixed by this
commit [0].
Many thanks to Artyom Poptsov for fixing it!
Closing.
Maxim
[0] https://github.com/artyom-poptsov/guile-ssh/commit/fefaab9e925d015b01abc7c76ea4017c373ad895
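
For anyone wanting to re-check this on an older Guile-SSH, here is a minimal sketch of a reproduction loop that stops at the first failure and pauses between attempts, since the thread reports the error mostly on the first connection after a long idle period. The helper name `run_until_failure`, the attempt limit, and the pause length are assumptions for illustration; only the `guix offload test` command line comes from the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guix): run a command repeatedly until it
# fails, sleeping between attempts, and report which attempt failed.
run_until_failure() {
    max_attempts=$1   # upper bound on the number of attempts
    pause=$2          # seconds to sleep between attempts
    shift 2           # the remaining arguments are the command to run
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if ! "$@"; then
            echo "attempt $attempt failed at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
            return 1
        fi
        sleep "$pause"
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 0
}

# Example invocation with the command from this report; the one-hour pause
# is an assumed value, not something measured in the thread:
# run_until_failure 24 3600 guix offload test /etc/guix/machines.scm overdrive1
```

Stopping at the first failure keeps the relevant output at the end of the terminal and gives a timestamp to line up against sshd logs on the target machine.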
This bug report was last modified 3 years and 54 days ago.