GNU bug report logs -
#41625
Sporadic guix-offload crashes due to EOF errors
Reported by: Marius Bakke <marius <at> gnu.org>
Date: Sun, 31 May 2020 09:52:01 UTC
Severity: normal
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#41625: Sporadic guix-offload crashes due to EOF errors
which was filed against the guix package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 41625 <at> debbugs.gnu.org.
--
41625: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=41625
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
Hello,
Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
> Hi Marius,
>
> Marius Bakke <marius <at> gnu.org> writes:
>
>> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skriver:
>>
>>>> Is running ‘guix offload test /etc/guix/machines.scm overdrive1’ on
>>>> berlin enough to reproduce the issue? If so, we could monitor/strace
>>>> sshd on overdrive1 to get a better understanding of what’s going on.
>>>
>>> It's actually difficult to trigger it; it seems to happen mostly on the
>>> first try after a long time without connecting to the machine; on the
>>> 2nd and later tries, everything is smooth. Waiting a few minutes is not
>>> enough to re-trigger the problem.
>>>
>>> I've managed to see the problem a few lucky times with:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> while true; do guix offload test /etc/guix/machines.scm overdrive1; done
>>> --8<---------------cut here---------------end--------------->8---
>>
>> I used to be able to reproduce it by inducing a high load on the target
>> machine and just let Guix keep trying to connect. But now I did that,
>> and set overload threshold to 0.0 for good measure, and Guix has been
>> waiting patiently for two hours without failure.
>>
>> So AFAICT this bug has been fixed. Perhaps Berlin or the Overdrive
>> simply needs to be updated?
>
> Ah! Do you have root access to overdrive1? It'd be interesting to
> reconfigure it to update the guix-daemon and see if the problem
> vanishes.
Good news: this seems resolved with the newer Guile-SSH 0.15.1, where
long delays in returning output no longer trigger an EOF response
(the client now keeps waiting instead). I believe it was fixed by this
commit [0].
Many thanks to Artyom Poptsov for fixing it!
Closing.
Maxim
[0] https://github.com/artyom-poptsov/guile-ssh/commit/fefaab9e925d015b01abc7c76ea4017c373ad895
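For anyone wanting to re-check this on their own setup: the endless
`while true` loop quoted above never stops on its own, so a crash can
scroll past unnoticed. Below is a minimal sketch of a bounded variant
that stops at the first failing run and reports which attempt it was.
The helper name `retry_until_failure` and its output messages are made
up for illustration and are not part of Guix; the only assumption is
that the command being repeated exits non-zero when it fails.

```shell
# Hypothetical helper (not part of Guix): run a command up to MAX times
# and stop at the first non-zero exit status.
# Usage: retry_until_failure MAX COMMAND [ARGS...]
retry_until_failure() {
  ruf_max=$1
  shift
  ruf_n=0
  while [ "$ruf_n" -lt "$ruf_max" ]; do
    ruf_n=$((ruf_n + 1))
    # "$@" is the command under test, with its arguments preserved.
    if ! "$@"; then
      echo "failed on attempt $ruf_n"
      return 1
    fi
  done
  echo "no failure in $ruf_n attempts"
  return 0
}

# Example invocation, using the command from the quoted loop:
#   retry_until_failure 100 guix offload test /etc/guix/machines.scm overdrive1
```

Since the problem reportedly shows up mostly on the first connection
after a long idle period, a tight loop like this may still need several
runs spread over time to catch it.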
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
Hello,
During 'guix build -s aarch64-linux dolphin' on Berlin, I got this crash:
--8<---------------cut here---------------start------------->8---
building /gnu/store/87655bh9rqcr29qasl1c4yj3skmxkyiz-kfilemetadata-5.70.0.drv...
process 12989 acquired build slot '/var/guix/offload/overdrive1.guixsd.org:52522/1'
process 12989 acquired build slot '/var/guix/offload/dover.guix.info:9023/1'
process 12989 acquired build slot '/var/guix/offload/141.80.167.167:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.163:22/0'
process 12989 acquired build slot '/var/guix/offload/localhost:2223/1'
process 12989 acquired build slot '/var/guix/offload/141.80.167.168:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.173:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.176:22/0'
process 12989 acquired build slot '/var/guix/offload/localhost:2222/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.165:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.169:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.181:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.170:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.174:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.180:22/0'
process 12989 acquired build slot '/var/guix/offload/141.80.167.161:22/0'
Backtrace:
In ice-9/boot-9.scm:
1736:10 5 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
4 (apply-smob/0 #<thunk 7f3344d296c0>)
In ice-9/boot-9.scm:
718:2 3 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
619:8 2 (_ #(#(#<directory (guile-user) 7f3344933f00>)))
In guix/ui.scm:
1936:12 1 (run-guix-command _ . _)
In guix/scripts/offload.scm:
742:22 0 (guix-offload . _)
guix/scripts/offload.scm:742:22: In procedure guix-offload:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
guix build: error: unexpected EOF reading a line
--8<---------------cut here---------------end--------------->8---
This is strange, because guix/scripts/offload.scm:742 is wrapped in an
(unless (eof-object? ...)) block.
When this happens, the build command terminates, along with any other
builds that it had started concurrently. Builds from other clients
were unaffected, of course.
I have also seen this occur on my personal offloading setup once in a
blue moon, but I don't know what could have caused it.
This bug report was last modified 3 years and 110 days ago.