GNU bug report logs -
#41625
Sporadic guix-offload crashes due to EOF errors
Reported by: Marius Bakke <marius <at> gnu.org>
Date: Sun, 31 May 2020 09:52:01 UTC
Severity: normal
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Message #17 received at 41625 <at> debbugs.gnu.org (full text, mbox):
Hi,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi,
>
> Marius Bakke <marius <at> gnu.org> skribis:
>
>> Marius Bakke <marius <at> gnu.org> writes:
>>
>>> 'guix offload test' passes without problems.
>>
>> Not so fast, running it in a loop reveals the crash.
>>
>> There is a trace file in /root/offloadtest.trace on Berlin with such an
>> occurrence. It looks like a timeout is reached shortly before the EOF
>> error:
>>
>> 10139 poll([{fd=14, events=POLLIN|POLLOUT}], 1, 0) = 1 ([{fd=14, revents=POLLOUT}])
>> 10139 poll([{fd=14, events=POLLIN}], 1, 15000) = 0 (Timeout)
>> 10139 write(2, "Backtrace:\n", 11) = 11
>>
>> This seems to be from a different node than the one reported previously,
>> as the preceding connect() was to this machine:
>>
>> 10139 connect(44, {sa_family=AF_INET, sin_port=htons(22),
>> sin_addr=inet_addr("141.80.167.186")}, 16) = -1 EINPROGRESS
>> (Operation now in progress)
>
> So it looks like ‘connect’ fails and eventually we get an EOF object.
> However, I don’t see where that EOF comes from because the return value
> of ‘connect!’ (the Guile-SSH procedure) is properly checked.
>
> Ludo’.
I got a slightly different backtrace, which suggests that establishing the
connection is not at fault; rather, the error occurs during the
read-repl-response call:
--8<---------------cut here---------------start------------->8---
guix offload: testing 1 build machines defined in '/etc/guix/machines.scm'...
Backtrace:
8 (primitive-load "/home/maxim/.config/guix/current/bin/guix")
In guix/ui.scm:
2165:12 7 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 6 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
1747:15 5 (with-exception-handler #<procedure 7f2caf885780 at ice-9/boot-9.scm:1831:7 (exn)> _ # _ # …)
In guix/scripts/offload.scm:
704:21 4 (check-machine-availability _ _)
In srfi/srfi-1.scm:
586:17 3 (map1 (#<session maxim@overdrive1.guix.gnu.org:52522 (connected) 7f2cae396fc0>))
In guix/inferior.scm:
258:2 2 (port->inferior _ _)
240:2 1 (read-repl-response _ _)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `match-error' with args `("match" "no matching pattern" #<eof>)'.
--8<---------------cut here---------------end--------------->8---
I seem to get this more often than not with the overdrive1 offload
machine.
Maxim
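
For readers following the backtrace above, here is a minimal Guile sketch of how an end-of-file object can surface as a `match-error'. It is not the code from guix/inferior.scm: the procedure names, the simplified reply shape (values ...), and the `unexpected-eof' key are all assumptions made for illustration. The idea is only that `read' returns the EOF object when the peer closes the port, and a `match' form with no clause for that object throws.

```scheme
(use-modules (ice-9 match))

;; Hypothetical stand-in for a REPL reply reader; NOT the Guix implementation.
;; It handles only the expected reply shape, so any other datum, including
;; the EOF object, falls through and `match' throws `match-error'.
(define (read-response/strict port)
  (match (read port)
    (('values objects ...) objects)))

;; Same reader with an explicit EOF clause.  The `unexpected-eof' key is an
;; assumption chosen for this sketch, not a key used by Guix.
(define (read-response/checked port)
  (match (read port)
    (('values objects ...) objects)
    ((? eof-object?) (throw 'unexpected-eof port))))

;; Helper: run READER on a string port holding INPUT and return either the
;; reader's result or the key of whatever exception it threw.
(define (try reader input)
  (catch #t
    (lambda () (call-with-input-string input reader))
    (lambda (key . args) key)))

(display (try read-response/strict "(values 1 2)"))  ; well-formed reply
(newline)
(display (try read-response/strict ""))              ; empty port: match-error
(newline)
(display (try read-response/checked ""))             ; empty port: unexpected-eof
(newline)
```

An explicit EOF clause does not explain why the remote side closes the connection, but it would turn the opaque "no matching pattern" failure into an error that names the actual condition.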