GNU bug report logs - #33239
'guix offload' regularly hangs in 'channel-get-exit-status' call

Package: guix;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Fri, 2 Nov 2018 10:58:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: ludo <at> gnu.org (Ludovic Courtès)
Subject: bug#33239: closed (Re: bug#33239: 'guix offload' regularly hangs
 in 'channel-get-exit-status' call)
Date: Wed, 09 Jan 2019 20:38:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#33239: 'guix offload' regularly hangs in 'channel-get-exit-status' call

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 33239 <at> debbugs.gnu.org.

-- 
33239: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=33239
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: 33239-done <at> debbugs.gnu.org
Subject: Re: bug#33239: 'guix offload' regularly hangs in
 'channel-get-exit-status' call
Date: Wed, 09 Jan 2019 21:37:08 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> ludo <at> gnu.org (Ludovic Courtès) skribis:
>>
>>> The ‘guix offload’ processes on berlin regularly hang while calling
>>> ‘channel-get-exit-status’:
>>>
>>> (gdb) bt
>>> #0  0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
>>> #1  0x00007f2994287577 in ssh_poll_ctx_dopoll () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
>>> #2  0x00007f29942884d9 in ssh_handle_packets () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
>>> #3  0x00007f29942885ad in ssh_handle_packets_termination () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
>>> #4  0x00007f2994275080 in ssh_channel_get_exit_status () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
>>> #5  0x00007f29946dd11a in guile_ssh_channel_get_exit_status () from target:/gnu/store/i3nfl17wfx7sryq6w15r9wxl7ilmq4rb-guile-ssh-0.11.3/lib/libguile-ssh.so.11
>>> #6  0x00007f29a1765965 in vm_regular_engine (thread=0x1dd58c0, vp=0x1d4df30, registers=0xffffffff, resume=-1615646479) at vm-engine.c:786
>>
>> I was able to come up with a reduced test case for Guile-SSH:
>>
>>   https://github.com/artyom-poptsov/guile-ssh/issues/11
>
> It turned out that the code to start a REPL server in (ssh dist node)
> would currently hang, as I wrote in the bug report above.
>
> After investigation, I decided that inferiors are more appropriate than
> Guile-SSH’s node to address this use case, after all.  Commit
> ed7b44370f71126087eb953f36aad8dc4c44109f changes ‘guix offload’ to use
> inferiors.

It looks like this commit fixed the bug above, so I’m closing it.

There are still occasional hangs in ‘ssh_handle_packets_termination’
while reading from a channel, but AFAICS that’s a different issue.

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: bug-guix <at> gnu.org
Subject: 'guix offload' regularly hangs in 'channel-get-exit-status' call
Date: Fri, 02 Nov 2018 11:57:06 +0100
Hello,

The ‘guix offload’ processes on berlin regularly hang while calling
‘channel-get-exit-status’:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f2994287577 in ssh_poll_ctx_dopoll () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#2  0x00007f29942884d9 in ssh_handle_packets () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#3  0x00007f29942885ad in ssh_handle_packets_termination () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#4  0x00007f2994275080 in ssh_channel_get_exit_status () from target:/gnu/store/wmpg67bn7i7pqc0p4xjp1npnqixk9znd-libssh-0.7.6/lib/libssh.so.4
#5  0x00007f29946dd11a in guile_ssh_channel_get_exit_status () from target:/gnu/store/i3nfl17wfx7sryq6w15r9wxl7ilmq4rb-guile-ssh-0.11.3/lib/libguile-ssh.so.11
#6  0x00007f29a1765965 in vm_regular_engine (thread=0x1dd58c0, vp=0x1d4df30, registers=0xffffffff, resume=-1615646479) at vm-engine.c:786
#7  0x00007f29a1768fba in scm_call_n (proc=#<program 7f29a1be0030>, argv=argv@entry=0x7ffc76b1ece8, nargs=nargs@entry=1) at vm.c:1257
#8  0x00007f29a16ecff7 in scm_primitive_eval (
    exp=exp@entry=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit)))) at eval.c:662
#9  0x00007f29a16ed053 in scm_eval (
    exp=((@ (ice-9 control) %) (begin ((@@ (ice-9 command-line) load/lang) "/gnu/store/zz3b7j4iv6v143v7cqyr77k83zc5n3zw-guix-0.15.0-6.f9a8fce/bin/.guix-real") (main (command-line)) (quit))), module_or_state=module_or_state@entry="#<struct module>" = {...}) at eval.c:696
#10 0x00007f29a1738220 in scm_shell (argc=11, argv=0x1dd5280) at script.c:454

(gdb) frame 0
#0  0x00007f299fb330f1 in __GI___poll (fds=0x1dd58c0, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29      in ../sysdeps/unix/sysv/linux/poll.c
(gdb) p *fds
$1 = {fd = 14, events = 1, revents = 0}
(gdb) shell ls -l /proc/12605/fd
total 0
lr-x------ 1 root root 64 Nov  2 11:20 0 -> 'pipe:[44413497]'
l-wx------ 1 root root 64 Nov  2 11:33 1 -> 'pipe:[44413496]'
lr-x------ 1 root root 64 Nov  2 11:33 10 -> 'pipe:[44459532]'
l-wx------ 1 root root 64 Nov  2 11:33 11 -> 'pipe:[44459532]'
lr-x------ 1 root root 64 Nov  2 11:33 12 -> 'pipe:[44429590]'
l-wx------ 1 root root 64 Nov  2 11:33 13 -> 'pipe:[44429590]'
lrwx------ 1 root root 64 Nov  2 11:33 14 -> 'socket:[44444783]'
lrwx------ 1 root root 64 Nov  2 11:33 15 -> 'socket:[44444784]'
l-wx------ 1 root root 64 Nov  2 11:33 16 -> /var/guix/offload/141.80.167.140/0
l-wx------ 1 root root 64 Nov  2 11:33 2 -> 'pipe:[44413496]'
lr-x------ 1 root root 64 Nov  2 11:33 3 -> 'pipe:[44459528]'
lr-x------ 1 root root 64 Nov  2 11:33 33 -> /dev/urandom
l-wx------ 1 root root 64 Nov  2 11:33 4 -> 'pipe:[44413498]'
l-wx------ 1 root root 64 Nov  2 11:33 5 -> 'pipe:[44459528]'
lr-x------ 1 root root 64 Nov  2 11:33 6 -> 'pipe:[44459531]'
l-wx------ 1 root root 64 Nov  2 11:33 7 -> 'pipe:[44459531]'
lr-x------ 1 root root 64 Nov  2 11:33 8 -> 'pipe:[44453928]'
l-wx------ 1 root root 64 Nov  2 11:33 9 -> 'pipe:[44453928]'
--8<---------------cut here---------------end--------------->8---

I believe this is because in (guix ssh) we don’t ensure the remote
process is dead by the time we call ‘channel-get-exit-status’, as in
this example:

--8<---------------cut here---------------start------------->8---
scheme@(guix ssh)> (define s (open-ssh-session "localhost" #:user "ludo" #:port 22))
scheme@(guix ssh)> (define c (open-remote-pipe* s OPEN_BOTH "sleep 1000"))
scheme@(guix ssh)> (channel-send-eof c)
$4 = #<undefined>
scheme@(guix ssh)> (channel-get-exit-status c)
;; hangs
--8<---------------cut here---------------end--------------->8---
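
For illustration only, here is a minimal C sketch of the blocking pattern visible in frame 0 of the backtrace: a poll(2) call on a single socket with events = 1 (POLLIN on Linux) and timeout = -1, which cannot return while the peer stays silent. This is not libssh or Guile-SSH code; the names wait_readable and demo_silent_peer are hypothetical, and a local socket pair stands in for the SSH connection. It only shows that a bounded timeout lets the caller regain control where an infinite one does not.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until FD is readable, for at most TIMEOUT_MS milliseconds.
   Return 1 if data (or end-of-file) is available, 0 on timeout,
   -1 on error.  A negative TIMEOUT_MS waits forever, which is the
   situation shown in the backtrace (timeout=-1).  */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;              /* poll itself failed */
    if (n == 0)
        return 0;               /* nothing arrived before the deadline */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Hypothetical stand-in for the hung connection: a connected socket
   pair whose peer never writes, like a remote "sleep 1000" that sends
   nothing.  Return the result of a bounded wait on it.  */
int demo_silent_peer(int timeout_ms)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int result = wait_readable(sv[0], timeout_ms);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling demo_silent_peer with a positive timeout returns 0 once the deadline passes; passing -1 instead would block indefinitely, just as the ‘guix offload’ process does. Whether a bounded wait is an acceptable fix at the Guile-SSH level is a separate question that this sketch does not answer.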

The problem is that calling ‘channel-get-exit-status’ on a closed port
doesn’t work, so forcing a port close isn’t really an option:

--8<---------------cut here---------------start------------->8---
scheme@(guix ssh)> (define c (open-remote-pipe* s OPEN_BOTH "sleep 100"))
scheme@(guix ssh)> (close-port c)
$4 = #t
scheme@(guix ssh)> (channel-get-exit-status c)
ERROR: In procedure channel-get-exit-status:
In procedure channel-get-exit-status: Wrong type argument in position 1 (expecting open channel): #<unknown channel (freed) 221d5c0>
--8<---------------cut here---------------end--------------->8---

To be continued…

Ludo’.



This bug report was last modified 6 years and 128 days ago.
