GNU bug report logs - #58290
guile ssh error on guix deploy

Previous Next

Package: guix;

Reported by: Andrew Tropin <andrew <at> trop.in>

Date: Tue, 4 Oct 2022 11:10:01 UTC

Severity: important

Merged with 56709

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Andrew Tropin <andrew <at> trop.in>
To: Marius Bakke <marius <at> gnu.org>, Ludovic Courtès <ludo <at> gnu.org>
Cc: poptsov.artyom <at> gmail.com, 58290 <at> debbugs.gnu.org
Subject: bug#58290: guile ssh error on guix deploy
Date: Thu, 10 Nov 2022 14:06:33 +0400
[Message part 1 (text/plain, inline)]
On 2022-11-04 00:47, Marius Bakke wrote:

> Ludovic Courtès <ludo <at> gnu.org> skriver:
>
>> Hi,
>>
>> Marius Bakke <marius <at> gnu.org> skribis:
>>
>>> I have the same problem.  Log messages around failure look something
>>> like this (extracted from above message):
>>>
>>>   Oct  4 11:51:49 localhost shepherd[1]: Service sshd-2 has been started. 
>>>   Oct  4 11:51:50 localhost sshd[250]: Accepted publickey for bob from 185.70.53.164 port 1915 ssh2: RSA SHA256:/X3jyhyMizAOKOjCfXK5cLN3Pa0vmi7dbQG+fxK3d3Y
>>>   Oct  4 11:52:28 localhost sshd[252]: error: no more sessions
>>>   Oct  4 11:52:28 localhost shepherd[1]: 1 connection still in use after sshd-2 termination. 
>>>   Oct  4 11:52:28 localhost shepherd[1]: Service sshd-2 has been disabled. 
>>>
>>> Then deploy crashes with 'Channel opening failure'.
>>
>> “no more sessions” comes from this:
>>
>> --8<---------------cut here---------------start------------->8---
>> int
>> session_open(Authctxt *authctxt, int chanid)
>> {
>> 	Session *s = session_new();
>> 	debug("session_open: channel %d", chanid);
>> 	if (s == NULL) {
>> 		error("no more sessions");
>> 		return 0;
>> 	}
>>
>> [...]
>>
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> Would be interesting to run sshd in verbose/debug mode and see why we
>> hit that; it could be because the maximum number of sessions has been
>> reached or something.
>
> I enabled (log-level 'debug) and this is what happens server side:
>
> Nov  3 21:48:25 capella sshd[26345]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:27 capella sshd[26115]: debug1: Received SIGCHLD.
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_by_pid: pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: session 8 channel 8 pid 26345
> Nov  3 21:50:27 capella sshd[26115]: debug1: session_exit_message: release channel 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_new: session 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: session 0: link with channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: confirm session
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_by_channel: session 0 channel 0
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_input_channel_req: session 0 req exec
> Nov  3 21:50:28 capella sshd[26535]: debug1: PAM: reinitializing credentials
> Nov  3 21:50:28 capella sshd[26535]: debug1: permanently_set_uid: 0/0
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov  3 21:50:28 capella sshd[26115]: debug1: input_session_request
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: new [server-session]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session_open: channel 10
> ["no more sessions" error occurs here, in a different log file]
> Nov  3 21:50:28 capella sshd[26115]: debug1: session open failed, free channel 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 10: free: server-session, nchannels 11
> Nov  3 21:50:28 capella sshd[26115]: debug1: server_input_channel_open: failure session
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 0: free: server-session, nchannels 10
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 1: free: server-session, nchannels 9
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 2: free: server-session, nchannels 8
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 3: free: server-session, nchannels 7
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 4: free: server-session, nchannels 6
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 5: free: server-session, nchannels 5
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 6: free: server-session, nchannels 4
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 7: free: server-session, nchannels 3
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 8: free: server-session, nchannels 2
> Nov  3 21:50:28 capella sshd[26115]: debug1: channel 9: free: server-session, nchannels 1
> Nov  3 21:50:28 capella sshd[26115]: debug1: do_cleanup
>
> I compared this with a successful deploy:
>
> Nov  3 21:44:22 capella sshd[25842]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:25 capella sshd[25612]: debug1: Received SIGCHLD.
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_by_pid: pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: session 6 channel 6 pid 25842
> Nov  3 21:46:25 capella sshd[25612]: debug1: session_exit_message: release channel 6
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 65 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 0: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 0: link with channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 0 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 0 channel 0
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 0 req exec
> Nov  3 21:46:26 capella sshd[26032]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26032]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 66 win 64000 max 32768
> Nov  3 21:46:26 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:26 capella sshd[25612]: debug1: channel 8: new [server-session]
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_new: session 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_open: session 8: link with channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_open: confirm session
> Nov  3 21:46:26 capella sshd[25612]: debug1: server_input_channel_req: channel 8 request exec reply 1
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_by_channel: session 8 channel 8
> Nov  3 21:46:26 capella sshd[25612]: debug1: session_input_channel_req: session 8 req exec
> Nov  3 21:46:26 capella sshd[26043]: debug1: PAM: reinitializing credentials
> Nov  3 21:46:26 capella sshd[26043]: debug1: permanently_set_uid: 0/0
> Nov  3 21:46:28 capella sshd[25612]: debug1: chan_shutdown_extended_read: channel 8: (i0 o3 sock -1 wfd -1 efd 13 [read])
> Nov  3 21:46:28 capella sshd[25612]: debug1: server_input_channel_open: ctype session rchan 67 win 64000 max 32768
> Nov  3 21:46:28 capella sshd[25612]: debug1: input_session_request
> Nov  3 21:46:28 capella sshd[25612]: debug1: channel 9: new [server-session]
> Nov  3 21:46:28 capella sshd[25612]: debug1: session_new: session 9
> [... more channels and stuffs before graceful shutdown ...]
>
> I believe the two-minute window before SIGCHLD is waiting for some
> Shepherd services to time out (long story..!).
>
> In my current tests the failure always occur at the
> very end of deploy:
>
> building /gnu/store/19yajyzw4jhnkkz9w0l9gm4az0jxihkc-install-bootloader.scm.drv...
> building /gnu/store/m7ngq5gszyswm6sficinz7ghpra30dl4-remote-exp.scm.drv...
> ;;; [2022/11/03 21:50:28.892606, 0] [GSSH ERROR] Channel opening failure: channel 66 error (2) open failed: #<input-output: channel (closed) 7fdefb015e80>
>
> What's interesting is that all of the failed deploys fail to open
> "channel 10", whereas successful deploys never reach it.  All of these
> are deploying the exact same configuration and commit, so the SSH
> traffic should nearly identical.
>
> I have attached the transcripts of each session (by grepping the PIDs
> from /var/log/debug).  Grepping for session_open yields some
> inconstencies, but haven't studied them in detail yet.
>
>>> Due to the number of SSH connections made by deploy and frequent
>>> occurence of this problem, I was not able to successfully deploy through
>>> many attempts.
>>
>> Ouch.  Normally, thanks to memoization, ‘guix deploy’ opens only one
>> session per machine.  (I think it works well with the 25 local build
>> machines behind berlin; it’s also fine for a small set of machines I
>> take care of at work.)
>
> Have they been rebooted since the switch to inetd-style sshd?  It fails
> more often than not for me.
>
>>> I found that removing the memoizing open-machine-ssh-session* in
>>> (gnu machine ssh) works around it and can happily deploy again.
>>>
>>> (that is, just use 'open-machine-ssh-session' instead)
>>
>> Interesting.  That gives us a hint.  Could you add a ‘pk’ in
>> ‘open-machine-ssh-session*’ to see how many connections it opens?
>
> Indeed it only opens a single connection.  The problem seems to be
> with multiple "channels" and "sessions" over a single connection.
>

Very detailed report, thank you for digging in!

CCed Artyom Poptsov, author of Guile-SSH.

-- 
Best regards,
Andrew Tropin
[signature.asc (application/pgp-signature, inline)]

This bug report was last modified 2 years and 163 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.