GNU bug report logs - #48468
substitute server connection timeout

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Sun, 16 May 2021 17:58:02 UTC

Severity: important

Full log


View this message in rfc822 format

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: 48468 <at> debbugs.gnu.org
Subject: bug#48468: substitute server connection timeout
Date: Sun, 16 May 2021 19:57:49 +0200
Hello,

We recently have a lot of those errors on Cuirass:

--8<---------------cut here---------------start------------->8---
guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow
guix substitute: warning: try `--no-substitutes' if the problem persists
guix substitute: error: connect*: Connection timed out
--8<---------------cut here---------------end--------------->8---

which means that the workers are failing to connect to the Cuirass
remote-server publish server on berlin at 141.80.167.131:5557.

Stracing this publish server shows that connection reuse seems to be
broken:

--8<---------------cut here---------------start------------->8---
accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26
accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22
accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
--8<---------------cut here---------------end--------------->8---

Investigating it, I found that the connection is closed and opened
multiple times in the call-with-cached-connection procedure of the (guix
script substitute) module.

It looks like its because a 'bad-headers exception is raised when trying
to parse an eof object:

--8<---------------cut here---------------start------------->8---
;;; (error bad-header (read-header-line #<eof>))
--8<---------------cut here---------------end--------------->8---

I'm not sure where this eof comes from. There is this comment in the
http-multiple-get procedure in (guix http-client):

--8<---------------cut here---------------start------------->8---
;; Swallow networking errors that could occur due to connection reuse
;; and the like; they will be handled down the road when trying to
;; read responses.
(false-if-networking-error
 (begin
   (for-each (cut write-request <> buffer) batch)
   (put-bytevector p (get))
   (force-output p))))
--8<---------------cut here---------------end--------------->8---

which would suggest that connection reuse could cause networking errors?

What also puzzles me it that the main guix publish server on berlin does
not seem to present this issue. That would indicate that this error is
caused by how the Cuirass remote-server publish server is started or
configured.

Ludo, Chris, any idea?

I will keep searching anyway :)

Thanks,

Mathieu




This bug report was last modified 2 years and 157 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.