From unknown Mon Jun 16 23:31:15 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#48468 <48468@debbugs.gnu.org> To: bug#48468 <48468@debbugs.gnu.org> Subject: Status: substitute server connection timeout Reply-To: bug#48468 <48468@debbugs.gnu.org> Date: Tue, 17 Jun 2025 06:31:15 +0000 retitle 48468 substitute server connection timeout reassign 48468 guix submitter 48468 Mathieu Othacehe severity 48468 important thanks From debbugs-submit-bounces@debbugs.gnu.org Sun May 16 13:57:56 2021 Received: (at submit) by debbugs.gnu.org; 16 May 2021 17:57:56 +0000 Received: from localhost ([127.0.0.1]:51426 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liL1Y-00024J-5i for submit@debbugs.gnu.org; Sun, 16 May 2021 13:57:56 -0400 Received: from lists.gnu.org ([209.51.188.17]:56062) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liL1W-00024A-Al for submit@debbugs.gnu.org; Sun, 16 May 2021 13:57:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44674) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1liL1V-0000JG-NQ for bug-guix@gnu.org; Sun, 16 May 2021 13:57:53 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:57356) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1liL1V-0004Qj-Ga for bug-guix@gnu.org; Sun, 16 May 2021 13:57:53 -0400 Received: from [2a01:e34:ed27:e500:a880:d241:574d:84e1] (port=51494 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1liL1T-0007vp-Vn for bug-guix@gnu.org; Sun, 16 May 2021 13:57:52 -0400 From: Mathieu Othacehe To: bug-guix@gnu.org Subject: substitute server connection timeout Date: Sun, 16 May 2021 19:57:49 +0200 Message-ID: <87lf8e4l42.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hello, We recently have a lot of those errors on Cuirass: --8<---------------cut here---------------start------------->8--- guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow guix substitute: warning: try `--no-substitutes' if the problem persists guix substitute: error: connect*: Connection timed out --8<---------------cut here---------------end--------------->8--- which means that the workers are failing to connect to the Cuirass remote-server publish server on berlin at 141.80.167.131:5557. Stracing this publish server shows that connection reuse seems to be broken: --8<---------------cut here---------------start------------->8--- accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26 accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22 accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 --8<---------------cut here---------------end--------------->8--- Investigating it, I found that the connection is closed and opened multiple times in the call-with-cached-connection procedure of the (guix script substitute) module. It looks like its because a 'bad-headers exception is raised when trying to parse an eof object: --8<---------------cut here---------------start------------->8--- ;;; (error bad-header (read-header-line #)) --8<---------------cut here---------------end--------------->8--- I'm not sure where this eof comes from. There is this comment in the http-multiple-get procedure in (guix http-client): --8<---------------cut here---------------start------------->8--- ;; Swallow networking errors that could occur due to connection reuse ;; and the like; they will be handled down the road when trying to ;; read responses. (false-if-networking-error (begin (for-each (cut write-request <> buffer) batch) (put-bytevector p (get)) (force-output p)))) --8<---------------cut here---------------end--------------->8--- which would suggest that connection reuse could cause networking errors? What also puzzles me it that the main guix publish server on berlin does not seem to present this issue. That would indicate that this error is caused by how the Cuirass remote-server publish server is started or configured. Ludo, Chris, any idea? I will keep searching anyway :) Thanks, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Sun May 16 14:26:32 2021 Received: (at submit) by debbugs.gnu.org; 16 May 2021 18:26:32 +0000 Received: from localhost ([127.0.0.1]:51458 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liLTE-0002n9-5G for submit@debbugs.gnu.org; Sun, 16 May 2021 14:26:32 -0400 Received: from lists.gnu.org ([209.51.188.17]:40602) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1liLTD-0002n2-2G for submit@debbugs.gnu.org; Sun, 16 May 2021 14:26:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51284) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1liLTC-00028q-TL for bug-guix@gnu.org; Sun, 16 May 2021 14:26:30 -0400 Received: from mira.cbaines.net ([212.71.252.8]:56692) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1liLTA-0005Ms-Op; Sun, 16 May 2021 14:26:30 -0400 Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa]) by mira.cbaines.net (Postfix) with ESMTPSA id 64FE527BC78; Sun, 16 May 2021 19:26:26 +0100 (BST) Received: from capella (localhost [127.0.0.1]) by localhost (OpenSMTPD) with ESMTP id 835715d7; Sun, 16 May 2021 18:26:25 +0000 (UTC) References: <87lf8e4l42.fsf@gnu.org> User-agent: mu4e 1.4.15; emacs 27.1 From: Christopher Baines To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout In-reply-to: <87lf8e4l42.fsf@gnu.org> Date: Sun, 16 May 2021 19:26:22 +0100 Message-ID: <87eee6cz75.fsf@cbaines.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" Received-SPF: pass client-ip=212.71.252.8; envelope-from=mail@cbaines.net; helo=mira.cbaines.net X-Spam_score_int: -6 X-Spam_score: -0.7 X-Spam_bar: / X-Spam_report: (-0.7 / 5.0 requ) BAYES_00=-1.9, NORMAL_HTTP_TO_IP=0.001, NUMERIC_HTTP_ADDR=1.242, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, WEIRD_PORT=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.4 (-) X-Debbugs-Envelope-To: submit Cc: bug-guix@gnu.org, 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.4 (--) --=-=-= Content-Type: text/plain Mathieu Othacehe writes: > Hello, > > We recently have a lot of those errors on Cuirass: > > --8<---------------cut here---------------start------------->8--- > guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow > guix substitute: warning: try `--no-substitutes' if the problem persists > guix substitute: error: connect*: Connection timed out > --8<---------------cut here---------------end--------------->8--- > > which means that the workers are failing to connect to the Cuirass > remote-server publish server on berlin at 141.80.167.131:5557. > > Stracing this publish server shows that connection reuse seems to be > broken: > > --8<---------------cut here---------------start------------->8--- > accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 > accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 > accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25 > accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26 > accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24 > accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22 > accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21 > --8<---------------cut here---------------end--------------->8--- > > Investigating it, I found that the connection is closed and opened > multiple times in the call-with-cached-connection procedure of the (guix > script substitute) module. > > It looks like its because a 'bad-headers exception is raised when trying > to parse an eof object: > > --8<---------------cut here---------------start------------->8--- > ;;; (error bad-header (read-header-line #)) > --8<---------------cut here---------------end--------------->8--- > > I'm not sure where this eof comes from. There is this comment in the > http-multiple-get procedure in (guix http-client): > > --8<---------------cut here---------------start------------->8--- > ;; Swallow networking errors that could occur due to connection reuse > ;; and the like; they will be handled down the road when trying to > ;; read responses. > (false-if-networking-error > (begin > (for-each (cut write-request <> buffer) batch) > (put-bytevector p (get)) > (force-output p)))) > --8<---------------cut here---------------end--------------->8--- > > which would suggest that connection reuse could cause networking errors? > > What also puzzles me it that the main guix publish server on berlin does > not seem to present this issue. That would indicate that this error is > caused by how the Cuirass remote-server publish server is started or > configured. > > Ludo, Chris, any idea? While I've been working in this area, I've actually been trying to pick apart the connection caching, since the single thread assumption doesn't hold in the Guix Build Coordinator. Anyway, I do have a theory. Assuming I'm correct in saying that there's no nginx between the client and publish server here, I think that's your configuration difference. For ci.guix.gnu.org, as well as data.guix.gnu.org, it's NGinx which is keeping connections alive. I'm not sure the Guile code for the publish server does similarly, so talking to it directly might be different. That's on the server side, the actual problem is probably on the client side, as I guess there are possibly places where closed connections aren't handled properly. This reminds me I sent some patches relating to closing connections, this could well be related [1]. 1: https://issues.guix.gnu.org/47174 --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmChY89fFIAAAAAALgAo aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh aW5lcy5uZXQACgkQXiijOwuE9XfG9w/+KvzCHwfnB+eeEd6i0xTvTfc8+oLxb+GS j1TSbXSGR02ByGMGIbJeAMmNPV20V9Lzi24a49451fskUZbCGeg6aS2rO3c9QpUX jbF7//NW8zwgmhO/Hh9dvB4v+q01+VwrzE3c6ssAVO6772ZrTJqGA7oCU9UqD1w6 dQYUto8iSxa95UhFcNFMtGcBAevdoN4Ku9OpkqIt8uViCED+YM6a9Ej4d9f9oC8h vxw6EU0YsCrobJQJ8KW56N9uR4oeI4IqWevou2LbSAmcHEWmGHAF6XDgxu0594Ip YOFuqso4nuRxU3ffzgEupMbJA07yrwf0EB4lreqsSfU8gtZupFMLgyc8PviyXjw3 XPpo3PAukKhsVZyQlnTc6ldatWWVHIeTPy8vxdr6GvSQnzliaQmzpHj7xVqkfS3f R1gz5hEXoS0o/CXpVkpnaYMC8m+Y2ujM5AbEUBVo6itR6mmUCCF5D9q/BtosjQvu 8wrLes42iAqwlnNeg8UGKra/cwJ0YGVrOInsOyXX97jz3UVjkxV7jq53+vkw3fGZ eyHDugAJ8DKw9js8BX88AmXaV/Art4M0qG6D6Zp+MvIUbU9z+oNeTjAXgGewxntw sfIdxHju+DNs5x5je3pTzyRSDPEfle8Xvd58DbGyAIoP8v7C7LYPmxVID7R0D/cs ccIQKCj/kqg= =yVQQ -----END PGP SIGNATURE----- --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Mon May 17 10:49:59 2021 Received: (at submit) by debbugs.gnu.org; 17 May 2021 14:49:59 +0000 Received: from localhost ([127.0.0.1]:53395 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lieZD-00084T-38 for submit@debbugs.gnu.org; Mon, 17 May 2021 10:49:59 -0400 Received: from lists.gnu.org ([209.51.188.17]:36806) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lieZ9-00084K-Q9 for submit@debbugs.gnu.org; Mon, 17 May 2021 10:49:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:34242) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lieZ9-0007kn-Dt for bug-guix@gnu.org; Mon, 17 May 2021 10:49:55 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:45226) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lieZ8-0003sN-6g; Mon, 17 May 2021 10:49:55 -0400 Received: from [109.190.253.11] (port=55430 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lieZ7-0002KA-Mq; Mon, 17 May 2021 10:49:54 -0400 From: Mathieu Othacehe To: Christopher Baines Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87eee6cz75.fsf@cbaines.net> Date: Mon, 17 May 2021 16:49:49 +0200 In-Reply-To: <87eee6cz75.fsf@cbaines.net> (Christopher Baines's message of "Sun, 16 May 2021 19:26:22 +0100") Message-ID: <87eee59zzm.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit Cc: bug-guix@gnu.org, 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, > That's on the server side, the actual problem is probably on the client > side, as I guess there are possibly places where closed connections > aren't handled properly. This reminds me I sent some patches relating to > closing connections, this could well be related [1]. Oh, you're right, the Nginx server probably makes the difference here. The http-write procedure of the Guile (web server http) module seems to handle keep-alive connections. However, the wrapping http-write procedure of (guix scripts publish) may not. I'll have a closer look, thanks for your help. Mathieu From debbugs-submit-bounces@debbugs.gnu.org Tue May 18 10:35:30 2021 Received: (at submit) by debbugs.gnu.org; 18 May 2021 14:35:30 +0000 Received: from localhost ([127.0.0.1]:56844 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lj0ok-0007kx-5f for submit@debbugs.gnu.org; Tue, 18 May 2021 10:35:30 -0400 Received: from lists.gnu.org ([209.51.188.17]:35134) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lj0og-0007km-Ml for submit@debbugs.gnu.org; Tue, 18 May 2021 10:35:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51018) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lj0og-0003Ay-F8 for bug-guix@gnu.org; Tue, 18 May 2021 10:35:26 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:37814) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lj0of-0005tX-Go; Tue, 18 May 2021 10:35:25 -0400 Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=34756 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lj0oe-0002Rn-CL; Tue, 18 May 2021 10:35:24 -0400 From: Mathieu Othacehe To: Christopher Baines Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87eee6cz75.fsf@cbaines.net> <87eee59zzm.fsf@gnu.org> Date: Tue, 18 May 2021 16:35:17 +0200 In-Reply-To: <87eee59zzm.fsf@gnu.org> (Mathieu Othacehe's message of "Mon, 17 May 2021 16:49:49 +0200") Message-ID: <878s4c85zu.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit Cc: Ludovic =?utf-8?Q?Court?= =?utf-8?Q?=C3=A8s?= , bug-guix@gnu.org, 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, > I'll have a closer look, thanks for your help. So this snippet in the http-write procedure of the (guix scripts publish) module: --8<---------------cut here---------------start------------->8--- (swallow-zlib-error (close-port port)) --8<---------------cut here---------------end--------------->8--- is closing the client port unconditionally, which means that guix publish cannot keep connections alive, unless sitting behind an Nginx proxy. I'm trying to turn the close-port call into a maybe-close-port with the following procedure: --8<---------------cut here---------------start------------->8--- (define (maybe-close-port port) (cond ((keep-alive? response) (poll-set-add! (http-poll-set server) port *events*)) (else (close-port port)))) --8<---------------cut here---------------end--------------->8--- however this is terribly hacky, as I need to access the private poll-set from (web server http). Ludo, do you have a better idea? Thanks, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Fri May 21 09:30:30 2021 Received: (at 48468) by debbugs.gnu.org; 21 May 2021 13:30:31 +0000 Received: from localhost ([127.0.0.1]:36065 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lk5EU-0001jq-Jk for submit@debbugs.gnu.org; Fri, 21 May 2021 09:30:30 -0400 Received: from eggs.gnu.org ([209.51.188.92]:38876) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lk5EM-0001jV-EN for 48468@debbugs.gnu.org; Fri, 21 May 2021 09:30:28 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:43390) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lk5EH-00057D-04; Fri, 21 May 2021 09:30:17 -0400 Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=50402 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lk5EG-0003ny-MQ; Fri, 21 May 2021 09:30:16 -0400 From: Mathieu Othacehe To: Christopher Baines Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87eee6cz75.fsf@cbaines.net> <87eee59zzm.fsf@gnu.org> <878s4c85zu.fsf@gnu.org> Date: Fri, 21 May 2021 15:30:15 +0200 In-Reply-To: <878s4c85zu.fsf@gnu.org> (Mathieu Othacehe's message of "Tue, 18 May 2021 16:35:17 +0200") Message-ID: <87cztkqkns.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: ludo@gnu.org, 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, I posted a patchset adding keep-alive support to guix publish earlier: https://issues.guix.gnu.org/48556. Thanks, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Sat May 29 17:04:55 2021 Received: (at control) by debbugs.gnu.org; 29 May 2021 21:04:55 +0000 Received: from localhost ([127.0.0.1]:57815 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ln68c-0001Ai-Vh for submit@debbugs.gnu.org; Sat, 29 May 2021 17:04:55 -0400 Received: from eggs.gnu.org ([209.51.188.92]:35536) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ln68b-0001AV-Ps for control@debbugs.gnu.org; Sat, 29 May 2021 17:04:54 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:54224) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ln68W-0001sw-LS for control@debbugs.gnu.org; Sat, 29 May 2021 17:04:48 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=41512 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ln68W-0005ub-DB for control@debbugs.gnu.org; Sat, 29 May 2021 17:04:48 -0400 Date: Sat, 29 May 2021 23:04:47 +0200 Message-Id: <877djh5k0g.fsf@gnu.org> To: control@debbugs.gnu.org From: =?utf-8?Q?Ludovic_Court=C3=A8s?= Subject: control message for bug #48468 MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) severity 48468 important quit From debbugs-submit-bounces@debbugs.gnu.org Sat May 29 17:44:43 2021 Received: (at 48468) by debbugs.gnu.org; 29 May 2021 21:44:43 +0000 Received: from localhost ([127.0.0.1]:57852 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ln6l8-0004KU-TX for submit@debbugs.gnu.org; Sat, 29 May 2021 17:44:43 -0400 Received: from eggs.gnu.org ([209.51.188.92]:40360) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ln6l8-0004KI-BF for 48468@debbugs.gnu.org; Sat, 29 May 2021 17:44:42 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:55892) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1ln6l3-0001I6-1J; Sat, 29 May 2021 17:44:37 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=41666 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ln6l2-0003Ey-Nx; Sat, 29 May 2021 17:44:36 -0400 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Christopher Baines Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87eee6cz75.fsf@cbaines.net> Date: Sat, 29 May 2021 23:44:35 +0200 In-Reply-To: <87eee6cz75.fsf@cbaines.net> (Christopher Baines's message of "Sun, 16 May 2021 19:26:22 +0100") Message-ID: <87v97143lo.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: Mathieu Othacehe , 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Christopher Baines skribis: > Mathieu Othacehe writes: [...] >> Stracing this publish server shows that connection reuse seems to be >> broken: >> >> --8<---------------cut here---------------start------------->8--- >> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41742), sin_addr=3Dine= t_addr("141.80.167.185")}, [112->16], 0) =3D 21 >> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41744), sin_addr=3Dine= t_addr("141.80.167.185")}, [112->16], 0) =3D 21 >> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41746), sin_addr=3Dine= t_addr("141.80.167.185")}, [112->16], 0) =3D 25 >> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41748), sin_addr=3Dine= t_addr("141.80.167.185")}, [112->16], 0) =3D 24 >> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41750), sin_addr=3Dine= t_addr("141.80.167.185")}, [112->16], 0) =3D 21 Ouch. >> Investigating it, I found that the connection is closed and opened >> multiple times in the call-with-cached-connection procedure of the (guix >> script substitute) module. >> >> It looks like its because a 'bad-headers exception is raised when trying >> to parse an eof object: >> >> --8<---------------cut here---------------start------------->8--- >> ;;; (error bad-header (read-header-line #)) >> --8<---------------cut here---------------end--------------->8--- >> >> I'm not sure where this eof comes from. There is this comment in the >> http-multiple-get procedure in (guix http-client): >> >> --8<---------------cut here---------------start------------->8--- >> ;; Swallow networking errors that could occur due to connection reuse >> ;; and the like; they will be handled down the road when trying to >> ;; read responses. >> (false-if-networking-error >> (begin >> (for-each (cut write-request <> buffer) batch) >> (put-bytevector p (get)) >> (force-output p)))) >> --8<---------------cut here---------------end--------------->8--- >> >> which would suggest that connection reuse could cause networking errors? [...] > That's on the server side, the actual problem is probably on the client > side, as I guess there are possibly places where closed connections > aren't handled properly. D=E2=80=99oh. So we should be able to use it by talking to a bare =E2=80= =98guix publish=E2=80=99? I tried and failed to reproduce it, but that=E2=80=99s p= robably not deterministic. > This reminds me I sent some patches relating to closing connections, > this could well be related [1]. > > 1: https://issues.guix.gnu.org/47174 This one will properly close connections when the servers asks for it, but would that make a difference? Thanks, Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 18 08:33:31 2021 Received: (at 48468) by debbugs.gnu.org; 18 Jun 2021 12:33:31 +0000 Received: from localhost ([127.0.0.1]:56007 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1luDgh-000360-EP for submit@debbugs.gnu.org; Fri, 18 Jun 2021 08:33:31 -0400 Received: from eggs.gnu.org ([209.51.188.92]:58296) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1luDgf-00035o-Ah for 48468@debbugs.gnu.org; Fri, 18 Jun 2021 08:33:29 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:45758) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1luDga-0002no-4Q for 48468@debbugs.gnu.org; Fri, 18 Jun 2021 08:33:24 -0400 Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=54544 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1luDgZ-0004v0-0B; Fri, 18 Jun 2021 08:33:23 -0400 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> Date: Fri, 18 Jun 2021 14:33:21 +0200 In-Reply-To: <87lf8e4l42.fsf@gnu.org> (Mathieu Othacehe's message of "Sun, 16 May 2021 19:57:49 +0200") Message-ID: <87im2bfj3i.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Mathieu Othacehe skribis: > Investigating it, I found that the connection is closed and opened > multiple times in the call-with-cached-connection procedure of the (guix > script substitute) module. > > It looks like its because a 'bad-headers exception is raised when trying > to parse an eof object: > > ;;; (error bad-header (read-header-line #)) > > > I'm not sure where this eof comes from. There is this comment in the > http-multiple-get procedure in (guix http-client): > > ;; Swallow networking errors that could occur due to connection reuse > ;; and the like; they will be handled down the road when trying to > ;; read responses. > (false-if-networking-error > (begin > (for-each (cut write-request <> buffer) batch) > (put-bytevector p (get)) > (force-output p)))) > > which would suggest that connection reuse could cause networking errors? Trying to see exactly which bit is at fault here. First, =E2=80=98http-multiple-get=E2=80=99 seems to be working as expected when pa= ssed a fresh connection: --8<---------------cut here---------------start------------->8--- scheme@(guix http-client)> (http-multiple-get (string->uri "https://ci.guix= .gnu.org") (lambda (request response port result) (get-bytevector-n port (response-content-length response)) (cons #t result)) '() (make-list 5000 (build-request (build-uri 'https #:host "ci.guix.gnu.org" #:path "/d7gpr41qpsifri54vp2lzjs99zkylscq.narinfo")))) connecting (5000 requests left)...connecting (4400 requests left)...connect= ing (3800 requests left)...connecting (3200 requests left)...connecting (26= 00 requests left)...connecting (2000 requests left)...connecting (1400 requ= ests left)...connecting (800 requests left)...connecting (200 requests left= )...$143 =3D (#t #t =E2=80=A6) scheme@(guix http-client)> (length $143) $144 =3D 5000 --8<---------------cut here---------------end--------------->8--- But maybe I=E2=80=99m not looking at the right thing. Do you have evidence or a reproducer? Thanks, Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Tue Jun 29 12:49:37 2021 Received: (at 48468) by debbugs.gnu.org; 29 Jun 2021 16:49:37 +0000 Received: from localhost ([127.0.0.1]:55196 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lyGvY-00024d-RT for submit@debbugs.gnu.org; Tue, 29 Jun 2021 12:49:37 -0400 Received: from eggs.gnu.org ([209.51.188.92]:34882) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1lyGvY-00024S-3x for 48468@debbugs.gnu.org; Tue, 29 Jun 2021 12:49:36 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:34964) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lyGvQ-0001sO-LC; Tue, 29 Jun 2021 12:49:29 -0400 Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=51100 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lyGvQ-0002tg-DC; Tue, 29 Jun 2021 12:49:28 -0400 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> Date: Tue, 29 Jun 2021 18:49:26 +0200 In-Reply-To: <87im2bfj3i.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Fri, 18 Jun 2021 14:33:21 +0200") Message-ID: <87sg108vl5.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, > But maybe I=E2=80=99m not looking at the right thing. > > Do you have evidence or a reproducer? Yes, adding the following debug message here: --8<---------------cut here---------------start------------->8--- --- a/guix/scripts/substitute.scm +++ b/guix/scripts/substitute.scm @@ -416,6 +416,7 @@ server certificates." ;; and retry. We might also get 'bad-response or a similar ;; exception from (web response) later on, once we've sent the ;; request, or a ERROR/INVALID-SESSION from GnuTLS. + (pk key args) (if (or (and (eq? key 'system-error) (=3D EPIPE (system-error-errno `(,key ,@args)))) (and (eq? key 'gnutls-error) --8<---------------cut here---------------end--------------->8--- and using substitutes from a publish server without keep alive support (before 0b8fa24), prints something like: --8<---------------cut here---------------start------------->8--- downloading from http://192.168.1.51:8080/nar/gzip/lkmiyfzi1bq571yn4gypbcs5= vn4fpma7-texlive-cm-51265 ... texlive-cm-51265 = = = 7.8MiB/s 00:00 | 1.7MiB transferred substituting /gnu/store/l788x07ska5vffayz0gayv4hsx5flxal-module-import-comp= iled... ;;; (bad-header (read-header-line #)) downloading from http://192.168.1.51:8080/nar/gzip/l788x07ska5vffayz0gayv4h= sx5flxal-module-import-compiled ... module-import-compiled = = = 7.5MiB/s 00:00 | 85KiB transferred substituting /gnu/store/1s1lrnxlkjwxshk5q2w97ig3clc6n06f-ruby-2.6.5... ;;; (bad-header (read-header-line #)) --8<---------------cut here---------------end--------------->8--- That's because the connection is closed by the publish server each time a NAR is sent. This particular behaviour is fixed by: https://issues.guix.gnu.org/48556. I hoped that it would decrease the load of the Cuirass publish server that was flooded by worker connections. While the situation is better, there are still a lot of substitute timeout errors on Cuirass. I have also observed those timeouts using the publish server behind ci.guix.gnu.org. This makes me think that there's something else that is preventing the server from honoring every connection request in less than 5 seconds (the timeout duration). Thanks, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 05 08:21:36 2022 Received: (at 48468) by debbugs.gnu.org; 5 Dec 2022 13:21:36 +0000 Received: from localhost ([127.0.0.1]:35994 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2BPc-0000p9-4z for submit@debbugs.gnu.org; Mon, 05 Dec 2022 08:21:36 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42810) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2BPa-0000p3-M6 for 48468@debbugs.gnu.org; Mon, 05 Dec 2022 08:21:35 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2BPS-0007KC-PN for 48468@debbugs.gnu.org; Mon, 05 Dec 2022 08:21:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=QdgzazPdBaky6oWPFshw7JxEXLHN3x9Z1jEgar2OWp4=; b=Tq3PRLGnbvwXUXkn+Lk9 B5HPonNF5S2wASbMfDEfvPQUEYjRYcGs/Vb2UiTKVQVErsa4yeKa947UwMyAdKHYTaU6aUFRUjv8q 3lc1SYUgsZhpII9RmacW81iqvxh/3CQB0XrS1mipklFkd2P8v0hizfjcnhVjq7J/VzQ5cx4ID5Suq Pl+Ot0mHbAlFJ+4M/gHEWdW5TJpQtA4um4Gdow3HWzUzaiarW+PWfQ5ZH78ews2btAbkmYlIAhH1U mGgOT5B/0W2wFCaWAdlhv3XEq8uT/Ap1CARcP/lUF6LgrC939OUpe5z7+qWhAouyjr8CK3YnwJC6k HZWJ+mNW5gO9sw==; Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2BPR-0006qF-UV; Mon, 05 Dec 2022 08:21:26 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> Date: Mon, 05 Dec 2022 14:21:23 +0100 In-Reply-To: <87sg108vl5.fsf@gnu.org> (Mathieu Othacehe's message of "Tue, 29 Jun 2021 18:49:26 +0200") Message-ID: <87lenmrnos.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Mathieu Othacehe skribis: > I hoped that it would decrease the load of the Cuirass publish server > that was flooded by worker connections. While the situation is better, > there are still a lot of substitute timeout errors on Cuirass. > > I have also observed those timeouts using the publish server behind > ci.guix.gnu.org. This makes me think that there's something else that is > preventing the server from honoring every connection request in less > than 5 seconds (the timeout duration). I don=E2=80=99t see this when substituting from https://ci.guix.gnu.org the= se days. You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2=80= =99 times out. Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by pee= r=E2=80=9D and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client clo= sed the connection (which was open) prematurely, maybe due to an internal timeout. What I=E2=80=99d like to know is whether those timeouts you mention appear during connection establishment (connect(2) on the client side doesn=E2=80= =99t complete within 5 seconds) or after connection establishment. Connection establishment is handled by the kernel and =E2=80=98guix publish= =E2=80=99 is not involved. However, the listen(2) call made in (web server http) sets a maximum backlog of 128 connections; if at some point 128 connections are already queued, then the 129th client will have connect(2) take some time. Seems unlikely, but who knows. Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 07 04:44:19 2022 Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 09:44:19 +0000 Received: from localhost ([127.0.0.1]:48909 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2qyQ-0001cx-SL for submit@debbugs.gnu.org; Wed, 07 Dec 2022 04:44:19 -0500 Received: from eggs.gnu.org ([209.51.188.92]:60034) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2qyN-0001cr-Av for 48468@debbugs.gnu.org; Wed, 07 Dec 2022 04:44:17 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2qyH-00045e-Do; Wed, 07 Dec 2022 04:44:09 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=9xYEt4HDXobSZZuvXN7OeOueQs4B+Q050cNQ+t23BMU=; b=J5g3m6XVh69+J0o9HdVS XyMdTuApKqnAkH2l6dVuJZNt2dxK8ACt99ixsNi02All0LRjr0o7ExMth6hPfTL0a470gmF8opUC6 ge3ukhNWtnkFQ01kOWb1jLDAIV6uF4bvSOEzKUJm2bbpuh70FtFzR9F9CcoQFZl+V0q7SQR83Ev7E b9rFP/dzaSw7M5+f9TAhtSgifKlQDqJsZhhXkPHLhI3eUwYBpnn2J6HVk5Ze9y8YBKcjUDSK9RDal 7V0uLyqyRNSM6YuiGxZyWT0Ttr17EG9nq47IHUfCjCetWAQgBVOEX162KMIKQ6b1Gpl7dDUK50xvL lP52ox9k7f2DBA==; Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2774.rev.sfr.net ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2774] helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2qyC-0005rX-DF; Wed, 07 Dec 2022 04:44:08 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> Date: Wed, 07 Dec 2022 10:44:01 +0100 In-Reply-To: <87lenmrnos.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Mon, 05 Dec 2022 14:21:23 +0100") Message-ID: <87cz8v8s66.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hello, > You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2= =80=99 times out. > Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by p= eer=E2=80=9D > and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client c= losed the > connection (which was open) prematurely, maybe due to an internal > timeout. Could it be that the client is receiving 404 because the baking of some NAR was deferred to a worker, and then it closes the connection? I think that's what I had in mind with the patch 2/2 of this patchset: https://issues.guix.gnu.org/50040. Thanks, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 07 08:38:38 2022 Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 13:38:38 +0000 Received: from localhost ([127.0.0.1]:50421 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2udC-0000vj-9k for submit@debbugs.gnu.org; Wed, 07 Dec 2022 08:38:38 -0500 Received: from eggs.gnu.org ([209.51.188.92]:45806) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2udA-0000vd-Bd for 48468@debbugs.gnu.org; Wed, 07 Dec 2022 08:38:37 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2ud5-0006eW-1B for 48468@debbugs.gnu.org; Wed, 07 Dec 2022 08:38:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=VMpl2g1kW8uavHNeZi6SkTR2Agv4PQwDjU2wki/AnD8=; b=XwMPTTe/8xc9QlNfP/b9 e+awrw7HWTl7bp9yYqWy6gtXg+4D7v0TBkxRIHfZAW7LwnUk0D933R79koxRm3pmPlbhg4RJ/BPL/ x0+x9Fb8dmYUOZqNHx1n0ez2u3PdZ1sg5ax7Fqi9BLpDsx1/3Rzq4XOyhYjx7TMh3IIZWAliQx7vr iIjUCfMddm0WgzJ9BtcU1/f0mLZnVdxDUojmdXCahyntERo05nV3nJO0NvBQnI6zvnvmKzfnplQ4S SFISVBIdVte8laVhNu+IbpOgikPxUpIFTpF911in6naeHirjeG4d8QjRLwB4KG0m1BYyP/8DXi5/a PGl1MRmPJtbAQg==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2ud1-0000l0-Na; Wed, 07 Dec 2022 08:38:27 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> Date: Wed, 07 Dec 2022 14:38:25 +0100 In-Reply-To: <87cz8v8s66.fsf@gnu.org> (Mathieu Othacehe's message of "Wed, 07 Dec 2022 10:44:01 +0100") Message-ID: <87cz8vjpv2.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Mathieu Othacehe skribis: >> You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2= =80=99 times out. >> Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by = peer=E2=80=9D >> and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client = closed the >> connection (which was open) prematurely, maybe due to an internal >> timeout. > > Could it be that the client is receiving 404 because the baking of some > NAR was deferred to a worker, and then it closes the connection? Unlikely. Take /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv, which was marked as failed earlier today due to missing .drv. It=E2=80=99s= a 4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80=99= is typically set to something much higher than that. So =E2=80=98guix publish=E2=80=99 = won=E2=80=99t return 404 in that case. If you find a =E2=80=9Cmissing derivation=E2=80=9D error in Cuirass, how wo= uld you search logs to find what happened? I=E2=80=99m not sure where to look for useful debugging info. Thanks, Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 07 09:32:14 2022 Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 14:32:14 +0000 Received: from localhost ([127.0.0.1]:50524 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2vT3-0001UP-Ki for submit@debbugs.gnu.org; Wed, 07 Dec 2022 09:32:14 -0500 Received: from eggs.gnu.org ([209.51.188.92]:59208) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p2vT1-0001UJ-S1 for 48468@debbugs.gnu.org; Wed, 07 Dec 2022 09:32:12 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2vSv-0006eU-Fe; Wed, 07 Dec 2022 09:32:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=+QIVLCTr0wnscpqQ49rkieXQI15PIBR56NfsJYUJpWc=; b=CM3BAH5dr+8eOgJD7zhX vS5iPZtcvhuXiW6rJ4xeoeE32LixDkl6KPlcha2IbJv/hVOCGtN405dJeiWeUk8RQQw5JUN7oNEsB wuLEjM0o9UYiafKwIhMB1YZeT0Lc/nw6qPxfA30MQEe6YkjeovMmb3jxnbR+TpL8wWIcZb/sjqYWP my/3fwmI8bnyc+eAeNX5IozFXoP3Ukqw7lQYExEMpc+j3pfvceSSmzyNVARj7PfA9P5Lc1vX+X52c 6zipxQWcIgGbkXd4jtWhvd/OwGtzH9PopfxcM+Qo6Bh3iglTLrRuPS9907ixBt+wzajrdvFNJUqfV aEmcFFPliIUB6w==; Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2774.rev.sfr.net ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2774] helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p2vSf-0005IU-4F; Wed, 07 Dec 2022 09:32:03 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> Date: Wed, 07 Dec 2022 15:31:44 +0100 In-Reply-To: <87cz8vjpv2.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Wed, 07 Dec 2022 14:38:25 +0100") Message-ID: <878rjj8eun.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hello, > /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv, > which was marked as failed earlier today due to missing .drv. It=E2=80= =99s a > 4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80= =99 is typically > set to something much higher than that. So =E2=80=98guix publish=E2=80= =99 won=E2=80=99t return > 404 in that case. Yes but that derivation also depends on other derivations, for instance qemu-minimal and if I try: --8<---------------cut here---------------start------------->8--- mathieu@berlin ~$ guix build qemu-minimal ... /gnu/store/lwv2pl0m6dkf6bkzip755w5p71g5akq4-qemu-minimal-7.1.0 --8<---------------cut here---------------end--------------->8--- and then, from my machine. --8<---------------cut here---------------start------------->8--- curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo --8<---------------cut here---------------end--------------->8--- wget exhibits the same behaviour and returns 404. So any build that requires a heavy substitute, heavier than the cache bypass threshold at least, will fail on the workers, as it would fail locally. That's not really a surprise as baking substitutes takes time and there is a time window between the moment Cuirass triggers NAR baking and the moment the NAR is baked, where every user will get a 404. Mathieu From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 08 05:26:39 2022 Received: (at 48468) by debbugs.gnu.org; 8 Dec 2022 10:26:39 +0000 Received: from localhost ([127.0.0.1]:55883 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p3E6x-0008Nw-3E for submit@debbugs.gnu.org; Thu, 08 Dec 2022 05:26:39 -0500 Received: from eggs.gnu.org ([209.51.188.92]:58260) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p3E6v-0008Np-Cb for 48468@debbugs.gnu.org; Thu, 08 Dec 2022 05:26:38 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p3E6q-0008Pw-3l for 48468@debbugs.gnu.org; Thu, 08 Dec 2022 05:26:32 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=1OwmOglqgnVtHkHRwkKwAXDWfPcjCygSAB5v1BC1iP4=; b=V9Q4lME8a3zVPYYVrWX/ 1qs32wxPu+w+D0j8tXPsS00VxuHKpF4Qt60HhOnckfpnH8p6zDiXjM55pS5CP7Oj/3i/gORGlVlpg dIAoo3BOFxGbhHW9o5E9q0wclFR6o5lZ1YIi8nYdoAhronnrdY5kXXxpsGoiG7vZSm1j5LrzAppEl lc1t8rMLXzqRnbW9MNP7KPDbJDTWxqzGHzLeHr+8IkaOQikBJLT79P1wO8kML12Np0lAXV2t4qMQl n7nvKVAtNLVFKgQbLInM6Qtm958LFKKPOauNC1kuiW7WYnvNY+j00GZv1rcpukgg8I4ZCt3hqx4Eu 5SRu729RBVB0Lw==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p3E6p-0003x0-Gz; Thu, 08 Dec 2022 05:26:31 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Octidi 18 Frimaire an 231 de la =?utf-8?Q?R=C3=A9vol?= =?utf-8?Q?ution=2C?= jour du Lierre X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Thu, 08 Dec 2022 11:26:28 +0100 In-Reply-To: <878rjj8eun.fsf@gnu.org> (Mathieu Othacehe's message of "Wed, 07 Dec 2022 15:31:44 +0100") Message-ID: <87wn72fay3.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Mathieu Othacehe skribis: >> /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv, >> which was marked as failed earlier today due to missing .drv. It=E2=80= =99s a >> 4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80= =99 is typically >> set to something much higher than that. So =E2=80=98guix publish=E2=80= =99 won=E2=80=99t return >> 404 in that case. > > Yes but that derivation also depends on other derivations, for instance > qemu-minimal and if I try: > > mathieu@berlin ~$ guix build qemu-minimal > ... > /gnu/store/lwv2pl0m6dkf6bkzip755w5p71g5akq4-qemu-minimal-7.1.0 > > > and then, from my machine. > > curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo > We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo Yes, but derivations (.drv) are depend only on =E2=80=9Csources=E2=80=9D (l= ike *-guile-builder, *.patch, *.scm) and on other derivations, all of which are typically less than 1=C2=A0MiB. Derivations don=E2=80=99t depend on derivation outputs like that of qemu-mi= nimal above. > So any build that requires a heavy substitute, heavier than the cache > bypass threshold at least, will fail on the workers, as it would fail > locally. The main =E2=80=98guix publish=E2=80=99 instance on berlin has cache-bypass= -threshold set to 100=C2=A0MiB; I don=E2=80=99t think we have any source or .drv that = is this big, or even a tenth of it. :-) The =E2=80=98guix publish=E2=80=99 instance spawned in (cuirass remote) doe= sn=E2=80=99t use caching at all, so there=E2=80=99s no bypass threshold. The only clue we have so far is a long chain of successful .drv substitutions followed by a failing one: --8<---------------cut here---------------start------------->8--- Downloading http://141.80.167.131/nar/zstd/9nm35401i8j42559iadi3iz3kmhmj7pr= -guix-system-tests.drv... [K guix-system-tests.drv 1KiB 0B/s 00:00 [ ] = 0.0% [K guix-system-tests.drv 1KiB 1.3MiB/s 00:00 [##################] = 100.0% [K guix-system-tests.drv 1KiB 408KiB/s 00:00 [##################] = 100.0% @ substituter-succeeded /gnu/store/9nm35401i8j42559iadi3iz3kmhmj7pr-guix-sy= stem-tests.drv fetching path `/gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-system-test= s-modules.drv'... @ substituter-started /gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-syst= em-tests-modules.drv substitute Downloading http://141.80.167.131/nar/zstd/9zf6inizcb09m136c44dj35fmzf2g3hs= -guix-system-tests-modules.drv... [K guix-system-tests-modules.drv 543B 0B/s 00:00 [ ] = 0.0% [K guix-system-tests-modules.drv 543B 851KiB/s 00:00 [##################] = 100.0% [K guix-system-tests-modules.drv 543B 484KiB/s 00:00 [##################] = 100.0% @ substituter-succeeded /gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-sy= stem-tests-modules.drv cannot build missing derivation ?/gnu/store/zq7idl0j51fdzqhhqm9ql90d0f2326k= 7-btrfs-root-on-subvolume-os.drv? --8<---------------cut here---------------end--------------->8--- (From .) Plausible explanations that come to mind: 1. =E2=80=98guix publish=E2=80=99 returning 404, but not due to baking. = Instead the .drv is simply not in store, hence 404. 2. Client timeout (=E2=80=98guix publish=E2=80=99 fails to reply on time). 3. Cached 404 in nginx (though we=E2=80=99re not supposed to cache those I think?), or timeout in nginx (again due to =E2=80=98guix publish=E2=80= =99 being too slow and =E2=80=98proxy_read_timeout=E2=80=99 is reached, currently 10= s). We should see if we can get useful info from nginx/publish logs. Thanks, Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 10 05:56:04 2022 Received: (at 48468) by debbugs.gnu.org; 10 Dec 2022 10:56:04 +0000 Received: from localhost ([127.0.0.1]:42078 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p3xWV-00067e-Us for submit@debbugs.gnu.org; Sat, 10 Dec 2022 05:56:04 -0500 Received: from eggs.gnu.org ([209.51.188.92]:48908) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1p3xWU-00067H-QX for 48468@debbugs.gnu.org; Sat, 10 Dec 2022 05:56:03 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p3xWP-0002VM-5W for 48468@debbugs.gnu.org; Sat, 10 Dec 2022 05:55:57 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=YD/ILx6kPDmpu3G7BGvmOwwpa1gjGAnaH2rwXlq5sdM=; b=LYilsh4IhxJCfcKZreWs 0KQ1aIAC6wv3DPn/lzIDFns7L03mwsN3Utgn341YHPfKpl/Mj9IxP3o1zsVlFiru4L52usD/dRpzC qoPKcKY8Wn+S4LzjUdKqfuLwevKkmXgn9f0bygC8n/QxrQGq5M32jE2lEaDVaz4KXthUapUjfXEyL oqTbW00IhxTGIsHI44dDtGmkaoAmpFdfc4Hi07oe0dq+SL98UNBt7zexZImcMLIvs0qyFKCT0rtEx GThPh85vw9IVR0rja8uwKqCSWLDtEo+/XRRL3jVCyNmkcKzZFE4siS0MF9IGjH2imeSBf98SwO8O4 ERGw6somlMXViQ==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1p3xWM-0004Fc-S8; Sat, 10 Dec 2022 05:55:56 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> Date: Sat, 10 Dec 2022 11:55:52 +0100 In-Reply-To: <87wn72fay3.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Thu, 08 Dec 2022 11:26:28 +0100") Message-ID: <87fsdn5xzb.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Ludovic Court=C3=A8s skribis: > Mathieu Othacehe skribis: [...] >> curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo >> We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo > > Yes, but derivations (.drv) are depend only on =E2=80=9Csources=E2=80=9D = (like > *-guile-builder, *.patch, *.scm) and on other derivations, all of which > are typically less than 1=C2=A0MiB. > > Derivations don=E2=80=99t depend on derivation outputs like that of qemu-= minimal > above. > >> So any build that requires a heavy substitute, heavier than the cache >> bypass threshold at least, will fail on the workers, as it would fail >> locally. > > The main =E2=80=98guix publish=E2=80=99 instance on berlin has cache-bypa= ss-threshold > set to 100=C2=A0MiB; I don=E2=80=99t think we have any source or .drv tha= t is this > big, or even a tenth of it. :-) On closer inspection, we do get bigger sources for those system test derivations: --8<---------------cut here---------------start------------->8--- $ du $(guix gc -R /gnu/store/2laznragkfjf0jn6n7b8c53vgmb8maqp-installed-ex= tlinux-os.drv ) |sort -k1 -n |tail 6304 /gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026/gnu/packages 6956 /gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source/= gnu/packages/patches 7000 /gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026/gnu 7716 /gnu/store/z0gc56x1w83wn0pi4mz8nhyjdr0d10fs-guix 12352 /gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026 22548 /gnu/store/ky96vygv1hpa8iz6677qz96binbz201h-packages 22684 /gnu/store/rrg9ki4bdjffvmgapjcqwjwjwkhb0qhl-doc 39516 /gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source= /gnu/packages 39524 /gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source= /gnu 39528 /gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source --8<---------------cut here---------------end--------------->8--- The items at the bottom weigh around 40=C2=A0MiB. That=E2=80=99s still below the 100=C2=A0MiB cache bypass threshold of the m= ain =E2=80=98guix publish=E2=80=99 instance though. Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 27 04:53:04 2022 Received: (at 48468) by debbugs.gnu.org; 27 Dec 2022 09:53:04 +0000 Received: from localhost ([127.0.0.1]:54740 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pA6dr-0005AI-Tl for submit@debbugs.gnu.org; Tue, 27 Dec 2022 04:53:04 -0500 Received: from eggs.gnu.org ([209.51.188.92]:43808) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pA6dp-00059l-OK for 48468@debbugs.gnu.org; Tue, 27 Dec 2022 04:53:02 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pA6dj-0005hj-J7; Tue, 27 Dec 2022 04:52:55 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=Du6ptv6lLBXUXrdtL2hDOU5Efll8XCc6SJgPnuEdRc0=; b=jBUqewPuXBsBj+EzTEkc V9XvXsyllsxAtgt+yKOUDoITTkgNsXZHNyl0X4RKkDdE9MJQ8TJNfDnrCokJ8iSdfmi+46vmSW68z 25SQtJgYvIIe3WeULy3FfezXz5XlalIwkIm0sQWd2SdXg1rMZ3/khJNd/9xulHtusm590M9bDguhZ gaZeOSlede4KAP2rmUC7qOUWO7x0V7yecJXx6TMcYi4KHFxNJ1OFxcAWw2xbJZZWQuoBoRSZiW8Jp S3ObgHTefRhWT2q/swG2e9dP7jEnsKjXyybJNMa5WQYP9/uFh7yHWt6CMkYpb0+fFMDaQv/sLHvJT MgJG0psvfOZgiQ==; Received: from [2a01:cb18:832e:5f00:3563:417e:2a38:86d8] (helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pA6dh-0003Mh-Te; Tue, 27 Dec 2022 04:52:55 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> <87fsdn5xzb.fsf@gnu.org> Date: Tue, 27 Dec 2022 10:52:51 +0100 In-Reply-To: <87fsdn5xzb.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Sat, 10 Dec 2022 11:55:52 +0100") Message-ID: <87sfh1mavg.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hey Ludo, > That=E2=80=99s still below the 100=C2=A0MiB cache bypass threshold of the= main =E2=80=98guix > publish=E2=80=99 instance though. Right. Just to be on the safe side here, what about applying this patch to have log lines when we are replying 404 due to baking? Thanks, Mathieu --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-scripts-publish-Add-a-log-when-replying-404-due-to-b.patch >From 725d5ba21a0fc0108b60c37bbc8d947fab6ac938 Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Tue, 27 Dec 2022 10:49:04 +0100 Subject: [PATCH 1/1] scripts: publish: Add a log when replying 404 due to baking. * guix/scripts/publish.scm (render-narinfo/cached): Add it. --- guix/scripts/publish.scm | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm index 3bf3bd9c7c..a2048c98fd 100644 --- a/guix/scripts/publish.scm +++ b/guix/scripts/publish.scm @@ -586,9 +586,13 @@ (define (delete-entry narinfo) #:ttl 300 ;temporary #:nar-path nar-path #:compressions compressions) - (not-found request - #:phrase "We're baking it" - #:ttl 300))) ;should be available within 5m + (begin + (format #t (G_ "~a ~a: 404 (baking)~%") + (request-method request) + (uri-path (request-uri request))) + (not-found request + #:phrase "We're baking it" + #:ttl 300)))) ;should be available within 5m (else (not-found request #:phrase "" #:ttl negative-ttl))))) -- 2.38.1 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 28 09:23:31 2022 Received: (at 48468) by debbugs.gnu.org; 28 Dec 2022 14:23:31 +0000 Received: from localhost ([127.0.0.1]:57315 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pAXL8-0003yO-Ei for submit@debbugs.gnu.org; Wed, 28 Dec 2022 09:23:31 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51524) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pAXL7-0003yA-5R for 48468@debbugs.gnu.org; Wed, 28 Dec 2022 09:23:29 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAXL1-0002uC-Im; Wed, 28 Dec 2022 09:23:23 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:Date:References:In-Reply-To:Subject:To: From; bh=QOPIJnp44LyWwhKxBw3SXZP166pG8HblXCwrDfAzN5Q=; b=QGqLtaL9TJccFG9KgUtQ 7rjgDYC6oBJxK5x/OCq9oan9dwVfYkCAhaB/g+hDaGoXjYaAYqzivPOHOU3zOWHPgydG5V7E80ZK3 6PKv4QG0i4TqwTBxtzaEn6DaqCCJwd7Qbo7m0n94J8mCMwXZ/gcnHZkcSdjWZNuoQ34SbC79vjk4d ZeGMmRjZKrsP4jyy6t7wnGuDHeW2OtOP+nEx9KoJiATj0Yw/zbGOQ/rfjYVVreENyUVhvBQb1w7Yx AvRAhM1zkoQ5nzKwIW9WDT5aXA3J8McGYnNnq7FXMQcfnU2O+uuUsa3M66Mjmh6V/lViu0WmAD66v WT2Rz30v7REHNw==; Received: from [2a01:cb18:832e:5f00:3563:417e:2a38:86d8] (helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pAXL1-00056O-64; Wed, 28 Dec 2022 09:23:23 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout In-Reply-To: <87wn72fay3.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Thu, 08 Dec 2022 11:26:28 +0100") References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) Date: Wed, 28 Dec 2022 15:23:21 +0100 Message-ID: <87v8lvsj3a.fsf@gnu.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Hello, So I had an other look to that one, > (From .) > > Plausible explanations that come to mind: > > 1. =E2=80=98guix publish=E2=80=99 returning 404, but not due to baking.= Instead the > .drv is simply not in store, hence 404. > > 2. Client timeout (=E2=80=98guix publish=E2=80=99 fails to reply on tim= e). > > 3. Cached 404 in nginx (though we=E2=80=99re not supposed to cache thos= e I > think?), or timeout in nginx (again due to =E2=80=98guix publish=E2= =80=99 being too > slow and =E2=80=98proxy_read_timeout=E2=80=99 is reached, currently = 10s). I focused on https://ci.guix.gnu.org/build/308493/details which is similar to the one linked above. Some derivations are successfully substituted then, one is not and it aborts. This build stopped at 2022/12/24 00:25:38 CET. At that time the nginx error log on Berlin looks like: --8<---------------cut here---------------start------------->8--- 2022/12/24 00:25:23 [info] 128755#0: *178076209 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:24 [info] 128742#0: *178076816 client 174.59.221.51 closed= keepalive connection 2022/12/24 00:25:25 [info] 128755#0: *178076830 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /vx7lgvrcw6i6r0inw513qn6fj3cb27mm.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/vx7lgvrcw6i6r0inw513qn6fj3cb27mm.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:26 [info] 128742#0: *178077436 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /b1g8vyqp8c52yk6vscdvjflci5mnladq.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/b1g8vyqp8c52yk6vscdvjflci5mnladq.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:28 [info] 128742#0: *178078045 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /007zgflsl5xkr377wpakbsis5c2yqh1q.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/007zgflsl5xkr377wpakbsis5c2yqh1q.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:32 [info] 128742#0: *178078659 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /xdfyi8jki350i8njk1glrzg9p0va18vy.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/xdfyi8jki350i8njk1glrzg9p0va18vy.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:33 [info] 128741#0: *178079271 recv() failed (104: Connect= ion reset by peer) while sending to client, client: 78.47.68.4, server: ci.= guix.gnu.org, request: "GET /rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/rzs7jiwhyls2bg8zv2g1vmmgcfwkvaax.nar= info", host: "ci.guix.gnu.org" 2022/12/24 00:25:36 [info] 128741#0: *178080049 client 157.55.39.212 closed= keepalive connection 2022/12/24 00:25:38 [info] 128741#0: *178080047 peer closed connection in S= SL handshake while SSL handshaking, client: 127.0.0.1, server: 0.0.0.0:443 --8<---------------cut here---------------end--------------->8--- That would, in that specific case, invalidate your hypothesis number 2. When 'guix publish' timeouts, we have messages like: --8<---------------cut here---------------start------------->8--- 2022/12/24 00:45:36 [error] 128749#0: *178225211 upstream timed out (110: C= onnection timed out) while reading response header from upstream, client: 8= 6.87.253.49, server: ci.guix.gnu.org, request: "GET /m0hqyvj8di41xlraz087km= pr38zaw90r.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/m0hqyvj8di41= xlraz087kmpr38zaw90r.narinfo", host: "ci.guix.gnu.org" --8<---------------cut here---------------end--------------->8--- However, like suggested in your hypothesis number 1, it seems instead that we are replying 404 to the worker which resets the connection. As we have put aside the baking thing, the question is now why are those derivations not available? Are they not part of Berlin's store at that time? Or is the publish server erroneously returns 404? It looks like the requested derivations are still absent, as of right now: --8<---------------cut here---------------start------------->8--- mathieu@berlin /var/log/nginx$ ls /gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27= mm* ls: cannot access '/gnu/store/vx7lgvrcw6i6r0inw513qn6fj3cb27mm*': No such f= ile or directory mathieu@berlin /var/log/nginx$ ls /gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnla= dq* ls: cannot access '/gnu/store/b1g8vyqp8c52yk6vscdvjflci5mnladq*': No such f= ile or directory mathieu@berlin /var/log/nginx$ ls /gnu/store/007zgflsl5xkr377wpakbsis5c2yqh= 1q* ls: cannot access '/gnu/store/007zgflsl5xkr377wpakbsis5c2yqh1q*': No such f= ile or directory --8<---------------cut here---------------end--------------->8--- As I don't have much clue about what are those derivations, I think we should instrument a bit the publish server and maybe the substitute script like proposed in the attachments. WDYT? Thanks, Mathieu --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: inline; filename=0001-scripts-publish-Add-a-custom-baking-header.patch Content-Transfer-Encoding: quoted-printable >From 9f9c839937ac2edd1b5901b2262c4be0954fa20c Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Wed, 28 Dec 2022 15:12:46 +0100 Subject: [PATCH 1/2] scripts: publish: Add a custom baking header. Log the not-found responses and their reason (baking or not) to stdout. Also send the X-Baking custom header so that the client can be informed of the cause of the failure. * guix/scripts/publish.scm (not-found): Add a baking? argument to add the X-Baking HTTP header to the response if baking is in progress. Also, log t= he 404 responses to stdout, indicating if it is due to baking or not. (render-narinfo/cached): Pass the baking? argument. --- guix/scripts/publish.scm | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm index 3bf3bd9c7c..11fedf092e 100644 --- a/guix/scripts/publish.scm +++ b/guix/scripts/publish.scm @@ -4,7 +4,7 @@ ;;; Copyright =C2=A9 2015-2022 Ludovic Court=C3=A8s ;;; Copyright =C2=A9 2020 Maxim Cournoyer ;;; Copyright =C2=A9 2021 Simon Tournier -;;; Copyright =C2=A9 2021 Mathieu Othacehe +;;; Copyright =C2=A9 2021, 2022 Mathieu Othacehe ;;; ;;; This file is part of GNU Guix. ;;; @@ -375,14 +375,28 @@ (define* (narinfo-string store store-path compression))) compressions)))) =20 +;; Custom header to indicate that baking is in progress. +(declare-opaque-header! "X-Baking") + (define* (not-found request - #:key (phrase "Resource not found") + #:key + baking? + (phrase "Resource not found") ttl) "Render 404 response for REQUEST." + (format #t (G_ "=E2=86=B3 ~a ~a: 404~a~%") + (request-method request) + (uri-path (request-uri request)) + (if baking? " (baking)" "")) (values (build-response #:code 404 - #:headers (if ttl - `((cache-control (max-age . ,ttl))) - '())) + #:headers + (append + (if ttl + `((cache-control (max-age . ,ttl))) + '()) + (if baking? + '((x-baking . "1")) + '()))) (string-append phrase ": " (uri-path (request-uri request))))) =20 @@ -587,6 +601,7 @@ (define (delete-entry narinfo) #:nar-path nar-path #:compressions compressions) (not-found request + #:baking? #t #:phrase "We're baking it" #:ttl 300))) ;should be available withi= n 5m (else --=20 2.38.1 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0002-substitutes-Log-the-failing-queries.patch >From 25ffc57864dbf34ca58741f89c1f790dbde6702f Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe Date: Wed, 28 Dec 2022 15:19:29 +0100 Subject: [PATCH 2/2] substitutes: Log the failing queries. * guix/substitutes.scm (%debug?): New variable. (handle-narinfo-response): Log the failing queries if the %debug? parameter is set. --- guix/substitutes.scm | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/guix/substitutes.scm b/guix/substitutes.scm index 9014cf61ec..819eb2c73e 100644 --- a/guix/substitutes.scm +++ b/guix/substitutes.scm @@ -90,6 +90,12 @@ (define %narinfo-cache-directory (string-append %state-directory "/substitute/cache")) (string-append (cache-directory #:ensure? #f) "/substitute"))) +(define %debug? + ;; Enable debug mode by setting the GUIX_SUBSTITUTE_DEBUG environmnent + ;; variable. + (make-parameter + (getenv "GUIX_SUBSTITUTE_DEBUG"))) + (define (narinfo-cache-file cache-url path) "Return the name of the local file that contains an entry for PATH. The entry is stored in a sub-directory specific to CACHE-URL." @@ -224,6 +230,15 @@ (define (handle-narinfo-response request response port result) (let* ((path (uri-path (request-uri request))) (hash-part (basename (string-drop-right path 8)))) ;drop ".narinfo" + ;; Log the failing queries and indicate if it failed because the + ;; narinfo is being baked. + (when (%debug?) + (let ((baking? + (assoc-ref (response-headers response) 'x-baking))) + (display + (format #f "could not fetch ~a~a ~a~a~%" + url path code + (if baking? " (baking)" ""))))) (if len (get-bytevector-n port len) (read-to-eof port)) -- 2.38.1 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 06 17:42:58 2023 Received: (at 48468) by debbugs.gnu.org; 6 Jan 2023 22:42:58 +0000 Received: from localhost ([127.0.0.1]:56000 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pDvQQ-000665-BQ for submit@debbugs.gnu.org; Fri, 06 Jan 2023 17:42:58 -0500 Received: from eggs.gnu.org ([209.51.188.92]:48656) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pDvQO-00065t-KE for 48468@debbugs.gnu.org; Fri, 06 Jan 2023 17:42:57 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pDvQI-0004M3-VE for 48468@debbugs.gnu.org; Fri, 06 Jan 2023 17:42:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=whBnKolx2dU774Oks/UTuIWS1dsEoVXfJ5gnW8ZPumw=; b=R/PzVWCZHbk0BOI6Kcjs EhM1V4SFSb/Zpm5ByaG2B042Ws90AjF4rwq5FPUMau0QLrPqrxH1q9N6dP84JrL6FWDvL9fmSKzgw ZzqSY7ds40OgPz8RT5CEO6XSNuLndRbQg0D6QNCgDnqHSpzGT1R+Ez9yXSWf1EjGhn/hJPqt5Udf6 Z/i0vlBf9JCxQ5RlgLT99cytTLItbTaf5vyASilpLlnzY9tV6pfpzqlXgTK/JgEDvOfEItS4eXYw/ bcNaBfldJ8DsB8BsLIo8ISHDmskqqOIoQ/y9e6nhrBdukmqU9MwP3JO1zAv55+h5yGLbM2aSoBVlt fcJzq4AzUOxciA==; Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pDvQH-0007ra-Kf; Fri, 06 Jan 2023 17:42:50 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> <87v8lvsj3a.fsf@gnu.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Septidi 17 =?utf-8?Q?Niv=C3=B4se?= an 231 de la =?utf-8?Q?R=C3=A9volution=2C?= jour de la Marne X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Fri, 06 Jan 2023 23:42:47 +0100 In-Reply-To: <87v8lvsj3a.fsf@gnu.org> (Mathieu Othacehe's message of "Wed, 28 Dec 2022 15:23:21 +0100") Message-ID: <87mt6vxp20.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi Mathieu! Mathieu Othacehe skribis: >> (From .) >> >> Plausible explanations that come to mind: >> >> 1. =E2=80=98guix publish=E2=80=99 returning 404, but not due to baking= . Instead the >> .drv is simply not in store, hence 404. >> >> 2. Client timeout (=E2=80=98guix publish=E2=80=99 fails to reply on ti= me). >> >> 3. Cached 404 in nginx (though we=E2=80=99re not supposed to cache tho= se I >> think?), or timeout in nginx (again due to =E2=80=98guix publish=E2= =80=99 being too >> slow and =E2=80=98proxy_read_timeout=E2=80=99 is reached, currently= 10s). > > I focused on https://ci.guix.gnu.org/build/308493/details which is > similar to the one linked above. Some derivations are successfully > substituted then, one is not and it aborts. > > This build stopped at 2022/12/24 00:25:38 CET. At that time the nginx err= or > log on Berlin looks like: > > 2022/12/24 00:25:23 [info] 128755#0: *178076209 recv() failed (104: Conne= ction reset by peer) while sending to client, client: 78.X.Y.Z, server: ci.= guix.gnu.org, request: "GET /x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.narinfo HTTP/= 1.1", upstream: "http://127.0.0.1:3000/x3c2f9zzxhk0ci81nrdsxzglp4b1h3di.nar= info", host: "ci.guix.gnu.org" It means that upstream (i.e., =E2=80=98guix publish=E2=80=99) closed the co= nnection, right? And it means that it closed it prematurely I guess? > However, like suggested in your hypothesis number 1, it seems instead > that we are replying 404 to the worker which resets the connection. As > we have put aside the baking thing, the question is now why are those > derivations not available? In that case we=E2=80=99re not replying at all, are we? > From 9f9c839937ac2edd1b5901b2262c4be0954fa20c Mon Sep 17 00:00:00 2001 > From: Mathieu Othacehe > Date: Wed, 28 Dec 2022 15:12:46 +0100 > Subject: [PATCH 1/2] scripts: publish: Add a custom baking header. > > Log the not-found responses and their reason (baking or not) to stdout. A= lso > send the X-Baking custom header so that the client can be informed of the > cause of the failure. > > * guix/scripts/publish.scm (not-found): Add a baking? argument to add the > X-Baking HTTP header to the response if baking is in progress. Also, log= the > 404 responses to stdout, indicating if it is due to baking or not. > (render-narinfo/cached): Pass the baking? argument. [...] > (define* (not-found request > - #:key (phrase "Resource not found") > + #:key > + baking? > + (phrase "Resource not found") > ttl) > "Render 404 response for REQUEST." > + (format #t (G_ "=E2=86=B3 ~a ~a: 404~a~%") Drop =E2=80=98G_=E2=80=99 (we don=E2=80=99t translate debugging messages) a= nd use ASCII, to be on the safe side=E2=80=A6 > From 25ffc57864dbf34ca58741f89c1f790dbde6702f Mon Sep 17 00:00:00 2001 > From: Mathieu Othacehe > Date: Wed, 28 Dec 2022 15:19:29 +0100 > Subject: [PATCH 2/2] substitutes: Log the failing queries. > > * guix/substitutes.scm (%debug?): New variable. > (handle-narinfo-response): Log the failing queries if the %debug? paramet= er is > set. [...] > +(define %debug? > + ;; Enable debug mode by setting the GUIX_SUBSTITUTE_DEBUG environmnent > + ;; variable. > + (make-parameter > + (getenv "GUIX_SUBSTITUTE_DEBUG"))) Instead of an env. var., maybe add a =E2=80=98--debug=E2=80=99 command-line= option and parameterize =E2=80=98%debug?=E2=80=99 accordingly? You can also have something like: (define-syntax (debug fmt args ...) (when (%debug?) (format #t fmt args ...))) LGTM with changes along these lines! Thanks, Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 07 08:40:27 2023 Received: (at 48468) by debbugs.gnu.org; 7 Jan 2023 13:40:27 +0000 Received: from localhost ([127.0.0.1]:56734 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pE9Qx-0002KK-AY for submit@debbugs.gnu.org; Sat, 07 Jan 2023 08:40:27 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51282) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pE9Qu-0002K6-Ot for 48468@debbugs.gnu.org; Sat, 07 Jan 2023 08:40:25 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pE9Qo-0002W7-ND; Sat, 07 Jan 2023 08:40:18 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=RL7DrW0eSm6Batp7hgbEIXghtXQ5stS/dl2hZxbq574=; b=OlsW9iWV4fY3T+q0bQmA h4eJiq4Myzq0tH6ekG4pc1+YstnRk0bPmeRS1Xxp/Uvyx2DIlv21hmTI0x5tD30ozGrc7UA/mCYgH K+9qu48QHbwRO2D44MfB/ZLiIPD0oN25+G6nzOYBfT7SXq28suj/YTvX6FEE2B+ntG63CgE3hiei1 BUV9FT0RLs38BszEqquLFoNyA2P5fddH+hfLOG+ku8Y6Y9gvA5poSpANRGJPyIa5xPpYXJGHp24kz Wbah1/n/DSx7W1kBiDbt5UDQ4ZehOe3wDeuk0c4ZPclOmrTF+Sql0jcP4nHZ6aN+E3TTBfsL6rCKB FbRkjKKcvJTAew==; Received: from [89.207.171.57] (helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pE9Qn-0002fq-NP; Sat, 07 Jan 2023 08:40:18 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> <87v8lvsj3a.fsf@gnu.org> <87mt6vxp20.fsf@gnu.org> Date: Sat, 07 Jan 2023 14:40:15 +0100 In-Reply-To: <87mt6vxp20.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Fri, 06 Jan 2023 23:42:47 +0100") Message-ID: <87358mo43k.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 1.3 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: Hello, > It means that upstream (i.e., ‘guix publish’) closed the connection, > right? > > And it means that it closed it prematurely I guess? Looks like it yes. Content analysis details: (1.3 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 3.6 RCVD_IN_SBL_CSS RBL: Received via a relay in Spamhaus SBL-CSS [89.207.171.57 listed in zen.spamhaus.org] -0.0 SPF_HELO_PASS SPF: HELO matches SPF record -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [209.51.188.92 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/) Hello, > It means that upstream (i.e., =E2=80=98guix publish=E2=80=99) closed the = connection, > right? > > And it means that it closed it prematurely I guess? Looks like it yes. >> However, like suggested in your hypothesis number 1, it seems instead >> that we are replying 404 to the worker which resets the connection. As >> we have put aside the baking thing, the question is now why are those >> derivations not available? > > In that case we=E2=80=99re not replying at all, are we? Well could be, I'm not 100% sure how to understand those nginx logs. If we are replying anything, it will be visible with the new traces. If on the other hand the publish server is hanging up then they won't help much I guess. > Drop =E2=80=98G_=E2=80=99 (we don=E2=80=99t translate debugging messages)= and use ASCII, to be > on the safe side=E2=80=A6 Done. > Instead of an env. var., maybe add a =E2=80=98--debug=E2=80=99 command-li= ne option and > parameterize =E2=80=98%debug?=E2=80=99 accordingly? The --debug command-line feels better but it involves a guix-daemon modification so I kept the environment variable, > You can also have something like: > > (define-syntax (debug fmt args ...) > (when (%debug?) > (format #t fmt args ...))) > and used that macro :) Thanks for having a look! Mathieu From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 09 04:40:08 2023 Received: (at 48468) by debbugs.gnu.org; 9 Jan 2023 09:40:08 +0000 Received: from localhost ([127.0.0.1]:35793 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pEodU-0000yR-IT for submit@debbugs.gnu.org; Mon, 09 Jan 2023 04:40:08 -0500 Received: from eggs.gnu.org ([209.51.188.92]:43580) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pEodP-0000xn-EF for 48468@debbugs.gnu.org; Mon, 09 Jan 2023 04:40:06 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pEodK-0002zx-0l for 48468@debbugs.gnu.org; Mon, 09 Jan 2023 04:39:58 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=GcFTwEOm1yRTeUK1kdaMSwVEPLQG5TMd/4cEPeGGrKU=; b=SMM4ihdIcQebrBeFDRaH jycUSpqiXJGzQ/0A1uCBi4th6eJoKYzLoMk8X6NVmqvccDLfTSPr4iEEBltnDoCOPU6lnxc/dQkMR CRmgndV4fJwOoJ5gxn2lIR2pKY+pW5/xf8Pykn6YvjSaDV0bmAI/qosYt0DxqhGzWE/AaejSO54Ti 8U5f6b+nkEVsHSdnv/26qK0UqK/WLYQvmVV3moBU148sAFwSo4GfP37GZCwON1ltABdTjhBxNK3zx T77lvW68RaV7Mer7RSaPvZveVJtX2MZfGbcMCFeZ4Q7hwKFGXVdhnxZ5vIxFwqI839rj57EyTu8Jv f1vZnAEPraBAuQ==; Received: from [193.50.110.246] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pEodJ-0001vu-BL; Mon, 09 Jan 2023 04:39:57 -0500 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Mathieu Othacehe Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> <87v8lvsj3a.fsf@gnu.org> <87mt6vxp20.fsf@gnu.org> <87358mo43k.fsf@gnu.org> Date: Mon, 09 Jan 2023 10:39:54 +0100 In-Reply-To: <87358mo43k.fsf@gnu.org> (Mathieu Othacehe's message of "Sat, 07 Jan 2023 14:40:15 +0100") Message-ID: <87pmbot5at.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Mathieu Othacehe skribis: >> Instead of an env. var., maybe add a =E2=80=98--debug=E2=80=99 command-l= ine option and >> parameterize =E2=80=98%debug?=E2=80=99 accordingly? > > The --debug command-line feels better but it involves a guix-daemon > modification so I kept the environment variable, Oh sorry, I was confusing =E2=80=98substitute=E2=80=99 and =E2=80=98publish= =E2=80=99, that makes sense! Ludo=E2=80=99. From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 10 03:10:41 2023 Received: (at 48468) by debbugs.gnu.org; 10 Jan 2023 08:10:41 +0000 Received: from localhost ([127.0.0.1]:38674 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pF9iS-0004Da-TX for submit@debbugs.gnu.org; Tue, 10 Jan 2023 03:10:41 -0500 Received: from eggs.gnu.org ([209.51.188.92]:54984) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pF9iR-0004DO-Ix for 48468@debbugs.gnu.org; Tue, 10 Jan 2023 03:10:40 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pF9iM-0001me-9P; Tue, 10 Jan 2023 03:10:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=VrhoeTtFZfkncL3We/NkvZoZDTQapp/7UxbWZ6Q7Gp0=; b=hz1rka+b/FvJaKH4wWPl /Zv266WUMvHzJ7OJY1O6a9Jfecoc5s/0A3YfttdRfBPR58MK9/wcIOu0ZBniuKmpeiigT+DV/iJmM n8/ymVIQGzpHwdA4pqbqbfvoX+CfacA3j58RCj6TWpcRQc6Sj5QJ3C8T5prCZqOdstT42BSZZ48uS KBkO4s/4RdLZaaby8SfPxEXaS7N3pvegCNOWyb8pxnqv7axnxt/uty+6v4nDbn0bchLAtBrGgClY3 LkwI0PaPhr3cQcSpEBc5YDJIoX/MWXhAHyHsEnG4kftpx3s5z2CG50apL0HCG5gice89LouGk+Aj9 brcVcA2xAp5hbg==; Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2775.rev.sfr.net ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2775] helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pF9iL-0003ai-PY; Tue, 10 Jan 2023 03:10:33 -0500 From: Mathieu Othacehe To: Ludovic =?utf-8?Q?Court=C3=A8s?= Subject: Re: bug#48468: substitute server connection timeout References: <87lf8e4l42.fsf@gnu.org> <87im2bfj3i.fsf@gnu.org> <87sg108vl5.fsf@gnu.org> <87lenmrnos.fsf@gnu.org> <87cz8v8s66.fsf@gnu.org> <87cz8vjpv2.fsf@gnu.org> <878rjj8eun.fsf@gnu.org> <87wn72fay3.fsf@gnu.org> <87v8lvsj3a.fsf@gnu.org> <87mt6vxp20.fsf@gnu.org> <87358mo43k.fsf@gnu.org> <87pmbot5at.fsf@gnu.org> Date: Tue, 10 Jan 2023 09:10:30 +0100 In-Reply-To: <87pmbot5at.fsf@gnu.org> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?= =?utf-8?Q?s?= message of "Mon, 09 Jan 2023 10:39:54 +0100") Message-ID: <87h6wyolmx.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 48468 Cc: 48468@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, So the debug mechanism is in place. Requesting a non-existing derivation on a worker gives: --8<---------------cut here---------------start------------->8--- mathieu@hydra-guix-104 ~$ guix build /gnu/store/yd1p7069rs4xbbfwj5p7nzp9psw= 7d3vv-hello-2.12.1.drv substitute: could not fetch http://141.80.167.131/yd1p7069rs4xbbfwj5p7nzp9p= sw7d3vv.narinfo 404 substitute: updating substitutes from 'http://141.80.167.131'... 100.0% cannot build missing derivation =E2=80=98/gnu/store/yd1p7069rs4xbbfwj5p7nzp= 9psw7d3vv-hello-2.12.1.drv=E2=80=99 guix build: error: build of `/gnu/store/yd1p7069rs4xbbfwj5p7nzp9psw7d3vv-he= llo-2.12.1.drv' failed --8<---------------cut here---------------end--------------->8--- as expected. The funny thing is that during the test failures of tonight, none of those traces were displayed. That would mean that the failure is not caused by a missing narinfo. I added the "--debug" option to the guix-daemon on the workers as well hoping to gather more info. Thanks, Mathieu