GNU bug report logs - #36754
SSH connections to hydra-slave{1,2,3} fail during builds


Package: guix

Reported by: Mark H Weaver <mhw <at> netris.org>

Date: Sun, 21 Jul 2019 23:59:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36754 in the body.
You can then email your comments to 36754 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Sun, 21 Jul 2019 23:59:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mark H Weaver <mhw <at> netris.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 21 Jul 2019 23:59:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: bug-guix <at> gnu.org
Subject: New linux-libre failed to build on armhf on Berlin
Date: Sun, 21 Jul 2019 19:56:20 -0400
In commit 1ad9c105c208caa9059924cbfbe4759c8101f6c9, I changed our
linux-libre packages to deblob the linux-libre source tarballs
ourselves, i.e. to run the deblobbing scripts provided by the
linux-libre project to produce linux-libre source tarballs from the
upstream linux tarballs:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=1ad9c105c208caa9059924cbfbe4759c8101f6c9

The following queries show that the updated packages built successfully
on x86_64, i686, and aarch64, but they all failed on armhf:

  https://ci.guix.gnu.org/search?query=linux-libre-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-4.14.134
  https://ci.guix.gnu.org/search?query=linux-libre-4.9.186
  https://ci.guix.gnu.org/search?query=linux-libre-4.4.186
  https://ci.guix.gnu.org/search?query=linux-libre-arm-veyron-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-arm-generic-4.14.134
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-arm-omap2plus-4.14.134

Unfortunately, I'm unable to get *any* information about what went wrong
from Cuirass.  None of the failed builds have associated log files, and
the build details page has no useful information either.  For example:

  https://ci.guix.gnu.org/build/1488517/details

My first guess was that something went wrong in the 'computed' origin
that runs the deblobbing script.  However, that's apparently not the
case, because all of the updated 'linux-libre-headers' packages built
successfully on armhf, and those use the same source tarballs as the
main 'linux-libre' packages.

  https://ci.guix.gnu.org/search?query=linux-libre-headers-5.2.2
  https://ci.guix.gnu.org/search?query=linux-libre-headers-4.19.60
  https://ci.guix.gnu.org/search?query=linux-libre-headers-4.14.134

Can someone help me find out what's going on here?  Until then, I'm
sorry to say that armhf-linux users will be unable to update their
systems.

       Mark




Severity set to 'important' from 'normal' Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Mon, 22 Jul 2019 00:38:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Mon, 22 Jul 2019 16:11:02 GMT) Full text and rfc822 format available.

Message #10 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: mhw <at> netris.org
Cc: 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: New linux-libre failed to build on armhf on Berlin
Date: Mon, 22 Jul 2019 18:10:46 +0200
Mark H Weaver <mhw <at> netris.org> writes:

> Unfortunately, I'm unable to get *any* information about what went wrong
> from Cuirass.  None of the failed builds have associated log files, and
> the build details page has no useful information either.  For example:
>
>   https://ci.guix.gnu.org/build/1488517/details

On that page I see a link to the build log, but it appears to be
truncated:

    https://ci.guix.gnu.org/log/33hv7mij9bqqgf5hqwrw14106z9zgav9-linux-libre-5.2.2

Maybe the build node died before the build could be completed?

-- 
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Mon, 22 Jul 2019 17:15:01 GMT) Full text and rfc822 format available.

Message #13 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: New linux-libre failed to build on armhf on Berlin
Date: Mon, 22 Jul 2019 13:13:11 -0400
Hi Ricardo,

Interesting.  I distinctly remember that there was no log file when I
looked last time.  Hmm.

Anyway, it seems that now, all of the failed builds have either build
logs available or else information about which dependency failed.  I
don't remember seeing any of this last time, but I'm glad to see it now.

A pattern has now emerged, but I don't know what it means.  All of the
armhf kernel builds failed except for linux-libre-arm-veyron-5.2.2,
which succeeded:

  https://ci.guix.gnu.org/build/1488502/details  (arm-veyron-5.2.2)

Apart from this anomalous success, all of the armhf 5.2.2 and 4.19.60
builds have a truncated log file:

  https://ci.guix.gnu.org/build/1488517/details  (5.2.2)
  https://ci.guix.gnu.org/build/1488503/details  (4.19.60)
  https://ci.guix.gnu.org/build/1488513/details  (arm-generic-5.2.2)
  https://ci.guix.gnu.org/build/1488519/details  (arm-generic-4.19.60)
  https://ci.guix.gnu.org/build/1488504/details  (arm-omap2plus-5.2.2)
  https://ci.guix.gnu.org/build/1488501/details  (arm-omap2plus-4.19.60)

This pattern seems too regular to be a coincidence.  Can we find out
which build machines were used for these builds?

All of the 4.14.134 builds failed in the deblobbing step, due to timeout
(1 hour of silence) while packing the linux-libre tarball:

  https://ci.guix.gnu.org/build/1488514/details  (4.14.134)
  https://ci.guix.gnu.org/build/1488515/details  (arm-generic-4.14.134)
  https://ci.guix.gnu.org/build/1488512/details  (arm-omap2plus-4.14.134)

I'm not sure how to deal with this.  This is a computed origin, not a
normal package, and so I don't see a way to configure a longer timeout.

Perhaps I should make the tarball packing and unpacking operations
verbose, to work around the issue.  Of course that's our usual practice,
but I find it suboptimal because any warnings will be buried in a
mountain of uninteresting output.

Thoughts?  Anyway, thanks for looking into it.

       Mark
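
A possible middle ground between silent and fully verbose packing is to
emit a small progress marker every N archive members, so the build is
never silent for long but warnings are not buried.  The sketch below
only illustrates that idea in Python; it is not how the Guix computed
origin is written, and the function name `pack_with_progress` and its
`every` parameter are hypothetical.

```python
import io
import os
import sys
import tarfile
import tempfile


def pack_with_progress(src_dir, tar_path, every=1000, out=sys.stderr):
    """Pack the regular files under src_dir into a gzip'd tarball.

    Writes one '.' to `out` for each `every` members added, so that a
    silence-based timeout sees periodic output.  Empty directories are
    not recorded (illustration only).  Returns the member count.
    """
    count = 0
    with tarfile.open(tar_path, "w:gz") as tar:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, src_dir)
                tar.add(path, arcname=arcname, recursive=False)
                count += 1
                if count % every == 0:
                    out.write(".")
                    out.flush()
    out.write("\n")
    out.flush()
    return count
```

With `every` set to a few thousand, a kernel source tree would produce a
short line of dots instead of a full file listing.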




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Tue, 23 Jul 2019 16:47:02 GMT) Full text and rfc822 format available.

Message #16 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <mbakke <at> fastmail.com>
To: Mark H Weaver <mhw <at> netris.org>, Ricardo Wurmus <rekado <at> elephly.net>
Cc: 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: New linux-libre failed to build on armhf on Berlin
Date: Tue, 23 Jul 2019 18:46:46 +0200
Mark H Weaver <mhw <at> netris.org> writes:

> Hi Ricardo,
>
> Interesting.  I distinctly remember that there was no log file when I
> looked last time.  Hmm.
>
> Anyway, it seems that now, all of the failed builds have either build
> logs available or else information about which dependency failed.  I
> don't remember seeing any of this last time, but I'm glad to see it now.
>
> A pattern has now emerged, but I don't know what it means.  All of the
> armhf kernel builds failed except for linux-libre-arm-veyron-5.2.2,
> which succeeded:
>
>   https://ci.guix.gnu.org/build/1488502/details  (arm-veyron-5.2.2)
>
> Apart from this anomalous success, all of the armhf 5.2.2 and 4.19.60
> builds have a truncated log file:
>
>   https://ci.guix.gnu.org/build/1488517/details  (5.2.2)
>   https://ci.guix.gnu.org/build/1488503/details  (4.19.60)
>   https://ci.guix.gnu.org/build/1488513/details  (arm-generic-5.2.2)
>   https://ci.guix.gnu.org/build/1488519/details  (arm-generic-4.19.60)
>   https://ci.guix.gnu.org/build/1488504/details  (arm-omap2plus-5.2.2)
>   https://ci.guix.gnu.org/build/1488501/details  (arm-omap2plus-4.19.60)
>
> This pattern seems too regular to be a coincidence.  Can we find out
> which build machines were used for these builds?

I tried building 5.2.2 'interactively' on Berlin, and got an SSH error:

  CC [M]  net/openvswitch/vport-geneve.o
  CC [M]  net/openvswitch/vport-gre.o
  LD [M]  net/openvswitch/openvswitch.o
;;; [2019/07/23 05:14:53.501502, 0] read_from_channel_port: [GSSH ERROR] Error reading from the channel: #<input-output: channel (closed) 14c0e60>
Backtrace:
          16 (apply-smob/1 #<catch-closure b79640>)
In ice-9/boot-9.scm:
    705:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
In ice-9/eval.scm:
    619:8 14 (_ #(#(#<directory (guile-user) bfb140>)))
In guix/ui.scm:
  1747:12 13 (run-guix-command _ . _)
In guix/scripts/offload.scm:
   781:22 12 (guix-offload . _)
In ice-9/boot-9.scm:
    829:9 11 (catch _ _ #<procedure 7f576678d910 at guix/ui.scm:703…> …)
    829:9 10 (catch _ _ #<procedure 7f576678d928 at guix/ui.scm:826…> …)
In guix/scripts/offload.scm:
   580:19  9 (process-request _ _ _ _ #:print-build-trace? _ # _ # _)
    531:6  8 (call-with-timeout _ _ _)
    361:2  7 (transfer-and-offload #<derivation /gnu/store/yfns7ga4…> …)
In ice-9/boot-9.scm:
    829:9  6 (catch _ _ #<procedure dbdab0 at guix/scripts/offload.…> …)
In guix/scripts/offload.scm:
    385:6  5 (_)
In guix/store.scm:
  1203:15  4 (_ #<store-connection 256.99 19a0ba0> _ _)
   692:11  3 (process-stderr #<store-connection 256.99 19a0ba0> _)
In guix/serialization.scm:
    87:11  2 (read-int _)
    73:12  1 (get-bytevector-n* #<input-output: channel (closed) 14…> …)
In unknown file:
           0 (get-bytevector-n #<input-output: channel (closed) 14c…> …)

ERROR: In procedure get-bytevector-n:
Throw to key `guile-ssh-error' with args `("read_from_channel_port" "Error reading from the channel" #<input-output: channel (closed) 14c0e60> #f)'.
guix build: error: build of `/gnu/store/yfns7ga468vmv9jn72snk79b16p8mhfa-linux-libre-5.2.2.drv' failed

real    637m24.906s
user    0m6.661s
sys     0m0.897s

Unfortunately I failed to record which machine was used and don't know a
way to find out after the fact.




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Tue, 23 Jul 2019 17:36:01 GMT) Full text and rfc822 format available.

Message #19 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 36754 <at> debbugs.gnu.org,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#36754: SSH connections to hydra-slave{1,2,3} fail during
 builds (was: New linux-libre failed to build on armhf on Berlin)
Date: Tue, 23 Jul 2019 13:33:09 -0400
retitle 36754 SSH connections to hydra-slave{1,2,3} fail during builds
thanks

Hi,

I've added Ludovic to the CC list, since he recently added
hydra-slave{1,2,3} to Berlin.

Marius wrote:
> I tried building 5.2.2 'interactively' on Berlin, and got an SSH error:
> 
>   CC [M]  net/openvswitch/vport-geneve.o
>   CC [M]  net/openvswitch/vport-gre.o
>   LD [M]  net/openvswitch/openvswitch.o
> ;;; [2019/07/23 05:14:53.501502, 0] read_from_channel_port: [GSSH ERROR] Error reading from the channel: #<input-output: channel (closed) 14c0e60>
> Backtrace:
>           16 (apply-smob/1 #<catch-closure b79640>)
> In ice-9/boot-9.scm:
>     705:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
> In ice-9/eval.scm:
>     619:8 14 (_ #(#(#<directory (guile-user) bfb140>)))
> In guix/ui.scm:
>   1747:12 13 (run-guix-command _ . _)
> In guix/scripts/offload.scm:
>    781:22 12 (guix-offload . _)
> In ice-9/boot-9.scm:
>     829:9 11 (catch _ _ #<procedure 7f576678d910 at guix/ui.scm:703…> …)
>     829:9 10 (catch _ _ #<procedure 7f576678d928 at guix/ui.scm:826…> …)
> In guix/scripts/offload.scm:
>    580:19  9 (process-request _ _ _ _ #:print-build-trace? _ # _ # _)
>     531:6  8 (call-with-timeout _ _ _)
>     361:2  7 (transfer-and-offload #<derivation /gnu/store/yfns7ga4…> …)
> In ice-9/boot-9.scm:
>     829:9  6 (catch _ _ #<procedure dbdab0 at guix/scripts/offload.…> …)
> In guix/scripts/offload.scm:
>     385:6  5 (_)
> In guix/store.scm:
>   1203:15  4 (_ #<store-connection 256.99 19a0ba0> _ _)
>    692:11  3 (process-stderr #<store-connection 256.99 19a0ba0> _)
> In guix/serialization.scm:
>     87:11  2 (read-int _)
>     73:12  1 (get-bytevector-n* #<input-output: channel (closed) 14…> …)
> In unknown file:
>            0 (get-bytevector-n #<input-output: channel (closed) 14c…> …)
> 
> ERROR: In procedure get-bytevector-n:
> Throw to key `guile-ssh-error' with args `("read_from_channel_port" "Error reading from the channel" #<input-output: channel (closed) 14c0e60> #f)'.
> guix build: error: build of `/gnu/store/yfns7ga468vmv9jn72snk79b16p8mhfa-linux-libre-5.2.2.drv' failed
> 
> real    637m24.906s
> user    0m6.661s
> sys     0m0.897s

Thank you, this is helpful.

> Unfortunately I failed to record which machine was used and don't know a
> way to find out after the fact.

I believe it was hydra-slave2, one of the three armhf machines that I
host which were formerly part of hydra.gnu.org's build farm and were
recently added to Berlin by Ludovic.  I checked hydra-slave{1,2,3} for
build log files corresponding to the derivation above, and found that
all three of them have been attempted recently:

hydra-slave2 attempted to build it on July 23 08:07 UTC.
hydra-slave3 attempted to build it on July 22 16:40 UTC.
hydra-slave1 attempted to build it on July 22 04:44 UTC.

To be precise, each of those dates corresponds to the end of the build
attempt.  All three build logs are truncated on the build machine as
well, with no error message at the end.

I now believe that these failures are related to the newly added armhf
build slaves, and that they have nothing to do with the recent changes
to our linux-libre packages.

Well, except for the silence timeout that sometimes happens on slower
machines while deblobbing linux-libre.  That's a separate issue.

      Thanks,
        Mark




Changed bug title to 'SSH connections to hydra-slave{1,2,3} fail during builds' from 'New linux-libre failed to build on armhf on Berlin' Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Tue, 23 Jul 2019 17:36:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Tue, 23 Jul 2019 17:52:02 GMT) Full text and rfc822 format available.

Message #24 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 36754 <at> debbugs.gnu.org,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#36754: SSH connections to hydra-slave{1,2,3} fail during
 builds (was: New linux-libre failed to build on armhf on Berlin)
Date: Tue, 23 Jul 2019 13:49:24 -0400
I wrote earlier:
> I now believe that these failures are related to the newly added armhf
> build slaves, and that they have nothing to do with the recent changes
> to our linux-libre packages.

I should mention that the armhf build slaves are on a private network,
and I use my public-facing internet server to forward TCP connections to
them, using the following entries in /etc/inetd.conf:

--8<---------------cut here---------------start------------->8---
# TCP-level forwards for SSH connections to build machines for the GNU
# Guix build farm:
7275    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.11 7275
7276    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.12 7276
7274    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.13 7274
--8<---------------cut here---------------end--------------->8---

It's possible that this arrangement is somehow part of the problem.
However, note that nothing has changed here in several years, and it
worked fine on hydra.gnu.org.  The build slaves were running a *very*
old version of Guix though.  It seems likely that the new Guile-SSH code
doesn't cope well with this setup.

       Mark
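
For readers unfamiliar with the arrangement, each inetd.conf entry above
relays one TCP port on the public server to the same port on a private
build machine.  The sketch below shows the same kind of relay in Python
with no idle timeout at all.  It is an illustration under assumptions,
not the setup used here: the names `pipe`, `handle`, and `serve_forward`
are hypothetical.  The meaning of netcat's `-w` option differs between
netcat variants, and the thread does not establish whether it plays any
role in the failures.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; fall through and close our side
    finally:
        writer.close()


async def handle(client_reader, client_writer, target_host, target_port):
    """Relay one client connection to the target, in both directions."""
    try:
        up_reader, up_writer = await asyncio.open_connection(
            target_host, target_port)
    except OSError:
        client_writer.close()
        return
    await asyncio.gather(
        pipe(client_reader, up_writer),
        pipe(up_reader, client_writer),
    )


async def serve_forward(listen_host, listen_port, target_host, target_port):
    """Start a forwarding server and return the asyncio Server object."""
    async def on_client(reader, writer):
        await handle(reader, writer, target_host, target_port)
    return await asyncio.start_server(on_client, listen_host, listen_port)
```

A relay like this never closes a connection merely because it is quiet,
which is the property a long, silent build phase depends on.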




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Tue, 23 Jul 2019 21:27:02 GMT) Full text and rfc822 format available.

Message #27 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Tue, 23 Jul 2019 23:26:27 +0200
Hi Mark,

Mark H Weaver <mhw <at> netris.org> skribis:

> I wrote earlier:
>> I now believe that these failures are related to the newly added armhf
>> build slaves, and that they have nothing to do with the recent changes
>> to our linux-libre packages.
>
> I should mention that the armhf build slaves are on a private network,
> and I use my public-facing internet server to forward TCP connections to
> them, using the following entries in /etc/inetd.conf:
>
> # TCP-level forwards for SSH connections to build machines for the GNU
> # Guix build farm:
> 7275    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.11 7275
> 7276    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.12 7276
> 7274    stream  tcp     nowait  nobody  /bin/nc /bin/nc -w 10 172.19.189.13 7274
>
> It's possible that this arrangement is somehow part of the problem.
> However, note that nothing has changed here in several years, and it
> worked fine on hydra.gnu.org.  The build slaves were running a *very*
> old version of Guix though.  It seems likely that the new Guile-SSH code
> doesn't cope well with this setup.

I noticed that connections to the machines were unstable (using
OpenSSH’s client).  That is, the connection would eventually “hang”,
apparently several times a day.

Currently we have an SSH tunnel set up on berlin to connect to each of
these machines via overdrive1.guixsd.org.  This setup proved to be
robust in the past (we used it to connect to another build machine), so
I suspect something’s wrong on “your” end of the network.  It’s hard to
tell exactly what, though.

Ideas?

If it’s causing build failures, I’m afraid we’ll have to comment out
those machines from berlin’s machines.scm until we’ve figured it out.

Thanks,
Ludo’.
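
One general mitigation for long-lived connections that "hang" across
forwarders or NAT is TCP keepalive, which makes the kernel probe an idle
connection so that a dead path is detected instead of waiting forever.
Nothing in this thread says keepalive was tried or would have helped;
the sketch below only shows the mechanism on a plain Python socket, and
the helper name `enable_keepalive` and its default values are
hypothetical.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, count=4):
    """Turn on TCP keepalive probes for `sock` and return it.

    idle/interval/count are applied only where the platform exposes the
    corresponding TCP_* options (e.g. Linux); elsewhere the operating
    system defaults remain in effect.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```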




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Tue, 23 Jul 2019 21:56:02 GMT) Full text and rfc822 format available.

Message #30 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Mark H Weaver <mhw <at> netris.org>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Tue, 23 Jul 2019 23:55:13 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Currently we have an SSH tunnel set up on berlin to connect to each of
> these machines via overdrive1.guixsd.org.  This setup proved to be
> robust in the past (we used it to connect to another build machine), so
> I suspect something’s wrong on “your” end of the network.  It’s hard to
> tell exactly what, though.

FWIW by the end of this week we should have the firewall changes
implemented so we can do without the SSH tunnel.

--
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Wed, 24 Jul 2019 11:13:02 GMT) Full text and rfc822 format available.

Message #33 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Wed, 24 Jul 2019 07:09:38 -0400
Hi Ludovic,

Ludovic Courtès <ludo <at> gnu.org> wrote:
> I noticed that connections to the machines were unstable (using
> OpenSSH’s client).  That is, the connection would eventually “hang”,
> apparently several times a day.
>
> Currently we have an SSH tunnel set up on berlin to connect to each of
> these machines via overdrive1.guixsd.org.  This setup proved to be
> robust in the past (we used it to connect to another build machine), so
> I suspect something’s wrong on “your” end of the network.  It’s hard to
> tell exactly what, though.
>
> Ideas?

Okay, I'll look into it.  I'm very busy with something else for the next
couple of days, but I'll try to get to it in the next week.

> If it’s causing build failures, I’m afraid we’ll have to comment out
> those machines from berlin’s machines.scm until we’ve figured it out.

Agreed.

     Thanks,
       Mark




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Wed, 24 Jul 2019 14:57:14 GMT) Full text and rfc822 format available.

Message #36 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Wed, 24 Jul 2019 16:56:35 +0200
Hello,

Mark H Weaver <mhw <at> netris.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> wrote:
>> I noticed that connections to the machines were unstable (using
>> OpenSSH’s client).  That is, the connection would eventually “hang”,
>> apparently several times a day.
>>
>> Currently we have an SSH tunnel set up on berlin to connect to each of
>> these machines via overdrive1.guixsd.org.  This setup proved to be
>> robust in the past (we used it to connect to another build machine), so
>> I suspect something’s wrong on “your” end of the network.  It’s hard to
>> tell exactly what, though.
>>
>> Ideas?
>
> Okay, I'll look into it.  I'm very busy with something else for the next
> couple of days, but I'll try to get to it in the next week.

OK!

>> If it’s causing build failures, I’m afraid we’ll have to comment out
>> those machines from berlin’s machines.scm until we’ve figured it out.
>
> Agreed.

I’ve commented them out now.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Thu, 01 Aug 2019 14:10:03 GMT) Full text and rfc822 format available.

Message #39 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <mbakke <at> fastmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>, Mark H Weaver
 <mhw <at> netris.org>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Thu, 01 Aug 2019 16:09:00 +0200
The truncated log files seem to happen for other builds as well, even
within the Berlin data center.

https://ci.guix.gnu.org/log/n3ra1b8ic6qhfinnhb80mrn7snsqws9d-geocode-glib-3.26.0
https://ci.guix.gnu.org/log/zqhqlib00i8f7f10g4c2dfzprw16h4xv-scintilla-4.2.0
https://ci.guix.gnu.org/log/718jmbq94mvdgnmjyqgxgy7zaj8xzxk3-htslib-1.9

All of these builds are for i686-linux.

Mark: are the armhf nodes still operational?  I would like to re-enable
them again, since we desperately need the computing power with four huge
branches going concurrently at the moment.

Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Thu, 01 Aug 2019 15:57:01 GMT) Full text and rfc822 format available.

Message #42 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Mark H Weaver <mhw <at> netris.org>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Thu, 01 Aug 2019 17:39:19 +0200
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Currently we have an SSH tunnel set up on berlin to connect to each of
>> these machines via overdrive1.guixsd.org.  This setup proved to be
>> robust in the past (we used it to connect to another build machine), so
>> I suspect something’s wrong on “your” end of the network.  It’s hard to
>> tell exactly what, though.
>
> FWIW by the end of this week we should have the firewall changes
> implemented so we can do without the SSH tunnel.

The firewall changes have been applied today.

-- 
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Thu, 01 Aug 2019 16:40:01 GMT) Full text and rfc822 format available.

Message #45 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Marius Bakke <mbakke <at> fastmail.com>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 36754 <at> debbugs.gnu.org,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Thu, 01 Aug 2019 12:37:31 -0400
Hi Marius,

Marius Bakke <mbakke <at> fastmail.com> wrote:

> The truncated log files seem to happen for other builds as well, even
> within the Berlin data center.
>
> https://ci.guix.gnu.org/log/n3ra1b8ic6qhfinnhb80mrn7snsqws9d-geocode-glib-3.26.0
> https://ci.guix.gnu.org/log/zqhqlib00i8f7f10g4c2dfzprw16h4xv-scintilla-4.2.0
> https://ci.guix.gnu.org/log/718jmbq94mvdgnmjyqgxgy7zaj8xzxk3-htslib-1.9
>
> All of these builds are for i686-linux.

Thanks, that's very useful information.

> Mark: are the armhf nodes still operational?

I assume so.  They all respond to pings anyway, and I haven't touched
them since before they were disconnected from Berlin.  (I would need to
boot up my other, more secure computer to try SSHing into them).

> I would like to re-enable them again, since we desperately need the
> computing power with four huge branches going concurrently at the
> moment.

I have no objection, but since Ludovic made the decision to disconnect
them, it would be good to hear from him first.

       Thanks,
         Mark




Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Thu, 01 Aug 2019 21:07:01 GMT) Full text and rfc822 format available.

Message #48 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Marius Bakke <mbakke <at> fastmail.com>,
 Ludovic Courtès <ludo <at> gnu.org>, 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Thu, 01 Aug 2019 23:06:19 +0200
Mark H Weaver <mhw <at> netris.org> writes:

>> Mark: are the armhf nodes still operational?
>
> I assume so.  They all respond to pings anyway, and I haven't touched
> them since before they were disconnected from Berlin.  (I would need to
> boot up my other, more secure computer to try SSHing into them).
>
>> I would like to re-enable them again, since we desperately need the
>> computing power with four huge branches going concurrently at the
>> moment.
>
> I have no objection, but since Ludovic made the decision to disconnect
> them, it would be good to hear from him first.

Now that we should be able to SSH to them directly from Berlin we can
try connecting and perhaps upgrading the guix-daemon on these machines.

--
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Wed, 07 Aug 2019 14:32:02 GMT) Full text and rfc822 format available.

Message #51 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Mark H Weaver <mhw <at> netris.org>
Cc: Marius Bakke <mbakke <at> fastmail.com>,
 Ludovic Courtès <ludo <at> gnu.org>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Wed, 07 Aug 2019 16:30:54 +0200
Ricardo Wurmus <rekado <at> elephly.net> writes:

> Mark H Weaver <mhw <at> netris.org> writes:
>
>>> Mark: are the armhf nodes still operational?
>>
>> I assume so.  They all respond to pings anyway, and I haven't touched
>> them since before they were disconnected from Berlin.  (I would need to
>> boot up my other, more secure computer to try SSHing into them).
>>
>>> I would like to re-enable them again, since we desperately need the
>>> computing power with four huge branches going concurrently at the
>>> moment.
>>
>> I have no objection, but since Ludovic made the decision to disconnect
>> them, it would be good to hear from him first.
>
> Now that we should be able to SSH to them directly from Berlin we can
> try connecting and perhaps upgrading the guix-daemon on these machines.

I have removed the SSH tunnel configuration from /etc/guix/machines.scm
and re-enabled the machines.

Let’s see if this makes any difference.  If not we should try to upgrade
Guix on these build machines.

--
Ricardo





Information forwarded to bug-guix <at> gnu.org:
bug#36754; Package guix. (Fri, 16 Aug 2019 10:27:01 GMT) Full text and rfc822 format available.

Message #54 received at 36754 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Mark H Weaver <mhw <at> netris.org>, Marius Bakke <mbakke <at> fastmail.com>,
 36754 <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Fri, 16 Aug 2019 12:25:57 +0200
Hi,

Ricardo Wurmus <rekado <at> elephly.net> skribis:

> Ricardo Wurmus <rekado <at> elephly.net> writes:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>>
>>>> Mark: are the armhf nodes still operational?
>>>
>>> I assume so.  They all respond to pings anyway, and I haven't touched
>>> them since before they were disconnected from Berlin.  (I would need to
>>> boot up my other, more secure computer to try SSHing into them).
>>>
>>>> I would like to re-enable them again, since we desperately need the
>>>> computing power with four huge branches going concurrently at the
>>>> moment.
>>>
>>> I have no objection, but since Ludovic made the decision to disconnect
>>> them, it would be good to hear from him first.
>>
>> Now that we should be able to SSH to them directly from Berlin we can
>> try connecting and perhaps upgrading the guix-daemon on these machines.
>
> I have removed the SSH tunnel configuration from /etc/guix/machines.scm
> and re-enabled the machines.
>
> Let’s see if this makes any difference.

Is it working well now?

> If not we should try to upgrade Guix on these build machines.

I think there’s a misunderstanding: these machines used to run a very
old Guix but I installed 1.0 from scratch before migrating them to
berlin.

Thanks,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Thu, 12 Sep 2019 08:42:01 GMT) Full text and rfc822 format available.

Notification sent to Mark H Weaver <mhw <at> netris.org>:
bug acknowledged by developer. (Thu, 12 Sep 2019 08:42:01 GMT) Full text and rfc822 format available.

Message #59 received at 36754-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: Mark H Weaver <mhw <at> netris.org>, Marius Bakke <mbakke <at> fastmail.com>,
 36754-done <at> debbugs.gnu.org
Subject: Re: bug#36754: SSH connections to hydra-slave{1, 2,
 3} fail during builds
Date: Thu, 12 Sep 2019 10:41:32 +0200
Hello,

AFAICS we no longer have connection issues to
hydra-slave{1,2,3}.netris.org so I’m closing this bug.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 10 Oct 2019 11:24:07 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 249 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.