GNU bug report logs - #72722
[cuirass] Failure to write build log leads to build failure

Previous Next

Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Mon, 19 Aug 2024 22:43:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 72722 in the body.
You can then email your comments to 72722 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#72722; Package guix. (Mon, 19 Aug 2024 22:43:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 19 Aug 2024 22:43:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: [cuirass] Failure to write build log leads to build failure
Date: Tue, 20 Aug 2024 00:41:14 +0200
We occasionally see failed builds with truncated logs on ci.guix.  These
happens in situations where ‘cuirass remote-worker’ gets EPIPE as it
sends the build log to ‘remote-server’:

--8<---------------cut here---------------start------------->8---
2024-08-19 19:54:52 @ substituter-started /gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv substitute
2024-08-19 19:54:52 Downloading http://141.80.167.131/nar/lzip/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv...
2024-08-19 19:54:52 
2024-08-19 19:54:52 ESC[K libomp-16.0.6.drv                            1.8MiB/s 00:00 | 1KiB transferred
2024-08-19 19:54:52 ESC[K libomp-16.0.6.drv                            942KiB/s 00:00 | 1KiB transferred
2024-08-19 19:54:52 
2024-08-19 19:54:52 @ substituter-succeeded /gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv
2024-08-19 19:55:04 warning: zlib error in 'gzwrite' while sending log to 141.80.167.131: 0
2024-08-19 19:55:04 error: gdPO1dI1: unexpected error while building '/gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv': #<&compound-exception components: (#<&external-error> #<&origin origin: "fport_write"> #<&message message: "~A"> #<&irritants irritants: ("Broken pipe")> #<&exception-with-kind-and-args kind: system-error args: ("fport_write" "~A" ("Broken pipe") (32))>)>
--8<---------------cut here---------------end--------------->8---

The “unexpected error while building” message was added in
ba3ea7947ec67e9d9585e40a30d67dbd8115e73c.

This suggests that the ‘catch #t’ around the ‘send-log’ call doesn’t
have the desired effect; instead, the ‘guard’ handler is invoked.

It looks like EPIPE comes from the pipe set up by ‘build-derivations&’
and that the underlying ‘build-derivations’ call keeps writing to even
after ‘send-log’ has closed it.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#72722; Package guix. (Thu, 22 Aug 2024 14:44:02 GMT) Full text and rfc822 format available.

Message #8 received at 72722 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72722 <at> debbugs.gnu.org
Subject: Re: bug#72722: [cuirass] Failure to write build log leads to build
 failure
Date: Thu, 22 Aug 2024 16:42:20 +0200
Hi,

Ludovic Courtès <ludovic.courtes <at> inria.fr> skribis:

> We occasionally see failed builds with truncated logs on ci.guix.  These
> happens in situations where ‘cuirass remote-worker’ gets EPIPE as it
> sends the build log to ‘remote-server’:
>
> 2024-08-19 19:54:52 @ substituter-started /gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv substitute
> 2024-08-19 19:54:52 Downloading http://141.80.167.131/nar/lzip/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv...
> 2024-08-19 19:54:52 
> 2024-08-19 19:54:52 ESC[K libomp-16.0.6.drv                            1.8MiB/s 00:00 | 1KiB transferred
> 2024-08-19 19:54:52 ESC[K libomp-16.0.6.drv                            942KiB/s 00:00 | 1KiB transferred
> 2024-08-19 19:54:52 
> 2024-08-19 19:54:52 @ substituter-succeeded /gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv
> 2024-08-19 19:55:04 warning: zlib error in 'gzwrite' while sending log to 141.80.167.131: 0
> 2024-08-19 19:55:04 error: gdPO1dI1: unexpected error while building '/gnu/store/sv3z77cgg2788hrl87w35bfmyhkmkv54-libomp-16.0.6.drv': #<&compound-exception components: (#<&external-error> #<&origin origin: "fport_write"> #<&message message: "~A"> #<&irritants irritants: ("Broken pipe")> #<&exception-with-kind-and-args kind: system-error args: ("fport_write" "~A" ("Broken pipe") (32))>)>

But hey, why does ‘gzwrite’ fail in the first place?

I noticed that this usually happened when dumping big logs (several
MiBs) very quickly (typically the unpack phase of a large package like
LLVM producing lots of data very quickly.)

As it turns out, ‘send-log’ opens its socket with SOCK_NONBLOCK, and
then passes it to zlib, which writes to it in ‘gzwrite’.  But zlib is
not equipped to deal with EAGAIN: it just errors out, with ‘gzwrite’
returning Z_ERRNO, hence the bug above.

I was able to confirm this hypothesis by running:

   echo '(log-server (version 0))' | nc -l -p 5000 -v | \
     (sleep 10; echo starting >&2; wc -c)

and then, from a REPL:

  scheme@(cuirass remote)> (send-log "127.0.0.1" 5000 "foo.drv" (open-input-file "llvm.log"))
  2024-08-22T16:35:37 warning: zlib error in 'gzwrite' while sending log to 127.0.0.1: -1 <fd:19>: Resource temporarily unavailable
  $30 = #f

QED.  (Here I used Guile-zlib 0.2.1 with a small modification to
‘remote.scm’ so it displays the error message after Z_ERRNO = -1.)

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 23 Aug 2024 18:49:01 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
bug acknowledged by developer. (Fri, 23 Aug 2024 18:49:01 GMT) Full text and rfc822 format available.

Message #13 received at 72722-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72722-done <at> debbugs.gnu.org
Subject: Re: bug#72722: [cuirass] Failure to write build log leads to build
 failure
Date: Fri, 23 Aug 2024 20:46:55 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> As it turns out, ‘send-log’ opens its socket with SOCK_NONBLOCK, and
> then passes it to zlib, which writes to it in ‘gzwrite’.  But zlib is
> not equipped to deal with EAGAIN: it just errors out, with ‘gzwrite’
> returning Z_ERRNO, hence the bug above.

Fixed in 59010a5ba32a5f0802d28900908ee9c75f473a66.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#72722; Package guix. (Fri, 23 Aug 2024 18:49:02 GMT) Full text and rfc822 format available.

Message #16 received at 72722-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72722-done <at> debbugs.gnu.org
Subject: Re: bug#72722: [cuirass] Failure to write build log leads to build
 failure
Date: Fri, 23 Aug 2024 20:47:40 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> As it turns out, ‘send-log’ opens its socket with SOCK_NONBLOCK, and
> then passes it to zlib, which writes to it in ‘gzwrite’.  But zlib is
> not equipped to deal with EAGAIN: it just errors out, with ‘gzwrite’
> returning Z_ERRNO, hence the bug above.

Fixed in 59010a5ba32a5f0802d28900908ee9c75f473a66.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 21 Sep 2024 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 267 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.