GNU bug report logs - #66692
[cuirass] Spurious failed builds following patching u-boot sources

Previous Next

Package: guix;

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Mon, 23 Oct 2023 02:58:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66692 in the body.
You can then email your comments to 66692 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Mon, 23 Oct 2023 02:58:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 23 Oct 2023 02:58:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: [cuirass] Spurious failed builds following patching u-boot sources
Date: Sun, 22 Oct 2023 22:57:06 -0400
Hi,

I've noticed build failure events from Cuirass such as:

 --8<---------------cut here---------------start------------->8---
. [   ?: Cuirass                ] Build u-boot-sifive-unmatched.x86_64-linux on master is broken.
R. [   ?: Cuirass                ] Build u-boot-am335x-boneblack.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-a20-olinuxino-lime.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-wandboard.powerpc64le-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-qemu-riscv64-smode.aarch64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-rpi-2-efi.powerpc64le-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-qemu-arm.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-rpi-2.x86_64-linux on master is broken.
 --8<---------------cut here---------------end--------------->8---

There are a few more, and they were triggered by applying a patch (that
only touches the patman tool) in this commit:
<https://git.savannah.gnu.org/cgit/guix.git/log/?id=faeebdc37e9137d62e11175e83988f77496a6641>.

The build failures are not captured in the logs, which appear truncated,
e.g. from <https://ci.guix.gnu.org/build/2294052/details> (log empty --
builder: sjd-p9, worker N38vG7OW) or from
<https://ci.guix.gnu.org/build/2294218/details> (builder:
hydra-guix-107, worker 4CWieLpQ) which ends with:

--8<---------------cut here---------------start------------->8---
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_fixsfsi_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_fixsfsi.visT
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_fixunssfsi_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_fixunssfsi.visT
mv -f _arm_truncdfsf2.visT _arm_truncdfsf2.vis
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatdidf_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_floatdidf.visT
mv -f _arm_negsf2.visT _arm_negsf2.vis
mv -f _arm_addsubsf3.visT _arm_addsubsf3.vis
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatdisf_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_floatdisf.visT
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatundi
--8<---------------cut here---------------end--------------->8---

These appear to be some problem with the CI software (probably not
hardware, as many workers across different architectures suffered the
same problem) rather than packaging problems.

I didn't see anything useful in /var/log/cuirass.log or
/var/log/cuirass-remote-worker.log berlin.  I haven't checked the logs
on the builders.

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Mon, 23 Oct 2023 20:51:02 GMT) Full text and rfc822 format available.

Message #8 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692 <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Mon, 23 Oct 2023 22:49:56 +0200
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> The build failures are not captured in the logs, which appear truncated,
> e.g. from <https://ci.guix.gnu.org/build/2294052/details> (log empty --
> builder: sjd-p9, worker N38vG7OW) 

On sjd-p9, I get:

--8<---------------cut here---------------start------------->8---
root <at> sjd-p9:~# journalctl -u cuirass-remote-worker.service |grep -C2 s167lh0w0azfxx83ri8k0mn2gj57i5rl
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/kcv6hfqz94f1vdw24viw7d9ih6v9zi1j-libkmahjongg-23.04.3.drv build succeeded.
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: request work.
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv' (system: powerpc64le-linux)
Oct 23 00:06:32 sjd-p9 cuirass[531]: C2q41fts: ping tcp://10.0.0.1:5555.
Oct 23 00:06:32 sjd-p9 cuirass[531]: BrnR2Sju: ping tcp://10.0.0.1:5555.
--
Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: request work.
Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: building derivation `/gnu/store/lj4h1g1zq235l6i8mjh88s1ylvj7ksxd-u-boot-pine64-plus-2023.07.02.drv' (system: powerpc64le-linux)
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv build failed.
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: request work.
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/dy8g5pi1gv8dza1av9957ic6zzj4l88j-u-boot-tools-2023.07.02.drv' (system: powerpc64le-linux)
root <at> sjd-p9:~# guix build --log-file /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv
https://ci.guix.gnu.org/log/0yzfwl2b017gsnf60wvr7dnn2g26wrj7-u-boot-rpi-4-32b-2023.07.02
--8<---------------cut here---------------end--------------->8---

So it looks as though the build was attempted and failed, but there’s no
local build log in /var/log/guix/drvs, meaning that, most likely, no
build process was even started.

The log of guix-daemon provide few hints:

--8<---------------cut here---------------start------------->8---
root <at> sjd-p9:~# journalctl -u guix-daemon |grep 'Oct 23 00:'

[...]

Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:06:55 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:07:51 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:16:25 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:17:15 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:33:42 sjd-p9 guix-daemon[2097426]: SIGPOLL
Oct 23 00:33:42 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:33:49 sjd-p9 guix-daemon[2097426]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:16 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:16 sjd-p9 guix-daemon[1990649]: SIGPOLL
Oct 23 00:34:23 sjd-p9 guix-daemon[1990649]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:29 sjd-p9 guix-daemon[2052532]: SIGPOLL
Oct 23 00:34:29 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:30 sjd-p9 guix-daemon[2052532]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: SIGPOLL
Oct 23 00:34:38 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: SIGPOLL
Oct 23 00:34:45 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: unexpected build daemon error: interrupted by the user
Oct 23 00:35:33 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:36:46 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
--8<---------------cut here---------------end--------------->8---

One possibility would have been that ‘guix gc’ was running, preventing
the build process from starting (“big GC lock” taken); however, if this
were the case, we’d see a connection from a different process in the
guix-daemon log.

> or from <https://ci.guix.gnu.org/build/2294218/details> (builder:
> hydra-guix-107, worker 4CWieLpQ) which ends with:

Here we get:

--8<---------------cut here---------------start------------->8---
root <at> hydra-guix-107 ~# grep -C2 jy463789qf42dn4zk5i5jzry2lf82ypy /var/log/cuirass-remote-worker.log
2023-10-23 00:17:34 vDKeRcHy: no available build.
2023-10-23 00:17:46 4CWieLpQ: request work.
2023-10-23 00:17:46 4CWieLpQ: building derivation `/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv' (system: x86_64-linux)
2023-10-23 00:18:03 zARGXQMa: request work.
2023-10-23 00:18:03 zARGXQMa: no available build.
--
2023-10-23 00:27:35 vDKeRcHy: request work.
2023-10-23 00:27:35 vDKeRcHy: no available build.
2023-10-23 00:27:42 4CWieLpQ: derivation /gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv build failed.
2023-10-23 00:27:42 4CWieLpQ: request work.
2023-10-23 00:27:42 4CWieLpQ: no available build.
root <at> hydra-guix-107 ~# ls /var/log/guix/drvs/*/*u-boot-cubietruck-2023*
ls: cannot access '/var/log/guix/drvs/*/*u-boot-cubietruck-2023*': No such file or directory
--8<---------------cut here---------------end--------------->8---

And likewise:

--8<---------------cut here---------------start------------->8---
root <at> hydra-guix-107 ~# grep -C2 '10-23 00:27:' /var/log/guix-daemon.log
2023-10-23 00:17:46 accepted connection from pid 64022, user root
2023-10-23 00:19:34 accepted connection from pid 64022, user root
2023-10-23 00:27:42 SIGPOLL
2023-10-23 00:27:45 unexpected build daemon error: interrupted by the user
2023-10-23 00:33:33 accepted connection from pid 18016, user hydra
2023-10-23 00:33:34 accepted connection from pid 18035, user hydra
--8<---------------cut here---------------end--------------->8---

But hey, as a matter of fact, ‘cuirass remote-server’ on berlin was
restarted at around that time, as can be seen in
/var/log/cuirass-remote-server.log:

--8<---------------cut here---------------start------------->8---
2023-10-23 00:26:06 0 items queued for eventual download
2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
2023-10-23 00:26:59 increased maximum number of open files from 1024 to 2048
2023-10-23 00:26:59 listening for build logs on port 5556
2023-10-23 00:26:59 deleting old build logs from '/var/cache/cuirass/remote/'...
2023-10-23 00:28:49 selected 1115 build logs to remove
2023-10-23 00:28:49 WARNING: (cuirass scripts remote-server): imported module (fibers) overrides core binding `sleep'
2023-10-23 00:28:49 starting fetch worker with up to 8 concurrent downloads
2023-10-23 00:28:49 build failed: '/gnu/store/qlmhcyk9ac30464v99i2ys0irs7jz3lg-u-boot-am335x-evm-2023.07.02.drv'
2023-10-23 00:28:49 build failed: '/gnu/store/dl3far2xa4paq6nl6pp9iklpxrwg1wy2-ungoogled-chromium-wayland-112.0.5615.165-1.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/y9fwzzgvyhvrmf0avihm6944j8rvpg6m-u-boot-mx6cuboxi-2023.07.02.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/5bhl5q86j6bkqx8zybq29khndhf050jm-u-boot-rpi-2-2023.07.02.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/16d5vq3dlzy1f4s7vazlfg6358bl8dlc-u-boot-qemu-arm-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/i88vczhcnlz462cmpls8qgx5n3ssw7rc-u-boot-qemu-arm-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/i1ri2nkh1bcfzynmggbjg0rjlyi121y8-u-boot-rpi-3-32b-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/q08xw052r484c1c1nj31sx0fg5yf2py9-openjdk-18.0.2.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/5j4pp2b5dx2ykw4na2wbmiwp84g0hi4i-u-boot-am335x-boneblack-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/nqp61id4bavvcsib307rw5234myl7ykr-u-boot-rpi-3-32b-efi-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/9h7x080cn44ik79mrzycbg2x0dvvgigc-u-boot-a20-olinuxino-lime-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/mkwy8ywa6mrw5zdxwr88skyw94afgaqv-u-boot-novena-2023.07.02.drv'
2023-10-23 00:28:53 build failed: '/gnu/store/lfjp146k0z0w2yqd3y6m2mv09fq01948-u-boot-sifive-unmatched-2023.07.02.drv'
2023-10-23 00:28:53 build failed: '/gnu/store/425ykg0ci4i612ryc7z79as94vipp9rb-u-boot-rpi-2-efi-2023.07.02.drv'
2023-10-23 00:28:57 build failed: '/gnu/store/7fyq24jsmvclnapjgf1li43w6dmm5n06-u-boot-mx6cuboxi-2023.07.02.drv'
[…]
2023-10-23 00:34:29 build failed: '/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv'
--8<---------------cut here---------------end--------------->8---

/var/log/messages shows that ‘remote-server’ failed and was respawned
(the failure manifests as “failed to start worker/database proxy” in the
cuirass-remote-server.log excerpt above):

--8<---------------cut here---------------start------------->8---
Oct 23 00:24:33 localhost vmunix: [12534110.756389] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11
Oct 23 00:24:33 localhost vmunix: [12534110.768173] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11
Oct 23 00:24:34 localhost vmunix: [12534110.779454] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11
Oct 23 00:24:59 localhost ntpd[1953]: Soliciting pool server 81.169.208.5
Oct 23 00:26:04 localhost ntpd[1953]: Soliciting pool server 141.98.136.83
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server (PID 92414) exited with 1. 
Oct 23 00:26:58 localhost shepherd[1]: Respawning cuirass-remote-server. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service user-homes... 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes started. 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes running with value #t. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service postgres-roles... 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles started. 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles running with value #t. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service cuirass-remote-server... 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server started. 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server running with value 63052. 
Oct 23 00:26:59 localhost vmunix: [12534255.969754] validate_extent_buffer: 58 callbacks suppressed
Oct 23 00:26:59 localhost vmunix: [12534255.969764] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11
--8<---------------cut here---------------end--------------->8---

To summarize:

  1. ‘cuirass remote-server’ failed for some unknown reason and was
     respawned.

  2. All the ongoing and about-to-be-started builds carried out by
     worker processes at that point were marked as failed.

We can improve on both parts.

To be continued…

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Mon, 23 Oct 2023 21:16:01 GMT) Full text and rfc822 format available.

Message #11 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692 <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Mon, 23 Oct 2023 23:14:23 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> /var/log/cuirass-remote-server.log:
>
> 2023-10-23 00:26:06 0 items queued for eventual download
> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument

EINVAL here most likely comes from replying to a sender with invalid ID,
due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Tue, 24 Oct 2023 01:00:01 GMT) Full text and rfc822 format available.

Message #14 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 66692 <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Mon, 23 Oct 2023 20:59:09 -0400
Hi Ludovic!

Ludovic Courtès <ludo <at> gnu.org> writes:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> /var/log/cuirass-remote-server.log:
>>
>> 2023-10-23 00:26:06 0 items queued for eventual download
>> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
>> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
>
> EINVAL here most likely comes from replying to a sender with invalid ID,
> due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
> 40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.

Thanks for picking up the investigation.  Does that mean that the above
fix was partial and that there likely is a use after free somewhere
else?

It's good at least that this ZMQ_ROUTER_MANDATORY option catches that.

(Also, these scary Btrfs errors are a reminder that the file system got
corrupted at some point and should be recreated at some point -- I hope
they are not related to the problem at hand but we should keep an eye on
things).

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Tue, 24 Oct 2023 20:28:01 GMT) Full text and rfc822 format available.

Message #17 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692 <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Tue, 24 Oct 2023 22:27:07 +0200
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Ludovic Courtès <ludo <at> gnu.org> skribis:
>>
>>> /var/log/cuirass-remote-server.log:
>>>
>>> 2023-10-23 00:26:06 0 items queued for eventual download
>>> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
>>> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
>>
>> EINVAL here most likely comes from replying to a sender with invalid ID,
>> due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
>> 40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.
>
> Thanks for picking up the investigation.  Does that mean that the above
> fix was partial and that there likely is a use after free somewhere
> else?

Yes, the fix in Cuirass above was insufficient, though you had to be
truly unlucky for use-after-free to happen this quickly.

Anyway, I had also proposed a fix in guile-simple-zmq, which the upgrade
in Guix commit 04215779baaa21f10bd5f20999761dc506e449d8 pulls in.

I have yet to reconfigure on berlin.

In the meantime I’ll look at the worker side of this issue.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Sat, 28 Oct 2023 09:57:01 GMT) Full text and rfc822 format available.

Message #20 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692 <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Sat, 28 Oct 2023 11:55:38 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> skribis:

>   2. All the ongoing and about-to-be-started builds carried out by
>      worker processes at that point were marked as failed.

This part of the problem was a bit more subtle.q

When ‘cuirass remote-server’ goes away, ‘cuirass remote-worker’ keeps
going if it was building something.  In that case the build log will be
lost (because the direct connection to the server was cut), but the
worker eventually notifies the server of build success/failure (ZeroMQ,
makes sure those messages are eventually sent, when the server is back).

The bug was that the worker could in some cases, when the server is
down, send a “build failed” message even though the bug had not
completed.

This is fixed by these Cuirass commits:

  2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
  4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
  3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.

(I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Mon, 30 Oct 2023 13:14:02 GMT) Full text and rfc822 format available.

Notification sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
bug acknowledged by developer. (Mon, 30 Oct 2023 13:14:02 GMT) Full text and rfc822 format available.

Message #25 received at 66692-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692-done <at> debbugs.gnu.org
Subject: Re: bug#66692: [cuirass] Spurious failed builds following patching
 u-boot sources
Date: Mon, 30 Oct 2023 14:13:12 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> This is fixed by these Cuirass commits:
>
>   2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
>   4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
>   3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.
>
> (I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
> It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)

I’ve done that just a couple of hours ago, running Cuirass 1.2.0 on both
berlin and the x86_64 nodes listed in ‘berlin-nodes.scm’.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#66692; Package guix. (Tue, 31 Oct 2023 13:33:02 GMT) Full text and rfc822 format available.

Message #28 received at 66692 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 66692 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#66692: closed (Re: bug#66692: [cuirass] Spurious failed
 builds following patching u-boot sources)
Date: Tue, 31 Oct 2023 09:31:42 -0400
Hi,

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> This is fixed by these Cuirass commits:
>>
>>   2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
>>   4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
>>   3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.
>>
>> (I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
>> It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)
>
> I’ve done that just a couple of hours ago, running Cuirass 1.2.0 on both
> berlin and the x86_64 nodes listed in ‘berlin-nodes.scm’.

Awesome, thanks for all the fixes toward Cuirass!

-- 
Maxim




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 29 Nov 2023 12:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 206 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.