GNU bug report logs - #77610
guix-daemon socket activation does not work on the hurd


Package: guix

Reported by: yelninei <at> tutamail.com

Date: Mon, 7 Apr 2025 16:30:03 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

To reply to this bug, email your comments to 77610 AT debbugs.gnu.org.
There is no need to reopen the bug first.



Report forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Mon, 07 Apr 2025 16:30:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to yelninei <at> tutamail.com:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 07 Apr 2025 16:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: bug-guix <at> gnu.org
Subject: guix-daemon socket activation does not work on the hurd
Date: Mon, 7 Apr 2025 18:29:29 +0200 (CEST)
Hi,

Today I reconfigured my system, and after a reboot I am unable to use guix-daemon on a childhurd.


guix build hello -n
guix build: error: failed to connect to `/var/guix/daemon-socket/socket': Protocol error

Offloading:
guix offload: error: failed to connect over SSH to daemon at 'localhost', socket /var/guix/daemon-socket/socket

Daemon Logs:
socket-activated with 1 socket
unexpected build daemon error: reading from file: Resource temporarily unavailable
Starting the daemon normally as the root user continues to work as before, so I suspect the socket activation change is to blame.
Guix commit: 6af680670bf9055b90e6f8b63c4c2ab7b08e7c56




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 09 Apr 2025 10:30:03 GMT) Full text and rfc822 format available.

Message #8 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: 77610 <at> debbugs.gnu.org
Subject: Re: guix-daemon socket activation does not work on the hurd
Date: Wed, 9 Apr 2025 12:29:09 +0200 (CEST)
After I mentioned this on IRC, Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.

I wanted to try this out and reconfigured using the shepherd from this commit as PID 1 in the VM (a bit tricky because of help2man).

The first connection still fails in the same way:
unexpected build daemon error: reading from file: Resource temporarily unavailable

A client mentions:
guix build: error: corrupt input while restoring archive from #<closed: file 2396ea8>

However subsequent connections work.




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Tue, 15 Apr 2025 16:09:02 GMT) Full text and rfc822 format available.

Message #11 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 77610 <at> debbugs.gnu.org,  yelninei <at> tutamail.com
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Tue, 15 Apr 2025 18:07:43 +0200
yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> After mentioning this on IRC Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.
>
> I wanted to try this out and reconfigured using the shepherd from this commit as pid1 in the vm (a bit tricky because of help2man).
>
> The first connection still fails in the same way.unexpected build daemon error: reading from file: Resource temporarily unavailable

I looked a bit into this, and I think shepherd is working as expected,
making the socket blocking before executing guix-daemon (it’s clear when
stracing it on Linux).

So there must be something specific at play on the Hurd.

I tried this snippet (server on one side, client on the other side) and
it works as expected: ‘accept’ blocks and subsequent read does not get
EAGAIN.

So I’m at a loss here.  Does ‘tests/systemd.sh’ succeed when run natively?
(In particular the check added in
8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)

Thanks,
Ludo’.

[non-blocking-hurd.scm (text/plain, inline)]
(use-modules (ice-9 match))

(define (blocking-port port)
  "Return PORT after putting it in blocking mode."
  (let ((flags (fcntl port F_GETFL)))
    (fcntl port F_SETFL (logand (lognot O_NONBLOCK) flags))
    port))

(let ((sock (socket AF_UNIX (logior SOCK_STREAM SOCK_NONBLOCK) 0)))
  (bind sock AF_UNIX "/tmp/sock")
  (listen sock 10)
  (match (pk 'x (accept (blocking-port sock) SOCK_CLOEXEC)) ;should block
    ((port . _)
     (pk 'read (read port)))))

;; Client:
(let ((sock (socket AF_UNIX (logior SOCK_STREAM) 0)))
  (connect sock AF_UNIX "/tmp/sock")
  (display "hi!\n" sock))

Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 16 Apr 2025 18:09:07 GMT) Full text and rfc822 format available.

Message #14 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 16 Apr 2025 20:08:14 +0200 (CEST)
Hello,

Apr 15, 2025, 16:08 by ludo <at> gnu.org:

> yelninei--- via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:
>
>> After mentioning this on IRC Ludovic pushed 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41 to the shepherd repo.
>>
>> I wanted to try this out and reconfigured using the shepherd from this commit as pid1 in the vm (a bit tricky because of help2man).
>>
>> The first connection still fails in the same way.unexpected build daemon error: reading from file: Resource temporarily unavailable
>>
>
> I looked a bit into this, and I think shepherd is doing the right
> working as expected, making the socket blocking before executing
> guix-daemon (it’s clear when stracing it on Linux).
>
> So there must be something specific at play on the Hurd.
>
> I tried this snippet (server on one side, client on the other side) and
> it works as expected: ‘accept’ blocks and subsequent read does not get
> EAGAIN.
>
> So I’m at loss here.  Does ‘tests/systemd.sh’ succeed when ran natively?
> (In particular the check added in
> 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)
>

Yes, it passes on both 1.0.3 and 1.0.4. The only test failing now is the system-log test.
As before, it works as expected when using #:lazy-start? #f, so the only difference is the timing of the first connection. What would the most minimal guix-daemon client need to look like to trigger the EAGAIN?

I tried to verify that the port is definitely blocking before being passed to guix-daemon, and it is. I am very confused.

Do you know of other processes (with not a lot of dependencies) that can be socket activated to try to replicate this with something less complicated than guix-daemon?








Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 16 Apr 2025 20:20:02 GMT) Full text and rfc822 format available.

Message #17 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 16 Apr 2025 22:19:17 +0200
Hi,

yelninei <at> tutamail.com writes:

>> So I’m at loss here.  Does ‘tests/systemd.sh’ succeed when ran natively?
>> (In particular the check added in
>> 8d31cafbdcb818160852a5d1e6fc24c1a9c53e41.)
>>
>
> Yes, it is passing both on 1.0.3 and 1.0.4. The only thing failing now is the system-log test.

Intriguing.

> As before when using #:lazy-start #f it works as expected which makes
> the only difference the timing of the first connection. What would the
> most minimal guix-daemon client need to look like to trigger the
> EAGAIN
>  
> I tried to verify that the port is definitly blocking before being passed to guix-daemon and it is. I am very confused.
>
> Do you know of other processes (with not a lot of dependencies) that can be socket activated to try to replicate this with something less complicated than guix-daemon?

Well there’s ‘guix publish’, and otherwise the examples from
‘tests/systemd.sh’ (following ‘define %command’).

Otherwise we could mimic it by writing a C program that opens a
SOCK_NONBLOCK socket, binds + listens + select(2) until something
happens, then calls fcntl(2) to clear the O_NONBLOCK flag, and then
forks + execs and calls accept(2) in the child process.

Ludo’.
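
The C reproducer outlined above could look roughly like the following sketch. It is an illustration, not the program that was actually sent to bug-hurd: the function names and the socket path are made up, the exec step is left out (the forked child calls accept(2) directly), and a forked client stands in for the first guix-daemon client. A result of 0 means the first accepted connection is blocking, 1 means it carries O_NONBLOCK, and 2 means the setup failed. Whether this exact program triggers the problem on the Hurd is untested.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill ADDR with the AF_UNIX address for PATH.  */
void unix_address(struct sockaddr_un *addr, const char *path)
{
  memset(addr, 0, sizeof *addr);
  addr->sun_family = AF_UNIX;
  strncpy(addr->sun_path, path, sizeof addr->sun_path - 1);
}

/* Return a non-blocking listening socket bound to PATH, or -1.  */
int make_nonblocking_listener(const char *path)
{
  struct sockaddr_un addr;
  int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);

  if (fd < 0)
    return -1;
  unix_address(&addr, path);
  unlink(path);                 /* remove a stale socket file, if any */
  if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen(fd, 10) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Client side: connect to PATH and send a short greeting.  */
int send_greeting(const char *path)
{
  struct sockaddr_un addr;
  int ok, fd = socket(AF_UNIX, SOCK_STREAM, 0);

  if (fd < 0)
    return -1;
  unix_address(&addr, path);
  ok = connect(fd, (struct sockaddr *) &addr, sizeof addr) == 0
       && write(fd, "hi!\n", 4) == 4;
  close(fd);
  return ok ? 0 : -1;
}

/* Return 0 if the first accepted connection is blocking, 1 if it has
   O_NONBLOCK set, and 2 on error.  */
int first_connection_nonblocking(const char *path)
{
  struct timeval timeout = { 5, 0 };   /* do not hang if the client fails */
  fd_set readable;
  int listener, flags, status = 0, result = 2;
  pid_t client, server = -1;

  assert(path != NULL);
  listener = make_nonblocking_listener(path);
  if (listener < 0)
    return 2;

  client = fork();
  if (client == 0)
    _exit(send_greeting(path) == 0 ? 0 : 2);

  /* The parent plays the service manager: wait for a pending connection.  */
  FD_ZERO(&readable);
  FD_SET(listener, &readable);
  if (client > 0
      && select(listener + 1, &readable, NULL, NULL, &timeout) == 1
      /* Clear O_NONBLOCK on the listening socket before handing it over.  */
      && (flags = fcntl(listener, F_GETFL)) >= 0
      && fcntl(listener, F_SETFL, flags & ~O_NONBLOCK) == 0)
    server = fork();

  if (server == 0) {
    /* The child plays the daemon: accept and inspect the new socket.  */
    int conn = accept(listener, NULL, NULL);
    int conn_flags = conn < 0 ? -1 : fcntl(conn, F_GETFL);
    _exit(conn_flags < 0 ? 2 : (conn_flags & O_NONBLOCK) ? 1 : 0);
  }

  if (client > 0)
    waitpid(client, NULL, 0);
  if (server > 0 && waitpid(server, &status, 0) == server && WIFEXITED(status))
    result = WEXITSTATUS(status);
  close(listener);
  unlink(path);
  return result;
}
```

Calling first_connection_nonblocking with a scratch path such as "/tmp/sock" from a small main and printing the result on both systems would show whether the accepted socket differs between Linux and the Hurd.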




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Fri, 18 Apr 2025 08:23:08 GMT) Full text and rfc822 format available.

Message #20 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Fri, 18 Apr 2025 10:21:20 +0200 (GMT+02:00)
Hello,

Apr 16, 2025, 20:19 by ludo <at> gnu.org:

> Well there’s ‘guix publish’, and otherwise the examples from
> ‘tests/systemd.sh’ (following ‘define %command’).
>
> Otherwise we could mimic it by writing a C program that that opens a
> SOCK_NONBLOCK socket, binds + listens + select(2) until something
> happens, then calls fcntl(2) to clear the O_NONBLOCK flag, and then
> forks + execs and call accept(2) in the child process.
>
> Ludo’.
>
I tested guix-publish and that had no issues.

Some checks I did yesterday with guix-daemon:
- Shepherd is passing a blocking socket.
- The "fdSocket" in "acceptConnection" is always blocking.
- The "remote" socket in "acceptConnection" is O_NONBLOCK on the first connection only.
- The "from.fd" socket in "processConnection" is then also O_NONBLOCK on the first connection. This causes EAGAIN when trying to read the clientVersion.

On Linux none of this is an issue.
Adding the same O_NONBLOCK check as for the fd 3 socket to the "connection" socket after accept in tests/systemd.sh passes on Linux but causes a failure on the Hurd.
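
The O_NONBLOCK checks listed above boil down to inspecting the file status flags with fcntl(2). A minimal sketch follows, with hypothetical helper names that do not come from guix-daemon or tests/systemd.sh. It also shows why a non-blocking socket leads to the "Resource temporarily unavailable" error from the daemon log: reading from it while no data is queued fails with EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if FD has O_NONBLOCK set, 0 if it is blocking, -1 on error.  */
int fd_is_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL);

  if (flags < 0)
    return -1;
  return (flags & O_NONBLOCK) != 0;
}

/* Try to read one byte from FD; return 1 if that fails with EAGAIN.  */
int read_hits_eagain(int fd)
{
  char byte;
  ssize_t n = read(fd, &byte, 1);

  return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}

/* Create a connected socket pair with the extra TYPE_FLAGS (0 or
   SOCK_NONBLOCK) and return 1 if reading from the still-empty socket
   fails with EAGAIN, 0 if the socket is blocking, -1 on error.  */
int empty_read_fails_with_eagain(int type_flags)
{
  int sv[2];
  int result;

  if (socketpair(AF_UNIX, SOCK_STREAM | type_flags, 0, sv) < 0)
    return -1;
  assert(sv[0] >= 0 && sv[1] >= 0);
  /* Only attempt the read on a non-blocking socket; a blocking one
     would simply wait for data that never arrives.  */
  result = fd_is_nonblocking(sv[0]) == 1 && read_hits_eagain(sv[0]);
  close(sv[0]);
  close(sv[1]);
  return result;
}
```

A check like fd_is_nonblocking placed right after accept(2) is enough to tell the two behaviours apart without involving guix-daemon at all.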

Is glibc accept doing something weird?
I am struggling to understand how the first connection would be any different from subsequent ones (and only in the #:lazy-start? #t case).

I am unsure what to do about this because shepherd seems to do everything correctly. I saw that ci.g.g.o has started to build i586-gnu substitutes (in particular gcc-final), but if you are restarting the builders more aggressively now, then each first build will fail because of this, and I don't know whether cuirass can reschedule builds on such failures.

Maybe the easiest is to expose the #:lazy-start? option for now and disable it for guix-daemon in %base-services/hurd?











Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Fri, 18 Apr 2025 09:43:05 GMT) Full text and rfc822 format available.

Message #23 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Fri, 18 Apr 2025 11:42:17 +0200
Hi,

yelninei <at> tutamail.com writes:

> I tested guix-publish and that had no issues.

You mean the first ‘wget -O …’ passes?

> Some checks I did yesterday with guix-dameon:
> - Shepherd is passing a blocking socket
> - The "fdSocket" in "acceptConnection" is always blocking.
> - the "remote" socket in "acceptConnection" is O_NONBLOCK on the first connection only.

Looking at ‘accept4.c’ in libc, the only way ‘remote’ can be O_NONBLOCK
is if:

  1. ‘accept4’ is passed SOCK_NONBLOCK, but that’s not the case here
     (see ‘accept.c’);

  2. ‘__socket_accept’ returns an O_NONBLOCK socket, which would be a bug
     in the server, pflocal.

At first sight ‘S_io_set_all_openmodes’ in pflocal does the job and
‘S_socket_accept’ honors those flags.
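
If the accepted socket cannot be trusted to be blocking, a server could defensively clear O_NONBLOCK right after accept(2). The sketch below shows that idea only. It is an assumption about a possible workaround, not what guix-daemon does and not the fix that later went into the Hurd, and all names and paths in it are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Clear O_NONBLOCK on FD; return 0 on success, -1 on error.  */
int force_blocking(int fd)
{
  int flags = fcntl(fd, F_GETFL);

  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0 ? -1 : 0;
}

/* Like accept(2), but make sure the returned socket is blocking no
   matter which flags the system handed back.  */
int accept_blocking(int listener)
{
  int conn = accept(listener, NULL, NULL);

  if (conn >= 0 && force_blocking(conn) < 0) {
    close(conn);
    return -1;
  }
  return conn;
}

/* Self-check: listen on PATH, connect to it from the same process, and
   accept.  Return 1 if the accepted socket is blocking, 0 if it is
   not, -1 on error.  */
int accepted_socket_is_blocking(const char *path)
{
  struct sockaddr_un addr;
  int result = -1, conn = -1;
  int listener = socket(AF_UNIX, SOCK_STREAM, 0);
  int client = socket(AF_UNIX, SOCK_STREAM, 0);

  assert(path != NULL);
  memset(&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
  unlink(path);                 /* remove a stale socket file, if any */

  if (listener >= 0 && client >= 0
      && bind(listener, (struct sockaddr *) &addr, sizeof addr) == 0
      && listen(listener, 1) == 0
      && connect(client, (struct sockaddr *) &addr, sizeof addr) == 0
      && (conn = accept_blocking(listener)) >= 0) {
    int flags = fcntl(conn, F_GETFL);
    if (flags >= 0)
      result = (flags & O_NONBLOCK) == 0;
  }

  if (conn >= 0)
    close(conn);
  if (client >= 0)
    close(client);
  if (listener >= 0)
    close(listener);
  unlink(path);
  return result;
}
```

Such a workaround would only hide the symptom in one program; the underlying question of why the accepted socket is non-blocking remains.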

> Adding the same check as for the fd 3 socket  for O_NONBLOCK to the
> "connection" socket after accept  to tests/systemd.sh passes on Linux
> but causes a failure on the Hurd.

So we have a reproducer.

Could you pass it on to bug-hurd? :-)  It may be easier if the whole
thing is in C.

> I am unsure what to do about this because shepherd seems to do
> everything correctly. I saw that ci.g.g.o has started to build
> i586-gnu substitutes (in particular gcc-final) but if you are
> restarting the builders more aggressively now then each first build
> will fail because of this and idk if cuirass can reschedule builds on
> such failures.

Yeah, it’s not great.  Those will have to be restarted manually I’m
afraid, but most of the time anybody can click on the “Restart” button
in Cuirass.

> Maybe the easiest is to to expose the #:lazy-start? option for now and disable it for guix-daemon in %base-services/hurd ?

Hmm maybe.  Let’s first figure out if this is a Hurd bug.

Thanks for investigating!

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Wed, 14 May 2025 17:05:09 GMT) Full text and rfc822 format available.

Notification sent to yelninei <at> tutamail.com:
bug acknowledged by developer. (Wed, 14 May 2025 17:05:09 GMT) Full text and rfc822 format available.

Message #28 received at 77610-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610-done <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 14 May 2025 18:04:55 +0200
For the record, this issue is now fixed upstream:

  https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=029ab7d7b38c76ba14c24fcbf526ccef29af9e88
  https://lists.gnu.org/archive/html/bug-hurd/2025-05/msg00016.html

Closing!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 14 May 2025 19:26:02 GMT) Full text and rfc822 format available.

Message #31 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 14 May 2025 21:24:22 +0200 (CEST)
Hi Ludo,

Thank you again for finding the cause. Could we add your patch to our hurd, either on master or on core-packages-team, as it will be a while until it is available in a tagged snapshot? It would fix the hurd CI builders randomly failing, the childhurd system test, and the minor annoyance that the manual offload is failing.

From what I can see, adding it only to hurd (and not the headers) should not cause a rebootstrap.





Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Wed, 14 May 2025 21:52:02 GMT) Full text and rfc822 format available.

Message #34 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Wed, 14 May 2025 23:48:45 +0200
Hi yelninei,

yelninei <at> tutamail.com writes:

> Thank you again for finding the cause.Could we add your patch to our
> hurd either for master or core-packages-team as it will be a while
> until it is available in a tagged snapshot.It would fix the hurd ci
> builders randomly failing, the childhurd system test and the minor
> annoyance that the manual offload is failing.
>
> From what I can see only adding it to hurd (and not the headers) should not cause a rebootstrap. 

Yes, sounds like a good idea.  Do you want to give it a try?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Thu, 15 May 2025 08:20:02 GMT) Full text and rfc822 format available.

Message #37 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: yelninei <at> tutamail.com
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 77610 <at> debbugs.gnu.org, Janneke Nieuwenhuizen <janneke <at> gnu.org>
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Thu, 15 May 2025 10:18:38 +0200 (CEST)
Hello Ludo,

Something like this? I called the patch hurd-socket-activation.patch to indicate what it is addressing. Do you have a better suggestion?

I added it to master but this will create a minor merge conflict with the hurd update on core-packages-team.


[0001-gnu-hurd-Fix-service-socket-activation.patch (text/x-patch, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#77610; Package guix. (Sun, 18 May 2025 21:04:02 GMT) Full text and rfc822 format available.

Message #40 received at 77610 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: yelninei <at> tutamail.com
Cc: 77610 <at> debbugs.gnu.org, Janneke Nieuwenhuizen <janneke <at> gnu.org>
Subject: Re: bug#77610: guix-daemon socket activation does not work on the hurd
Date: Sun, 18 May 2025 22:46:24 +0200
Hello,

yelninei <at> tutamail.com writes:

> Something like this? I called the patch hurd-socket-activation.patch
> to indicate what it is addressing. Do you have a better suggestion?

Perfect; applied, thank you.

> I added it to master but this will create a minor merge conflict with the hurd update on core-packages-team.

Hopefully we can easily address it.

Ludo’.




This bug report was last modified 27 days ago.
