GNU bug report logs - #76790
[Shepherd] Handling process termination before service is running

Package: guix;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Thu, 6 Mar 2025 20:47:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

From: Ludovic Courtès <ludo <at> gnu.org>
To: 76790 <at> debbugs.gnu.org
Subject: bug#76790: [Shepherd] Handling process termination before service is running
Date: Thu, 06 Mar 2025 21:45:56 +0100
[Message part 1 (text/plain, inline)]
While on a quest for flaky tests in the Shepherd, I found a genuine bug
that would manifest with this ‘tests/basic.sh’ failure:

--8<---------------cut here---------------start------------->8---
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
[…]
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory started.
2025-03-06 14:06:36 Failed to run "/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd": In procedure chdir: No such file or directory
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory running with value #<<process> id: 22431 command: ("/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd")>.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been started.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been disabled.
2025-03-06 14:11:51 Stopping service root...
--8<---------------cut here---------------end--------------->8---

What happens is that the service is not marked as “exited with code
127”; instead, it is marked as having exited with code 0:

--8<---------------cut here---------------start------------->8---
● Status of test-run-from-nonexistent-directory:
  It is stopped since 14:06:36 (37 seconds ago).
  Process exited successfully.
  It is disabled.
  Provides: test-run-from-nonexistent-directory
  Will not be respawned.
--8<---------------cut here---------------end--------------->8---

This is due to a race condition: the process terminates before its
service goes from ‘starting’ to ‘running’.

By the time the service controller calls ‘monitor-service-process’, the
process has already terminated, so the process monitor replies 0 to the
'await request because that process no longer exists.
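
The ordering problem described above can be illustrated with a small toy model. The Python sketch below is not the Shepherd's code (which is written in Guile Scheme): `ProcessMonitor`, `spawn`, `await_process`, `handle_termination`, and `observed_status` are hypothetical names chosen for illustration. The only behavior taken from this report is that an await request for a process that no longer exists is answered with 0.

```python
class ProcessMonitor:
    """Toy stand-in for a process monitor (hypothetical, not the Shepherd's API)."""

    def __init__(self):
        self.live = set()    # PIDs the monitor believes are still running
        self.waiters = {}    # PID -> callbacks waiting for its exit status

    def spawn(self, pid):
        self.live.add(pid)

    def await_process(self, pid, callback):
        if pid not in self.live:
            # Assumption taken from the report: an unknown or already
            # terminated process yields the reply 0.
            callback(0)
        else:
            self.waiters.setdefault(pid, []).append(callback)

    def handle_termination(self, pid, status):
        # The status reaches only callbacks registered *before* termination;
        # if nobody is waiting yet, it is lost.
        self.live.discard(pid)
        for callback in self.waiters.pop(pid, []):
            callback(status)


def observed_status(terminates_before_await):
    """Return the status the service controller would see in this model."""
    monitor = ProcessMonitor()
    pid, real_status = 22431, 127   # values borrowed from the log above
    monitor.spawn(pid)
    seen = []
    if terminates_before_await:
        # Race: the process dies while the service is still 'starting'.
        monitor.handle_termination(pid, real_status)
        monitor.await_process(pid, seen.append)
    else:
        # Expected order: the await request is registered first.
        monitor.await_process(pid, seen.append)
        monitor.handle_termination(pid, real_status)
    return seen[0]


if __name__ == "__main__":
    print(observed_status(terminates_before_await=False))  # 127
    print(observed_status(terminates_before_await=True))   # 0
```

In this model the real exit status (127) is reported only when the await request is registered before the process terminates. When termination comes first, the status is dropped and the later request receives 0, which matches the "Process exited successfully" status shown above.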

Attached is a test that reproduces the problem.

Ludo’.

[terminate-before-running.sh (text/plain, attachment)]

This bug report was last modified 128 days ago.
