While on a quest for flaky tests in the Shepherd, I found a genuine bug that shows up as this ‘tests/basic.sh’ failure:

--8<---------------cut here---------------start------------->8---
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
[…]
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory started.
2025-03-06 14:06:36 Failed to run "/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd": In procedure chdir: No such file or directory
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory running with value #< id: 22431 command: ("/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd")>.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been started.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been disabled.
2025-03-06 14:11:51 Stopping service root...
--8<---------------cut here---------------end--------------->8---

What happens is that the service is not marked as “exited with code 127”; instead, it is marked as having exited with code 0:

--8<---------------cut here---------------start------------->8---
● Status of test-run-from-nonexistent-directory:
It is stopped since 14:06:36 (37 seconds ago).
Process exited successfully.
It is disabled.
Provides: test-run-from-nonexistent-directory
Will not be respawned.
--8<---------------cut here---------------end--------------->8---

This is due to a race condition: the process terminates before its service goes from ‘starting’ to ‘running’. By the time the service controller calls ‘monitor-service-process’, the process has already terminated, so the process monitor replies 0 to the 'await request because that process no longer exists, and that 0 is then reported as a successful exit.

Attached is a test that reproduces the problem.

Ludo’.
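The ordering problem can be illustrated with a small, deterministic model in Python. Everything in it is hypothetical: ‘ProcessMonitor’, ‘spawn’, ‘reap’, ‘await_pid’ and ‘run’ are made-up names, not the Shepherd's Guile API, and the model uses a bare exit code where the real code deals with wait statuses. The only behavior carried over from the report is that an 'await request for a process that no longer exists gets 0 as its reply.

```python
class ProcessMonitor:
    """Hypothetical, simplified stand-in for the Shepherd's process monitor.

    An exit status is delivered only to callers that asked about a PID
    *before* that PID was reaped; otherwise the status is lost.
    """

    def __init__(self):
        self.live = set()   # PIDs that have not been reaped yet
        self.waiters = {}   # pid -> list of reply callbacks

    def spawn(self, pid):
        self.live.add(pid)

    def reap(self, pid, status):
        """Called when the process terminates (think SIGCHLD + waitpid)."""
        self.live.discard(pid)
        for reply in self.waiters.pop(pid, []):
            reply(status)
        # If nobody was waiting yet, the status is simply dropped here.

    def await_pid(self, pid, reply):
        """Model of the 'await request."""
        if pid in self.live:
            self.waiters.setdefault(pid, []).append(reply)
        else:
            # The process no longer exists, so the monitor replies 0.
            reply(0)


def run(process_exits_early):
    """Replay one service start; return the status the controller sees."""
    monitor = ProcessMonitor()
    result = []
    pid = 22431
    monitor.spawn(pid)              # service is 'starting'
    if process_exits_early:
        monitor.reap(pid, 127)      # chdir fails, process exits with 127
    # Service goes from 'starting' to 'running'; only now does the
    # controller do the equivalent of 'monitor-service-process'.
    monitor.await_pid(pid, result.append)
    if not process_exits_early:
        monitor.reap(pid, 127)      # normal ordering: exit after 'await
    return result[0]


print(run(process_exits_early=False))  # 127: status delivered
print(run(process_exits_early=True))   # 0: status lost, looks like success
```

The same exit is reported correctly when the 'await request is registered first, and as a successful exit when the process terminates before the service reaches ‘running’, which matches the “Process exited successfully” status above.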