GNU bug report logs - #76790
[Shepherd] Handling process termination before service is running
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Thu, 6 Mar 2025 20:47:02 UTC
Severity: normal
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
While on a quest for flaky tests in the Shepherd, I found a genuine bug
that would manifest with this ‘tests/basic.sh’ failure:
--8<---------------cut here---------------start------------->8---
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
+ sleep 0.5
+ herd -s t-socket-21679 status test-run-from-nonexistent-directory
+ grep 'exited with code 127'
[…]
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory started.
2025-03-06 14:06:36 Failed to run "/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd": In procedure chdir: No such file or directory
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory running with value #<<process> id: 22431 command: ("/gnu/store/3bg5qfsmjw6p7bh1xadarbaq246zis0d-coreutils-9.1/bin/pwd")>.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been started.
2025-03-06 14:06:36 Service test-run-from-nonexistent-directory has been disabled.
2025-03-06 14:11:51 Stopping service root...
--8<---------------cut here---------------end--------------->8---
What happens is that the service is not marked as “exited with code
127”; instead, it is marked as having exited with code 0:
--8<---------------cut here---------------start------------->8---
● Status of test-run-from-nonexistent-directory:
It is stopped since 14:06:36 (37 seconds ago).
Process exited successfully.
It is disabled.
Provides: test-run-from-nonexistent-directory
Will not be respawned.
--8<---------------cut here---------------end--------------->8---
This is due to a race condition: the process terminates before its
service goes from ‘starting’ to ‘running’.
By the time the service controller calls ‘monitor-service-process’, the
process has already terminated, so the process monitor replies 0 to the
'await request because that process no longer exists.
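The race can be modelled outside the Shepherd with a few lines of shell. This is only a toy sketch, not Shepherd code: ‘monitor_await’ and the ‘reply’ variable are hypothetical names, and the only behavior taken from the description above is that an await on a process that no longer exists is answered with 0.

```shell
#!/bin/sh
# Toy model of the race; not Shepherd code.  'monitor_await' and 'reply'
# are hypothetical names used for illustration only.

# Await process $1 and store the answer in $reply.  If the process is
# already gone, answer 0, mimicking the reply described above.
monitor_await() {
    if kill -0 "$1" 2>/dev/null; then
        wait "$1"
        reply=$?
    else
        reply=0
    fi
}

# Case 1: monitoring starts while the process is still alive.
sh -c 'sleep 1; exit 127' &
monitor_await "$!"
late=$reply

# Case 2: the process exits and is reaped before monitoring starts;
# its real status (127) is discarded by the plain 'wait' below.
sh -c 'exit 127' &
pid=$!
wait "$pid"
monitor_await "$pid"
early=$reply

echo "monitored in time: reply=$late"
echo "monitored too late: reply=$early"
```

In the first case the reply carries the real exit status, 127. In the second case the reply is 0, which is what makes the status output claim that the process exited successfully.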
Attached is a test that reproduces the problem.
Ludo’.
[terminate-before-running.sh (text/plain, attachment)]
This bug report was last modified 128 days ago.