Ludovic Courtès <ludo@gnu.org> writes:

> Tomas Volf <~@wolfsden.cz> writes:
>
>> (I wonder if there is better way to detect the sleep.  I feel like *any*
>> number will be wrong for someone.  Do we know how for example systemd's
>> timers handle this?)
>
> I believe systemd is the one initiating hibernation, so it has the
> information first-hand; in our case this is initiated by elogind and
> shepherd doesn’t know.  Probably something to fix.
>
> Anyway, this time drift remains a mystery to me.  I would go for a hack
> like this:
>
> diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm
> index adc4530..1587a02 100644
> --- a/modules/shepherd/service.scm
> +++ b/modules/shepherd/service.scm
> @@ -2490,6 +2490,10 @@ keyword arguments as @code{fork+exec-command}: @code{#:user},
>    "Make an operation that returns @var{timeout} when @var{seconds} have
>  elapsed and @var{overslept} when many more seconds have elapsed--this can
>  happen if the machine is suspended or put into hibernation mode."
> +  (define max-delay
> +    ;; Time after which we consider that we missed the deadline.

I would extend the comment to describe why both 10 and 2 are used.

> +    (if (> seconds 180) 10 2))
> +
>    (let ((expiry (+ (get-internal-real-time)
>                     (inexact->exact
>                      (round (* seconds internal-time-units-per-second))))))
> @@ -2497,7 +2501,7 @@ happen if the machine is suspended or put into hibernation mode."
>                      (lambda ()
>                        (let* ((now (get-internal-real-time))

I have no idea how Shepherd works internally (and much less how Fibers
work), so maybe this comment is completely off, but this seems
suspicious.  Should this lambda not get the wake-up time as an argument,
instead of calling get-internal-real-time to get the "now"?

I have no idea what guarantees Fibers makes about the delay between
detecting that the time is up and calling the callback.  After a quick
look at the source code, I decided that figuring it out is way beyond
me.

Is there a way to enable logging of these events?  That way we would
know when Fibers decided the timer was up and when the lambda was
called.

>                               (delta (- now expiry)))
> -                        (if (> delta (* 2 internal-time-units-per-second))
> +                        (if (> delta (* max-delay internal-time-units-per-second))
>                              overslept
>                              timeout))))))
>  
>
>
> WDYT?

Well, in *this* particular case it would have resolved the problem, so
great for me I guess.  However I have left a suggestion above.

Out of curiosity, I have scheduled timer events for tomorrow at
23:0{0..5} to see whether they fire with a delay.  Testing with a short
timer (the closest whole minute) did not reveal anything (the timers ran
exactly on time), so maybe the long wait is a factor?  Will report
tomorrow.

Tomas

PS: Looking into timer.scm, I see this comment:

--8<---------------cut here---------------start------------->8---
;; Reached when resuming from sleep state: we slept
;; significantly more than the requested number of seconds.  To
;; avoid triggering every timer when resuming from sleep state,
;; sleep again to remain in sync.
--8<---------------cut here---------------end--------------->8---

Not sure I would call 2 (or even 10) seconds "significantly more". :)
If I expect the cron to sleep for 86400 seconds, 10 more seems... minor.

Maybe (I did not put too much thought into this and the numbers are
completely thumb-sucked), the "overslept" could be if the sleep was
longer by more than 10% of the timer period, clipped to be at least
2 seconds and at most 30 minutes?

If I have a cron scheduled to run once a month, I would guess most
people would prefer to have it run 20 minutes late than to skip a month
completely.
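
To make the idea concrete, here is a rough, untested-in-Shepherd sketch
of that rule.  The names overslept-threshold and overslept? are made up
for this mail and do not exist in Shepherd, and the 10%, 2 second and
30 minute figures are just the guesses from above.

```scheme
;; Hypothetical helpers, not part of Shepherd.

(define (overslept-threshold seconds)
  "Return how many seconds past its deadline a timer that was asked to
sleep SECONDS may wake up before we consider that it overslept."
  ;; 10% of the requested period, clipped to the range [2 s, 30 min].
  (max 2 (min (* 30 60) (/ seconds 10))))

(define (overslept? seconds delta)
  "Return #t if DELTA, the lateness in internal time units (as computed
in the patch above), exceeds the threshold for a sleep of SECONDS."
  (> delta
     (* (overslept-threshold seconds) internal-time-units-per-second)))
```

With these numbers a 5 second sleep keeps the current 2 second
tolerance, a 10 minute sleep tolerates 1 minute, and anything from
5 hours up (including the daily and monthly cases) hits the 30 minute
cap.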

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.