GNU bug report logs - #37237
mcron randomly stops running jobs

Previous Next

Package: guix;

Reported by: Robert Vollmert <rob <at> vllmrt.net>

Date: Fri, 30 Aug 2019 17:33:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Robert Vollmert <rob <at> vllmrt.net>
Subject: bug#37237: closed (Re: bug#37237: mcron randomly stops running jobs)
Date: Wed, 18 Mar 2020 11:11:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#37237: mcron randomly stops running jobs

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 37237 <at> debbugs.gnu.org.

-- 
37237: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=37237
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Robert Vollmert <rob <at> vllmrt.net>
Cc: 37237-done <at> debbugs.gnu.org
Subject: Re: bug#37237: mcron randomly stops running jobs
Date: Wed, 18 Mar 2020 12:10:27 +0100
Hi,

Robert Vollmert <rob <at> vllmrt.net> skribis:

> I have numerous mcron jobs configured, many of which run every 15 minutes,
> and one which runs once a minute. Just now, at 19:22, I noticed the minutely
> cron job didn’t seem to run, and saw that one every-15-minute job which always
> logs to mcron.log had last run at 18:00. I restarted mcron, and jobs started
> running again. There is no timezone confusion either: That every-15-minute job
> logged its first run at 19:30.

When mcron is stuck, it’s apparently stuck in ‘pthread_join’, waiting
for the finalization thread to terminate:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f7f730d8478 in __GI___pthread_timedjoin_ex (threadid=140185229293312, thread_return=thread_return <at> entry=0x0, abstime=abstime <at> entry=0x0, 
    block=block <at> entry=true) at pthread_join_common.c:84
#1  0x00007f7f730d82dc in __pthread_join (threadid=<optimized out>, thread_return=thread_return <at> entry=0x0) at pthread_join.c:24
#2  0x00007f7f731d7508 in stop_finalization_thread () at finalizers.c:265
#3  0x00007f7f731d7729 in scm_i_finalizer_pre_fork () at finalizers.c:290
#4  0x00007f7f73250426 in scm_fork () at posix.c:1220
#5  0x00007f7f7324979f in vm_regular_engine (thread=0x7f7f6acb79d0, vp=0x7f7f6b299f30, registers=0xafaf, resume=1930265720) at vm-engine.c:786
#6  0x00007f7f7324bfd7 in scm_call_n (proc=#<program 7f7f6aeaf5d0>, argv=argv <at> entry=0x7ffd9b10bed8, nargs=nargs <at> entry=1) at vm.c:1260
#7  0x00007f7f731cdf58 in scm_call_1 (proc=<optimized out>, arg1=<optimized out>) at eval.c:485
#8  0x000000000040130c in inner_main ()
#9  0x00007f7f731e5cfd in invoke_main_func (body_data=0x7ffd9b10c3b0) at init.c:341
#10 0x00007f7f731c835a in c_body (d=0x7ffd9b10c2f0) at continuations.c:422
#11 0x00007f7f7324979f in vm_regular_engine (thread=0x7f7f6acb79d0, vp=0x7f7f6b299f30, registers=0xafaf, resume=1930265720) at vm-engine.c:786
#12 0x00007f7f7324bfd7 in scm_call_n (proc=proc <at> entry=#<smob catch-closure 7f7f6b2a1460>, argv=argv <at> entry=0x0, nargs=nargs <at> entry=0) at vm.c:1260
#13 0x00007f7f731cdf39 in scm_call_0 (proc=proc <at> entry=#<smob catch-closure 7f7f6b2a1460>) at eval.c:479
#14 0x00007f7f7323ac12 in catch (tag=tag <at> entry=#t, thunk=#<smob catch-closure 7f7f6b2a1460>, 
    handler=<error reading variable: ERROR: Cannot access memory at address 0x73776b3469667269>0x7f7f6b2a1420, 
    pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x73776b3469667269>0x7f7f6b2a1400) at throw.c:137
#15 0x00007f7f7323aef5 in scm_catch_with_pre_unwind_handler (key=key <at> entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>)
    at throw.c:254
#16 0x00007f7f7323b0bf in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0x7f7f731c8350 <c_body>, body_data=body_data <at> entry=0x7ffd9b10c2f0, 
    handler=handler <at> entry=0x7f7f731c85e0 <c_handler>, handler_data=handler_data <at> entry=0x7ffd9b10c2f0, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7f7f731c8440 <pre_unwind_handler>, pre_unwind_handler_data=0x7f7f6b2939a0) at throw.c:377
#17 0x00007f7f731c8940 in scm_i_with_continuation_barrier (body=body <at> entry=0x7f7f731c8350 <c_body>, body_data=body_data <at> entry=0x7ffd9b10c2f0, 
    handler=handler <at> entry=0x7f7f731c85e0 <c_handler>, handler_data=handler_data <at> entry=0x7ffd9b10c2f0, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7f7f731c8440 <pre_unwind_handler>, pre_unwind_handler_data=0x7f7f6b2939a0) at continuations.c:360
#18 0x00007f7f731c89d5 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
#19 0x00007f7f7323984c in with_guile (base=base <at> entry=0x7ffd9b10c358, data=data <at> entry=0x7ffd9b10c380) at threads.c:661
#20 0x00007f7f73127a68 in GC_call_with_stack_base (fn=fn <at> entry=0x7f7f73239800 <with_guile>, arg=arg <at> entry=0x7ffd9b10c380) at misc.c:1941
#21 0x00007f7f73239b68 in scm_i_with_guile (dynamic_state=<optimized out>, data=data <at> entry=0x7ffd9b10c380, func=func <at> entry=0x7f7f731e5ce0 <invoke_main_func>)
    at threads.c:704
#22 scm_with_guile (func=func <at> entry=0x7f7f731e5ce0 <invoke_main_func>, data=data <at> entry=0x7ffd9b10c3b0) at threads.c:710
#23 0x00007f7f731e5e92 in scm_boot_guile (argc=12, argv=0x7ffd9b10c4f8, main_func=0x4012a0 <inner_main>, closure=0x0) at init.c:324
#24 0x000000000040118b in main ()
--8<---------------cut here---------------end--------------->8---

The finalization thread lives its life as if it hadn’t received the
“stop” message in its pipe:

--8<---------------cut here---------------start------------->8---
(gdb) thread 17
[Switching to thread 17 (Thread 0x7f7f6acb7700 (LWP 44975))]
#0  0x00007f7f730e0344 in __libc_read (fd=5, buf=buf <at> entry=0x7f7f6acb6a40, nbytes=nbytes <at> entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
26      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
(gdb) bt
#0  0x00007f7f730e0344 in __libc_read (fd=5, buf=buf <at> entry=0x7f7f6acb6a40, nbytes=nbytes <at> entry=1) at ../sysdeps/unix/sysv/linux/read.c:26
#1  0x00007f7f731d7497 in read_finalization_pipe_data (data=0x7f7f6acb6a40) at finalizers.c:199
#2  0x00007f7f7312d503 in GC_do_blocking_inner (data=0x7f7f6acb6a00 "\200t\035s\177\177", context=<optimized out>) at pthread_support.c:1362
#3  0x00007f7f73121d62 in GC_with_callee_saves_pushed (fn=0x7f7f7312d4c0 <GC_do_blocking_inner>, arg=arg <at> entry=0x7f7f6acb6a00 "\200t\035s\177\177") at mach_dep.c:328
#4  0x00007f7f73127a9c in GC_do_blocking (fn=fn <at> entry=0x7f7f731d7480 <read_finalization_pipe_data>, client_data=client_data <at> entry=0x7f7f6acb6a40) at misc.c:2053
#5  0x00007f7f73239bba in scm_without_guile (func=0x7f7f731d7480 <read_finalization_pipe_data>, data=0x7f7f6acb6a40) at threads.c:722
#6  0x00007f7f731d784b in finalization_thread_proc (unused=<optimized out>) at finalizers.c:212
#7  0x00007f7f731c835a in c_body (d=0x7f7f6acb6e50) at continuations.c:422
#8  0x00007f7f7324979f in vm_regular_engine (thread=0x5, vp=0x7f7f6b299ea0, registers=0x1, resume=1930298180) at vm-engine.c:786
#9  0x00007f7f7324bfd7 in scm_call_n (proc=proc <at> entry=#<smob catch-closure 7f7f6b22fb80>, argv=argv <at> entry=0x0, nargs=nargs <at> entry=0) at vm.c:1260
#10 0x00007f7f731cdf39 in scm_call_0 (proc=proc <at> entry=#<smob catch-closure 7f7f6b22fb80>) at eval.c:479
#11 0x00007f7f7323ac12 in catch (tag=tag <at> entry=#t, thunk=#<smob catch-closure 7f7f6b22fb80>, 
    handler=<error reading variable: ERROR: Cannot access memory at address 0x73776b3469667269>0x7f7f6b22f740, 
    pre_unwind_handler=<error reading variable: ERROR: Cannot access memory at address 0x73776b3469667269>0x7f7f6b22f720) at throw.c:137
#12 0x00007f7f7323aef5 in scm_catch_with_pre_unwind_handler (key=key <at> entry=#t, thunk=<optimized out>, handler=<optimized out>, pre_unwind_handler=<optimized out>)
    at throw.c:254
#13 0x00007f7f7323b0bf in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0x7f7f731c8350 <c_body>, body_data=body_data <at> entry=0x7f7f6acb6e50, 
    handler=handler <at> entry=0x7f7f731c85e0 <c_handler>, handler_data=handler_data <at> entry=0x7f7f6acb6e50, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7f7f731c8440 <pre_unwind_handler>, pre_unwind_handler_data=0x7f7f6b2939a0) at throw.c:377
#14 0x00007f7f731c8940 in scm_i_with_continuation_barrier (body=body <at> entry=0x7f7f731c8350 <c_body>, body_data=body_data <at> entry=0x7f7f6acb6e50, 
    handler=handler <at> entry=0x7f7f731c85e0 <c_handler>, handler_data=handler_data <at> entry=0x7f7f6acb6e50, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7f7f731c8440 <pre_unwind_handler>, pre_unwind_handler_data=0x7f7f6b2939a0) at continuations.c:360
#15 0x00007f7f731c89d5 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>) at continuations.c:456
#16 0x00007f7f7323984c in with_guile (base=base <at> entry=0x7f7f6acb6eb8, data=data <at> entry=0x7f7f6acb6ee0) at threads.c:661
#17 0x00007f7f73127a68 in GC_call_with_stack_base (fn=fn <at> entry=0x7f7f73239800 <with_guile>, arg=arg <at> entry=0x7f7f6acb6ee0) at misc.c:1941
#18 0x00007f7f73239b68 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>) at threads.c:704
#19 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:710
#20 0x00007f7f730d7015 in start_thread (arg=0x7f7f6acb7700) at pthread_create.c:486
#21 0x00007f7f7300891f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
--8<---------------cut here---------------end--------------->8---

This is on 2.2.6 and very likely the same issue as
<https://bugs.gnu.org/37757>: ‘finalization_thread_proc’ gets EINTR, by
chance ‘data.byte’ is zero, and so it keeps going instead of exiting.

Commit e002332b8a136638c2a4fc1a2f2a26541ce0211d moves mcron to 2.2.7,
which should fix this problem.  \o/

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Robert Vollmert <rob <at> vllmrt.net>
To: bug-guix <at> gnu.org
Subject: mcron randomly stops running jobs
Date: Fri, 30 Aug 2019 19:31:44 +0200
This is the third time I’ve seen this, and this time I’m
sure that nothing else happened.

I have numerous mcron jobs configured, many of which run every 15 minutes,
and one which runs once a minute. Just now, at 19:22, I noticed the minutely
cron job didn’t seem to run, and saw that one every-15-minute job which always
logs to mcron.log had last run at 18:00. I restarted mcron, and jobs started
running again. There is no timezone confusion either: That every-15-minute job
logged its first run at 19:30.




This bug report was last modified 5 years and 123 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.