Package: emacs
Reported by: Andreas Politz <politza <at> hochschule-trier.de>
Date: Thu, 11 Jul 2019 20:52:02 UTC
Severity: normal
Tags: fixed
Found in version 27.0.50
Fixed in version 28.1
Done: dick <dick.r.chiang <at> gmail.com>
Bug is archived. No further changes may be made.
From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: bug#36609: 27.0.50; Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 12:57:51 +0000
On Fri, Jul 12, 2019 at 12:42 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 09:02:22 +0000
> > Cc: 36609 <at> debbugs.gnu.org
> >
> > On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> > <politza <at> hochschule-trier.de> wrote:
> > > I think there is a race-condition in the implementation of threads. I
> > > tried to find a minimal test-case, without success. Thus, I've attached
> > > a lengthy source-file. Loading that file should trigger this bug and
> > > may freeze your session.
> >
> > It does here, so I can provide further debugging information if
> > needed.
>
> Thanks, can you provide the info I asked for?

Yes, albeit not right now.

> > On first glance, it appears that xgselect returns abnormally with
> > g_main_context acquired in one thread, and then other threads fail
> > to acquire it and loop endlessly.
>
> If you can describe what causes this to happen, I think we might be
> half-way to a solution.

Here's the backtrace of the abnormal exit I see with the patch attached:

(gdb) bt full
#0  0x00000000006bf987 in release_g_main_context (ptr=0xc1d070) at xgselect.c:36
        context = 0x7fffedf79710
#1  0x0000000000616f03 in do_one_unbind (this_binding=0x7fffedf79770, unwinding=true, bindflag=SET_INTERNAL_UNBIND) at eval.c:3446
#2  0x0000000000617245 in unbind_to (count=0, value=XIL(0)) at eval.c:3567
        this_binding = {
          kind = SPECPDL_UNWIND_PTR,
          unwind = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = XIL(0xc1d070),
            eval_depth = 0
          },
          unwind_array = {
            kind = SPECPDL_UNWIND_PTR,
            nelts = 7076219,
            array = 0xc1d070
          },
          unwind_ptr = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 0xc1d070
          },
          unwind_int = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 12701808
          },
          unwind_excursion = {
            kind = SPECPDL_UNWIND_PTR,
            marker = XIL(0x6bf97b),
            window = XIL(0xc1d070)
          },
          unwind_void = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>
          },
          let = {
            kind = SPECPDL_UNWIND_PTR,
            symbol = XIL(0x6bf97b),
            old_value = XIL(0xc1d070),
            where = XIL(0),
            saved_value = XIL(0xef26a0)
          },
          bt = {
            kind = SPECPDL_UNWIND_PTR,
            debug_on_exit = false,
            function = XIL(0x6bf97b),
            args = 0xc1d070,
            nargs = 0
          }
        }
        quitf = XIL(0)
#3  0x00000000006116df in unwind_to_catch (catch=0x7fffd8000c50, type=NONLOCAL_EXIT_SIGNAL, value=XIL(0x14d3653)) at eval.c:1162
        last_time = false
#4  0x00000000006126d9 in signal_or_quit (error_symbol=XIL(0x90), data=XIL(0), keyboard_quit=false) at eval.c:1674
        unwind_data = XIL(0x14d3653)
        conditions = XIL(0x7ffff05d676b)
        string = XIL(0x5f5e77)
        real_error_symbol = XIL(0x90)
        clause = XIL(0x30)
        h = 0x7fffd8000c50
#5  0x00000000006122e9 in Fsignal (error_symbol=XIL(0x90), data=XIL(0)) at eval.c:1564
#6  0x0000000000698901 in post_acquire_global_lock (self=0xe09db0) at thread.c:115
        sym = XIL(0x90)
        data = XIL(0)
        prev_thread = 0xa745c0 <main_thread>
#7  0x000000000069892b in acquire_global_lock (self=0xe09db0) at thread.c:123
#8  0x0000000000699303 in really_call_select (arg=0x7fffedf79a70) at thread.c:596
        sa = 0x7fffedf79a70
        self = 0xe09db0
        oldset = {
          __val = {0, 0, 7, 0, 80, 140736817269952, 2031, 2080, 18446744073709550952, 32, 343597383808, 4, 0, 472446402655, 511101108348, 0}
        }
#9  0x00000000005e5ee0 in flush_stack_call_func (func=0x699239 <really_call_select>, arg=0x7fffedf79a70) at alloc.c:4969
        end = 0x7fffedf79a30
        self = 0xe09db0
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#10 0x0000000000699389 in thread_select (func=0x419320 <pselect@plt>, max_fds=9, rfds=0x7fffedf79fa0, wfds=0x7fffedf79f20, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0) at thread.c:616
        sa = {
          func = 0x419320 <pselect@plt>,
          max_fds = 9,
          rfds = 0x7fffedf79fa0,
          wfds = 0x7fffedf79f20,
          efds = 0x0,
          timeout = 0x7fffedf7a260,
          sigmask = 0x0,
          result = 1
        }
#11 0x00000000006bfef5 in xg_select (fds_lim=9, rfds=0x7fffedf7a300, wfds=0x7fffedf7a280, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0) at xgselect.c:130
        all_rfds = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 0,
          tv_nsec = 0
        }
        tmop = 0x7fffedf7a260
        context = 0xc1d070
        have_wfds = true
        gfds_buf = {{
            fd = 5,
            events = 1,
            revents = 0
          }, {
            fd = 6,
            events = 1,
            revents = 0
          }, {
            fd = 8,
            events = 1,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 125 times>}
        gfds = 0x7fffedf79b10
        gfds_size = 128
        n_gfds = 3
        retval = 0
        our_fds = 0
        max_fds = 8
        context_acquired = true
        i = 3
        nfds = 0
        tmo_in_millisec = -1
        must_free = 0
        need_to_dispatch = false
        count = 3
#12 0x000000000066b757 in wait_reading_process_output (time_limit=3, nsecs=0, read_kbd=0, do_display=false, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5423
        process_skipped = false
        channel = 0
        nfds = 0
        Available = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 0
        proc = XIL(0x7fffedf7a440)
        timeout = {
          tv_sec = 3,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 1562935633,
          tv_nsec = 911868453
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = TIMEOUT
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 2
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#13 0x0000000000429bf6 in Fsleep_for (seconds=make_fixnum(3), milliseconds=XIL(0)) at dispnew.c:5825
        t = {
          tv_sec = 3,
          tv_nsec = 0
        }
        tend = {
          tv_sec = 1562935633,
          tv_nsec = 911868112
        }
        duration = 3
#14 0x0000000000613e99 in eval_sub (form=XIL(0xf6df73)) at eval.c:2273
        i = 2
        maxargs = 2
        args_left = XIL(0)
        numargs = 1
        original_fun = XIL(0x7fffefa9fb98)
        original_args = XIL(0xf6df83)
        count = 1
        fun = XIL(0xa756a5)
        val = XIL(0)
        funcar = make_fixnum(35184372085343)
        argvals = {make_fixnum(3), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0)}
#15 0x0000000000610032 in Fprogn (body=XIL(0)) at eval.c:462
        form = XIL(0xf6df73)
        val = XIL(0)
#16 0x0000000000616102 in funcall_lambda (fun=XIL(0xf6da43), nargs=0, arg_vector=0xe09dd8) at eval.c:3065
        val = XIL(0xc0)
        syms_left = XIL(0)
        next = XIL(0x3400000013)
        lexenv = XIL(0)
        count = 1
        i = 0
        optional = false
        rest = false
#17 0x0000000000615542 in Ffuncall (nargs=1, args=0xe09dd0) at eval.c:2813
        fun = XIL(0xf6da43)
        original_fun = XIL(0xf6da43)
        funcar = XIL(0xc0)
        numargs = 0
        val = XIL(0xaf72e0)
        count = 0
#18 0x000000000069956f in invoke_thread_function () at thread.c:702
        count = 0
#19 0x0000000000611d61 in internal_condition_case (bfun=0x69953e <invoke_thread_function>, handlers=XIL(0x30), hfun=0x699596 <record_thread_error>) at eval.c:1351
        val = make_fixnum(1405386)
        c = 0x7fffd8000c50
#20 0x0000000000699697 in run_thread (state=0xe09db0) at thread.c:741
        stack_pos = {
          __max_align_ll = 0,
          __max_align_ld = 0
        }
        self = 0xe09db0
        iter = 0x0
        c = 0x7fffd8000b20
#21 0x00007ffff4b38fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737185822464, -1249422724209328276, 140737488341374, 140737488341375, 140737185822464, 0, 1249453444682727276, 1249398985402204012},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#22 0x00007ffff49724cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Lisp Backtrace:
"sleep-for" (0xedf7a530)
0xf6da40 Lisp type 3

post_acquire_global_lock () can return abnormally (I didn't know
that), so really_call_select() can, too, so thread_select() can, too.

> > + ptrdiff_t count = SPECPDL_INDEX ();
>
> I don't think we should do that at this low level.

You're right, it does stick out. I think we're safe because we're
calling Fsignal with the global lock held, but it's not a pretty or
well-documented situation.
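
The failure mode described above (a callee exits non-locally while a resource is held, and only a registered unwind handler gets a chance to release it) can be sketched in isolation. This is not Emacs code: it uses plain setjmp/longjmp in place of Emacs's catch/signal machinery, and every name in it (struct context, record_unwind, unwind_to, maybe_signal, select_with_context, context_leaked) is hypothetical, chosen only to echo the roles of the specpdl, release_g_main_context, post_acquire_global_lock and xg_select in the backtrace.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the GLib main context: only an ownership flag. */
struct context { bool acquired; };

/* Minimal unwind stack, loosely modelled on the idea of the specpdl. */
typedef void (*unwind_fn) (void *);
struct unwind_entry { unwind_fn func; void *arg; };

#define UNWIND_MAX 16
static struct unwind_entry unwind_stack[UNWIND_MAX];
static size_t unwind_depth;   /* current number of registered handlers */
static size_t catch_depth;    /* depth recorded when the catch was set up */
static jmp_buf catch_point;

static void
record_unwind (unwind_fn func, void *arg)
{
  assert (unwind_depth < UNWIND_MAX);
  unwind_stack[unwind_depth++] = (struct unwind_entry) { func, arg };
}

/* Pop and run handlers until the stack is back at DEPTH. */
static void
unwind_to (size_t depth)
{
  while (unwind_depth > depth)
    {
      struct unwind_entry e = unwind_stack[--unwind_depth];
      e.func (e.arg);
    }
}

static void
release_context (void *ptr)
{
  ((struct context *) ptr)->acquired = false;
}

/* Plays the role of a callee that may signal instead of returning. */
static void
maybe_signal (bool pending_signal)
{
  if (pending_signal)
    {
      unwind_to (catch_depth);   /* run handlers, then exit non-locally */
      longjmp (catch_point, 1);
    }
}

static void
select_with_context (struct context *ctx, bool pending_signal, bool protect)
{
  size_t count = unwind_depth;
  ctx->acquired = true;
  if (protect)
    record_unwind (release_context, ctx);

  maybe_signal (pending_signal);   /* may not return */

  if (protect)
    unwind_to (count);             /* normal exit: run our handler */
  else
    release_context (ctx);         /* skipped entirely on a non-local exit */
}

/* Return true if the context is still marked acquired after the call. */
bool
context_leaked (bool pending_signal, bool protect)
{
  /* Static storage, so its value is well defined after longjmp. */
  static struct context ctx;
  ctx.acquired = false;
  unwind_depth = 0;
  catch_depth = 0;
  if (setjmp (catch_point) == 0)
    select_with_context (&ctx, pending_signal, protect);
  return ctx.acquired;
}
```

In this sketch, context_leaked (true, false) reports the context as still acquired, which corresponds to the situation where other threads then fail to acquire it, while context_leaked (true, true) does not, because the handler runs during unwinding as release_g_main_context does in frame #0 above.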