From debbugs-submit-bounces@debbugs.gnu.org Sun Jun 04 10:58:09 2017 Received: (at submit) by debbugs.gnu.org; 4 Jun 2017 14:58:09 +0000 Received: from localhost ([127.0.0.1]:55940 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dHWyl-0003iV-44 for submit@debbugs.gnu.org; Sun, 04 Jun 2017 10:58:09 -0400 Received: from eggs.gnu.org ([208.118.235.92]:34576) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dHWyi-0003i0-HI for submit@debbugs.gnu.org; Sun, 04 Jun 2017 10:58:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dHWyX-0006Qk-Qj for submit@debbugs.gnu.org; Sun, 04 Jun 2017 10:57:59 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:49184) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dHWyX-0006QX-Mn for submit@debbugs.gnu.org; Sun, 04 Jun 2017 10:57:53 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53752) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dHWyS-0001Ec-0z for bug-guile@gnu.org; Sun, 04 Jun 2017 10:57:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dHWyM-0006IZ-Gc for bug-guile@gnu.org; Sun, 04 Jun 2017 10:57:48 -0400 Received: from mail-lf0-x22e.google.com ([2a00:1450:4010:c07::22e]:34158) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1dHWyL-0006Fm-SF for bug-guile@gnu.org; Sun, 04 Jun 2017 10:57:42 -0400 Received: by mail-lf0-x22e.google.com with SMTP id v20so28785472lfa.1 for ; Sun, 04 Jun 2017 07:57:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:reply-to:from:date:message-id:subject:to; bh=fDowtPjzhOCoDpRB3nbwureBQp9WCeOO4DHNio7/poM=; b=PjF4HNEygq4jxAHQ3acXy2rdo/AJySclymxcP9U+GU8xK8VsWiNgxPeKVG2OAsO8j7 CFjFJ8E8C0YLJ7cpbwvB8e4fKqh3XtlGKt0ZxkBhdULX16j3SUbjOUa2EMUlCTlMZOoW HKOMjYh6C9n97H+Xctq9sO2y0VQ+nxIuxcpFDZ0433tADDIWBQos6g61c2oM6CtLKJb9 ctLjtVlfZ8vPQqRcyIRNBEwECB9/L3dnutCqek2rIhYsBV06PeMtaKJX3v/D9UWYWBfy lSZKpBqFkW2jnFioCL3KzeKCpF+JtQmXwemBVGPpBEbiOacJa9vUbMfUyK2Whf8CIdoQ vQOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:from:date:message-id :subject:to; bh=fDowtPjzhOCoDpRB3nbwureBQp9WCeOO4DHNio7/poM=; b=ja9UeYKg8wV/+4VnywTBHaWkyUko/sUHly9gTpAUYq+aZC0tJ3i6PSJNBddWQ5GqNs YHIgt+P1TIMVpbF0Ubul7NBdwDKkt73Z6qQX+TsYfK+DFrzsF28qz2Trbm89tEHNknkG xWid2SlkXTIBh4Wt+2njhk1MZOjddBWSBqSwkHgMh2h/GGBu+3PoRGm98LNuKgz7HTNe ao5X2OGeAGwvJo5Rfiug6YKG2dIlE/+NasZy1C/1Rp7dT8nLcG74Fhm044uT9xSVX6zA HJr0bIQLCKKNJFSeE1Wk5CYhQ1Amcsq2wB+2AYdXGMMGu4DpaBxVv0brtmSu30BczO63 yI3w== X-Gm-Message-State: AODbwcCNiBkiPqcp26u1/krl06spKbtEXMtLhR8K77aPC/S634IjmfP5 iyrekAUpCY5/DKtRCOXOtTn/fEpNODDL X-Received: by 10.46.14.10 with SMTP id 10mr5021674ljo.25.1496588257697; Sun, 04 Jun 2017 07:57:37 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.157.70 with HTTP; Sun, 4 Jun 2017 07:57:17 -0700 (PDT) From: Linas Vepstas Date: Sun, 4 Jun 2017 09:57:17 -0500 Message-ID: Subject: Hang in GC, inf loop while walking frame pointers To: bug-guile@gnu.org Content-Type: multipart/alternative; boundary="f403045ea6d679a79b0551239b33" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: linasvepstas@gmail.com Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) --f403045ea6d679a79b0551239b33 Content-Type: text/plain; charset="UTF-8" what: guile-2.2-stable, from git. I've got a large, complex, heavily multi-threaded guile program that hangs during garbage collection; usually after running for half a day. It hangs in a tight loop in scm_i_vm_mark_stack, spinning at 100% of CPU. This is due to the for-loop line fp = SCM_FRAME_DYNAMIC_LINK (fp)) at libguile/vm.c line 975 failing to advance the frame pointer. There's no "obvious" corruption in the stack; it simply looks like the frame was incompletely set up, and so incrementing to the next fp does not go anywhere. I have recompiled guile with VM_ENABLE_ASSERTIONS and am trying to reproduce the bug now. The rest of this email is a record of a long debug session isolating the problem, and showing that, overall, the thread and stack data look more-or-less correct and uncorrupoted, except for the inability to walk forward in the frame. -- linas (gdb) bt #0 scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:1011 #1 0x00007f3e9db8835e in GC_mark_from (mark_stack_top=0x7f3e9b783ee0, mark_stack_top@entry=0x7f3e9b783f00, mark_stack=mark_stack@entry=0x7f3e9b783eb0, mark_stack_limit=mark_stack_limit@entry=0x7f3e9b793eb0) at ../mark.c:772 #2 0x00007f3e9db8897e in GC_do_local_mark (local_mark_stack=0x7f3e9b783eb0, local_top=0x7f3e9b783f00) at ../mark.c:1037 #3 0x00007f3e9db88b98 in GC_mark_local ( local_mark_stack=local_mark_stack@entry=0x7f3e9b783eb0, id=id@entry=4) at ../mark.c:1170 #4 0x00007f3e9db88eaa in GC_help_marker (my_mark_no=my_mark_no@entry=80003) at ../mark.c:1238 #5 0x00007f3e9db92e3c in GC_mark_thread (id=) at ../pthread_support.c:380 #6 0x00007f3e9e2ae6ba in start_thread (arg=0x7f3e9b794700) at pthread_create.c:333 #7 0x00007f3e9dfdd82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 (gdb) step #0 find_slot_map (cache=0x7f3e9b783ab0, ip=0x1) at ../../libguile/vm.c:935 935 if (cache->entries[slot].ip == ip) (gdb) print slot $1 = 0 (gdb) print cache->entries[slot].ip $2 = (scm_t_uint32 *) 0x1 (gdb) print cache->entries[slot].map $4 = (const scm_t_uint8 *) 0x0 (gdb) step scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:1011 1011 slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cache); (gdb) print fp $5 = (union scm_vm_stack_element *) 0x7f3e94cdee38 #define SCM_FRAME_RETURN_ADDRESS(fp) ((fp)[0].as_ip) (gdb) print (fp)[0].as_ip $6 = (scm_t_uint32 *) 0x1 OK that looks weird ... is this corrupted ?? but whatever, because the returned slot_map is never used ... because ... (gdb) step scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:975 975 fp = SCM_FRAME_DYNAMIC_LINK (fp)) (gdb) print fp $7 = (union scm_vm_stack_element *) 0x7f3e94cdee38 frames.h:#define SCM_FRAME_DYNAMIC_LINK(fp) ((fp) + (fp)[1].as_uint) (gdb) print (fp)[1].as_uint $8 = 0 OK, that seems bad, because now fp never advances, it just repeats over and over with this same value. (gdb) 979 for (slot = nlocals - 1; sp < fp; sp++, slot--) (gdb) print nlocals $10 = -2 (gdb) print sp $11 = (union scm_vm_stack_element *) 0x7f3e94cdee48 #define SCM_FRAME_NUM_LOCALS(fp, sp) ((fp) - (sp)) (gdb) print fp $12 = (union scm_vm_stack_element *) 0x7f3e94cdee38 (gdb) print ((fp) - (sp)) $13 = -2 Ohh .. its not -16 because its -2 * sizeof (union scm_vm_stack_element *) so that's OK. So for loops is skipped, it should go to. sp = SCM_FRAME_PREVIOUS_SP (fp); frames.h:#define SCM_FRAME_PREVIOUS_SP(fp) ((fp) + 2) and so now it loops around and repeats. (gdb) print cache $19 = {entries = {{ip = 0x1, map = 0x0}, {ip = 0x0, map = 0x0} , {ip = 0x1a1b5e0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}}} (gdb) print &cache $20 = (struct slot_map_cache *) 0x7f3e9b783ab0 and so the loop repeates forever, because fp = SCM_FRAME_DYNAMIC_LINK (fp) never advances fp, because (fp)[1].as_uint is zero. So where is fp pointing to? recall fp == 0x7f3e94cdee38 (gdb) x/20x 0x7f3e94cdee00 0x7f3e94cdee00: 0x0000000000000904 0x000000000000200c 0x7f3e94cdee10: 0x0000000001a1b7e0 0x0000000001a1b5e0 0x7f3e94cdee20: 0x0000000000000004 0x000000000000200c 0x7f3e94cdee30: 0x0000000000169bd6 0x0000000000000001 0x7f3e94cdee40: 0x0000000000000000 0x000000000192acd0 0x7f3e94cdee50: 0x0000000001fa4bc0 0x0000000001a1c6d0 0x7f3e94cdee60: 0x0000000044507950 0x0000000000000002 0x7f3e94cdee70: 0x000000000005a6f5 0x00000000018febd0 0x7f3e94cdee80: 0x000000006d490d00 0x00007f3e9e97241c 0x7f3e94cdee90: 0x0000000000000002 0x00007f3e9e97241c (gdb) x/s 0x7f3e94cdee60 0x7f3e94cdee60: "PyPD" <<< ?? is this a meaningul string? (gdb) x/s 0x7f3e94cdee81 0x7f3e94cdee81: "\rIm" << how about this ??? So this does not look healthy. How did we get here? (gdb) print vp $21 = (struct scm_vm *) 0x755c1bd0 (gdb) print mark_stack_ptr $22 = (struct GC_ms_entry *) 0x7f3e9b783f40 (gdb) print mark_stack_limit $23 = (struct GC_ms_entry *) 0x7f3e9b793eb0 (gdb) print upper $24 = (void *) 0x7ffdd986 (gdb) print GC_greatest_plausible_heap_addr $25 = (void *) 0x7ffdd986 (gdb) print lower $26 = (void *) 0x17a1ff8 (gdb) print vp->fp $27 = (union scm_vm_stack_element *) 0x7f3e94cdee18 (gdb) print vp->sp $28 = (union scm_vm_stack_element *) 0x7f3e94cdee00 (gdb) print vp->stack_top $29 = (union scm_vm_stack_element *) 0x7f3e94cdf000 So lets walk manually. frames.h:#define SCM_FRAME_DYNAMIC_LINK(fp) ((fp) + (fp)[1].as_uint) (gdb) print (vp->fp) + (vp->fp)[1].as_uint $31 = (union scm_vm_stack_element *) 0x7f3e94cdee38 So we are instanly in trouble! once through the loop, and we've got the broken fp So how did we get here? (gdb) print vp $32 = (struct scm_vm *) 0x755c1bd0 (gdb) print *vp $33 = {ip = 0x7f3e9e5a6378 , sp = 0x7f3e94cdee00, fp = 0x7f3e94cdee18, stack_limit = 0x7f3e94cde000, trace_level = 0, sp_min_since_gc = 0x7f3e94cdee00, stack_size = 512, stack_bottom = 0x7f3e94cde000, stack_top = 0x7f3e94cdf000, overflow_handler_stack = 0x304, hooks = {0x4, 0x4, 0x4, 0x4, 0x4}, resumable_prompt_cookie = 0x7f3d397f9440, engine = 1} Looks plausible, not corrupted. Why is engine=1 ?? #define SCM_VM_REGULAR_ENGINE 0 #define SCM_VM_DEBUG_ENGINE 1 #define SCM_VM_NUM_ENGINES 2 # define SCM_I_CURRENT_THREAD (scm_i_current_thread) (gdb) print scm_i_current_thread $34 = (scm_i_thread *) 0x0 OK, that's not excellent. (gdb) print thread_count $35 = 161 (gdb) bt #0 scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:1011 #1 0x00007f3e9db8835e in GC_mark_from (mark_stack_top=0x7f3e9b783ee0, mark_stack_top@entry=0x7f3e9b783f00, mark_stack=mark_stack@entry=0x7f3e9b783eb0, mark_stack_limit=mark_stack_limit@entry=0x7f3e9b793eb0) at ../mark.c:772 #2 0x00007f3e9db8897e in GC_do_local_mark (local_mark_stack=0x7f3e9b783eb0, local_top=0x7f3e9b783f00) at ../mark.c:1037 #3 0x00007f3e9db88b98 in GC_mark_local ( local_mark_stack=local_mark_stack@entry=0x7f3e9b783eb0, id=id@entry=4) at ../mark.c:1170 #4 0x00007f3e9db88eaa in GC_help_marker (my_mark_no=my_mark_no@entry=80003) at ../mark.c:1238 #5 0x00007f3e9db92e3c in GC_mark_thread (id=) at ../pthread_support.c:380 #6 0x00007f3e9e2ae6ba in start_thread (arg=0x7f3e9b794700) at pthread_create.c:333 #7 0x00007f3e9dfdd82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 so bdwgc is version gc7_6_0 So ../mark.c:772 is GET_HDR and GET_HDR is in include/private/gc_hdrs.h define GET_HDR(p, hhdr) ... ((hhdr) = HDR_INNER(p) # define HDR_INNER(p) HDR_FROM_BI(BI(p),p) #define HDR_FROM_BI(bi, p) \ ((bi)->index[((word)(p) >> LOG_HBLKSIZE) & (BOTTOM_SZ - 1)]) This seems all wrong.. wtf .... is it called by thread_mark ?? in libguile/thread.c: thread_gc_kind = GC_new_kind (GC_new_free_list (), GC_MAKE_PROC (GC_new_proc (thread_mark), 0), 0, 1) in bdwgc/misc.c GC_new_proc GC_mark_procs[result] = proc #define GC_mark_procs GC_arrays._mark_procs #define PROC(descr) \ (GC_mark_procs[((descr) >> GC_DS_TAG_BITS) & (GC_MAX_MARK_PROCS-1)]) GC_obj_kinds[result].ok_descriptor = descr; <<< this is where MAKE_PROC goes. anyway ... in mark.c line 733 case GC_DS_PROC: mark_stack_top--; credit -= GC_PROC_BYTES; mark_stack_top = (*PROC(descr))((word *)current_p, mark_stack_top, mark_stack_limit, ENV(descr)); continue; But that's the wrong line number. line 772 is in case GC_DS_PER_OBJECT: if ((signed_word)descr >= 0) { /* Descriptor is in the object. */ descr = *(word *)(current_p + descr - GC_DS_PER_OBJECT); Oh, it goes to retry, which loops around and calls PROC which is thread_mark which is called with args mark_stack_top = (*PROC(descr))((word *)current_p, mark_stack_top, mark_stack_limit, ENV(descr)); (gdb) print current_p $39 = (ptr_t) 0x7225d380 "@\325%r" so back in libguile/threads.c line 84: const struct scm_i_thread *t = (struct scm_i_thread *) addr; (gdb) print *(struct scm_i_thread *) current_p $40 = {next_thread = 0x7225d540, handle = 0x5763f3b0, pthread = 139900934399744, result = 0x4, exited = 0, guile_mode = 1, needs_unregister = 1, wake = 0x0, sleep_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' , __align = 0}, sleep_pipe = {151, 197}, freelists = 0x71cfe800, pointerless_freelists = 0x71cfe780, dynamic_state = 0x71de7900, dynstack = { base = 0x698c3800, top = 0x698c3940, limit = 0x698c3a00}, pending_asyncs = 0x304, block_asyncs = 0, continuation_root = 0x5763f340, continuation_base = 0x7f3d397f9c90, vp = 0x755c1bd0, base = 0x7f3d397f9da0, regs = {{__jmpbuf = {139905204221696, 139906923931021, 1, 139894281647232, 139906913276512, 0, 139906918327551, 8388608}, __mask_was_saved = 964664752, __saved_mask = {__val = {0, 139900934399744, 139900934399744, 139900934397344, 139906923929400, 139893936840720, 1804170656, 139906913276512, 139900934399744, 139900934397344, 139900934397504, 139893936840720, 139906910859134, 139906913276484, 139906910857112, 27401856}}}}} (gdb) print ((struct scm_i_thread *) current_p)->vp $41 = (struct scm_vm *) 0x755c1bd0 which is exactly the vp we had before, so this is the correct thread. So ... all the values in *(struct scm_i_thread *) current_p look more or less OK; its not corrupted. And all the values in *vp look more or less OK, its not corrrupted except that engine=1 looks wrong. So what's up with the weird stack looping? Again: >From before, we had: (gdb) print *vp $42 = {ip = 0x7f3e9e5a6378 , sp = 0x7f3e94cdee00, fp = 0x7f3e94cdee18, stack_limit = 0x7f3e94cde000, trace_level = 0, sp_min_since_gc = 0x7f3e94cdee00, stack_size = 512, stack_bottom = 0x7f3e94cde000, stack_top = 0x7f3e94cdf000, overflow_handler_stack = 0x304, hooks = {0x4, 0x4, 0x4, 0x4, 0x4}, resumable_prompt_cookie = 0x7f3d397f9440, engine = 1} so fp, sp, stack_bottom, stack_top all are self-consistent with each other. The problem is that fp = SCM_FRAME_DYNAMIC_LINK (fp) goes nowhere, after the first iteration. Is the frame corrupted? How? what does the rest of the stack look like? (gdb) x/32x 0x7f3e94cdee00 0x7f3e94cdee00: 0x0000000000000904 0x000000000000200c 0x7f3e94cdee10: 0x0000000001a1b7e0 0x0000000001a1b5e0 0x7f3e94cdee20: 0x0000000000000004 0x000000000000200c <<< theres the 4 0x7f3e94cdee30: 0x0000000000169bd6 0x0000000000000001 <<< there's the new fp 0x7f3e94cdee40: 0x0000000000000000 0x000000000192acd0 <<< there's the zero. 0x7f3e94cdee50: 0x0000000001fa4bc0 0x0000000001a1c6d0 0x7f3e94cdee60: 0x0000000044507950 0x0000000000000002 0x7f3e94cdee70: 0x000000000005a6f5 0x00000000018febd0 0x7f3e94cdee80: 0x000000006d490d00 0x00007f3e9e97241c 0x7f3e94cdee90: 0x0000000000000002 0x00007f3e9e97241c 0x7f3e94cdeea0: 0x0000000000000002 0x00007f3e8d797630 0x7f3e94cdeeb0: 0x0000000000000002 0x00007f3e86effacc 0x7f3e94cdeec0: 0x0000000000000003 0x000000006d490d00 0x7f3e94cdeed0: 0x00007f3e86eb8698 0x0000000000000002 0x7f3e94cdeee0: 0x00007f3e8e51f450 0x0000000000000008 0x7f3e94cdeef0: 0x0000000001a37c10 0x000000000192acd0 So... who sets up thread->vp ? /* #define VM_ENABLE_ASSERTIONS */ However, defining this breaks the build. diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c index 6c88ebf..f0f8164 100644 --- a/libguile/vm-engine.c +++ b/libguile/vm-engine.c @@ -2050,11 +2050,11 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp, if (!scm_module_system_booted_p) { - ASSERT (scm_is_true + ASSERT (scm_is_true( scm_equal_p (modname, scm_list_2 (SCM_BOOL_T, - scm_from_utf8_symbol ("guile")))); + scm_from_utf8_symbol ("guile"))))); var = scm_lookup (sym); } else if (scm_is_true (SCM_CAR (modname))) diff --git a/libguile/vm.c b/libguile/vm.c index 18f2192..c26066e 100644 --- a/libguile/vm.c +++ b/libguile/vm.c @@ -63,7 +63,7 @@ static size_t page_size; /* The VM has a number of internal assertions that shouldn't normally be necessary, but might be if you think you found a bug in the VM. */ -/* #define VM_ENABLE_ASSERTIONS */ +#define VM_ENABLE_ASSERTIONS static void vm_expand_stack (struct scm_vm *vp, union scm_vm_stack_element *new_sp) SCM_NOINLINE; GEN guile-procedures.texi /bin/bash: line 1: 24078 Broken pipe ./meta/guile Aborted ./libguile/guile Aborted gdb libguile/.libs/lt-guile #0 0x00007ffff74f7428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007ffff74f902a in __GI_abort () at abort.c:89 #2 0x00007ffff7b707c4 in vm_regular_engine (thread=0x7282, vp=0x806f30, registers=0x6, resume=-145787864) at ../../libguile/vm-engine.c:2053 #3 0x00007ffff7b711d2 in scm_call_n (proc=0x832b30, argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1257 #4 0x00007ffff7af14e9 in scm_call_0 (proc=) at ../../libguile/eval.c:481 #5 0x00007ffff7b10568 in scm_primitive_load_path (args=) at ../../libguile/load.c:1248 #6 0x00007ffff7b6e976 in vm_regular_engine (thread=0x7282, vp=0x806f30, registers=0x6, resume=-145787864) at ../../libguile/vm-engine.c:784 Oh. wtf. (define last-gc (gc-stats)) (define start-time (get-internal-real-time)) (define run-time (get-internal-run-time)) (define (avg-time-taken) (define now (get-internal-real-time)) (define run (get-internal-run-time)) (define cur (gc-stats)) (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc)))) (define elapsed-time (* 1.0e-9 (- now start-time))) (define cpu-time (* 1.0e-9 (- run run-time))) (format #t "Elapsed time: ~5f secs. GC-time-taken: ~5f s or ~5f% cpu-usage: ~5f%\n" elapsed-time gc-time-taken (* 100 (/ gc-time-taken elapsed-time)) (* 100 (/ cpu-time elapsed-time)) ) (set! last-gc cur) (set! start-time now) (set! run-time run) ) linas@fanny: ~/src/novamente/src/opencog-git/opencog/nlp/learn $ vi x linas@fanny: ~/src/novamente/src/opencog-git/opencog/nlp/learn $ cat x #0 find_slot_map (cache=0x7f3e9b783ab0, ip=0x1) at ../../libguile/vm.c:935 935 if (cache->entries[slot].ip == ip) (gdb) print slot $1 = 0 (gdb) print cache->entries[slot].ip $2 = (scm_t_uint32 *) 0x1 (gdb) print cache->entries[slot].map $4 = (const scm_t_uint8 *) 0x0 (gdb) step scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:1011 1011 slot_map = find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cache); (gdb) print fp $5 = (union scm_vm_stack_element *) 0x7f3e94cdee38 #define SCM_FRAME_RETURN_ADDRESS(fp) ((fp)[0].as_ip) (gdb) print (fp)[0].as_ip $6 = (scm_t_uint32 *) 0x1 OK that looks weird ... is this corrupted ?? but whatever, because the returned slot_map is never used ... because ... (gdb) step scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:975 975 fp = SCM_FRAME_DYNAMIC_LINK (fp)) (gdb) print fp $7 = (union scm_vm_stack_element *) 0x7f3e94cdee38 frames.h:#define SCM_FRAME_DYNAMIC_LINK(fp) ((fp) + (fp)[1].as_uint) (gdb) print (fp)[1].as_uint $8 = 0 OK, that seems bad, because now fp never advances, it just repeats over and over with this same value. (gdb) 979 for (slot = nlocals - 1; sp < fp; sp++, slot--) (gdb) print nlocals $10 = -2 (gdb) print sp $11 = (union scm_vm_stack_element *) 0x7f3e94cdee48 #define SCM_FRAME_NUM_LOCALS(fp, sp) ((fp) - (sp)) (gdb) print fp $12 = (union scm_vm_stack_element *) 0x7f3e94cdee38 (gdb) print ((fp) - (sp)) $13 = -2 Ohh .. its not -16 because its -2 * sizeof (union scm_vm_stack_element *) so that's OK. So for loops is skipped, it should go to. sp = SCM_FRAME_PREVIOUS_SP (fp); frames.h:#define SCM_FRAME_PREVIOUS_SP(fp) ((fp) + 2) and so now it loops around and repeats. (gdb) print cache $19 = {entries = {{ip = 0x1, map = 0x0}, {ip = 0x0, map = 0x0} , {ip = 0x1a1b5e0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}, {ip = 0x0, map = 0x0}}} (gdb) print &cache $20 = (struct slot_map_cache *) 0x7f3e9b783ab0 and so the loop repeates forever, because fp = SCM_FRAME_DYNAMIC_LINK (fp) never advances fp, because (fp)[1].as_uint is zero. So where is fp pointing to? recall fp == 0x7f3e94cdee38 (gdb) x/20x 0x7f3e94cdee00 0x7f3e94cdee00: 0x0000000000000904 0x000000000000200c 0x7f3e94cdee10: 0x0000000001a1b7e0 0x0000000001a1b5e0 0x7f3e94cdee20: 0x0000000000000004 0x000000000000200c 0x7f3e94cdee30: 0x0000000000169bd6 0x0000000000000001 0x7f3e94cdee40: 0x0000000000000000 0x000000000192acd0 0x7f3e94cdee50: 0x0000000001fa4bc0 0x0000000001a1c6d0 0x7f3e94cdee60: 0x0000000044507950 0x0000000000000002 0x7f3e94cdee70: 0x000000000005a6f5 0x00000000018febd0 0x7f3e94cdee80: 0x000000006d490d00 0x00007f3e9e97241c 0x7f3e94cdee90: 0x0000000000000002 0x00007f3e9e97241c (gdb) x/s 0x7f3e94cdee60 0x7f3e94cdee60: "PyPD" <<< ?? is this a meaningul string? (gdb) x/s 0x7f3e94cdee81 0x7f3e94cdee81: "\rIm" << how about this ??? So this does not look healthy. How did we get here? (gdb) print vp $21 = (struct scm_vm *) 0x755c1bd0 (gdb) print mark_stack_ptr $22 = (struct GC_ms_entry *) 0x7f3e9b783f40 (gdb) print mark_stack_limit $23 = (struct GC_ms_entry *) 0x7f3e9b793eb0 (gdb) print upper $24 = (void *) 0x7ffdd986 (gdb) print GC_greatest_plausible_heap_addr $25 = (void *) 0x7ffdd986 (gdb) print lower $26 = (void *) 0x17a1ff8 (gdb) print vp->fp $27 = (union scm_vm_stack_element *) 0x7f3e94cdee18 (gdb) print vp->sp $28 = (union scm_vm_stack_element *) 0x7f3e94cdee00 (gdb) print vp->stack_top $29 = (union scm_vm_stack_element *) 0x7f3e94cdf000 So lets walk manually. frames.h:#define SCM_FRAME_DYNAMIC_LINK(fp) ((fp) + (fp)[1].as_uint) (gdb) print (vp->fp) + (vp->fp)[1].as_uint $31 = (union scm_vm_stack_element *) 0x7f3e94cdee38 So we are instanly in trouble! once through the loop, and we've got the broken fp So how did we get here? (gdb) print vp $32 = (struct scm_vm *) 0x755c1bd0 (gdb) print *vp $33 = {ip = 0x7f3e9e5a6378 , sp = 0x7f3e94cdee00, fp = 0x7f3e94cdee18, stack_limit = 0x7f3e94cde000, trace_level = 0, sp_min_since_gc = 0x7f3e94cdee00, stack_size = 512, stack_bottom = 0x7f3e94cde000, stack_top = 0x7f3e94cdf000, overflow_handler_stack = 0x304, hooks = {0x4, 0x4, 0x4, 0x4, 0x4}, resumable_prompt_cookie = 0x7f3d397f9440, engine = 1} Looks plausible, not corrupted. Why is engine=1 ?? #define SCM_VM_REGULAR_ENGINE 0 #define SCM_VM_DEBUG_ENGINE 1 #define SCM_VM_NUM_ENGINES 2 # define SCM_I_CURRENT_THREAD (scm_i_current_thread) (gdb) print scm_i_current_thread $34 = (scm_i_thread *) 0x0 OK, that's not excellent. (gdb) print thread_count $35 = 161 (gdb) bt #0 scm_i_vm_mark_stack (vp=0x755c1bd0, mark_stack_ptr=0x7f3e9b783f40, mark_stack_limit=0x7f3e9b793eb0) at ../../libguile/vm.c:1011 #1 0x00007f3e9db8835e in GC_mark_from (mark_stack_top=0x7f3e9b783ee0, mark_stack_top@entry=0x7f3e9b783f00, mark_stack=mark_stack@entry=0x7f3e9b783eb0, mark_stack_limit=mark_stack_limit@entry=0x7f3e9b793eb0) at ../mark.c:772 #2 0x00007f3e9db8897e in GC_do_local_mark (local_mark_stack=0x7f3e9b783eb0, local_top=0x7f3e9b783f00) at ../mark.c:1037 #3 0x00007f3e9db88b98 in GC_mark_local ( local_mark_stack=local_mark_stack@entry=0x7f3e9b783eb0, id=id@entry=4) at ../mark.c:1170 #4 0x00007f3e9db88eaa in GC_help_marker (my_mark_no=my_mark_no@entry=80003) at ../mark.c:1238 #5 0x00007f3e9db92e3c in GC_mark_thread (id=) at ../pthread_support.c:380 #6 0x00007f3e9e2ae6ba in start_thread (arg=0x7f3e9b794700) at pthread_create.c:333 #7 0x00007f3e9dfdd82d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 so bdwgc is version gc7_6_0 So ../mark.c:772 is GET_HDR and GET_HDR is in include/private/gc_hdrs.h define GET_HDR(p, hhdr) ... ((hhdr) = HDR_INNER(p) # define HDR_INNER(p) HDR_FROM_BI(BI(p),p) #define HDR_FROM_BI(bi, p) \ ((bi)->index[((word)(p) >> LOG_HBLKSIZE) & (BOTTOM_SZ - 1)]) This seems all wrong.. wtf .... is it called by thread_mark ?? in libguile/thread.c: thread_gc_kind = GC_new_kind (GC_new_free_list (), GC_MAKE_PROC (GC_new_proc (thread_mark), 0), 0, 1) in bdwgc/misc.c GC_new_proc GC_mark_procs[result] = proc #define GC_mark_procs GC_arrays._mark_procs #define PROC(descr) \ (GC_mark_procs[((descr) >> GC_DS_TAG_BITS) & (GC_MAX_MARK_PROCS-1)]) GC_obj_kinds[result].ok_descriptor = descr; <<< this is where MAKE_PROC goes. anyway ... in mark.c line 733 case GC_DS_PROC: mark_stack_top--; credit -= GC_PROC_BYTES; mark_stack_top = (*PROC(descr))((word *)current_p, mark_stack_top, mark_stack_limit, ENV(descr)); continue; But that's the wrong line number. line 772 is in case GC_DS_PER_OBJECT: if ((signed_word)descr >= 0) { /* Descriptor is in the object. */ descr = *(word *)(current_p + descr - GC_DS_PER_OBJECT); Oh, it goes to retry, which loops around and calls PROC which is thread_mark which is called with args mark_stack_top = (*PROC(descr))((word *)current_p, mark_stack_top, mark_stack_limit, ENV(descr)); (gdb) print current_p $39 = (ptr_t) 0x7225d380 "@\325%r" so back in libguile/threads.c line 84: const struct scm_i_thread *t = (struct scm_i_thread *) addr; (gdb) print *(struct scm_i_thread *) current_p $40 = {next_thread = 0x7225d540, handle = 0x5763f3b0, pthread = 139900934399744, result = 0x4, exited = 0, guile_mode = 1, needs_unregister = 1, wake = 0x0, sleep_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' , __align = 0}, sleep_pipe = {151, 197}, freelists = 0x71cfe800, pointerless_freelists = 0x71cfe780, dynamic_state = 0x71de7900, dynstack = { base = 0x698c3800, top = 0x698c3940, limit = 0x698c3a00}, pending_asyncs = 0x304, block_asyncs = 0, continuation_root = 0x5763f340, continuation_base = 0x7f3d397f9c90, vp = 0x755c1bd0, base = 0x7f3d397f9da0, regs = {{__jmpbuf = {139905204221696, 139906923931021, 1, 139894281647232, 139906913276512, 0, 139906918327551, 8388608}, __mask_was_saved = 964664752, __saved_mask = {__val = {0, 139900934399744, 139900934399744, 139900934397344, 139906923929400, 139893936840720, 1804170656, 139906913276512, 139900934399744, 139900934397344, 139900934397504, 139893936840720, 139906910859134, 139906913276484, 139906910857112, 27401856}}}}} (gdb) print ((struct scm_i_thread *) current_p)->vp $41 = (struct scm_vm *) 0x755c1bd0 which is exactly the vp we had before, so this is the correct thread. So ... all the values in *(struct scm_i_thread *) current_p look more or less OK; its not corrupted. And all the values in *vp look more or less OK, its not corrrupted except that engine=1 looks wrong. So what's up with the weird stack looping? Again: >From before, we had: (gdb) print *vp $42 = {ip = 0x7f3e9e5a6378 , sp = 0x7f3e94cdee00, fp = 0x7f3e94cdee18, stack_limit = 0x7f3e94cde000, trace_level = 0, sp_min_since_gc = 0x7f3e94cdee00, stack_size = 512, stack_bottom = 0x7f3e94cde000, stack_top = 0x7f3e94cdf000, overflow_handler_stack = 0x304, hooks = {0x4, 0x4, 0x4, 0x4, 0x4}, resumable_prompt_cookie = 0x7f3d397f9440, engine = 1} so fp, sp, stack_bottom, stack_top all are self-consistent with each other. The problem is that fp = SCM_FRAME_DYNAMIC_LINK (fp) goes nowhere, after the first iteration. Is the frame corrupted? How? what does the rest of the stack look like? (gdb) x/32x 0x7f3e94cdee00 0x7f3e94cdee00: 0x0000000000000904 0x000000000000200c 0x7f3e94cdee10: 0x0000000001a1b7e0 0x0000000001a1b5e0 0x7f3e94cdee20: 0x0000000000000004 0x000000000000200c <<< theres the 4 0x7f3e94cdee30: 0x0000000000169bd6 0x0000000000000001 <<< there's the new fp 0x7f3e94cdee40: 0x0000000000000000 0x000000000192acd0 <<< there's the zero. 0x7f3e94cdee50: 0x0000000001fa4bc0 0x0000000001a1c6d0 0x7f3e94cdee60: 0x0000000044507950 0x0000000000000002 0x7f3e94cdee70: 0x000000000005a6f5 0x00000000018febd0 0x7f3e94cdee80: 0x000000006d490d00 0x00007f3e9e97241c 0x7f3e94cdee90: 0x0000000000000002 0x00007f3e9e97241c 0x7f3e94cdeea0: 0x0000000000000002 0x00007f3e8d797630 0x7f3e94cdeeb0: 0x0000000000000002 0x00007f3e86effacc 0x7f3e94cdeec0: 0x0000000000000003 0x000000006d490d00 0x7f3e94cdeed0: 0x00007f3e86eb8698 0x0000000000000002 0x7f3e94cdeee0: 0x00007f3e8e51f450 0x0000000000000008 0x7f3e94cdeef0: 0x0000000001a37c10 0x000000000192acd0 So... who sets up thread->vp ? /* #define VM_ENABLE_ASSERTIONS */ However, defining this breaks the build. diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c index 6c88ebf..f0f8164 100644 --- a/libguile/vm-engine.c +++ b/libguile/vm-engine.c @@ -2050,11 +2050,11 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp, if (!scm_module_system_booted_p) { - ASSERT (scm_is_true + ASSERT (scm_is_true( scm_equal_p (modname, scm_list_2 (SCM_BOOL_T, - scm_from_utf8_symbol ("guile")))); + scm_from_utf8_symbol ("guile"))))); var = scm_lookup (sym); } else if (scm_is_true (SCM_CAR (modname))) diff --git a/libguile/vm.c b/libguile/vm.c index 18f2192..c26066e 100644 --- a/libguile/vm.c +++ b/libguile/vm.c @@ -63,7 +63,7 @@ static size_t page_size; /* The VM has a number of internal assertions that shouldn't normally be necessary, but might be if you think you found a bug in the VM. */ -/* #define VM_ENABLE_ASSERTIONS */ +#define VM_ENABLE_ASSERTIONS static void vm_expand_stack (struct scm_vm *vp, union scm_vm_stack_element *new_sp) SCM_NOINLINE; GEN guile-procedures.texi /bin/bash: line 1: 24078 Broken pipe ./meta/guile Aborted ./libguile/guile Aborted gdb libguile/.libs/lt-guile #0 0x00007ffff74f7428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54 #1 0x00007ffff74f902a in __GI_abort () at abort.c:89 #2 0x00007ffff7b707c4 in vm_regular_engine (thread=0x7282, vp=0x806f30, registers=0x6, resume=-145787864) at ../../libguile/vm-engine.c:2053 #3 0x00007ffff7b711d2 in scm_call_n (proc=0x832b30, argv=argv@entry=0x0, nargs=nargs@entry=0) at ../../libguile/vm.c:1257 #4 0x00007ffff7af14e9 in scm_call_0 (proc=) at ../../libguile/eval.c:481 #5 0x00007ffff7b10568 in scm_primitive_load_path (args=) at ../../libguile/load.c:1248 #6 0x00007ffff7b6e976 in vm_regular_engine (thread=0x7282, vp=0x806f30, registers=0x6, resume=-145787864) at ../../libguile/vm-engine.c:784 --f403045ea6d679a79b0551239b33 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
what: guile-2.2-stable, from git.

I've got= a large, complex, heavily multi-threaded guile program that
hangs durin= g garbage collection; usually after running for half a day.
It hangs in = a tight loop in scm_i_vm_mark_stack, spinning at 100% of CPU.

This i= s due to the for-loop line fp =3D SCM_FRAME_DYNAMIC_LINK (fp))
at libgui= le/vm.c line 975 failing to advance the frame pointer.
There's no &q= uot;obvious" corruption in the stack; it simply looks like
the fram= e was incompletely set up, and so incrementing to the next
fp does not g= o anywhere.

I have recompiled guile with VM_ENABLE_ASSERTIONS and am= trying to
reproduce the bug now.=C2=A0 The rest of this email is a reco= rd of a long
debug session isolating the problem, and showing that, over= all, the
thread and stack data look more-or-less correct and uncorrupote= d,
except for the inability to walk forward in the frame.

-- lina= s

(gdb) bt
#0=C2=A0 scm_i_vm_mark_stack (vp=3D0x755c1bd0, mark_st= ack_ptr=3D0x7f3e9b783f40,
=C2=A0=C2=A0=C2=A0 mark_stack_limit=3D0x7f3e9b= 793eb0) at ../../libguile/vm.c:1011
#1=C2=A0 0x00007f3e9db8835e in GC_ma= rk_from (mark_stack_top=3D0x7f3e9b783ee0,
=C2=A0=C2=A0=C2=A0 mark_stack_= top@entry=3D0x7f3e9b783f00,
=C2=A0=C2=A0=C2=A0 mark_stack=3Dmark_stack@e= ntry=3D0x7f3e9b783eb0,
=C2=A0=C2=A0=C2=A0 mark_stack_limit=3Dmark_stack_= limit@entry=3D0x7f3e9b793eb0) at ../mark.c:772
#2=C2=A0 0x00007f3e9db889= 7e in GC_do_local_mark (local_mark_stack=3D0x7f3e9b783eb0,
=C2=A0=C2=A0= =C2=A0 local_top=3D0x7f3e9b783f00) at ../mark.c:1037
#3=C2=A0 0x00007f3e= 9db88b98 in GC_mark_local (
=C2=A0=C2=A0=C2=A0 local_mark_stack=3Dlocal_= mark_stack@entry=3D0x7f3e9b783eb0, id=3Did@entry=3D4)
=C2=A0=C2=A0=C2=A0= at ../mark.c:1170
#4=C2=A0 0x00007f3e9db88eaa in GC_help_marker (my_mar= k_no=3Dmy_mark_no@entry=3D80003)
=C2=A0=C2=A0=C2=A0 at ../mark.c:1238#5=C2=A0 0x00007f3e9db92e3c in GC_mark_thread (id=3D<optimized out>)=
=C2=A0=C2=A0=C2=A0 at ../pthread_support.c:380
#6=C2=A0 0x00007f3e9e= 2ae6ba in start_thread (arg=3D0x7f3e9b794700)
=C2=A0=C2=A0=C2=A0 at pthr= ead_create.c:333
#7=C2=A0 0x00007f3e9dfdd82d in clone ()
=C2=A0=C2=A0= =C2=A0 at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) = step

#0=C2=A0 find_slot_map (cache=3D0x7f3e9b783ab0, ip=3D0x1) = at
../../libguile/vm.c:935
935=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if= (cache->entries[slot].ip =3D=3D ip)
(gdb) print slot
$1 =3D 0
= (gdb) print cache->entries[slot].ip
$2 =3D (scm_t_uint32 *) 0x1
(g= db) print cache->entries[slot].map
$4 =3D (const scm_t_uint8 *) 0x0(gdb) step
scm_i_vm_mark_stack (vp=3D0x755c1bd0, mark_stack_ptr=3D0x7f= 3e9b783f40,
=C2=A0=C2=A0=C2=A0 mark_stack_limit=3D0x7f3e9b793eb0) at ..= /../libguile/vm.c:1011
1011=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sl= ot_map =3D find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cache);
(g= db) print fp
$5 =3D (union scm_vm_stack_element *) 0x7f3e94cdee38
#define SCM_FRAME_RETURN_ADDRESS(fp)=C2=A0=C2=A0=C2=A0 ((fp)[0].as_ip)
= (gdb) print (fp)[0].as_ip
$6 =3D (scm_t_uint32 *) 0x1

OK that loo= ks weird ... is this corrupted ?? but whatever,
because the returned slo= t_map is never used ... because ...

(gdb) step
scm_i_vm_mark_stac= k (vp=3D0x755c1bd0, mark_stack_ptr=3D0x7f3e9b783f40,
=C2=A0=C2=A0=C2=A0= mark_stack_limit=3D0x7f3e9b793eb0) at ../../libguile/vm.c:975
975=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 fp =3D SCM_FRA= ME_DYNAMIC_LINK (fp))
(gdb) print fp
$7 =3D (union scm_vm_stack_eleme= nt *) 0x7f3e94cdee38

frames.h:#define SCM_FRAME_DYNAMIC_LINK(fp)=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 ((fp) + (fp)[1].as_uint)
(gdb) print (fp)[1]= .as_uint
$8 =3D 0

OK, that seems bad, because now fp never advanc= es, it just repeats
over and over with this same value.

(gdb) 979=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 for (slot = =3D nlocals - 1; sp < fp; sp++, slot--)
(gdb) print nlocals
$10 = =3D -2
(gdb) print sp
$11 =3D (union scm_vm_stack_element *) 0x7f3e94= cdee48
#define SCM_FRAME_NUM_LOCALS(fp, sp)=C2=A0=C2=A0=C2=A0 ((fp) - (s= p))
(gdb) print fp
$12 =3D (union scm_vm_stack_element *) 0x7f3e94cde= e38
(gdb) print=C2=A0 ((fp) - (sp))
$13 =3D -2
Ohh .. its not -16 = because its -2 * sizeof (union scm_vm_stack_element *)
so that's OK.=
So for loops is skipped, it should go to.
=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 sp =3D SCM_FRAME_PREVIOUS_SP (fp);
frames.h:#define SCM_FRAME_PRE= VIOUS_SP(fp)=C2=A0=C2=A0=C2=A0 ((fp) + 2)

and so now it loops around= and repeats.
(gdb) print cache
$19 =3D {entries =3D {{ip =3D 0x1, ma= p =3D 0x0}, {ip =3D 0x0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0x0} &l= t;repeats 23 times>, {ip =3D 0x1a1b5e0, map =3D 0x0}, {ip =3D 0x0,
= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0x0}, {ip =3D 0x0, map =3D 0x0}, {ip= =3D 0x0, map =3D 0x0}, {ip =3D 0x0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map= =3D 0x0}, {ip =3D 0x0, map =3D 0x0}, {ip =3D 0x0, map =3D 0x0}, {ip =3D 0x= 0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0x0}}}
(gdb) print &ca= che
$20 =3D (struct slot_map_cache *) 0x7f3e9b783ab0

and so the l= oop repeates forever, because
fp =3D SCM_FRAME_DYNAMIC_LINK (fp) never = advances fp, because
(fp)[1].as_uint=C2=A0 is zero.

So where is f= p pointing to?=C2=A0 recall fp =3D=3D 0x7f3e94cdee38

(gdb) x/20x 0x7= f3e94cdee00
0x7f3e94cdee00: 0x0000000000000904=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 0x000000000000200c
0x7f3e94cdee10: 0x0000000001a1b7e0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 0x0000000001a1b5e0
0x7f3e94cdee20: 0x0000000000000004= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000000200c
0x7f3e94cdee30: 0x0000= 000000169bd6=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000001
0x7f3e94cd= ee40: 0x0000000000000000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000192acd00x7f3e94cdee50: 0x0000000001fa4bc0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000= 0001a1c6d0
0x7f3e94cdee60: 0x0000000044507950=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 0x0000000000000002
0x7f3e94cdee70: 0x000000000005a6f5=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 0x00000000018febd0
0x7f3e94cdee80: 0x000000006d490d00=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e9e97241c
0x7f3e94cdee90: 0x0000000= 000000002=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e9e97241c


(gdb)= x/s 0x7f3e94cdee60
0x7f3e94cdee60: "PyPD"=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 <<< ?? is this a meaningul string?
(gdb) x/s 0x7f3= e94cdee81
0x7f3e94cdee81: "\rIm"=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= << how about this ???

So this does not look healthy.=C2=A0 Ho= w did we get here?

(gdb) print vp
$21 =3D (struct scm_vm *) 0x755= c1bd0
(gdb) print mark_stack_ptr
$22 =3D (struct GC_ms_entry *) 0x7f3= e9b783f40
(gdb) print mark_stack_limit
$23 =3D (struct GC_ms_entry *)= 0x7f3e9b793eb0
(gdb) print upper
$24 =3D (void *) 0x7ffdd986
(gdb= ) print GC_greatest_plausible_heap_addr
$25 =3D (void *) 0x7ffdd986
(= gdb) print lower
$26 =3D (void *) 0x17a1ff8

(gdb) print vp->fp=
$27 =3D (union scm_vm_stack_element *) 0x7f3e94cdee18
(gdb) print vp= ->sp
$28 =3D (union scm_vm_stack_element *) 0x7f3e94cdee00
(gdb) p= rint=C2=A0 vp->stack_top
$29 =3D (union scm_vm_stack_element *) 0x7f3= e94cdf000

So lets walk manually.
frames.h:#define SCM_FRAME_DYNA= MIC_LINK(fp)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((fp) + (fp)[1].as_uint)

= (gdb) print (vp->fp) + (vp->fp)[1].as_uint
$31 =3D (union scm_vm_s= tack_element *) 0x7f3e94cdee38
So we are instanly in trouble! once throu= gh the loop, and we've got the broken fp

So how did we get here?=

(gdb) print vp
$32 =3D (struct scm_vm *) 0x755c1bd0
(gdb) pri= nt *vp
$33 =3D {ip =3D 0x7f3e9e5a6378 <subr_stub_code+56>, sp =3D = 0x7f3e94cdee00,
=C2=A0 fp =3D 0x7f3e94cdee18, stack_limit =3D 0x7f3e94c= de000, trace_level =3D 0,
=C2=A0 sp_min_since_gc =3D 0x7f3e94cdee00, st= ack_size =3D 512,
=C2=A0 stack_bottom =3D 0x7f3e94cde000, stack_top =3D= 0x7f3e94cdf000,
=C2=A0 overflow_handler_stack =3D 0x304, hooks =3D {0x= 4, 0x4, 0x4, 0x4, 0x4},
=C2=A0 resumable_prompt_cookie =3D 0x7f3d397f94= 40, engine =3D 1}

Looks plausible, not corrupted. Why is engine=3D1 = ??
#define SCM_VM_REGULAR_ENGINE 0
#define SCM_VM_DEBUG_ENGINE 1
= #define SCM_VM_NUM_ENGINES 2
#=C2=A0 define SCM_I_CURRENT_THREAD (scm_i_= current_thread)

(gdb) print scm_i_current_thread
$34 =3D (scm_i_t= hread *) 0x0

OK, that's not excellent.
(gdb) print thread_cou= nt
$35 =3D 161

(gdb) bt
#0=C2=A0 scm_i_vm_mark_stack (vp=3D0x7= 55c1bd0, mark_stack_ptr=3D0x7f3e9b783f40,
=C2=A0=C2=A0=C2=A0 mark_stack= _limit=3D0x7f3e9b793eb0) at ../../libguile/vm.c:1011
#1=C2=A0 0x00007f3e= 9db8835e in GC_mark_from (mark_stack_top=3D0x7f3e9b783ee0,
=C2=A0=C2=A0= =C2=A0 mark_stack_top@entry=3D0x7f3e9b783f00,
=C2=A0=C2=A0=C2=A0 mark_s= tack=3Dmark_stack@entry=3D0x7f3e9b783eb0,
=C2=A0=C2=A0=C2=A0 mark_stack= _limit=3Dmark_stack_limit@entry=3D0x7f3e9b793eb0) at ../mark.c:772
#2=C2= =A0 0x00007f3e9db8897e in GC_do_local_mark (local_mark_stack=3D0x7f3e9b783e= b0,
=C2=A0=C2=A0=C2=A0 local_top=3D0x7f3e9b783f00) at ../mark.c:1037#3=C2=A0 0x00007f3e9db88b98 in GC_mark_local (
=C2=A0=C2=A0=C2=A0 local= _mark_stack=3Dlocal_mark_stack@entry=3D0x7f3e9b783eb0, id=3Did@entry=3D4)=C2=A0=C2=A0=C2=A0 at ../mark.c:1170
#4=C2=A0 0x00007f3e9db88eaa in GC= _help_marker (my_mark_no=3Dmy_mark_no@entry=3D80003)
=C2=A0=C2=A0=C2=A0 = at ../mark.c:1238
#5=C2=A0 0x00007f3e9db92e3c in GC_mark_thread (id=3D&l= t;optimized out>)
=C2=A0=C2=A0=C2=A0 at ../pthread_support.c:380
#= 6=C2=A0 0x00007f3e9e2ae6ba in start_thread (arg=3D0x7f3e9b794700)
=C2=A0= =C2=A0=C2=A0 at pthread_create.c:333
#7=C2=A0 0x00007f3e9dfdd82d in clon= e ()
=C2=A0=C2=A0=C2=A0 at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109=

so bdwgc is version gc7_6_0

So ../mark.c:772 is GET_HDR and = GET_HDR is in
include/private/gc_hdrs.h

define GET_HDR(p, hhdr)= =C2=A0 ... ((hhdr) =3D HDR_INNER(p)
# define HDR_INNER(p) HDR_FROM_BI(B= I(p),p)
#define HDR_FROM_BI(bi, p) \
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((bi)->inde= x[((word)(p) >> LOG_HBLKSIZE) & (BOTTOM_SZ - 1)])

This see= ms all wrong.. wtf ....

is it called by thread_mark ??
in libguil= e/thread.c:
=C2=A0 thread_gc_kind =3D
=C2=A0=C2=A0=C2=A0 GC_new_kind = (GC_new_free_list (),
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GC_MAKE_PROC = (GC_new_proc (thread_mark), 0),
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0, = 1)
in bdwgc/misc.c=C2=A0 GC_new_proc
=C2=A0=C2=A0=C2=A0 GC_mark_procs= [result] =3D proc
#define GC_mark_procs GC_arrays._mark_procs
#define= PROC(descr) \
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (GC_mark_procs[((descr) &g= t;> GC_DS_TAG_BITS) & (GC_MAX_MARK_PROCS-1)])

=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 GC_obj_kinds[result].ok_descriptor =3D descr;=C2=A0 <<= ;< this is where MAKE_PROC goes.

anyway ... in mark.c line 733=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 case GC_DS_PROC:
=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_top--;
=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 credit -=3D GC_PROC_BYTES;=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_top = =3D (*PROC(descr))((word *)current_p, mark_stack_top,
=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 mark_stack_limit, ENV(descr));
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 continue;
But that's the wrong line number.=C2=A0= line 772 is in
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 case GC_DS_P= ER_OBJECT:
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if ((s= igned_word)descr >=3D 0) {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 /* Descriptor is in the object.=C2=A0=C2=A0=C2=A0= =C2=A0 */
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 descr =3D *(word *)(current_p + descr - GC_DS_PER_OBJECT);

Oh= , it goes to retry, which loops around and calls PROC which is thread_mark<= br>which is called with args
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 mark_stack_top =3D (*PROC(descr))((word *)current_p, mark_stac= k_top,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_limit, ENV(descr));

(gd= b) print current_p
$39 =3D (ptr_t) 0x7225d380 "@\325%r"
so back in libguile/threads.c line 84:
=C2=A0 const struct scm_i_thread= *t =3D (struct scm_i_thread *) addr;

(gdb) print *(struct scm_i_thr= ead *) current_p
$40 =3D {next_thread =3D 0x7225d540, handle =3D 0x5763f= 3b0, pthread =3D 139900934399744,
=C2=A0 result =3D 0x4, exited =3D 0, = guile_mode =3D 1, needs_unregister =3D 1, wake =3D 0x0,
=C2=A0 sleep_co= nd =3D {__data =3D {__lock =3D 0, __futex =3D 0, __total_seq =3D 0,
=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 __wakeup_seq =3D 0, __woken_seq =3D 0, __mutex = =3D 0x0, __nwaiters =3D 0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __broadcast_s= eq =3D 0}, __size =3D '\000' <repeats 47 times>, __align =3D = 0},
=C2=A0 sleep_pipe =3D {151, 197}, freelists =3D 0x71cfe800,
=C2= =A0 pointerless_freelists =3D 0x71cfe780, dynamic_state =3D 0x71de7900, dyn= stack =3D {
=C2=A0=C2=A0=C2=A0 base =3D 0x698c3800, top =3D 0x698c3940, = limit =3D 0x698c3a00},
=C2=A0 pending_asyncs =3D 0x304, block_asyncs = =3D 0, continuation_root =3D 0x5763f340,
=C2=A0 continuation_base =3D 0= x7f3d397f9c90, vp =3D 0x755c1bd0, base =3D 0x7f3d397f9da0,
=C2=A0 regs = =3D {{__jmpbuf =3D {139905204221696, 139906923931021, 1, 139894281647232, <= br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139906913276512, 0, 139906918= 327551, 8388608},
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __mask_was_saved =3D 9= 64664752, __saved_mask =3D {__val =3D {0, 139900934399744,
=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139900934399744, 139900934397344= , 139906923929400, 139893936840720,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 1804170656, 139906913276512, 139900934399744, 1399009= 34397344,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139900= 934397504, 139893936840720, 139906910859134, 139906913276484,
=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139906910857112, 27401856}}}}= }

(gdb) print ((struct scm_i_thread *) current_p)->vp
$41 =3D = (struct scm_vm *) 0x755c1bd0

which is exactly the vp we had before, = so this is the correct thread.

So=C2=A0 ... all the values in *(stru= ct scm_i_thread *) current_p look more or less OK;
its not corrupted. An= d all the values in *vp look more or less OK, its not corrrupted
except = that engine=3D1 looks wrong.=C2=A0 So what's up with the weird stack lo= oping?
Again:
From before, we had:

(gdb) print *vp
$42 =3D= {ip =3D 0x7f3e9e5a6378 <subr_stub_code+56>, sp =3D 0x7f3e94cdee00, <= br>=C2=A0 fp =3D 0x7f3e94cdee18, stack_limit =3D 0x7f3e94cde000, trace_leve= l =3D 0,
=C2=A0 sp_min_since_gc =3D 0x7f3e94cdee00, stack_size =3D 512,=
=C2=A0 stack_bottom =3D 0x7f3e94cde000, stack_top =3D 0x7f3e94cdf000, =
=C2=A0 overflow_handler_stack =3D 0x304, hooks =3D {0x4, 0x4, 0x4, 0x4,= 0x4},
=C2=A0 resumable_prompt_cookie =3D 0x7f3d397f9440, engine =3D 1}=

so fp, sp, stack_bottom, stack_top all are self-consistent with eac= h other.
The problem is that
=C2=A0=C2=A0=C2=A0 fp =3D SCM_FRAME_DYNA= MIC_LINK (fp)
goes nowhere, after the first iteration.

Is the fra= me corrupted? How? what does the rest of the stack look like?

(gdb) = x/32x 0x7f3e94cdee00
0x7f3e94cdee00: 0x0000000000000904=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 0x000000000000200c
0x7f3e94cdee10: 0x0000000001a1b7e0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000001a1b5e0
0x7f3e94cdee20: 0x0000000= 000000004=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000000200c=C2=A0 <<&l= t; theres the 4
0x7f3e94cdee30: 0x0000000000169bd6=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 0x0000000000000001 <<< there's the new fp
0x7f3e9= 4cdee40: 0x0000000000000000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000192acd= 0 <<< there's the zero.
0x7f3e94cdee50: 0x0000000001fa4bc0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000001a1c6d0
0x7f3e94cdee60: 0x0000= 000044507950=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000002
0x7f3e94cd= ee70: 0x000000000005a6f5=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00000000018febd00x7f3e94cdee80: 0x000000006d490d00=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f= 3e9e97241c
0x7f3e94cdee90: 0x0000000000000002=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 0x00007f3e9e97241c
0x7f3e94cdeea0: 0x0000000000000002=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 0x00007f3e8d797630
0x7f3e94cdeeb0: 0x0000000000000002=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e86effacc
0x7f3e94cdeec0: 0x0000000= 000000003=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000006d490d00
0x7f3e94cdeed= 0: 0x00007f3e86eb8698=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000002
0= x7f3e94cdeee0: 0x00007f3e8e51f450=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000= 0000008
0x7f3e94cdeef0: 0x0000000001a37c10=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= 0x000000000192acd0


So... who sets up thread->vp ?
/* #de= fine VM_ENABLE_ASSERTIONS */
However, defining this breaks the build.
diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
index 6c88= ebf..f0f8164 100644
--- a/libguile/vm-engine.c
+++ b/libguile/vm-engi= ne.c
@@ -2050,11 +2050,11 @@ VM_NAME (scm_i_thread *thread, struct scm_v= m *vp,
=C2=A0
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 if (!scm_module_system_booted_p)
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 {
-=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ASSERT (scm_is_true<= br>+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 ASSERT (scm_is_true(
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 scm_equal_p (modname,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_list_2
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (SCM_BOOL_T,
-=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_from_utf8_symbol ("guile"))= ));
+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_f= rom_utf8_symbol ("guile")))));
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 var =3D scm_lookup (= sym);
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 }
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 el= se if (scm_is_true (SCM_CAR (modname)))
diff --git a/libguile/vm.c b/lib= guile/vm.c
index 18f2192..c26066e 100644
--- a/libguile/vm.c
+++ b= /libguile/vm.c
@@ -63,7 +63,7 @@ static size_t page_size;
=C2=A0
= =C2=A0/* The VM has a number of internal assertions that shouldn't norm= ally be
=C2=A0=C2=A0=C2=A0 necessary, but might be if you think you foun= d a bug in the VM. */
-/* #define VM_ENABLE_ASSERTIONS */
+#define VM= _ENABLE_ASSERTIONS
=C2=A0
=C2=A0static void vm_expand_stack (struct s= cm_vm *vp,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 union scm_vm_stack_element *new_sp)= SCM_NOINLINE;


=C2=A0GEN=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 guile-pro= cedures.texi
/bin/bash: line 1: 24078 Broken pipe

=C2=A0./meta/g= uile
Aborted
./libguile/guile
Aborted
gdb libguile/.libs/lt-gui= le

#0=C2=A0 0x00007ffff74f7428 in __GI_raise (sig=3Dsig@entry=3D6)=C2=A0=C2=A0=C2=A0 at ../sysdeps/unix/sysv/linux/raise.c:54
#1=C2=A0 0= x00007ffff74f902a in __GI_abort () at abort.c:89
#2=C2=A0 0x00007ffff7b7= 07c4 in vm_regular_engine (thread=3D0x7282, vp=3D0x806f30,
=C2=A0=C2=A0= =C2=A0 registers=3D0x6, resume=3D-145787864) at ../../libguile/vm-engine.c:= 2053
#3=C2=A0 0x00007ffff7b711d2 in scm_call_n (proc=3D0x832b30, argv=3D= argv@entry=3D0x0,
=C2=A0=C2=A0=C2=A0 nargs=3Dnargs@entry=3D0) at ../../= libguile/vm.c:1257
#4=C2=A0 0x00007ffff7af14e9 in scm_call_0 (proc=3D<= ;optimized out>)
=C2=A0=C2=A0=C2=A0 at ../../libguile/eval.c:481
#= 5=C2=A0 0x00007ffff7b10568 in scm_primitive_load_path (args=3D<optimized= out>)
=C2=A0=C2=A0=C2=A0 at ../../libguile/load.c:1248
#6=C2=A0 0= x00007ffff7b6e976 in vm_regular_engine (thread=3D0x7282, vp=3D0x806f30, =C2=A0=C2=A0=C2=A0 registers=3D0x6, resume=3D-145787864) at ../../libguile= /vm-engine.c:784

Oh. wtf.

(define last-gc (gc-stats))
(de= fine start-time (get-internal-real-time))
(define run-time (get-internal= -run-time))
(define (avg-time-taken)
=C2=A0=C2=A0=C2=A0 (define now = (get-internal-real-time))
=C2=A0=C2=A0=C2=A0 (define run (get-internal-r= un-time))
=C2=A0=C2=A0=C2=A0 (define cur (gc-stats))
=C2=A0=C2=A0=C2= =A0 (define gc-time-taken (* 1.0e-9 (- (cdar cur) (cdar last-gc))))
=C2= =A0=C2=A0=C2=A0 (define elapsed-time (* 1.0e-9 (- now start-time)))
=C2= =A0=C2=A0=C2=A0 (define cpu-time (* 1.0e-9 (- run run-time)))
=C2=A0=C2= =A0=C2=A0 (format #t "Elapsed time: ~5f secs. GC-time-taken: ~5f s or = ~5f%=C2=A0=C2=A0 cpu-usage: ~5f%\n"
=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0= =C2=A0 elapsed-time gc-time-taken=C2=A0 (* 100 (/ gc-time-taken elapsed-tim= e))
=C2=A0=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0 (* 100 (/ cpu-time elapsed-tim= e))
=C2=A0=C2=A0=C2=A0 )
=C2=A0=C2=A0=C2=A0 (set! last-gc cur)
=C2= =A0=C2=A0=C2=A0 (set! start-time now)
=C2=A0=C2=A0=C2=A0 (set! run-time = run)
)


linas@fanny: ~/src/novamente/src/opencog-git/opencog/n= lp/learn $ vi x
linas@fanny: ~/src/novamente/src/opencog-git/opencog/nlp= /learn $ cat x

#0=C2=A0 find_slot_map (cache=3D0x7f3e9b783ab0, ip=3D= 0x1) at
../../libguile/vm.c:935
935=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 if (cache->entries[slot].ip =3D=3D ip)
(gdb) print slot
$1 =3D= 0
(gdb) print cache->entries[slot].ip
$2 =3D (scm_t_uint32 *) 0x1=
(gdb) print cache->entries[slot].map
$4 =3D (const scm_t_uint8 *)= 0x0
(gdb) step
scm_i_vm_mark_stack (vp=3D0x755c1bd0, mark_stack_ptr= =3D0x7f3e9b783f40,
=C2=A0=C2=A0=C2=A0 mark_stack_limit=3D0x7f3e9b793eb0= ) at ../../libguile/vm.c:1011
1011=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 slot_map =3D find_slot_map (SCM_FRAME_RETURN_ADDRESS (fp), &cach= e);
(gdb) print fp
$5 =3D (union scm_vm_stack_element *) 0x7f3e94cdee= 38

#define SCM_FRAME_RETURN_ADDRESS(fp)=C2=A0=C2=A0=C2=A0 ((fp)[0].a= s_ip)
(gdb) print (fp)[0].as_ip
$6 =3D (scm_t_uint32 *) 0x1

OK= that looks weird ... is this corrupted ?? but whatever,
because the ret= urned slot_map is never used ... because ...

(gdb) step
scm_i_vm_= mark_stack (vp=3D0x755c1bd0, mark_stack_ptr=3D0x7f3e9b783f40,
=C2=A0=C2= =A0=C2=A0 mark_stack_limit=3D0x7f3e9b793eb0) at ../../libguile/vm.c:975
= 975=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 fp = =3D SCM_FRAME_DYNAMIC_LINK (fp))
(gdb) print fp
$7 =3D (union scm_vm_= stack_element *) 0x7f3e94cdee38

frames.h:#define SCM_FRAME_DYNAMIC_L= INK(fp)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((fp) + (fp)[1].as_uint)
(gdb) pri= nt (fp)[1].as_uint
$8 =3D 0

OK, that seems bad, because now fp ne= ver advances, it just repeats
over and over with this same value.
(gdb)
979=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 = for (slot =3D nlocals - 1; sp < fp; sp++, slot--)
(gdb) print nlocals=
$10 =3D -2
(gdb) print sp
$11 =3D (union scm_vm_stack_element *) = 0x7f3e94cdee48
#define SCM_FRAME_NUM_LOCALS(fp, sp)=C2=A0=C2=A0=C2=A0 ((= fp) - (sp))
(gdb) print fp
$12 =3D (union scm_vm_stack_element *) 0x7= f3e94cdee38
(gdb) print=C2=A0 ((fp) - (sp))
$13 =3D -2
Ohh .. its = not -16 because its -2 * sizeof (union scm_vm_stack_element *)
so that&#= 39;s OK.
So for loops is skipped, it should go to.
=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 sp =3D SCM_FRAME_PREVIOUS_SP (fp);
frames.h:#define SCM_FRA= ME_PREVIOUS_SP(fp)=C2=A0=C2=A0=C2=A0 ((fp) + 2)

and so now it loops = around and repeats.
(gdb) print cache
$19 =3D {entries =3D {{ip =3D 0= x1, map =3D 0x0}, {ip =3D 0x0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0= x0} <repeats 23 times>, {ip =3D 0x1a1b5e0, map =3D 0x0}, {ip =3D 0x0,=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0x0}, {ip =3D 0x0, map =3D 0x0}= , {ip =3D 0x0, map =3D 0x0}, {ip =3D 0x0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 map =3D 0x0}, {ip =3D 0x0, map =3D 0x0}, {ip =3D 0x0, map =3D 0x0}, {ip= =3D 0x0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 map =3D 0x0}}}
(gdb) print = &cache
$20 =3D (struct slot_map_cache *) 0x7f3e9b783ab0

and s= o the loop repeates forever, because
fp =3D SCM_FRAME_DYNAMIC_LINK (fp)= never advances fp, because
(fp)[1].as_uint=C2=A0 is zero.

So whe= re is fp pointing to?=C2=A0 recall fp =3D=3D 0x7f3e94cdee38

(gdb) x/= 20x 0x7f3e94cdee00
0x7f3e94cdee00: 0x0000000000000904=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 0x000000000000200c
0x7f3e94cdee10: 0x0000000001a1b7e0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 0x0000000001a1b5e0
0x7f3e94cdee20: 0x0000000000= 000004=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000000200c
0x7f3e94cdee30: = 0x0000000000169bd6=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000001
0x7f= 3e94cdee40: 0x0000000000000000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000192= acd0
0x7f3e94cdee50: 0x0000000001fa4bc0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x= 0000000001a1c6d0
0x7f3e94cdee60: 0x0000000044507950=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 0x0000000000000002
0x7f3e94cdee70: 0x000000000005a6f5=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 0x00000000018febd0
0x7f3e94cdee80: 0x000000006d490= d00=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e9e97241c
0x7f3e94cdee90: 0x0= 000000000000002=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e9e97241c

(gdb) x/s 0x7f3e94cdee60
0x7f3e94cdee60: "PyPD"=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 <<< ?? is this a meaningul string?
(gdb) x/s= 0x7f3e94cdee81
0x7f3e94cdee81: "\rIm"=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 << how about this ???

So this does not look healthy.=C2= =A0 How did we get here?

(gdb) print vp
$21 =3D (struct scm_vm *)= 0x755c1bd0
(gdb) print mark_stack_ptr
$22 =3D (struct GC_ms_entry *)= 0x7f3e9b783f40
(gdb) print mark_stack_limit
$23 =3D (struct GC_ms_en= try *) 0x7f3e9b793eb0
(gdb) print upper
$24 =3D (void *) 0x7ffdd986(gdb) print GC_greatest_plausible_heap_addr
$25 =3D (void *) 0x7ffdd98= 6
(gdb) print lower
$26 =3D (void *) 0x17a1ff8

(gdb) print vp-= >fp
$27 =3D (union scm_vm_stack_element *) 0x7f3e94cdee18
(gdb) pr= int vp->sp
$28 =3D (union scm_vm_stack_element *) 0x7f3e94cdee00
(= gdb) print=C2=A0 vp->stack_top
$29 =3D (union scm_vm_stack_element *)= 0x7f3e94cdf000

So lets walk manually.
frames.h:#define SCM_FRAM= E_DYNAMIC_LINK(fp)=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((fp) + (fp)[1].as_uint)
(gdb) print (vp->fp) + (vp->fp)[1].as_uint
$31 =3D (union sc= m_vm_stack_element *) 0x7f3e94cdee38
So we are instanly in trouble! once= through the loop, and we've got the broken fp

So how did we get= here?

(gdb) print vp
$32 =3D (struct scm_vm *) 0x755c1bd0
(gd= b) print *vp
$33 =3D {ip =3D 0x7f3e9e5a6378 <subr_stub_code+56>, s= p =3D 0x7f3e94cdee00,
=C2=A0 fp =3D 0x7f3e94cdee18, stack_limit =3D 0x7= f3e94cde000, trace_level =3D 0,
=C2=A0 sp_min_since_gc =3D 0x7f3e94cdee= 00, stack_size =3D 512,
=C2=A0 stack_bottom =3D 0x7f3e94cde000, stack_t= op =3D 0x7f3e94cdf000,
=C2=A0 overflow_handler_stack =3D 0x304, hooks = =3D {0x4, 0x4, 0x4, 0x4, 0x4},
=C2=A0 resumable_prompt_cookie =3D 0x7f3= d397f9440, engine =3D 1}

Looks plausible, not corrupted. Why is engi= ne=3D1 ??
#define SCM_VM_REGULAR_ENGINE 0
#define SCM_VM_DEBUG_ENGIN= E 1
#define SCM_VM_NUM_ENGINES 2
#=C2=A0 define SCM_I_CURRENT_THREAD = (scm_i_current_thread)

(gdb) print scm_i_current_thread
$34 =3D (= scm_i_thread *) 0x0

OK, that's not excellent.
(gdb) print thr= ead_count
$35 =3D 161

(gdb) bt
#0=C2=A0 scm_i_vm_mark_stack (v= p=3D0x755c1bd0, mark_stack_ptr=3D0x7f3e9b783f40,
=C2=A0=C2=A0=C2=A0 mar= k_stack_limit=3D0x7f3e9b793eb0) at ../../libguile/vm.c:1011
#1=C2=A0 0x0= 0007f3e9db8835e in GC_mark_from (mark_stack_top=3D0x7f3e9b783ee0,
=C2= =A0=C2=A0=C2=A0 mark_stack_top@entry=3D0x7f3e9b783f00,
=C2=A0=C2=A0=C2= =A0 mark_stack=3Dmark_stack@entry=3D0x7f3e9b783eb0,
=C2=A0=C2=A0=C2=A0 = mark_stack_limit=3Dmark_stack_limit@entry=3D0x7f3e9b793eb0) at ../mark.c:77= 2
#2=C2=A0 0x00007f3e9db8897e in GC_do_local_mark (local_mark_stack=3D0x= 7f3e9b783eb0,
=C2=A0=C2=A0=C2=A0 local_top=3D0x7f3e9b783f00) at ../mark= .c:1037
#3=C2=A0 0x00007f3e9db88b98 in GC_mark_local (
=C2=A0=C2=A0= =C2=A0 local_mark_stack=3Dlocal_mark_stack@entry=3D0x7f3e9b783eb0, id=3Did@= entry=3D4)
=C2=A0=C2=A0=C2=A0 at ../mark.c:1170
#4=C2=A0 0x00007f3e9d= b88eaa in GC_help_marker (my_mark_no=3Dmy_mark_no@entry=3D80003)
=C2=A0= =C2=A0=C2=A0 at ../mark.c:1238
#5=C2=A0 0x00007f3e9db92e3c in GC_mark_th= read (id=3D<optimized out>)
=C2=A0=C2=A0=C2=A0 at ../pthread_suppo= rt.c:380
#6=C2=A0 0x00007f3e9e2ae6ba in start_thread (arg=3D0x7f3e9b7947= 00)
=C2=A0=C2=A0=C2=A0 at pthread_create.c:333
#7=C2=A0 0x00007f3e9df= dd82d in clone ()
=C2=A0=C2=A0=C2=A0 at ../sysdeps/unix/sysv/linux/x86_6= 4/clone.S:109

so bdwgc is version gc7_6_0

So ../mark.c:772 is= GET_HDR and GET_HDR is in
include/private/gc_hdrs.h

define GET_H= DR(p, hhdr)=C2=A0 ... ((hhdr) =3D HDR_INNER(p)
# define HDR_INNER(p) HD= R_FROM_BI(BI(p),p)
#define HDR_FROM_BI(bi, p) \
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((bi)= ->index[((word)(p) >> LOG_HBLKSIZE) & (BOTTOM_SZ - 1)])
This seems all wrong.. wtf ....

is it called by thread_mark ??
i= n libguile/thread.c:
=C2=A0 thread_gc_kind =3D
=C2=A0=C2=A0=C2=A0 GC_= new_kind (GC_new_free_list (),
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GC_M= AKE_PROC (GC_new_proc (thread_mark), 0),
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 0, 1)
in bdwgc/misc.c=C2=A0 GC_new_proc
=C2=A0=C2=A0=C2=A0 GC_= mark_procs[result] =3D proc
#define GC_mark_procs GC_arrays._mark_procs<= br>#define PROC(descr) \
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (GC_mark_procs[(= (descr) >> GC_DS_TAG_BITS) & (GC_MAX_MARK_PROCS-1)])

=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 GC_obj_kinds[result].ok_descriptor =3D descr;= =C2=A0 <<< this is where MAKE_PROC goes.

anyway ... in mark= .c line 733
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 case GC_DS_PROC:<= br>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_top--;=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 credit -=3D GC_P= ROC_BYTES;
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_s= tack_top =3D (*PROC(descr))((word *)current_p, mark_stack_top,
=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 mark_stack_limit, ENV(descr));
=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 continue;
But that's the wrong line numb= er.=C2=A0 line 772 is in
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 cas= e GC_DS_PER_OBJECT:
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 if ((signed_word)descr >=3D 0) {
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* Descriptor is in the object.=C2=A0= =C2=A0=C2=A0=C2=A0 */
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 descr =3D *(word *)(current_p + descr - GC_DS_PER_OBJECT= );

Oh, it goes to retry, which loops around and calls PROC which is = thread_mark
which is called with args
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_top =3D (*PROC(descr))((word *)current_= p, mark_stack_top,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mark_stack_limit, ENV(descr))= ;

(gdb) print current_p
$39 =3D (ptr_t) 0x7225d380 "@\325%r&= quot;

so back in libguile/threads.c line 84:
=C2=A0 const struct = scm_i_thread *t =3D (struct scm_i_thread *) addr;

(gdb) print *(stru= ct scm_i_thread *) current_p
$40 =3D {next_thread =3D 0x7225d540, handle= =3D 0x5763f3b0, pthread =3D 139900934399744,
=C2=A0 result =3D 0x4, ex= ited =3D 0, guile_mode =3D 1, needs_unregister =3D 1, wake =3D 0x0,
=C2= =A0 sleep_cond =3D {__data =3D {__lock =3D 0, __futex =3D 0, __total_seq = =3D 0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __wakeup_seq =3D 0, __woken_seq = =3D 0, __mutex =3D 0x0, __nwaiters =3D 0,
=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 __broadcast_seq =3D 0}, __size =3D '\000' <repeats 47 times&= gt;, __align =3D 0},
=C2=A0 sleep_pipe =3D {151, 197}, freelists =3D 0x= 71cfe800,
=C2=A0 pointerless_freelists =3D 0x71cfe780, dynamic_state = =3D 0x71de7900, dynstack =3D {
=C2=A0=C2=A0=C2=A0 base =3D 0x698c3800, t= op =3D 0x698c3940, limit =3D 0x698c3a00},
=C2=A0 pending_asyncs =3D 0x3= 04, block_asyncs =3D 0, continuation_root =3D 0x5763f340,
=C2=A0 contin= uation_base =3D 0x7f3d397f9c90, vp =3D 0x755c1bd0, base =3D 0x7f3d397f9da0,=
=C2=A0 regs =3D {{__jmpbuf =3D {139905204221696, 139906923931021, 1, 1= 39894281647232,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139906913276= 512, 0, 139906918327551, 8388608},
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __mas= k_was_saved =3D 964664752, __saved_mask =3D {__val =3D {0, 139900934399744,=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 139900934399744= , 139900934397344, 139906923929400, 139893936840720,
=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 1804170656, 139906913276512, 139900934= 399744, 139900934397344,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 139900934397504, 139893936840720, 139906910859134, 13990691327648= 4,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 1399069108571= 12, 27401856}}}}}

(gdb) print ((struct scm_i_thread *) current_p)-&g= t;vp
$41 =3D (struct scm_vm *) 0x755c1bd0

which is exactly the vp= we had before, so this is the correct thread.

So=C2=A0 ... all the = values in *(struct scm_i_thread *) current_p look more or less OK;
its n= ot corrupted. And all the values in *vp look more or less OK, its not corrr= upted
except that engine=3D1 looks wrong.=C2=A0 So what's up with th= e weird stack looping?
Again:
From before, we had:

(gdb) prin= t *vp
$42 =3D {ip =3D 0x7f3e9e5a6378 <subr_stub_code+56>, sp =3D 0= x7f3e94cdee00,
=C2=A0 fp =3D 0x7f3e94cdee18, stack_limit =3D 0x7f3e94cd= e000, trace_level =3D 0,
=C2=A0 sp_min_since_gc =3D 0x7f3e94cdee00, sta= ck_size =3D 512,
=C2=A0 stack_bottom =3D 0x7f3e94cde000, stack_top =3D = 0x7f3e94cdf000,
=C2=A0 overflow_handler_stack =3D 0x304, hooks =3D {0x4= , 0x4, 0x4, 0x4, 0x4},
=C2=A0 resumable_prompt_cookie =3D 0x7f3d397f944= 0, engine =3D 1}

so fp, sp, stack_bottom, stack_top all are self-con= sistent with each other.
The problem is that
=C2=A0=C2=A0=C2=A0 fp = =3D SCM_FRAME_DYNAMIC_LINK (fp)
goes nowhere, after the first iteration.=

Is the frame corrupted? How? what does the rest of the stack look l= ike?

(gdb) x/32x 0x7f3e94cdee00
0x7f3e94cdee00: 0x000000000000090= 4=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000000200c
0x7f3e94cdee10: 0x000= 0000001a1b7e0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000001a1b5e0
0x7f3e94c= dee20: 0x0000000000000004=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000000000200c= =C2=A0 <<< theres the 4
0x7f3e94cdee30: 0x0000000000169bd6=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000001 <<< there's the= new fp
0x7f3e94cdee40: 0x0000000000000000=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= 0x000000000192acd0 <<< there's the zero.
0x7f3e94cdee50: 0= x0000000001fa4bc0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000001a1c6d0
0x7f3= e94cdee60: 0x0000000044507950=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0000000000000= 002
0x7f3e94cdee70: 0x000000000005a6f5=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x0= 0000000018febd0
0x7f3e94cdee80: 0x000000006d490d00=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 0x00007f3e9e97241c
0x7f3e94cdee90: 0x0000000000000002=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 0x00007f3e9e97241c
0x7f3e94cdeea0: 0x0000000000000= 002=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e8d797630
0x7f3e94cdeeb0: 0x0= 000000000000002=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x00007f3e86effacc
0x7f3e9= 4cdeec0: 0x0000000000000003=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000000006d490d0= 0
0x7f3e94cdeed0: 0x00007f3e86eb8698=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 0x000= 0000000000002
0x7f3e94cdeee0: 0x00007f3e8e51f450=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 0x0000000000000008
0x7f3e94cdeef0: 0x0000000001a37c10=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 0x000000000192acd0


So... who sets up thread-&= gt;vp ?
/* #define VM_ENABLE_ASSERTIONS */
However, defining this br= eaks the build.

diff --git a/libguile/vm-engine.c b/libguile/vm-engi= ne.c
index 6c88ebf..f0f8164 100644
--- a/libguile/vm-engine.c
+++ = b/libguile/vm-engine.c
@@ -2050,11 +2050,11 @@ VM_NAME (scm_i_thread *th= read, struct scm_vm *vp,
=C2=A0
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 if (!scm_module_system_booted_p)
=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 {
-=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 AS= SERT (scm_is_true
+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 ASSERT (scm_is_true(
=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_equal_p (modname,
=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_list_2
=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (SCM_BOOL_T,
-=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 scm_from_utf8_symbol (&= quot;guile"))));
+=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 scm_from_utf8_symbol ("guile")))));
=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 va= r =3D scm_lookup (sym);
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 }
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 else if (scm_is_true (SCM_CAR (modname)))
diff --git a/l= ibguile/vm.c b/libguile/vm.c
index 18f2192..c26066e 100644
--- a/libg= uile/vm.c
+++ b/libguile/vm.c
@@ -63,7 +63,7 @@ static size_t page_si= ze;
=C2=A0
=C2=A0/* The VM has a number of internal assertions that s= houldn't normally be
=C2=A0=C2=A0=C2=A0 necessary, but might be if y= ou think you found a bug in the VM. */
-/* #define VM_ENABLE_ASSERTIONS = */
+#define VM_ENABLE_ASSERTIONS
=C2=A0
=C2=A0static void vm_expan= d_stack (struct scm_vm *vp,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 union scm_vm_stac= k_element *new_sp) SCM_NOINLINE;


=C2=A0GEN=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 guile-procedures.texi
/bin/bash: line 1: 24078 Broken pipe
=C2=A0./meta/guile
Aborted
./libguile/guile
Aborted
gdb li= bguile/.libs/lt-guile

#0=C2=A0 0x00007ffff74f7428 in __GI_raise (sig= =3Dsig@entry=3D6)
=C2=A0=C2=A0=C2=A0 at ../sysdeps/unix/sysv/linux/raise= .c:54
#1=C2=A0 0x00007ffff74f902a in __GI_abort () at abort.c:89
#2= =C2=A0 0x00007ffff7b707c4 in vm_regular_engine (thread=3D0x7282, vp=3D0x806= f30,
=C2=A0=C2=A0=C2=A0 registers=3D0x6, resume=3D-145787864) at ../../= libguile/vm-engine.c:2053
#3=C2=A0 0x00007ffff7b711d2 in scm_call_n (pro= c=3D0x832b30, argv=3Dargv@entry=3D0x0,
=C2=A0=C2=A0=C2=A0 nargs=3Dnargs= @entry=3D0) at ../../libguile/vm.c:1257
#4=C2=A0 0x00007ffff7af14e9 in s= cm_call_0 (proc=3D<optimized out>)
=C2=A0=C2=A0=C2=A0 at ../../lib= guile/eval.c:481
#5=C2=A0 0x00007ffff7b10568 in scm_primitive_load_path = (args=3D<optimized out>)
=C2=A0=C2=A0=C2=A0 at ../../libguile/load= .c:1248
#6=C2=A0 0x00007ffff7b6e976 in vm_regular_engine (thread=3D0x728= 2, vp=3D0x806f30,
=C2=A0=C2=A0=C2=A0 registers=3D0x6, resume=3D-1457878= 64) at ../../libguile/vm-engine.c:784


--f403045ea6d679a79b0551239b33-- From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 09 16:44:39 2017 Received: (at 27234) by debbugs.gnu.org; 9 Sep 2017 20:44:39 +0000 Received: from localhost ([127.0.0.1]:57959 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dqmcJ-0007E3-IR for submit@debbugs.gnu.org; Sat, 09 Sep 2017 16:44:39 -0400 Received: from mail-lf0-f50.google.com ([209.85.215.50]:34334) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dqmcI-0007Do-1z for 27234@debbugs.gnu.org; Sat, 09 Sep 2017 16:44:38 -0400 Received: by mail-lf0-f50.google.com with SMTP id l196so11421459lfl.1 for <27234@debbugs.gnu.org>; Sat, 09 Sep 2017 13:44:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:reply-to:from:date:message-id:subject:to; bh=yZ86tWlta7Y5z6FPek4nVjpJC2CgQHZruB5di6MzktU=; b=IZgHi1fMyjx0k4gS9a64BduJ0pSwK2+5xiQYXiK8d5Lu1lLIoGGBC6l3fFeyNwMmJk atnEFPWuuB3NiJlX/XlBSlJ5B6ImHEOKn/l0AgmydoVYMmnkVYB8yFAMIjMYlEHKkzg0 lR8gCuAG1j4ptq4XEMw1040kD9Rn58eQ+8zmteH6QSey+k0c6Zxl7vwkcTHD7HUEEfkb r6UpnFXRdN4rUsFJ7XBiIESppTPQqALwhaQbGvqmWaJ2ZGFj5McNXjBfFh728Ctm1gbU nPOwlxCQVDjl52caBZvDhCyyCp5TZ7PZ8caHXiF+8Wmf0X5PUP6xiv92+Dz7FVbX7qX6 /g1w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:reply-to:from:date:message-id :subject:to; bh=yZ86tWlta7Y5z6FPek4nVjpJC2CgQHZruB5di6MzktU=; b=H92KYGGYEo7jP2aTLz70F9Oe2Xb3Oz7lpVTuBv0ik3dPEd75RfWbOX7At0GIu8V8L1 ToRHVUdr5gGDQmnIx8I3HW+msNYVIAAqHV5jCJ6pmipnZtmorQTgl0yNAJ2lMKkyqj8/ Ng25/Ju2uv0fsaiVKMldvFl3uPsS846l3XuxaO93x7iO16rPPfDOCLlctB7OQpqdJZr2 llHgdA7pEBvzU3yDcj4v20kWzETHYRVrbJhMlZQNijFs0zgSL6nm595jaclAhj4eX7t3 OONEB8o4JHmv1K6FLyqGTKZK3APF/+lSKg7z979zgjzTPqldeeKqaKb83r5PnRKrQeTF WaxA== X-Gm-Message-State: AHPjjUi1BVSWaFPuJ8TuRiy/haQ1on6KnNgd54jq4CqlFB9I1q2LQ/tc A6OVggz1r1bn3dOQ41zwkEDS/Yz7DQ== X-Google-Smtp-Source: AOwi7QBUAF4sbtaURAw6Zj1bKmUhi+kUf/9JHRMFy2raexKQqCYSEdHG3p+cTio8BPtZccCQjz4ja7BEGC80lPV+4y0= X-Received: by 10.25.202.23 with SMTP id a23mr1717732lfg.260.1504989871998; Sat, 09 Sep 2017 13:44:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.25.44.200 with HTTP; Sat, 9 Sep 2017 13:44:11 -0700 (PDT) From: Linas Vepstas Date: Sat, 9 Sep 2017 15:44:11 -0500 Message-ID: Subject: Followup. To: 27234@debbugs.gnu.org Content-Type: multipart/alternative; boundary="f403045ea792b60b930558c7c250" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 27234 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: linasvepstas@gmail.com Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/) --f403045ea792b60b930558c7c250 Content-Type: text/plain; charset="UTF-8" The original bug report failed to explain how to trigger the bug. I think I know of a simple way to trigger the bug, but have not yet created the simple test-case. I believe that the bug can be triggered like so: Create some medium-sized "random" (arbitrary) snippet of guile code. Send it to the guile repl server. Do it again and again, as fast as possible. The bug will trigger. By "arbitary/random" I mean some code that is simple but non-repeating: say, concatenate some random strings, count the number of letters in them, take the square root -- just do assorted, non-repeating computations that force the REPL to evaluate some new blob of scheme code every time. Now, once out of every 5 or 50 times, introduce a bug in the scheme code, causing the REPL to throw an exception. Using ut8 strings might also be a required ingredient. I believe that this will trigger the bug. I believe that the bug is some unprotected section in the guile interpreter, and a race condition clobbers the guile stack(s) and it all goes downhill from here. --f403045ea792b60b930558c7c250 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

The original bug report failed to e= xplain how to trigger the bug.=C2=A0 I think I know of a simple way to trig= ger the bug, but have not yet created the simple test-case.=C2=A0 I believe= that the bug can be triggered like so:

Create som= e medium-sized "random" (arbitrary) snippet of guile code. Send i= t to the guile repl server.=C2=A0 Do it again and again, as fast as possibl= e.=C2=A0 The bug will trigger.

By "arbitary/r= andom" I mean some code that is simple but non-repeating: say, concate= nate some random strings, count the number of letters in them, take the squ= are root -- just do assorted, non-repeating computations that force the REP= L to evaluate some new blob of scheme code every time.

=
Now, once out of every 5 or 50 times, introduce a bug in the scheme co= de, causing the REPL to throw an exception.=C2=A0 Using ut8 strings might a= lso be a required ingredient.

I believe that this = will trigger the bug. I believe that the bug is some unprotected section in= the guile interpreter, and a race condition clobbers the guile stack(s) an= d it all goes downhill from here.


--f403045ea792b60b930558c7c250--