Package: guile;
Reported by: Noah Lavine <noah.b.lavine <at> gmail.com>
Date: Thu, 4 Apr 2013 19:33:02 UTC
Severity: normal
Done: Mark H Weaver <mhw <at> netris.org>
Bug is archived. No further changes may be made.
Message #17 received at 14141 <at> debbugs.gnu.org (full text, mbox):
From: Noah Lavine <noah.b.lavine <at> gmail.com> To: Stefan Israelsson Tampe <stefan.itampe <at> gmail.com> Cc: 14141 <14141 <at> debbugs.gnu.org> Subject: Re: bug#14141: Abort in RTL VM Date: Fri, 12 Apr 2013 13:50:10 -0400
[Message part 1 (text/plain, inline)]
Case closed! (At least for now.) Although the bug in reserve-locals is real (you can check with the debugger), the program never actually got far enough for that to affect anything. Instead, here's what happened: - reserve-locals worked fine, reserving space for 4 local variables - bind-rest shrunk the stack back down, leaving enough space for only one local variable - the call to toplevel-ref invoked the old (non-RTL) VM, through line 479 of modules.c. - the old VM put its initial frame on the stack right after the stack pointer - but since the stack pointer had been decremented by bind-rest, that overwrote the 4 local variables in the RTL function. In particular, it overwrote the variable that held the box (fp[1], for the record). - after the old VM returned, the new VM continued, tried to use the incorrect fp[1] value, and aborted. So, mystery solved. Coming soon: patches to fix the bug you hit if you try to do reserve-locals after bind-rest ... Noah On Sat, Apr 6, 2013 at 10:49 AM, Stefan Israelsson Tampe < stefan.itampe <at> gmail.com> wrote: > Yeah, you really found the problem. > > I would put a if(nargs) to guard the while just to make it more robust. > > /Stefan > > > On Fri, Apr 5, 2013 at 7:29 PM, Noah Lavine <noah.b.lavine <at> gmail.com> > wrote: > > Hello, > > > > Just a quick update - it seems to be related to the order of the > > reserve-locals and bind-rest calls. If I reverse those, the problem goes > > away. However, I still don't know why this happens, and why the problem > > doesn't happen when the variables aren't boxed. > > > > I think there might be something weird going on in bind-rest when the > > argument is zero. It has a loop like this: > > > > while (nargs-- > dst) { ... }. > > > > When dst is zero, doesn't nargs end up getting set to -1? (Which, since > it's > > unsigned, is really 2^32 - 1.) That might make any later instructions > that > > use nargs (like reserve-locals) do odd things. > > > > Noah > > > > > > On Thu, Apr 4, 2013 at 3:44 PM, Noah Lavine <noah.b.lavine <at> gmail.com> > wrote: > >> > >> Oh, I forgot to mention one important fact. I *do* get the expected > result > >> if I eliminate the stuff with boxes. This works fine: > >> > >> > >> scheme@(guile-user)> (assemble-program '((begin-program foo) > >> (assert-nargs-ge 0) > >> (reserve-locals 4) > >> (bind-rest 0) > >> (cache-current-module! 2 foo) > >> (cached-toplevel-ref 2 foo car) > >> (tail-call 1 2) > >> (end-program))) > >> > >> Best, > >> Noah > >> > >> On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com> > >> wrote: > >>> > >>> Hello, > >>> > >>> I'm actually testing on the wip-rtl-cps branch, but this error involves > >>> code that I believe is the same on that branch and on the wip-rtl > branch. > >>> Try opening a new Guile and doing the following: > >>> > >>> scheme@(guile-user)> (use-modules (system vm rtl)) > >>> scheme@(guile-user)> (assemble-program '((begin-program foo) > >>> (assert-nargs-ge 0) > >>> (reserve-locals 4) > >>> (bind-rest 0) > >>> (box 1 0) > >>> (cache-current-module! 2 foo) > >>> (cached-toplevel-ref 2 foo car) > >>> (box-ref 3 1) > >>> (mov 0 3) > >>> (tail-call 1 2) > >>> (end-program))) > >>> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90 > 609bc0> > >>> scheme@(guile-user)> ($1 'hello) > >>> > >>> The expected result is > >>> $2 = hello > >>> > >>> What I actually get is, > >>> > >>> Program received signal SIGABRT, Aborted. > >>> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6 > >>> > >>> The full backtrace is below. The interesting part is that it seems to > be > >>> tripping the check at libguile/vm-engine.c:1868, which checks whether > an > >>> object is a variable before doing a box-ref on it. When I look at it > in GDB, > >>> it seems that whatever is at register 1 does not satisfy > scm_variable_p, > >>> although I'm not very experienced with debugging Guile. However, I am > >>> somewhat surprised at this, because I have used boxes and box-ref > before in > >>> the past with no trouble. > >>> > >>> Another surprising thing is that if I open Guile, do some other things > >>> for a while, and then run this code, the problem sometimes doesn't > appear. > >>> That is especially disturbing. > >>> > >>> Does anyone have any idea where the issue is or how I should find it? > >>> > >>> Thanks, > >>> Noah > >>> > >>> Here's the backtrace: > >>> > >>> (gdb) bt > >>> #0 0x00007ffff7440425 in raise () > >>> from /lib/x86_64-linux-gnu/libc.so.6 > >>> #1 0x00007ffff7443b8b in abort () > >>> from /lib/x86_64-linux-gnu/libc.so.6 > >>> #2 0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860, > >>> program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868 > >>> #3 0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860, > >>> program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419 > >>> #4 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860, > >>> program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791 > >>> #5 0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0) > >>> at eval.c:691 > >>> #6 0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0, > >>> module_or_state=0x7e0090) at eval.c:725 > >>> #7 0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478) > >>> at script.c:441 > >>> #8 0x0000000000400bd0 in inner_main (closure=0x0, argc=1, > >>> argv=0x7fffffffe478) at guile.c:62 > >>> #9 0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350) > >>> at init.c:336 > >>> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220) > >>> at continuations.c:513 > >>> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360, > >>> args=0x304) at throw.c:146 > >>> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304) > >>> at smob.c:141 > >>> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860, > >>> program=0x7443e0, argv=0x7fffffffe0c0, nargs=2) > >>> at vm-i-system.c:873 > >>> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860, > >>> program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791 > >>> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404, > >>> arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513 > >>> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler ( > >>> key=0x404, thunk=0x81b360, handler=0x81b340, > >>> pre_unwind_handler=0x81b320) at throw.c:86 > >>> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404, > >>> body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220, > >>> handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220, > >>> pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>, > >>> pre_unwind_handler_data=0x751160) at throw.c:213 > >>> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier ( > >>> body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220, > >>> handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220, > >>> pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>, > >>> pre_unwind_handler_data=0x751160) at continuations.c:451 > >>> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier ( > >>> func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350) > >>> at continuations.c:547 > >>> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290, > >>> base <at> entry=<error reading variable: value has been optimized out>, > >>> data=0x7fffffffe2d0, > >>> data <at> entry=<error reading variable: value has been optimized out>) > >>> at threads.c:907 > >>> #21 0x00007ffff71b6f55 in GC_call_with_stack_base ( > >>> fn=<optimized out>, arg=<optimized out>) at misc.c:1553 > >>> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent ( > >>> func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350, > >>> parent=0x0) at threads.c:950 > >>> #23 0x00007ffff7af98c0 in scm_with_guile ( > >>> func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350) > >>> at threads.c:956 > >>> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1, > >>> argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0) > >>> at init.c:319 > >>> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478) > >>> at guile.c:81 > >>> > >>> > >> > > >
[Message part 2 (text/html, inline)]
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.