GNU bug report logs - #14141
Abort in RTL VM

Previous Next

Package: guile;

Reported by: Noah Lavine <noah.b.lavine <at> gmail.com>

Date: Thu, 4 Apr 2013 19:33:02 UTC

Severity: normal

Done: Mark H Weaver <mhw <at> netris.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Noah Lavine <noah.b.lavine <at> gmail.com>
To: Stefan Israelsson Tampe <stefan.itampe <at> gmail.com>
Cc: 14141 <14141 <at> debbugs.gnu.org>
Subject: bug#14141: Abort in RTL VM
Date: Fri, 12 Apr 2013 13:50:10 -0400
[Message part 1 (text/plain, inline)]
Case closed! (At least for now.)

Although the bug in reserve-locals is real (you can check with the
debugger), the program never actually got far enough for that to affect
anything. Instead, here's what happened:

 - reserve-locals worked fine, reserving space for 4 local variables
 - bind-rest shrunk the stack back down, leaving enough space for only one
local variable
 - the call to toplevel-ref invoked the old (non-RTL) VM, through line 479
of modules.c.
 - the old VM put its initial frame on the stack right after the stack
pointer - but since the stack pointer had been decremented by bind-rest,
that overwrote the 4 local variables in the RTL function. In particular, it
overwrote the variable that held the box (fp[1], for the record).
 - after the old VM returned, the new VM continued, tried to use the
incorrect fp[1] value, and aborted.

So, mystery solved. Coming soon: patches to fix the bug you hit if you try
to do reserve-locals after bind-rest ...

Noah


On Sat, Apr 6, 2013 at 10:49 AM, Stefan Israelsson Tampe <
stefan.itampe <at> gmail.com> wrote:

> Yeah, you really found the problem.
>
> I would put a if(nargs) to guard the while just to make it more robust.
>
> /Stefan
>
>
> On Fri, Apr 5, 2013 at 7:29 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> wrote:
> > Hello,
> >
> > Just a quick update - it seems to be related to the order of the
> > reserve-locals and bind-rest calls. If I reverse those, the problem goes
> > away. However, I still don't know why this happens, and why the problem
> > doesn't happen when the variables aren't boxed.
> >
> > I think there might be something weird going on in bind-rest when the
> > argument is zero. It has a loop like this:
> >
> > while (nargs-- > dst) { ... }.
> >
> > When dst is zero, doesn't nargs end up getting set to -1? (Which, since
> it's
> > unsigned, is really 2^32 - 1.) That might make any later instructions
> that
> > use nargs (like reserve-locals) do odd things.
> >
> > Noah
> >
> >
> > On Thu, Apr 4, 2013 at 3:44 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> wrote:
> >>
> >> Oh, I forgot to mention one important fact. I *do* get the expected
> result
> >> if I eliminate the stuff with boxes. This works fine:
> >>
> >>
> >> scheme@(guile-user)> (assemble-program '((begin-program foo)
> >>  (assert-nargs-ge 0)
> >>  (reserve-locals 4)
> >>  (bind-rest 0)
> >>  (cache-current-module! 2 foo)
> >>  (cached-toplevel-ref 2 foo car)
> >>  (tail-call 1 2)
> >>  (end-program)))
> >>
> >> Best,
> >> Noah
> >>
> >> On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm actually testing on the wip-rtl-cps branch, but this error involves
> >>> code that I believe is the same on that branch and on the wip-rtl
> branch.
> >>> Try opening a new Guile and doing the following:
> >>>
> >>> scheme@(guile-user)> (use-modules (system vm rtl))
> >>> scheme@(guile-user)> (assemble-program '((begin-program foo)
> >>>  (assert-nargs-ge 0)
> >>>  (reserve-locals 4)
> >>>  (bind-rest 0)
> >>>  (box 1 0)
> >>>  (cache-current-module! 2 foo)
> >>>  (cached-toplevel-ref 2 foo car)
> >>>  (box-ref 3 1)
> >>>  (mov 0 3)
> >>>  (tail-call 1 2)
> >>>  (end-program)))
> >>> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90
> 609bc0>
> >>> scheme@(guile-user)> ($1 'hello)
> >>>
> >>> The expected result is
> >>> $2 = hello
> >>>
> >>> What I actually get is,
> >>>
> >>> Program received signal SIGABRT, Aborted.
> >>> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>>
> >>> The full backtrace is below. The interesting part is that it seems to
> be
> >>> tripping the check at libguile/vm-engine.c:1868, which checks whether
> an
> >>> object is a variable before doing a box-ref on it. When I look at it
> in GDB,
> >>> it seems that whatever is at register 1 does not satisfy
> scm_variable_p,
> >>> although I'm not very experienced with debugging Guile. However, I am
> >>> somewhat surprised at this, because I have used boxes and box-ref
> before in
> >>> the past with no trouble.
> >>>
> >>> Another surprising thing is that if I open Guile, do some other things
> >>> for a while, and then run this code, the problem sometimes doesn't
> appear.
> >>> That is especially disturbing.
> >>>
> >>> Does anyone have any idea where the issue is or how I should find it?
> >>>
> >>> Thanks,
> >>> Noah
> >>>
> >>> Here's the backtrace:
> >>>
> >>> (gdb) bt
> >>> #0  0x00007ffff7440425 in raise ()
> >>>    from /lib/x86_64-linux-gnu/libc.so.6
> >>> #1  0x00007ffff7443b8b in abort ()
> >>>    from /lib/x86_64-linux-gnu/libc.so.6
> >>> #2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
> >>>     program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
> >>> #3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
> >>>     program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
> >>> #4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
> >>>     program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
> >>> #5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
> >>>     at eval.c:691
> >>> #6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
> >>>     module_or_state=0x7e0090) at eval.c:725
> >>> #7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
> >>>     at script.c:441
> >>> #8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
> >>>     argv=0x7fffffffe478) at guile.c:62
> >>> #9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
> >>>     at init.c:336
> >>> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
> >>>     at continuations.c:513
> >>> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
> >>>     args=0x304) at throw.c:146
> >>> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
> >>>     at smob.c:141
> >>> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
> >>>     program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
> >>>     at vm-i-system.c:873
> >>> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
> >>>     program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
> >>> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
> >>>     arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
> >>> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
> >>>     key=0x404, thunk=0x81b360, handler=0x81b340,
> >>>     pre_unwind_handler=0x81b320) at throw.c:86
> >>> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
> >>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
> >>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
> >>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
> >>>     pre_unwind_handler_data=0x751160) at throw.c:213
> >>> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
> >>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
> >>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
> >>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
> >>>     pre_unwind_handler_data=0x751160) at continuations.c:451
> >>> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
> >>>     at continuations.c:547
> >>> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
> >>>     base <at> entry=<error reading variable: value has been optimized out>,
> >>> data=0x7fffffffe2d0,
> >>>     data <at> entry=<error reading variable: value has been optimized out>)
> >>>     at threads.c:907
> >>> #21 0x00007ffff71b6f55 in GC_call_with_stack_base (
> >>>     fn=<optimized out>, arg=<optimized out>) at misc.c:1553
> >>> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
> >>>     parent=0x0) at threads.c:950
> >>> #23 0x00007ffff7af98c0 in scm_with_guile (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
> >>>     at threads.c:956
> >>> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
> >>>     argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
> >>>     at init.c:319
> >>> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
> >>>     at guile.c:81
> >>>
> >>>
> >>
> >
>
[Message part 2 (text/html, inline)]

This bug report was last modified 11 years and 175 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.