GNU bug report logs - #14141
Abort in RTL VM

Previous Next

Package: guile;

Reported by: Noah Lavine <noah.b.lavine <at> gmail.com>

Date: Thu, 4 Apr 2013 19:33:02 UTC

Severity: normal

Done: Mark H Weaver <mhw <at> netris.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 14141 in the body.
You can then email your comments to 14141 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guile <at> gnu.org:
bug#14141; Package guile. (Thu, 04 Apr 2013 19:33:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Noah Lavine <noah.b.lavine <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Thu, 04 Apr 2013 19:33:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Noah Lavine <noah.b.lavine <at> gmail.com>
To: bug-guile <at> gnu.org
Subject: Abort in RTL VM
Date: Thu, 4 Apr 2013 15:22:13 -0400
[Message part 1 (text/plain, inline)]
Hello,

I'm actually testing on the wip-rtl-cps branch, but this error involves
code that I believe is the same on that branch and on the wip-rtl branch.
Try opening a new Guile and doing the following:

scheme@(guile-user)> (use-modules (system vm rtl))
scheme@(guile-user)> (assemble-program '((begin-program foo)
 (assert-nargs-ge 0)
 (reserve-locals 4)
 (bind-rest 0)
 (box 1 0)
 (cache-current-module! 2 foo)
 (cached-toplevel-ref 2 foo car)
 (box-ref 3 1)
 (mov 0 3)
 (tail-call 1 2)
 (end-program)))
... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90 609bc0>
scheme@(guile-user)> ($1 'hello)

The expected result is
$2 = hello

What I actually get is,

Program received signal SIGABRT, Aborted.
0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6

The full backtrace is below. The interesting part is that it seems to be
tripping the check at libguile/vm-engine.c:1868, which checks whether an
object is a variable before doing a box-ref on it. When I look at it in
GDB, it seems that whatever is at register 1 does not satisfy
scm_variable_p, although I'm not very experienced with debugging Guile.
However, I am somewhat surprised at this, because I have used boxes and
box-ref before in the past with no trouble.

Another surprising thing is that if I open Guile, do some other things for
a while, and then run this code, the problem sometimes doesn't appear. That
is especially disturbing.

Does anyone have any idea where the issue is or how I should find it?

Thanks,
Noah

Here's the backtrace:

(gdb) bt
#0  0x00007ffff7440425 in raise ()
   from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7443b8b in abort ()
   from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
    program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
#3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
    program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
#4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
    program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
#5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
    at eval.c:691
#6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
    module_or_state=0x7e0090) at eval.c:725
#7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
    at script.c:441
#8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
    argv=0x7fffffffe478) at guile.c:62
#9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
    at init.c:336
#10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
    at continuations.c:513
#11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
    args=0x304) at throw.c:146
#12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
    at smob.c:141
#13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
    program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
    at vm-i-system.c:873
#14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
    program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
#15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
    arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
#16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
    key=0x404, thunk=0x81b360, handler=0x81b340,
    pre_unwind_handler=0x81b320) at throw.c:86
#17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
    body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
    handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
    pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
    pre_unwind_handler_data=0x751160) at throw.c:213
#18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
    body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
    handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
    pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
    pre_unwind_handler_data=0x751160) at continuations.c:451
#19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
    func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
    at continuations.c:547
#20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
    base <at> entry=<error reading variable: value has been optimized out>,
data=0x7fffffffe2d0,
    data <at> entry=<error reading variable: value has been optimized out>)
    at threads.c:907
#21 0x00007ffff71b6f55 in GC_call_with_stack_base (
    fn=<optimized out>, arg=<optimized out>) at misc.c:1553
#22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
    func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
    parent=0x0) at threads.c:950
#23 0x00007ffff7af98c0 in scm_with_guile (
    func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
    at threads.c:956
#24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
    argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
    at init.c:319
#25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
    at guile.c:81
[Message part 2 (text/html, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#14141; Package guile. (Thu, 04 Apr 2013 19:49:01 GMT) Full text and rfc822 format available.

Message #8 received at 14141 <at> debbugs.gnu.org (full text, mbox):

From: Noah Lavine <noah.b.lavine <at> gmail.com>
To: 14141 <at> debbugs.gnu.org
Subject: Re: bug#14141: Abort in RTL VM
Date: Thu, 4 Apr 2013 15:44:50 -0400
[Message part 1 (text/plain, inline)]
Oh, I forgot to mention one important fact. I *do* get the expected result
if I eliminate the stuff with boxes. This works fine:

scheme@(guile-user)> (assemble-program '((begin-program foo)
 (assert-nargs-ge 0)
 (reserve-locals 4)
 (bind-rest 0)
 (cache-current-module! 2 foo)
 (cached-toplevel-ref 2 foo car)
 (tail-call 1 2)
 (end-program)))

Best,
Noah

On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com> wrote:

> Hello,
>
> I'm actually testing on the wip-rtl-cps branch, but this error involves
> code that I believe is the same on that branch and on the wip-rtl branch.
> Try opening a new Guile and doing the following:
>
> scheme@(guile-user)> (use-modules (system vm rtl))
> scheme@(guile-user)> (assemble-program '((begin-program foo)
>  (assert-nargs-ge 0)
>  (reserve-locals 4)
>  (bind-rest 0)
>  (box 1 0)
>  (cache-current-module! 2 foo)
>  (cached-toplevel-ref 2 foo car)
>  (box-ref 3 1)
>  (mov 0 3)
>  (tail-call 1 2)
>  (end-program)))
> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90 609bc0>
> scheme@(guile-user)> ($1 'hello)
>
> The expected result is
> $2 = hello
>
> What I actually get is,
>
> Program received signal SIGABRT, Aborted.
> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>
> The full backtrace is below. The interesting part is that it seems to be
> tripping the check at libguile/vm-engine.c:1868, which checks whether an
> object is a variable before doing a box-ref on it. When I look at it in
> GDB, it seems that whatever is at register 1 does not satisfy
> scm_variable_p, although I'm not very experienced with debugging Guile.
> However, I am somewhat surprised at this, because I have used boxes and
> box-ref before in the past with no trouble.
>
> Another surprising thing is that if I open Guile, do some other things for
> a while, and then run this code, the problem sometimes doesn't appear. That
> is especially disturbing.
>
> Does anyone have any idea where the issue is or how I should find it?
>
> Thanks,
> Noah
>
> Here's the backtrace:
>
> (gdb) bt
> #0  0x00007ffff7440425 in raise ()
>    from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007ffff7443b8b in abort ()
>    from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
>     program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
> #3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
>     program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
> #4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>     program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
> #5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
>     at eval.c:691
> #6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
>     module_or_state=0x7e0090) at eval.c:725
> #7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
>     at script.c:441
> #8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
>     argv=0x7fffffffe478) at guile.c:62
> #9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
>     at init.c:336
> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
>     at continuations.c:513
> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
>     args=0x304) at throw.c:146
> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
>     at smob.c:141
> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
>     program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
>     at vm-i-system.c:873
> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>     program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
>     arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
>     key=0x404, thunk=0x81b360, handler=0x81b340,
>     pre_unwind_handler=0x81b320) at throw.c:86
> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>     pre_unwind_handler_data=0x751160) at throw.c:213
> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>     pre_unwind_handler_data=0x751160) at continuations.c:451
> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>     at continuations.c:547
> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
>     base <at> entry=<error reading variable: value has been optimized out>,
> data=0x7fffffffe2d0,
>     data <at> entry=<error reading variable: value has been optimized out>)
>     at threads.c:907
> #21 0x00007ffff71b6f55 in GC_call_with_stack_base (
>     fn=<optimized out>, arg=<optimized out>) at misc.c:1553
> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
>     parent=0x0) at threads.c:950
> #23 0x00007ffff7af98c0 in scm_with_guile (
>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>     at threads.c:956
> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
>     argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
>     at init.c:319
> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
>     at guile.c:81
>
>
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#14141; Package guile. (Fri, 05 Apr 2013 17:34:01 GMT) Full text and rfc822 format available.

Message #11 received at 14141 <at> debbugs.gnu.org (full text, mbox):

From: Noah Lavine <noah.b.lavine <at> gmail.com>
To: Noah Lavine <noah.b.lavine <at> gmail.com>
Cc: 14141 <14141 <at> debbugs.gnu.org>
Subject: Re: bug#14141: Abort in RTL VM
Date: Fri, 5 Apr 2013 13:29:25 -0400
[Message part 1 (text/plain, inline)]
Hello,

Just a quick update - it seems to be related to the order of the
reserve-locals and bind-rest calls. If I reverse those, the problem goes
away. However, I still don't know why this happens, and why the problem
doesn't happen when the variables aren't boxed.

I think there might be something weird going on in bind-rest when the
argument is zero. It has a loop like this:

while (nargs-- > dst) { ... }.

When dst is zero, doesn't nargs end up getting set to -1? (Which, since
it's unsigned, is really 2^32 - 1.) That might make any later instructions
that use nargs (like reserve-locals) do odd things.

Noah


On Thu, Apr 4, 2013 at 3:44 PM, Noah Lavine <noah.b.lavine <at> gmail.com> wrote:

> Oh, I forgot to mention one important fact. I *do* get the expected result
> if I eliminate the stuff with boxes. This works fine:
>
>
> scheme@(guile-user)> (assemble-program '((begin-program foo)
>  (assert-nargs-ge 0)
>  (reserve-locals 4)
>  (bind-rest 0)
>  (cache-current-module! 2 foo)
>  (cached-toplevel-ref 2 foo car)
>  (tail-call 1 2)
>  (end-program)))
>
> Best,
> Noah
>
> On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com>wrote:
>
>> Hello,
>>
>> I'm actually testing on the wip-rtl-cps branch, but this error involves
>> code that I believe is the same on that branch and on the wip-rtl branch.
>> Try opening a new Guile and doing the following:
>>
>> scheme@(guile-user)> (use-modules (system vm rtl))
>> scheme@(guile-user)> (assemble-program '((begin-program foo)
>>  (assert-nargs-ge 0)
>>  (reserve-locals 4)
>>  (bind-rest 0)
>>  (box 1 0)
>>  (cache-current-module! 2 foo)
>>  (cached-toplevel-ref 2 foo car)
>>  (box-ref 3 1)
>>  (mov 0 3)
>>  (tail-call 1 2)
>>  (end-program)))
>> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90 609bc0>
>> scheme@(guile-user)> ($1 'hello)
>>
>> The expected result is
>> $2 = hello
>>
>> What I actually get is,
>>
>> Program received signal SIGABRT, Aborted.
>> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>
>> The full backtrace is below. The interesting part is that it seems to be
>> tripping the check at libguile/vm-engine.c:1868, which checks whether an
>> object is a variable before doing a box-ref on it. When I look at it in
>> GDB, it seems that whatever is at register 1 does not satisfy
>> scm_variable_p, although I'm not very experienced with debugging Guile.
>> However, I am somewhat surprised at this, because I have used boxes and
>> box-ref before in the past with no trouble.
>>
>> Another surprising thing is that if I open Guile, do some other things
>> for a while, and then run this code, the problem sometimes doesn't appear.
>> That is especially disturbing.
>>
>> Does anyone have any idea where the issue is or how I should find it?
>>
>> Thanks,
>> Noah
>>
>> Here's the backtrace:
>>
>> (gdb) bt
>> #0  0x00007ffff7440425 in raise ()
>>    from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00007ffff7443b8b in abort ()
>>    from /lib/x86_64-linux-gnu/libc.so.6
>> #2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
>>     program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
>> #3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
>>     program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
>> #4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>>     program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
>> #5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
>>     at eval.c:691
>> #6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
>>     module_or_state=0x7e0090) at eval.c:725
>> #7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
>>     at script.c:441
>> #8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
>>     argv=0x7fffffffe478) at guile.c:62
>> #9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
>>     at init.c:336
>> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
>>     at continuations.c:513
>> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
>>     args=0x304) at throw.c:146
>> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
>>     at smob.c:141
>> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
>>     program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
>>     at vm-i-system.c:873
>> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>>     program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
>> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
>>     arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
>> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
>>     key=0x404, thunk=0x81b360, handler=0x81b340,
>>     pre_unwind_handler=0x81b320) at throw.c:86
>> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>>     pre_unwind_handler_data=0x751160) at throw.c:213
>> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>>     pre_unwind_handler_data=0x751160) at continuations.c:451
>> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>>     at continuations.c:547
>> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
>>     base <at> entry=<error reading variable: value has been optimized out>,
>> data=0x7fffffffe2d0,
>>     data <at> entry=<error reading variable: value has been optimized out>)
>>     at threads.c:907
>> #21 0x00007ffff71b6f55 in GC_call_with_stack_base (
>>     fn=<optimized out>, arg=<optimized out>) at misc.c:1553
>> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
>>     parent=0x0) at threads.c:950
>> #23 0x00007ffff7af98c0 in scm_with_guile (
>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>>     at threads.c:956
>> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
>>     argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
>>     at init.c:319
>> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
>>     at guile.c:81
>>
>>
>>
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#14141; Package guile. (Sat, 06 Apr 2013 14:53:02 GMT) Full text and rfc822 format available.

Message #14 received at 14141 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Israelsson Tampe <stefan.itampe <at> gmail.com>
To: Noah Lavine <noah.b.lavine <at> gmail.com>
Cc: 14141 <14141 <at> debbugs.gnu.org>
Subject: Re: bug#14141: Abort in RTL VM
Date: Sat, 6 Apr 2013 16:49:32 +0200
Yeah, you really found the problem.

I would put a if(nargs) to guard the while just to make it more robust.

/Stefan


On Fri, Apr 5, 2013 at 7:29 PM, Noah Lavine <noah.b.lavine <at> gmail.com> wrote:
> Hello,
>
> Just a quick update - it seems to be related to the order of the
> reserve-locals and bind-rest calls. If I reverse those, the problem goes
> away. However, I still don't know why this happens, and why the problem
> doesn't happen when the variables aren't boxed.
>
> I think there might be something weird going on in bind-rest when the
> argument is zero. It has a loop like this:
>
> while (nargs-- > dst) { ... }.
>
> When dst is zero, doesn't nargs end up getting set to -1? (Which, since it's
> unsigned, is really 2^32 - 1.) That might make any later instructions that
> use nargs (like reserve-locals) do odd things.
>
> Noah
>
>
> On Thu, Apr 4, 2013 at 3:44 PM, Noah Lavine <noah.b.lavine <at> gmail.com> wrote:
>>
>> Oh, I forgot to mention one important fact. I *do* get the expected result
>> if I eliminate the stuff with boxes. This works fine:
>>
>>
>> scheme@(guile-user)> (assemble-program '((begin-program foo)
>>  (assert-nargs-ge 0)
>>  (reserve-locals 4)
>>  (bind-rest 0)
>>  (cache-current-module! 2 foo)
>>  (cached-toplevel-ref 2 foo car)
>>  (tail-call 1 2)
>>  (end-program)))
>>
>> Best,
>> Noah
>>
>> On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I'm actually testing on the wip-rtl-cps branch, but this error involves
>>> code that I believe is the same on that branch and on the wip-rtl branch.
>>> Try opening a new Guile and doing the following:
>>>
>>> scheme@(guile-user)> (use-modules (system vm rtl))
>>> scheme@(guile-user)> (assemble-program '((begin-program foo)
>>>  (assert-nargs-ge 0)
>>>  (reserve-locals 4)
>>>  (bind-rest 0)
>>>  (box 1 0)
>>>  (cache-current-module! 2 foo)
>>>  (cached-toplevel-ref 2 foo car)
>>>  (box-ref 3 1)
>>>  (mov 0 3)
>>>  (tail-call 1 2)
>>>  (end-program)))
>>> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90 609bc0>
>>> scheme@(guile-user)> ($1 'hello)
>>>
>>> The expected result is
>>> $2 = hello
>>>
>>> What I actually get is,
>>>
>>> Program received signal SIGABRT, Aborted.
>>> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
>>>
>>> The full backtrace is below. The interesting part is that it seems to be
>>> tripping the check at libguile/vm-engine.c:1868, which checks whether an
>>> object is a variable before doing a box-ref on it. When I look at it in GDB,
>>> it seems that whatever is at register 1 does not satisfy scm_variable_p,
>>> although I'm not very experienced with debugging Guile. However, I am
>>> somewhat surprised at this, because I have used boxes and box-ref before in
>>> the past with no trouble.
>>>
>>> Another surprising thing is that if I open Guile, do some other things
>>> for a while, and then run this code, the problem sometimes doesn't appear.
>>> That is especially disturbing.
>>>
>>> Does anyone have any idea where the issue is or how I should find it?
>>>
>>> Thanks,
>>> Noah
>>>
>>> Here's the backtrace:
>>>
>>> (gdb) bt
>>> #0  0x00007ffff7440425 in raise ()
>>>    from /lib/x86_64-linux-gnu/libc.so.6
>>> #1  0x00007ffff7443b8b in abort ()
>>>    from /lib/x86_64-linux-gnu/libc.so.6
>>> #2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
>>>     program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
>>> #3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
>>>     program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
>>> #4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>>>     program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
>>> #5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
>>>     at eval.c:691
>>> #6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
>>>     module_or_state=0x7e0090) at eval.c:725
>>> #7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
>>>     at script.c:441
>>> #8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
>>>     argv=0x7fffffffe478) at guile.c:62
>>> #9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
>>>     at init.c:336
>>> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
>>>     at continuations.c:513
>>> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
>>>     args=0x304) at throw.c:146
>>> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
>>>     at smob.c:141
>>> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
>>>     program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
>>>     at vm-i-system.c:873
>>> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
>>>     program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
>>> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
>>>     arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
>>> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
>>>     key=0x404, thunk=0x81b360, handler=0x81b340,
>>>     pre_unwind_handler=0x81b320) at throw.c:86
>>> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
>>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>>>     pre_unwind_handler_data=0x751160) at throw.c:213
>>> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
>>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
>>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
>>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
>>>     pre_unwind_handler_data=0x751160) at continuations.c:451
>>> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
>>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>>>     at continuations.c:547
>>> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
>>>     base <at> entry=<error reading variable: value has been optimized out>,
>>> data=0x7fffffffe2d0,
>>>     data <at> entry=<error reading variable: value has been optimized out>)
>>>     at threads.c:907
>>> #21 0x00007ffff71b6f55 in GC_call_with_stack_base (
>>>     fn=<optimized out>, arg=<optimized out>) at misc.c:1553
>>> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
>>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
>>>     parent=0x0) at threads.c:950
>>> #23 0x00007ffff7af98c0 in scm_with_guile (
>>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
>>>     at threads.c:956
>>> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
>>>     argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
>>>     at init.c:319
>>> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
>>>     at guile.c:81
>>>
>>>
>>
>




Information forwarded to bug-guile <at> gnu.org:
bug#14141; Package guile. (Fri, 12 Apr 2013 17:55:02 GMT) Full text and rfc822 format available.

Message #17 received at 14141 <at> debbugs.gnu.org (full text, mbox):

From: Noah Lavine <noah.b.lavine <at> gmail.com>
To: Stefan Israelsson Tampe <stefan.itampe <at> gmail.com>
Cc: 14141 <14141 <at> debbugs.gnu.org>
Subject: Re: bug#14141: Abort in RTL VM
Date: Fri, 12 Apr 2013 13:50:10 -0400
[Message part 1 (text/plain, inline)]
Case closed! (At least for now.)

Although the bug in reserve-locals is real (you can check with the
debugger), the program never actually got far enough for that to affect
anything. Instead, here's what happened:

 - reserve-locals worked fine, reserving space for 4 local variables
 - bind-rest shrunk the stack back down, leaving enough space for only one
local variable
 - the call to toplevel-ref invoked the old (non-RTL) VM, through line 479
of modules.c.
 - the old VM put its initial frame on the stack right after the stack
pointer - but since the stack pointer had been decremented by bind-rest,
that overwrote the 4 local variables in the RTL function. In particular, it
overwrote the variable that held the box (fp[1], for the record).
 - after the old VM returned, the new VM continued, tried to use the
incorrect fp[1] value, and aborted.

So, mystery solved. Coming soon: patches to fix the bug you hit if you try
to do reserve-locals after bind-rest ...

Noah


On Sat, Apr 6, 2013 at 10:49 AM, Stefan Israelsson Tampe <
stefan.itampe <at> gmail.com> wrote:

> Yeah, you really found the problem.
>
> I would put a if(nargs) to guard the while just to make it more robust.
>
> /Stefan
>
>
> On Fri, Apr 5, 2013 at 7:29 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> wrote:
> > Hello,
> >
> > Just a quick update - it seems to be related to the order of the
> > reserve-locals and bind-rest calls. If I reverse those, the problem goes
> > away. However, I still don't know why this happens, and why the problem
> > doesn't happen when the variables aren't boxed.
> >
> > I think there might be something weird going on in bind-rest when the
> > argument is zero. It has a loop like this:
> >
> > while (nargs-- > dst) { ... }.
> >
> > When dst is zero, doesn't nargs end up getting set to -1? (Which, since
> it's
> > unsigned, is really 2^32 - 1.) That might make any later instructions
> that
> > use nargs (like reserve-locals) do odd things.
> >
> > Noah
> >
> >
> > On Thu, Apr 4, 2013 at 3:44 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> wrote:
> >>
> >> Oh, I forgot to mention one important fact. I *do* get the expected
> result
> >> if I eliminate the stuff with boxes. This works fine:
> >>
> >>
> >> scheme@(guile-user)> (assemble-program '((begin-program foo)
> >>  (assert-nargs-ge 0)
> >>  (reserve-locals 4)
> >>  (bind-rest 0)
> >>  (cache-current-module! 2 foo)
> >>  (cached-toplevel-ref 2 foo car)
> >>  (tail-call 1 2)
> >>  (end-program)))
> >>
> >> Best,
> >> Noah
> >>
> >> On Thu, Apr 4, 2013 at 3:22 PM, Noah Lavine <noah.b.lavine <at> gmail.com>
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I'm actually testing on the wip-rtl-cps branch, but this error involves
> >>> code that I believe is the same on that branch and on the wip-rtl
> branch.
> >>> Try opening a new Guile and doing the following:
> >>>
> >>> scheme@(guile-user)> (use-modules (system vm rtl))
> >>> scheme@(guile-user)> (assemble-program '((begin-program foo)
> >>>  (assert-nargs-ge 0)
> >>>  (reserve-locals 4)
> >>>  (bind-rest 0)
> >>>  (box 1 0)
> >>>  (cache-current-module! 2 foo)
> >>>  (cached-toplevel-ref 2 foo car)
> >>>  (box-ref 3 1)
> >>>  (mov 0 3)
> >>>  (tail-call 1 2)
> >>>  (end-program)))
> >>> ... ... ... ... ... ... ... ... ... ... $1 = #<rtl-program dcec90
> 609bc0>
> >>> scheme@(guile-user)> ($1 'hello)
> >>>
> >>> The expected result is
> >>> $2 = hello
> >>>
> >>> What I actually get is,
> >>>
> >>> Program received signal SIGABRT, Aborted.
> >>> 0x00007ffff7440425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> >>>
> >>> The full backtrace is below. The interesting part is that it seems to
> be
> >>> tripping the check at libguile/vm-engine.c:1868, which checks whether
> an
> >>> object is a variable before doing a box-ref on it. When I look at it
> in GDB,
> >>> it seems that whatever is at register 1 does not satisfy
> scm_variable_p,
> >>> although I'm not very experienced with debugging Guile. However, I am
> >>> somewhat surprised at this, because I have used boxes and box-ref
> before in
> >>> the past with no trouble.
> >>>
> >>> Another surprising thing is that if I open Guile, do some other things
> >>> for a while, and then run this code, the problem sometimes doesn't
> appear.
> >>> That is especially disturbing.
> >>>
> >>> Does anyone have any idea where the issue is or how I should find it?
> >>>
> >>> Thanks,
> >>> Noah
> >>>
> >>> Here's the backtrace:
> >>>
> >>> (gdb) bt
> >>> #0  0x00007ffff7440425 in raise ()
> >>>    from /lib/x86_64-linux-gnu/libc.so.6
> >>> #1  0x00007ffff7443b8b in abort ()
> >>>    from /lib/x86_64-linux-gnu/libc.so.6
> >>> #2  0x00007ffff7b30986 in rtl_vm_debug_engine (vm=0x6a6860,
> >>>     program=0xdcec90, argv=0x6a9548, nargs_=1) at vm-engine.c:1868
> >>> #3  0x00007ffff7b1aaf1 in vm_debug_engine (vm=0x6a6860,
> >>>     program=0xdcec90, argv=0x7fffffffd028, nargs=1) at vm-engine.c:419
> >>> #4  0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
> >>>     program=0x75dbe0, argv=0x7fffffffd028, nargs=1) at vm.c:791
> >>> #5  0x00007ffff7a5bff3 in scm_primitive_eval (exp=0x7fe7f0)
> >>>     at eval.c:691
> >>> #6  0x00007ffff7a5c0ad in scm_eval (exp=0x7fe7f0,
> >>>     module_or_state=0x7e0090) at eval.c:725
> >>> #7  0x00007ffff7acef2d in scm_shell (argc=1, argv=0x7fffffffe478)
> >>>     at script.c:441
> >>> #8  0x0000000000400bd0 in inner_main (closure=0x0, argc=1,
> >>>     argv=0x7fffffffe478) at guile.c:62
> >>> #9  0x00007ffff7a82663 in invoke_main_func (body_data=0x7fffffffe350)
> >>>     at init.c:336
> >>> #10 0x00007ffff7a563c9 in c_body (d=0x7fffffffe220)
> >>>     at continuations.c:513
> >>> #11 0x00007ffff7afc96c in apply_catch_closure (clo=0x81b360,
> >>>     args=0x304) at throw.c:146
> >>> #12 0x00007ffff7acf739 in apply_1 (smob=0x81b360, a=0x304)
> >>>     at smob.c:141
> >>> #13 0x00007ffff7b05cc8 in vm_regular_engine (vm=0x6a6860,
> >>>     program=0x7443e0, argv=0x7fffffffe0c0, nargs=2)
> >>>     at vm-i-system.c:873
> >>> #14 0x00007ffff7b38f6c in scm_c_vm_run (vm=0x6a6860,
> >>>     program=0x79d5d0, argv=0x7fffffffe0c0, nargs=4) at vm.c:791
> >>> #15 0x00007ffff7a5b793 in scm_call_4 (proc=0x79d5d0, arg1=0x404,
> >>>     arg2=0x81b360, arg3=0x81b340, arg4=0x81b320) at eval.c:513
> >>> #16 0x00007ffff7afc767 in scm_catch_with_pre_unwind_handler (
> >>>     key=0x404, thunk=0x81b360, handler=0x81b340,
> >>>     pre_unwind_handler=0x81b320) at throw.c:86
> >>> #17 0x00007ffff7afca44 in scm_c_catch (tag=0x404,
> >>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
> >>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
> >>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
> >>>     pre_unwind_handler_data=0x751160) at throw.c:213
> >>> #18 0x00007ffff7a5623d in scm_i_with_continuation_barrier (
> >>>     body=0x7ffff7a563a1 <c_body>, body_data=0x7fffffffe220,
> >>>     handler=0x7ffff7a563d8 <c_handler>, handler_data=0x7fffffffe220,
> >>>     pre_unwind_handler=0x7ffff7a5642c <pre_unwind_handler>,
> >>>     pre_unwind_handler_data=0x751160) at continuations.c:451
> >>> #19 0x00007ffff7a564c3 in scm_c_with_continuation_barrier (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
> >>>     at continuations.c:547
> >>> #20 0x00007ffff7af97ba in with_guile_and_parent (base=0x7fffffffe290,
> >>>     base <at> entry=<error reading variable: value has been optimized out>,
> >>> data=0x7fffffffe2d0,
> >>>     data <at> entry=<error reading variable: value has been optimized out>)
> >>>     at threads.c:907
> >>> #21 0x00007ffff71b6f55 in GC_call_with_stack_base (
> >>>     fn=<optimized out>, arg=<optimized out>) at misc.c:1553
> >>> #22 0x00007ffff7af9894 in scm_i_with_guile_and_parent (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350,
> >>>     parent=0x0) at threads.c:950
> >>> #23 0x00007ffff7af98c0 in scm_with_guile (
> >>>     func=0x7ffff7a82613 <invoke_main_func>, data=0x7fffffffe350)
> >>>     at threads.c:956
> >>> #24 0x00007ffff7a825f4 in scm_boot_guile (argc=1,
> >>>     argv=0x7fffffffe478, main_func=0x400bac <inner_main>, closure=0x0)
> >>>     at init.c:319
> >>> #25 0x0000000000400c35 in main (argc=1, argv=0x7fffffffe478)
> >>>     at guile.c:81
> >>>
> >>>
> >>
> >
>
[Message part 2 (text/html, inline)]

bug closed, send any further explanations to 14141 <at> debbugs.gnu.org and Noah Lavine <noah.b.lavine <at> gmail.com> Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Thu, 05 Dec 2013 04:07:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 02 Jan 2014 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 175 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.