GNU bug report logs - #39266
Heap corruption leads to random crashes

Previous Next

Package: guile;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Fri, 24 Jan 2020 15:15:02 UTC

Severity: important

Merged with 36811, 36812, 39208, 39241, 39988

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#39266: closed (Finalization thread hits wrong-type-arg on
 weak vector (AArch64))
Date: Tue, 10 Mar 2020 17:27:01 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 10 Mar 2020 18:25:55 +0100
with message-id <87d09kt93w.fsf <at> gnu.org>
and subject line Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector (AArch64)
has caused the debbugs.gnu.org bug report #39266,
regarding Finalization thread hits wrong-type-arg on weak vector (AArch64)
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
39266: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=39266
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: bug-Guile <at> gnu.org
Subject: Finalization thread hits wrong-type-arg on weak vector (AArch64)
Date: Fri, 24 Jan 2020 16:14:29 +0100
Hello!

While building the “guix-system.drv” derivation on AArch64, I got this
crash (not fully deterministic but quite frequent).  Here the
finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
accessing a one-element weak vector):

--8<---------------cut here---------------start------------->8---
$ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
[…]
loading 'gnu/services/cups.scm'...
Backtrace:
[Switching to Thread 0xffffbebec1d0 (LWP 22464)]

Thread 2 "guile" hit Breakpoint 3, scm_display_backtrace_with_highlights (
    stack=stack <at> entry="#<struct stack>" = {...}, port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first <at> entry=#f, depth=depth <at> entry=#f, highlights=highlights <at> entry=()) at backtrace.c:269
269     {
(gdb) bt
#0  scm_display_backtrace_with_highlights (stack=stack <at> entry="#<struct stack>" = {...}, 
    port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, first=first <at> entry=#f, depth=depth <at> entry=#f, 
    highlights=highlights <at> entry=()) at backtrace.c:269
#1  0x0000ffffbf5ef8c4 in print_exception_and_backtrace (
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60, tag=wrong-type-arg, 
    port=#<port #<port-type file 4c3b40> 510040>) at continuations.c:409
#2  pre_unwind_handler (error_port=0x510040, tag=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60) at continuations.c:453
#3  0x0000ffffbf672588 in catch_pre_unwind_handler (data=0xffffbebeb850, 
    exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70ced40) at throw.c:135
#4  0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#5  0x0000ffffbf67d10c in scm_call_n (proc=proc <at> entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
    at vm.c:1589
#6  0x0000ffffbf5f3c10 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#7  0x0000ffffbf5f4654 in scm_apply_1 (proc=<optimized out>, arg1=arg1 <at> entry=wrong-type-arg, 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at eval.c:609
#8  0x0000ffffbf6729e0 in scm_throw (key=key <at> entry=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at throw.c:262
#9  0x0000ffffbf672b44 in scm_ithrow (key=key <at> entry=wrong-type-arg, args=<optimized out>, no_return=no_return <at> entry=1)
    at throw.c:457
#10 0x0000ffffbf5f1dec in scm_error_scm (key=key <at> entry=wrong-type-arg, subr=subr <at> entry="weak-vector-ref", 
    message=<optimized out>, 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    data=data <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:90
#11 0x0000ffffbf5f1ea0 in scm_error (key=key <at> entry=wrong-type-arg, 
    subr=subr <at> entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", 
    message=message <at> entry=0xffffbf696e98 "Wrong type argument in position ~A (expecting ~A): ~S", 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    rest=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:62
#12 0x0000ffffbf5f22cc in scm_wrong_type_arg_msg (
    subr=subr <at> entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos <at> entry=1, 
    bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30ff880, 
    szMessage=szMessage <at> entry=0xffffbf6a5300 "weak vector") at error.c:282
#13 0x0000ffffbf680050 in scm_c_weak_vector_ref (wv=<optimized out>, k=k <at> entry=0) at weak-vector.c:193
#14 0x0000ffffbf67eff4 in scm_i_weak_car (
    pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830) at weak-list.h:39
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#16 vacuum_all_weak_tables () at weak-table.c:494
#17 0x0000ffffbf5fda44 in async_gc_finalizer (ptr=0x494ec0, data=0x0) at finalizers.c:316
#18 0x0000ffffbf549f74 in GC_invoke_finalizers ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#19 0x0000ffffbf5fdf64 in scm_run_finalizers () at finalizers.c:398
#20 0x0000ffffbf5fdff4 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#21 0x0000ffffbf5ef6e0 in c_body (d=0xffffbebeb918) at continuations.c:430
#22 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#23 0x0000ffffbf67d10c in scm_call_n (proc=#<unmatched-tag 10045>, argv=argv <at> entry=0xffffbebeb660, 
    nargs=nargs <at> entry=2) at vm.c:1589
#24 0x0000ffffbf5f3930 in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at eval.c:503
#25 0x0000ffffbf5f4f38 in scm_c_with_exception_handler (type=type <at> entry=#t, handler=0xffffbebeb670, 
    handler <at> entry=0xffffbf6724b0 <catch_post_unwind_handler>, handler_data=0x510040, 
    handler_data <at> entry=0xffffbebeb850, thunk=0x0, thunk <at> entry=0xffffbf6725f8 <catch_body>, 
    thunk_data=0x1dce42683dff4d67, thunk_data <at> entry=0xffffbebeb850) at exceptions.c:170
#26 0x0000ffffbf672850 in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data <at> entry=0xffffbebeb918, handler=handler <at> entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data <at> entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0xffffbf5ef7b8 <pre_unwind_handler>, 
    pre_unwind_handler_data=pre_unwind_handler_data <at> entry=0x510040) at throw.c:168
#27 0x0000ffffbf5efbf4 in scm_i_with_continuation_barrier (body=body <at> entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data <at> entry=0xffffbebeb918, handler=handler <at> entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data <at> entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0xffffbf5ef7b8 <pre_unwind_handler>, pre_unwind_handler_data=0x510040)
    at continuations.c:368
#28 0x0000ffffbf5efca0 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
    at continuations.c:464
#29 0x0000ffffbf671148 in with_guile (base=0xffffbebeb988, data=0xffffbebeb9a8) at threads.c:645
#30 0x0000ffffbf551618 in GC_call_with_stack_base ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#31 0x0000ffffbf6714a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>)
    at threads.c:688
#32 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#33 0x0000ffffbf50e7f4 in start_thread ()
   from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libpthread.so.0
#34 0x0000ffffbf136edc in thread_start () from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libc.so.6
(gdb) info threads
  Id   Target Id                                 Frame 
  1    Thread 0xffffbf6f4010 (LWP 22463) "guile" resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
* 2    Thread 0xffffbebec1d0 (LWP 22464) "guile" scm_display_backtrace_with_highlights (
    stack=stack <at> entry="#<struct stack>" = {...}, port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first <at> entry=#f, depth=depth <at> entry=#f, highlights=highlights <at> entry=()) at backtrace.c:269
--8<---------------cut here---------------end--------------->8---

The problem appears to be that the type tag of the weak-vector got
zeroed:

--8<---------------cut here---------------start------------->8---
(gdb) frame 15
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
49            SCM car = scm_i_weak_car (in);
(gdb) p in
$48 = <error reading variable: ERROR: Cannot access memory at address 0x0>(SCM) <error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830
(gdb) p *(void**)in
$49 = (void *) 0x30ff880
(gdb) p ((void**)$49)[0]@2
$50 = {0x0, 0x30f8840}
(gdb) p (void**)in
$51 = (void **) 0x30f8830
--8<---------------cut here---------------end--------------->8---

There’s normally no disappearing link registered on the first element of
the weak vector (type tag + length) so I don’t know how this can happen.
Here’s the other thread (with surprisingly broken stack frames):

--8<---------------cut here---------------start------------->8---
(gdb) thread 1
[Switching to thread 1 (Thread 0xffffbf6f4010 (LWP 22463))]
#0  resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
272               scm_t_weak_entry *next = entry->next;
(gdb) bt
#0  resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
#1  0x0000ffffbf67ef00 in vacuum_weak_table (table=0x4a2cb0) at weak-table.c:318
#2  0x0000ffffbf67f374 in scm_c_weak_table_ref (table=<optimized out>, raw_hash=2016212919028049524, 
    pred=0xffffbf67eb10 <assq_predicate>, closure=0x72b6a80, dflt=()) at weak-table.c:533
#3  0x0000ffffbf653824 in scm_source_properties (obj=<optimized out>) at srcprop.c:195
#4  0x0000ffffbeebcf14 in ?? ()
#5  0x0000ffffbf03c130 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

Thoughts?

This code is the same as in 2.2.

Ludo’.


[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Pierre Langlois <pierre.langlois <at> gmx.com>
Cc: 39266-done <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Tue, 10 Mar 2020 18:25:55 +0100
Hi Pierre,

Pierre Langlois <pierre.langlois <at> gmx.com> skribis:

> I've tested it on AArch64 and it's looking good, I'm running Guile 3
> finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
> a rockpro64 running the Guix system, I've then reconfigured and rebooted
> and it's all good.

Thanks for testing!

> Thanks so much for the fix! Hopefully it'll work on every platform and
> that can be the end of it :-).

Yup, I’ve tested ‘guix pull --branch=wip-guile-3.0.1’ and ‘guix build
guile3.0-guix’ on all 4 architectures that Guix supports, and everything
is fine.

I’ve now pushed the upgrade to 3.0.1 + patch to Guix.

Closing!  \o/

The bug appears to be rare for Guile workloads not as intensive as a
Guix build (never reported, never seen), but we should still probably do
a bug-fix 3.0.2 release in the coming weeks, I guess.

Ludo’.


This bug report was last modified 5 years and 73 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.