Package: guile;
Reported by: David Kastrup <dak <at> gnu.org>
Date: Mon, 16 Feb 2015 17:16:02 UTC
Severity: normal
Done: Andy Wingo <wingo <at> pobox.com>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: help-debbugs <at> gnu.org (GNU bug Tracking System) To: David Kastrup <dak <at> gnu.org> Subject: bug#19883: closed (Re: bug#19883: Correction for backtrace) Date: Thu, 23 Jun 2016 10:37:01 +0000
[Message part 1 (text/plain, inline)]
Your bug report #19883: Smob's mark_smob has become unreliable in Guile 2.x which was filed against the guile package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 19883 <at> debbugs.gnu.org. -- 19883: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19883 GNU Bug Tracking System Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Andy Wingo <wingo <at> pobox.com> To: Ludovic Courtès <ludo <at> gnu.org> Cc: 19883-done <at> debbugs.gnu.org, David Kastrup <dak <at> gnu.org> Subject: Re: bug#19883: Correction for backtrace Date: Thu, 23 Jun 2016 12:36:34 +0200[Message part 3 (text/plain, inline)]On Thu 23 Jun 2016 11:50, Andy Wingo <wingo <at> pobox.com> writes: > On Thu 26 Feb 2015 16:30, David Kastrup <dak <at> gnu.org> writes: > >> Try ./test 2 2000 200 > > I can reproduce the crash with your test case, thanks :) The patch below > fixes the bug for me. WDYT Ludovic? Here's a patch with a test case. I'm going to apply as it seems to be obviously the right thing and the test case does reproduce what I think is the bug (racing mark and finalize procedures, even if it's only happening in one thread, finalizers and mark procedures do introduce concurrency). We trigger the concurrency in a simple way, via allocation in the finalizer. The patch does fix the original test. GC could happen due to another thread of course. I'm actually not sure where the concurrency is coming from in David's test though :/ I'm very interested in any feedback you might have! Andy[0001-Fix-race-between-SMOB-marking-and-finalization.patch (text/plain, inline)]From 8dff3af087c6eaa83ae0d72aa8b22aef5c65d65d Mon Sep 17 00:00:00 2001 From: Andy Wingo <wingo <at> pobox.com> Date: Thu, 23 Jun 2016 11:47:42 +0200 Subject: [PATCH] Fix race between SMOB marking and finalization * libguile/smob.c (clear_smobnum): New helper. (finalize_smob): Re-set the smobnum to the "finalized smob" type before finalizing. Fixes #19883. (scm_smob_prehistory): Pre-register a "finalized smob" type, which has no mark procedure. * test-suite/standalone/test-smob-mark-race.c: New file. * test-suite/standalone/Makefile.am: Arrange to build and run the new test. --- libguile/smob.c | 33 +++++++++++++-- test-suite/standalone/Makefile.am | 6 +++ test-suite/standalone/test-smob-mark-race.c | 65 +++++++++++++++++++++++++++++ 3 files changed, 101 insertions(+), 3 deletions(-) create mode 100644 test-suite/standalone/test-smob-mark-race.c diff --git a/libguile/smob.c b/libguile/smob.c index 90849a8..ed9d91a 100644 --- a/libguile/smob.c +++ b/libguile/smob.c @@ -374,20 +374,43 @@ scm_gc_mark (SCM o) } +static void* +clear_smobnum (void *ptr) +{ + SCM smob; + scm_t_bits smobnum; + + smob = PTR2SCM (ptr); + + smobnum = SCM_SMOBNUM (smob); + /* Frob the object's type in place, re-setting it to be the "finalized + smob" type. This will prevent other routines from accessing its + internals in a way that assumes that the smob data is valid. This + is notably the case for SMOB's own "mark" procedure, if any; as the + finalizer runs without the alloc lock, it's possible for a GC to + occur while it's running, in which case the object is alive and yet + its data is invalid. */ + SCM_SET_SMOB_DATA_0 (smob, SCM_SMOB_DATA_0 (smob) & ~(scm_t_bits) 0xff00); + + return (void *) smobnum; +} + /* Finalize SMOB by calling its SMOB type's free function, if any. */ static void finalize_smob (void *ptr, void *data) { SCM smob; + scm_t_bits smobnum; size_t (* free_smob) (SCM); smob = PTR2SCM (ptr); + smobnum = (scm_t_bits) GC_call_with_alloc_lock (clear_smobnum, ptr); + #if 0 - printf ("finalizing SMOB %p (smobnum: %u)\n", - ptr, SCM_SMOBNUM (smob)); + printf ("finalizing SMOB %p (smobnum: %u)\n", ptr, smobnum); #endif - free_smob = scm_smobs[SCM_SMOBNUM (smob)].free; + free_smob = scm_smobs[smobnum].free; if (free_smob) free_smob (smob); } @@ -470,6 +493,7 @@ void scm_smob_prehistory () { long i; + scm_t_bits finalized_smob_tc16; scm_i_pthread_key_create (¤t_mark_stack_pointer, NULL); scm_i_pthread_key_create (¤t_mark_stack_limit, NULL); @@ -493,6 +517,9 @@ scm_smob_prehistory () scm_smobs[i].apply = 0; scm_smobs[i].apply_trampoline_objcode = SCM_BOOL_F; } + + finalized_smob_tc16 = scm_make_smob_type ("finalized smob", 0); + if (SCM_TC2SMOBNUM (finalized_smob_tc16) != 0) abort (); } /* diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am index 712418a..aec7418 100644 --- a/test-suite/standalone/Makefile.am +++ b/test-suite/standalone/Makefile.am @@ -275,4 +275,10 @@ test_smob_mark_LDADD = $(LIBGUILE_LDADD) check_PROGRAMS += test-smob-mark TESTS += test-smob-mark +test_smob_mark_race_SOURCES = test-smob-mark-race.c +test_smob_mark_race_CFLAGS = ${test_cflags} +test_smob_mark_race_LDADD = $(LIBGUILE_LDADD) +check_PROGRAMS += test-smob-mark-race +TESTS += test-smob-mark-race + EXTRA_DIST += ${check_SCRIPTS} diff --git a/test-suite/standalone/test-smob-mark-race.c b/test-suite/standalone/test-smob-mark-race.c new file mode 100644 index 0000000..eca0325 --- /dev/null +++ b/test-suite/standalone/test-smob-mark-race.c @@ -0,0 +1,65 @@ +/* Copyright (C) 2016 Free Software Foundation, Inc. + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public License + * as published by the Free Software Foundation; either version 3 of + * the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301 USA + */ + +#if HAVE_CONFIG_H +#include <config.h> +#endif + +#undef NDEBUG + +#include <libguile.h> +#include <assert.h> + +static SCM +mark_smob (SCM smob) +{ + assert (SCM_SMOB_DATA (smob) == 1); + return SCM_BOOL_F; +} + +static size_t +finalize_smob (SCM smob) +{ + assert (SCM_SMOB_DATA (smob) == 1); + SCM_SET_SMOB_DATA (smob, 0); + /* Allocate a bit in the hopes of triggering a new GC, making the + marker race with the finalizer. */ + scm_cons (SCM_BOOL_F, SCM_BOOL_F); + return 0; +} + +static void +tests (void *data, int argc, char **argv) +{ + scm_t_bits tc16; + int i; + + tc16 = scm_make_smob_type ("smob with finalizer", 0); + scm_set_smob_mark (tc16, mark_smob); + scm_set_smob_free (tc16, finalize_smob); + + for (i = 0; i < 1000 * 1000; i++) + scm_new_smob (tc16, 1); +} + +int +main (int argc, char *argv[]) +{ + scm_boot_guile (argc, argv, tests, NULL); + return 0; +} -- 2.8.3
[Message part 5 (message/rfc822, inline)]
From: David Kastrup <dak <at> gnu.org> To: bug-guile <at> gnu.org Subject: Smob's mark_smob has become unreliable in Guile 2.x Date: Mon, 16 Feb 2015 17:41:36 +0100[Message part 6 (text/plain, inline)]This precludes porting LilyPond to Guile 2 since LilyPond relies extensively on the mark_smob functionality. The symptom likely occurs when an object is collected and consequently its free_smob function gets called. This free_smob function essentially is responsible for freeing all resources claimed by the smob, rendering the underlying data moot. If mark_smob continues to get called, its interpretation of the now dead data is not likely to lead to happy results. Debugging is tricky since the gc phases tend to occur asynchronously. If I start the accompanying program with ./test 2 20000 40 I tend to get a core dump perhaps every fourth call. Running repeatedly in a debugger does not produce core dumps (perhaps one needs to reload the debugger for each try?), so one needs to do ulimit -c unlimited ./test 2 20000 40 and, after the problem triggers gdb ./test core for post-mortem debugging. A traceback looks like (gdb) bt #0 0x00000000 in ?? () #1 0x0804aee0 in ?? () #2 0xb761b2da in ?? () from /usr/lib/libguile-2.0.so.22 #3 0xb72751f8 in GC_mark_from () from /usr/lib/i386-linux-gnu/libgc.so.1 #4 0xb72766ca in GC_mark_some () from /usr/lib/i386-linux-gnu/libgc.so.1 #5 0xb726cd16 in GC_stopped_mark () from /usr/lib/i386-linux-gnu/libgc.so.1 #6 0xb726d730 in GC_try_to_collect_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #7 0xb726e12a in GC_collect_or_expand () from /usr/lib/i386-linux-gnu/libgc.so.1 #8 0xb726e2b1 in GC_allocobj () from /usr/lib/i386-linux-gnu/libgc.so.1 #9 0xb72731a4 in GC_generic_malloc_inner () from /usr/lib/i386-linux-gnu/libgc.so.1 #10 0xb72732be in GC_generic_malloc () from /usr/lib/i386-linux-gnu/libgc.so.1 #11 0xb761b81e in scm_i_new_smob () from /usr/lib/libguile-2.0.so.22 #12 0x0804985a in std::vector<void (*)(), std::allocator<void (*)()> >::_M_insert_aux(__gnu_cxx::__normal_iterator<void (**)(), std::vector<void (*)(), std::allocator<void (*)()> > >, void (* const&)()) () #13 0x0804a6fc in __gnu_cxx::new_allocator<void (*)()>::allocate(unsigned int, void const*) () #14 0x0804a03d in void std::_Destroy<void (**)(), void (*)()>(void (**)(), void (**)(), std::allocator<void (*)()>&) () #15 0x08049a73 in void std::_Destroy<Family**, Family*>(Family**, Family**, std::allocator<Family*>&) () #16 0x0804934d in scm_puts () #17 0x0804938f in Scm_init::Scm_init(void (*)()) () #18 0x0804938f in Scm_init::Scm_init(void (*)()) () #19 0x0804938f in Scm_init::Scm_init(void (*)()) () #20 0x0804938f in Scm_init::Scm_init(void (*)()) () #21 0x0804938f in Scm_init::Scm_init(void (*)()) () #22 0x0804938f in Scm_init::Scm_init(void (*)()) () #23 0x0804938f in Scm_init::Scm_init(void (*)()) () #24 0x0804938f in Scm_init::Scm_init(void (*)()) () #25 0x0804938f in Scm_init::Scm_init(void (*)()) () #26 0x0804938f in Scm_init::Scm_init(void (*)()) () #27 0x0804938f in Scm_init::Scm_init(void (*)()) () #28 0x0804938f in Scm_init::Scm_init(void (*)()) () #29 0x0804938f in Scm_init::Scm_init(void (*)()) () #30 0x080495e7 in std::vector<Family*, std::allocator<Family*> >::operator[](unsigned int) () #31 0xb75b7dfd in ?? () from /usr/lib/libguile-2.0.so.22 #32 0xb76418e7 in ?? () from /usr/lib/libguile-2.0.so.22 #33 0xb761afb9 in ?? () from /usr/lib/libguile-2.0.so.22 #34 0xb7659f20 in ?? () from /usr/lib/libguile-2.0.so.22 #35 0xb765a539 in ?? () from /usr/lib/libguile-2.0.so.22 #36 0xb75c24f3 in scm_call_4 () from /usr/lib/libguile-2.0.so.22 #37 0xb7641acf in scm_catch_with_pre_unwind_handler () from /usr/lib/libguile-2.0.so.22 #38 0xb7641bd4 in scm_c_catch () from /usr/lib/libguile-2.0.so.22 #39 0xb75b85d1 in ?? () from /usr/lib/libguile-2.0.so.22 #40 0xb75b86d3 in scm_c_with_continuation_barrier () from /usr/lib/libguile-2.0.so.22 #41 0xb763ef7e in ?? () from /usr/lib/libguile-2.0.so.22 #42 0xb72782c1 in GC_call_with_stack_base () from /usr/lib/i386-linux-gnu/libgc.so.1 #43 0xb763f3e6 in scm_with_guile () from /usr/lib/libguile-2.0.so.22 #44 0x080496c5 in void __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::construct<void (*)()>(std::allocator<void (*)()>&, void (**)(), void (* const&)()) () #45 0xb72cfa83 in __libc_start_main (main=0x4, argc=134517216, argv=0x0, init=0x8049201 <main+16>, fini=0x804967e <std::_Vector_base<void (*)(), std::allocator<void (*)()> >::~_Vector_base()+40>, rtld_fini=0x4, stack_end=0xbfca7aa4) at libc-start.c:287 #46 0xb7750000 in ?? () from /lib/ld-linux.so.2 Backtrace stopped: previous frame inner to this frame (corrupt stack?) The frames listed as #17 0x0804938f in Scm_init::Scm_init(void (*)()) () #18 ... appear to be a fluke in address/symbol correlation and do not seem to have an actual relation to the listed function when looking at the code. I append all the requisite source files producing the test binary when running "make". I can send my binary (when desired) which was compiled using 2.0.11 on a i586 platform. However, I would expect the problem to reproduce in some kind of manner on other systems. The compiler was gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6), Target: i686-linux-gnu. The headers are somewhat cooked-down variants (to remove dependencies on other parts of LilyPond) of the Smob organizing classes used in LilyPond. If one wants to compile for Guile v1 for comparison, one probably needs typedef void * scm_t_func; or so somewhere.[guile-bug.tgz (application/x-gtar-compressed, attachment)][Message part 8 (text/plain, inline)]-- David Kastrup
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.