GNU bug report logs - #26711
Multithreading segfaults

Package: guile;

Reported by: Jacek Swiergocki <jswiergo <at> gmail.com>

Date: Sat, 29 Apr 2017 16:56:01 UTC

Severity: normal

To reply to this bug, email your comments to 26711 AT debbugs.gnu.org.


Report forwarded to bug-guile <at> gnu.org:
bug#26711; Package guile. (Sat, 29 Apr 2017 16:56:01 GMT)

Acknowledgement sent to Jacek Swiergocki <jswiergo <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sat, 29 Apr 2017 16:56:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jacek Swiergocki <jswiergo <at> gmail.com>
To: bug-guile <at> gnu.org
Subject: Multithreading segfaults
Date: Sat, 29 Apr 2017 13:16:13 +0200
Hi all,

I have two examples of multithreaded code that crash with a segmentation
fault. If there is a bug in Guile, please fix it. If the problem is only
in my code, please help me work around it.

I am using Ubuntu 14.04 and Guile compiled from the repository. The
examples are compiled with:
g++ demo.cc -Wall -std=c++11 -pthread -I/usr/local/test/include/guile/2.2 \
    -L/usr/local/test/lib -lguile-2.2 -lgc

The first example started to segfault with the Guile release tagged v2.1.7.
I have not encountered problems with v2.1.6 or older versions. However, the
recent v2.2.2 needs many more iterations to fail than v2.1.7, which fails
instantly.

////////////////////////////////////////////////////////////////////////////
// Example 1.

#include <libguile.h>

#include <cstdio>
#include <thread>
#include <vector>
#include <atomic>
#include <mutex>

static volatile bool hold = true;
static std::atomic_int start_cnt(0);

static std::mutex init_once_mtx;
static bool start_inited_once = false;
static bool is_inited_once = false;

static std::mutex gc_mtx;

class Eval
{
public:
    Eval();
    ~Eval();
};

void* c_wrap_init(void*)
{
    return nullptr;
}

Eval::Eval()
{
    scm_with_guile(c_wrap_init, this);
}

Eval::~Eval()
{
    std::lock_guard<std::mutex> lck(gc_mtx);
    scm_gc();
}

void* c_wrap_init_only_once(void*)
{
    is_inited_once = true;
    return nullptr;
}

void init_only_once()
{
    std::lock_guard<std::mutex> lck(init_once_mtx);
    if (!start_inited_once)
    {
        start_inited_once = true;
        scm_with_guile(c_wrap_init_only_once, nullptr);
    }
    while (!is_inited_once); // spin;
}

void threadedInit(int thread_id)
{
    start_cnt ++;
    while (hold) {} // spin

    init_only_once();
    Eval* ev = new Eval();

    delete ev;
}

void test_init_race()
{
    int n_threads = 120;
    start_cnt = 0;
    hold = true;

    std::vector<std::thread> thread_pool;
    for (int i = 0; i < n_threads; i++)
        thread_pool.push_back(std::thread(&threadedInit, i));

    while (start_cnt != n_threads) {}  // spin
    printf("Done creating %d threads\n", n_threads);
    hold = false;

    for (std::thread& t : thread_pool) t.join();
    printf("Done joining %d threads\n", n_threads);
}

int main()
{
    for (int k = 0; k < 10000; k++)
    {
        test_init_race();
        printf("------------------ done iteration %d\n", k);
    }
}

The second example requires many more iterations to crash with a segfault:
sometimes hundreds, sometimes thousands; it seems to be random. You may need
to wait more than a dozen minutes, and sometimes you need to restart and try
again. I have seen this problem with old versions such as v2.0.11 as well as
with the recent v2.2.2, so it seems to be an old problem.

////////////////////////////////////////////////////////////////////////////
// Example 2.

#include <libguile.h>

#include <cstdio>
#include <thread>
#include <vector>
#include <atomic>
#include <mutex>

static volatile bool hold = true;
static std::atomic_int start_cnt(0);

static std::mutex init_once_mtx;
static bool start_inited_once = false;
static bool is_inited_once = false;

void* c_wrap_init_only_once(void*)
{
    is_inited_once = true;
    return nullptr;
}

void* c_wrap_eval(void*)
{
    return nullptr;
}

void init_only_once()
{
    std::lock_guard<std::mutex> lck(init_once_mtx);
    if (!start_inited_once)
    {
        start_inited_once = true;
        scm_with_guile(c_wrap_init_only_once, nullptr);
    }
    while (!is_inited_once); // spin;
}

void threadedInit(int thread_id)
{
    start_cnt ++;
    while (hold) {} // spin

    init_only_once();
    for (int i = 0; i < 100; ++i)
    {
        scm_with_guile(c_wrap_eval, nullptr);
    }
}

void test_init_race()
{
    int n_threads = 120;
    start_cnt = 0;
    hold = true;

    std::vector<std::thread> thread_pool;
    for (int i = 0; i < n_threads; i++)
        thread_pool.push_back(std::thread(&threadedInit, i));

    while (start_cnt != n_threads) {}  // spin
    printf("Done creating %d threads\n", n_threads);
    hold = false;

    for (std::thread& t : thread_pool) t.join();
    printf("Done joining %d threads\n", n_threads);
}

int main()
{
    for (int k = 0; k < 10000; k++)
    {
        test_init_race();
        printf("------------------ done iteration %d\n", k);
    }
}

--
Jacek

Information forwarded to bug-guile <at> gnu.org:
bug#26711; Package guile. (Sun, 30 Apr 2017 22:01:01 GMT)

Message #8 received at 26711 <at> debbugs.gnu.org:

From: Linas Vepstas <linasvepstas <at> gmail.com>
To: 26711 <at> debbugs.gnu.org
Subject: Example1 is buggy
Date: Sun, 30 Apr 2017 16:59:34 -0500
Example1.cc has a work-around: main() needs to call scm_init_guile()
or scm_with_guile().  If this is done, the problem goes away.
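
A minimal sketch of the ordering this work-around relies on, written without
libguile so that it compiles on its own. The names run_after_main_thread_init
and demo_workers_seeing_init are hypothetical and are not part of Guile. With
libguile, the init callback would presumably wrap the scm_with_guile() or
scm_init_guile() call mentioned above, and the worker would be threadedInit
from Example 1; that mapping is an assumption, not something verified here.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not libguile API): run `init` once on the calling,
// long-lived thread, then run `work` on `n_threads` transient threads and
// join them all before returning.
void run_after_main_thread_init(const std::function<void()>& init,
                                const std::function<void(int)>& work,
                                int n_threads)
{
    assert(n_threads >= 0);

    // The first initialization happens here, in the thread that outlives
    // the workers, and never inside a transient worker thread.
    init();

    std::vector<std::thread> pool;
    for (int i = 0; i < n_threads; i++)
        pool.emplace_back(work, i);
    for (std::thread& t : pool)
        t.join();
}

// Stand-in demonstration: returns how many workers observed the
// initialization as already complete when they started running.
int demo_workers_seeing_init(int n_threads)
{
    std::atomic<bool> inited(false);
    std::atomic<int> seen(0);
    run_after_main_thread_init(
        [&] { inited = true; },            // with libguile: call scm_with_guile() here
        [&](int) { if (inited) seen++; },  // with libguile: the per-thread work
        n_threads);
    return seen;
}
```

Calling run_after_main_thread_init from main() keeps the first initializing
thread alive for the whole run, which is the property the work-around needs.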

The problem with example1 is that the first thread to initialize guile is
eventually destroyed. However, the first thread to call guile never ever
sets "needs_unregister" in libguile/threads.c and thus, bdwgc never finds
out that this thread no longer exists. Sooner or later, bdwgc touches this
non-existent thread, and crashes.

If it's OK to initialize guile for the first time ever in a transient
thread, then there's a bug in guile; else there's a bug in the example.

I'm now looking into example2.

--linas

This bug report was last modified 8 years and 130 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.