GNU bug report logs - #10225
Lock ordering mismatch

Previous Next

Package: guile;

Reported by: Andy Wingo <wingo <at> pobox.com>

Date: Mon, 5 Dec 2011 20:44:02 UTC

Severity: normal

Done: Andy Wingo <wingo <at> pobox.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Andy Wingo <wingo <at> pobox.com>
Subject: bug#10225: closed (Re: bug#10225: Lock ordering mismatch)
Date: Thu, 23 Feb 2017 18:57:01 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#10225: Lock ordering mismatch

which was filed against the guile package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 10225 <at> debbugs.gnu.org.

-- 
10225: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10225
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Andy Wingo <wingo <at> pobox.com>
To: 10225-done <at> debbugs.gnu.org
Cc: ludo <at> gnu.org
Subject: Re: bug#10225: Lock ordering mismatch
Date: Thu, 23 Feb 2017 19:56:25 +0100
> As seen in ccb80964cd7cd112e300c34d32f67125a6d6da9a, there’s a lock
> ordering mismatch between ‘do_thread exit’ and ‘fat_mutex_lock’
> wrt. ‘t->admin_mutex’ and ‘m->lock’.

Happily with the thread-safety rewrite in 2.1.5 this is fixed!

Andy

[Message part 3 (message/rfc822, inline)]
From: Andy Wingo <wingo <at> pobox.com>
To: bug-guile <bug-guile <at> gnu.org>
Cc: ludo <at> gnu.org
Subject: Lock ordering mismatch
Date: Mon, 05 Dec 2011 21:42:59 +0100
[Message part 4 (text/plain, inline)]
This message from Ludovic on 1 July indicates a problem in our threading
code that we need to fix.

Andy

[Message part 5 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: guile-devel <at> gnu.org
Subject: Lock ordering mismatch
Date: Fri, 01 Jul 2011 23:11:00 +0200
[Message part 6 (text/plain, inline)]
Hello,

As seen in ccb80964cd7cd112e300c34d32f67125a6d6da9a, there’s a lock
ordering mismatch between ‘do_thread exit’ and ‘fat_mutex_lock’
wrt. ‘t->admin_mutex’ and ‘m->lock’.

I thought this commit solved the problem, but now I think it doesn’t
because it leaves a small window during which a mutex could be held by a
thread without being part of its `mutexes' list, thereby violating the
invariant tested at line 667:

        /* Since MUTEX is in `t->mutexes', T must be its owner.  */
        assert (scm_is_eq (m->owner, t->handle));

So I reverted the patch.

(The situation isn’t better without the patch since
“while ./check-guile srfi-18.test threads.test ; do : ; done”
eventually hits the assertion failure.)

I tried the attached patch, which is inelegant and leads to deadlocks
with canceled threads (namely the “cancel succeeds” test in
threads.test.)

IOW I would welcome fresh ideas to approach the problem.  :-)

Thanks,
Ludo’.

[Message part 7 (text/x-patch, inline)]
	Modified libguile/threads.c
diff --git a/libguile/threads.c b/libguile/threads.c
index cbacfca..d537e0e 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1353,12 +1353,24 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
   fat_mutex *m = SCM_MUTEX_DATA (mutex);
 
   SCM new_owner = SCM_UNBNDP (owner) ? scm_current_thread() : owner;
+  scm_i_thread *t =
+    scm_is_true (new_owner) ? SCM_I_THREAD_DATA (new_owner) : NULL;
   SCM err = SCM_BOOL_F;
 
   struct timeval current_time;
 
-  scm_i_scm_pthread_mutex_lock (&m->lock);
+#define LOCK					\
+  if (t != NULL)				\
+    scm_i_pthread_mutex_lock (&t->admin_mutex);	\
+  scm_i_pthread_mutex_lock (&m->lock)
+
+#define UNLOCK					\
+  scm_i_pthread_mutex_unlock (&m->lock);	\
+  if (t != NULL)				\
+    scm_i_pthread_mutex_unlock (&t->admin_mutex)
 
+
+  LOCK;
   while (1)
     {
       if (m->level == 0)
@@ -1367,22 +1379,12 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 	  m->level++;
 
 	  if (SCM_I_IS_THREAD (new_owner))
-	    {
-	      scm_i_thread *t = SCM_I_THREAD_DATA (new_owner);
-
-	      scm_i_pthread_mutex_unlock (&m->lock);
-	      scm_i_pthread_mutex_lock (&t->admin_mutex);
-
-	      /* Only keep a weak reference to MUTEX so that it's not
-		 retained when not referenced elsewhere (bug #27450).
-		 The weak pair itself is eventually removed when MUTEX
-		 is unlocked.  Note that `t->mutexes' lists mutexes
-		 currently held by T, so it should be small.  */
-	      t->mutexes = scm_weak_car_pair (mutex, t->mutexes);
-
-	      scm_i_pthread_mutex_unlock (&t->admin_mutex);
-	      scm_i_pthread_mutex_lock (&m->lock);
-	    }
+	    /* Only keep a weak reference to MUTEX so that it's not
+	       retained when not referenced elsewhere (bug #27450).
+	       The weak pair itself is eventually removed when MUTEX
+	       is unlocked.  Note that `t->mutexes' lists mutexes
+	       currently held by T, so it should be small.  */
+	    t->mutexes = scm_weak_car_pair (mutex, t->mutexes);
 	  *ret = 1;
 	  break;
 	}
@@ -1425,13 +1427,18 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 		}
 	    }
 	  block_self (m->waiting, mutex, &m->lock, timeout);
-	  scm_i_pthread_mutex_unlock (&m->lock);
-	  SCM_TICK;
-	  scm_i_scm_pthread_mutex_lock (&m->lock);
+
+	  /* UNLOCK; */
+	  /* SCM_TICK; */
+	  /* LOCK; */
 	}
     }
-  scm_i_pthread_mutex_unlock (&m->lock);
+
+  UNLOCK;
+
   return err;
+#undef LOCK
+#undef UNLOCK
 }
 
 SCM scm_lock_mutex (SCM mx)

[Message part 8 (text/plain, inline)]

-- 
http://wingolog.org/

This bug report was last modified 8 years and 90 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.