GNU bug report logs - #21694
'clone' syscall binding unreliable

Package: guix;

Reported by: ludo <at> gnu.org (Ludovic Courtès)

Date: Fri, 16 Oct 2015 20:41:02 UTC

Severity: normal

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: ludo <at> gnu.org (Ludovic Courtès)
Subject: bug#21694: closed (Re: bug#21694: 'clone' syscall binding unreliable)
Date: Wed, 28 Oct 2015 14:40:05 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#21694: 'clone' syscall binding unreliable

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 21694 <at> debbugs.gnu.org.

-- 
21694: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21694
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: "Thompson\, David" <dthompson2 <at> worcester.edu>
Cc: 21694-done <at> debbugs.gnu.org, David Thompson <davet <at> gnu.org>
Subject: Re: bug#21694: 'clone' syscall binding unreliable
Date: Wed, 28 Oct 2015 15:39:49 +0100
ludo <at> gnu.org (Ludovic Courtès) wrote:

> "Thompson, David" <dthompson2 <at> worcester.edu> wrote:
>
>> On Fri, Oct 16, 2015 at 4:39 PM, Ludovic Courtès <ludo <at> gnu.org> wrote:

[...]

>>> Now, there remains the question of CLONE_CHILD_SETTID and
>>> CLONE_CHILD_CLEARTID.  Since we’re passing NULL for ‘ctid’, I expect
>>> that these flags have no effect at all.
>>
>> I added those flags in commit ee78d02 because they solved a real issue
>> I ran into.  Adding those flags made 'clone' look like a
>> 'primitive-fork' call when examined with strace.
>
> Could you check whether removing these flags makes a difference now?

I removed them in commit after confirming that doing so affects neither
the test suite nor ‘guix system environment’ (on x86_64, with
Linux-libre 4.2.3-gnu).

Thanks,
Ludo’.

[Message part 3 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: David Thompson <davet <at> gnu.org>
Cc: bug-guix <at> gnu.org
Subject: 'clone' syscall binding unreliable
Date: Fri, 16 Oct 2015 22:39:59 +0200
[Message part 4 (text/plain, inline)]
I’m reporting the problem and (hopefully) the solution, but I think we’d
better double-check this.

The problem: running the test below in a loop sometimes triggers a
SIGSEGV in the child process (on x86_64, libc 2.22).

--8<---------------cut here---------------start------------->8---
(use-modules (guix build syscalls) (ice-9 match))

(match (clone (logior CLONE_NEWUSER
                      CLONE_CHILD_SETTID
                      CLONE_CHILD_CLEARTID
                      SIGCHLD))
  (0
   (throw 'x))                                    ;XXX: sometimes segfaults
  (pid
   (match (waitpid pid)
     ((_ . status)
      (pk 'status status)
      (exit (not (status:term-sig status)))))))
--8<---------------cut here---------------end--------------->8---

Looking at (guix build syscalls), though, I see an ABI mismatch between
our binding and the actual ‘syscall’ C function, and between our
‘clone’ definition and the arguments the actual clone system call
expects.

This leads to the attached patch, which also fixes the above problem for me.

[Message part 5 (text/x-patch, inline)]
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 80b9d00..f931f8d 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -322,10 +322,16 @@ string TMPL and return its file name.  TMPL must end with 'XXXXXX'."
 (define CLONE_NEWNET         #x40000000)
 
 ;; The libc interface to sys_clone is not useful for Scheme programs, so the
-;; low-level system call is wrapped instead.
+;; low-level system call is wrapped instead.  The 'syscall' function is
+;; declared in <unistd.h> as a variadic function; in practice, it expects 6
+;; pointer-sized arguments, as shown in, e.g., x86_64/syscall.S.
 (define clone
   (let* ((ptr        (dynamic-func "syscall" (dynamic-link)))
-         (proc       (pointer->procedure int ptr (list int int '*)))
+         (proc       (pointer->procedure long ptr
+                                         (list long                   ;sysno
+                                               unsigned-long          ;flags
+                                               '* '* '*
+                                               '*)))
          ;; TODO: Don't do this.
          (syscall-id (match (utsname:machine (uname))
                        ("i686"   120)
@@ -336,7 +342,10 @@ string TMPL and return its file name.  TMPL must end with 'XXXXXX'."
       "Create a new child process by duplicating the current parent process.
 Unlike the fork system call, clone accepts FLAGS that specify which resources
 are shared between the parent and child processes."
-      (let ((ret (proc syscall-id flags %null-pointer))
+      (let ((ret (proc syscall-id flags
+                       %null-pointer               ;child stack
+                       %null-pointer %null-pointer ;ptid & ctid
+                       %null-pointer))             ;unused
             (err (errno)))
         (if (= ret -1)
             (throw 'system-error "clone" "~d: ~A"
[Message part 6 (text/plain, inline)]
Could you test this patch?

Now, there remains the question of CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID.  Since we’re passing NULL for ‘ctid’, I expect
that these flags have no effect at all.

Conversely, libc uses these flags to update the thread ID in the child
process (x86_64/arch-fork.h):

--8<---------------cut here---------------start------------->8---
#define ARCH_FORK() \
  INLINE_SYSCALL (clone, 4,                                                   \
                  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0,     \
                  NULL, &THREAD_SELF->tid)
--8<---------------cut here---------------end--------------->8---

This is certainly useful, but we’d have trouble doing it from the FFI…
It may be that this is fine if the process doesn’t use threads.

Ludo’.
