GNU bug report logs -
#21694
'clone' syscall binding unreliable
Reported by: ludo <at> gnu.org (Ludovic Courtès)
Date: Fri, 16 Oct 2015 20:41:02 UTC
Severity: normal
Done: ludo <at> gnu.org (Ludovic Courtès)
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Wed, 28 Oct 2015 15:39:49 +0100
with message-id <87vb9r83be.fsf <at> gnu.org>
and subject line Re: bug#21694: 'clone' syscall binding unreliable
has caused the debbugs.gnu.org bug report #21694,
regarding 'clone' syscall binding unreliable
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
21694: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21694
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
I’m reporting the problem and (hopefully) the solution, but I think we’d
better double-check this.
The problem: Running the test below in a loop sometimes gets a SIGSEGV
in the child process (on x86_64, libc 2.22.)
--8<---------------cut here---------------start------------->8---
(use-modules (guix build syscalls) (ice-9 match))

(match (clone (logior CLONE_NEWUSER
                      CLONE_CHILD_SETTID
                      CLONE_CHILD_CLEARTID
                      SIGCHLD))
  (0
   (throw 'x))                          ;XXX: sometimes segfaults
  (pid
   (match (waitpid pid)
     ((_ . status)
      (pk 'status status)
      (exit (not (status:term-sig status)))))))
--8<---------------cut here---------------end--------------->8---
Looking at (guix build syscalls) though, I see an ABI mismatch between
our definition and the actual ‘syscall’ C function, and between our
‘clone’ definition and the actual C function.
This leads to the attached patch, which also fixes the above problem for me.
[Message part 4 (text/x-patch, inline)]
diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 80b9d00..f931f8d 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -322,10 +322,16 @@ string TMPL and return its file name. TMPL must end with 'XXXXXX'."
 (define CLONE_NEWNET #x40000000)
 
 ;; The libc interface to sys_clone is not useful for Scheme programs, so the
-;; low-level system call is wrapped instead.
+;; low-level system call is wrapped instead.  The 'syscall' function is
+;; declared in <unistd.h> as a variadic function; in practice, it expects 6
+;; pointer-sized arguments, as shown in, e.g., x86_64/syscall.S.
 (define clone
   (let* ((ptr (dynamic-func "syscall" (dynamic-link)))
-         (proc (pointer->procedure int ptr (list int int '*)))
+         (proc (pointer->procedure long ptr
+                                   (list long          ;sysno
+                                         unsigned-long ;flags
+                                         '* '* '*
+                                         '*)))
          ;; TODO: Don't do this.
          (syscall-id (match (utsname:machine (uname))
                        ("i686" 120)
@@ -336,7 +342,10 @@ string TMPL and return its file name. TMPL must end with 'XXXXXX'."
     "Create a new child process by duplicating the current parent process.
 Unlike the fork system call, clone accepts FLAGS that specify which resources
 are shared between the parent and child processes."
-      (let ((ret (proc syscall-id flags %null-pointer))
+      (let ((ret (proc syscall-id flags
+                       %null-pointer               ;child stack
+                       %null-pointer %null-pointer ;ptid & ctid
+                       %null-pointer))             ;unused
             (err (errno)))
         (if (= ret -1)
             (throw 'system-error "clone" "~d: ~A"
[Message part 5 (text/plain, inline)]
Could you test this patch?
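
For comparison, here is a rough C sketch of what the patched binding does: it
calls the variadic 'syscall' function with every argument slot filled in
explicitly, so no slot is left holding whatever happened to be in a register.
The helper name 'clone_and_wait' is made up for illustration, only SIGCHLD is
passed as a flag (no CLONE_NEWUSER, so it runs without special privileges),
and the argument order assumes an architecture where 'flags' comes first,
such as x86_64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork-like raw 'clone' with all argument slots
   passed explicitly.  The child exits with EXIT_CODE; the parent returns
   the child's exit status, or -1 on error.  */
static int
clone_and_wait (int exit_code)
{
  assert (exit_code >= 0 && exit_code <= 255);

  long pid = syscall (SYS_clone,
                      (unsigned long) SIGCHLD, /* flags */
                      NULL,                    /* child stack */
                      NULL, NULL,              /* ptid & ctid */
                      NULL);                   /* unused (tls) */
  if (pid == -1)
    return -1;

  if (pid == 0)
    /* Child: leave immediately, without running atexit handlers.  */
    _exit (exit_code);

  int status;
  if (waitpid ((pid_t) pid, &status, 0) != (pid_t) pid)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling 'clone_and_wait (42)' from a test program should return 42 when the
child was created and reaped normally.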
Now, there remains the question of CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID. Since we’re passing NULL for ‘ctid’, I expect
that these flags have no effect at all.
Conversely, libc uses these flags to update the thread ID in the child
process (x86_64/arch-fork.h):
--8<---------------cut here---------------start------------->8---
#define ARCH_FORK() \
  INLINE_SYSCALL (clone, 4, \
                  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0, \
                  NULL, &THREAD_SELF->tid)
--8<---------------cut here---------------end--------------->8---
This is certainly useful, but we’d have trouble doing it from the FFI…
It may be that this is fine if the process doesn’t use threads.
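
To illustrate the role of ‘ctid’, here is a small C sketch (not libc’s code)
that passes the same flags as ARCH_FORK and checks, in the child, whether the
kernel stored the child’s thread ID at the given address.  The helper names
‘raw_clone’ and ‘child_sees_tid’ are made up for this example.  The position
of ‘ctid’ among the raw system call arguments is architecture-dependent; the
fallback branch assumes the ordering documented in clone(2) for i386, ARM,
and AArch64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical wrapper that hides the per-architecture argument order.  */
static long
raw_clone (unsigned long flags, pid_t *ctid)
{
#if defined (__x86_64__)
  /* x86_64: flags, stack, ptid, ctid, tls.  */
  return syscall (SYS_clone, flags, NULL, NULL, ctid, NULL);
#else
  /* Assumed order elsewhere: flags, stack, ptid, tls, ctid.  */
  return syscall (SYS_clone, flags, NULL, NULL, NULL, ctid);
#endif
}

/* Return 1 if the child found its own thread ID stored in 'tid', 0 if
   'tid' was left untouched, and -1 on error.  When PASS_CTID is zero,
   NULL is passed for 'ctid', as in the Scheme binding.  */
static int
child_sees_tid (int pass_ctid)
{
  pid_t tid = 0;  /* after clone, the child has its own copy */
  long pid = raw_clone (CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD,
                        pass_ctid ? &tid : NULL);
  if (pid == -1)
    return -1;

  if (pid == 0)
    /* Child: compare the stored value with our actual ID.  */
    _exit (tid == (pid_t) syscall (SYS_getpid) ? 1 : 0);

  int status;
  if (waitpid ((pid_t) pid, &status, 0) != (pid_t) pid)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

With a valid pointer the child should see its own ID in ‘tid’; with NULL the
variable should stay at 0, which matches the expectation above that the flags
have no effect in that case.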
Ludo’.
[Message part 6 (message/rfc822, inline)]
ludo <at> gnu.org (Ludovic Courtès) skribis:
> "Thompson, David" <dthompson2 <at> worcester.edu> skribis:
>
>> On Fri, Oct 16, 2015 at 4:39 PM, Ludovic Courtès <ludo <at> gnu.org> wrote:
[...]
>>> Now, there remains the question of CLONE_CHILD_SETTID and
>>> CLONE_CHILD_CLEARTID. Since we’re passing NULL for ‘ctid’, I expect
>>> that these flags have no effect at all.
>>
>> I added those flags in commit ee78d02 because they solved a real issue
>> I ran into. Adding those flags made 'clone' look like a
>> 'primitive-fork' call when examined with strace.
>
> Could you check whether removing these flags makes a difference now?
I removed them in commit after confirming that it affects neither the
test suite nor ‘guix system environment’ (on x86_64, with Linux-libre
4.2.3-gnu.)
Thanks,
Ludo’.
This bug report was last modified 9 years and 265 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.