GNU bug report logs - #61095
possible misuse of posix_spawn API on non-linux OSes


Package: guile

Reported by: Omar Polo <op <at> omarpolo.com>

Date: Fri, 27 Jan 2023 11:53:01 UTC

Severity: normal

Tags: patch

Merged with 61079

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Omar Polo <op <at> omarpolo.com>
Subject: bug#61095: closed (Re: bug#61095: possible misuse of posix_spawn
 API on non-linux OSes)
Date: Sun, 02 Apr 2023 13:45:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#61095: possible misuse of posix_spawn API on non-linux OSes

which was filed against the guile package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 61095 <at> debbugs.gnu.org.

-- 
61095: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=61095
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: Omar Polo <op <at> omarpolo.com>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Andrew Whatson <whatson <at> tailcall.au>,
 61095-done <at> debbugs.gnu.org
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Sun, 02 Apr 2023 15:44:01 +0200
Hi!

Omar Polo <op <at> omarpolo.com> skribis:

> On 2023/03/30 22:21:28 +0200, Josselin Poiret <dev <at> jpoiret.xyz> wrote:
>> Hi Ludo,
>> 
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>> 
>> > Coming next is an updated patch series addressing this as proposed
>> > above.  Let me know what y’all think!
>> >
>> > I tested the ‘posix_spawn_file_actions_addclosefrom_np’ path by building in:
>> >
>> >   guix time-machine --branch=core-updates -- shell -CP -D -f guix.scm
>> 
>> I didn't test, but this LGTM!  Maybe someone on OpenBSD could test this
>> patchset?
>
>     % gmake check
>     <snip />
>     gmake[5]: Entering directory '/home/op/w/guile/test-suite/standalone'
>     PASS: test-system-cmds
>
> it seems to work on OpenBSD 7.3 :)

Awesome!  Pushed as 9cc85a4f52147fcdaa4c52a62bcc87bdb267d0a9.

> but note that our libc doesn't have posix_spawn_file_actions_addclosefrom_np,
> so this is using the "racy" code path.

Yeah, not great.  :-/  I hope that function will be adopted by other
libcs, especially since ‘closefrom’ is already available.

> Just out of curiosity, as it's outside the scope of the bug, what's the
> reason posix_spawn was used instead of a more classic fork() +
> closefrom()?

There’s a long discussion at:

  https://issues.guix.gnu.org/52835

Essentially, ‘fork’ is unusable in multi-threaded context, in addition
to being inefficient.

Thanks,
Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Omar Polo <op <at> omarpolo.com>
To: bug-guile <at> gnu.org
Subject: possible misuse of posix_spawn API on non-linux OSes
Date: Fri, 27 Jan 2023 12:51:32 +0100
Hello,

I've noticed that test-system-cmds fails on OpenBSD-CURRENT while
testing the update to guile 3.0.9:

    test-system-cmds: system* exit status was 127 rather than 42
    FAIL: test-system-cmds

Here's an excerpt of the ktrace of the child process while executing
that specific test: (the first fork() is the one implicitly done by
posix_spawn(3))

  5590 guile RET   fork 0
  [...]
  5590 guile CALL  dup2(0,3)
  5590 guile RET   dup2 3
  5590 guile CALL  dup2(1,4)
  5590 guile RET   dup2 4
  5590 guile CALL  dup2(2,5)
  5590 guile RET   dup2 5
  5590 guile CALL  dup2(3,0)
  5590 guile RET   dup2 0
  5590 guile CALL  dup2(4,1)
  5590 guile RET   dup2 1
  5590 guile CALL  dup2(5,2)
  5590 guile RET   dup2 2
  5590 guile CALL  close(1023)
  5590 guile RET   close -1 errno 9 Bad file descriptor
  5590 guile CALL  kbind(0x7f7ffffd51f8,24,0x2b5c5ced59893fa9)
  5590 guile RET   kbind 0
  5590 guile CALL  exit(127)

(if you prefer I can provide a full ktrace of guile executing that
test case)

My interpretation is that the sequence of dup2(2) calls comes from
posix_spawn_file_actions_adddup2 in do_spawn, while the strange
close(1023) comes from close_inherited_fds_slow.  That file descriptor
is not open, so close(2) fails with EBADF and the posix_spawn machinery
exits prematurely.  My current RLIMIT_NOFILE is 1024, so the number
would make sense.

On OpenBSD I've tried to use the following patch to work around the
issue:

[[[
Index: libguile/posix.c
--- libguile/posix.c.orig
+++ libguile/posix.c
@@ -1325,6 +1325,7 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 static void
 close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
 {
+  max_fd = getdtablecount();
   while (--max_fd > 2)
     posix_spawn_file_actions_addclose (actions, max_fd);
 }
]]]

getdtablecount(2) returns the number of file descriptors currently open
by the process.  Unfortunately it doesn't seem to be portable.  (Well,
tbf, /proc/self/fd is not portable either.)
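
A side note on using a descriptor count as the loop bound: the number of
open descriptors is not an upper bound on the descriptor numbers in use,
because the table can have holes.  The sketch below illustrates this in
portable C.  The helpers count_open_fds (a stand-in for getdtablecount(2)
that scans up to an assumed limit), highest_open_fd and sparse_table_demo
are hypothetical names introduced only for this illustration; they are
not part of Guile or of any libc.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical portable stand-in for getdtablecount(2): count the
   descriptors in [0, limit) that are currently open.  */
static int
count_open_fds (int limit)
{
  int n = 0;

  assert (limit > 0);
  for (int fd = 0; fd < limit; fd++)
    if (fcntl (fd, F_GETFD) != -1)
      n++;
  return n;
}

/* Hypothetical helper: highest open descriptor in [0, limit), or -1.  */
static int
highest_open_fd (int limit)
{
  for (int fd = limit - 1; fd >= 0; fd--)
    if (fcntl (fd, F_GETFD) != -1)
      return fd;
  return -1;
}

/* Open two descriptors, close the lower one to leave a hole, then
   compare the count with the highest descriptor number.  Returns 1 if
   the highest number is >= the count, 0 if not, -1 on setup failure.
   The limit of 1024 is an assumption matching the RLIMIT_NOFILE
   mentioned in the report.  */
static int
sparse_table_demo (void)
{
  int a = open ("/dev/null", O_RDONLY);
  int b = open ("/dev/null", O_RDONLY);

  if (a == -1 || b == -1)
    return -1;
  close (a);                    /* leaves a hole below B */

  int count = count_open_fds (1024);
  int highest = highest_open_fd (1024);

  close (b);
  return highest >= count;
}
```

With a hole in the table, a loop of the form "while (--max_fd > 2)"
that starts from the count never reaches the highest descriptor, so
that descriptor would be left open in the child.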

However, while this pleases the system* test, it breaks the pipe
tests:

    Running popen.test
    FAIL: popen.test: open-input-pipe: echo hello
    FAIL: popen.test: pipeline - arguments: (expected-value ("HELLO WORLD\n" (0 0)) actual-value ("" (127 0)))

The reason seems to be similar:

 74865 guile    CALL  dup2(7,3)
 74865 guile    RET   dup2 3
 74865 guile    CALL  dup2(10,4)
 74865 guile    RET   dup2 4
 74865 guile    CALL  dup2(2,5)
 74865 guile    RET   dup2 5
 74865 guile    CALL  dup2(3,0)
 74865 guile    RET   dup2 0
 74865 guile    CALL  dup2(4,1)
 74865 guile    RET   dup2 1
 74865 guile    CALL  dup2(5,2)
 74865 guile    RET   dup2 2
 74865 guile    CALL  close(8)
 74865 guile    RET   close -1 errno 9 Bad file descriptor
 74865 guile    CALL  kbind(0x7f7ffffcfa88,24,0x2125923bdf2ca9e)
 74865 guile    RET   kbind 0
 74865 guile    CALL  exit(127)

I guess it's trying to close the fd of the pipe that was closed.

I'm not sure what to do from here; I'm not used to the posix_spawn_*
APIs.  I'm happy to help by testing diffs or by providing more info if
needed.
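
One possible direction, sketched here under assumptions and not
necessarily what the committed fix does: queue a close action only for
descriptors that are open in the parent when the actions are built, so
that close(2) in the child has no reason to fail with EBADF.  The
helper names add_close_for_open_fds and spawn_exit_42 are hypothetical,
and the /bin/sh path is an assumption.  The approach stays racy:
another thread may open or close a descriptor between the fcntl check
and the spawn.

```c
#include <assert.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: for each descriptor in (2, max_fd) that is open
   right now, queue a close action.  Returns 0 or an error number.  */
static int
add_close_for_open_fds (posix_spawn_file_actions_t *actions, int max_fd)
{
  assert (actions != NULL);
  while (--max_fd > 2)
    {
      if (fcntl (max_fd, F_GETFD) == -1)
        continue;               /* not open: nothing to close */

      int err = posix_spawn_file_actions_addclose (actions, max_fd);
      if (err != 0)
        return err;
    }
  return 0;
}

/* Hypothetical demo: run `sh -c "exit 42"' with the close actions
   above and return the child's exit status, or -1 on failure.  */
static int
spawn_exit_42 (int max_fd)
{
  posix_spawn_file_actions_t actions;
  char *argv[] = { "sh", "-c", "exit 42", NULL };
  pid_t pid;
  int status;

  if (posix_spawn_file_actions_init (&actions) != 0)
    return -1;

  int err = add_close_for_open_fds (&actions, max_fd);
  if (err == 0)
    err = posix_spawn (&pid, "/bin/sh", &actions, NULL, argv, environ);
  posix_spawn_file_actions_destroy (&actions);
  if (err != 0)
    return -1;

  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

Calling spawn_exit_42 (1024) mirrors the RLIMIT_NOFILE of 1024 from the
report.  A status of 127 instead of 42 would point to a file action
failing in the child, as in the traces above.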


Thanks,

Omar Polo



This bug report was last modified 2 years and 105 days ago.
