GNU bug report logs - #62334
Guile's "sleep pipe" can leak into processes created by 'spawn'

Previous Next

Package: guix;

Reported by: Greg Hogan <code <at> greghogan.com>

Date: Tue, 21 Mar 2023 14:18:02 UTC

Severity: important

Merged with 62765, 63024

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Ludovic Courtès <ludo <at> gnu.org>
To: Greg Hogan <code <at> greghogan.com>
Cc: 63024 <at> debbugs.gnu.org, 62334 <at> debbugs.gnu.org, Josselin Poiret <dev <at> jpoiret.xyz>, Simon Tournier <zimon.toutoune <at> gmail.com>
Subject: bug#62334: bug#63024: Crash during `guix import pypi -r'
Date: Tue, 02 May 2023 23:47:24 +0200
Hello!

Greg Hogan <code <at> greghogan.com> skribis:

> Starting download of /tmp/guix-file.xReGb4
>>From https://web.archive.org/web/20230321140958/https://files.pythonhosted.org/packages/49/01/21337857631a97ab551bb2e3b5691ed6b1b9586011aa6a5355b9694a37fa/demon-2.0.6.tar.gz...
> In procedure connect*: Connection timed out

I can very much reproduce the bug with “guix import pypi -r cdlib” on
39ba8a10971f15264966823e8696d63c2995df86 and thereabouts.  Turns out
it’s a fun story!  (Josselin, I’m sure you’ll love it.)

First, the person who wrote ‘connect*’ *cough* hadn’t carefully read the
Guile manual, which reads (info "(guile) Ports and File Descriptors"):

  Note that ‘select’ may return early for other reasons, for example due
  to pending interrupts.

Clearly, ‘select’ is returning early, so we should check that and loop.
So far, so good.

Now, why is ‘select’ returning immediately?  Because the underlying
select(2) call is passed an extra file descriptor, the “sleep pipe”, as
you can see in ‘scm_std_select’ in libguile, and it’s that file
descriptor, number 3, that’s active at each select(2) call:

--8<---------------cut here---------------start------------->8---
28825 23:06:04 pselect6(15, [3], [14], [14], {tv_sec=10, tv_nsec=0}, NULL) = 1 (in [3], left {tv_sec=9, tv_nsec=999991063}) <0.000079>
28825 23:06:04 read(3, "A", 1)          = 1 <0.000044>
--8<---------------cut here---------------end--------------->8---

Why so much activity on that “sleep pipe”, you ask?  Well, once we’ve
filtered the read(3, …) calls, a pattern appears:

--8<---------------cut here---------------start------------->8---
  14335:28825 23:06:04 read(3, "A", 1)          = 1 <0.000044>
  14344:28825 23:06:04 read(3, "r", 1)          = 1 <0.000121>
  14353:28825 23:06:04 read(3, "c", 1)          = 1 <0.000107>
  14362:28825 23:06:04 read(3, "h", 1)          = 1 <0.000040>
  14436:28825 23:06:04 read(3, "i", 1)          = 1 <0.000049>
  14445:28825 23:06:04 read(3, "v", 1)          = 1 <0.000029>
  14454:28825 23:06:04 read(3, "e", 1)          = 1 <0.000019>
  14463:28825 23:06:04 read(3, ":", 1)          = 1 <0.000017>
  14472:28825 23:06:04 read(3, " ", 1)          = 1 <0.000019>
  14481:28825 23:06:04 read(3, " ", 1)          = 1 <0.000020>
  14490:28825 23:06:04 read(3, "/", 1)          = 1 <0.000019>
  14499:28825 23:06:04 read(3, "t", 1)          = 1 <0.000015>
  14508:28825 23:06:04 read(3, "m", 1)          = 1 <0.000018>
  15001:28825 23:06:04 read(3, "p", 1)          = 1 <0.000015>
  15010:28825 23:06:04 read(3, "/", 1)          = 1 <0.000016>
  15019:28825 23:06:04 read(3, "g", 1)          = 1 <0.000016>
  15028:28825 23:06:04 read(3, "u", 1)          = 1 <0.000015>
  15037:28825 23:06:04 read(3, "i", 1)          = 1 <0.000015>
  15046:28825 23:06:04 read(3, "x", 1)          = 1 <0.000015>
  15055:28825 23:06:04 read(3, "-", 1)          = 1 <0.000015>
--8<---------------cut here---------------end--------------->8---

See?  Who’s chatting over our sleep pipe?

--8<---------------cut here---------------start------------->8---
   4806:28831 23:06:03 dup2(0, 3)               = 3 <0.000011>
   4807:28831 23:06:03 dup2(3, 0)               = 0 <0.000011>
   4808:28831 23:06:03 dup2(4, 1)               = 1 <0.000012>
   4809:28831 23:06:03 dup2(5, 2)               = 2 <0.000010>
   4810:28831 23:06:03 close(3)                 = 0 <0.000010>
   4811:28831 23:06:03 close(4)                 = 0 <0.000011>
   4812:28831 23:06:03 close(5)                 = 0 <0.000010>
   4813:28831 23:06:03 close(6)                 = 0 <0.000010>
   4814:28831 23:06:03 close(7)                 = 0 <0.000010>
   4815:28831 23:06:03 close(8)                 = 0 <0.000011>
   4816:28831 23:06:03 close(9)                 = 0 <0.000010>
   4817:28831 23:06:03 close(10)                = 0 <0.000010>
   4818:28831 23:06:03 close(11)                = 0 <0.000011>
   4819:28831 23:06:03 close(12)                = 0 <0.000010>
   4820:28831 23:06:03 close(13)                = 0 <0.000010>
   4821:28831 23:06:03 close(14)                = 0 <0.000011>
   4822:28831 23:06:03 close(15)                = -1 EBADF (Bad file descriptor) <0.000011>
   4823:28831 23:06:03 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1024, rlim_max=4*1024}) = 0 <0.000011>
   4824:28831 23:06:03 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000010>
   4825:28831 23:06:03 execve("/home/ludo/src/guix/scripts/unzip", ["unzip", "/tmp/guix-file.bmholm", "-d", "/tmp/guix-directory.Ok2DAA", "cdlib-0.2.6.dist-info/METADATA"], 0x18a7db0 /* 85 v
   […]
   4938:28831 23:06:03 write(1, "Archive:  /tmp/guix-file.bmholm\n", 32) = 32 <0.000018>
--8<---------------cut here---------------end--------------->8---

‘unzip’!  And all its friends invoked from (guix import pypi):

--8<---------------cut here---------------start------------->8---
        (call-with-temporary-directory
         (lambda (dir)
           (parameterize ((current-error-port (%make-void-port "rw+"))
                          (current-output-port (%make-void-port "rw+")))
             (if (string=? "zip" (file-extension source-url))
                 (invoke "unzip" archive "-d" dir)
                 (invoke "tar" "xf" archive "-C" dir)))
           …))
--8<---------------cut here---------------end--------------->8---

Looks like we shoudn’t dup(4, 1) in the child process, because 4 is the
other end of our sleep pipe.  :-)

The problem exists both in 3.0.9 and Guile ‘main’.  Looks related to the
file descriptor shuffling code in ‘do_spawn’.  WDYT, Josselin?

Ludo’.




This bug report was last modified 255 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.