GNU bug report logs - #75658
Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’

Previous Next

Package: guix;

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Sat, 18 Jan 2025 22:09:01 UTC

Severity: important

Full log


View this message in rfc822 format

From: Ludovic Courtès <ludo <at> gnu.org>
To: Timothy Sample <samplet <at> ngyro.com>
Cc: 75658 <at> debbugs.gnu.org, Janneke Nieuwenhuizen <janneke <at> gnu.org>
Subject: bug#75658: Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’
Date: Sun, 16 Mar 2025 16:01:14 +0100
Hello Timothy,

Thanks for chiming in.

Timothy Sample <samplet <at> ngyro.com> skribis:

> And here’s a reproducer:
>
>   (let loop ()
>     (define fd #f)
>     (let ((P (open-input-file "/dev/null")))
>       ;; Does not change the revealed count of P.
>       (set! fd (fileno P)))
>     (let ((port (open-input-file "/dev/null")))
>       (dup->port port "r" fd)
>       (close-port port)
>       (loop)))
>
> This results in EBADF in seemingly exactly the same way.  (I had to run
> it a few times: sometimes it runs out of file descriptors first.)  This
> happens on bootstrap Guile (2.0.9) and modern Guile.

Nice reproducer; I fully agree with your analysis.

See3in that ‘install-current-ports!’ creates temporary ports (via ‘dup’)
for no reason since nobody captures their reference and they get GC’d
soon after, I rewrote it like this:

--8<---------------cut here---------------start------------->8---
(define (install-current-ports!)
  "Install all current ports into their usual file descriptors.  For
example, if @code{current-input-port} is a @code{file-port?}, make the
process file descriptor 0 refer to the file open for
@code{current-input-port}.  If any current port is a @code{port?} but
not a @code{file-port?}, its corresponding file descriptor will refer
to @file{/dev/null}."
  ;; XXX: Input/output ports?  Closing other FDs?
  (for-each (lambda (i)
              (gc)  ;to trigger bugs
              (let ((current-port (fd->current-port i)))
                (match (current-port)
                  ((? file-port? port)
                   (dup->fdes port i))
                  (#f #t))))
            (iota *fd-count*)))
--8<---------------cut here---------------end--------------->8---

But this illustrates another problem: in the child process, right before
‘execve’, the finalization thread may be restarted, in which case it
creates a new pipe.

In the example below, the finalization pipe is on FDs 9 and 7, but
‘install-current-ports!’ blindly dups to FD 7, thereby closing one end
of the finalization pipe that was just created:

--8<---------------cut here---------------start------------->8---
23647 pipe2([7, 9], O_CLOEXEC)          = 0
23647 rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
23647 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f84204b3990, parent_tid=0x7f84204b3990, exit_signal=0, stack=0x7f841fb24000, stack_size=0x98ef80, tls=0x7f84204b36c0} => {parent_tid=[23648]}, 88) = 23648
[…]
23647 write(9, "\0", 1)                 = 1
23648 <... read resumed>"\0", 1)        = 1
23648 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23648 read(7,  <unfinished ...>
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=35845839}) = 0
23647 dup2(12, 7)                       = 7
23647 fcntl(7, F_GETFL)                 = 0x8002 (flags O_RDWR|O_LARGEFILE)
23647 lseek(7, 0, SEEK_CUR)             = -1 ESPIPE (Illegal seek)
23647 dup2(12, 7)                       = 7
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=35899320}) = 0
23647 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23647 madvise(0x7f842207c000, 12288, MADV_DONTNEED) = 0
23647 write(9, "\0", 1)                 = 1
23648 <... read resumed>"\0", 1)        = 1
23648 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23648 read(7,  <unfinished ...>
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=39539830}) = 0
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=39555997}) = 0
23647 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23647 madvise(0x7f842207c000, 12288, MADV_DONTNEED) = 0
23647 madvise(0x7f8421d74000, 8192, MADV_DONTNEED) = 0
23647 write(9, "\0", 1)                 = -1 EPIPE (Broken pipe)
23647 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=23647, si_uid=1000} ---
--8<---------------cut here---------------end--------------->8---

After that dup2(12, 7) call, writing to the finalization pipe yields
SIGPIPE, which terminates the process (here it corresponds to a subshell
running ‘expr’).

Since we’re going to exec right after fork, we could turn off
finalization around ‘primitive-fork’ such that the child doesn’t attempt
to restart the finalization thread before exec.  The Shepherd has code
like this:

--8<---------------cut here---------------start------------->8---
(define %set-automatic-finalization-enabled?!
  ;; When using a statically-linked Guile, for instance in the initrd, we
  ;; cannot resolve this symbol, but most of the time we don't need it
  ;; anyway.  Thus, delay it.
  (let ((proc (delay
                (pointer->procedure int
                                    (dynamic-func
                                     "scm_set_automatic_finalization_enabled"
                                     (dynamic-link))
                                    (list int)))))
    (lambda (enabled?)
      "Switch on or off automatic finalization in a separate thread.
Turning finalization off shuts down the finalization thread as a side effect."
      (->bool ((force proc) (if enabled? 1 0))))))

(define-syntax-rule (without-automatic-finalization exp ...)
  "Turn off automatic finalization within the dynamic extent of EXP."
  (let ((enabled? #t))
    (dynamic-wind
      (lambda ()
        (set! enabled? (%set-automatic-finalization-enabled?! #f)))
      (lambda ()
        exp ...)
      (lambda ()
        (%set-automatic-finalization-enabled?! enabled?)))))
--8<---------------cut here---------------end--------------->8---

Problem is, we cannot use the FFI on the statically-linked Guile.

We could implement fork+exec in C, but we don’t have a C compiler at
this early bootstrap stage.

Thoughts?

Ludo’.




This bug report was last modified 35 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.