GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

Full log


View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
Subject: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads
Date: Tue, 02 Sep 2025 14:18:31 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Mon, 1 Sep 2025 15:40:37 -0400
> Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, Dmitry Gutov <dmitry <at> gutov.dev>
> 
>  > In particular, Lisp programs should be able to call
>  > delete-process on a process without getting an error just because some
>  > thread is currently calling select and happens to be monitoring that
>  > process.  To emit such an error would make programming with processes
>  > and threads extremely difficult.
> 
>  This is a problem two orders of magnitude smaller than what we are
>  trying to fix here.  So my suggestion is to leave it for later.
>  There's no special reason why Lisp programs "should be able" to delete
>  a process locked to another thread.  If we don't want to signal an
>  error (though I don't see why not), we could perhaps do nothing
>  silently, until we get to dealing with that later.
> 
> Silently doing nothing is also an unacceptably confusing result, in my opinion.

Why?  Ignoring invalid actions is not something new to Emacs.

>  > > Hmm... I don't understand: why do we need to do this?  The calls to
>  > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
>  > > set bits in the Available mask passed top pselect already avoid
>  > > setting bits for descriptors on which some other thread waits.  So
>  > > pselect cannot possibly indicate any output ready on any descriptors
>  > > belonging to other threads, and this loop you suggest to add:
>  > >
>  > >> +      /* Save the fds we're currently waiting_thread for.  */
>  > >> +      fd_set waiting_fds;
>  > >> +      FD_ZERO (&waiting_fds);
>  > >> +      int max_waiting_fds = 0;
>  > >> +      for (int fd = 0; fd <= max_desc; ++fd)
>  > >> +        {
>  > >> +    if (fd_callback_info[fd].waiting_thread == current_thread)
>  > >> +      {
>  > >> +        FD_SET (fd, &waiting_fds);
>  > >> +        max_waiting_fds = fd;
>  > >> +      }
>  > >> +  }
>  > >
>  > > should be unnecessary.  Or what am I missing?
>  > 
>  > That's exactly the problem.  A file descriptor for which we set
>  > waiting_thread can have delete_read_fd or delete_write_fd called on it
>  > by another thread, or by a signal handler, while we're waiting in
>  > pselect.
> 
>  We need to prevent that from happening, not to go along with it.  But
>  anyway, I don't see why you need another fd_set for this.  Why not
>  just look at the descriptors in the Available set?
> 
> The Available set is cleared by the select call, and afterwards only contains descriptors which has events.  In
> that case we could just look at the descriptors in Available and check waiting_thread for each of them.

Yes, and we don't care about descriptors that are not ready to be
read.

> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
> know what descriptors we originally set waiting_thread on.

In that case pselect returns a negative result, no?  And if that
happens, we don't examine the descriptors at all, right?  So why is
this situation a problem we need to handle with this additional fd_set?

> Though, I guess we could alternatively just assume that EBADF is always spurious.

Assume and do what?

>  > BTW, the fact that this problematic case is also possible via signal
>  > handlers (namely the SIGCHLD handler) is a separate bug introduced by
>  > the refactoring which added fd_callback_info.  It means that with a
>  > certain interleaving, I think it should be possible to get buggy
>  > behavior by just using processes without threads.  It's quite rare
>  > though and I haven't been able to figure out what exact path that would
>  > cause a problem.
> 
>  Let's not attempt to solve too many problems at the same time.  How to
>  make SIGCHLD safer when threads are present is an outstanding issue,
>  and the solution will IMO depend in whether signals are delivered to
>  the main thread and whether or not the build uses the child signal
>  pipe.
> 
> You misread me: I said SIGCHLD is unsafe even without threads.

Please describe how.




This bug report was last modified 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.