GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
View this message in rfc822 format
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Mon, 1 Sep 2025 15:40:37 -0400
> Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, Dmitry Gutov <dmitry <at> gutov.dev>
>
> > In particular, Lisp programs should be able to call
> > delete-process on a process without getting an error just because some
> > thread is currently calling select and happens to be monitoring that
> > process. To emit such an error would make programming with processes
> > and threads extremely difficult.
>
> This is a problem two orders of magnitude smaller than what we are
> trying to fix here. So my suggestion is to leave it for later.
> There's no special reason why Lisp programs "should be able" to delete
> a process locked to another thread. If we don't want to signal an
> error (though I don't see why not), we could perhaps do nothing
> silently, until we get to dealing with that later.
>
> Silently doing nothing is also an unacceptably confusing result, in my opinion.
Why? Ignoring invalid actions is not something new to Emacs.
> > > Hmm... I don't understand: why do we need to do this? The calls to
> > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
> > > set bits in the Available mask passed top pselect already avoid
> > > setting bits for descriptors on which some other thread waits. So
> > > pselect cannot possibly indicate any output ready on any descriptors
> > > belonging to other threads, and this loop you suggest to add:
> > >
> > >> + /* Save the fds we're currently waiting_thread for. */
> > >> + fd_set waiting_fds;
> > >> + FD_ZERO (&waiting_fds);
> > >> + int max_waiting_fds = 0;
> > >> + for (int fd = 0; fd <= max_desc; ++fd)
> > >> + {
> > >> + if (fd_callback_info[fd].waiting_thread == current_thread)
> > >> + {
> > >> + FD_SET (fd, &waiting_fds);
> > >> + max_waiting_fds = fd;
> > >> + }
> > >> + }
> > >
> > > should be unnecessary. Or what am I missing?
> >
> > That's exactly the problem. A file descriptor for which we set
> > waiting_thread can have delete_read_fd or delete_write_fd called on it
> > by another thread, or by a signal handler, while we're waiting in
> > pselect.
>
> We need to prevent that from happening, not to go along with it. But
> anyway, I don't see why you need another fd_set for this. Why not
> just look at the descriptors in the Available set?
>
> The Available set is cleared by the select call, and afterwards only contains descriptors which has events. In
> that case we could just look at the descriptors in Available and check waiting_thread for each of them.
Yes, and we don't care about descriptors that are not ready to be
read.
> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
> know what descriptors we originally set waiting_thread on.
In that case pselect returns a negative result, no? And if that
happens, we don't examine the descriptors at all, right? So why is
this situation a problem we need to handle with this additional fd_set?
> Though, I guess we could alternatively just assume that EBADF is always spurious.
Assume and do what?
> > BTW, the fact that this problematic case is also possible via signal
> > handlers (namely the SIGCHLD handler) is a separate bug introduced by
> > the refactoring which added fd_callback_info. It means that with a
> > certain interleaving, I think it should be possible to get buggy
> > behavior by just using processes without threads. It's quite rare
> > though and I haven't been able to figure out what exact path that would
> > cause a problem.
>
> Let's not attempt to solve too many problems at the same time. How to
> make SIGCHLD safer when threads are present is an outstanding issue,
> and the solution will IMO depend in whether signals are delivered to
> the main thread and whether or not the build uses the child signal
> pipe.
>
> You misread me: I said SIGCHLD is unsafe even without threads.
Please describe how.
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.