GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
View this message in rfc822 format
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
> Date: Mon, 01 Sep 2025 12:18:37 -0400
>
> > You assume that the waiting_thread field will be cleared when a thread
> > dies, but that is not guaranteed. Why isn't it better to be defensive
> > and make sure the thread is still alive? If you are right, that test
> > will always be true, but what if you are wrong in some rare cases?
>
> Fair enough; the patch has changed a lot so this
> other_thread_is_waiting_for function doesn't exist anymore, but if I
> re-add it or similar checks, I will check that the thread is alive.
Thanks.
> > What I had in mind was neither delete it nor deactivate it. IOW, the
> > loop which goes through the list of processes should only pay
> > attention to processes locked to this thread and those that are not
> > locked to any live thread. It should leave the other processes to
> > when their threads call status_notify.
>
> There are too many ways in which we delete or deactivate processes to do
> that.
We should audit all of them and prevent that from happening when the
process is locked to another thread.
> In particular, Lisp programs should be able to call
> delete-process on a process without getting an error just because some
> thread is currently calling select and happens to be monitoring that
> process. To emit such an error would make programming with processes
> and threads extremely difficult.
This is a problem two orders of magnitude smaller than what we are
trying to fix here. So my suggestion is to leave it for later.
There's no special reason why Lisp programs "should be able" to delete
a process locked to another thread. If we don't want to signal an
error (though I don't see why not), we could perhaps do nothing
silently, until we get to dealing with that later.
For now, sabotaging threads that wait for some sub-process is a much
more grave problem, and we must fix it if we want threads and
processes to live in peace in Emacs.
> > Hmm... I don't understand: why do we need to do this? The calls to
> > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
> > set bits in the Available mask passed top pselect already avoid
> > setting bits for descriptors on which some other thread waits. So
> > pselect cannot possibly indicate any output ready on any descriptors
> > belonging to other threads, and this loop you suggest to add:
> >
> >> + /* Save the fds we're currently waiting_thread for. */
> >> + fd_set waiting_fds;
> >> + FD_ZERO (&waiting_fds);
> >> + int max_waiting_fds = 0;
> >> + for (int fd = 0; fd <= max_desc; ++fd)
> >> + {
> >> + if (fd_callback_info[fd].waiting_thread == current_thread)
> >> + {
> >> + FD_SET (fd, &waiting_fds);
> >> + max_waiting_fds = fd;
> >> + }
> >> + }
> >
> > should be unnecessary. Or what am I missing?
>
> That's exactly the problem. A file descriptor for which we set
> waiting_thread can have delete_read_fd or delete_write_fd called on it
> by another thread, or by a signal handler, while we're waiting in
> pselect.
We need to prevent that from happening, not to go along with it. But
anyway, I don't see why you need another fd_set for this. Why not
just look at the descriptors in the Available set?
> BTW, the fact that this problematic case is also possible via signal
> handlers (namely the SIGCHLD handler) is a separate bug introduced by
> the refactoring which added fd_callback_info. It means that with a
> certain interleaving, I think it should be possible to get buggy
> behavior by just using processes without threads. It's quite rare
> though and I haven't been able to figure out what exact path that would
> cause a problem.
Let's not attempt to solve too many problems at the same time. How to
make SIGCHLD safer when threads are present is an outstanding issue,
and the solution will IMO depend in whether signals are delivered to
the main thread and whether or not the build uses the child signal
pipe.
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.