GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
View this message in rfc822 format
Eli Zaretskii <eliz <at> gnu.org> writes:
>> > > Hmm... I don't understand: why do we need to do this? The calls to
>> > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
>> > > set bits in the Available mask passed top pselect already avoid
>> > > setting bits for descriptors on which some other thread waits. So
>> > > pselect cannot possibly indicate any output ready on any descriptors
>> > > belonging to other threads, and this loop you suggest to add:
>> > >
>> > >> + /* Save the fds we're currently waiting_thread for. */
>> > >> + fd_set waiting_fds;
>> > >> + FD_ZERO (&waiting_fds);
>> > >> + int max_waiting_fds = 0;
>> > >> + for (int fd = 0; fd <= max_desc; ++fd)
>> > >> + {
>> > >> + if (fd_callback_info[fd].waiting_thread == current_thread)
>> > >> + {
>> > >> + FD_SET (fd, &waiting_fds);
>> > >> + max_waiting_fds = fd;
>> > >> + }
>> > >> + }
>> > >
>> > > should be unnecessary. Or what am I missing?
>> >
>> > That's exactly the problem. A file descriptor for which we set
>> > waiting_thread can have delete_read_fd or delete_write_fd called on it
>> > by another thread, or by a signal handler, while we're waiting in
>> > pselect.
>>
>> We need to prevent that from happening, not to go along with it. But
>> anyway, I don't see why you need another fd_set for this. Why not
>> just look at the descriptors in the Available set?
>>
>> The Available set is cleared by the select call, and afterwards only contains descriptors which has events. In
>> that case we could just look at the descriptors in Available and check waiting_thread for each of them.
>
> Yes, and we don't care about descriptors that are not ready to be
> read.
Right.
>> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
>> know what descriptors we originally set waiting_thread on.
>
> In that case pselect returns a negative result, no? And if that
> happens, we don't examine the descriptors at all, right? So why is
> this situation a problem we need to handle with this additional fd_set?
Because we currently call emacs_abort when we get EBADF:
if (nfds < 0)
{
if (xerrno == EINTR)
no_avail = 1;
else if (xerrno == EBADF)
emacs_abort ();
else
report_file_errno ("Failed select", Qnil, xerrno);
}
>> Though, I guess we could alternatively just assume that EBADF is always spurious.
>
> Assume and do what?
Follow one of the other two branches in the above snippet of code:
- Treat EBADF the same way we do EINTR: there were no events on any fds, so
we continue running wait_reading_process_output.
- Treat EBADF the same way we treat other errno values: signal an error.
That would at least be better than calling emacs_abort.
>> > BTW, the fact that this problematic case is also possible via signal
>> > handlers (namely the SIGCHLD handler) is a separate bug introduced by
>> > the refactoring which added fd_callback_info. It means that with a
>> > certain interleaving, I think it should be possible to get buggy
>> > behavior by just using processes without threads. It's quite rare
>> > though and I haven't been able to figure out what exact path that would
>> > cause a problem.
>>
>> Let's not attempt to solve too many problems at the same time. How to
>> make SIGCHLD safer when threads are present is an outstanding issue,
>> and the solution will IMO depend in whether signals are delivered to
>> the main thread and whether or not the build uses the child signal
>> pipe.
>>
>> You misread me: I said SIGCHLD is unsafe even without threads.
>
> Please describe how.
1. Call wait_reading_process_output
2. wait_reading_process_output calls compute_input_wait_mask, including
SOME-FD in the mask.
3. Receive SIGCHLD, calling handle_child_signal
4. handle_child_signal calls delete_read_fd for SOME-FD
5. Signal handler finishes
6. wait_reading_process_output calls select and returns events for
SOME-FD, even though SOME-FD has been deleted from fd_callback_info
and the process associated with it has died. It seems possible that
this can cause an issue in some code path.
This couldn't happen before the refactoring which added
fd_callback_info, because handle_child_signal directly mutated the
fd_set which is passed to pselect.
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.