GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
Message #86 received at 79334 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
>> Date: Tue, 02 Sep 2025 09:38:15 -0400
>>
>> >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
>> >> know what descriptors we originally set waiting_thread on.
>> >
>> > In that case pselect returns a negative result, no? And if that
>> > happens, we don't examine the descriptors at all, right? So why is
>> > this situation a problem we need to handle with this additional fd_set?
>>
>> Because we currently call emacs_abort when we get EBADF:
>
> So, if the danger of getting EBADF due to threads-and-processes
> mishaps is real (I'm saying "if" because I don't think we had bug
> reports about that),
It is real. I have seen it plenty at my site. I described this issue
in my initial email. It is the issue I care about solving. This bug is
my report of the issue. Solving it is my highest priority.
> we should change how we treat EBADF in that case. Which should
> probably be a separate discussion anyway. In any case, the distinct
> return value of pselect in that case means that none of the code which
> handles other return values should be bothered about this possibility.
>
>> >> Though, I guess we could alternatively just assume that EBADF is always spurious.
>> >
>> > Assume and do what?
>>
>> Follow one of the other two branches in the above snippet of code:
>>
>> - Treat EBADF the same way we do EINTR: there were no events on any fds, so
>> we continue running wait_reading_process_output.
>>
>> - Treat EBADF the same way we treat other errno values: signal an error.
>> That would at least be better than calling emacs_abort.
>
> Let's discuss this issue separately, okay? We are already trying to
> cover too many different scenarios, which makes this discussion hard
> to follow and the probability of losing focus high.
It's not a separate issue, it's the entire point of this bug. So yes,
let's stay focused on this issue and not other issues.
Which of these two options do you think we should go with?
Or there are other options. For example, we could detect the cause of
the EBADF by remembering the original Available set before calling
pselect, and checking if the fd_callback_info entry for any of those
file descriptors was cleared.
>> >> You misread me: I said SIGCHLD is unsafe even without threads.
>> >
>> > Please describe how.
>>
>> 1. Call wait_reading_process_output
>> 2. wait_reading_process_output calls compute_input_wait_mask, including
>> SOME-FD in the mask.
>> 3. Receive SIGCHLD, calling handle_child_signal
>> 4. handle_child_signal calls delete_read_fd for SOME-FD
>> 5. Signal handler finishes
>> 6. wait_reading_process_output calls select and returns events for
>> SOME-FD, even though SOME-FD has been deleted from fd_callback_info
>> and the process associated with it has died. It seems possible that
>> this can cause an issue in some code path.
>>
>> This couldn't happen before the refactoring which added
>> fd_callback_info, because handle_child_signal directly mutated the
>> fd_set which is passed to pselect.
>
> Are you talking about systems that don't use the "child signal pipe"?
No. This is on GNU/Linux.
> Because if that pipe is in use, then the SIGCHLD handler writes to it,
> and that will cause pselect to return with that descriptor in
> Available, and we then handle the death of the process. In addition,
> delete_read_fd just clears the data from the fd_callback_info of the
> descriptor, and the loop that examines the bits in Available is AFAICT
> careful enough to test these fields of the callback info before doing
> anything because the descriptor was returned by pselect as ready. So
> what kind of problems could have happened in step 6 above?
I'm not sure.
> And once again, this is a separate issue, so let's discuss it
> separately.
Okay. Let me just say again that I suspect there are bugs here which
can be triggered by both threads and signal handlers. We can leave it
at that.
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.