GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
View this message in rfc822 format
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Thu, 28 Aug 2025 17:08:43 -0400
>>
>> Previously, we were using thread_select to release the thread
>> lock around two different calls to select in
>> wait_reading_process_output. The main call to select was fine,
>> but we also used thread_select for a pselect call which returns
>> immediately (with a timeout of 0) and is supposed to just check
>> if any file descriptors are active.
>>
>> If we actually thread switch at that pselect, it's likely to
>> break internal state for Emacs: namely, the call to
>> status_notify immediately after can close file descriptors which
>> another thread is selecting on, causing that thread to get EBADF
>> from select and then call emacs_abort.
>>
>> We don't need to thread switch here, so don't.
>>
>> * src/process.c (wait_reading_process_output): Remove
>> unnecessary thread_select wrapper.
>
> Thanks.
>
> However, I don't think I follow the logic behind this change. I agree
> that status_notify does things that can be dangerous (more about that
> below), but the call to status_notify will happen whether we call
> thread_select or pselect directly, so if status_notify can do some
> damage, it will do that regardless of whether we call pselect directly
> or not. I think you assume that the problematic call to status_notify
> will happen in another thread, if we allow switching threads at this
> point, is that right? IOW, I think you assume that, if we switch to
> another thread, that other thread could call status_notify, and thus
> close some descriptors on which the current thread, the one which
> calls thread_select at this point, is waiting. If that is your
> assumption, then the actual situation could also be the opposite: the
> current thread, the one where you want to call pselect, will call
> status_notify after pselect returns, and that will close some
> descriptors used by other threads. In that case, calling pselect
> directly will make things worse, not better, because it makes this
> closing of descriptors imminent.
Yes. I think there's multiple problems in this code and I haven't
tracked them all down yet.
That being said, I'm confident that it's incorrect and unnecessary to be
using thread_select instead of pselect here, so we should stop doing
that in addition to any other fix.
> So, while I agree we have a problem here, I think the place and the
> way we should solve it are different.
>
> Which brings us to status_notify. It indeed can close the descriptors
> of processes which terminated, so we should probably make it safer.
> For example, we could leave alone descriptors marked with non-NULL
> waiting_thread if that thread is still alive and is different from the
> current thread. Would that fix the problem? If yes, I think we
> should install such a change.
That was my initial idea for a fix as well. But I'm not sure about the
specific details. Namely, perhaps we need to be checking waiting_thread
everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
And maybe some new functions/macros should be added to simplify this;
maybe a version of FOR_EACH_PROCESS which only iterates over processes
for which waiting_thread == current_thread.
> We should perhaps also understand better what happens with the current
> code if and when descriptors are closed by some other thread in this
> scenario. A thread using such a descriptor could be either stuck in
> the pselect call, or it could be before or after the call to pselect.
> Paul, what happens if a descriptor passed to pselect becomes closed
> while pselect waits? does pselect return with EBADF, or does pselect
> handle that gracefully, like by considering such descriptors to be
> never "ready"?
>
> If the thread is before or after pselect, then either calling pselect
> or attempting to read from a closed descriptor after pselect returns
> will cause EBADF. Is our code prepared to deal with such a situation
> (e.g., by ignoring the failed read), or will that cause some Lisp API
> to signal an error or produce unexpected results? If the latter, we
> should probably handle EBADF specially, because I don't believe we
> could completely prevent this from happening.
(I'm more hopeful: I think we can completely prevent this from
happening, we just need to be more careful about our locking.)
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.