GNU bug report logs -
#79334
[PATCH] Don't release thread select lock unnecessarily
Previous Next
Full log
View this message in rfc822 format
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 79334 <at> debbugs.gnu.org, Dmitry Gutov
> <dmitry <at> gutov.dev>
> Date: Fri, 29 Aug 2025 09:07:50 -0400
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > However, I don't think I follow the logic behind this change. I agree
> > that status_notify does things that can be dangerous (more about that
> > below), but the call to status_notify will happen whether we call
> > thread_select or pselect directly, so if status_notify can do some
> > damage, it will do that regardless of whether we call pselect directly
> > or not. I think you assume that the problematic call to status_notify
> > will happen in another thread, if we allow switching threads at this
> > point, is that right? IOW, I think you assume that, if we switch to
> > another thread, that other thread could call status_notify, and thus
> > close some descriptors on which the current thread, the one which
> > calls thread_select at this point, is waiting. If that is your
> > assumption, then the actual situation could also be the opposite: the
> > current thread, the one where you want to call pselect, will call
> > status_notify after pselect returns, and that will close some
> > descriptors used by other threads. In that case, calling pselect
> > directly will make things worse, not better, because it makes this
> > closing of descriptors imminent.
>
> Yes. I think there's multiple problems in this code and I haven't
> tracked them all down yet.
>
> That being said, I'm confident that it's incorrect and unnecessary to be
> using thread_select instead of pselect here, so we should stop doing
> that in addition to any other fix.
If we fix status_notify, what other reasons remain for not calling
thread_select there?
> > So, while I agree we have a problem here, I think the place and the
> > way we should solve it are different.
> >
> > Which brings us to status_notify. It indeed can close the descriptors
> > of processes which terminated, so we should probably make it safer.
> > For example, we could leave alone descriptors marked with non-NULL
> > waiting_thread if that thread is still alive and is different from the
> > current thread. Would that fix the problem? If yes, I think we
> > should install such a change.
>
> That was my initial idea for a fix as well. But I'm not sure about the
> specific details. Namely, perhaps we need to be checking waiting_thread
> everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
> And maybe some new functions/macros should be added to simplify this;
> maybe a version of FOR_EACH_PROCESS which only iterates over processes
> for which waiting_thread == current_thread.
You may be right, we should audit all the users of FOR_EACH_PROCESS.
At least one of them (get-buffer-process) is probably okay as it is,
but others might need to avoid looking at processes locked to other
threads, indeed.
> > We should perhaps also understand better what happens with the current
> > code if and when descriptors are closed by some other thread in this
> > scenario. A thread using such a descriptor could be either stuck in
> > the pselect call, or it could be before or after the call to pselect.
> > Paul, what happens if a descriptor passed to pselect becomes closed
> > while pselect waits? does pselect return with EBADF, or does pselect
> > handle that gracefully, like by considering such descriptors to be
> > never "ready"?
> >
> > If the thread is before or after pselect, then either calling pselect
> > or attempting to read from a closed descriptor after pselect returns
> > will cause EBADF. Is our code prepared to deal with such a situation
> > (e.g., by ignoring the failed read), or will that cause some Lisp API
> > to signal an error or produce unexpected results? If the latter, we
> > should probably handle EBADF specially, because I don't believe we
> > could completely prevent this from happening.
>
> (I'm more hopeful: I think we can completely prevent this from
> happening, we just need to be more careful about our locking.)
If we can prevent that completely, it's even better.
This bug report was last modified 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.