#79334 - [PATCH] Don't release thread select lock unnecessarily

GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org> To: Spencer Baugh <sbaugh <at> janestreet.com> Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev Subject: bug#79334: [PATCH] Don't release thread select lock unnecessarily Date: Fri, 29 Aug 2025 16:36:51 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com> > Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 79334 <at> debbugs.gnu.org, Dmitry Gutov > <dmitry <at> gutov.dev> > Date: Fri, 29 Aug 2025 09:07:50 -0400 > > Eli Zaretskii <eliz <at> gnu.org> writes: > > > However, I don't think I follow the logic behind this change. I agree > > that status_notify does things that can be dangerous (more about that > > below), but the call to status_notify will happen whether we call > > thread_select or pselect directly, so if status_notify can do some > > damage, it will do that regardless of whether we call pselect directly > > or not. I think you assume that the problematic call to status_notify > > will happen in another thread, if we allow switching threads at this > > point, is that right? IOW, I think you assume that, if we switch to > > another thread, that other thread could call status_notify, and thus > > close some descriptors on which the current thread, the one which > > calls thread_select at this point, is waiting. If that is your > > assumption, then the actual situation could also be the opposite: the > > current thread, the one where you want to call pselect, will call > > status_notify after pselect returns, and that will close some > > descriptors used by other threads. In that case, calling pselect > > directly will make things worse, not better, because it makes this > > closing of descriptors imminent. > > Yes. I think there's multiple problems in this code and I haven't > tracked them all down yet. > > That being said, I'm confident that it's incorrect and unnecessary to be > using thread_select instead of pselect here, so we should stop doing > that in addition to any other fix. If we fix status_notify, what other reasons remain for not calling thread_select there? > > So, while I agree we have a problem here, I think the place and the > > way we should solve it are different. > > > > Which brings us to status_notify. It indeed can close the descriptors > > of processes which terminated, so we should probably make it safer. > > For example, we could leave alone descriptors marked with non-NULL > > waiting_thread if that thread is still alive and is different from the > > current thread. Would that fix the problem? If yes, I think we > > should install such a change. > > That was my initial idea for a fix as well. But I'm not sure about the > specific details. Namely, perhaps we need to be checking waiting_thread > everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify. > And maybe some new functions/macros should be added to simplify this; > maybe a version of FOR_EACH_PROCESS which only iterates over processes > for which waiting_thread == current_thread. You may be right, we should audit all the users of FOR_EACH_PROCESS. At least one of them (get-buffer-process) is probably okay as it is, but others might need to avoid looking at processes locked to other threads, indeed. > > We should perhaps also understand better what happens with the current > > code if and when descriptors are closed by some other thread in this > > scenario. A thread using such a descriptor could be either stuck in > > the pselect call, or it could be before or after the call to pselect. > > Paul, what happens if a descriptor passed to pselect becomes closed > > while pselect waits? does pselect return with EBADF, or does pselect > > handle that gracefully, like by considering such descriptors to be > > never "ready"? > > > > If the thread is before or after pselect, then either calling pselect > > or attempting to read from a closed descriptor after pselect returns > > will cause EBADF. Is our code prepared to deal with such a situation > > (e.g., by ignoring the failed read), or will that cause some Lisp API > > to signal an error or produce unexpected results? If the latter, we > > should probably handle EBADF specially, because I don't believe we > > could completely prevent this from happening. > > (I'm more hopeful: I think we can completely prevent this from > happening, we just need to be more careful about our locking.) If we can prevent that completely, it's even better.

This bug report was last modified 8 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #79334 [PATCH] Don't release thread select lock unnecessarily

GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily