#79334 - [PATCH] Don't release thread select lock unnecessarily

GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

View this message in rfc822 format

From: Spencer Baugh <sbaugh <at> janestreet.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 79334 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, Dmitry Gutov <dmitry <at> gutov.dev> Subject: bug#79334: [PATCH] Don't release thread select lock unnecessarily Date: Fri, 29 Aug 2025 09:07:50 -0400

Eli Zaretskii <eliz <at> gnu.org> writes: >> From: Spencer Baugh <sbaugh <at> janestreet.com> >> Date: Thu, 28 Aug 2025 17:08:43 -0400 >> >> Previously, we were using thread_select to release the thread >> lock around two different calls to select in >> wait_reading_process_output. The main call to select was fine, >> but we also used thread_select for a pselect call which returns >> immediately (with a timeout of 0) and is supposed to just check >> if any file descriptors are active. >> >> If we actually thread switch at that pselect, it's likely to >> break internal state for Emacs: namely, the call to >> status_notify immediately after can close file descriptors which >> another thread is selecting on, causing that thread to get EBADF >> from select and then call emacs_abort. >> >> We don't need to thread switch here, so don't. >> >> * src/process.c (wait_reading_process_output): Remove >> unnecessary thread_select wrapper. > > Thanks. > > However, I don't think I follow the logic behind this change. I agree > that status_notify does things that can be dangerous (more about that > below), but the call to status_notify will happen whether we call > thread_select or pselect directly, so if status_notify can do some > damage, it will do that regardless of whether we call pselect directly > or not. I think you assume that the problematic call to status_notify > will happen in another thread, if we allow switching threads at this > point, is that right? IOW, I think you assume that, if we switch to > another thread, that other thread could call status_notify, and thus > close some descriptors on which the current thread, the one which > calls thread_select at this point, is waiting. If that is your > assumption, then the actual situation could also be the opposite: the > current thread, the one where you want to call pselect, will call > status_notify after pselect returns, and that will close some > descriptors used by other threads. In that case, calling pselect > directly will make things worse, not better, because it makes this > closing of descriptors imminent. Yes. I think there's multiple problems in this code and I haven't tracked them all down yet. That being said, I'm confident that it's incorrect and unnecessary to be using thread_select instead of pselect here, so we should stop doing that in addition to any other fix. > So, while I agree we have a problem here, I think the place and the > way we should solve it are different. > > Which brings us to status_notify. It indeed can close the descriptors > of processes which terminated, so we should probably make it safer. > For example, we could leave alone descriptors marked with non-NULL > waiting_thread if that thread is still alive and is different from the > current thread. Would that fix the problem? If yes, I think we > should install such a change. That was my initial idea for a fix as well. But I'm not sure about the specific details. Namely, perhaps we need to be checking waiting_thread everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify. And maybe some new functions/macros should be added to simplify this; maybe a version of FOR_EACH_PROCESS which only iterates over processes for which waiting_thread == current_thread. > We should perhaps also understand better what happens with the current > code if and when descriptors are closed by some other thread in this > scenario. A thread using such a descriptor could be either stuck in > the pselect call, or it could be before or after the call to pselect. > Paul, what happens if a descriptor passed to pselect becomes closed > while pselect waits? does pselect return with EBADF, or does pselect > handle that gracefully, like by considering such descriptors to be > never "ready"? > > If the thread is before or after pselect, then either calling pselect > or attempting to read from a closed descriptor after pselect returns > will cause EBADF. Is our code prepared to deal with such a situation > (e.g., by ignoring the failed read), or will that cause some Lisp API > to signal an error or produce unexpected results? If the latter, we > should probably handle EBADF specially, because I don't believe we > could completely prevent this from happening. (I'm more hopeful: I think we can completely prevent this from happening, we just need to be more careful about our locking.)

This bug report was last modified 8 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #79334 [PATCH] Don't release thread select lock unnecessarily

GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily