From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 28 17:09:11 2025 Received: (at submit) by debbugs.gnu.org; 28 Aug 2025 21:09:11 +0000 Received: from localhost ([127.0.0.1]:40303 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1urjrn-0000Mb-CD for submit@debbugs.gnu.org; Thu, 28 Aug 2025 17:09:10 -0400 Received: from lists.gnu.org ([2001:470:142::17]:53598) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1urjrb-0000K2-2Z for submit@debbugs.gnu.org; Thu, 28 Aug 2025 17:09:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1urjrV-0007cx-G4 for bug-gnu-emacs@gnu.org; Thu, 28 Aug 2025 17:08:49 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1urjrS-0006yi-Hf for bug-gnu-emacs@gnu.org; Thu, 28 Aug 2025 17:08:49 -0400 From: Spencer Baugh To: bug-gnu-emacs@gnu.org Subject: [PATCH] Don't release thread select lock unnecessarily X-Debbugs-Cc: Dmitry Gutov Date: Thu, 28 Aug 2025 17:08:43 -0400 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756415324; bh=oG6k7arttFbIy05aU6tB9IVuhq7uUKY56UKJ/ZvUbhw=; h=From:To:Subject:Date; b=FmijEXYwTC3zYww9Ot1knKLIt5dOgbBu5C8n6nB4eThxiXbt2u065WfO/Ihhf2c0M FNUhK6MaFrjBI8ZbDs44PcAzaLwzBPnJ8YePPd/roklIEnY59NV5k2y7R5VaYzsIyZ 4sa3UmX8bdr5zaYwq4c1n701LUiUnqXzFzPts6GvHy+ymxlDy8qZINZjvz/Z5AG0Fj fUmKia1R5AcjseqKlhnlyhKaE2LHC6zu4nqClpYW4KLXkTfgmsrPEr5v0yqrxKm8YC FeRi3erFJz6f5FI+G7sCARZRXfwDVVBXq3bfmzbEjKOnG63r4UeFSlt7iCY8MF6xmZ b6G8cyW2Zvzxw== Received-SPF: pass client-ip=64.215.233.18; envelope-from=sbaugh@janestreet.com; helo=mxout5.mail.janestreet.com X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.1 (/) --=-=-= Content-Type: text/plain Tags: patch Previously, we were using thread_select to release the thread lock around two different calls to select in wait_reading_process_output. The main call to select was fine, but we also used thread_select for a pselect call which returns immediately (with a timeout of 0) and is supposed to just check if any file descriptors are active. If we actually thread switch at that pselect, it's likely to break internal state for Emacs: namely, the call to status_notify immediately after can close file descriptors which another thread is selecting on, causing that thread to get EBADF from select and then call emacs_abort. We don't need to thread switch here, so don't. * src/process.c (wait_reading_process_output): Remove unnecessary thread_select wrapper. In GNU Emacs 30.1.90 (build 29, x86_64-pc-linux-gnu, X toolkit, cairo version 1.15.12, Xaw scroll bars) of 2025-08-26 built on igm-qws-u22796a Repository revision: 54857fe2fb0ed033afcba231f920f8f7fa185333 Repository branch: emacs-30 Windowing system distributor 'The X.Org Foundation', version 11.0.12011000 System Description: Rocky Linux 8.10 (Green Obsidian) Configured using: 'configure --with-x-toolkit=lucid --without-gpm --without-gconf --without-selinux --without-imagemagick --with-modules --with-gif=no --with-cairo --with-rsvg --without-compress-install --with-tree-sitter --with-native-compilation=aot PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/' --=-=-= Content-Type: text/patch Content-Disposition: attachment; filename=0001-Don-t-release-thread-select-lock-unnecessarily.patch >From 68e7ed37036498a0df0232196c50cced38cdb751 Mon Sep 17 00:00:00 2001 From: Spencer Baugh Date: Thu, 28 Aug 2025 17:06:56 -0400 Subject: [PATCH] Don't release thread select lock unnecessarily Previously, we were using thread_select to release the thread lock around two different calls to select in wait_reading_process_output. The main call to select was fine, but we also used thread_select for a pselect call which returns immediately (with a timeout of 0) and is supposed to just check if any file descriptors are active. If we actually thread switch at that pselect, it's likely to break internal state for Emacs: namely, the call to status_notify immediately after can close file descriptors which another thread is selecting on, causing that thread to get EBADF from select and then call emacs_abort. We don't need to thread switch here, so don't. * src/process.c (wait_reading_process_output): Remove unnecessary thread_select wrapper. --- src/process.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/process.c b/src/process.c index c8b70a4174c..b33bebb9a35 100644 --- a/src/process.c +++ b/src/process.c @@ -5513,11 +5513,13 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, if (0 <= fd) FD_CLR (fd, &Atemp); + /* We're just checking to see if any of these fds are active, + so make pselect return immediately. */ timeout = make_timespec (0, 0); - if ((thread_select (pselect, max_desc + 1, - &Atemp, - (num_pending_connects > 0 ? &Ctemp : NULL), - NULL, &timeout, NULL) + if ((pselect (max_desc + 1, + &Atemp, + (num_pending_connects > 0 ? &Ctemp : NULL), + NULL, &timeout, NULL) <= 0)) { /* It's okay for us to do this and then continue with -- 2.43.7 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 08:25:17 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 12:25:18 +0000 Received: from localhost ([127.0.0.1]:41981 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uryAL-00085W-Te for submit@debbugs.gnu.org; Fri, 29 Aug 2025 08:25:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52872) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1uryAF-00083q-TB for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 08:25:11 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uryA9-0008SE-FW; Fri, 29 Aug 2025 08:25:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=A94arxGB1YohBT1rPRd0MPJzmHkTduP0TX8/MPX2v5o=; b=ZdbZ5Nn/xJrU aEk4NFcaVzfyRtuxuqfgddmGDvvJI4fKv05DORjADj5ov2y5jhfQ/olqwx2jXpBDDOvlZXmPF9lZ0 kGsWEXWSVSCcanJtawOz4IovlmlyUP1uHYbyAxDhiukGI+DbvOSSFAh3xYtYT3e/ojkwrD3Kr/PLQ PuOHu8r6vTCl6HfoAVtW15yVDcFLbL+DJeyCb0CAOmvnzDuUCIVSefIi58FRDRxBGMB4p6R5zqJTA /JqHx4tAMWdWyK6Kg9mzaLDEVohACjlV1+u9axBAsEg6+USHxU2SliFLN2mP0ytk+Tl+jw9VzOzzi oDR/mVAWikbfFGOmP5kHrA==; Date: Fri, 29 Aug 2025 15:24:59 +0300 Message-Id: <86cy8es4bo.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh , Paul Eggert In-Reply-To: (message from Spencer Baugh on Thu, 28 Aug 2025 17:08:43 -0400) Subject: Re: [PATCH] Don't release thread select lock unnecessarily References: X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Date: Thu, 28 Aug 2025 17:08:43 -0400 > > Previously, we were using thread_select to release the thread > lock around two different calls to select in > wait_reading_process_output. The main call to select was fine, > but we also used thread_select for a pselect call which returns > immediately (with a timeout of 0) and is supposed to just check > if any file descriptors are active. > > If we actually thread switch at that pselect, it's likely to > break internal state for Emacs: namely, the call to > status_notify immediately after can close file descriptors which > another thread is selecting on, causing that thread to get EBADF > from select and then call emacs_abort. > > We don't need to thread switch here, so don't. > > * src/process.c (wait_reading_process_output): Remove > unnecessary thread_select wrapper. Thanks. However, I don't think I follow the logic behind this change. I agree that status_notify does things that can be dangerous (more about that below), but the call to status_notify will happen whether we call thread_select or pselect directly, so if status_notify can do some damage, it will do that regardless of whether we call pselect directly or not. I think you assume that the problematic call to status_notify will happen in another thread, if we allow switching threads at this point, is that right? IOW, I think you assume that, if we switch to another thread, that other thread could call status_notify, and thus close some descriptors on which the current thread, the one which calls thread_select at this point, is waiting. If that is your assumption, then the actual situation could also be the opposite: the current thread, the one where you want to call pselect, will call status_notify after pselect returns, and that will close some descriptors used by other threads. In that case, calling pselect directly will make things worse, not better, because it makes this closing of descriptors imminent. So, while I agree we have a problem here, I think the place and the way we should solve it are different. Which brings us to status_notify. It indeed can close the descriptors of processes which terminated, so we should probably make it safer. For example, we could leave alone descriptors marked with non-NULL waiting_thread if that thread is still alive and is different from the current thread. Would that fix the problem? If yes, I think we should install such a change. We should perhaps also understand better what happens with the current code if and when descriptors are closed by some other thread in this scenario. A thread using such a descriptor could be either stuck in the pselect call, or it could be before or after the call to pselect. Paul, what happens if a descriptor passed to pselect becomes closed while pselect waits? does pselect return with EBADF, or does pselect handle that gracefully, like by considering such descriptors to be never "ready"? If the thread is before or after pselect, then either calling pselect or attempting to read from a closed descriptor after pselect returns will cause EBADF. Is our code prepared to deal with such a situation (e.g., by ignoring the failed read), or will that cause some Lisp API to signal an error or produce unexpected results? If the latter, we should probably handle EBADF specially, because I don't believe we could completely prevent this from happening. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:08:01 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:08:01 +0000 Received: from localhost ([127.0.0.1]:42079 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1urypk-0005qe-1B for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:08:01 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:44385) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1urypg-0005pS-Cw for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:07:57 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily In-Reply-To: <86cy8es4bo.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug 2025 15:24:59 +0300") References: <86cy8es4bo.fsf@gnu.org> Date: Fri, 29 Aug 2025 09:07:50 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756472870; bh=9nk+1fqBMoMPmgli+EJG7Ou/LigbjYj07QqRvUfgPB0=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=ZHDmYO/AcuoBh9mXcr+309GYVkxFuBbmY9RKc6Fyv6LkVsz+4Ep5zcuwQiriBX5JB +yCu7+yPTZxV9SHKNds/9+JIU/ik8D/k1rIa+7zmNCyigRDAWT5v7IJM8L1dGrL1Jn 74Jn52/evTYEgcBJjnEc6rvZnXOf7VV7lJmuGrwP+KVwynS6myX549r4HXexTXMJJT vcDmWlNNKPMyCefu4cJOlZiUMuiyNB8923qb2EdAPCvJTbYUO7Vu8w53AB2wnbFrgz 7C4lAH+okQetca2iI6Xx0KXWLBCYFPHuthcv6tuBT7RjI3X3f8e0qc+/SCJPrOvZz/ A7+pTM0z1lgJw== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, Paul Eggert , Dmitry Gutov X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> Date: Thu, 28 Aug 2025 17:08:43 -0400 >> >> Previously, we were using thread_select to release the thread >> lock around two different calls to select in >> wait_reading_process_output. The main call to select was fine, >> but we also used thread_select for a pselect call which returns >> immediately (with a timeout of 0) and is supposed to just check >> if any file descriptors are active. >> >> If we actually thread switch at that pselect, it's likely to >> break internal state for Emacs: namely, the call to >> status_notify immediately after can close file descriptors which >> another thread is selecting on, causing that thread to get EBADF >> from select and then call emacs_abort. >> >> We don't need to thread switch here, so don't. >> >> * src/process.c (wait_reading_process_output): Remove >> unnecessary thread_select wrapper. > > Thanks. > > However, I don't think I follow the logic behind this change. I agree > that status_notify does things that can be dangerous (more about that > below), but the call to status_notify will happen whether we call > thread_select or pselect directly, so if status_notify can do some > damage, it will do that regardless of whether we call pselect directly > or not. I think you assume that the problematic call to status_notify > will happen in another thread, if we allow switching threads at this > point, is that right? IOW, I think you assume that, if we switch to > another thread, that other thread could call status_notify, and thus > close some descriptors on which the current thread, the one which > calls thread_select at this point, is waiting. If that is your > assumption, then the actual situation could also be the opposite: the > current thread, the one where you want to call pselect, will call > status_notify after pselect returns, and that will close some > descriptors used by other threads. In that case, calling pselect > directly will make things worse, not better, because it makes this > closing of descriptors imminent. Yes. I think there's multiple problems in this code and I haven't tracked them all down yet. That being said, I'm confident that it's incorrect and unnecessary to be using thread_select instead of pselect here, so we should stop doing that in addition to any other fix. > So, while I agree we have a problem here, I think the place and the > way we should solve it are different. > > Which brings us to status_notify. It indeed can close the descriptors > of processes which terminated, so we should probably make it safer. > For example, we could leave alone descriptors marked with non-NULL > waiting_thread if that thread is still alive and is different from the > current thread. Would that fix the problem? If yes, I think we > should install such a change. That was my initial idea for a fix as well. But I'm not sure about the specific details. Namely, perhaps we need to be checking waiting_thread everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify. And maybe some new functions/macros should be added to simplify this; maybe a version of FOR_EACH_PROCESS which only iterates over processes for which waiting_thread == current_thread. > We should perhaps also understand better what happens with the current > code if and when descriptors are closed by some other thread in this > scenario. A thread using such a descriptor could be either stuck in > the pselect call, or it could be before or after the call to pselect. > Paul, what happens if a descriptor passed to pselect becomes closed > while pselect waits? does pselect return with EBADF, or does pselect > handle that gracefully, like by considering such descriptors to be > never "ready"? > > If the thread is before or after pselect, then either calling pselect > or attempting to read from a closed descriptor after pselect returns > will cause EBADF. Is our code prepared to deal with such a situation > (e.g., by ignoring the failed read), or will that cause some Lisp API > to signal an error or produce unexpected results? If the latter, we > should probably handle EBADF specially, because I don't believe we > could completely prevent this from happening. (I'm more hopeful: I think we can completely prevent this from happening, we just need to be more careful about our locking.) From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:37:06 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:37:06 +0000 Received: from localhost ([127.0.0.1]:42175 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1urzHs-0001bt-RO for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:37:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51086) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1urzHo-0001aY-6i for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:37:02 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1urzHh-0003xm-H4; Fri, 29 Aug 2025 09:36:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=GM5lVVYcgn+bIJ0Q1L7OTwvAGXgIiPlgD6RkljjB9NY=; b=Z5i4iAXaT7Kx YNGedJ9YlYpyNUbBLas4ku5Fv3m/NsibpHqCXsPg/UbGi0u1nujxPcF2VxJ8X+rBHLwhsgfMc0x1l 2jJ1qPR7iH6kfSKo25CTPMR+lQjhK7J7u+I64t4u9LajMgIHH8PwWfSqDx3cmV943YuOmgyM+qY97 nMHDHiFD6AqJxGQxRbK/JXkOrsOotemKFe4ShiqiiBvDBbR65N+Cba85LheByffH2urBe3nQE4QY6 VgMMesX6BA+rGFIki+qnNFO7Zxg8FX6snkK2Ittu/DYWhMrnOiOMfOBy6pCeh4vSsRwS4hGz27Q4t dyu0+FDOk3wkWJkKGDm8gA==; Date: Fri, 29 Aug 2025 16:36:51 +0300 Message-Id: <86a53is0zw.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 09:07:50 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily References: <86cy8es4bo.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: Paul Eggert , 79334@debbugs.gnu.org, Dmitry Gutov > > Date: Fri, 29 Aug 2025 09:07:50 -0400 > > Eli Zaretskii writes: > > > However, I don't think I follow the logic behind this change. I agree > > that status_notify does things that can be dangerous (more about that > > below), but the call to status_notify will happen whether we call > > thread_select or pselect directly, so if status_notify can do some > > damage, it will do that regardless of whether we call pselect directly > > or not. I think you assume that the problematic call to status_notify > > will happen in another thread, if we allow switching threads at this > > point, is that right? IOW, I think you assume that, if we switch to > > another thread, that other thread could call status_notify, and thus > > close some descriptors on which the current thread, the one which > > calls thread_select at this point, is waiting. If that is your > > assumption, then the actual situation could also be the opposite: the > > current thread, the one where you want to call pselect, will call > > status_notify after pselect returns, and that will close some > > descriptors used by other threads. In that case, calling pselect > > directly will make things worse, not better, because it makes this > > closing of descriptors imminent. > > Yes. I think there's multiple problems in this code and I haven't > tracked them all down yet. > > That being said, I'm confident that it's incorrect and unnecessary to be > using thread_select instead of pselect here, so we should stop doing > that in addition to any other fix. If we fix status_notify, what other reasons remain for not calling thread_select there? > > So, while I agree we have a problem here, I think the place and the > > way we should solve it are different. > > > > Which brings us to status_notify. It indeed can close the descriptors > > of processes which terminated, so we should probably make it safer. > > For example, we could leave alone descriptors marked with non-NULL > > waiting_thread if that thread is still alive and is different from the > > current thread. Would that fix the problem? If yes, I think we > > should install such a change. > > That was my initial idea for a fix as well. But I'm not sure about the > specific details. Namely, perhaps we need to be checking waiting_thread > everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify. > And maybe some new functions/macros should be added to simplify this; > maybe a version of FOR_EACH_PROCESS which only iterates over processes > for which waiting_thread == current_thread. You may be right, we should audit all the users of FOR_EACH_PROCESS. At least one of them (get-buffer-process) is probably okay as it is, but others might need to avoid looking at processes locked to other threads, indeed. > > We should perhaps also understand better what happens with the current > > code if and when descriptors are closed by some other thread in this > > scenario. A thread using such a descriptor could be either stuck in > > the pselect call, or it could be before or after the call to pselect. > > Paul, what happens if a descriptor passed to pselect becomes closed > > while pselect waits? does pselect return with EBADF, or does pselect > > handle that gracefully, like by considering such descriptors to be > > never "ready"? > > > > If the thread is before or after pselect, then either calling pselect > > or attempting to read from a closed descriptor after pselect returns > > will cause EBADF. Is our code prepared to deal with such a situation > > (e.g., by ignoring the failed read), or will that cause some Lisp API > > to signal an error or produce unexpected results? If the latter, we > > should probably handle EBADF specially, because I don't believe we > > could completely prevent this from happening. > > (I'm more hopeful: I think we can completely prevent this from > happening, we just need to be more careful about our locking.) If we can prevent that completely, it's even better. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:45:18 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:45:18 +0000 Received: from localhost ([127.0.0.1]:42186 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1urzPo-0002nF-CV for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:45:17 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:50375) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1urzPk-0002iS-RG for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:45:13 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily In-Reply-To: <86a53is0zw.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug 2025 16:36:51 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> Date: Fri, 29 Aug 2025 09:45:07 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756475107; bh=PKJZnrnixgrqu84Y0iPcA3eO5ttQIxkzUmsbGEvFx7Y=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=XkLlJR7CpUGqhmAmALvGFx+vAZ8igtC0n0YVZL27IfVAziqh1wUZ/U7bdKX84+YuG BREehTmNSK7gZdyvk4/m5DT/OpaafOf6chcU/txzj3S8P8ICLxKTeSA6uBAd7h3FHo cLXlEp81QQrmbzyzpj7qOFljExbzQw8lMSlMhQkUEg1fuFHrWS7P2aWZRlpQY5/jKn fsT/TD1aA6ry1Ej+FgeSAE68eVV/ZZUS2N+HIgNLFGhRa9TGZyXYTTf3YLYnAttr7t cbJmrYwcT2qvYAh3IuBnFEiz2MqarWqDyYuEVo10zaZPZFQpvOS53pZFNmU2XsTz1r Vktp7mDa83HhA== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> Cc: Paul Eggert , 79334@debbugs.gnu.org, Dmitry Gutov >> >> Date: Fri, 29 Aug 2025 09:07:50 -0400 >> >> Eli Zaretskii writes: >> >> > However, I don't think I follow the logic behind this change. I agree >> > that status_notify does things that can be dangerous (more about that >> > below), but the call to status_notify will happen whether we call >> > thread_select or pselect directly, so if status_notify can do some >> > damage, it will do that regardless of whether we call pselect directly >> > or not. I think you assume that the problematic call to status_notify >> > will happen in another thread, if we allow switching threads at this >> > point, is that right? IOW, I think you assume that, if we switch to >> > another thread, that other thread could call status_notify, and thus >> > close some descriptors on which the current thread, the one which >> > calls thread_select at this point, is waiting. If that is your >> > assumption, then the actual situation could also be the opposite: the >> > current thread, the one where you want to call pselect, will call >> > status_notify after pselect returns, and that will close some >> > descriptors used by other threads. In that case, calling pselect >> > directly will make things worse, not better, because it makes this >> > closing of descriptors imminent. >> >> Yes. I think there's multiple problems in this code and I haven't >> tracked them all down yet. >> >> That being said, I'm confident that it's incorrect and unnecessary to be >> using thread_select instead of pselect here, so we should stop doing >> that in addition to any other fix. > > If we fix status_notify, what other reasons remain for not calling > thread_select there? Primarily: Calling thread_select here complicates reasoning about the code. We should not release the lock when not necessary, because it makes it harder to think about possible thread interleavings. And it's not necessary here because this pselect call returns immediately. (The only reason the pselect call wouldn't return immediately is if we released the lock and then switched to another thread) Also, avoiding thread_select here will slightly improve performance by avoiding lock operations. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 10:01:08 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 14:01:08 +0000 Received: from localhost ([127.0.0.1]:42576 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1urzf9-0005DB-HL for submit@debbugs.gnu.org; Fri, 29 Aug 2025 10:01:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56786) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1urzf6-0005BZ-5e for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 10:01:05 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1urzf0-0003np-GE; Fri, 29 Aug 2025 10:00:58 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=BZTU3E0kIckzj2s1rgae/vZF9A3NjXstMJzQXgeDSvw=; b=WT5Qr6gjYOFx Sa7uxPlt9FnFagWnOsC92XGG84RxXWGRVF3tedq89VcxgAQdD/NHyslgS0i9+30R3ghRtSpyEnhA1 q24O88vOsy3mVlXSlhNueeOkzfWjiYvBstvy9TQrNzKvwXro9FGh6KlSxhMVUQ13i2igeaZqE8J+6 fByhtVApuHYHWJkXVtnLtoi9B4hMedFloXRLmvjTYFO2NRbU2O4m/dh3q7kZCnuqLIIHwXxY1sFOj sOyaLqF+3NtKWNMVePr6JjZWkBmeNaZI1vLp+EI9cvIijQ5eZfrpHMK4VBttP7k1NO5z0TeUSJIWt Hl46DInfexcNJm4KCOerCQ==; Date: Fri, 29 Aug 2025 17:00:56 +0300 Message-Id: <867bymrzvr.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 09:45:07 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Fri, 29 Aug 2025 09:45:07 -0400 > > Eli Zaretskii writes: > > >> That being said, I'm confident that it's incorrect and unnecessary to be > >> using thread_select instead of pselect here, so we should stop doing > >> that in addition to any other fix. > > > > If we fix status_notify, what other reasons remain for not calling > > thread_select there? > > Primarily: Calling thread_select here complicates reasoning about the > code. We should not release the lock when not necessary, because it > makes it harder to think about possible thread interleavings. And it's > not necessary here because this pselect call returns immediately. How do we define "necessary" in this context? > (The only reason the pselect call wouldn't return immediately is if we > released the lock and then switched to another thread) You seem to assume that a call to thread_select is only necessary when some potentially prolonged operation or wait is expected? But I'm not sure this is correct, because that could prevent other threads from running for too long, and force programmers to inject sleep-for etc. in their programs. By allowing thread switches as frequently as possible lets other threads more opportunities to run (to some extent simulating preemptive scheduling), which I think is a Good Thing overall, no? From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 10:49:45 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 14:49:46 +0000 Received: from localhost ([127.0.0.1]:42774 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us0QC-0003Mn-5H for submit@debbugs.gnu.org; Fri, 29 Aug 2025 10:49:45 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:49683) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us0Q8-0003LE-FM for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 10:49:42 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily In-Reply-To: <867bymrzvr.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug 2025 17:00:56 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> Date: Fri, 29 Aug 2025 10:49:34 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756478974; bh=at7cROYOKajFHWykh5sJEnUx1DFD2SfPsFq6+h/bbDg=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=yQS/z161DuInozzR2waFgcA8AGX5HLUBko+wVk7IFgK/1hbBAMU0cXPr9Ys5X1s/O 04kvkug3gAcz6tTHYs2jXcuWGohpqzlDqeU8dN6i10xtCGADsNCHsty2ws+CNb6U7X YPFO1MMFWLf1TCaxG53oR76pent60LcjOJC8WenQYqkQhxKM28+irVSS/NpXCctgnN G1jkMmwRpLxri80vtqTywZAT+qPRwmuwWy+yLk4S05lhO39OPn7WFXcRnBBjkJWd+N mzE6RkZ+zcn7sHuDBnCclS2IvkKsRa0XLZqSPIc7r3a7z/ConeQFB6FYOmGbkKvHH6 KqPPqXWj/q/PQ== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev >> Date: Fri, 29 Aug 2025 09:45:07 -0400 >> >> Eli Zaretskii writes: >> >> >> That being said, I'm confident that it's incorrect and unnecessary to be >> >> using thread_select instead of pselect here, so we should stop doing >> >> that in addition to any other fix. >> > >> > If we fix status_notify, what other reasons remain for not calling >> > thread_select there? >> >> Primarily: Calling thread_select here complicates reasoning about the >> code. We should not release the lock when not necessary, because it >> makes it harder to think about possible thread interleavings. And it's >> not necessary here because this pselect call returns immediately. > > How do we define "necessary" in this context? It's necessary to release the lock around long pselect calls because otherwise other threads will not run while Emacs is waiting for input, which means they probably won't run at all. >> (The only reason the pselect call wouldn't return immediately is if we >> released the lock and then switched to another thread) > > You seem to assume that a call to thread_select is only necessary when > some potentially prolonged operation or wait is expected? But I'm not > sure this is correct, because that could prevent other threads from > running for too long, and force programmers to inject sleep-for > etc. in their programs. By allowing thread switches as frequently as > possible lets other threads more opportunities to run (to some extent > simulating preemptive scheduling), which I think is a Good Thing > overall, no? Three points: - Most importantly: We can only allow thread switches at known safe points. It's not clear that this is a safe point. (Actually, we know it currently isn't, because of the status_notify issue. But I suspect there may be other issues which are harder to analyze) - Adding more thread switches is not necessarily a good thing. It can degrade whole-system performance, because context switching is slow. - wait_reading_process_output already does a thread switch around the pselect, so it's not important for one to happen here. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:41:53 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:41:53 +0000 Received: from localhost ([127.0.0.1]:42921 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us1Eb-0002by-Ve for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:41:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:49032) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us1EX-0002ae-9Q for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:41:48 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1us1EP-0003An-4X; Fri, 29 Aug 2025 11:41:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=wwmcgsbcK/TljZ988x0eaBWcKQ36ow9LSegB+I8w3xw=; b=quytDHwMnKjS Li9dCNei+sCJ4ySJXAJVVImUueaEhCqBUs2c2tf1Wh9pbmuwC4D4SmQVdAEHLVhnmQQc5u1KmvauJ xqXQiOfCZNgileh5pdDbacVGJBTteI3hlljEMt0sdgP/m6dc+uUAh4mjzrUmx73zK1ptkfHgX8Z8g AoOHFBKtDl9ePPYHmxikn1VrIiWZ8RZwVGjd3G+cdGNlSfA0xlxPuk597gxccLm0iEnpsoWeS0csO X4I2Yeheyvu/BUXMRcqqGbAzNfiRe9kluGYYJ56HwNP//eBrLChSiAv2DIUhNNZfX1EaNzjjhr5j1 ztvQYLbwCIawrEXPKaP5qA==; Date: Fri, 29 Aug 2025 18:41:00 +0300 Message-Id: <865xe6rv8z.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 10:49:34 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Fri, 29 Aug 2025 10:49:34 -0400 > > Eli Zaretskii writes: > > >> Primarily: Calling thread_select here complicates reasoning about the > >> code. We should not release the lock when not necessary, because it > >> makes it harder to think about possible thread interleavings. And it's > >> not necessary here because this pselect call returns immediately. > > > > How do we define "necessary" in this context? > > It's necessary to release the lock around long pselect calls because > otherwise other threads will not run while Emacs is waiting for input, > which means they probably won't run at all. > > >> (The only reason the pselect call wouldn't return immediately is if we > >> released the lock and then switched to another thread) > > > > You seem to assume that a call to thread_select is only necessary when > > some potentially prolonged operation or wait is expected? But I'm not > > sure this is correct, because that could prevent other threads from > > running for too long, and force programmers to inject sleep-for > > etc. in their programs. By allowing thread switches as frequently as > > possible lets other threads more opportunities to run (to some extent > > simulating preemptive scheduling), which I think is a Good Thing > > overall, no? > > Three points: > > - Most importantly: We can only allow thread switches at known safe > points. It's not clear that this is a safe point. (Actually, we know > it currently isn't, because of the status_notify issue. But I suspect > there may be other issues which are harder to analyze) This reason will be removed once we fix status_notify. > - Adding more thread switches is not necessarily a good thing. It can > degrade whole-system performance, because context switching is slow. > > - wait_reading_process_output already does a thread switch around the > pselect, so it's not important for one to happen here. You basically give the same single reason in several different wordings. I see your point, but my experience with threads in Emacs is the opposite: there seem to be too few thread switch opportunities, so many programs don't work as expected unless one sprinkles sleep-for in various places. For example, try to run something in a non-main thread and then attempt to interact with Emacs in the main thread. IOW, it is quite easy for a Lisp program to hog the global lock and starve the other threads, so each additional thread-switch opportunity is IME a blessing, not a curse. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:43:44 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:43:44 +0000 Received: from localhost ([127.0.0.1]:42926 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us1GQ-0002rZ-PX for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:43:44 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:56661) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us1GM-0002qE-Nu for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:43:40 -0400 Received: from mail-lj1-f199.google.com ([209.85.208.199]) by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128) (Exim 4.98.2) id 1us1GH-00000001yfd-1bVq for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:43:33 -0400 Received: by mail-lj1-f199.google.com with SMTP id 38308e7fff4ca-336b13923d4so7739521fa.1 for <79334@debbugs.gnu.org>; Fri, 29 Aug 2025 08:43:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=google; t=1756482212; x=1757087012; darn=debbugs.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=; b=ECsf5kpY0xQv9N0U88hPeXxvpEGt8E/cwTpcg7ct7qDnh92x/G6EN3yv2VzvXFvVOv W3wTdn93/SRAA0B4wU3XkNxoGzgIcDSsmo5xVYOApFDlE719+7f7PnvO9s2q6BSGUF8Q IhBOBvRal9aldN0E/zDdGavyDakuepecxAQ6Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756482213; bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=; h=References:In-Reply-To:From:Date:Subject:To:Cc; b=xMqRpmQve88dSNKv6PVyL7TjBXCCVfhpLSqDYawmPEQyodIt7LMCGtGuUbJhGPEWp hw3d81higT2hffSrXZiq6ldKc9qLz7qxmdsHHeGV9vF2McZ4h7A/FQRdVxYgD2Y3o6 ayWxdVX9y8zSnBS5ZtfyiGYgA8b2GJ78qJ/FkdIfM1zMrT3dcu6jYwTI1/VylbtkuC Ak2A+cttvfkYl/Ta0AM3M96CVYHBV6qT3BY0jji+W1/2tPvi8Ypzl/xPnN3LcfVADm E+ACyVvf7v0CXNAaMeYIrhLS3gRlk6dAMZTQnCfNcGsepvDElzhMfIe+6y+BOepsvh hXebMUPsIiw6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756482212; x=1757087012; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=; b=lDXrUd3RuUHcRDSuxs8ThdLu9Q7eAZYzTZvPmMludupKnfcAv7C1RxF+hLYvwwc0v3 G4o9sowt0xpocGFd5PZZkl23RP1ISDHZWVhQMrJBty/ie6iv0EiVuXX1yO1WBDLPXEyR ypAX4x7mIfR3uK3SHbQzW5t1LuUJcguAn02HMp0lJfnSYkz3dpiGKGz0Ws+LeMY2qgOQ hriVjvJAmgeA/cud/gsnswioXOxXe1KCsrPDXw7JSqTFr1nA7LiBWr6J1mCJuGedGaQk WrAX8zipcT47v73KQ2aAtblm0hF72TpPJNVE9JWBx/ObFJWefXE6sbSF+tU/LG9kPF84 ZBpw== X-Gm-Message-State: AOJu0YwBpImaUj+B4W+Ugxy44X95Z9d4tUKy7fDH554uILDatSQLJ9F4 VMNLMoLUvOc6wdxfMULJ/+LZZTmS5xYkJUsXLj8/xcxN1kpx/rAw7ULfeTtj+mLefjmCh5nTsnn zhgGbbfebwiCHo7PLmw3ocA2C0uHyfqslfOszSevu5L7o/389sPV36A7yXZSXMz83US/DfRPrYj l1/VVk5lz9K8zHu8Sa5KHXf8g2dzQlMw== X-Gm-Gg: ASbGncuezZgq3UlqFR+C2EjosvwgTGCpsXpd6wxP2zUm4RF7e20+Ywa68umD0Cf0dOv Qv0XbQ7wspRRAdobRXLZJ6objAd6bxAN8gKTomGy1qb0mAl/RRUC/cTOS2i+1MvzsIaXU5+pFb4 Mu/jFGVxPGf07e/h8GcVwH X-Received: by 2002:a05:651c:f13:b0:32b:7165:d0 with SMTP id 38308e7fff4ca-33650e3f8e8mr63959121fa.10.1756482211833; Fri, 29 Aug 2025 08:43:31 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGvUPUj75CSkNFySI8hcG5Yj7udCRWw2P0gSKoXWkcKfDUIIHklgfxFk5HuT0NEsyS4awNj3cZ98+uWLOIOumk= X-Received: by 2002:a05:651c:f13:b0:32b:7165:d0 with SMTP id 38308e7fff4ca-33650e3f8e8mr63959021fa.10.1756482211398; Fri, 29 Aug 2025 08:43:31 -0700 (PDT) MIME-Version: 1.0 References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> <865xe6rv8z.fsf@gnu.org> In-Reply-To: <865xe6rv8z.fsf@gnu.org> From: Spencer Baugh Date: Fri, 29 Aug 2025 11:43:21 -0400 X-Gm-Features: Ac12FXw3CwyQ9uwCdLgAJcsfw4YfJVjUGXJeXMiq2js0UPK1KUrL3hbsvWwwhaE Message-ID: Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily To: Eli Zaretskii Content-Type: multipart/alternative; boundary="00000000000043dbae063d82e2d0" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --00000000000043dbae063d82e2d0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Aug 29, 2025, 11:41=E2=80=AFAM Eli Zaretskii wrote: > > From: Spencer Baugh > > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > > Date: Fri, 29 Aug 2025 10:49:34 -0400 > > > > Eli Zaretskii writes: > > > > >> Primarily: Calling thread_select here complicates reasoning about th= e > > >> code. We should not release the lock when not necessary, because it > > >> makes it harder to think about possible thread interleavings. And > it's > > >> not necessary here because this pselect call returns immediately. > > > > > > How do we define "necessary" in this context? > > > > It's necessary to release the lock around long pselect calls because > > otherwise other threads will not run while Emacs is waiting for input, > > which means they probably won't run at all. > > > > >> (The only reason the pselect call wouldn't return immediately is if = we > > >> released the lock and then switched to another thread) > > > > > > You seem to assume that a call to thread_select is only necessary whe= n > > > some potentially prolonged operation or wait is expected? But I'm no= t > > > sure this is correct, because that could prevent other threads from > > > running for too long, and force programmers to inject sleep-for > > > etc. in their programs. By allowing thread switches as frequently as > > > possible lets other threads more opportunities to run (to some extent > > > simulating preemptive scheduling), which I think is a Good Thing > > > overall, no? > > > > Three points: > > > > - Most importantly: We can only allow thread switches at known safe > > points. It's not clear that this is a safe point. (Actually, we kno= w > > it currently isn't, because of the status_notify issue. But I suspec= t > > there may be other issues which are harder to analyze) > > This reason will be removed once we fix status_notify. > > > - Adding more thread switches is not necessarily a good thing. It can > > degrade whole-system performance, because context switching is slow. > > > > - wait_reading_process_output already does a thread switch around the > > pselect, so it's not important for one to happen here. > > You basically give the same single reason in several different > wordings. > > I see your point, but my experience with threads in Emacs is the > opposite: there seem to be too few thread switch opportunities, so > many programs don't work as expected unless one sprinkles sleep-for in > various places. For example, try to run something in a non-main > thread and then attempt to interact with Emacs in the main thread. > > IOW, it is quite easy for a Lisp program to hog the global lock and > starve the other threads, so each additional thread-switch opportunity > is IME a blessing, not a curse. > Okay, so then how about we delete this unsafe thread_select and put some thread-yield calls around wait_reading_process_output? > --00000000000043dbae063d82e2d0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Fri, Aug 29, 2025, 11:41=E2= =80=AFAM Eli Zaretskii <eliz@gnu.org= > wrote:
> From: Spencer Baug= h <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,=C2=A0 eggert@cs.ucla.edu,=C2=A0 = dm= itry@gutov.dev
> Date: Fri, 29 Aug 2025 10:49:34 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >> Primarily: Calling thread_select here complicates reasoning a= bout the
> >> code.=C2=A0 We should not release the lock when not necessary= , because it
> >> makes it harder to think about possible thread interleavings.= =C2=A0 And it's
> >> not necessary here because this pselect call returns immediat= ely.
> >
> > How do we define "necessary" in this context?
>
> It's necessary to release the lock around long pselect calls becau= se
> otherwise other threads will not run while Emacs is waiting for input,=
> which means they probably won't run at all.
>
> >> (The only reason the pselect call wouldn't return immedia= tely is if we
> >> released the lock and then switched to another thread)
> >
> > You seem to assume that a call to thread_select is only necessary= when
> > some potentially prolonged operation or wait is expected?=C2=A0 B= ut I'm not
> > sure this is correct, because that could prevent other threads fr= om
> > running for too long, and force programmers to inject sleep-for > > etc. in their programs.=C2=A0 By allowing thread switches as freq= uently as
> > possible lets other threads more opportunities to run (to some ex= tent
> > simulating preemptive scheduling), which I think is a Good Thing<= br> > > overall, no?
>
> Three points:
>
> - Most importantly: We can only allow thread switches at known safe >=C2=A0 =C2=A0points.=C2=A0 It's not clear that this is a safe point= .=C2=A0 (Actually, we know
>=C2=A0 =C2=A0it currently isn't, because of the status_notify issue= .=C2=A0 But I suspect
>=C2=A0 =C2=A0there may be other issues which are harder to analyze)

This reason will be removed once we fix status_notify.

> - Adding more thread switches is not necessarily a good thing.=C2=A0 I= t can
>=C2=A0 =C2=A0degrade whole-system performance, because context switchin= g is slow.
>
> - wait_reading_process_output already does a thread switch around the<= br> >=C2=A0 =C2=A0pselect, so it's not important for one to happen here.=

You basically give the same single reason in several different
wordings.

I see your point, but my experience with threads in Emacs is the
opposite: there seem to be too few thread switch opportunities, so
many programs don't work as expected unless one sprinkles sleep-for in<= br> various places.=C2=A0 For example, try to run something in a non-main
thread and then attempt to interact with Emacs in the main thread.

IOW, it is quite easy for a Lisp program to hog the global lock and
starve the other threads, so each additional thread-switch opportunity
is IME a blessing, not a curse.

Okay, so then how about we delete this unsaf= e thread_select and put some thread-yield calls around wait_reading_process= _output?
--00000000000043dbae063d82e2d0-- From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:57:57 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:57:57 +0000 Received: from localhost ([127.0.0.1]:42949 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us1UC-0004xJ-KN for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:57:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:49534) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us1U9-0004wG-R2 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:57:54 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1us1U3-00067b-3f; Fri, 29 Aug 2025 11:57:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=hAvlzZKbhrFJaTcWfOhliEnhTcPlkM3JMVMJn+TMCQw=; b=mO+8W4S7kOBI 6CsXgPXsD/ExRCz6g8TCZV5IPLe2TauKfAfcFzydn8IIWPFSpZb8UE3wBcDeiX2KAD2t5sqtbHWyf W6VjlyguEPrpNhWLrcmRUd/HDAvCSc2M0V4U6gpovI46vtbjs39cIstNVdv1GUYXNyDe0UTaWMrlo tDDaKwPK+bvirV3tGbC3L0S6iTS7op0Kr301jrjlqNzDERmZ4XTIy1DcA72J7xylYDG8xM1tV0VRv eOyQRgLL25kiqlQsQ9RhQPn3KbS1fc27vOPYBZ0wDJ+SxGnBrnavj86M7k9QUx1FM61QyEnlHKjwG J3+kmVD40s/7V+6anVQU2w==; Date: Fri, 29 Aug 2025 18:57:33 +0300 Message-Id: <86349aruhe.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 11:43:21 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> <865xe6rv8z.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Date: Fri, 29 Aug 2025 11:43:21 -0400 > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov > > Okay, so then how about we delete this unsafe thread_select and put some thread-yield calls around > wait_reading_process_output? I'm not objected to not calling thread_select there, but can we have tests that will show us the gains? Also, I think we should fix the problems with status_notify and other users of FOR_EACH_PROCESS before we make the change with thread_select, because current code in status_notify is definitely wrong, and affects the related behavior. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 12:11:23 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 16:11:24 +0000 Received: from localhost ([127.0.0.1]:42987 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us1hD-0006yr-0J for submit@debbugs.gnu.org; Fri, 29 Aug 2025 12:11:23 -0400 Received: from mxout1.mail.janestreet.com ([38.105.200.78]:49049) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us1h9-0006xb-S5 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 12:11:21 -0400 Received: from mail-lf1-f70.google.com ([209.85.167.70]) by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128) (Exim 4.98.2) id 1us1h4-000000021Z0-2Ff0 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 12:11:14 -0400 Received: by mail-lf1-f70.google.com with SMTP id 2adb3069b0e04-55f41cbbb89so1219956e87.3 for <79334@debbugs.gnu.org>; Fri, 29 Aug 2025 09:11:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=google; t=1756483873; x=1757088673; darn=debbugs.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=; b=IfcwY7FJDWXMoNbEPl6wjXhOR+cYr/AiVpvbh84/+4KzICFbvmXvNfAPlsxhepMX/W twLSVPalPOW90X1nxRdoRkOdYZwZpqhb/8ZszWv0BkZSWhHocXyKmjD1tbfJZBK0095t NqNZd3mfijcnAP4rS+gEQD6FUwmqoR2JCq0Yc= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756483874; bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=; h=References:In-Reply-To:From:Date:Subject:To:Cc; b=XilS+77jhf0lQTP8tVj+Wwl1tm1nn3qiRecuGZi9eTfOQgDlweFoamGV/3Duj6aVS XX3RtWm9JcFxuOB3EduE5i/4SIvh6OOZc//Kuf3dXEemesMu8tZVK9OSGQzeU3UMT6 mzOnxqHtzJuiveA8arFBPSWoMD+Q4VYK2ywuoxp0bTmN6O+pWePWZE7SyBcrprxE7M 5Y8TjbZ2ebh4CEuY5lDr6Ah3t5iIXRz2DaeoSRZxGQJrTVhmXAg+DhnunRNyek63Mj tWU4WOEa2URi7vnPx0j/d/ZUuv+JeerVkVwDXS0v4Q3CfzCYRrosq4q4o21+7aIY4W 2FCiAGyHrW0wA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756483873; x=1757088673; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=; b=uODLg178U8C1oexbDL1u6ujkfjq2xbkfuhPoQJQOWTZLs/SdkNeCwQJ4LaZkTHJ1s3 ATGmd0QAMoN8CVknIC97Tc1fbqYrO6ng7G+TbR0iHokUOE7GscW0u07zXtlWJd6qk58L eAaN4i+1l9hvE2lc7XQz+uanx5OVVS66IkvWYiV15TCWSs07uNefk3tiK6qr+xZQQEWv 5hLSqqlDI2ccCbnbCK8LJsHFQ6HQqKA0uaKZxRA0BbW4CT545RoH8PeSFO12mR2R+Hw2 lDHY5wNfJMcw5awlr8yOEfobqj2u7CfsurWJPtZcAhU9xRj6dVeajbrXqN9PKn54+2rm GaSg== X-Gm-Message-State: AOJu0Yx77Z7pCWaurDM9IP4LkJtEI+QWc64jFdpW/Vt2XVTMRz/F72nE 2HWAyWwc734/vEiTx0+xLhESbGqXe1hkK4Bl2tYm5OocmNBuHstMxyrbk4+GiP+FaM22ai2E0fM tKMQrPuXRV2lgmkp8iv+B9JBfhI2Qtku/AlnwWzMpFhglM79QWE111JcQMDBvAeXeEQQwUqT3L1 Ypm2MjNT/oRnytfvX+tEVft2O3jOuW4w== X-Gm-Gg: ASbGnctGClBsIJG9bAmTGEatLb0m9UlVZmFSMGHFjw1/n8tRDswSNOI2ApqoJw1pv2R 8bV9kgQg2MvmEBMfrpxndsP4kJlKXYP47xigEEklyWeKuIkOvVmBSistSIEDAUFNOcHSKvaP1k/ ev2Sue9ybbtgERGGQ6uUvG X-Received: by 2002:a05:6512:3e09:b0:55f:4746:6202 with SMTP id 2adb3069b0e04-55f47466584mr5235077e87.11.1756483872988; Fri, 29 Aug 2025 09:11:12 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGrhRSkqk/ar9SzOpgIyqc89bzt5v1+WVzT6yswIwUra87PecZsquU6TBama02ql/SfNmrrNWkT3yc/BuKRKNw= X-Received: by 2002:a05:6512:3e09:b0:55f:4746:6202 with SMTP id 2adb3069b0e04-55f47466584mr5235070e87.11.1756483872567; Fri, 29 Aug 2025 09:11:12 -0700 (PDT) MIME-Version: 1.0 References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> <865xe6rv8z.fsf@gnu.org> <86349aruhe.fsf@gnu.org> In-Reply-To: <86349aruhe.fsf@gnu.org> From: Spencer Baugh Date: Fri, 29 Aug 2025 12:11:02 -0400 X-Gm-Features: Ac12FXwaXqkWr-aOstM-1dCSxGLjByIF83ucZOF8ZBjfKxfctAJxVfqp7lW67PU Message-ID: Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily To: Eli Zaretskii Content-Type: multipart/alternative; boundary="000000000000478d8d063d8345d0" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --000000000000478d8d063d8345d0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Aug 29, 2025, 11:57=E2=80=AFAM Eli Zaretskii wrote: > > From: Spencer Baugh > > Date: Fri, 29 Aug 2025 11:43:21 -0400 > > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov < > dmitry@gutov.dev> > > > > Okay, so then how about we delete this unsafe thread_select and put som= e > thread-yield calls around > > wait_reading_process_output? > > I'm not objected to not calling thread_select there, but can we have > tests that will show us the gains? > The primary gain is that it becomes easier to reason about locking. I'm not sure how to write a test for that :) Also, I think we should fix the problems with status_notify and other > users of FOR_EACH_PROCESS before we make the change with > thread_select, because current code in status_notify is definitely > wrong, and affects the related behavior. > Yes, I'll work on a patch which does both. (I'm not going to bother making a change which doesn't rely on the change to remove the thread_select, because this code is hard enough to reason about already) > --000000000000478d8d063d8345d0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Fri, Aug 29, 2025, 11:57=E2= =80=AFAM Eli Zaretskii <eliz@gnu.org= > wrote:
> From: Spencer Baug= h <sbaugh@janestreet.com>
> Date: Fri, 29 Aug 2025 11:43:21 -0400
> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov= <dmitry@gutov.dev>
>
> Okay, so then how about we delete this unsafe thread_select and put so= me thread-yield calls around
> wait_reading_process_output?

I'm not objected to not calling thread_select there, but can we have tests that will show us the gains?

The primary gain is that it becomes easie= r to reason about locking.=C2=A0 I'm not sure how to write a test for t= hat :)

Also, I think we should fix the problems with status_notify and other
users of FOR_EACH_PROCESS before we make the change with
thread_select, because current code in status_notify is definitely
wrong, and affects the related behavior.

Yes, I'll work on a patch which= does both.=C2=A0 (I'm not going to bother making a change which doesn&= #39;t rely on the change to remove the thread_select, because this code is = hard enough to reason about already)
--000000000000478d8d063d8345d0-- From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 13:32:06 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 17:32:06 +0000 Received: from localhost ([127.0.0.1]:43228 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us2xJ-0001j8-MA for submit@debbugs.gnu.org; Fri, 29 Aug 2025 13:32:06 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:52887) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us2xB-0001hA-Uo for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 13:32:02 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily In-Reply-To: <86a53is0zw.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug 2025 16:36:51 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> Date: Fri, 29 Aug 2025 13:31:52 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756488712; bh=9IUO6GfrnOPEvuCYo1nlgXF+A7ZmqGgLuMpkqCAo3nM=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=vE7fSDsWizdWfI94QcCdb8RDPOqYKXXQaQ/V1dkkf00Gy6rBxi6iEJEUbTvUAx93A HE2uu9YbFyBYKb3gEWFbGAUTcLd37/YKUMEkqH4s3esxBX8XnwcTWdnHa1NHLW/aEk RlMJjYnR44jypuoroJBF5l7frNcuAfJEs/m6tvO/CDNWjn/xM2CSyd42T8xMaf2jZY PIsQBfKl8JnHWJsmY04bYVTDoC3aAgumuf+TX2jm/lPf1lwFLpb4FWQLIvZ3XQlpWo 8EHxiekZtOkXiBnid6yGiO0VJTIDt6UEw/CXS4+H4+8itpS4JH2xzfnH4h43ARDc1t 97Sj7OaesfPsw== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> Cc: Paul Eggert , 79334@debbugs.gnu.org, Dmitry Gutov >> >> Date: Fri, 29 Aug 2025 09:07:50 -0400 >> >> Eli Zaretskii writes: >> > Which brings us to status_notify. It indeed can close the descriptors >> > of processes which terminated, so we should probably make it safer. >> > For example, we could leave alone descriptors marked with non-NULL >> > waiting_thread if that thread is still alive and is different from the >> > current thread. Would that fix the problem? If yes, I think we >> > should install such a change. >> >> That was my initial idea for a fix as well. But I'm not sure about the >> specific details. Namely, perhaps we need to be checking waiting_thread >> everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify. >> And maybe some new functions/macros should be added to simplify this; >> maybe a version of FOR_EACH_PROCESS which only iterates over processes >> for which waiting_thread == current_thread. > > You may be right, we should audit all the users of FOR_EACH_PROCESS. > At least one of them (get-buffer-process) is probably okay as it is, > but others might need to avoid looking at processes locked to other > threads, indeed. While doing this, I had another thought... Do you know if we have any code which handles the fact that delete-process might be called on a process while another thread is calling pselect on its fds? I don't think we do, which suggests that might also be problematic. I'll work on a fix which encompasses that as well. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 15:50:13 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 19:50:13 +0000 Received: from localhost ([127.0.0.1]:43404 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us56w-0005GM-3i for submit@debbugs.gnu.org; Fri, 29 Aug 2025 15:50:13 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:41087) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us56s-0005CF-3S for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 15:50:07 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily In-Reply-To: (Spencer Baugh's message of "Fri, 29 Aug 2025 13:31:52 -0400") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> Date: Fri, 29 Aug 2025 15:50:00 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756497000; bh=atD57mlVj0ftflEqlmrlBDIpkjm+CYOS/wHD8+LsTVs=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=J8xwHa4+rzs2O/EReG8Ki2LzkjFCVjKVo+0nOOZkcgPk78tP7wxaZlSVgit5S0d8k TraIACQIKoCa5xPTsIRCO6/53L8HaTU9r5vfOu6ORiItWwcJ/ZOm4n7iynW2USyaM1 Ppc/rGE7fWV4AGfCSGxPHOYitc0sg/8GHwiQj644yj4YOPxEa5ym55xodDU2hW9z/J m5/bqQFVbzz8llGaLzP7cDxKe31BLQ48Rjq8YqQpzBuPkY9o+FW8vLjYgeuw3Fhc98 tyDaGjy0LnSCjeJ2ziYF0PbtR9DEHHmQDYnRflkenpZqLE8L2ABJB/JVQzDS33dpD/ b+kHdp0CO9nPA== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain Spencer Baugh writes: > While doing this, I had another thought... > > Do you know if we have any code which handles the fact that > delete-process might be called on a process while another thread is > calling pselect on its fds? > > I don't think we do, which suggests that might also be problematic. > I'll work on a fix which encompasses that as well. OK, I think this patch fixes the issue. Many different pieces of code in Emacs can close file descriptors. We need to be generically careful to not close file descriptors which some other thread is currently waiting on. In my patch, we do this by letting the waiting_thread itself close those file descriptors, after it returns from select. Unfortunately I don't have a reliable test which produces the failure case, but in this patch I added a fprintf when we close a deferred fd. This prints whenever some thread would have closed an fd that another thread was waiting on. If I run the following program (with emacs -Q --batch): (defun my-break-thread () (let ((proc (make-process :name "foo" :command '("sleep" "1")))) (while (process-live-p proc) (accept-process-output proc 0.05)))) (while t (make-thread #'my-break-thread "thread1") (thread-join (make-thread #'my-break-thread "thread2"))) I get about one print every 10 seconds. Each print is a problem case which has been averted, so I do think this patch fixes the bug. (Obviously, we should remove the fprintf before applying the change) --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-Defer-closing-file-descriptors-used-by-other-threads.patch >From 4dda5e4fff3067cac53d6273ed3bd327437d00c8 Mon Sep 17 00:00:00 2001 From: Spencer Baugh Date: Fri, 29 Aug 2025 15:39:00 -0400 Subject: [PATCH] Defer closing file descriptors used by other threads * src/process.c (other_thread_is_waiting_for): Add. (clear_fd_callback_data): Don't clear WAITING_FOR_CLOSE_FD. (close_process_fd): Set WAITING_FOR_CLOSE_FD for fds with a waiting_thread. (wait_reading_process_output): Close fds with WAITING_FOR_CLOSE_FD. (bug#79334) --- src/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/src/process.c b/src/process.c index b33bebb9a35..8273ba78095 100644 --- a/src/process.c +++ b/src/process.c @@ -448,7 +448,9 @@ make_lisp_proc (struct Lisp_Process *p) /* This descriptor refers to a process. */ PROCESS_FD = 8, /* A non-blocking connect. Only valid if FOR_WRITE is set. */ - NON_BLOCKING_CONNECT_FD = 16 + NON_BLOCKING_CONNECT_FD = 16, + /* This fd is waiting to be closed. */ + WAITING_FOR_CLOSE_FD = 32, }; static struct fd_callback_data @@ -466,14 +468,30 @@ make_lisp_proc (struct Lisp_Process *p) struct thread_state *waiting_thread; } fd_callback_info[FD_SETSIZE]; +static bool +other_thread_is_waiting_for (struct fd_callback_data* elem) +{ + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; +} + static void clear_fd_callback_data(struct fd_callback_data* elem) { elem->func = NULL; elem->data = NULL; - elem->flags = 0; elem->thread = NULL; - elem->waiting_thread = NULL; + if (other_thread_is_waiting_for (elem)) + { + /* Preserve WAITING_FOR_CLOSE_FD; see process_close_fd. Leaving + flags non-zero means recompute_max_desc won't reduce max_desc + past this element. */ + elem->flags &= WAITING_FOR_CLOSE_FD; + } + else + { + elem->flags = 0; + elem->waiting_thread = NULL; + } } @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr) if (0 <= fd) { *fd_addr = -1; - emacs_close (fd); + if (other_thread_is_waiting_for (&fd_callback_info[fd])) + /* Let waiting_thread handle closing the fd. */ + fd_callback_info[fd].flags = WAITING_FOR_CLOSE_FD; + else + emacs_close (fd); } } @@ -5816,6 +5838,22 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, /* Make C-g and alarm signals set flags again. */ clear_waiting_for_input (); + /* Close fds which other threads wanted to close. */ + for (int fd = 0; fd <= max_desc; ++fd) + { + if (fd_callback_info[fd].waiting_thread == current_thread + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) + { + fprintf (stderr, "closing deferred fd %d\n", fd); + emacs_close (fd); + /* Ignore any events which happened on this fd. */ + FD_CLR (fd, &Available); + FD_CLR (fd, &Writeok); + fd_callback_info[fd].flags = 0; + } + } + recompute_max_desc (); + /* If we woke up due to SIGWINCH, actually change size now. */ do_pending_window_change (0); -- 2.43.7 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 16:25:49 2025 Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 20:25:49 +0000 Received: from localhost ([127.0.0.1]:43447 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1us5fQ-00029S-CD for submit@debbugs.gnu.org; Fri, 29 Aug 2025 16:25:49 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:52051) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1us5fM-00028M-RH for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 16:25:46 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads In-Reply-To: (Spencer Baugh's message of "Fri, 29 Aug 2025 15:50:00 -0400") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> Date: Fri, 29 Aug 2025 16:25:39 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756499139; bh=phY/bhhTrwbAP0bEN7nHJIJsOHPuoxkPKqn5Fz+RGJk=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=He+3LH3Y86QjxzJUJNDXkvoR4yQviSTLniSzpanDyeN55+cFXfZ0luTJGz5VEiM6M ZZwPdWhyP72jOT4SxO9ablJ2NenDTTyb3DXOFAHQtyhn1rM2xoXEZU2Rl+LFeYNNfI ZXZxBqnEG41H9ntVVI7W0m3UFDDAYYmDn9JZhGBlyWuknq59UGqJm2XKnWXGpVLN9H eWbGHyuUI5NXVAdLsZNLfN0J8O8GwIOy08dDE9synpYMPob6ZHLU0y6xFflKpgYFtx uUW40U0Y5kK6MOK1udCuV29AMqHRZwpeekAojziNge4NkJJbly8s6EXB4ublqA2is3 VqyahkqRoPSqQ== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain Spencer Baugh writes: > Spencer Baugh writes: >> While doing this, I had another thought... >> >> Do you know if we have any code which handles the fact that >> delete-process might be called on a process while another thread is >> calling pselect on its fds? >> >> I don't think we do, which suggests that might also be problematic. >> I'll work on a fix which encompasses that as well. > > OK, I think this patch fixes the issue. > > Many different pieces of code in Emacs can close file descriptors. We > need to be generically careful to not close file descriptors which some > other thread is currently waiting on. In my patch, we do this by > letting the waiting_thread itself close those file descriptors, after it > returns from select. > > Unfortunately I don't have a reliable test which produces the failure > case, but in this patch I added a fprintf when we close a deferred fd. > This prints whenever some thread would have closed an fd that another > thread was waiting on. If I run the following program (with emacs -Q > --batch): > > (defun my-break-thread () > (let ((proc (make-process > :name "foo" > :command '("sleep" "1")))) > (while (process-live-p proc) > (accept-process-output proc 0.05)))) > (while t > (make-thread #'my-break-thread "thread1") > (thread-join (make-thread #'my-break-thread "thread2"))) > > I get about one print every 10 seconds. Each print is a problem case > which has been averted, so I do think this patch fixes the bug. > > (Obviously, we should remove the fprintf before applying the change) Of course, I immediately found a bug in my patch (I wasn't clearing waiting_thread after closing a deferred fd). Fixed in the attached. In related news: running process-tests.el in a loop also triggers my "Closing deferred fd" message. Which suggests that src/process-tests.el was probably also suffering from bugs in this area, which could have been causing hangs or flaky failures. Which should now hopefully be fixed. --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-Defer-closing-file-descriptors-used-by-other-threads.patch >From 9e46aa50f5de7d719a6e427fe44bb6b8b8e226fc Mon Sep 17 00:00:00 2001 From: Spencer Baugh Date: Fri, 29 Aug 2025 15:39:00 -0400 Subject: [PATCH] Defer closing file descriptors used by other threads * src/process.c (other_thread_is_waiting_for): Add. (clear_fd_callback_data): Don't clear WAITING_FOR_CLOSE_FD. (close_process_fd): Set WAITING_FOR_CLOSE_FD for fds with a waiting_thread. (wait_reading_process_output): Close fds with WAITING_FOR_CLOSE_FD. (bug#79334) --- src/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/src/process.c b/src/process.c index b33bebb9a35..9180b92c61f 100644 --- a/src/process.c +++ b/src/process.c @@ -448,7 +448,9 @@ make_lisp_proc (struct Lisp_Process *p) /* This descriptor refers to a process. */ PROCESS_FD = 8, /* A non-blocking connect. Only valid if FOR_WRITE is set. */ - NON_BLOCKING_CONNECT_FD = 16 + NON_BLOCKING_CONNECT_FD = 16, + /* This fd is waiting to be closed. */ + WAITING_FOR_CLOSE_FD = 32, }; static struct fd_callback_data @@ -466,14 +468,30 @@ make_lisp_proc (struct Lisp_Process *p) struct thread_state *waiting_thread; } fd_callback_info[FD_SETSIZE]; +static bool +other_thread_is_waiting_for (struct fd_callback_data* elem) +{ + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; +} + static void clear_fd_callback_data(struct fd_callback_data* elem) { elem->func = NULL; elem->data = NULL; - elem->flags = 0; elem->thread = NULL; - elem->waiting_thread = NULL; + if (other_thread_is_waiting_for (elem)) + { + /* Preserve WAITING_FOR_CLOSE_FD; see process_close_fd. Leaving + flags non-zero means recompute_max_desc won't reduce max_desc + past this element. */ + elem->flags &= WAITING_FOR_CLOSE_FD; + } + else + { + elem->flags = 0; + elem->waiting_thread = NULL; + } } @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr) if (0 <= fd) { *fd_addr = -1; - emacs_close (fd); + if (other_thread_is_waiting_for (&fd_callback_info[fd])) + /* Let waiting_thread handle closing the fd. */ + fd_callback_info[fd].flags = WAITING_FOR_CLOSE_FD; + else + emacs_close (fd); } } @@ -5816,6 +5838,22 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, /* Make C-g and alarm signals set flags again. */ clear_waiting_for_input (); + /* Close fds which other threads wanted to close. */ + for (int fd = 0; fd <= max_desc; ++fd) + { + if (fd_callback_info[fd].waiting_thread == current_thread + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) + { + fprintf (stderr, "closing deferred fd %d\n", fd); + emacs_close (fd); + /* Ignore any events which happened on this fd. */ + FD_CLR (fd, &Available); + FD_CLR (fd, &Writeok); + clear_fd_callback_data (&fd_callback_info[fd]); + } + } + recompute_max_desc (); + /* If we woke up due to SIGWINCH, actually change size now. */ do_pending_window_change (0); -- 2.43.7 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 02:18:27 2025 Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 06:18:27 +0000 Received: from localhost ([127.0.0.1]:44269 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usEuv-0007ZO-F4 for submit@debbugs.gnu.org; Sat, 30 Aug 2025 02:18:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55540) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usEus-0007YD-VV for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 02:18:24 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usEul-0000T5-Vv; Sat, 30 Aug 2025 02:18:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=37HdJREzOX0Ff/ItCAWw+CoUcHE8596w5562m4Crwjc=; b=rRiUTqsbfbdG DNuiQG2SrdwiXrjLxp3wy/x0KhLeGYwN3lTZf0QWki/PaWdAi2xHOQjIuDghxLM5EEV20F0DwNt3S A3MEnt/qTQbw3Mn4t0ckSXGDKquEPdmMhcodCmhEiQxIp7Vrl73whdrurxT7WLvzzTgHVG6mWYfjS dF+Zi7JWK46YyhX1YOCmAYX2dn/vA/BbDHr7d+Fj2PHV6SMtCTITaag0SVG2yE3rtvD1JAKKilpLH Tq50GJCTfMotAcCFNyjsYSiQGTyaW0O6xyDadzI/JK1K1hVDcq56TSU9aSsHtKk7ShUj1/64YxUIw bCQnJBUKC1SOkeLTzPjrbg==; Date: Sat, 30 Aug 2025 09:18:12 +0300 Message-Id: <861pots57f.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 15:50:00 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Fri, 29 Aug 2025 15:50:00 -0400 > > Many different pieces of code in Emacs can close file descriptors. We > need to be generically careful to not close file descriptors which some > other thread is currently waiting on. In my patch, we do this by > letting the waiting_thread itself close those file descriptors, after it > returns from select. Not literally "when returning from pselect", I hope, but when the process is being deleted _and_ the waiting thread returns from pselect, right? IOW, we don't close the descriptor each time pselecft returns in the right thread, then reopen it before the next call to pselect, because that won't work. > +static bool > +other_thread_is_waiting_for (struct fd_callback_data* elem) > +{ > + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; > +} This should also check that the waiting thread is still alive. If it isn't, it's okay to close the descriptor. Otherwise, descriptors belonging to threads that exited might never be closed. > @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr) > if (0 <= fd) > { > *fd_addr = -1; > - emacs_close (fd); > + if (other_thread_is_waiting_for (&fd_callback_info[fd])) > + /* Let waiting_thread handle closing the fd. */ ^^ Style: 2 spaces between the period and "*/". > + /* Close fds which other threads wanted to close. */ > + for (int fd = 0; fd <= max_desc; ++fd) > + { > + if (fd_callback_info[fd].waiting_thread == current_thread > + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) > + { > + fprintf (stderr, "closing deferred fd %d\n", fd); > + emacs_close (fd); > + /* Ignore any events which happened on this fd. */ > + FD_CLR (fd, &Available); > + FD_CLR (fd, &Writeok); > + fd_callback_info[fd].flags = 0; Why do it here and not when we deactivate processes? Until now we never closed any descriptors inside wait_reading_process_output. More generally, why this design and not a simpler change which only deactivates processes whose I/O is waited by the current thread, as discussed previously? I think this alternative would be easier to understand and reason about, because (as established in another discussion) wait_reading_process_output frequently loops more than we expect it to. Looking at all the callers of close_process_fd, it sounds like it's too low-level to do this. For example, it is called from create_process (including in the forked child) to close the unneeded ends of the pipe, where we probably don't want this. And clear_fd_callback_data is also used for the keyboard input descriptor, which is probably also not relevant. So I wonder whether it's clean to have this logic at such a low level, where the purpose of the descriptor is barely known, and not at a higher level, where we know what each descriptor is used for, and therefore the purpose and the intent of the logic is much more clear. From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 02:36:06 2025 Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 06:36:06 +0000 Received: from localhost ([127.0.0.1]:44312 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usFBx-0001uC-GY for submit@debbugs.gnu.org; Sat, 30 Aug 2025 02:36:06 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45716) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usFBt-0001sp-4A for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 02:35:59 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usFBi-00031Y-1v; Sat, 30 Aug 2025 02:35:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=/P4xvteEzG7SA+MVTLWp3uJca9UwfYBherFneiGL2Xs=; b=oH6m3zEqU/Hj jf5TfcvCz42AIq+luViPRKDd453t8dG/Ce+ScJlvf3YUaiR+sHYqylmyeufeB99QJ/O75EdAY3Xe3 zTLeUg2zDdrFxPuscrPMXgkDXHM/xZxjVIISVL0qgP1s58NMbNT6xmUchKRJvzfe/azQwFTOBU5bD XRI/T73eU3ySz0WoiU4DvD6aKoeLjfA+5eKgGHqrMrCoReBboCefcpxkKE9RUIqvkMGKW4ikG/Y0i zo19wl0k+4XKDyKYLZsYVfayuR7O1HyqclJq/lQX5nCn5XWH4WCErAh7/X/yKQJOtfUKk5Z1G0rja LUjY1+XTef/A4p36sgwjGg==; Date: Sat, 30 Aug 2025 09:35:42 +0300 Message-Id: <86zfbhqptt.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Fri, 29 Aug 2025 16:25:39 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Fri, 29 Aug 2025 16:25:39 -0400 > > In related news: running process-tests.el in a loop also triggers my > "Closing deferred fd" message. Which suggests that src/process-tests.el > was probably also suffering from bugs in this area, which could have > been causing hangs or flaky failures. Which should now hopefully be > fixed. process-tests uses questionable setup, apart of being based on non-portable assumptions. At the time, I was unable to convince the author of the tests to spell out the assumptions and expectations from what should happen in those situations, without which it was impossible to decide whether some failures on MS-Windows are bugs or just non-portable code that cannot work on Windows. Some of those tests always fail for me on Windows. > + for (int fd = 0; fd <= max_desc; ++fd) > + { > + if (fd_callback_info[fd].waiting_thread == current_thread > + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) ^^^^^^^^^^^^^^^^^^^^^^^ Shouldn't this test that the WAITING_FOR_CLOSE_FD bit is set, not that it is the _only_ bit set? Even if the code is correct today, it might not be future-proof if we add some additional flags at some point. > + /* Ignore any events which happened on this fd. */ > + FD_CLR (fd, &Available); > + FD_CLR (fd, &Writeok); Is this wise? How can we be sure that these events were already handled (presumably, in another thread)? If they were not, we will lose events. What will happen if we don't ignore them here? From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 11:55:36 2025 Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 15:55:36 +0000 Received: from localhost ([127.0.0.1]:49100 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usNvT-0000cx-To for submit@debbugs.gnu.org; Sat, 30 Aug 2025 11:55:36 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:59135) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usNvR-0000cj-7S for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 11:55:34 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads In-Reply-To: <86zfbhqptt.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 30 Aug 2025 09:35:42 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> Date: Sat, 30 Aug 2025 11:55:27 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756569327; bh=2yHKk7CzlQRcNALXv3lUnlD9q8K1fQQENU6tF+gPOBM=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=P0AG8w3L+sbckKKhzOvJtoqacOkE6VTvYb2XE5/gUSq0I8tT0PYSHXW3grhb7w85U VdtXsTZ72i3IYxR2mJPDUgTdKXfamBEmEvrUW0DD/JotkNdB58MzND51cYaAi1zCmL 9hBNXEB5qwdrWMmV+75Fi4atpaEOxUCPe3k4Ya9IJDD/2hCMzkJblOQne2gRUHm7ol tVCge504xv4fjGv/1grhnu9a7xnRAvBA8PDuIevLpMFwyEfsMOgCt1c85iToX4mQLf JtBqzMXV6XFP05JQkLEztgTkOHFjeU7Ri2eG2ZK3z4eXITx0zaj/O7BGtcyQdn4HI1 6WfLZjmkDXL9g== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain (I'm responding to both your emails with review feedback here) Eli Zaretskii writes: >> From: Spencer Baugh >> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev >> Date: Fri, 29 Aug 2025 16:25:39 -0400 >> + for (int fd = 0; fd <= max_desc; ++fd) >> + { >> + if (fd_callback_info[fd].waiting_thread == current_thread >> + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) > ^^^^^^^^^^^^^^^^^^^^^^^ > Shouldn't this test that the WAITING_FOR_CLOSE_FD bit is set, not that > it is the _only_ bit set? Even if the code is correct today, it might > not be future-proof if we add some additional flags at some point. True. Fixed. >> + /* Ignore any events which happened on this fd. */ >> + FD_CLR (fd, &Available); >> + FD_CLR (fd, &Writeok); > > Is this wise? How can we be sure that these events were already > handled (presumably, in another thread)? If they were not, we will > lose events. What will happen if we don't ignore them here? Any events which might appear at this point were already events we were going to lose by closing this file descriptor. For example, if a process produces output just as we do delete-process on it, we never read that output. The only change is that we now have to explicitly ignore those events; before, we silently dropped them at the time we closed the file descriptor. >> Many different pieces of code in Emacs can close file descriptors. We >> need to be generically careful to not close file descriptors which some >> other thread is currently waiting on. In my patch, we do this by >> letting the waiting_thread itself close those file descriptors, after it >> returns from select. > > Not literally "when returning from pselect", I hope, but when the > process is being deleted _and_ the waiting thread returns from > pselect, right? IOW, we don't close the descriptor each time pselecft > returns in the right thread, then reopen it before the next call to > pselect, because that won't work. Right. >> +static bool >> +other_thread_is_waiting_for (struct fd_callback_data* elem) >> +{ >> + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; >> +} > > This should also check that the waiting thread is still alive. If it > isn't, it's okay to close the descriptor. Otherwise, descriptors > belonging to threads that exited might never be closed. waiting_thread always points to a living thread, because it's only set while a thread is inside wait_reading_process_output. (In fact, when a thread sees waiting_thread set to a value which is neither NULL nor current_thread, it can only ever point to a thread which is currently inside thread_select) >> @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr) >> if (0 <= fd) >> { >> *fd_addr = -1; >> - emacs_close (fd); >> + if (other_thread_is_waiting_for (&fd_callback_info[fd])) >> + /* Let waiting_thread handle closing the fd. */ > ^^ > Style: 2 spaces between the period and "*/". Fixed. >> + /* Close fds which other threads wanted to close. */ >> + for (int fd = 0; fd <= max_desc; ++fd) >> + { >> + if (fd_callback_info[fd].waiting_thread == current_thread >> + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) >> + { >> + fprintf (stderr, "closing deferred fd %d\n", fd); >> + emacs_close (fd); >> + /* Ignore any events which happened on this fd. */ >> + FD_CLR (fd, &Available); >> + FD_CLR (fd, &Writeok); >> + fd_callback_info[fd].flags = 0; > > Why do it here and not when we deactivate processes? Because that would close file descriptors that another thread is currently selecting on, causing EBADF. > Until now we never closed any descriptors inside > wait_reading_process_output. Just to be clear, we can call deactivate_process from wait_reading_process_output, through status_notify, through process filters, or other similar. That closes descriptors. > More generally, why this design and not a simpler change which only > deactivates processes whose I/O is waited by the current thread, as > discussed previously? I looked into that, but I determined that it's too complicated. It would require us to keep track of a new kind of process state: a process which has been deleted, but hasn't been deactivated yet. This would need to be supported for each kind of process, and we'd need to record new kinds of state to finish the deactivation later. It's simpler to keep the ability to delete any process at any time, and have that fully deactivate the process as it does today. Doing that only causes one issue: if another thread is selecting on the process's fds, that thread will break if it tries to handle events on the fds, or the thread will simply get EBADF from select. So, we detect that by checking waiting_thread, and marking the fds so that the other thread knows not to touch them, and to instead just close them to finish deleting the process. > I think this alternative would be easier to understand and reason > about, because (as established in another discussion) > wait_reading_process_output frequently loops more than we expect it > to. I'm not sure what you mean by that, why does wait_reading_process_output looping more than we expect cause issues? I don't think it would. > Looking at all the callers of close_process_fd, it sounds like it's > too low-level to do this. For example, it is called from > create_process (including in the forked child) to close the unneeded > ends of the pipe, where we probably don't want this. And > clear_fd_callback_data is also used for the keyboard input descriptor, > which is probably also not relevant. True. I looked into fixing these, and that suggested to me an altogether simpler approach. We can keep the logic entirely contained in wait_reading_process_output, just adding a bit of code around the select call. And we don't need to defer deleting the fds at all: just recognize when it's happened, and be careful to not touch those fds afterwards. Attached. --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0001-Ignore-descriptors-deleted-during-select.patch >From b37e71f47913e1011c4a7857e166e78eddc2a184 Mon Sep 17 00:00:00 2001 From: Spencer Baugh Date: Sat, 30 Aug 2025 11:42:19 -0400 Subject: [PATCH] Ignore descriptors deleted during select * src/process.c (wait_reading_process_output): Save the fds we're waiting on and check that we're still waiting after select. (bug#79334) --- src/process.c | 37 +++++++++++++++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/src/process.c b/src/process.c index 1612248a11d..11af2f55f63 100644 --- a/src/process.c +++ b/src/process.c @@ -5643,6 +5643,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, nfds = read_kbd ? 0 : 1; no_avail = 1; FD_ZERO (&Available); + xerrno = 0; } else { @@ -5756,6 +5757,18 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, && timespec_cmp (short_timeout, timeout) < 0) timeout = short_timeout; #endif + /* Save the fds we're currently waiting_thread for. */ + fd_set waiting_fds; + FD_ZERO (&waiting_fds); + int max_waiting_fds = 0; + for (int fd = 0; fd <= max_desc; ++fd) + { + if (fd_callback_info[fd].waiting_thread == current_thread) + { + FD_SET (fd, &waiting_fds); + max_waiting_fds = fd; + } + } /* Android requires using a replacement for pselect in android.c to poll for events. */ @@ -5809,10 +5822,30 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd, FD_SET (channel, &Available); } #endif + xerrno = errno; + /* Check if any of waiting_fds was closed by another thread or + a signal handler. */ + for (int fd = 0; fd <= max_waiting_fds; ++fd) + { + if (FD_ISSET (fd, &waiting_fds) + && fd_callback_info[fd].waiting_thread != current_thread) + { + /* Ignore any events which we happened to see on this + fd before it was closed; we can't do anything for + it now, since it's closed. */ + FD_CLR (fd, &Available); + FD_CLR (fd, &Writeok); + /* If we got an EBADF and no events, this is probably + why; clear it so we can try again. */ + if (nfds < 0 && xerrno == EBADF) + { + nfds = 0; + xerrno = 0; + } + } + } } - xerrno = errno; - /* Make C-g and alarm signals set flags again. */ clear_waiting_for_input (); -- 2.43.7 --=-=-=-- From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 12:36:17 2025 Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 16:36:18 +0000 Received: from localhost ([127.0.0.1]:49502 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usOYq-0003EB-T7 for submit@debbugs.gnu.org; Sat, 30 Aug 2025 12:36:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52644) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usOYk-0003Do-CI for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 12:36:13 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usOYb-0001nr-6T; Sat, 30 Aug 2025 12:36:01 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=raezIEp48dBi7BFRaXVbPHQaQPAaSVCnHRK66D8sQDQ=; b=rUlwx0ctE15Z 2iusHfN27meVn8FZwbHrkdjQNfkRZJSg9I8ISUhaC/cu/VJyqAxPQsa/MKwf550Jzv6owE8RZopJJ BWnYBa6FI5nDcDnYzFIWNFcWChRRO6KZl42OmI0l6ODwLmW+ZB30fD7ACzekdPk3vJ9SvmQJZBFx9 62sUeElm29Nm3M+7vDYE4S4URdQCtGHOJRs1df/JT4ZAL74sjnBzEZPLD9yF2+VxRFneCoVC41mtW M4ivDbEpopFnm2H2no78I46/0Ta4wJAtz1vGgCEnvcxe4parbbLZfNw87tqcFpXWQX0a6VViQMbxy 7CZLvWwANgOm21ydcL4w4A==; Date: Sat, 30 Aug 2025 19:35:39 +0300 Message-Id: <86jz2kpy1w.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Sat, 30 Aug 2025 11:55:27 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Sat, 30 Aug 2025 11:55:27 -0400 > > >> + /* Ignore any events which happened on this fd. */ > >> + FD_CLR (fd, &Available); > >> + FD_CLR (fd, &Writeok); > > > > Is this wise? How can we be sure that these events were already > > handled (presumably, in another thread)? If they were not, we will > > lose events. What will happen if we don't ignore them here? > > Any events which might appear at this point were already events we were > going to lose by closing this file descriptor. For example, if a > process produces output just as we do delete-process on it, we never > read that output. The only change is that we now have to explicitly > ignore those events; before, we silently dropped them at the time we > closed the file descriptor. Where did we close it? There's this loop _after_ the place where you add your loop which clears the bits in Available: for (channel = 0; channel <= max_desc; ++channel) { struct fd_callback_data *d = &fd_callback_info[channel]; if (d->func && ((d->flags & FOR_READ && FD_ISSET (channel, &Available)) || ((d->flags & FOR_WRITE) && FD_ISSET (channel, &Writeok)))) d->func (channel, d->data); } AFAIU, it would notice the set bits and act upon them. Now with this new addition we are clearing them ahead of that loop. That doesn't necessarily sound right to me. > >> +static bool > >> +other_thread_is_waiting_for (struct fd_callback_data* elem) > >> +{ > >> + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; > >> +} > > > > This should also check that the waiting thread is still alive. If it > > isn't, it's okay to close the descriptor. Otherwise, descriptors > > belonging to threads that exited might never be closed. > > waiting_thread always points to a living thread, because it's only set > while a thread is inside wait_reading_process_output. > > (In fact, when a thread sees waiting_thread set to a value which is > neither NULL nor current_thread, it can only ever point to a thread > which is currently inside thread_select) You assume that the waiting_thread field will be cleared when a thread dies, but that is not guaranteed. Why isn't it better to be defensive and make sure the thread is still alive? If you are right, that test will always be true, but what if you are wrong in some rare cases? > >> + if (fd_callback_info[fd].waiting_thread == current_thread > >> + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) > >> + { > >> + fprintf (stderr, "closing deferred fd %d\n", fd); > >> + emacs_close (fd); > >> + /* Ignore any events which happened on this fd. */ > >> + FD_CLR (fd, &Available); > >> + FD_CLR (fd, &Writeok); > >> + fd_callback_info[fd].flags = 0; > > > > Why do it here and not when we deactivate processes? > > Because that would close file descriptors that another thread is > currently selecting on, causing EBADF. No, I mean to deactivate only the processes whose I/O channels are locked to the current thread. > > Until now we never closed any descriptors inside > > wait_reading_process_output. > > Just to be clear, we can call deactivate_process from > wait_reading_process_output, through status_notify, through process > filters, or other similar. That closes descriptors. Yes, and so I suggest doing this logic of deactivating only the processes that are locked to the current thread there. > > More generally, why this design and not a simpler change which only > > deactivates processes whose I/O is waited by the current thread, as > > discussed previously? > > I looked into that, but I determined that it's too complicated. It > would require us to keep track of a new kind of process state: a process > which has been deleted, but hasn't been deactivated yet. What I had in mind was neither delete it nor deactivate it. IOW, the loop which goes through the list of processes should only pay attention to processes locked to this thread and those that are not locked to any live thread. It should leave the other processes to when their threads call status_notify. > > Looking at all the callers of close_process_fd, it sounds like it's > > too low-level to do this. For example, it is called from > > create_process (including in the forked child) to close the unneeded > > ends of the pipe, where we probably don't want this. And > > clear_fd_callback_data is also used for the keyboard input descriptor, > > which is probably also not relevant. > > True. > > I looked into fixing these, and that suggested to me an altogether > simpler approach. We can keep the logic entirely contained in > wait_reading_process_output, just adding a bit of code around the select > call. > > And we don't need to defer deleting the fds at all: just recognize when > it's happened, and be careful to not touch those fds afterwards. Hmm... I don't understand: why do we need to do this? The calls to compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which set bits in the Available mask passed top pselect already avoid setting bits for descriptors on which some other thread waits. So pselect cannot possibly indicate any output ready on any descriptors belonging to other threads, and this loop you suggest to add: > + /* Save the fds we're currently waiting_thread for. */ > + fd_set waiting_fds; > + FD_ZERO (&waiting_fds); > + int max_waiting_fds = 0; > + for (int fd = 0; fd <= max_desc; ++fd) > + { > + if (fd_callback_info[fd].waiting_thread == current_thread) > + { > + FD_SET (fd, &waiting_fds); > + max_waiting_fds = fd; > + } > + } should be unnecessary. Or what am I missing? From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 14:39:05 2025 Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 18:39:05 +0000 Received: from localhost ([127.0.0.1]:49777 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usQTg-0000rM-HA for submit@debbugs.gnu.org; Sat, 30 Aug 2025 14:39:04 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:56948) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usQTe-0000qs-5m for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 14:39:02 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1usQTX-0000q6-Ks; Sat, 30 Aug 2025 14:38:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=8LEoYm1gbNW2XxsqKaotoKk4KLK8AoTVTw7bWqdpKeI=; b=D7IEz5XMQNSF b+BSITnsDKLNYKblAPBqkvKWdsgGGPC84kbTgOyhunHlP1VLUW/hhQtoeuztQGUJfiwIDeR5Vl+v4 q2rsS3RfFZpxQT4ZDsZnMLEPjgVOEcTIw46QnohG5bTI/hHHDVldZKfuFBLlAGNkqwVDYXlEPVFWV I9HTaqd1p18pSjHnkBpYjYeRXeso8ZLHO0Jwh0X6LdAXo45uf+ahCvJaSEWOlf/PmH7ZrZLUJz58m VuKtstb7J/jhDRhLIwZk5x7V+dKYVg7HF2W8FTsV8k48yA+L9n6hDRXt9hFTn7nMNwDZY/E//IOgi a/4vxTrHYgN0dW1nDX+wYg==; Date: Sat, 30 Aug 2025 21:38:52 +0300 Message-Id: <86a53gpscj.fsf@gnu.org> From: Eli Zaretskii To: sbaugh@janestreet.com In-Reply-To: <86jz2kpy1w.fsf@gnu.org> (message from Eli Zaretskii on Sat, 30 Aug 2025 19:35:39 +0300) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Sat, 30 Aug 2025 19:35:39 +0300 > From: Eli Zaretskii > > > >> + if (fd_callback_info[fd].waiting_thread == current_thread > > >> + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) > > >> + { > > >> + fprintf (stderr, "closing deferred fd %d\n", fd); > > >> + emacs_close (fd); > > >> + /* Ignore any events which happened on this fd. */ > > >> + FD_CLR (fd, &Available); > > >> + FD_CLR (fd, &Writeok); > > >> + fd_callback_info[fd].flags = 0; > > > > > > Why do it here and not when we deactivate processes? > > > > Because that would close file descriptors that another thread is > > currently selecting on, causing EBADF. > > No, I mean to deactivate only the processes whose I/O channels are > locked to the current thread. > > > > Until now we never closed any descriptors inside > > > wait_reading_process_output. > > > > Just to be clear, we can call deactivate_process from > > wait_reading_process_output, through status_notify, through process > > filters, or other similar. That closes descriptors. > > Yes, and so I suggest doing this logic of deactivating only the > processes that are locked to the current thread there. Btw, as a nice bonus of that approach, process sentinel will run in the context of the thread to which the process is locked. From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 23:20:39 2025 Received: (at 79334) by debbugs.gnu.org; 31 Aug 2025 03:20:39 +0000 Received: from localhost ([127.0.0.1]:51030 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1usYcR-0003dk-BX for submit@debbugs.gnu.org; Sat, 30 Aug 2025 23:20:39 -0400 Received: from fout-b4-smtp.messagingengine.com ([202.12.124.147]:59649) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1usYcN-0003dW-P7 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 23:20:37 -0400 Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfout.stl.internal (Postfix) with ESMTP id 0AFD41D00079; Sat, 30 Aug 2025 23:20:30 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Sat, 30 Aug 2025 23:20:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1756610429; x=1756696829; bh=kBqdDm/J9GTebJbtyuW2zpOIxB1rhdME0Tm8bp3Ab7I=; b= fOPGnDhG0mGXGFYEBlMFdy8CpNsCzgDZzndt6JI5CKk+TdK3Pt3MqGgOsChq56An qjc/44gE8+2oRItg4kJyyrht/6GF1yMIUxQ0SPkrXSw6QUtV3OfXaABaf8rALsrl U1rcy3VyTdnRP1FrEFdDBQgDrsPCzLW3qvCQX1PBmi76rIjSaC5Ai+mDYIyXUuYK 8prpcK5IeiGOiA03Gwl6FG5i6xHr/vNbIY9MQhf0Vwjp8AcvT2SFE8uf88T7XHLj Hs+s3BSOIdH6/JpzhsGzIVFxt3oO2aBw6phKV39mzyJZwPeFCiXDAJkABqDplJ6s SdnrxETMTl6CMzBE5PtlMw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1756610429; x= 1756696829; bh=kBqdDm/J9GTebJbtyuW2zpOIxB1rhdME0Tm8bp3Ab7I=; b=D K7+vxNeR0ntkaLpqbYvcLK93lYK7VhaHi8FxuBg0EfnRKcayUcBSWUUkmDHJY7Sm 5I4VMun8tK8eDWgSzw7kXCm4GEZ9NcTMP0nps4i0+0ZJUy+YZUSQNFyG9ZqDDwGM APzu5eqjDmj2gVX04PkVjVrSVdyyJbKdqwtf37UarBOEuxIyS5Jk1OG+hLh4f/2y geKq9lRPLh8hEqSzpunA/kKIvAM37VisqG5kXKkfLTJhb9Mv5dcHJj/sjktGBeTR DE05TtgS5p5/aSmMdkPY8IoFiVPV60zb9Z1m3wwCzBcURduyzViZMG46gM4tiCt/ 0wwymAt7G9tO6IBufSksg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddukeekudekucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpeffmhhithhr hicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrthhtvg hrnhepgeelfeetkefghfdvhfdtgeevveevteetgeetveegtedthefhudekteehffeukeek necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepughmih htrhihsehguhhtohhvrdguvghvpdhnsggprhgtphhtthhopeegpdhmohguvgepshhmthhp ohhuthdprhgtphhtthhopehssggruhhghhesjhgrnhgvshhtrhgvvghtrdgtohhmpdhrtg hpthhtohepvghlihiisehgnhhurdhorhhgpdhrtghpthhtohepjeelfeefgeesuggvsggs uhhgshdrghhnuhdrohhrghdprhgtphhtthhopegvghhgvghrthestghsrdhutghlrgdrvg guuh X-ME-Proxy: Feedback-ID: i07de48aa:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 30 Aug 2025 23:20:27 -0400 (EDT) Message-ID: <55676972-950d-440c-a291-0adefced4e6f@gutov.dev> Date: Sun, 31 Aug 2025 06:20:24 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily To: Spencer Baugh , Eli Zaretskii References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> <865xe6rv8z.fsf@gnu.org> <86349aruhe.fsf@gnu.org> Content-Language: en-US From: Dmitry Gutov In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) On 29/08/2025 19:11, Spencer Baugh wrote: > On Fri, Aug 29, 2025, 11:57 AM Eli Zaretskii > wrote: > > > From: Spencer Baugh > > > Date: Fri, 29 Aug 2025 11:43:21 -0400 > > Cc: 79334@debbugs.gnu.org , > eggert@cs.ucla.edu , Dmitry Gutov > > > > > > Okay, so then how about we delete this unsafe thread_select and > put some thread-yield calls around > > wait_reading_process_output? > > I'm not objected to not calling thread_select there, but can we have > tests that will show us the gains? > > > The primary gain is that it becomes easier to reason about locking.  I'm > not sure how to write a test for that :) I think what could help in this area is some benchmarking scenario, and the goal would be to show that it doesn't regress (or within the margin of error) when 'pselect' is used directly. A script which would create a bunch of threads and some amount of processes spawned by them with certain characteristics. Depending on what workload switching in that thread_select could help optimize. From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 31 20:40:05 2025 Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 00:40:05 +0000 Received: from localhost ([127.0.0.1]:55810 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ussaa-000670-Uw for submit@debbugs.gnu.org; Sun, 31 Aug 2025 20:40:05 -0400 Received: from fhigh-b4-smtp.messagingengine.com ([202.12.124.155]:41797) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1ussaW-00065X-Mg for 79334@debbugs.gnu.org; Sun, 31 Aug 2025 20:40:02 -0400 Received: from phl-compute-10.internal (phl-compute-10.internal [10.202.2.50]) by mailfhigh.stl.internal (Postfix) with ESMTP id 2EE427A00E5; Sun, 31 Aug 2025 20:39:55 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-10.internal (MEProxy); Sun, 31 Aug 2025 20:39:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1756687195; x=1756773595; bh=8YCpyn73utj8PCl/bcalfnUy6XzbxM41ctQwwXN7Y/k=; b= AVORwJqvW96dssj4L65bo2MiHWyN+Tk5CTwo/K3aWosHRCc8iIcB6XfOHGgNSQ53 FijsWEBhPDn+E4UyqHyVIgQAusKHFouh7JYOzLvU1zwShYxEnHmPt1oSi2KjQK/g txvyU0kKhT+C66Q3U3SXwhQX4GR33wBtEDiIv2lJxP2nc30//P/FyzaRbb2BHED3 4RPIfRKP/G4tkcob0ekyOd3ia1rt/CCC2RgeDiMV/AseDxVvXi4H8Kld2GU6hhZe bB6hw1siAPpdE0JsOC/7yrUT6VXVrsnKH4qutbV7/XCOHFCmLt5GJcdO1k4uS1hs iHrcL0p/qyD3SMzfTR5UUg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1756687195; x= 1756773595; bh=8YCpyn73utj8PCl/bcalfnUy6XzbxM41ctQwwXN7Y/k=; b=h qEHRczFVR9ZRcLfxiFabSf6U9l01g/yYCVxg8tWTwIXo9aLz4ZMzZ7o97b1rXEs3 OtaODJ5im61pA74AwBGvjT9OjfdC/QR1mk9y/LKDI7VQSI2nx0Y2C4jaZzbjGNfo pxcUO7wXASuaqBMA/7ouaF5RZL/RR0rBFnslu9bXIBakWi6eiDnc9XLBdOT+XH6+ T6UcH+hFQA/ScvqiCyMcI8BeJA8DRZnNXldtLCT8h4APi45jR1WlMKXhIHx2HhP+ mvudFQ4AoWhmjdwrFnnfl+nlyd2rS6NCmhs0WAdCzPc5d21Q16+hPE5rBWIznPsg vDUS9piQScYSnfJ1NYlXA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgdduledtjeefucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtvdejnecuhfhrohhmpeffmhhithhr hicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrthhtvg hrnhepteduleejgeehtefgheegjeekueehvdevieekueeftddvtdevfefhvdevgedujeeh necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepughmih htrhihsehguhhtohhvrdguvghvpdhnsggprhgtphhtthhopeegpdhmohguvgepshhmthhp ohhuthdprhgtphhtthhopegvlhhiiiesghhnuhdrohhrghdprhgtphhtthhopehssggruh hghhesjhgrnhgvshhtrhgvvghtrdgtohhmpdhrtghpthhtohepjeelfeefgeesuggvsggs uhhgshdrghhnuhdrohhrghdprhgtphhtthhopegvghhgvghrthestghsrdhutghlrgdrvg guuh X-ME-Proxy: Feedback-ID: i07de48aa:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun, 31 Aug 2025 20:39:53 -0400 (EDT) Message-ID: Date: Mon, 1 Sep 2025 03:39:50 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily To: Eli Zaretskii , Spencer Baugh References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <867bymrzvr.fsf@gnu.org> <865xe6rv8z.fsf@gnu.org> Content-Language: en-US From: Dmitry Gutov In-Reply-To: <865xe6rv8z.fsf@gnu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) On 29/08/2025 18:41, Eli Zaretskii wrote: > For example, try to run something in a non-main > thread and then attempt to interact with Emacs in the main thread. Could that be code that does straight computation and does not switch threads at all, unless sleep-for is added? Perhaps relatedly, diff-hl in "threaded" mode has a much better interactivity since switching to 'make-process' from 'call-process' for its external program interaction. But you're probably thinking of different kind of examples. From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 12:18:46 2025 Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 16:18:46 +0000 Received: from localhost ([127.0.0.1]:58313 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ut7F0-0008Nb-08 for submit@debbugs.gnu.org; Mon, 01 Sep 2025 12:18:46 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:38151) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1ut7Ex-0008NM-4X for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 12:18:43 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads In-Reply-To: <86jz2kpy1w.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 30 Aug 2025 19:35:39 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> Date: Mon, 01 Sep 2025 12:18:37 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756743517; bh=thdIrYshUBoOQVmp2I25ATjxrErnxWZ7ivf5EY1vQNc=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=3Dq5Z4ZTjnYJbtNpiOdCmv5jzIAhFXJNU4uUQc+o8f0MdOykWOnDLTErnoonsytY8 P4yFt8AfpwduaD+HXAJ6itz8AwRLZ9MnhNpQhC+lLCyf2q0EKB2BgN+NXz20NLGcQs fsR4Ca0tcLATHS3oX7fLUjANkcOPk8f4lTqDHFRR/pc7hYEDs6evR1++qGevKHxv/e EWCo52T4sS+BKlFDP++v6lB3dfwr5wvkHyLp65S7nJR5HjTf42b435rgNxXCjopqv/ 71ClIv21i+qUBV8C56OItVcnMkBETLdPb99BrAhatkYMwmIWgsGtMGOrXA+G39QNtd vLPb+NIkdGS/g== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> >> +static bool >> >> +other_thread_is_waiting_for (struct fd_callback_data* elem) >> >> +{ >> >> + return elem->waiting_thread != NULL && elem->waiting_thread != current_thread; >> >> +} >> > >> > This should also check that the waiting thread is still alive. If it >> > isn't, it's okay to close the descriptor. Otherwise, descriptors >> > belonging to threads that exited might never be closed. >> >> waiting_thread always points to a living thread, because it's only set >> while a thread is inside wait_reading_process_output. >> >> (In fact, when a thread sees waiting_thread set to a value which is >> neither NULL nor current_thread, it can only ever point to a thread >> which is currently inside thread_select) > > You assume that the waiting_thread field will be cleared when a thread > dies, but that is not guaranteed. Why isn't it better to be defensive > and make sure the thread is still alive? If you are right, that test > will always be true, but what if you are wrong in some rare cases? Fair enough; the patch has changed a lot so this other_thread_is_waiting_for function doesn't exist anymore, but if I re-add it or similar checks, I will check that the thread is alive. >> >> + if (fd_callback_info[fd].waiting_thread == current_thread >> >> + && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD)) >> >> + { >> >> + fprintf (stderr, "closing deferred fd %d\n", fd); >> >> + emacs_close (fd); >> >> + /* Ignore any events which happened on this fd. */ >> >> + FD_CLR (fd, &Available); >> >> + FD_CLR (fd, &Writeok); >> >> + fd_callback_info[fd].flags = 0; >> > >> > Why do it here and not when we deactivate processes? >> >> Because that would close file descriptors that another thread is >> currently selecting on, causing EBADF. > > No, I mean to deactivate only the processes whose I/O channels are > locked to the current thread. > >> > Until now we never closed any descriptors inside >> > wait_reading_process_output. >> >> Just to be clear, we can call deactivate_process from >> wait_reading_process_output, through status_notify, through process >> filters, or other similar. That closes descriptors. > > Yes, and so I suggest doing this logic of deactivating only the > processes that are locked to the current thread there. > >> > More generally, why this design and not a simpler change which only >> > deactivates processes whose I/O is waited by the current thread, as >> > discussed previously? >> >> I looked into that, but I determined that it's too complicated. It >> would require us to keep track of a new kind of process state: a process >> which has been deleted, but hasn't been deactivated yet. > > What I had in mind was neither delete it nor deactivate it. IOW, the > loop which goes through the list of processes should only pay > attention to processes locked to this thread and those that are not > locked to any live thread. It should leave the other processes to > when their threads call status_notify. There are too many ways in which we delete or deactivate processes to do that. In particular, Lisp programs should be able to call delete-process on a process without getting an error just because some thread is currently calling select and happens to be monitoring that process. To emit such an error would make programming with processes and threads extremely difficult. >> > Looking at all the callers of close_process_fd, it sounds like it's >> > too low-level to do this. For example, it is called from >> > create_process (including in the forked child) to close the unneeded >> > ends of the pipe, where we probably don't want this. And >> > clear_fd_callback_data is also used for the keyboard input descriptor, >> > which is probably also not relevant. >> >> True. >> >> I looked into fixing these, and that suggested to me an altogether >> simpler approach. We can keep the logic entirely contained in >> wait_reading_process_output, just adding a bit of code around the select >> call. >> >> And we don't need to defer deleting the fds at all: just recognize when >> it's happened, and be careful to not touch those fds afterwards. > > Hmm... I don't understand: why do we need to do this? The calls to > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which > set bits in the Available mask passed top pselect already avoid > setting bits for descriptors on which some other thread waits. So > pselect cannot possibly indicate any output ready on any descriptors > belonging to other threads, and this loop you suggest to add: > >> + /* Save the fds we're currently waiting_thread for. */ >> + fd_set waiting_fds; >> + FD_ZERO (&waiting_fds); >> + int max_waiting_fds = 0; >> + for (int fd = 0; fd <= max_desc; ++fd) >> + { >> + if (fd_callback_info[fd].waiting_thread == current_thread) >> + { >> + FD_SET (fd, &waiting_fds); >> + max_waiting_fds = fd; >> + } >> + } > > should be unnecessary. Or what am I missing? That's exactly the problem. A file descriptor for which we set waiting_thread can have delete_read_fd or delete_write_fd called on it by another thread, or by a signal handler, while we're waiting in pselect. We need to catch the fact that the file descriptor has been removed from the set of file descriptors we're supposed to monitor, and avoid reacting to events on it. We can catch this easily by seeing that waiting_thread has been cleared or changed while we were blocked in select. BTW, the fact that this problematic case is also possible via signal handlers (namely the SIGCHLD handler) is a separate bug introduced by the refactoring which added fd_callback_info. It means that with a certain interleaving, I think it should be possible to get buggy behavior by just using processes without threads. It's quite rare though and I haven't been able to figure out what exact path that would cause a problem. From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 15:25:45 2025 Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 19:25:46 +0000 Received: from localhost ([127.0.0.1]:58537 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utA9x-0002bo-A8 for submit@debbugs.gnu.org; Mon, 01 Sep 2025 15:25:45 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:41010) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utA9u-0002bU-Mv for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 15:25:43 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1utA9l-0001HC-8M; Mon, 01 Sep 2025 15:25:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=9B0ZZ/jkgQ7hrvUpL46jOl1XsDU80uUi/rpDHxM6vbs=; b=i3DyP9NmmLbu uGLvTZ/KnSy+hJe/SQNAgQlGKE6Ajsc9XbeIHECP+fTHITp0KGqkjla7QvHSGDFievkhi3nRBAtMM tZF25njzMhxbr3wOFyjhU5J4BCKhMNGh8J+r7TpZazA41r9KxAipjvsDSAr7FlVyeFgwid0YKthts ffiZoI75sxKN9IOobVeuV/vm6hLEOOOsAuYTsgmiz4IhMDkXReUN4JJObqCq5Af8E0tkXR+4du/DS WC4YwBMPQiSTbFahlNOwObQVD9PdE+Acj9upRZ3yiUIBRTCP5PwhF0v02+H7feGcRCutZ4fJt+qX+ 522hdddkfidmicw//knukw==; Date: Mon, 01 Sep 2025 22:25:26 +0300 Message-Id: <86ldmym0ux.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Mon, 01 Sep 2025 12:18:37 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Mon, 01 Sep 2025 12:18:37 -0400 > > > You assume that the waiting_thread field will be cleared when a thread > > dies, but that is not guaranteed. Why isn't it better to be defensive > > and make sure the thread is still alive? If you are right, that test > > will always be true, but what if you are wrong in some rare cases? > > Fair enough; the patch has changed a lot so this > other_thread_is_waiting_for function doesn't exist anymore, but if I > re-add it or similar checks, I will check that the thread is alive. Thanks. > > What I had in mind was neither delete it nor deactivate it. IOW, the > > loop which goes through the list of processes should only pay > > attention to processes locked to this thread and those that are not > > locked to any live thread. It should leave the other processes to > > when their threads call status_notify. > > There are too many ways in which we delete or deactivate processes to do > that. We should audit all of them and prevent that from happening when the process is locked to another thread. > In particular, Lisp programs should be able to call > delete-process on a process without getting an error just because some > thread is currently calling select and happens to be monitoring that > process. To emit such an error would make programming with processes > and threads extremely difficult. This is a problem two orders of magnitude smaller than what we are trying to fix here. So my suggestion is to leave it for later. There's no special reason why Lisp programs "should be able" to delete a process locked to another thread. If we don't want to signal an error (though I don't see why not), we could perhaps do nothing silently, until we get to dealing with that later. For now, sabotaging threads that wait for some sub-process is a much more grave problem, and we must fix it if we want threads and processes to live in peace in Emacs. > > Hmm... I don't understand: why do we need to do this? The calls to > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which > > set bits in the Available mask passed top pselect already avoid > > setting bits for descriptors on which some other thread waits. So > > pselect cannot possibly indicate any output ready on any descriptors > > belonging to other threads, and this loop you suggest to add: > > > >> + /* Save the fds we're currently waiting_thread for. */ > >> + fd_set waiting_fds; > >> + FD_ZERO (&waiting_fds); > >> + int max_waiting_fds = 0; > >> + for (int fd = 0; fd <= max_desc; ++fd) > >> + { > >> + if (fd_callback_info[fd].waiting_thread == current_thread) > >> + { > >> + FD_SET (fd, &waiting_fds); > >> + max_waiting_fds = fd; > >> + } > >> + } > > > > should be unnecessary. Or what am I missing? > > That's exactly the problem. A file descriptor for which we set > waiting_thread can have delete_read_fd or delete_write_fd called on it > by another thread, or by a signal handler, while we're waiting in > pselect. We need to prevent that from happening, not to go along with it. But anyway, I don't see why you need another fd_set for this. Why not just look at the descriptors in the Available set? > BTW, the fact that this problematic case is also possible via signal > handlers (namely the SIGCHLD handler) is a separate bug introduced by > the refactoring which added fd_callback_info. It means that with a > certain interleaving, I think it should be possible to get buggy > behavior by just using processes without threads. It's quite rare > though and I haven't been able to figure out what exact path that would > cause a problem. Let's not attempt to solve too many problems at the same time. How to make SIGCHLD safer when threads are present is an outstanding issue, and the solution will IMO depend in whether signals are delivered to the main thread and whether or not the build uses the child signal pipe. From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 15:41:00 2025 Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 19:41:00 +0000 Received: from localhost ([127.0.0.1]:58577 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utAOh-0003Gt-2Y for submit@debbugs.gnu.org; Mon, 01 Sep 2025 15:40:59 -0400 Received: from mxout1.mail.janestreet.com ([38.105.200.78]:55107) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utAOd-0003Ge-LX for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 15:40:57 -0400 Received: from mail-lj1-f200.google.com ([209.85.208.200]) by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128) (Exim 4.98.2) id 1utAOY-00000005pgk-1xPJ for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 15:40:50 -0400 Received: by mail-lj1-f200.google.com with SMTP id 38308e7fff4ca-336c3108badso20997601fa.3 for <79334@debbugs.gnu.org>; Mon, 01 Sep 2025 12:40:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=google; t=1756755649; x=1757360449; darn=debbugs.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=; b=m4q2qgaR1H6xyqZryrkHxSwBcQXgDs7ty5La0z8CH1TyakncVg2GDyjmKm+k5Hzpd2 zCAHqBNFF8YU1oSCm7s6vQIXUPsNUyGh0ZH/Sg37tGUEuQm9fS2jclt0NptCfVE9MEi1 Vu5Y9Qufapxj97dM9ZoeQUwNOfwzGTtZuBL3U= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756755650; bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=; h=References:In-Reply-To:From:Date:Subject:To:Cc; b=UshhT9A7/S5jTOWoeWD5Ja7JW4TfDHeKKrsZvcH0id+L2zmpVJnZdv/etalJ45uk7 tvG3RmRG52gqSai1pFxrTbEFVv08OUaEnX31ZnAUBrW1ypunqsqdYnsyWiBaNbT7j8 XY0kw69uUBEZeDigT1lLMXpVgIQlE/56z24wqCLKWOtRgeNZA+/UP8FIqRPQEg3u9S R/rPGm4CZtKh1Nvhcy7MMeHtDGEqZ4ACUdXPes0mL+NCi+2XUv8NZXhAA//5QZ2tqu Ls1fNs7pffBd68zgEiZVtwwVr/x5kiWWLl+81n4i5bdjySqZL7bMVkc8Yu5fJp7JqU DlLASARlkd8xw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756755649; x=1757360449; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=; b=tuUGl2En6lpsPLP8T7IFfMC/b8DBocjkrvdGo7DA+BRdExoM5IcANX/0KCBfI2KImf zfC0HefRtRbHdoh0ShIJNJTnj0RFT9ewCfQlZheY2T+WqltTqDvisZsEAJDsQys68LAP nWygoMvvIo4fYXsb9quWtuY/y4md2uWdDfaBZCqjz6QWvW5mzYq4R9EkkrGzER85FMyV h/Eb8h656P4aA6eHP0Tj4/SaXOnvE1aibA6e2tyqIY0kOvz8WOKpvG4LBrPkzoqPrmH3 VzJ8E/J/xmJiP8USJp6M+61w4fUKqapmXSNslNIEYAE1XGSzrUErrloS5q/ag3Kz2XeO qkhQ== X-Gm-Message-State: AOJu0Yy0YmOhQi3sr0PrZv6P7GldF8K94D6JHQQf3PuXz4L0pHBrfZ/J xkdVLwdX8isqP4RHXBg4ZZZfXTeAk1fHMJ6bx3qgF5c0NfxtPRpBoVa4B/KcWNuWLcRdcP+kSUo 09xvfbvKYaW2ECsrjmaDbgELaXDbxDPyJkJHnagqUz5oxCsXu4SyxgqOXITmu+XGHOhnFOZsJBw 5XQYi8omb1pGglfr/csdC0XyHB0gadLA== X-Gm-Gg: ASbGncsIsR0WijZC6kC2ZH18bvOMhooosBFhIFlO0Ih4KjxIZ5Tit+hJ8QTmPdi/o8c rb6TDO75jL2mXYMsUJS/V52U+8cJqfQ6DFqeOzVz4rQGXBJ1jUmziwRrR0llzPztidsT00ute+G XMsko+epwda+ToKdrL3ap8 X-Received: by 2002:a2e:be23:0:b0:332:1de5:c513 with SMTP id 38308e7fff4ca-336ca901440mr24870951fa.4.1756755648653; Mon, 01 Sep 2025 12:40:48 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGUlSYGavU+sYo+tMJYoR07G5m3Q0RE9H92MDwv0DKAbvD4XFEgyipfkkE1Yt5VPmP2dcSijLJB2faDAruRMMg= X-Received: by 2002:a2e:be23:0:b0:332:1de5:c513 with SMTP id 38308e7fff4ca-336ca901440mr24870891fa.4.1756755648163; Mon, 01 Sep 2025 12:40:48 -0700 (PDT) MIME-Version: 1.0 References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> In-Reply-To: <86ldmym0ux.fsf@gnu.org> From: Spencer Baugh Date: Mon, 1 Sep 2025 15:40:37 -0400 X-Gm-Features: Ac12FXyUblflpNO1YUj2f8dXcmOzgfch-ujFAZH6okGf-3Y__6g8my__JThELP0 Message-ID: Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads To: Eli Zaretskii Content-Type: multipart/alternative; boundary="0000000000005dc3f4063dc28cb0" X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --0000000000005dc3f4063dc28cb0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Sep 1, 2025, 3:25=E2=80=AFPM Eli Zaretskii wrote: > > From: Spencer Baugh > > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > > Date: Mon, 01 Sep 2025 12:18:37 -0400 > > > > > You assume that the waiting_thread field will be cleared when a threa= d > > > dies, but that is not guaranteed. Why isn't it better to be defensiv= e > > > and make sure the thread is still alive? If you are right, that test > > > will always be true, but what if you are wrong in some rare cases? > > > > Fair enough; the patch has changed a lot so this > > other_thread_is_waiting_for function doesn't exist anymore, but if I > > re-add it or similar checks, I will check that the thread is alive. > > Thanks. > > > > What I had in mind was neither delete it nor deactivate it. IOW, the > > > loop which goes through the list of processes should only pay > > > attention to processes locked to this thread and those that are not > > > locked to any live thread. It should leave the other processes to > > > when their threads call status_notify. > > > > There are too many ways in which we delete or deactivate processes to d= o > > that. > > We should audit all of them and prevent that from happening when the > process is locked to another thread. > > > In particular, Lisp programs should be able to call > > delete-process on a process without getting an error just because some > > thread is currently calling select and happens to be monitoring that > > process. To emit such an error would make programming with processes > > and threads extremely difficult. > > This is a problem two orders of magnitude smaller than what we are > trying to fix here. So my suggestion is to leave it for later. > There's no special reason why Lisp programs "should be able" to delete > a process locked to another thread. If we don't want to signal an > error (though I don't see why not), we could perhaps do nothing > silently, until we get to dealing with that later. > Silently doing nothing is also an unacceptably confusing result, in my opinion. But it's okay: I believe it's possible to solve the problem without introducing such bad behavior. For now, sabotaging threads that wait for some sub-process is a much > more grave problem, and we must fix it if we want threads and > processes to live in peace in Emacs. > Of course. > > Hmm... I don't understand: why do we need to do this? The calls to > > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which > > > set bits in the Available mask passed top pselect already avoid > > > setting bits for descriptors on which some other thread waits. So > > > pselect cannot possibly indicate any output ready on any descriptors > > > belonging to other threads, and this loop you suggest to add: > > > > > >> + /* Save the fds we're currently waiting_thread for. */ > > >> + fd_set waiting_fds; > > >> + FD_ZERO (&waiting_fds); > > >> + int max_waiting_fds =3D 0; > > >> + for (int fd =3D 0; fd <=3D max_desc; ++fd) > > >> + { > > >> + if (fd_callback_info[fd].waiting_thread =3D=3D current_thread) > > >> + { > > >> + FD_SET (fd, &waiting_fds); > > >> + max_waiting_fds =3D fd; > > >> + } > > >> + } > > > > > > should be unnecessary. Or what am I missing? > > > > That's exactly the problem. A file descriptor for which we set > > waiting_thread can have delete_read_fd or delete_write_fd called on it > > by another thread, or by a signal handler, while we're waiting in > > pselect. > > We need to prevent that from happening, not to go along with it. But > anyway, I don't see why you need another fd_set for this. Why not > just look at the descriptors in the Available set? > The Available set is cleared by the select call, and afterwards only contains descriptors which has events. In that case we could just look at the descriptors in Available and check waiting_thread for each of them. But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer know what descriptors we originally set waiting_thread on. So we can't check to see if one of our descriptors was "stolen" (had waiting_thread set to a different value), which would mean the EBADF is spurious and we should continue running. Though, I guess we could alternatively just assume that EBADF is always spurious. > BTW, the fact that this problematic case is also possible via signal > > handlers (namely the SIGCHLD handler) is a separate bug introduced by > > the refactoring which added fd_callback_info. It means that with a > > certain interleaving, I think it should be possible to get buggy > > behavior by just using processes without threads. It's quite rare > > though and I haven't been able to figure out what exact path that would > > cause a problem. > > Let's not attempt to solve too many problems at the same time. How to > make SIGCHLD safer when threads are present is an outstanding issue, > and the solution will IMO depend in whether signals are delivered to > the main thread and whether or not the build uses the child signal > pipe. > You misread me: I said SIGCHLD is unsafe even without threads. > --0000000000005dc3f4063dc28cb0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Mon, Sep 1, 2025, 3:25=E2=80= =AFPM Eli Zaretskii <eliz@gnu.org>= ; wrote:
> From: Spencer Baugh &= lt;sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,=C2=A0 eggert@cs.ucla.edu,=C2=A0 = dm= itry@gutov.dev
> Date: Mon, 01 Sep 2025 12:18:37 -0400
>
> > You assume that the waiting_thread field will be cleared when a t= hread
> > dies, but that is not guaranteed.=C2=A0 Why isn't it better t= o be defensive
> > and make sure the thread is still alive?=C2=A0 If you are right, = that test
> > will always be true, but what if you are wrong in some rare cases= ?
>
> Fair enough; the patch has changed a lot so this
> other_thread_is_waiting_for function doesn't exist anymore, but if= I
> re-add it or similar checks, I will check that the thread is alive.
Thanks.

> > What I had in mind was neither delete it nor deactivate it.=C2=A0= IOW, the
> > loop which goes through the list of processes should only pay
> > attention to processes locked to this thread and those that are n= ot
> > locked to any live thread.=C2=A0 It should leave the other proces= ses to
> > when their threads call status_notify.
>
> There are too many ways in which we delete or deactivate processes to = do
> that.

We should audit all of them and prevent that from happening when the
process is locked to another thread.

> In particular, Lisp programs should be able to call
> delete-process on a process without getting an error just because some=
> thread is currently calling select and happens to be monitoring that > process.=C2=A0 To emit such an error would make programming with proce= sses
> and threads extremely difficult.

This is a problem two orders of magnitude smaller than what we are
trying to fix here.=C2=A0 So my suggestion is to leave it for later.
There's no special reason why Lisp programs "should be able" = to delete
a process locked to another thread.=C2=A0 If we don't want to signal an=
error (though I don't see why not), we could perhaps do nothing
silently, until we get to dealing with that later.

Silently doing nothing is= also an unacceptably confusing result, in my opinion.

But it's okay: I believe it's possib= le to solve the problem without introducing such bad behavior.

For now, sabotaging threads that wait for some sub-process is a much
more grave problem, and we must fix it if we want threads and
processes to live in peace in Emacs.

Of course.
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex"> > > Hmm... I don't understand: why do we need to do this?=C2=A0 T= he calls to
> > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., whi= ch
> > set bits in the Available mask passed top pselect already avoid > > setting bits for descriptors on which some other thread waits.=C2= =A0 So
> > pselect cannot possibly indicate any output ready on any descript= ors
> > belonging to other threads, and this loop you suggest to add:
> >
> >> +=C2=A0 =C2=A0 =C2=A0 /* Save the fds we're currently wai= ting_thread for.=C2=A0 */
> >> +=C2=A0 =C2=A0 =C2=A0 fd_set waiting_fds;
> >> +=C2=A0 =C2=A0 =C2=A0 FD_ZERO (&waiting_fds);
> >> +=C2=A0 =C2=A0 =C2=A0 int max_waiting_fds =3D 0;
> >> +=C2=A0 =C2=A0 =C2=A0 for (int fd =3D 0; fd <=3D max_desc;= ++fd)
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 {
> >> +=C2=A0 =C2=A0 if (fd_callback_info[fd].waiting_thread =3D=3D= current_thread)
> >> +=C2=A0 =C2=A0 =C2=A0 {
> >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 FD_SET (fd, &waiting_fds); > >> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_waiting_fds =3D fd;
> >> +=C2=A0 =C2=A0 =C2=A0 }
> >> +=C2=A0 }
> >
> > should be unnecessary.=C2=A0 Or what am I missing?
>
> That's exactly the problem.=C2=A0 A file descriptor for which we s= et
> waiting_thread can have delete_read_fd or delete_write_fd called on it=
> by another thread, or by a signal handler, while we're waiting in<= br> > pselect.

We need to prevent that from happening, not to go along with it.=C2=A0 But<= br> anyway, I don't see why you need another fd_set for this.=C2=A0 Why not=
just look at the descriptors in the Available set?

The Available set is clea= red by the select call, and afterwards only contains descriptors which has = events.=C2=A0 In that case we could just look at the descriptors in Availab= le and check waiting_thread for each of them.

But if there was an EBADF, then Available is left in = an undefined state (see pselect(1)) and we no longer know what descriptors = we originally set waiting_thread on.=C2=A0 So we can't check to see if = one of our descriptors was "stolen" (had waiting_thread set to a = different value), which would mean the EBADF is spurious and we should cont= inue running.

Though, I = guess we could alternatively just assume that EBADF is always spurious.

> BTW, the fact that this problematic case is also possible via signal > handlers (namely the SIGCHLD handler) is a separate bug introduced by<= br> > the refactoring which added fd_callback_info.=C2=A0 It means that with= a
> certain interleaving, I think it should be possible to get buggy
> behavior by just using processes without threads.=C2=A0 It's quite= rare
> though and I haven't been able to figure out what exact path that = would
> cause a problem.

Let's not attempt to solve too many problems at the same time.=C2=A0 Ho= w to
make SIGCHLD safer when threads are present is an outstanding issue,
and the solution will IMO depend in whether signals are delivered to
the main thread and whether or not the build uses the child signal
pipe.

You misread me: I said SIGCHLD is unsafe even without threads.
--0000000000005dc3f4063dc28cb0-- From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 07:18:44 2025 Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 11:18:44 +0000 Received: from localhost ([127.0.0.1]:60513 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utP2C-0005py-3B for submit@debbugs.gnu.org; Tue, 02 Sep 2025 07:18:44 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:43994) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utP29-0005pl-8p for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 07:18:42 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1utP23-0007eA-54; Tue, 02 Sep 2025 07:18:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=3EsvoDTrpakJXeHmJ3bLk+DwCSBxCLY4p2VHUqJV6Sg=; b=LRAICAfuZ3Qd QU35EOgQxmviHX4+85qHzv/gylG/MLJNBGKC6DPIz5prhiYgXkTGoUCt+9a4jkgdOOw6XllElpXAZ RqJm87xOHWUdIzp6D1kjO1SvSUEmWjLoMDbiBuc7Gpi4GzhqAKXdzgxdkypl5/FTZlPnijxxn7zae B7ix97S59YRc9NoXzBKc+8AZql6xtnCKoCZq/244/yJ8kOBwCLddTwKR3f1xZaLFZIggxz1muVXoX oHjNTp+5JukA9C2MPYrvs/7La4MBVW0CaAzTgbr0m9QT+2TtfsTI8pWufXQOAMwsVWUGveZZudYsQ klBGsSCRfBpc1HTmlDg04Q==; Date: Tue, 02 Sep 2025 14:18:31 +0300 Message-Id: <86bjntm7aw.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Mon, 1 Sep 2025 15:40:37 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Date: Mon, 1 Sep 2025 15:40:37 -0400 > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov > > > In particular, Lisp programs should be able to call > > delete-process on a process without getting an error just because some > > thread is currently calling select and happens to be monitoring that > > process. To emit such an error would make programming with processes > > and threads extremely difficult. > > This is a problem two orders of magnitude smaller than what we are > trying to fix here. So my suggestion is to leave it for later. > There's no special reason why Lisp programs "should be able" to delete > a process locked to another thread. If we don't want to signal an > error (though I don't see why not), we could perhaps do nothing > silently, until we get to dealing with that later. > > Silently doing nothing is also an unacceptably confusing result, in my opinion. Why? Ignoring invalid actions is not something new to Emacs. > > > Hmm... I don't understand: why do we need to do this? The calls to > > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which > > > set bits in the Available mask passed top pselect already avoid > > > setting bits for descriptors on which some other thread waits. So > > > pselect cannot possibly indicate any output ready on any descriptors > > > belonging to other threads, and this loop you suggest to add: > > > > > >> + /* Save the fds we're currently waiting_thread for. */ > > >> + fd_set waiting_fds; > > >> + FD_ZERO (&waiting_fds); > > >> + int max_waiting_fds = 0; > > >> + for (int fd = 0; fd <= max_desc; ++fd) > > >> + { > > >> + if (fd_callback_info[fd].waiting_thread == current_thread) > > >> + { > > >> + FD_SET (fd, &waiting_fds); > > >> + max_waiting_fds = fd; > > >> + } > > >> + } > > > > > > should be unnecessary. Or what am I missing? > > > > That's exactly the problem. A file descriptor for which we set > > waiting_thread can have delete_read_fd or delete_write_fd called on it > > by another thread, or by a signal handler, while we're waiting in > > pselect. > > We need to prevent that from happening, not to go along with it. But > anyway, I don't see why you need another fd_set for this. Why not > just look at the descriptors in the Available set? > > The Available set is cleared by the select call, and afterwards only contains descriptors which has events. In > that case we could just look at the descriptors in Available and check waiting_thread for each of them. Yes, and we don't care about descriptors that are not ready to be read. > But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer > know what descriptors we originally set waiting_thread on. In that case pselect returns a negative result, no? And if that happens, we don't examine the descriptors at all, right? So why is this situation a problem we need to handle with this additional fd_set? > Though, I guess we could alternatively just assume that EBADF is always spurious. Assume and do what? > > BTW, the fact that this problematic case is also possible via signal > > handlers (namely the SIGCHLD handler) is a separate bug introduced by > > the refactoring which added fd_callback_info. It means that with a > > certain interleaving, I think it should be possible to get buggy > > behavior by just using processes without threads. It's quite rare > > though and I haven't been able to figure out what exact path that would > > cause a problem. > > Let's not attempt to solve too many problems at the same time. How to > make SIGCHLD safer when threads are present is an outstanding issue, > and the solution will IMO depend in whether signals are delivered to > the main thread and whether or not the build uses the child signal > pipe. > > You misread me: I said SIGCHLD is unsafe even without threads. Please describe how. From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 09:38:27 2025 Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 13:38:28 +0000 Received: from localhost ([127.0.0.1]:33720 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utRDO-00047K-Ui for submit@debbugs.gnu.org; Tue, 02 Sep 2025 09:38:27 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:46961) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utRDK-00046n-GH for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 09:38:23 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads In-Reply-To: <86bjntm7aw.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Sep 2025 14:18:31 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> <86bjntm7aw.fsf@gnu.org> Date: Tue, 02 Sep 2025 09:38:15 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756820295; bh=/GVeROQ20HPMbm6tHn3GvxCNKE8SN5BQeTZPunZC2P4=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=uygAzPTvAssN161KRXtPBMiwK5NcDbJMsAGDjj9ZtK7akoXaFOJzYseX+b1RJJ6ss 3jDh3UN8+bcItXwMjSjXIEXhS5EWc3X/P5CcTHOd+xZrBKTVb0wr5o2hXsVEFDm3zi wc72YlX4aHFJK0Y0U+qHjE+xVhlnuw0KOu7FfBHa2GYxsrFE9hNGOcrIgWbmnGoeU5 Gfhe345ZnRdVnkQsaVZ0sItoZ3ou8sDMTZzt+xmtEx9pV1YMF8GkZ7ZY1Lg3lFuE72 xOqACYq5q8+1sqxmyMM/3z4LedXY7/vvYVRGzb/upQPFAVno+WyarTVqeXzvW/+Yoy u3sSWxAz65MMA== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> > > Hmm... I don't understand: why do we need to do this? The calls to >> > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which >> > > set bits in the Available mask passed top pselect already avoid >> > > setting bits for descriptors on which some other thread waits. So >> > > pselect cannot possibly indicate any output ready on any descriptors >> > > belonging to other threads, and this loop you suggest to add: >> > > >> > >> + /* Save the fds we're currently waiting_thread for. */ >> > >> + fd_set waiting_fds; >> > >> + FD_ZERO (&waiting_fds); >> > >> + int max_waiting_fds = 0; >> > >> + for (int fd = 0; fd <= max_desc; ++fd) >> > >> + { >> > >> + if (fd_callback_info[fd].waiting_thread == current_thread) >> > >> + { >> > >> + FD_SET (fd, &waiting_fds); >> > >> + max_waiting_fds = fd; >> > >> + } >> > >> + } >> > > >> > > should be unnecessary. Or what am I missing? >> > >> > That's exactly the problem. A file descriptor for which we set >> > waiting_thread can have delete_read_fd or delete_write_fd called on it >> > by another thread, or by a signal handler, while we're waiting in >> > pselect. >> >> We need to prevent that from happening, not to go along with it. But >> anyway, I don't see why you need another fd_set for this. Why not >> just look at the descriptors in the Available set? >> >> The Available set is cleared by the select call, and afterwards only contains descriptors which has events. In >> that case we could just look at the descriptors in Available and check waiting_thread for each of them. > > Yes, and we don't care about descriptors that are not ready to be > read. Right. >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer >> know what descriptors we originally set waiting_thread on. > > In that case pselect returns a negative result, no? And if that > happens, we don't examine the descriptors at all, right? So why is > this situation a problem we need to handle with this additional fd_set? Because we currently call emacs_abort when we get EBADF: if (nfds < 0) { if (xerrno == EINTR) no_avail = 1; else if (xerrno == EBADF) emacs_abort (); else report_file_errno ("Failed select", Qnil, xerrno); } >> Though, I guess we could alternatively just assume that EBADF is always spurious. > > Assume and do what? Follow one of the other two branches in the above snippet of code: - Treat EBADF the same way we do EINTR: there were no events on any fds, so we continue running wait_reading_process_output. - Treat EBADF the same way we treat other errno values: signal an error. That would at least be better than calling emacs_abort. >> > BTW, the fact that this problematic case is also possible via signal >> > handlers (namely the SIGCHLD handler) is a separate bug introduced by >> > the refactoring which added fd_callback_info. It means that with a >> > certain interleaving, I think it should be possible to get buggy >> > behavior by just using processes without threads. It's quite rare >> > though and I haven't been able to figure out what exact path that would >> > cause a problem. >> >> Let's not attempt to solve too many problems at the same time. How to >> make SIGCHLD safer when threads are present is an outstanding issue, >> and the solution will IMO depend in whether signals are delivered to >> the main thread and whether or not the build uses the child signal >> pipe. >> >> You misread me: I said SIGCHLD is unsafe even without threads. > > Please describe how. 1. Call wait_reading_process_output 2. wait_reading_process_output calls compute_input_wait_mask, including SOME-FD in the mask. 3. Receive SIGCHLD, calling handle_child_signal 4. handle_child_signal calls delete_read_fd for SOME-FD 5. Signal handler finishes 6. wait_reading_process_output calls select and returns events for SOME-FD, even though SOME-FD has been deleted from fd_callback_info and the process associated with it has died. It seems possible that this can cause an issue in some code path. This couldn't happen before the refactoring which added fd_callback_info, because handle_child_signal directly mutated the fd_set which is passed to pselect. From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 10:40:15 2025 Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 14:40:16 +0000 Received: from localhost ([127.0.0.1]:35363 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utSBB-0001qW-Fh for submit@debbugs.gnu.org; Tue, 02 Sep 2025 10:40:15 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46420) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utSB3-0001lX-B1 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 10:40:09 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1utSAr-0002If-Fr; Tue, 02 Sep 2025 10:39:54 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=nUxB0KzW8g+efBzs3LBuf7v8X2SJvK2Uzzh5l3zb+qU=; b=NnOZNbB25iA9 2BYDQXi+qJSmUZUHNGRae3AV5UqXiB6ailMvYFPt6+sx4xyMvCb1CaBIZMVhehGNHytVITBMLg1ZS OZEcwLTNp0Xk2nYVauDY66ZPiD8EjPvea20fsXW5gz2K1pJuswpHD8W7loJMcAHWhzkPTFrwRr1aE g15aus96FrJjRtW8iAslsExmXCsQs/yv4AdPmAEX7pjdXFK+6kPbC7cvxkpQGWh+3fH48lMOC4c5e vz/SYkED23qkXs63xuVVMLxcAEp6rXBgQflmPqou633kadMEko3R9YWQ1fBAtQVz8lMyFjfNF4iEm lTOH4Vj2RnrnqpDP2Ee7CQ==; Date: Tue, 02 Sep 2025 17:39:39 +0300 Message-Id: <86y0qwlxzo.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Tue, 02 Sep 2025 09:38:15 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> <86bjntm7aw.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Tue, 02 Sep 2025 09:38:15 -0400 > > >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer > >> know what descriptors we originally set waiting_thread on. > > > > In that case pselect returns a negative result, no? And if that > > happens, we don't examine the descriptors at all, right? So why is > > this situation a problem we need to handle with this additional fd_set? > > Because we currently call emacs_abort when we get EBADF: So, if the danger of getting EBADF due to threads-and-processes mishaps is real (I'm saying "if" because I don't think we had bug reports about that), we should change how we treat EBADF in that case. Which should probably be a separate discussion anyway. In any case, the distinct return value of pselect in that case means that none of the code which handles other return values should be bothered about this possibility. > >> Though, I guess we could alternatively just assume that EBADF is always spurious. > > > > Assume and do what? > > Follow one of the other two branches in the above snippet of code: > > - Treat EBADF the same way we do EINTR: there were no events on any fds, so > we continue running wait_reading_process_output. > > - Treat EBADF the same way we treat other errno values: signal an error. > That would at least be better than calling emacs_abort. Let's discuss this issue separately, okay? We are already trying to cover too many different scenarios, which makes this discussion hard to follow and the probability of losing focus high. > >> You misread me: I said SIGCHLD is unsafe even without threads. > > > > Please describe how. > > 1. Call wait_reading_process_output > 2. wait_reading_process_output calls compute_input_wait_mask, including > SOME-FD in the mask. > 3. Receive SIGCHLD, calling handle_child_signal > 4. handle_child_signal calls delete_read_fd for SOME-FD > 5. Signal handler finishes > 6. wait_reading_process_output calls select and returns events for > SOME-FD, even though SOME-FD has been deleted from fd_callback_info > and the process associated with it has died. It seems possible that > this can cause an issue in some code path. > > This couldn't happen before the refactoring which added > fd_callback_info, because handle_child_signal directly mutated the > fd_set which is passed to pselect. Are you talking about systems that don't use the "child signal pipe"? Because if that pipe is in use, then the SIGCHLD handler writes to it, and that will cause pselect to return with that descriptor in Available, and we then handle the death of the process. In addition, delete_read_fd just clears the data from the fd_callback_info of the descriptor, and the loop that examines the bits in Available is AFAICT careful enough to test these fields of the callback info before doing anything because the descriptor was returned by pselect as ready. So what kind of problems could have happened in step 6 above? And once again, this is a separate issue, so let's discuss it separately. In this bug, let's focus on what happens in status_notify and in general when we deactivate/delete a process which exited. Can we return to that now? From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 10:59:51 2025 Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 14:59:51 +0000 Received: from localhost ([127.0.0.1]:35597 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utSUA-0003Wg-Je for submit@debbugs.gnu.org; Tue, 02 Sep 2025 10:59:51 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:54347) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utSU5-0003W3-Q7 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 10:59:46 -0400 From: Spencer Baugh To: Eli Zaretskii Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads In-Reply-To: <86y0qwlxzo.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Sep 2025 17:39:39 +0300") References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> <86bjntm7aw.fsf@gnu.org> <86y0qwlxzo.fsf@gnu.org> Date: Tue, 02 Sep 2025 10:59:39 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1756825180; bh=BoWAyrOubgQHvj4bFXEbVVhM8ZA+ck1/a+F63bO9td0=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=C4I/3NkCFg1OpBy1POMla5gRVOLnujDz422EDP1ClmqGN9L7UBhSLwwEj4Tf5SxEr C+yj+v/Sj3wtxdi7GVVH6juVwcMMwRJv0shoEKxefG39MhfkBPnwXCVDDNhjxb91bF 6z8AcmwjEg+f0qk4sl4m7LZSVXi+m0DzQfSFWSs7lbDTSgAEQcVF8Oxp1PT9xQRA3c 0oMVCXlDUv9opit0k81o/J2nROrWCEJQfagEJcw9tuJOgzRNXIAtNnJ+fSaYCoa0pS wM8G5zumUlcNSBpbfVR8kh0K8jtPJsuGMFlXtU4P9JLUswSkU2a//y87DZ8sGgH3YO Fmw5s8U8Cqwow== X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: >> From: Spencer Baugh >> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev >> Date: Tue, 02 Sep 2025 09:38:15 -0400 >> >> >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer >> >> know what descriptors we originally set waiting_thread on. >> > >> > In that case pselect returns a negative result, no? And if that >> > happens, we don't examine the descriptors at all, right? So why is >> > this situation a problem we need to handle with this additional fd_set? >> >> Because we currently call emacs_abort when we get EBADF: > > So, if the danger of getting EBADF due to threads-and-processes > mishaps is real (I'm saying "if" because I don't think we had bug > reports about that), It is real. I have seen it plenty at my site. I described this issue in my initial email. It is the issue I care about solving. This bug is my report of the issue. Solving it is my highest priority. > we should change how we treat EBADF in that case. Which should > probably be a separate discussion anyway. In any case, the distinct > return value of pselect in that case means that none of the code which > handles other return values should be bothered about this possibility. > >> >> Though, I guess we could alternatively just assume that EBADF is always spurious. >> > >> > Assume and do what? >> >> Follow one of the other two branches in the above snippet of code: >> >> - Treat EBADF the same way we do EINTR: there were no events on any fds, so >> we continue running wait_reading_process_output. >> >> - Treat EBADF the same way we treat other errno values: signal an error. >> That would at least be better than calling emacs_abort. > > Let's discuss this issue separately, okay? We are already trying to > cover too many different scenarios, which makes this discussion hard > to follow and the probability of losing focus high. It's not a separate issue, it's the entire point of this bug. So yes, let's stay focused on this issue and not other issues. Which of these two options do you think we should go with? Or there are other options. For example, we could detect the cause of the EBADF by remembering the original Available set before calling pselect, and checking if the fd_callback_info entry for any of those file descriptors was cleared. >> >> You misread me: I said SIGCHLD is unsafe even without threads. >> > >> > Please describe how. >> >> 1. Call wait_reading_process_output >> 2. wait_reading_process_output calls compute_input_wait_mask, including >> SOME-FD in the mask. >> 3. Receive SIGCHLD, calling handle_child_signal >> 4. handle_child_signal calls delete_read_fd for SOME-FD >> 5. Signal handler finishes >> 6. wait_reading_process_output calls select and returns events for >> SOME-FD, even though SOME-FD has been deleted from fd_callback_info >> and the process associated with it has died. It seems possible that >> this can cause an issue in some code path. >> >> This couldn't happen before the refactoring which added >> fd_callback_info, because handle_child_signal directly mutated the >> fd_set which is passed to pselect. > > Are you talking about systems that don't use the "child signal pipe"? No. This is on GNU/Linux. > Because if that pipe is in use, then the SIGCHLD handler writes to it, > and that will cause pselect to return with that descriptor in > Available, and we then handle the death of the process. In addition, > delete_read_fd just clears the data from the fd_callback_info of the > descriptor, and the loop that examines the bits in Available is AFAICT > careful enough to test these fields of the callback info before doing > anything because the descriptor was returned by pselect as ready. So > what kind of problems could have happened in step 6 above? I'm not sure. > And once again, this is a separate issue, so let's discuss it > separately. Okay. Let me just say again that I suspect there are bugs here which can be triggered by both threads and signal handlers. We can leave it at that. From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 11:59:43 2025 Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 15:59:43 +0000 Received: from localhost ([127.0.0.1]:36095 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1utTQ6-0007Dx-RU for submit@debbugs.gnu.org; Tue, 02 Sep 2025 11:59:43 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55112) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1utTQ2-0007DZ-TX for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 11:59:40 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1utTPu-0001RR-NZ; Tue, 02 Sep 2025 11:59:30 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=sng6gW7pioCvoWFjt62CZcwRHn9LYqx7aZJ+SCQAkPM=; b=XNT6UMi1CvfH Mq+WnDXfbHi4Ge6nv/VFdLp2/rigdB9bTKYP5eyAGe66xZY0/iRq2hpCL+jP74y4AHNxYCyPrKpVv mpK0/ToH6zMzrqNINnXVaVwQqeGIvrGhDMb1v6GXEXlpnSvbK2gmv5KbFR9ejY+cFrye0pSMuuT4D kHMp2u8QYzaRna07pUQfmhZHj6IVR4FP60DHZLmzUCcR2BKnCKdXZIDCKRnaXOIPJzLr/WfSpzvLf ET6UEGm401ENzBiQM+XHK3VHXSXpAKal8X+NTjeHjC4edo5mVl+1t+jxRxN6ymT7N3dKgSPt/DgJb z/eaC7QqXeozdgX/61ATwQ==; Date: Tue, 02 Sep 2025 18:59:28 +0300 Message-Id: <86ms7cluan.fsf@gnu.org> From: Eli Zaretskii To: Spencer Baugh In-Reply-To: (message from Spencer Baugh on Tue, 02 Sep 2025 10:59:39 -0400) Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads References: <86cy8es4bo.fsf@gnu.org> <86a53is0zw.fsf@gnu.org> <86zfbhqptt.fsf@gnu.org> <86jz2kpy1w.fsf@gnu.org> <86ldmym0ux.fsf@gnu.org> <86bjntm7aw.fsf@gnu.org> <86y0qwlxzo.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 79334 Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Spencer Baugh > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev > Date: Tue, 02 Sep 2025 10:59:39 -0400 > > Eli Zaretskii writes: > > >> Because we currently call emacs_abort when we get EBADF: > > > > So, if the danger of getting EBADF due to threads-and-processes > > mishaps is real (I'm saying "if" because I don't think we had bug > > reports about that), > > It is real. I have seen it plenty at my site. I described this issue > in my initial email. It is the issue I care about solving. This bug is > my report of the issue. Solving it is my highest priority. OK, so let's see if fixing deletion of processes fixes that problem. If you see that in your usage, it should be relatively easy to test. Then we can take it from there. > > Let's discuss this issue separately, okay? We are already trying to > > cover too many different scenarios, which makes this discussion hard > > to follow and the probability of losing focus high. > > It's not a separate issue, it's the entire point of this bug. So yes, > let's stay focused on this issue and not other issues. > > Which of these two options do you think we should go with? I would like fix the process deletion and closing of the descriptors first, because that can definitely be one reason for EBADF. We can then discuss additional measures. From where I stand, it is best to completely eliminate EBADF due to these problems; if we succeed, the code which deals with EBADF could maybe be left as it is, since this should then be some spurious problem with no known reasons. If it turns out that we cannot eliminate EBADF completely by fixing the process-deactivation code, then yes, we will need to do something when we get EBADF as result. So what happened to the changes in the code that handles deactivation of a process? can we return to that and arrive at some agreed-upon solution? AFAIR, last time I asked you why not fix these problems inside status_notify (and the few other places which use FOR_EACH_PROCESS). > > Because if that pipe is in use, then the SIGCHLD handler writes to it, > > and that will cause pselect to return with that descriptor in > > Available, and we then handle the death of the process. In addition, > > delete_read_fd just clears the data from the fd_callback_info of the > > descriptor, and the loop that examines the bits in Available is AFAICT > > careful enough to test these fields of the callback info before doing > > anything because the descriptor was returned by pselect as ready. So > > what kind of problems could have happened in step 6 above? > > I'm not sure. That's my reading of the code. Of course, I could be wrong, but then we need a description of the scenario details which causes such problems, including the control and data flow which make it possible. > > And once again, this is a separate issue, so let's discuss it > > separately. > > Okay. Let me just say again that I suspect there are bugs here which > can be triggered by both threads and signal handlers. We can leave it > at that. Yes, I agree that signals add a non-trivial complexity to this stuff, due to their asynchronous nature and their ability to run code on the "wrong" thread.