From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 28 17:09:11 2025
Received: (at submit) by debbugs.gnu.org; 28 Aug 2025 21:09:11 +0000
Received: from localhost ([127.0.0.1]:40303 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1urjrn-0000Mb-CD
	for submit@debbugs.gnu.org; Thu, 28 Aug 2025 17:09:10 -0400
Received: from lists.gnu.org ([2001:470:142::17]:53598)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1urjrb-0000K2-2Z
 for submit@debbugs.gnu.org; Thu, 28 Aug 2025 17:09:02 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <sbaugh@janestreet.com>)
 id 1urjrV-0007cx-G4
 for bug-gnu-emacs@gnu.org; Thu, 28 Aug 2025 17:08:49 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <sbaugh@janestreet.com>)
 id 1urjrS-0006yi-Hf
 for bug-gnu-emacs@gnu.org; Thu, 28 Aug 2025 17:08:49 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: bug-gnu-emacs@gnu.org
Subject: [PATCH] Don't release thread select lock unnecessarily
X-Debbugs-Cc: Dmitry Gutov <dmitry@gutov.dev>
Date: Thu, 28 Aug 2025 17:08:43 -0400
Message-ID: <ierwm6n6tno.fsf@janestreet.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756415324;
 bh=oG6k7arttFbIy05aU6tB9IVuhq7uUKY56UKJ/ZvUbhw=;
 h=From:To:Subject:Date;
 b=FmijEXYwTC3zYww9Ot1knKLIt5dOgbBu5C8n6nB4eThxiXbt2u065WfO/Ihhf2c0M
 FNUhK6MaFrjBI8ZbDs44PcAzaLwzBPnJ8YePPd/roklIEnY59NV5k2y7R5VaYzsIyZ
 4sa3UmX8bdr5zaYwq4c1n701LUiUnqXzFzPts6GvHy+ymxlDy8qZINZjvz/Z5AG0Fj
 fUmKia1R5AcjseqKlhnlyhKaE2LHC6zu4nqClpYW4KLXkTfgmsrPEr5v0yqrxKm8YC
 FeRi3erFJz6f5FI+G7sCARZRXfwDVVBXq3bfmzbEjKOnG63r4UeFSlt7iCY8MF6xmZ
 b6G8cyW2Zvzxw==
Received-SPF: pass client-ip=64.215.233.18; envelope-from=sbaugh@janestreet.com;
 helo=mxout5.mail.janestreet.com
X-Spam_score_int: -43
X-Spam_score: -4.4
X-Spam_bar: ----
X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001,
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001,
 SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 0.9 (/)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -0.1 (/)

--=-=-=
Content-Type: text/plain

Tags: patch


Previously, we were using thread_select to release the thread
lock around two different calls to select in
wait_reading_process_output.  The main call to select was fine,
but we also used thread_select for a pselect call which returns
immediately (with a timeout of 0) and is supposed to just check
if any file descriptors are active.

If we actually thread switch at that pselect, it's likely to
break internal state for Emacs: namely, the call to
status_notify immediately after can close file descriptors which
another thread is selecting on, causing that thread to get EBADF
from select and then call emacs_abort.

We don't need to thread switch here, so don't.

* src/process.c (wait_reading_process_output): Remove
unnecessary thread_select wrapper.


In GNU Emacs 30.1.90 (build 29, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2025-08-26 built on
 igm-qws-u22796a
Repository revision: 54857fe2fb0ed033afcba231f920f8f7fa185333
Repository branch: emacs-30
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure --with-x-toolkit=lucid --without-gpm --without-gconf
 --without-selinux --without-imagemagick --with-modules --with-gif=no
 --with-cairo --with-rsvg --without-compress-install --with-tree-sitter
 --with-native-compilation=aot
 PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/'


--=-=-=
Content-Type: text/patch
Content-Disposition: attachment;
 filename=0001-Don-t-release-thread-select-lock-unnecessarily.patch

>From 68e7ed37036498a0df0232196c50cced38cdb751 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Thu, 28 Aug 2025 17:06:56 -0400
Subject: [PATCH] Don't release thread select lock unnecessarily

Previously, we were using thread_select to release the thread
lock around two different calls to select in
wait_reading_process_output.  The main call to select was fine,
but we also used thread_select for a pselect call which returns
immediately (with a timeout of 0) and is supposed to just check
if any file descriptors are active.

If we actually thread switch at that pselect, it's likely to
break internal state for Emacs: namely, the call to
status_notify immediately after can close file descriptors which
another thread is selecting on, causing that thread to get EBADF
from select and then call emacs_abort.

We don't need to thread switch here, so don't.

* src/process.c (wait_reading_process_output): Remove
unnecessary thread_select wrapper.
---
 src/process.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/process.c b/src/process.c
index c8b70a4174c..b33bebb9a35 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5513,11 +5513,13 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  if (0 <= fd)
 	    FD_CLR (fd, &Atemp);
 
+	  /* We're just checking to see if any of these fds are active,
+	     so make pselect return immediately. */
 	  timeout = make_timespec (0, 0);
-	  if ((thread_select (pselect, max_desc + 1,
-			      &Atemp,
-			      (num_pending_connects > 0 ? &Ctemp : NULL),
-			      NULL, &timeout, NULL)
+	  if ((pselect (max_desc + 1,
+			&Atemp,
+			(num_pending_connects > 0 ? &Ctemp : NULL),
+			NULL, &timeout, NULL)
 	       <= 0))
 	    {
 	      /* It's okay for us to do this and then continue with
-- 
2.43.7


--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 08:25:17 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 12:25:18 +0000
Received: from localhost ([127.0.0.1]:41981 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1uryAL-00085W-Te
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 08:25:17 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:52872)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1uryAF-00083q-TB
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 08:25:11 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1uryA9-0008SE-FW; Fri, 29 Aug 2025 08:25:01 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=A94arxGB1YohBT1rPRd0MPJzmHkTduP0TX8/MPX2v5o=; b=ZdbZ5Nn/xJrU
 aEk4NFcaVzfyRtuxuqfgddmGDvvJI4fKv05DORjADj5ov2y5jhfQ/olqwx2jXpBDDOvlZXmPF9lZ0
 kGsWEXWSVSCcanJtawOz4IovlmlyUP1uHYbyAxDhiukGI+DbvOSSFAh3xYtYT3e/ojkwrD3Kr/PLQ
 PuOHu8r6vTCl6HfoAVtW15yVDcFLbL+DJeyCb0CAOmvnzDuUCIVSefIi58FRDRxBGMB4p6R5zqJTA
 /JqHx4tAMWdWyK6Kg9mzaLDEVohACjlV1+u9axBAsEg6+USHxU2SliFLN2mP0ytk+Tl+jw9VzOzzi
 oDR/mVAWikbfFGOmP5kHrA==;
Date: Fri, 29 Aug 2025 15:24:59 +0300
Message-Id: <86cy8es4bo.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>,
 Paul Eggert <eggert@cs.ucla.edu>
In-Reply-To: <ierwm6n6tno.fsf@janestreet.com> (message from Spencer Baugh on
 Thu, 28 Aug 2025 17:08:43 -0400)
Subject: Re: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Thu, 28 Aug 2025 17:08:43 -0400
> 
> Previously, we were using thread_select to release the thread
> lock around two different calls to select in
> wait_reading_process_output.  The main call to select was fine,
> but we also used thread_select for a pselect call which returns
> immediately (with a timeout of 0) and is supposed to just check
> if any file descriptors are active.
> 
> If we actually thread switch at that pselect, it's likely to
> break internal state for Emacs: namely, the call to
> status_notify immediately after can close file descriptors which
> another thread is selecting on, causing that thread to get EBADF
> from select and then call emacs_abort.
> 
> We don't need to thread switch here, so don't.
> 
> * src/process.c (wait_reading_process_output): Remove
> unnecessary thread_select wrapper.

Thanks.

However, I don't think I follow the logic behind this change.  I agree
that status_notify does things that can be dangerous (more about that
below), but the call to status_notify will happen whether we call
thread_select or pselect directly, so if status_notify can do some
damage, it will do that regardless of whether we call pselect directly
or not.  I think you assume that the problematic call to status_notify
will happen in another thread, if we allow switching threads at this
point, is that right?  IOW, I think you assume that, if we switch to
another thread, that other thread could call status_notify, and thus
close some descriptors on which the current thread, the one which
calls thread_select at this point, is waiting.  If that is your
assumption, then the actual situation could also be the opposite: the
current thread, the one where you want to call pselect, will call
status_notify after pselect returns, and that will close some
descriptors used by other threads.  In that case, calling pselect
directly will make things worse, not better, because it makes this
closing of descriptors imminent.

So, while I agree we have a problem here, I think the place and the
way we should solve it are different.

Which brings us to status_notify.  It indeed can close the descriptors
of processes which terminated, so we should probably make it safer.
For example, we could leave alone descriptors marked with non-NULL
waiting_thread if that thread is still alive and is different from the
current thread.  Would that fix the problem?  If yes, I think we
should install such a change.

We should perhaps also understand better what happens with the current
code if and when descriptors are closed by some other thread in this
scenario.  A thread using such a descriptor could be either stuck in
the pselect call, or it could be before or after the call to pselect.
Paul, what happens if a descriptor passed to pselect becomes closed
while pselect waits? does pselect return with EBADF, or does pselect
handle that gracefully, like by considering such descriptors to be
never "ready"?

If the thread is before or after pselect, then either calling pselect
or attempting to read from a closed descriptor after pselect returns
will cause EBADF.  Is our code prepared to deal with such a situation
(e.g., by ignoring the failed read), or will that cause some Lisp API
to signal an error or produce unexpected results?  If the latter, we
should probably handle EBADF specially, because I don't believe we
could completely prevent this from happening.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:08:01 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:08:01 +0000
Received: from localhost ([127.0.0.1]:42079 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1urypk-0005qe-1B
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:08:01 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:44385)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1urypg-0005pS-Cw
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:07:57 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
In-Reply-To: <86cy8es4bo.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug
 2025 15:24:59 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
Date: Fri, 29 Aug 2025 09:07:50 -0400
Message-ID: <ierqzwu6ztl.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756472870;
 bh=9nk+1fqBMoMPmgli+EJG7Ou/LigbjYj07QqRvUfgPB0=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=ZHDmYO/AcuoBh9mXcr+309GYVkxFuBbmY9RKc6Fyv6LkVsz+4Ep5zcuwQiriBX5JB
 +yCu7+yPTZxV9SHKNds/9+JIU/ik8D/k1rIa+7zmNCyigRDAWT5v7IJM8L1dGrL1Jn
 74Jn52/evTYEgcBJjnEc6rvZnXOf7VV7lJmuGrwP+KVwynS6myX549r4HXexTXMJJT
 vcDmWlNNKPMyCefu4cJOlZiUMuiyNB8923qb2EdAPCvJTbYUO7Vu8w53AB2wnbFrgz
 7C4lAH+okQetca2iI6Xx0KXWLBCYFPHuthcv6tuBT7RjI3X3f8e0qc+/SCJPrOvZz/
 A7+pTM0z1lgJw==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, Paul Eggert <eggert@cs.ucla.edu>,
 Dmitry Gutov <dmitry@gutov.dev>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Thu, 28 Aug 2025 17:08:43 -0400
>> 
>> Previously, we were using thread_select to release the thread
>> lock around two different calls to select in
>> wait_reading_process_output.  The main call to select was fine,
>> but we also used thread_select for a pselect call which returns
>> immediately (with a timeout of 0) and is supposed to just check
>> if any file descriptors are active.
>> 
>> If we actually thread switch at that pselect, it's likely to
>> break internal state for Emacs: namely, the call to
>> status_notify immediately after can close file descriptors which
>> another thread is selecting on, causing that thread to get EBADF
>> from select and then call emacs_abort.
>> 
>> We don't need to thread switch here, so don't.
>> 
>> * src/process.c (wait_reading_process_output): Remove
>> unnecessary thread_select wrapper.
>
> Thanks.
>
> However, I don't think I follow the logic behind this change.  I agree
> that status_notify does things that can be dangerous (more about that
> below), but the call to status_notify will happen whether we call
> thread_select or pselect directly, so if status_notify can do some
> damage, it will do that regardless of whether we call pselect directly
> or not.  I think you assume that the problematic call to status_notify
> will happen in another thread, if we allow switching threads at this
> point, is that right?  IOW, I think you assume that, if we switch to
> another thread, that other thread could call status_notify, and thus
> close some descriptors on which the current thread, the one which
> calls thread_select at this point, is waiting.  If that is your
> assumption, then the actual situation could also be the opposite: the
> current thread, the one where you want to call pselect, will call
> status_notify after pselect returns, and that will close some
> descriptors used by other threads.  In that case, calling pselect
> directly will make things worse, not better, because it makes this
> closing of descriptors imminent.

Yes.  I think there's multiple problems in this code and I haven't
tracked them all down yet.

That being said, I'm confident that it's incorrect and unnecessary to be
using thread_select instead of pselect here, so we should stop doing
that in addition to any other fix.

> So, while I agree we have a problem here, I think the place and the
> way we should solve it are different.
>
> Which brings us to status_notify.  It indeed can close the descriptors
> of processes which terminated, so we should probably make it safer.
> For example, we could leave alone descriptors marked with non-NULL
> waiting_thread if that thread is still alive and is different from the
> current thread.  Would that fix the problem?  If yes, I think we
> should install such a change.

That was my initial idea for a fix as well.  But I'm not sure about the
specific details.  Namely, perhaps we need to be checking waiting_thread
everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
And maybe some new functions/macros should be added to simplify this;
maybe a version of FOR_EACH_PROCESS which only iterates over processes
for which waiting_thread == current_thread.

> We should perhaps also understand better what happens with the current
> code if and when descriptors are closed by some other thread in this
> scenario.  A thread using such a descriptor could be either stuck in
> the pselect call, or it could be before or after the call to pselect.
> Paul, what happens if a descriptor passed to pselect becomes closed
> while pselect waits? does pselect return with EBADF, or does pselect
> handle that gracefully, like by considering such descriptors to be
> never "ready"?
>
> If the thread is before or after pselect, then either calling pselect
> or attempting to read from a closed descriptor after pselect returns
> will cause EBADF.  Is our code prepared to deal with such a situation
> (e.g., by ignoring the failed read), or will that cause some Lisp API
> to signal an error or produce unexpected results?  If the latter, we
> should probably handle EBADF specially, because I don't believe we
> could completely prevent this from happening.

(I'm more hopeful: I think we can completely prevent this from
happening, we just need to be more careful about our locking.)


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:37:06 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:37:06 +0000
Received: from localhost ([127.0.0.1]:42175 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1urzHs-0001bt-RO
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:37:06 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:51086)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1urzHo-0001aY-6i
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:37:02 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1urzHh-0003xm-H4; Fri, 29 Aug 2025 09:36:53 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=GM5lVVYcgn+bIJ0Q1L7OTwvAGXgIiPlgD6RkljjB9NY=; b=Z5i4iAXaT7Kx
 YNGedJ9YlYpyNUbBLas4ku5Fv3m/NsibpHqCXsPg/UbGi0u1nujxPcF2VxJ8X+rBHLwhsgfMc0x1l
 2jJ1qPR7iH6kfSKo25CTPMR+lQjhK7J7u+I64t4u9LajMgIHH8PwWfSqDx3cmV943YuOmgyM+qY97
 nMHDHiFD6AqJxGQxRbK/JXkOrsOotemKFe4ShiqiiBvDBbR65N+Cba85LheByffH2urBe3nQE4QY6
 VgMMesX6BA+rGFIki+qnNFO7Zxg8FX6snkK2Ittu/DYWhMrnOiOMfOBy6pCeh4vSsRwS4hGz27Q4t
 dyu0+FDOk3wkWJkKGDm8gA==;
Date: Fri, 29 Aug 2025 16:36:51 +0300
Message-Id: <86a53is0zw.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ierqzwu6ztl.fsf@janestreet.com> (message from Spencer Baugh on
 Fri, 29 Aug 2025 09:07:50 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: Paul Eggert <eggert@cs.ucla.edu>,  79334@debbugs.gnu.org, Dmitry Gutov
>   <dmitry@gutov.dev>
> Date: Fri, 29 Aug 2025 09:07:50 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > However, I don't think I follow the logic behind this change.  I agree
> > that status_notify does things that can be dangerous (more about that
> > below), but the call to status_notify will happen whether we call
> > thread_select or pselect directly, so if status_notify can do some
> > damage, it will do that regardless of whether we call pselect directly
> > or not.  I think you assume that the problematic call to status_notify
> > will happen in another thread, if we allow switching threads at this
> > point, is that right?  IOW, I think you assume that, if we switch to
> > another thread, that other thread could call status_notify, and thus
> > close some descriptors on which the current thread, the one which
> > calls thread_select at this point, is waiting.  If that is your
> > assumption, then the actual situation could also be the opposite: the
> > current thread, the one where you want to call pselect, will call
> > status_notify after pselect returns, and that will close some
> > descriptors used by other threads.  In that case, calling pselect
> > directly will make things worse, not better, because it makes this
> > closing of descriptors imminent.
> 
> Yes.  I think there's multiple problems in this code and I haven't
> tracked them all down yet.
> 
> That being said, I'm confident that it's incorrect and unnecessary to be
> using thread_select instead of pselect here, so we should stop doing
> that in addition to any other fix.

If we fix status_notify, what other reasons remain for not calling
thread_select there?

> > So, while I agree we have a problem here, I think the place and the
> > way we should solve it are different.
> >
> > Which brings us to status_notify.  It indeed can close the descriptors
> > of processes which terminated, so we should probably make it safer.
> > For example, we could leave alone descriptors marked with non-NULL
> > waiting_thread if that thread is still alive and is different from the
> > current thread.  Would that fix the problem?  If yes, I think we
> > should install such a change.
> 
> That was my initial idea for a fix as well.  But I'm not sure about the
> specific details.  Namely, perhaps we need to be checking waiting_thread
> everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
> And maybe some new functions/macros should be added to simplify this;
> maybe a version of FOR_EACH_PROCESS which only iterates over processes
> for which waiting_thread == current_thread.

You may be right, we should audit all the users of FOR_EACH_PROCESS.
At least one of them (get-buffer-process) is probably okay as it is,
but others might need to avoid looking at processes locked to other
threads, indeed.

> > We should perhaps also understand better what happens with the current
> > code if and when descriptors are closed by some other thread in this
> > scenario.  A thread using such a descriptor could be either stuck in
> > the pselect call, or it could be before or after the call to pselect.
> > Paul, what happens if a descriptor passed to pselect becomes closed
> > while pselect waits? does pselect return with EBADF, or does pselect
> > handle that gracefully, like by considering such descriptors to be
> > never "ready"?
> >
> > If the thread is before or after pselect, then either calling pselect
> > or attempting to read from a closed descriptor after pselect returns
> > will cause EBADF.  Is our code prepared to deal with such a situation
> > (e.g., by ignoring the failed read), or will that cause some Lisp API
> > to signal an error or produce unexpected results?  If the latter, we
> > should probably handle EBADF specially, because I don't believe we
> > could completely prevent this from happening.
> 
> (I'm more hopeful: I think we can completely prevent this from
> happening, we just need to be more careful about our locking.)

If we can prevent that completely, it's even better.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 09:45:18 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 13:45:18 +0000
Received: from localhost ([127.0.0.1]:42186 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1urzPo-0002nF-CV
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 09:45:17 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:50375)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1urzPk-0002iS-RG
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 09:45:13 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
In-Reply-To: <86a53is0zw.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug
 2025 16:36:51 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
Date: Fri, 29 Aug 2025 09:45:07 -0400
Message-ID: <ieriki66y3g.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756475107;
 bh=PKJZnrnixgrqu84Y0iPcA3eO5ttQIxkzUmsbGEvFx7Y=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=XkLlJR7CpUGqhmAmALvGFx+vAZ8igtC0n0YVZL27IfVAziqh1wUZ/U7bdKX84+YuG
 BREehTmNSK7gZdyvk4/m5DT/OpaafOf6chcU/txzj3S8P8ICLxKTeSA6uBAd7h3FHo
 cLXlEp81QQrmbzyzpj7qOFljExbzQw8lMSlMhQkUEg1fuFHrWS7P2aWZRlpQY5/jKn
 fsT/TD1aA6ry1Ej+FgeSAE68eVV/ZZUS2N+HIgNLFGhRa9TGZyXYTTf3YLYnAttr7t
 cbJmrYwcT2qvYAh3IuBnFEiz2MqarWqDyYuEVo10zaZPZFQpvOS53pZFNmU2XsTz1r
 Vktp7mDa83HhA==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Cc: Paul Eggert <eggert@cs.ucla.edu>,  79334@debbugs.gnu.org, Dmitry Gutov
>>   <dmitry@gutov.dev>
>> Date: Fri, 29 Aug 2025 09:07:50 -0400
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > However, I don't think I follow the logic behind this change.  I agree
>> > that status_notify does things that can be dangerous (more about that
>> > below), but the call to status_notify will happen whether we call
>> > thread_select or pselect directly, so if status_notify can do some
>> > damage, it will do that regardless of whether we call pselect directly
>> > or not.  I think you assume that the problematic call to status_notify
>> > will happen in another thread, if we allow switching threads at this
>> > point, is that right?  IOW, I think you assume that, if we switch to
>> > another thread, that other thread could call status_notify, and thus
>> > close some descriptors on which the current thread, the one which
>> > calls thread_select at this point, is waiting.  If that is your
>> > assumption, then the actual situation could also be the opposite: the
>> > current thread, the one where you want to call pselect, will call
>> > status_notify after pselect returns, and that will close some
>> > descriptors used by other threads.  In that case, calling pselect
>> > directly will make things worse, not better, because it makes this
>> > closing of descriptors imminent.
>> 
>> Yes.  I think there's multiple problems in this code and I haven't
>> tracked them all down yet.
>> 
>> That being said, I'm confident that it's incorrect and unnecessary to be
>> using thread_select instead of pselect here, so we should stop doing
>> that in addition to any other fix.
>
> If we fix status_notify, what other reasons remain for not calling
> thread_select there?

Primarily: Calling thread_select here complicates reasoning about the
code.  We should not release the lock when not necessary, because it
makes it harder to think about possible thread interleavings.  And it's
not necessary here because this pselect call returns immediately.

(The only reason the pselect call wouldn't return immediately is if we
released the lock and then switched to another thread)

Also, avoiding thread_select here will slightly improve performance by
avoiding lock operations.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 10:01:08 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 14:01:08 +0000
Received: from localhost ([127.0.0.1]:42576 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1urzf9-0005DB-HL
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 10:01:08 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:56786)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1urzf6-0005BZ-5e
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 10:01:05 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1urzf0-0003np-GE; Fri, 29 Aug 2025 10:00:58 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=BZTU3E0kIckzj2s1rgae/vZF9A3NjXstMJzQXgeDSvw=; b=WT5Qr6gjYOFx
 Sa7uxPlt9FnFagWnOsC92XGG84RxXWGRVF3tedq89VcxgAQdD/NHyslgS0i9+30R3ghRtSpyEnhA1
 q24O88vOsy3mVlXSlhNueeOkzfWjiYvBstvy9TQrNzKvwXro9FGh6KlSxhMVUQ13i2igeaZqE8J+6
 fByhtVApuHYHWJkXVtnLtoi9B4hMedFloXRLmvjTYFO2NRbU2O4m/dh3q7kZCnuqLIIHwXxY1sFOj
 sOyaLqF+3NtKWNMVePr6JjZWkBmeNaZI1vLp+EI9cvIijQ5eZfrpHMK4VBttP7k1NO5z0TeUSJIWt
 Hl46DInfexcNJm4KCOerCQ==;
Date: Fri, 29 Aug 2025 17:00:56 +0300
Message-Id: <867bymrzvr.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ieriki66y3g.fsf@janestreet.com> (message from Spencer Baugh on
 Fri, 29 Aug 2025 09:45:07 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Fri, 29 Aug 2025 09:45:07 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> That being said, I'm confident that it's incorrect and unnecessary to be
> >> using thread_select instead of pselect here, so we should stop doing
> >> that in addition to any other fix.
> >
> > If we fix status_notify, what other reasons remain for not calling
> > thread_select there?
> 
> Primarily: Calling thread_select here complicates reasoning about the
> code.  We should not release the lock when not necessary, because it
> makes it harder to think about possible thread interleavings.  And it's
> not necessary here because this pselect call returns immediately.

How do we define "necessary" in this context?

> (The only reason the pselect call wouldn't return immediately is if we
> released the lock and then switched to another thread)

You seem to assume that a call to thread_select is only necessary when
some potentially prolonged operation or wait is expected?  But I'm not
sure this is correct, because that could prevent other threads from
running for too long, and force programmers to inject sleep-for
etc. in their programs.  By allowing thread switches as frequently as
possible lets other threads more opportunities to run (to some extent
simulating preemptive scheduling), which I think is a Good Thing
overall, no?


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 10:49:45 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 14:49:46 +0000
Received: from localhost ([127.0.0.1]:42774 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us0QC-0003Mn-5H
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 10:49:45 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:49683)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us0Q8-0003LE-FM
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 10:49:42 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
In-Reply-To: <867bymrzvr.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug
 2025 17:00:56 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com> <867bymrzvr.fsf@gnu.org>
Date: Fri, 29 Aug 2025 10:49:34 -0400
Message-ID: <ierfrda6v41.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756478974;
 bh=at7cROYOKajFHWykh5sJEnUx1DFD2SfPsFq6+h/bbDg=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=yQS/z161DuInozzR2waFgcA8AGX5HLUBko+wVk7IFgK/1hbBAMU0cXPr9Ys5X1s/O
 04kvkug3gAcz6tTHYs2jXcuWGohpqzlDqeU8dN6i10xtCGADsNCHsty2ws+CNb6U7X
 YPFO1MMFWLf1TCaxG53oR76pent60LcjOJC8WenQYqkQhxKM28+irVSS/NpXCctgnN
 G1jkMmwRpLxri80vtqTywZAT+qPRwmuwWy+yLk4S05lhO39OPn7WFXcRnBBjkJWd+N
 mzE6RkZ+zcn7sHuDBnCclS2IvkKsRa0XLZqSPIc7r3a7z/ConeQFB6FYOmGbkKvHH6
 KqPPqXWj/q/PQ==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
>> Date: Fri, 29 Aug 2025 09:45:07 -0400
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> >> That being said, I'm confident that it's incorrect and unnecessary to be
>> >> using thread_select instead of pselect here, so we should stop doing
>> >> that in addition to any other fix.
>> >
>> > If we fix status_notify, what other reasons remain for not calling
>> > thread_select there?
>> 
>> Primarily: Calling thread_select here complicates reasoning about the
>> code.  We should not release the lock when not necessary, because it
>> makes it harder to think about possible thread interleavings.  And it's
>> not necessary here because this pselect call returns immediately.
>
> How do we define "necessary" in this context?

It's necessary to release the lock around long pselect calls because
otherwise other threads will not run while Emacs is waiting for input,
which means they probably won't run at all.

>> (The only reason the pselect call wouldn't return immediately is if we
>> released the lock and then switched to another thread)
>
> You seem to assume that a call to thread_select is only necessary when
> some potentially prolonged operation or wait is expected?  But I'm not
> sure this is correct, because that could prevent other threads from
> running for too long, and force programmers to inject sleep-for
> etc. in their programs.  By allowing thread switches as frequently as
> possible lets other threads more opportunities to run (to some extent
> simulating preemptive scheduling), which I think is a Good Thing
> overall, no?

Three points:

- Most importantly: We can only allow thread switches at known safe
  points.  It's not clear that this is a safe point.  (Actually, we know
  it currently isn't, because of the status_notify issue.  But I suspect
  there may be other issues which are harder to analyze)

- Adding more thread switches is not necessarily a good thing.  It can
  degrade whole-system performance, because context switching is slow.

- wait_reading_process_output already does a thread switch around the
  pselect, so it's not important for one to happen here.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:41:53 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:41:53 +0000
Received: from localhost ([127.0.0.1]:42921 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us1Eb-0002by-Ve
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:41:53 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:49032)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1us1EX-0002ae-9Q
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:41:48 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1us1EP-0003An-4X; Fri, 29 Aug 2025 11:41:37 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=wwmcgsbcK/TljZ988x0eaBWcKQ36ow9LSegB+I8w3xw=; b=quytDHwMnKjS
 Li9dCNei+sCJ4ySJXAJVVImUueaEhCqBUs2c2tf1Wh9pbmuwC4D4SmQVdAEHLVhnmQQc5u1KmvauJ
 xqXQiOfCZNgileh5pdDbacVGJBTteI3hlljEMt0sdgP/m6dc+uUAh4mjzrUmx73zK1ptkfHgX8Z8g
 AoOHFBKtDl9ePPYHmxikn1VrIiWZ8RZwVGjd3G+cdGNlSfA0xlxPuk597gxccLm0iEnpsoWeS0csO
 X4I2Yeheyvu/BUXMRcqqGbAzNfiRe9kluGYYJ56HwNP//eBrLChSiAv2DIUhNNZfX1EaNzjjhr5j1
 ztvQYLbwCIawrEXPKaP5qA==;
Date: Fri, 29 Aug 2025 18:41:00 +0300
Message-Id: <865xe6rv8z.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ierfrda6v41.fsf@janestreet.com> (message from Spencer Baugh on
 Fri, 29 Aug 2025 10:49:34 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com> <867bymrzvr.fsf@gnu.org>
 <ierfrda6v41.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Fri, 29 Aug 2025 10:49:34 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Primarily: Calling thread_select here complicates reasoning about the
> >> code.  We should not release the lock when not necessary, because it
> >> makes it harder to think about possible thread interleavings.  And it's
> >> not necessary here because this pselect call returns immediately.
> >
> > How do we define "necessary" in this context?
> 
> It's necessary to release the lock around long pselect calls because
> otherwise other threads will not run while Emacs is waiting for input,
> which means they probably won't run at all.
> 
> >> (The only reason the pselect call wouldn't return immediately is if we
> >> released the lock and then switched to another thread)
> >
> > You seem to assume that a call to thread_select is only necessary when
> > some potentially prolonged operation or wait is expected?  But I'm not
> > sure this is correct, because that could prevent other threads from
> > running for too long, and force programmers to inject sleep-for
> > etc. in their programs.  By allowing thread switches as frequently as
> > possible lets other threads more opportunities to run (to some extent
> > simulating preemptive scheduling), which I think is a Good Thing
> > overall, no?
> 
> Three points:
> 
> - Most importantly: We can only allow thread switches at known safe
>   points.  It's not clear that this is a safe point.  (Actually, we know
>   it currently isn't, because of the status_notify issue.  But I suspect
>   there may be other issues which are harder to analyze)

This reason will be removed once we fix status_notify.

> - Adding more thread switches is not necessarily a good thing.  It can
>   degrade whole-system performance, because context switching is slow.
> 
> - wait_reading_process_output already does a thread switch around the
>   pselect, so it's not important for one to happen here.

You basically give the same single reason in several different
wordings.

I see your point, but my experience with threads in Emacs is the
opposite: there seem to be too few thread switch opportunities, so
many programs don't work as expected unless one sprinkles sleep-for in
various places.  For example, try to run something in a non-main
thread and then attempt to interact with Emacs in the main thread.

IOW, it is quite easy for a Lisp program to hog the global lock and
starve the other threads, so each additional thread-switch opportunity
is IME a blessing, not a curse.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:43:44 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:43:44 +0000
Received: from localhost ([127.0.0.1]:42926 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us1GQ-0002rZ-PX
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:43:44 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:56661)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us1GM-0002qE-Nu
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:43:40 -0400
Received: from mail-lj1-f199.google.com ([209.85.208.199])
 by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128)
 (Exim 4.98.2) id 1us1GH-00000001yfd-1bVq for 79334@debbugs.gnu.org;
 Fri, 29 Aug 2025 11:43:33 -0400
Received: by mail-lj1-f199.google.com with SMTP id
 38308e7fff4ca-336b13923d4so7739521fa.1
 for <79334@debbugs.gnu.org>; Fri, 29 Aug 2025 08:43:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=janestreet.com; s=google; t=1756482212; x=1757087012; darn=debbugs.gnu.org; 
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=;
 b=ECsf5kpY0xQv9N0U88hPeXxvpEGt8E/cwTpcg7ct7qDnh92x/G6EN3yv2VzvXFvVOv
 W3wTdn93/SRAA0B4wU3XkNxoGzgIcDSsmo5xVYOApFDlE719+7f7PnvO9s2q6BSGUF8Q
 IhBOBvRal9aldN0E/zDdGavyDakuepecxAQ6Q=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756482213;
 bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=;
 h=References:In-Reply-To:From:Date:Subject:To:Cc;
 b=xMqRpmQve88dSNKv6PVyL7TjBXCCVfhpLSqDYawmPEQyodIt7LMCGtGuUbJhGPEWp
 hw3d81higT2hffSrXZiq6ldKc9qLz7qxmdsHHeGV9vF2McZ4h7A/FQRdVxYgD2Y3o6
 ayWxdVX9y8zSnBS5ZtfyiGYgA8b2GJ78qJ/FkdIfM1zMrT3dcu6jYwTI1/VylbtkuC
 Ak2A+cttvfkYl/Ta0AM3M96CVYHBV6qT3BY0jji+W1/2tPvi8Ypzl/xPnN3LcfVADm
 E+ACyVvf7v0CXNAaMeYIrhLS3gRlk6dAMZTQnCfNcGsepvDElzhMfIe+6y+BOepsvh
 hXebMUPsIiw6g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1756482212; x=1757087012;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=fxdnxATSoWkbDJ8mPjtVVQCjjd9ivKg0YuV9rA5ZMKw=;
 b=lDXrUd3RuUHcRDSuxs8ThdLu9Q7eAZYzTZvPmMludupKnfcAv7C1RxF+hLYvwwc0v3
 G4o9sowt0xpocGFd5PZZkl23RP1ISDHZWVhQMrJBty/ie6iv0EiVuXX1yO1WBDLPXEyR
 ypAX4x7mIfR3uK3SHbQzW5t1LuUJcguAn02HMp0lJfnSYkz3dpiGKGz0Ws+LeMY2qgOQ
 hriVjvJAmgeA/cud/gsnswioXOxXe1KCsrPDXw7JSqTFr1nA7LiBWr6J1mCJuGedGaQk
 WrAX8zipcT47v73KQ2aAtblm0hF72TpPJNVE9JWBx/ObFJWefXE6sbSF+tU/LG9kPF84
 ZBpw==
X-Gm-Message-State: AOJu0YwBpImaUj+B4W+Ugxy44X95Z9d4tUKy7fDH554uILDatSQLJ9F4
 VMNLMoLUvOc6wdxfMULJ/+LZZTmS5xYkJUsXLj8/xcxN1kpx/rAw7ULfeTtj+mLefjmCh5nTsnn
 zhgGbbfebwiCHo7PLmw3ocA2C0uHyfqslfOszSevu5L7o/389sPV36A7yXZSXMz83US/DfRPrYj
 l1/VVk5lz9K8zHu8Sa5KHXf8g2dzQlMw==
X-Gm-Gg: ASbGncuezZgq3UlqFR+C2EjosvwgTGCpsXpd6wxP2zUm4RF7e20+Ywa68umD0Cf0dOv
 Qv0XbQ7wspRRAdobRXLZJ6objAd6bxAN8gKTomGy1qb0mAl/RRUC/cTOS2i+1MvzsIaXU5+pFb4
 Mu/jFGVxPGf07e/h8GcVwH
X-Received: by 2002:a05:651c:f13:b0:32b:7165:d0 with SMTP id
 38308e7fff4ca-33650e3f8e8mr63959121fa.10.1756482211833; 
 Fri, 29 Aug 2025 08:43:31 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IGvUPUj75CSkNFySI8hcG5Yj7udCRWw2P0gSKoXWkcKfDUIIHklgfxFk5HuT0NEsyS4awNj3cZ98+uWLOIOumk=
X-Received: by 2002:a05:651c:f13:b0:32b:7165:d0 with SMTP id
 38308e7fff4ca-33650e3f8e8mr63959021fa.10.1756482211398; Fri, 29 Aug 2025
 08:43:31 -0700 (PDT)
MIME-Version: 1.0
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com>
 <867bymrzvr.fsf@gnu.org> <ierfrda6v41.fsf@janestreet.com>
 <865xe6rv8z.fsf@gnu.org>
In-Reply-To: <865xe6rv8z.fsf@gnu.org>
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 29 Aug 2025 11:43:21 -0400
X-Gm-Features: Ac12FXw3CwyQ9uwCdLgAJcsfw4YfJVjUGXJeXMiq2js0UPK1KUrL3hbsvWwwhaE
Message-ID: <CAO=BR8PCwypg+7TMbbQueQh82drqQTeYNU0LGqVQ=oDv58v5NQ@mail.gmail.com>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
To: Eli Zaretskii <eliz@gnu.org>
Content-Type: multipart/alternative; boundary="00000000000043dbae063d82e2d0"
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <dmitry@gutov.dev>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--00000000000043dbae063d82e2d0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 29, 2025, 11:41=E2=80=AFAM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Spencer Baugh <sbaugh@janestreet.com>
> > Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> > Date: Fri, 29 Aug 2025 10:49:34 -0400
> >
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> > >> Primarily: Calling thread_select here complicates reasoning about th=
e
> > >> code.  We should not release the lock when not necessary, because it
> > >> makes it harder to think about possible thread interleavings.  And
> it's
> > >> not necessary here because this pselect call returns immediately.
> > >
> > > How do we define "necessary" in this context?
> >
> > It's necessary to release the lock around long pselect calls because
> > otherwise other threads will not run while Emacs is waiting for input,
> > which means they probably won't run at all.
> >
> > >> (The only reason the pselect call wouldn't return immediately is if =
we
> > >> released the lock and then switched to another thread)
> > >
> > > You seem to assume that a call to thread_select is only necessary whe=
n
> > > some potentially prolonged operation or wait is expected?  But I'm no=
t
> > > sure this is correct, because that could prevent other threads from
> > > running for too long, and force programmers to inject sleep-for
> > > etc. in their programs.  By allowing thread switches as frequently as
> > > possible lets other threads more opportunities to run (to some extent
> > > simulating preemptive scheduling), which I think is a Good Thing
> > > overall, no?
> >
> > Three points:
> >
> > - Most importantly: We can only allow thread switches at known safe
> >   points.  It's not clear that this is a safe point.  (Actually, we kno=
w
> >   it currently isn't, because of the status_notify issue.  But I suspec=
t
> >   there may be other issues which are harder to analyze)
>
> This reason will be removed once we fix status_notify.
>
> > - Adding more thread switches is not necessarily a good thing.  It can
> >   degrade whole-system performance, because context switching is slow.
> >
> > - wait_reading_process_output already does a thread switch around the
> >   pselect, so it's not important for one to happen here.
>
> You basically give the same single reason in several different
> wordings.
>
> I see your point, but my experience with threads in Emacs is the
> opposite: there seem to be too few thread switch opportunities, so
> many programs don't work as expected unless one sprinkles sleep-for in
> various places.  For example, try to run something in a non-main
> thread and then attempt to interact with Emacs in the main thread.
>
> IOW, it is quite easy for a Lisp program to hog the global lock and
> starve the other threads, so each additional thread-switch opportunity
> is IME a blessing, not a curse.
>

Okay, so then how about we delete this unsafe thread_select and put some
thread-yield calls around wait_reading_process_output?

>

--00000000000043dbae063d82e2d0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote gmail_quote_contai=
ner"><div dir=3D"ltr" class=3D"gmail_attr">On Fri, Aug 29, 2025, 11:41=E2=
=80=AFAM Eli Zaretskii &lt;<a href=3D"mailto:eliz@gnu.org">eliz@gnu.org</a>=
&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 =
0 .8ex;border-left:1px #ccc solid;padding-left:1ex">&gt; From: Spencer Baug=
h &lt;<a href=3D"mailto:sbaugh@janestreet.com" target=3D"_blank" rel=3D"nor=
eferrer">sbaugh@janestreet.com</a>&gt;<br>
&gt; Cc: <a href=3D"mailto:79334@debbugs.gnu.org" target=3D"_blank" rel=3D"=
noreferrer">79334@debbugs.gnu.org</a>,=C2=A0 <a href=3D"mailto:eggert@cs.uc=
la.edu" target=3D"_blank" rel=3D"noreferrer">eggert@cs.ucla.edu</a>,=C2=A0 =
<a href=3D"mailto:dmitry@gutov.dev" target=3D"_blank" rel=3D"noreferrer">dm=
itry@gutov.dev</a><br>
&gt; Date: Fri, 29 Aug 2025 10:49:34 -0400<br>
&gt; <br>
&gt; Eli Zaretskii &lt;<a href=3D"mailto:eliz@gnu.org" target=3D"_blank" re=
l=3D"noreferrer">eliz@gnu.org</a>&gt; writes:<br>
&gt; <br>
&gt; &gt;&gt; Primarily: Calling thread_select here complicates reasoning a=
bout the<br>
&gt; &gt;&gt; code.=C2=A0 We should not release the lock when not necessary=
, because it<br>
&gt; &gt;&gt; makes it harder to think about possible thread interleavings.=
=C2=A0 And it&#39;s<br>
&gt; &gt;&gt; not necessary here because this pselect call returns immediat=
ely.<br>
&gt; &gt;<br>
&gt; &gt; How do we define &quot;necessary&quot; in this context?<br>
&gt; <br>
&gt; It&#39;s necessary to release the lock around long pselect calls becau=
se<br>
&gt; otherwise other threads will not run while Emacs is waiting for input,=
<br>
&gt; which means they probably won&#39;t run at all.<br>
&gt; <br>
&gt; &gt;&gt; (The only reason the pselect call wouldn&#39;t return immedia=
tely is if we<br>
&gt; &gt;&gt; released the lock and then switched to another thread)<br>
&gt; &gt;<br>
&gt; &gt; You seem to assume that a call to thread_select is only necessary=
 when<br>
&gt; &gt; some potentially prolonged operation or wait is expected?=C2=A0 B=
ut I&#39;m not<br>
&gt; &gt; sure this is correct, because that could prevent other threads fr=
om<br>
&gt; &gt; running for too long, and force programmers to inject sleep-for<b=
r>
&gt; &gt; etc. in their programs.=C2=A0 By allowing thread switches as freq=
uently as<br>
&gt; &gt; possible lets other threads more opportunities to run (to some ex=
tent<br>
&gt; &gt; simulating preemptive scheduling), which I think is a Good Thing<=
br>
&gt; &gt; overall, no?<br>
&gt; <br>
&gt; Three points:<br>
&gt; <br>
&gt; - Most importantly: We can only allow thread switches at known safe<br=
>
&gt;=C2=A0 =C2=A0points.=C2=A0 It&#39;s not clear that this is a safe point=
.=C2=A0 (Actually, we know<br>
&gt;=C2=A0 =C2=A0it currently isn&#39;t, because of the status_notify issue=
.=C2=A0 But I suspect<br>
&gt;=C2=A0 =C2=A0there may be other issues which are harder to analyze)<br>
<br>
This reason will be removed once we fix status_notify.<br>
<br>
&gt; - Adding more thread switches is not necessarily a good thing.=C2=A0 I=
t can<br>
&gt;=C2=A0 =C2=A0degrade whole-system performance, because context switchin=
g is slow.<br>
&gt; <br>
&gt; - wait_reading_process_output already does a thread switch around the<=
br>
&gt;=C2=A0 =C2=A0pselect, so it&#39;s not important for one to happen here.=
<br>
<br>
You basically give the same single reason in several different<br>
wordings.<br>
<br>
I see your point, but my experience with threads in Emacs is the<br>
opposite: there seem to be too few thread switch opportunities, so<br>
many programs don&#39;t work as expected unless one sprinkles sleep-for in<=
br>
various places.=C2=A0 For example, try to run something in a non-main<br>
thread and then attempt to interact with Emacs in the main thread.<br>
<br>
IOW, it is quite easy for a Lisp program to hog the global lock and<br>
starve the other threads, so each additional thread-switch opportunity<br>
is IME a blessing, not a curse.<br></blockquote></div></div><div dir=3D"aut=
o"><br></div><div dir=3D"auto">Okay, so then how about we delete this unsaf=
e thread_select and put some thread-yield calls around wait_reading_process=
_output?</div><div dir=3D"auto"><div class=3D"gmail_quote gmail_quote_conta=
iner"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-l=
eft:1px #ccc solid;padding-left:1ex">
</blockquote></div></div></div>

--00000000000043dbae063d82e2d0--


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 11:57:57 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 15:57:57 +0000
Received: from localhost ([127.0.0.1]:42949 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us1UC-0004xJ-KN
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 11:57:57 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:49534)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1us1U9-0004wG-R2
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 11:57:54 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1us1U3-00067b-3f; Fri, 29 Aug 2025 11:57:47 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=hAvlzZKbhrFJaTcWfOhliEnhTcPlkM3JMVMJn+TMCQw=; b=mO+8W4S7kOBI
 6CsXgPXsD/ExRCz6g8TCZV5IPLe2TauKfAfcFzydn8IIWPFSpZb8UE3wBcDeiX2KAD2t5sqtbHWyf
 W6VjlyguEPrpNhWLrcmRUd/HDAvCSc2M0V4U6gpovI46vtbjs39cIstNVdv1GUYXNyDe0UTaWMrlo
 tDDaKwPK+bvirV3tGbC3L0S6iTS7op0Kr301jrjlqNzDERmZ4XTIy1DcA72J7xylYDG8xM1tV0VRv
 eOyQRgLL25kiqlQsQ9RhQPn3KbS1fc27vOPYBZ0wDJ+SxGnBrnavj86M7k9QUx1FM61QyEnlHKjwG
 J3+kmVD40s/7V+6anVQU2w==;
Date: Fri, 29 Aug 2025 18:57:33 +0300
Message-Id: <86349aruhe.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <CAO=BR8PCwypg+7TMbbQueQh82drqQTeYNU0LGqVQ=oDv58v5NQ@mail.gmail.com>
 (message from Spencer Baugh on Fri, 29 Aug 2025 11:43:21 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com>
 <867bymrzvr.fsf@gnu.org> <ierfrda6v41.fsf@janestreet.com>
 <865xe6rv8z.fsf@gnu.org>
 <CAO=BR8PCwypg+7TMbbQueQh82drqQTeYNU0LGqVQ=oDv58v5NQ@mail.gmail.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Fri, 29 Aug 2025 11:43:21 -0400
> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <dmitry@gutov.dev>
> 
> Okay, so then how about we delete this unsafe thread_select and put some thread-yield calls around
> wait_reading_process_output?

I'm not objected to not calling thread_select there, but can we have
tests that will show us the gains?

Also, I think we should fix the problems with status_notify and other
users of FOR_EACH_PROCESS before we make the change with
thread_select, because current code in status_notify is definitely
wrong, and affects the related behavior.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 12:11:23 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 16:11:24 +0000
Received: from localhost ([127.0.0.1]:42987 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us1hD-0006yr-0J
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 12:11:23 -0400
Received: from mxout1.mail.janestreet.com ([38.105.200.78]:49049)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us1h9-0006xb-S5
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 12:11:21 -0400
Received: from mail-lf1-f70.google.com ([209.85.167.70])
 by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128)
 (Exim 4.98.2) id 1us1h4-000000021Z0-2Ff0 for 79334@debbugs.gnu.org;
 Fri, 29 Aug 2025 12:11:14 -0400
Received: by mail-lf1-f70.google.com with SMTP id
 2adb3069b0e04-55f41cbbb89so1219956e87.3
 for <79334@debbugs.gnu.org>; Fri, 29 Aug 2025 09:11:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=janestreet.com; s=google; t=1756483873; x=1757088673; darn=debbugs.gnu.org; 
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=;
 b=IfcwY7FJDWXMoNbEPl6wjXhOR+cYr/AiVpvbh84/+4KzICFbvmXvNfAPlsxhepMX/W
 twLSVPalPOW90X1nxRdoRkOdYZwZpqhb/8ZszWv0BkZSWhHocXyKmjD1tbfJZBK0095t
 NqNZd3mfijcnAP4rS+gEQD6FUwmqoR2JCq0Yc=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756483874;
 bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=;
 h=References:In-Reply-To:From:Date:Subject:To:Cc;
 b=XilS+77jhf0lQTP8tVj+Wwl1tm1nn3qiRecuGZi9eTfOQgDlweFoamGV/3Duj6aVS
 XX3RtWm9JcFxuOB3EduE5i/4SIvh6OOZc//Kuf3dXEemesMu8tZVK9OSGQzeU3UMT6
 mzOnxqHtzJuiveA8arFBPSWoMD+Q4VYK2ywuoxp0bTmN6O+pWePWZE7SyBcrprxE7M
 5Y8TjbZ2ebh4CEuY5lDr6Ah3t5iIXRz2DaeoSRZxGQJrTVhmXAg+DhnunRNyek63Mj
 tWU4WOEa2URi7vnPx0j/d/ZUuv+JeerVkVwDXS0v4Q3CfzCYRrosq4q4o21+7aIY4W
 2FCiAGyHrW0wA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1756483873; x=1757088673;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=rbb9gJC/mW9AXUi1WMgjOjVqpA/9OaqtV8CQbxbkb3A=;
 b=uODLg178U8C1oexbDL1u6ujkfjq2xbkfuhPoQJQOWTZLs/SdkNeCwQJ4LaZkTHJ1s3
 ATGmd0QAMoN8CVknIC97Tc1fbqYrO6ng7G+TbR0iHokUOE7GscW0u07zXtlWJd6qk58L
 eAaN4i+1l9hvE2lc7XQz+uanx5OVVS66IkvWYiV15TCWSs07uNefk3tiK6qr+xZQQEWv
 5hLSqqlDI2ccCbnbCK8LJsHFQ6HQqKA0uaKZxRA0BbW4CT545RoH8PeSFO12mR2R+Hw2
 lDHY5wNfJMcw5awlr8yOEfobqj2u7CfsurWJPtZcAhU9xRj6dVeajbrXqN9PKn54+2rm
 GaSg==
X-Gm-Message-State: AOJu0Yx77Z7pCWaurDM9IP4LkJtEI+QWc64jFdpW/Vt2XVTMRz/F72nE
 2HWAyWwc734/vEiTx0+xLhESbGqXe1hkK4Bl2tYm5OocmNBuHstMxyrbk4+GiP+FaM22ai2E0fM
 tKMQrPuXRV2lgmkp8iv+B9JBfhI2Qtku/AlnwWzMpFhglM79QWE111JcQMDBvAeXeEQQwUqT3L1
 Ypm2MjNT/oRnytfvX+tEVft2O3jOuW4w==
X-Gm-Gg: ASbGnctGClBsIJG9bAmTGEatLb0m9UlVZmFSMGHFjw1/n8tRDswSNOI2ApqoJw1pv2R
 8bV9kgQg2MvmEBMfrpxndsP4kJlKXYP47xigEEklyWeKuIkOvVmBSistSIEDAUFNOcHSKvaP1k/
 ev2Sue9ybbtgERGGQ6uUvG
X-Received: by 2002:a05:6512:3e09:b0:55f:4746:6202 with SMTP id
 2adb3069b0e04-55f47466584mr5235077e87.11.1756483872988; 
 Fri, 29 Aug 2025 09:11:12 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IGrhRSkqk/ar9SzOpgIyqc89bzt5v1+WVzT6yswIwUra87PecZsquU6TBama02ql/SfNmrrNWkT3yc/BuKRKNw=
X-Received: by 2002:a05:6512:3e09:b0:55f:4746:6202 with SMTP id
 2adb3069b0e04-55f47466584mr5235070e87.11.1756483872567; Fri, 29 Aug 2025
 09:11:12 -0700 (PDT)
MIME-Version: 1.0
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com>
 <867bymrzvr.fsf@gnu.org> <ierfrda6v41.fsf@janestreet.com>
 <865xe6rv8z.fsf@gnu.org>
 <CAO=BR8PCwypg+7TMbbQueQh82drqQTeYNU0LGqVQ=oDv58v5NQ@mail.gmail.com>
 <86349aruhe.fsf@gnu.org>
In-Reply-To: <86349aruhe.fsf@gnu.org>
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 29 Aug 2025 12:11:02 -0400
X-Gm-Features: Ac12FXwaXqkWr-aOstM-1dCSxGLjByIF83ucZOF8ZBjfKxfctAJxVfqp7lW67PU
Message-ID: <CAO=BR8OFBKbTht7_uQsKOsfNOBz_EqTPBrrF9fizv8qb5K8KJw@mail.gmail.com>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
To: Eli Zaretskii <eliz@gnu.org>
Content-Type: multipart/alternative; boundary="000000000000478d8d063d8345d0"
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <dmitry@gutov.dev>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--000000000000478d8d063d8345d0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 29, 2025, 11:57=E2=80=AFAM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Spencer Baugh <sbaugh@janestreet.com>
> > Date: Fri, 29 Aug 2025 11:43:21 -0400
> > Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <
> dmitry@gutov.dev>
> >
> > Okay, so then how about we delete this unsafe thread_select and put som=
e
> thread-yield calls around
> > wait_reading_process_output?
>
> I'm not objected to not calling thread_select there, but can we have
> tests that will show us the gains?
>

The primary gain is that it becomes easier to reason about locking.  I'm
not sure how to write a test for that :)

Also, I think we should fix the problems with status_notify and other
> users of FOR_EACH_PROCESS before we make the change with
> thread_select, because current code in status_notify is definitely
> wrong, and affects the related behavior.
>

Yes, I'll work on a patch which does both.  (I'm not going to bother making
a change which doesn't rely on the change to remove the thread_select,
because this code is hard enough to reason about already)

>

--000000000000478d8d063d8345d0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote gmail_quote_contai=
ner"><div dir=3D"ltr" class=3D"gmail_attr">On Fri, Aug 29, 2025, 11:57=E2=
=80=AFAM Eli Zaretskii &lt;<a href=3D"mailto:eliz@gnu.org">eliz@gnu.org</a>=
&gt; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 =
0 .8ex;border-left:1px #ccc solid;padding-left:1ex">&gt; From: Spencer Baug=
h &lt;<a href=3D"mailto:sbaugh@janestreet.com" target=3D"_blank" rel=3D"nor=
eferrer">sbaugh@janestreet.com</a>&gt;<br>
&gt; Date: Fri, 29 Aug 2025 11:43:21 -0400<br>
&gt; Cc: <a href=3D"mailto:79334@debbugs.gnu.org" target=3D"_blank" rel=3D"=
noreferrer">79334@debbugs.gnu.org</a>, <a href=3D"mailto:eggert@cs.ucla.edu=
" target=3D"_blank" rel=3D"noreferrer">eggert@cs.ucla.edu</a>, Dmitry Gutov=
 &lt;<a href=3D"mailto:dmitry@gutov.dev" target=3D"_blank" rel=3D"noreferre=
r">dmitry@gutov.dev</a>&gt;<br>
&gt; <br>
&gt; Okay, so then how about we delete this unsafe thread_select and put so=
me thread-yield calls around<br>
&gt; wait_reading_process_output?<br>
<br>
I&#39;m not objected to not calling thread_select there, but can we have<br=
>
tests that will show us the gains?<br></blockquote></div></div><div dir=3D"=
auto"><br></div><div dir=3D"auto">The primary gain is that it becomes easie=
r to reason about locking.=C2=A0 I&#39;m not sure how to write a test for t=
hat :)</div><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gma=
il_quote gmail_quote_container"><blockquote class=3D"gmail_quote" style=3D"=
margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Also, I think we should fix the problems with status_notify and other<br>
users of FOR_EACH_PROCESS before we make the change with<br>
thread_select, because current code in status_notify is definitely<br>
wrong, and affects the related behavior.<br></blockquote></div></div><div d=
ir=3D"auto"><br></div><div dir=3D"auto">Yes, I&#39;ll work on a patch which=
 does both.=C2=A0 (I&#39;m not going to bother making a change which doesn&=
#39;t rely on the change to remove the thread_select, because this code is =
hard enough to reason about already)</div><div dir=3D"auto"><div class=3D"g=
mail_quote gmail_quote_container"><blockquote class=3D"gmail_quote" style=
=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
</blockquote></div></div></div>

--000000000000478d8d063d8345d0--


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 13:32:06 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 17:32:06 +0000
Received: from localhost ([127.0.0.1]:43228 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us2xJ-0001j8-MA
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 13:32:06 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:52887)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us2xB-0001hA-Uo
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 13:32:02 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
In-Reply-To: <86a53is0zw.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Aug
 2025 16:36:51 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
Date: Fri, 29 Aug 2025 13:31:52 -0400
Message-ID: <ier8qj26nlj.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756488712;
 bh=9IUO6GfrnOPEvuCYo1nlgXF+A7ZmqGgLuMpkqCAo3nM=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=vE7fSDsWizdWfI94QcCdb8RDPOqYKXXQaQ/V1dkkf00Gy6rBxi6iEJEUbTvUAx93A
 HE2uu9YbFyBYKb3gEWFbGAUTcLd37/YKUMEkqH4s3esxBX8XnwcTWdnHa1NHLW/aEk
 RlMJjYnR44jypuoroJBF5l7frNcuAfJEs/m6tvO/CDNWjn/xM2CSyd42T8xMaf2jZY
 PIsQBfKl8JnHWJsmY04bYVTDoC3aAgumuf+TX2jm/lPf1lwFLpb4FWQLIvZ3XQlpWo
 8EHxiekZtOkXiBnid6yGiO0VJTIDt6UEw/CXS4+H4+8itpS4JH2xzfnH4h43ARDc1t
 97Sj7OaesfPsw==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Cc: Paul Eggert <eggert@cs.ucla.edu>,  79334@debbugs.gnu.org, Dmitry Gutov
>>   <dmitry@gutov.dev>
>> Date: Fri, 29 Aug 2025 09:07:50 -0400
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> > Which brings us to status_notify.  It indeed can close the descriptors
>> > of processes which terminated, so we should probably make it safer.
>> > For example, we could leave alone descriptors marked with non-NULL
>> > waiting_thread if that thread is still alive and is different from the
>> > current thread.  Would that fix the problem?  If yes, I think we
>> > should install such a change.
>> 
>> That was my initial idea for a fix as well.  But I'm not sure about the
>> specific details.  Namely, perhaps we need to be checking waiting_thread
>> everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
>> And maybe some new functions/macros should be added to simplify this;
>> maybe a version of FOR_EACH_PROCESS which only iterates over processes
>> for which waiting_thread == current_thread.
>
> You may be right, we should audit all the users of FOR_EACH_PROCESS.
> At least one of them (get-buffer-process) is probably okay as it is,
> but others might need to avoid looking at processes locked to other
> threads, indeed.

While doing this, I had another thought...

Do you know if we have any code which handles the fact that
delete-process might be called on a process while another thread is
calling pselect on its fds?

I don't think we do, which suggests that might also be problematic.
I'll work on a fix which encompasses that as well.


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 15:50:13 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 19:50:13 +0000
Received: from localhost ([127.0.0.1]:43404 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us56w-0005GM-3i
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 15:50:13 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:41087)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us56s-0005CF-3S
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 15:50:07 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
In-Reply-To: <ier8qj26nlj.fsf@janestreet.com> (Spencer Baugh's message of
 "Fri, 29 Aug 2025 13:31:52 -0400")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com>
Date: Fri, 29 Aug 2025 15:50:00 -0400
Message-ID: <ier5xe57vrr.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756497000;
 bh=atD57mlVj0ftflEqlmrlBDIpkjm+CYOS/wHD8+LsTVs=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=J8xwHa4+rzs2O/EReG8Ki2LzkjFCVjKVo+0nOOZkcgPk78tP7wxaZlSVgit5S0d8k
 TraIACQIKoCa5xPTsIRCO6/53L8HaTU9r5vfOu6ORiItWwcJ/ZOm4n7iynW2USyaM1
 Ppc/rGE7fWV4AGfCSGxPHOYitc0sg/8GHwiQj644yj4YOPxEa5ym55xodDU2hW9z/J
 m5/bqQFVbzz8llGaLzP7cDxKe31BLQ48Rjq8YqQpzBuPkY9o+FW8vLjYgeuw3Fhc98
 tyDaGjy0LnSCjeJ2ziYF0PbtR9DEHHmQDYnRflkenpZqLE8L2ABJB/JVQzDS33dpD/
 b+kHdp0CO9nPA==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain

Spencer Baugh <sbaugh@janestreet.com> writes:
> While doing this, I had another thought...
>
> Do you know if we have any code which handles the fact that
> delete-process might be called on a process while another thread is
> calling pselect on its fds?
>
> I don't think we do, which suggests that might also be problematic.
> I'll work on a fix which encompasses that as well.

OK, I think this patch fixes the issue.

Many different pieces of code in Emacs can close file descriptors.  We
need to be generically careful to not close file descriptors which some
other thread is currently waiting on.  In my patch, we do this by
letting the waiting_thread itself close those file descriptors, after it
returns from select.

Unfortunately I don't have a reliable test which produces the failure
case, but in this patch I added a fprintf when we close a deferred fd.
This prints whenever some thread would have closed an fd that another
thread was waiting on.  If I run the following program (with emacs -Q
--batch):

(defun my-break-thread ()
  (let ((proc (make-process
               :name "foo"
               :command '("sleep" "1"))))
    (while (process-live-p proc)
      (accept-process-output proc 0.05))))
(while t
  (make-thread #'my-break-thread "thread1")
  (thread-join (make-thread #'my-break-thread "thread2")))

I get about one print every 10 seconds.  Each print is a problem case
which has been averted, so I do think this patch fixes the bug.

(Obviously, we should remove the fprintf before applying the change)


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-Defer-closing-file-descriptors-used-by-other-threads.patch

>From 4dda5e4fff3067cac53d6273ed3bd327437d00c8 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 29 Aug 2025 15:39:00 -0400
Subject: [PATCH] Defer closing file descriptors used by other threads

* src/process.c (other_thread_is_waiting_for): Add.
(clear_fd_callback_data): Don't clear WAITING_FOR_CLOSE_FD.
(close_process_fd): Set WAITING_FOR_CLOSE_FD for fds with a
waiting_thread.
(wait_reading_process_output): Close fds with
WAITING_FOR_CLOSE_FD. (bug#79334)
---
 src/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/src/process.c b/src/process.c
index b33bebb9a35..8273ba78095 100644
--- a/src/process.c
+++ b/src/process.c
@@ -448,7 +448,9 @@ make_lisp_proc (struct Lisp_Process *p)
   /* This descriptor refers to a process.  */
   PROCESS_FD = 8,
   /* A non-blocking connect.  Only valid if FOR_WRITE is set.  */
-  NON_BLOCKING_CONNECT_FD = 16
+  NON_BLOCKING_CONNECT_FD = 16,
+  /* This fd is waiting to be closed. */
+  WAITING_FOR_CLOSE_FD = 32,
 };
 
 static struct fd_callback_data
@@ -466,14 +468,30 @@ make_lisp_proc (struct Lisp_Process *p)
   struct thread_state *waiting_thread;
 } fd_callback_info[FD_SETSIZE];
 
+static bool
+other_thread_is_waiting_for (struct fd_callback_data* elem)
+{
+  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
+}
+
 static void
 clear_fd_callback_data(struct fd_callback_data* elem)
 {
   elem->func = NULL;
   elem->data = NULL;
-  elem->flags = 0;
   elem->thread = NULL;
-  elem->waiting_thread = NULL;
+  if (other_thread_is_waiting_for (elem))
+    {
+      /* Preserve WAITING_FOR_CLOSE_FD; see process_close_fd.  Leaving
+	 flags non-zero means recompute_max_desc won't reduce max_desc
+	 past this element. */
+      elem->flags &= WAITING_FOR_CLOSE_FD;
+    }
+  else
+    {
+      elem->flags = 0;
+      elem->waiting_thread = NULL;
+    }
 }
 
 
@@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
   if (0 <= fd)
     {
       *fd_addr = -1;
-      emacs_close (fd);
+      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
+	/* Let waiting_thread handle closing the fd. */
+	fd_callback_info[fd].flags = WAITING_FOR_CLOSE_FD;
+      else
+	emacs_close (fd);
     }
 }
 
@@ -5816,6 +5838,22 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
       /* Make C-g and alarm signals set flags again.  */
       clear_waiting_for_input ();
 
+      /* Close fds which other threads wanted to close.  */
+      for (int fd = 0; fd <= max_desc; ++fd)
+	{
+	  if (fd_callback_info[fd].waiting_thread == current_thread
+	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
+	    {
+	      fprintf (stderr, "closing deferred fd %d\n", fd);
+	      emacs_close (fd);
+	      /* Ignore any events which happened on this fd. */
+	      FD_CLR (fd, &Available);
+	      FD_CLR (fd, &Writeok);
+	      fd_callback_info[fd].flags = 0;
+	    }
+	}
+      recompute_max_desc ();
+
       /*  If we woke up due to SIGWINCH, actually change size now.  */
       do_pending_window_change (0);
 
-- 
2.43.7


--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 29 16:25:49 2025
Received: (at 79334) by debbugs.gnu.org; 29 Aug 2025 20:25:49 +0000
Received: from localhost ([127.0.0.1]:43447 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1us5fQ-00029S-CD
	for submit@debbugs.gnu.org; Fri, 29 Aug 2025 16:25:49 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:52051)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1us5fM-00028M-RH
 for 79334@debbugs.gnu.org; Fri, 29 Aug 2025 16:25:46 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
In-Reply-To: <ier5xe57vrr.fsf@janestreet.com> (Spencer Baugh's message of
 "Fri, 29 Aug 2025 15:50:00 -0400")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
Date: Fri, 29 Aug 2025 16:25:39 -0400
Message-ID: <ier34997u4c.fsf_-_@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756499139;
 bh=phY/bhhTrwbAP0bEN7nHJIJsOHPuoxkPKqn5Fz+RGJk=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=He+3LH3Y86QjxzJUJNDXkvoR4yQviSTLniSzpanDyeN55+cFXfZ0luTJGz5VEiM6M
 ZZwPdWhyP72jOT4SxO9ablJ2NenDTTyb3DXOFAHQtyhn1rM2xoXEZU2Rl+LFeYNNfI
 ZXZxBqnEG41H9ntVVI7W0m3UFDDAYYmDn9JZhGBlyWuknq59UGqJm2XKnWXGpVLN9H
 eWbGHyuUI5NXVAdLsZNLfN0J8O8GwIOy08dDE9synpYMPob6ZHLU0y6xFflKpgYFtx
 uUW40U0Y5kK6MOK1udCuV29AMqHRZwpeekAojziNge4NkJJbly8s6EXB4ublqA2is3
 VqyahkqRoPSqQ==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain

Spencer Baugh <sbaugh@janestreet.com> writes:
> Spencer Baugh <sbaugh@janestreet.com> writes:
>> While doing this, I had another thought...
>>
>> Do you know if we have any code which handles the fact that
>> delete-process might be called on a process while another thread is
>> calling pselect on its fds?
>>
>> I don't think we do, which suggests that might also be problematic.
>> I'll work on a fix which encompasses that as well.
>
> OK, I think this patch fixes the issue.
>
> Many different pieces of code in Emacs can close file descriptors.  We
> need to be generically careful to not close file descriptors which some
> other thread is currently waiting on.  In my patch, we do this by
> letting the waiting_thread itself close those file descriptors, after it
> returns from select.
>
> Unfortunately I don't have a reliable test which produces the failure
> case, but in this patch I added a fprintf when we close a deferred fd.
> This prints whenever some thread would have closed an fd that another
> thread was waiting on.  If I run the following program (with emacs -Q
> --batch):
>
> (defun my-break-thread ()
>   (let ((proc (make-process
>                :name "foo"
>                :command '("sleep" "1"))))
>     (while (process-live-p proc)
>       (accept-process-output proc 0.05))))
> (while t
>   (make-thread #'my-break-thread "thread1")
>   (thread-join (make-thread #'my-break-thread "thread2")))
>
> I get about one print every 10 seconds.  Each print is a problem case
> which has been averted, so I do think this patch fixes the bug.
>
> (Obviously, we should remove the fprintf before applying the change)

Of course, I immediately found a bug in my patch (I wasn't clearing
waiting_thread after closing a deferred fd).  Fixed in the attached.

In related news: running process-tests.el in a loop also triggers my
"Closing deferred fd" message.  Which suggests that src/process-tests.el
was probably also suffering from bugs in this area, which could have
been causing hangs or flaky failures.  Which should now hopefully be
fixed.


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-Defer-closing-file-descriptors-used-by-other-threads.patch

>From 9e46aa50f5de7d719a6e427fe44bb6b8b8e226fc Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Fri, 29 Aug 2025 15:39:00 -0400
Subject: [PATCH] Defer closing file descriptors used by other threads

* src/process.c (other_thread_is_waiting_for): Add.
(clear_fd_callback_data): Don't clear WAITING_FOR_CLOSE_FD.
(close_process_fd): Set WAITING_FOR_CLOSE_FD for fds with a
waiting_thread.
(wait_reading_process_output): Close fds with
WAITING_FOR_CLOSE_FD. (bug#79334)
---
 src/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/src/process.c b/src/process.c
index b33bebb9a35..9180b92c61f 100644
--- a/src/process.c
+++ b/src/process.c
@@ -448,7 +448,9 @@ make_lisp_proc (struct Lisp_Process *p)
   /* This descriptor refers to a process.  */
   PROCESS_FD = 8,
   /* A non-blocking connect.  Only valid if FOR_WRITE is set.  */
-  NON_BLOCKING_CONNECT_FD = 16
+  NON_BLOCKING_CONNECT_FD = 16,
+  /* This fd is waiting to be closed. */
+  WAITING_FOR_CLOSE_FD = 32,
 };
 
 static struct fd_callback_data
@@ -466,14 +468,30 @@ make_lisp_proc (struct Lisp_Process *p)
   struct thread_state *waiting_thread;
 } fd_callback_info[FD_SETSIZE];
 
+static bool
+other_thread_is_waiting_for (struct fd_callback_data* elem)
+{
+  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
+}
+
 static void
 clear_fd_callback_data(struct fd_callback_data* elem)
 {
   elem->func = NULL;
   elem->data = NULL;
-  elem->flags = 0;
   elem->thread = NULL;
-  elem->waiting_thread = NULL;
+  if (other_thread_is_waiting_for (elem))
+    {
+      /* Preserve WAITING_FOR_CLOSE_FD; see process_close_fd.  Leaving
+	 flags non-zero means recompute_max_desc won't reduce max_desc
+	 past this element. */
+      elem->flags &= WAITING_FOR_CLOSE_FD;
+    }
+  else
+    {
+      elem->flags = 0;
+      elem->waiting_thread = NULL;
+    }
 }
 
 
@@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
   if (0 <= fd)
     {
       *fd_addr = -1;
-      emacs_close (fd);
+      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
+	/* Let waiting_thread handle closing the fd. */
+	fd_callback_info[fd].flags = WAITING_FOR_CLOSE_FD;
+      else
+	emacs_close (fd);
     }
 }
 
@@ -5816,6 +5838,22 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
       /* Make C-g and alarm signals set flags again.  */
       clear_waiting_for_input ();
 
+      /* Close fds which other threads wanted to close.  */
+      for (int fd = 0; fd <= max_desc; ++fd)
+	{
+	  if (fd_callback_info[fd].waiting_thread == current_thread
+	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
+	    {
+	      fprintf (stderr, "closing deferred fd %d\n", fd);
+	      emacs_close (fd);
+	      /* Ignore any events which happened on this fd. */
+	      FD_CLR (fd, &Available);
+	      FD_CLR (fd, &Writeok);
+	      clear_fd_callback_data (&fd_callback_info[fd]);
+	    }
+	}
+      recompute_max_desc ();
+
       /*  If we woke up due to SIGWINCH, actually change size now.  */
       do_pending_window_change (0);
 
-- 
2.43.7


--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 02:18:27 2025
Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 06:18:27 +0000
Received: from localhost ([127.0.0.1]:44269 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usEuv-0007ZO-F4
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 02:18:26 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:55540)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1usEus-0007YD-VV
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 02:18:24 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1usEul-0000T5-Vv; Sat, 30 Aug 2025 02:18:16 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=37HdJREzOX0Ff/ItCAWw+CoUcHE8596w5562m4Crwjc=; b=rRiUTqsbfbdG
 DNuiQG2SrdwiXrjLxp3wy/x0KhLeGYwN3lTZf0QWki/PaWdAi2xHOQjIuDghxLM5EEV20F0DwNt3S
 A3MEnt/qTQbw3Mn4t0ckSXGDKquEPdmMhcodCmhEiQxIp7Vrl73whdrurxT7WLvzzTgHVG6mWYfjS
 dF+Zi7JWK46YyhX1YOCmAYX2dn/vA/BbDHr7d+Fj2PHV6SMtCTITaag0SVG2yE3rtvD1JAKKilpLH
 Tq50GJCTfMotAcCFNyjsYSiQGTyaW0O6xyDadzI/JK1K1hVDcq56TSU9aSsHtKk7ShUj1/64YxUIw
 bCQnJBUKC1SOkeLTzPjrbg==;
Date: Sat, 30 Aug 2025 09:18:12 +0300
Message-Id: <861pots57f.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ier5xe57vrr.fsf@janestreet.com> (message from Spencer Baugh on
 Fri, 29 Aug 2025 15:50:00 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Fri, 29 Aug 2025 15:50:00 -0400
> 
> Many different pieces of code in Emacs can close file descriptors.  We
> need to be generically careful to not close file descriptors which some
> other thread is currently waiting on.  In my patch, we do this by
> letting the waiting_thread itself close those file descriptors, after it
> returns from select.

Not literally "when returning from pselect", I hope, but when the
process is being deleted _and_ the waiting thread returns from
pselect, right?  IOW, we don't close the descriptor each time pselecft
returns in the right thread, then reopen it before the next call to
pselect, because that won't work.

> +static bool
> +other_thread_is_waiting_for (struct fd_callback_data* elem)
> +{
> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
> +}

This should also check that the waiting thread is still alive.  If it
isn't, it's okay to close the descriptor.  Otherwise, descriptors
belonging to threads that exited might never be closed.

> @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
>    if (0 <= fd)
>      {
>        *fd_addr = -1;
> -      emacs_close (fd);
> +      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
> +	/* Let waiting_thread handle closing the fd. */
                                                   ^^
Style: 2 spaces between the period and "*/".

> +      /* Close fds which other threads wanted to close.  */
> +      for (int fd = 0; fd <= max_desc; ++fd)
> +	{
> +	  if (fd_callback_info[fd].waiting_thread == current_thread
> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
> +	    {
> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
> +	      emacs_close (fd);
> +	      /* Ignore any events which happened on this fd. */
> +	      FD_CLR (fd, &Available);
> +	      FD_CLR (fd, &Writeok);
> +	      fd_callback_info[fd].flags = 0;

Why do it here and not when we deactivate processes?  Until now we
never closed any descriptors inside wait_reading_process_output.  More
generally, why this design and not a simpler change which only
deactivates processes whose I/O is waited by the current thread, as
discussed previously?  I think this alternative would be easier to
understand and reason about, because (as established in another
discussion) wait_reading_process_output frequently loops more than we
expect it to.

Looking at all the callers of close_process_fd, it sounds like it's
too low-level to do this.  For example, it is called from
create_process (including in the forked child) to close the unneeded
ends of the pipe, where we probably don't want this.  And
clear_fd_callback_data is also used for the keyboard input descriptor,
which is probably also not relevant.  So I wonder whether it's clean
to have this logic at such a low level, where the purpose of the
descriptor is barely known, and not at a higher level, where we know
what each descriptor is used for, and therefore the purpose and the
intent of the logic is much more clear.


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 02:36:06 2025
Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 06:36:06 +0000
Received: from localhost ([127.0.0.1]:44312 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usFBx-0001uC-GY
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 02:36:06 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:45716)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1usFBt-0001sp-4A
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 02:35:59 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1usFBi-00031Y-1v; Sat, 30 Aug 2025 02:35:47 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=/P4xvteEzG7SA+MVTLWp3uJca9UwfYBherFneiGL2Xs=; b=oH6m3zEqU/Hj
 jf5TfcvCz42AIq+luViPRKDd453t8dG/Ce+ScJlvf3YUaiR+sHYqylmyeufeB99QJ/O75EdAY3Xe3
 zTLeUg2zDdrFxPuscrPMXgkDXHM/xZxjVIISVL0qgP1s58NMbNT6xmUchKRJvzfe/azQwFTOBU5bD
 XRI/T73eU3ySz0WoiU4DvD6aKoeLjfA+5eKgGHqrMrCoReBboCefcpxkKE9RUIqvkMGKW4ikG/Y0i
 zo19wl0k+4XKDyKYLZsYVfayuR7O1HyqclJq/lQX5nCn5XWH4WCErAh7/X/yKQJOtfUKk5Z1G0rja
 LUjY1+XTef/A4p36sgwjGg==;
Date: Sat, 30 Aug 2025 09:35:42 +0300
Message-Id: <86zfbhqptt.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ier34997u4c.fsf_-_@janestreet.com> (message from Spencer Baugh
 on Fri, 29 Aug 2025 16:25:39 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Fri, 29 Aug 2025 16:25:39 -0400
> 
> In related news: running process-tests.el in a loop also triggers my
> "Closing deferred fd" message.  Which suggests that src/process-tests.el
> was probably also suffering from bugs in this area, which could have
> been causing hangs or flaky failures.  Which should now hopefully be
> fixed.

process-tests uses questionable setup, apart of being based on
non-portable assumptions.  At the time, I was unable to convince the
author of the tests to spell out the assumptions and expectations from
what should happen in those situations, without which it was
impossible to decide whether some failures on MS-Windows are bugs or
just non-portable code that cannot work on Windows.  Some of those
tests always fail for me on Windows.

> +      for (int fd = 0; fd <= max_desc; ++fd)
> +	{
> +	  if (fd_callback_info[fd].waiting_thread == current_thread
> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
                                             ^^^^^^^^^^^^^^^^^^^^^^^
Shouldn't this test that the WAITING_FOR_CLOSE_FD bit is set, not that
it is the _only_ bit set?  Even if the code is correct today, it might
not be future-proof if we add some additional flags at some point.

> +	      /* Ignore any events which happened on this fd. */
> +	      FD_CLR (fd, &Available);
> +	      FD_CLR (fd, &Writeok);

Is this wise?  How can we be sure that these events were already
handled (presumably, in another thread)?  If they were not, we will
lose events.  What will happen if we don't ignore them here?


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 11:55:36 2025
Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 15:55:36 +0000
Received: from localhost ([127.0.0.1]:49100 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usNvT-0000cx-To
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 11:55:36 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:59135)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1usNvR-0000cj-7S
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 11:55:34 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
In-Reply-To: <86zfbhqptt.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 30 Aug
 2025 09:35:42 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
Date: Sat, 30 Aug 2025 11:55:27 -0400
Message-ID: <ierldn0olcg.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756569327;
 bh=2yHKk7CzlQRcNALXv3lUnlD9q8K1fQQENU6tF+gPOBM=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=P0AG8w3L+sbckKKhzOvJtoqacOkE6VTvYb2XE5/gUSq0I8tT0PYSHXW3grhb7w85U
 VdtXsTZ72i3IYxR2mJPDUgTdKXfamBEmEvrUW0DD/JotkNdB58MzND51cYaAi1zCmL
 9hBNXEB5qwdrWMmV+75Fi4atpaEOxUCPe3k4Ya9IJDD/2hCMzkJblOQne2gRUHm7ol
 tVCge504xv4fjGv/1grhnu9a7xnRAvBA8PDuIevLpMFwyEfsMOgCt1c85iToX4mQLf
 JtBqzMXV6XFP05JQkLEztgTkOHFjeU7Ri2eG2ZK3z4eXITx0zaj/O7BGtcyQdn4HI1
 6WfLZjmkDXL9g==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain


(I'm responding to both your emails with review feedback here)

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
>> Date: Fri, 29 Aug 2025 16:25:39 -0400
>> +      for (int fd = 0; fd <= max_desc; ++fd)
>> +	{
>> +	  if (fd_callback_info[fd].waiting_thread == current_thread
>> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
>                                              ^^^^^^^^^^^^^^^^^^^^^^^
> Shouldn't this test that the WAITING_FOR_CLOSE_FD bit is set, not that
> it is the _only_ bit set?  Even if the code is correct today, it might
> not be future-proof if we add some additional flags at some point.

True.  Fixed.

>> +	      /* Ignore any events which happened on this fd. */
>> +	      FD_CLR (fd, &Available);
>> +	      FD_CLR (fd, &Writeok);
>
> Is this wise?  How can we be sure that these events were already
> handled (presumably, in another thread)?  If they were not, we will
> lose events.  What will happen if we don't ignore them here?

Any events which might appear at this point were already events we were
going to lose by closing this file descriptor.  For example, if a
process produces output just as we do delete-process on it, we never
read that output.  The only change is that we now have to explicitly
ignore those events; before, we silently dropped them at the time we
closed the file descriptor.

>> Many different pieces of code in Emacs can close file descriptors.  We
>> need to be generically careful to not close file descriptors which some
>> other thread is currently waiting on.  In my patch, we do this by
>> letting the waiting_thread itself close those file descriptors, after it
>> returns from select.
>
> Not literally "when returning from pselect", I hope, but when the
> process is being deleted _and_ the waiting thread returns from
> pselect, right?  IOW, we don't close the descriptor each time pselecft
> returns in the right thread, then reopen it before the next call to
> pselect, because that won't work.

Right.

>> +static bool
>> +other_thread_is_waiting_for (struct fd_callback_data* elem)
>> +{
>> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
>> +}
>
> This should also check that the waiting thread is still alive.  If it
> isn't, it's okay to close the descriptor.  Otherwise, descriptors
> belonging to threads that exited might never be closed.

waiting_thread always points to a living thread, because it's only set
while a thread is inside wait_reading_process_output.

(In fact, when a thread sees waiting_thread set to a value which is
neither NULL nor current_thread, it can only ever point to a thread
which is currently inside thread_select)

>> @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
>>    if (0 <= fd)
>>      {
>>        *fd_addr = -1;
>> -      emacs_close (fd);
>> +      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
>> +	/* Let waiting_thread handle closing the fd. */
>                                                    ^^
> Style: 2 spaces between the period and "*/".

Fixed.

>> +      /* Close fds which other threads wanted to close.  */
>> +      for (int fd = 0; fd <= max_desc; ++fd)
>> +	{
>> +	  if (fd_callback_info[fd].waiting_thread == current_thread
>> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
>> +	    {
>> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
>> +	      emacs_close (fd);
>> +	      /* Ignore any events which happened on this fd. */
>> +	      FD_CLR (fd, &Available);
>> +	      FD_CLR (fd, &Writeok);
>> +	      fd_callback_info[fd].flags = 0;
>
> Why do it here and not when we deactivate processes?

Because that would close file descriptors that another thread is
currently selecting on, causing EBADF.

> Until now we never closed any descriptors inside
> wait_reading_process_output.

Just to be clear, we can call deactivate_process from
wait_reading_process_output, through status_notify, through process
filters, or other similar.  That closes descriptors.

> More generally, why this design and not a simpler change which only
> deactivates processes whose I/O is waited by the current thread, as
> discussed previously?

I looked into that, but I determined that it's too complicated.  It
would require us to keep track of a new kind of process state: a process
which has been deleted, but hasn't been deactivated yet.  This would
need to be supported for each kind of process, and we'd need to record
new kinds of state to finish the deactivation later.

It's simpler to keep the ability to delete any process at any time, and
have that fully deactivate the process as it does today.  Doing that
only causes one issue: if another thread is selecting on the process's
fds, that thread will break if it tries to handle events on the fds, or
the thread will simply get EBADF from select.  So, we detect that by
checking waiting_thread, and marking the fds so that the other thread
knows not to touch them, and to instead just close them to finish
deleting the process.

> I think this alternative would be easier to understand and reason
> about, because (as established in another discussion)
> wait_reading_process_output frequently loops more than we expect it
> to.

I'm not sure what you mean by that, why does wait_reading_process_output
looping more than we expect cause issues?  I don't think it would.

> Looking at all the callers of close_process_fd, it sounds like it's
> too low-level to do this.  For example, it is called from
> create_process (including in the forked child) to close the unneeded
> ends of the pipe, where we probably don't want this.  And
> clear_fd_callback_data is also used for the keyboard input descriptor,
> which is probably also not relevant.

True.

I looked into fixing these, and that suggested to me an altogether
simpler approach.  We can keep the logic entirely contained in
wait_reading_process_output, just adding a bit of code around the select
call.

And we don't need to defer deleting the fds at all: just recognize when
it's happened, and be careful to not touch those fds afterwards.

Attached.


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0001-Ignore-descriptors-deleted-during-select.patch

>From b37e71f47913e1011c4a7857e166e78eddc2a184 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Sat, 30 Aug 2025 11:42:19 -0400
Subject: [PATCH] Ignore descriptors deleted during select

* src/process.c (wait_reading_process_output): Save the fds
we're waiting on and check that we're still waiting after
select. (bug#79334)
---
 src/process.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/process.c b/src/process.c
index 1612248a11d..11af2f55f63 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5643,6 +5643,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  nfds = read_kbd ? 0 : 1;
 	  no_avail = 1;
 	  FD_ZERO (&Available);
+	  xerrno = 0;
 	}
       else
 	{
@@ -5756,6 +5757,18 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	      && timespec_cmp (short_timeout, timeout) < 0)
 	    timeout = short_timeout;
 #endif
+      /* Save the fds we're currently waiting_thread for.  */
+      fd_set waiting_fds;
+      FD_ZERO (&waiting_fds);
+      int max_waiting_fds = 0;
+      for (int fd = 0; fd <= max_desc; ++fd)
+        {
+	  if (fd_callback_info[fd].waiting_thread == current_thread)
+	    {
+	      FD_SET (fd, &waiting_fds);
+	      max_waiting_fds = fd;
+	    }
+	}
 
 	  /* Android requires using a replacement for pselect in
 	     android.c to poll for events.  */
@@ -5809,10 +5822,30 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 		    FD_SET (channel, &Available);
 	    }
 #endif
+	  xerrno = errno;
+	  /* Check if any of waiting_fds was closed by another thread or
+	     a signal handler. */
+	  for (int fd = 0; fd <= max_waiting_fds; ++fd)
+	    {
+	      if (FD_ISSET (fd, &waiting_fds)
+		  && fd_callback_info[fd].waiting_thread != current_thread)
+		{
+		  /* Ignore any events which we happened to see on this
+		     fd before it was closed; we can't do anything for
+		     it now, since it's closed.  */
+		  FD_CLR (fd, &Available);
+		  FD_CLR (fd, &Writeok);
+		  /* If we got an EBADF and no events, this is probably
+		     why; clear it so we can try again.  */
+		  if (nfds < 0 && xerrno == EBADF)
+		    {
+		      nfds = 0;
+		      xerrno = 0;
+		    }
+		}
+	    }
 	}
 
-      xerrno = errno;
-
       /* Make C-g and alarm signals set flags again.  */
       clear_waiting_for_input ();
 
-- 
2.43.7


--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 12:36:17 2025
Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 16:36:18 +0000
Received: from localhost ([127.0.0.1]:49502 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usOYq-0003EB-T7
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 12:36:17 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:52644)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1usOYk-0003Do-CI
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 12:36:13 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1usOYb-0001nr-6T; Sat, 30 Aug 2025 12:36:01 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=raezIEp48dBi7BFRaXVbPHQaQPAaSVCnHRK66D8sQDQ=; b=rUlwx0ctE15Z
 2iusHfN27meVn8FZwbHrkdjQNfkRZJSg9I8ISUhaC/cu/VJyqAxPQsa/MKwf550Jzv6owE8RZopJJ
 BWnYBa6FI5nDcDnYzFIWNFcWChRRO6KZl42OmI0l6ODwLmW+ZB30fD7ACzekdPk3vJ9SvmQJZBFx9
 62sUeElm29Nm3M+7vDYE4S4URdQCtGHOJRs1df/JT4ZAL74sjnBzEZPLD9yF2+VxRFneCoVC41mtW
 M4ivDbEpopFnm2H2no78I46/0Ta4wJAtz1vGgCEnvcxe4parbbLZfNw87tqcFpXWQX0a6VViQMbxy
 7CZLvWwANgOm21ydcL4w4A==;
Date: Sat, 30 Aug 2025 19:35:39 +0300
Message-Id: <86jz2kpy1w.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ierldn0olcg.fsf@janestreet.com> (message from Spencer Baugh on
 Sat, 30 Aug 2025 11:55:27 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Sat, 30 Aug 2025 11:55:27 -0400
> 
> >> +	      /* Ignore any events which happened on this fd. */
> >> +	      FD_CLR (fd, &Available);
> >> +	      FD_CLR (fd, &Writeok);
> >
> > Is this wise?  How can we be sure that these events were already
> > handled (presumably, in another thread)?  If they were not, we will
> > lose events.  What will happen if we don't ignore them here?
> 
> Any events which might appear at this point were already events we were
> going to lose by closing this file descriptor.  For example, if a
> process produces output just as we do delete-process on it, we never
> read that output.  The only change is that we now have to explicitly
> ignore those events; before, we silently dropped them at the time we
> closed the file descriptor.

Where did we close it?  There's this loop _after_ the place where you
add your loop which clears the bits in Available:

      for (channel = 0; channel <= max_desc; ++channel)
        {
          struct fd_callback_data *d = &fd_callback_info[channel];
          if (d->func
	      && ((d->flags & FOR_READ
		   && FD_ISSET (channel, &Available))
		  || ((d->flags & FOR_WRITE)
		      && FD_ISSET (channel, &Writeok))))
            d->func (channel, d->data);
	}

AFAIU, it would notice the set bits and act upon them.  Now with this
new addition we are clearing them ahead of that loop.  That doesn't
necessarily sound right to me.

> >> +static bool
> >> +other_thread_is_waiting_for (struct fd_callback_data* elem)
> >> +{
> >> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
> >> +}
> >
> > This should also check that the waiting thread is still alive.  If it
> > isn't, it's okay to close the descriptor.  Otherwise, descriptors
> > belonging to threads that exited might never be closed.
> 
> waiting_thread always points to a living thread, because it's only set
> while a thread is inside wait_reading_process_output.
> 
> (In fact, when a thread sees waiting_thread set to a value which is
> neither NULL nor current_thread, it can only ever point to a thread
> which is currently inside thread_select)

You assume that the waiting_thread field will be cleared when a thread
dies, but that is not guaranteed.  Why isn't it better to be defensive
and make sure the thread is still alive?  If you are right, that test
will always be true, but what if you are wrong in some rare cases?

> >> +	  if (fd_callback_info[fd].waiting_thread == current_thread
> >> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
> >> +	    {
> >> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
> >> +	      emacs_close (fd);
> >> +	      /* Ignore any events which happened on this fd. */
> >> +	      FD_CLR (fd, &Available);
> >> +	      FD_CLR (fd, &Writeok);
> >> +	      fd_callback_info[fd].flags = 0;
> >
> > Why do it here and not when we deactivate processes?
> 
> Because that would close file descriptors that another thread is
> currently selecting on, causing EBADF.

No, I mean to deactivate only the processes whose I/O channels are
locked to the current thread.

> > Until now we never closed any descriptors inside
> > wait_reading_process_output.
> 
> Just to be clear, we can call deactivate_process from
> wait_reading_process_output, through status_notify, through process
> filters, or other similar.  That closes descriptors.

Yes, and so I suggest doing this logic of deactivating only the
processes that are locked to the current thread there.

> > More generally, why this design and not a simpler change which only
> > deactivates processes whose I/O is waited by the current thread, as
> > discussed previously?
> 
> I looked into that, but I determined that it's too complicated.  It
> would require us to keep track of a new kind of process state: a process
> which has been deleted, but hasn't been deactivated yet.

What I had in mind was neither delete it nor deactivate it.  IOW, the
loop which goes through the list of processes should only pay
attention to processes locked to this thread and those that are not
locked to any live thread.  It should leave the other processes to
when their threads call status_notify.

> > Looking at all the callers of close_process_fd, it sounds like it's
> > too low-level to do this.  For example, it is called from
> > create_process (including in the forked child) to close the unneeded
> > ends of the pipe, where we probably don't want this.  And
> > clear_fd_callback_data is also used for the keyboard input descriptor,
> > which is probably also not relevant.
> 
> True.
> 
> I looked into fixing these, and that suggested to me an altogether
> simpler approach.  We can keep the logic entirely contained in
> wait_reading_process_output, just adding a bit of code around the select
> call.
> 
> And we don't need to defer deleting the fds at all: just recognize when
> it's happened, and be careful to not touch those fds afterwards.

Hmm... I don't understand: why do we need to do this?  The calls to
compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
set bits in the Available mask passed top pselect already avoid
setting bits for descriptors on which some other thread waits.  So
pselect cannot possibly indicate any output ready on any descriptors
belonging to other threads, and this loop you suggest to add:

> +      /* Save the fds we're currently waiting_thread for.  */
> +      fd_set waiting_fds;
> +      FD_ZERO (&waiting_fds);
> +      int max_waiting_fds = 0;
> +      for (int fd = 0; fd <= max_desc; ++fd)
> +        {
> +	  if (fd_callback_info[fd].waiting_thread == current_thread)
> +	    {
> +	      FD_SET (fd, &waiting_fds);
> +	      max_waiting_fds = fd;
> +	    }
> +	}

should be unnecessary.  Or what am I missing?


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 14:39:05 2025
Received: (at 79334) by debbugs.gnu.org; 30 Aug 2025 18:39:05 +0000
Received: from localhost ([127.0.0.1]:49777 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usQTg-0000rM-HA
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 14:39:04 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:56948)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1usQTe-0000qs-5m
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 14:39:02 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1usQTX-0000q6-Ks; Sat, 30 Aug 2025 14:38:55 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=8LEoYm1gbNW2XxsqKaotoKk4KLK8AoTVTw7bWqdpKeI=; b=D7IEz5XMQNSF
 b+BSITnsDKLNYKblAPBqkvKWdsgGGPC84kbTgOyhunHlP1VLUW/hhQtoeuztQGUJfiwIDeR5Vl+v4
 q2rsS3RfFZpxQT4ZDsZnMLEPjgVOEcTIw46QnohG5bTI/hHHDVldZKfuFBLlAGNkqwVDYXlEPVFWV
 I9HTaqd1p18pSjHnkBpYjYeRXeso8ZLHO0Jwh0X6LdAXo45uf+ahCvJaSEWOlf/PmH7ZrZLUJz58m
 VuKtstb7J/jhDRhLIwZk5x7V+dKYVg7HF2W8FTsV8k48yA+L9n6hDRXt9hFTn7nMNwDZY/E//IOgi
 a/4vxTrHYgN0dW1nDX+wYg==;
Date: Sat, 30 Aug 2025 21:38:52 +0300
Message-Id: <86a53gpscj.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: sbaugh@janestreet.com
In-Reply-To: <86jz2kpy1w.fsf@gnu.org> (message from Eli Zaretskii on Sat, 30
 Aug 2025 19:35:39 +0300)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily, 
 [PATCH] Defer closing file descriptors used by other threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
> Date: Sat, 30 Aug 2025 19:35:39 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > >> +	  if (fd_callback_info[fd].waiting_thread == current_thread
> > >> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
> > >> +	    {
> > >> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
> > >> +	      emacs_close (fd);
> > >> +	      /* Ignore any events which happened on this fd. */
> > >> +	      FD_CLR (fd, &Available);
> > >> +	      FD_CLR (fd, &Writeok);
> > >> +	      fd_callback_info[fd].flags = 0;
> > >
> > > Why do it here and not when we deactivate processes?
> > 
> > Because that would close file descriptors that another thread is
> > currently selecting on, causing EBADF.
> 
> No, I mean to deactivate only the processes whose I/O channels are
> locked to the current thread.
> 
> > > Until now we never closed any descriptors inside
> > > wait_reading_process_output.
> > 
> > Just to be clear, we can call deactivate_process from
> > wait_reading_process_output, through status_notify, through process
> > filters, or other similar.  That closes descriptors.
> 
> Yes, and so I suggest doing this logic of deactivating only the
> processes that are locked to the current thread there.

Btw, as a nice bonus of that approach, process sentinel will run in
the context of the thread to which the process is locked.


From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 30 23:20:39 2025
Received: (at 79334) by debbugs.gnu.org; 31 Aug 2025 03:20:39 +0000
Received: from localhost ([127.0.0.1]:51030 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1usYcR-0003dk-BX
	for submit@debbugs.gnu.org; Sat, 30 Aug 2025 23:20:39 -0400
Received: from fout-b4-smtp.messagingengine.com ([202.12.124.147]:59649)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <dmitry@gutov.dev>) id 1usYcN-0003dW-P7
 for 79334@debbugs.gnu.org; Sat, 30 Aug 2025 23:20:37 -0400
Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46])
 by mailfout.stl.internal (Postfix) with ESMTP id 0AFD41D00079;
 Sat, 30 Aug 2025 23:20:30 -0400 (EDT)
Received: from phl-mailfrontend-02 ([10.202.2.163])
 by phl-compute-06.internal (MEProxy); Sat, 30 Aug 2025 23:20:30 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc
 :cc:content-transfer-encoding:content-type:content-type:date
 :date:from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to; s=fm2; t=1756610429;
 x=1756696829; bh=kBqdDm/J9GTebJbtyuW2zpOIxB1rhdME0Tm8bp3Ab7I=; b=
 fOPGnDhG0mGXGFYEBlMFdy8CpNsCzgDZzndt6JI5CKk+TdK3Pt3MqGgOsChq56An
 qjc/44gE8+2oRItg4kJyyrht/6GF1yMIUxQ0SPkrXSw6QUtV3OfXaABaf8rALsrl
 U1rcy3VyTdnRP1FrEFdDBQgDrsPCzLW3qvCQX1PBmi76rIjSaC5Ai+mDYIyXUuYK
 8prpcK5IeiGOiA03Gwl6FG5i6xHr/vNbIY9MQhf0Vwjp8AcvT2SFE8uf88T7XHLj
 Hs+s3BSOIdH6/JpzhsGzIVFxt3oO2aBw6phKV39mzyJZwPeFCiXDAJkABqDplJ6s
 SdnrxETMTl6CMzBE5PtlMw==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:cc:content-transfer-encoding
 :content-type:content-type:date:date:feedback-id:feedback-id
 :from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to:x-me-proxy
 :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1756610429; x=
 1756696829; bh=kBqdDm/J9GTebJbtyuW2zpOIxB1rhdME0Tm8bp3Ab7I=; b=D
 K7+vxNeR0ntkaLpqbYvcLK93lYK7VhaHi8FxuBg0EfnRKcayUcBSWUUkmDHJY7Sm
 5I4VMun8tK8eDWgSzw7kXCm4GEZ9NcTMP0nps4i0+0ZJUy+YZUSQNFyG9ZqDDwGM
 APzu5eqjDmj2gVX04PkVjVrSVdyyJbKdqwtf37UarBOEuxIyS5Jk1OG+hLh4f/2y
 geKq9lRPLh8hEqSzpunA/kKIvAM37VisqG5kXKkfLTJhb9Mv5dcHJj/sjktGBeTR
 DE05TtgS5p5/aSmMdkPY8IoFiVPV60zb9Z1m3wwCzBcURduyzViZMG46gM4tiCt/
 0wwymAt7G9tO6IBufSksg==
X-ME-Sender: <xms:fb-zaJJbRM1arfSVIYRbEaUAur_y4aDQJlCOXH93lT8_IMQAkz9vIQ>
 <xme:fb-zaOleOYQDvsTPClHTVbiZYo6DID-JbGZJ6K4b8ya2vVH5E0JgUZJ5bdrHzFxBm
 iCCOWuzcdMPLdKaEVU>
X-ME-Received: <xmr:fb-zaBQNFbODelBz5qdq3KehI0CuTbDGblceis-v9HV2KiI5F8HfuH6mK7R1R0_tqFQ>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddukeekudekucetufdoteggodetrf
 dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu
 rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf
 gurhepkfffgggfuffvvehfhfgjtgfgsehtkeertddtvdejnecuhfhrohhmpeffmhhithhr
 hicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrthhtvg
 hrnhepgeelfeetkefghfdvhfdtgeevveevteetgeetveegtedthefhudekteehffeukeek
 necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepughmih
 htrhihsehguhhtohhvrdguvghvpdhnsggprhgtphhtthhopeegpdhmohguvgepshhmthhp
 ohhuthdprhgtphhtthhopehssggruhhghhesjhgrnhgvshhtrhgvvghtrdgtohhmpdhrtg
 hpthhtohepvghlihiisehgnhhurdhorhhgpdhrtghpthhtohepjeelfeefgeesuggvsggs
 uhhgshdrghhnuhdrohhrghdprhgtphhtthhopegvghhgvghrthestghsrdhutghlrgdrvg
 guuh
X-ME-Proxy: <xmx:fb-zaAKdERhbPcEsPrHURpFs9IM7n0M1ICubOYAXLr49_L7wJxovmA>
 <xmx:fb-zaHCMlKeBgkYIscewssvW5Msd56C1d6_KaH8uYVz3uyLMYzPLFQ>
 <xmx:fb-zaJD-vgq6tUJ60TYsMvEhvCRx8XHfO0Be5Hma9T7cVIulqX8dLA>
 <xmx:fb-zaJDtaD7241TZECoZm8_rEU0IEDil7Xo5n7RCBmY8TSR3ztsyEg>
 <xmx:fb-zaBGSNoHz0B8JbfbYzhbIKAAx7IYzArrsZU5XwahikBcmbYOtidVH>
Feedback-ID: i07de48aa:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat,
 30 Aug 2025 23:20:27 -0400 (EDT)
Message-ID: <55676972-950d-440c-a291-0adefced4e6f@gutov.dev>
Date: Sun, 31 Aug 2025 06:20:24 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
To: Spencer Baugh <sbaugh@janestreet.com>, Eli Zaretskii <eliz@gnu.org>
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com> <867bymrzvr.fsf@gnu.org>
 <ierfrda6v41.fsf@janestreet.com> <865xe6rv8z.fsf@gnu.org>
 <CAO=BR8PCwypg+7TMbbQueQh82drqQTeYNU0LGqVQ=oDv58v5NQ@mail.gmail.com>
 <86349aruhe.fsf@gnu.org>
 <CAO=BR8OFBKbTht7_uQsKOsfNOBz_EqTPBrrF9fizv8qb5K8KJw@mail.gmail.com>
Content-Language: en-US
From: Dmitry Gutov <dmitry@gutov.dev>
In-Reply-To: <CAO=BR8OFBKbTht7_uQsKOsfNOBz_EqTPBrrF9fizv8qb5K8KJw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

On 29/08/2025 19:11, Spencer Baugh wrote:
> On Fri, Aug 29, 2025, 11:57 AM Eli Zaretskii <eliz@gnu.org 
> <mailto:eliz@gnu.org>> wrote:
> 
>      > From: Spencer Baugh <sbaugh@janestreet.com
>     <mailto:sbaugh@janestreet.com>>
>      > Date: Fri, 29 Aug 2025 11:43:21 -0400
>      > Cc: 79334@debbugs.gnu.org <mailto:79334@debbugs.gnu.org>,
>     eggert@cs.ucla.edu <mailto:eggert@cs.ucla.edu>, Dmitry Gutov
>     <dmitry@gutov.dev <mailto:dmitry@gutov.dev>>
>      >
>      > Okay, so then how about we delete this unsafe thread_select and
>     put some thread-yield calls around
>      > wait_reading_process_output?
> 
>     I'm not objected to not calling thread_select there, but can we have
>     tests that will show us the gains?
> 
> 
> The primary gain is that it becomes easier to reason about locking.  I'm 
> not sure how to write a test for that :)

I think what could help in this area is some benchmarking scenario, and 
the goal would be to show that it doesn't regress (or within the margin 
of error) when 'pselect' is used directly.

A script which would create a bunch of threads and some amount of 
processes spawned by them with certain characteristics. Depending on 
what workload switching in that thread_select could help optimize.


From debbugs-submit-bounces@debbugs.gnu.org Sun Aug 31 20:40:05 2025
Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 00:40:05 +0000
Received: from localhost ([127.0.0.1]:55810 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1ussaa-000670-Uw
	for submit@debbugs.gnu.org; Sun, 31 Aug 2025 20:40:05 -0400
Received: from fhigh-b4-smtp.messagingengine.com ([202.12.124.155]:41797)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <dmitry@gutov.dev>) id 1ussaW-00065X-Mg
 for 79334@debbugs.gnu.org; Sun, 31 Aug 2025 20:40:02 -0400
Received: from phl-compute-10.internal (phl-compute-10.internal [10.202.2.50])
 by mailfhigh.stl.internal (Postfix) with ESMTP id 2EE427A00E5;
 Sun, 31 Aug 2025 20:39:55 -0400 (EDT)
Received: from phl-mailfrontend-02 ([10.202.2.163])
 by phl-compute-10.internal (MEProxy); Sun, 31 Aug 2025 20:39:55 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc
 :cc:content-transfer-encoding:content-type:content-type:date
 :date:from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to; s=fm2; t=1756687195;
 x=1756773595; bh=8YCpyn73utj8PCl/bcalfnUy6XzbxM41ctQwwXN7Y/k=; b=
 AVORwJqvW96dssj4L65bo2MiHWyN+Tk5CTwo/K3aWosHRCc8iIcB6XfOHGgNSQ53
 FijsWEBhPDn+E4UyqHyVIgQAusKHFouh7JYOzLvU1zwShYxEnHmPt1oSi2KjQK/g
 txvyU0kKhT+C66Q3U3SXwhQX4GR33wBtEDiIv2lJxP2nc30//P/FyzaRbb2BHED3
 4RPIfRKP/G4tkcob0ekyOd3ia1rt/CCC2RgeDiMV/AseDxVvXi4H8Kld2GU6hhZe
 bB6hw1siAPpdE0JsOC/7yrUT6VXVrsnKH4qutbV7/XCOHFCmLt5GJcdO1k4uS1hs
 iHrcL0p/qyD3SMzfTR5UUg==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:cc:content-transfer-encoding
 :content-type:content-type:date:date:feedback-id:feedback-id
 :from:from:in-reply-to:in-reply-to:message-id:mime-version
 :references:reply-to:subject:subject:to:to:x-me-proxy
 :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1756687195; x=
 1756773595; bh=8YCpyn73utj8PCl/bcalfnUy6XzbxM41ctQwwXN7Y/k=; b=h
 qEHRczFVR9ZRcLfxiFabSf6U9l01g/yYCVxg8tWTwIXo9aLz4ZMzZ7o97b1rXEs3
 OtaODJ5im61pA74AwBGvjT9OjfdC/QR1mk9y/LKDI7VQSI2nx0Y2C4jaZzbjGNfo
 pxcUO7wXASuaqBMA/7ouaF5RZL/RR0rBFnslu9bXIBakWi6eiDnc9XLBdOT+XH6+
 T6UcH+hFQA/ScvqiCyMcI8BeJA8DRZnNXldtLCT8h4APi45jR1WlMKXhIHx2HhP+
 mvudFQ4AoWhmjdwrFnnfl+nlyd2rS6NCmhs0WAdCzPc5d21Q16+hPE5rBWIznPsg
 vDUS9piQScYSnfJ1NYlXA==
X-ME-Sender: <xms:Wuu0aGVsSvztFosLuCmujArUjhwuZEzEliX_EpIwrpXL5sXopeLobQ>
 <xme:Wuu0aMAacHks8W-AAfwG3dGtgzQfeAVB6Qyr3SG2OX2gWz2MO3E65ZhuKkMh8KrAX
 FDEQy-Fabmyl-8H1Ds>
X-ME-Received: <xmr:Wuu0aJ8xWuTKqf-thVNTIkfGNtN_aSd9UJY6e1TSe-_i39O3mjWqQi8DMp69jxteo2M>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgdduledtjeefucetufdoteggodetrf
 dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu
 rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf
 gurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtvdejnecuhfhrohhmpeffmhhithhr
 hicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrthhtvg
 hrnhepteduleejgeehtefgheegjeekueehvdevieekueeftddvtdevfefhvdevgedujeeh
 necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepughmih
 htrhihsehguhhtohhvrdguvghvpdhnsggprhgtphhtthhopeegpdhmohguvgepshhmthhp
 ohhuthdprhgtphhtthhopegvlhhiiiesghhnuhdrohhrghdprhgtphhtthhopehssggruh
 hghhesjhgrnhgvshhtrhgvvghtrdgtohhmpdhrtghpthhtohepjeelfeefgeesuggvsggs
 uhhgshdrghhnuhdrohhrghdprhgtphhtthhopegvghhgvghrthestghsrdhutghlrgdrvg
 guuh
X-ME-Proxy: <xmx:Wuu0aDEOe2tiZckX0ajNshpE-EP9oJS0R0LfkBhM7LrQ3rA_dwUlnQ>
 <xmx:Wuu0aHP3fQD5xTaFimlwpclVQCJPISrFKpxC75AfbKlVXOe96zikrw>
 <xmx:Wuu0aNdACt23rE7himh7yXxrcC5iSHGmautYPK7oHqs5hyZoz3LCxQ>
 <xmx:Wuu0aMvvcxGI_cxk97VJCk7s_dSFveANGfoFFanEdvKRXJcE9AKKtg>
 <xmx:W-u0aPz_Db3Z-Rz71tsA3AATxekg3Da_5mECrtxhXsTou41J7vFQXOVX>
Feedback-ID: i07de48aa:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun,
 31 Aug 2025 20:39:53 -0400 (EDT)
Message-ID: <ea775e4c-d182-48ce-b7c1-76ba2051cd0f@gutov.dev>
Date: Mon, 1 Sep 2025 03:39:50 +0300
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: bug#79334: [PATCH] Don't release thread select lock unnecessarily
To: Eli Zaretskii <eliz@gnu.org>, Spencer Baugh <sbaugh@janestreet.com>
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ieriki66y3g.fsf@janestreet.com> <867bymrzvr.fsf@gnu.org>
 <ierfrda6v41.fsf@janestreet.com> <865xe6rv8z.fsf@gnu.org>
Content-Language: en-US
From: Dmitry Gutov <dmitry@gutov.dev>
In-Reply-To: <865xe6rv8z.fsf@gnu.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -1.7 (-)

On 29/08/2025 18:41, Eli Zaretskii wrote:
> For example, try to run something in a non-main
> thread and then attempt to interact with Emacs in the main thread.

Could that be code that does straight computation and does not switch 
threads at all, unless sleep-for is added?

Perhaps relatedly, diff-hl in "threaded" mode has a much better 
interactivity since switching to 'make-process' from 'call-process' for 
its external program interaction. But you're probably thinking of 
different kind of examples.


From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 12:18:46 2025
Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 16:18:46 +0000
Received: from localhost ([127.0.0.1]:58313 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1ut7F0-0008Nb-08
	for submit@debbugs.gnu.org; Mon, 01 Sep 2025 12:18:46 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:38151)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1ut7Ex-0008NM-4X
 for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 12:18:43 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
In-Reply-To: <86jz2kpy1w.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 30 Aug
 2025 19:35:39 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
Date: Mon, 01 Sep 2025 12:18:37 -0400
Message-ID: <iera53eno2q.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756743517;
 bh=thdIrYshUBoOQVmp2I25ATjxrErnxWZ7ivf5EY1vQNc=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=3Dq5Z4ZTjnYJbtNpiOdCmv5jzIAhFXJNU4uUQc+o8f0MdOykWOnDLTErnoonsytY8
 P4yFt8AfpwduaD+HXAJ6itz8AwRLZ9MnhNpQhC+lLCyf2q0EKB2BgN+NXz20NLGcQs
 fsR4Ca0tcLATHS3oX7fLUjANkcOPk8f4lTqDHFRR/pc7hYEDs6evR1++qGevKHxv/e
 EWCo52T4sS+BKlFDP++v6lB3dfwr5wvkHyLp65S7nJR5HjTf42b435rgNxXCjopqv/
 71ClIv21i+qUBV8C56OItVcnMkBETLdPb99BrAhatkYMwmIWgsGtMGOrXA+G39QNtd
 vLPb+NIkdGS/g==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> >> +static bool
>> >> +other_thread_is_waiting_for (struct fd_callback_data* elem)
>> >> +{
>> >> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
>> >> +}
>> >
>> > This should also check that the waiting thread is still alive.  If it
>> > isn't, it's okay to close the descriptor.  Otherwise, descriptors
>> > belonging to threads that exited might never be closed.
>> 
>> waiting_thread always points to a living thread, because it's only set
>> while a thread is inside wait_reading_process_output.
>> 
>> (In fact, when a thread sees waiting_thread set to a value which is
>> neither NULL nor current_thread, it can only ever point to a thread
>> which is currently inside thread_select)
>
> You assume that the waiting_thread field will be cleared when a thread
> dies, but that is not guaranteed.  Why isn't it better to be defensive
> and make sure the thread is still alive?  If you are right, that test
> will always be true, but what if you are wrong in some rare cases?

Fair enough; the patch has changed a lot so this
other_thread_is_waiting_for function doesn't exist anymore, but if I
re-add it or similar checks, I will check that the thread is alive.

>> >> +	  if (fd_callback_info[fd].waiting_thread == current_thread
>> >> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
>> >> +	    {
>> >> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
>> >> +	      emacs_close (fd);
>> >> +	      /* Ignore any events which happened on this fd. */
>> >> +	      FD_CLR (fd, &Available);
>> >> +	      FD_CLR (fd, &Writeok);
>> >> +	      fd_callback_info[fd].flags = 0;
>> >
>> > Why do it here and not when we deactivate processes?
>> 
>> Because that would close file descriptors that another thread is
>> currently selecting on, causing EBADF.
>
> No, I mean to deactivate only the processes whose I/O channels are
> locked to the current thread.
>
>> > Until now we never closed any descriptors inside
>> > wait_reading_process_output.
>> 
>> Just to be clear, we can call deactivate_process from
>> wait_reading_process_output, through status_notify, through process
>> filters, or other similar.  That closes descriptors.
>
> Yes, and so I suggest doing this logic of deactivating only the
> processes that are locked to the current thread there.
>
>> > More generally, why this design and not a simpler change which only
>> > deactivates processes whose I/O is waited by the current thread, as
>> > discussed previously?
>> 
>> I looked into that, but I determined that it's too complicated.  It
>> would require us to keep track of a new kind of process state: a process
>> which has been deleted, but hasn't been deactivated yet.
>
> What I had in mind was neither delete it nor deactivate it.  IOW, the
> loop which goes through the list of processes should only pay
> attention to processes locked to this thread and those that are not
> locked to any live thread.  It should leave the other processes to
> when their threads call status_notify.

There are too many ways in which we delete or deactivate processes to do
that.  In particular, Lisp programs should be able to call
delete-process on a process without getting an error just because some
thread is currently calling select and happens to be monitoring that
process.  To emit such an error would make programming with processes
and threads extremely difficult.

>> > Looking at all the callers of close_process_fd, it sounds like it's
>> > too low-level to do this.  For example, it is called from
>> > create_process (including in the forked child) to close the unneeded
>> > ends of the pipe, where we probably don't want this.  And
>> > clear_fd_callback_data is also used for the keyboard input descriptor,
>> > which is probably also not relevant.
>> 
>> True.
>> 
>> I looked into fixing these, and that suggested to me an altogether
>> simpler approach.  We can keep the logic entirely contained in
>> wait_reading_process_output, just adding a bit of code around the select
>> call.
>> 
>> And we don't need to defer deleting the fds at all: just recognize when
>> it's happened, and be careful to not touch those fds afterwards.
>
> Hmm... I don't understand: why do we need to do this?  The calls to
> compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
> set bits in the Available mask passed top pselect already avoid
> setting bits for descriptors on which some other thread waits.  So
> pselect cannot possibly indicate any output ready on any descriptors
> belonging to other threads, and this loop you suggest to add:
>
>> +      /* Save the fds we're currently waiting_thread for.  */
>> +      fd_set waiting_fds;
>> +      FD_ZERO (&waiting_fds);
>> +      int max_waiting_fds = 0;
>> +      for (int fd = 0; fd <= max_desc; ++fd)
>> +        {
>> +	  if (fd_callback_info[fd].waiting_thread == current_thread)
>> +	    {
>> +	      FD_SET (fd, &waiting_fds);
>> +	      max_waiting_fds = fd;
>> +	    }
>> +	}
>
> should be unnecessary.  Or what am I missing?

That's exactly the problem.  A file descriptor for which we set
waiting_thread can have delete_read_fd or delete_write_fd called on it
by another thread, or by a signal handler, while we're waiting in
pselect.

We need to catch the fact that the file descriptor has been removed from
the set of file descriptors we're supposed to monitor, and avoid
reacting to events on it.  We can catch this easily by seeing that
waiting_thread has been cleared or changed while we were blocked in
select.

BTW, the fact that this problematic case is also possible via signal
handlers (namely the SIGCHLD handler) is a separate bug introduced by
the refactoring which added fd_callback_info.  It means that with a
certain interleaving, I think it should be possible to get buggy
behavior by just using processes without threads.  It's quite rare
though and I haven't been able to figure out what exact path that would
cause a problem.


From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 15:25:45 2025
Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 19:25:46 +0000
Received: from localhost ([127.0.0.1]:58537 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utA9x-0002bo-A8
	for submit@debbugs.gnu.org; Mon, 01 Sep 2025 15:25:45 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:41010)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1utA9u-0002bU-Mv
 for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 15:25:43 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1utA9l-0001HC-8M; Mon, 01 Sep 2025 15:25:33 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=9B0ZZ/jkgQ7hrvUpL46jOl1XsDU80uUi/rpDHxM6vbs=; b=i3DyP9NmmLbu
 uGLvTZ/KnSy+hJe/SQNAgQlGKE6Ajsc9XbeIHECP+fTHITp0KGqkjla7QvHSGDFievkhi3nRBAtMM
 tZF25njzMhxbr3wOFyjhU5J4BCKhMNGh8J+r7TpZazA41r9KxAipjvsDSAr7FlVyeFgwid0YKthts
 ffiZoI75sxKN9IOobVeuV/vm6hLEOOOsAuYTsgmiz4IhMDkXReUN4JJObqCq5Af8E0tkXR+4du/DS
 WC4YwBMPQiSTbFahlNOwObQVD9PdE+Acj9upRZ3yiUIBRTCP5PwhF0v02+H7feGcRCutZ4fJt+qX+
 522hdddkfidmicw//knukw==;
Date: Mon, 01 Sep 2025 22:25:26 +0300
Message-Id: <86ldmym0ux.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <iera53eno2q.fsf@janestreet.com> (message from Spencer Baugh on
 Mon, 01 Sep 2025 12:18:37 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Mon, 01 Sep 2025 12:18:37 -0400
> 
> > You assume that the waiting_thread field will be cleared when a thread
> > dies, but that is not guaranteed.  Why isn't it better to be defensive
> > and make sure the thread is still alive?  If you are right, that test
> > will always be true, but what if you are wrong in some rare cases?
> 
> Fair enough; the patch has changed a lot so this
> other_thread_is_waiting_for function doesn't exist anymore, but if I
> re-add it or similar checks, I will check that the thread is alive.

Thanks.

> > What I had in mind was neither delete it nor deactivate it.  IOW, the
> > loop which goes through the list of processes should only pay
> > attention to processes locked to this thread and those that are not
> > locked to any live thread.  It should leave the other processes to
> > when their threads call status_notify.
> 
> There are too many ways in which we delete or deactivate processes to do
> that.

We should audit all of them and prevent that from happening when the
process is locked to another thread.

> In particular, Lisp programs should be able to call
> delete-process on a process without getting an error just because some
> thread is currently calling select and happens to be monitoring that
> process.  To emit such an error would make programming with processes
> and threads extremely difficult.

This is a problem two orders of magnitude smaller than what we are
trying to fix here.  So my suggestion is to leave it for later.
There's no special reason why Lisp programs "should be able" to delete
a process locked to another thread.  If we don't want to signal an
error (though I don't see why not), we could perhaps do nothing
silently, until we get to dealing with that later.

For now, sabotaging threads that wait for some sub-process is a much
more grave problem, and we must fix it if we want threads and
processes to live in peace in Emacs.

> > Hmm... I don't understand: why do we need to do this?  The calls to
> > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
> > set bits in the Available mask passed top pselect already avoid
> > setting bits for descriptors on which some other thread waits.  So
> > pselect cannot possibly indicate any output ready on any descriptors
> > belonging to other threads, and this loop you suggest to add:
> >
> >> +      /* Save the fds we're currently waiting_thread for.  */
> >> +      fd_set waiting_fds;
> >> +      FD_ZERO (&waiting_fds);
> >> +      int max_waiting_fds = 0;
> >> +      for (int fd = 0; fd <= max_desc; ++fd)
> >> +        {
> >> +	  if (fd_callback_info[fd].waiting_thread == current_thread)
> >> +	    {
> >> +	      FD_SET (fd, &waiting_fds);
> >> +	      max_waiting_fds = fd;
> >> +	    }
> >> +	}
> >
> > should be unnecessary.  Or what am I missing?
> 
> That's exactly the problem.  A file descriptor for which we set
> waiting_thread can have delete_read_fd or delete_write_fd called on it
> by another thread, or by a signal handler, while we're waiting in
> pselect.

We need to prevent that from happening, not to go along with it.  But
anyway, I don't see why you need another fd_set for this.  Why not
just look at the descriptors in the Available set?

> BTW, the fact that this problematic case is also possible via signal
> handlers (namely the SIGCHLD handler) is a separate bug introduced by
> the refactoring which added fd_callback_info.  It means that with a
> certain interleaving, I think it should be possible to get buggy
> behavior by just using processes without threads.  It's quite rare
> though and I haven't been able to figure out what exact path that would
> cause a problem.

Let's not attempt to solve too many problems at the same time.  How to
make SIGCHLD safer when threads are present is an outstanding issue,
and the solution will IMO depend in whether signals are delivered to
the main thread and whether or not the build uses the child signal
pipe.


From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 01 15:41:00 2025
Received: (at 79334) by debbugs.gnu.org; 1 Sep 2025 19:41:00 +0000
Received: from localhost ([127.0.0.1]:58577 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utAOh-0003Gt-2Y
	for submit@debbugs.gnu.org; Mon, 01 Sep 2025 15:40:59 -0400
Received: from mxout1.mail.janestreet.com ([38.105.200.78]:55107)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1utAOd-0003Ge-LX
 for 79334@debbugs.gnu.org; Mon, 01 Sep 2025 15:40:57 -0400
Received: from mail-lj1-f200.google.com ([209.85.208.200])
 by mxgoog2.mail.janestreet.com with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128)
 (Exim 4.98.2) id 1utAOY-00000005pgk-1xPJ for 79334@debbugs.gnu.org;
 Mon, 01 Sep 2025 15:40:50 -0400
Received: by mail-lj1-f200.google.com with SMTP id
 38308e7fff4ca-336c3108badso20997601fa.3
 for <79334@debbugs.gnu.org>; Mon, 01 Sep 2025 12:40:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=janestreet.com; s=google; t=1756755649; x=1757360449; darn=debbugs.gnu.org; 
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:from:to:cc:subject:date:message-id:reply-to;
 bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=;
 b=m4q2qgaR1H6xyqZryrkHxSwBcQXgDs7ty5La0z8CH1TyakncVg2GDyjmKm+k5Hzpd2
 zCAHqBNFF8YU1oSCm7s6vQIXUPsNUyGh0ZH/Sg37tGUEuQm9fS2jclt0NptCfVE9MEi1
 Vu5Y9Qufapxj97dM9ZoeQUwNOfwzGTtZuBL3U=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756755650;
 bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=;
 h=References:In-Reply-To:From:Date:Subject:To:Cc;
 b=UshhT9A7/S5jTOWoeWD5Ja7JW4TfDHeKKrsZvcH0id+L2zmpVJnZdv/etalJ45uk7
 tvG3RmRG52gqSai1pFxrTbEFVv08OUaEnX31ZnAUBrW1ypunqsqdYnsyWiBaNbT7j8
 XY0kw69uUBEZeDigT1lLMXpVgIQlE/56z24wqCLKWOtRgeNZA+/UP8FIqRPQEg3u9S
 R/rPGm4CZtKh1Nvhcy7MMeHtDGEqZ4ACUdXPes0mL+NCi+2XUv8NZXhAA//5QZ2tqu
 Ls1fNs7pffBd68zgEiZVtwwVr/x5kiWWLl+81n4i5bdjySqZL7bMVkc8Yu5fJp7JqU
 DlLASARlkd8xw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1756755649; x=1757360449;
 h=cc:to:subject:message-id:date:from:in-reply-to:references
 :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
 :reply-to;
 bh=hsXJ+hPsvWAeH4Zyfx3PMMW94ejI9xBkVjJV8rYzITw=;
 b=tuUGl2En6lpsPLP8T7IFfMC/b8DBocjkrvdGo7DA+BRdExoM5IcANX/0KCBfI2KImf
 zfC0HefRtRbHdoh0ShIJNJTnj0RFT9ewCfQlZheY2T+WqltTqDvisZsEAJDsQys68LAP
 nWygoMvvIo4fYXsb9quWtuY/y4md2uWdDfaBZCqjz6QWvW5mzYq4R9EkkrGzER85FMyV
 h/Eb8h656P4aA6eHP0Tj4/SaXOnvE1aibA6e2tyqIY0kOvz8WOKpvG4LBrPkzoqPrmH3
 VzJ8E/J/xmJiP8USJp6M+61w4fUKqapmXSNslNIEYAE1XGSzrUErrloS5q/ag3Kz2XeO
 qkhQ==
X-Gm-Message-State: AOJu0Yy0YmOhQi3sr0PrZv6P7GldF8K94D6JHQQf3PuXz4L0pHBrfZ/J
 xkdVLwdX8isqP4RHXBg4ZZZfXTeAk1fHMJ6bx3qgF5c0NfxtPRpBoVa4B/KcWNuWLcRdcP+kSUo
 09xvfbvKYaW2ECsrjmaDbgELaXDbxDPyJkJHnagqUz5oxCsXu4SyxgqOXITmu+XGHOhnFOZsJBw
 5XQYi8omb1pGglfr/csdC0XyHB0gadLA==
X-Gm-Gg: ASbGncsIsR0WijZC6kC2ZH18bvOMhooosBFhIFlO0Ih4KjxIZ5Tit+hJ8QTmPdi/o8c
 rb6TDO75jL2mXYMsUJS/V52U+8cJqfQ6DFqeOzVz4rQGXBJ1jUmziwRrR0llzPztidsT00ute+G
 XMsko+epwda+ToKdrL3ap8
X-Received: by 2002:a2e:be23:0:b0:332:1de5:c513 with SMTP id
 38308e7fff4ca-336ca901440mr24870951fa.4.1756755648653; 
 Mon, 01 Sep 2025 12:40:48 -0700 (PDT)
X-Google-Smtp-Source: AGHT+IGUlSYGavU+sYo+tMJYoR07G5m3Q0RE9H92MDwv0DKAbvD4XFEgyipfkkE1Yt5VPmP2dcSijLJB2faDAruRMMg=
X-Received: by 2002:a2e:be23:0:b0:332:1de5:c513 with SMTP id
 38308e7fff4ca-336ca901440mr24870891fa.4.1756755648163; Mon, 01 Sep 2025
 12:40:48 -0700 (PDT)
MIME-Version: 1.0
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com>
 <ier5xe57vrr.fsf@janestreet.com> <ier34997u4c.fsf_-_@janestreet.com>
 <86zfbhqptt.fsf@gnu.org> <ierldn0olcg.fsf@janestreet.com>
 <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
In-Reply-To: <86ldmym0ux.fsf@gnu.org>
From: Spencer Baugh <sbaugh@janestreet.com>
Date: Mon, 1 Sep 2025 15:40:37 -0400
X-Gm-Features: Ac12FXyUblflpNO1YUj2f8dXcmOzgfch-ujFAZH6okGf-3Y__6g8my__JThELP0
Message-ID: <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other threads
To: Eli Zaretskii <eliz@gnu.org>
Content-Type: multipart/alternative; boundary="0000000000005dc3f4063dc28cb0"
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <dmitry@gutov.dev>
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

--0000000000005dc3f4063dc28cb0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Sep 1, 2025, 3:25=E2=80=AFPM Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Spencer Baugh <sbaugh@janestreet.com>
> > Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> > Date: Mon, 01 Sep 2025 12:18:37 -0400
> >
> > > You assume that the waiting_thread field will be cleared when a threa=
d
> > > dies, but that is not guaranteed.  Why isn't it better to be defensiv=
e
> > > and make sure the thread is still alive?  If you are right, that test
> > > will always be true, but what if you are wrong in some rare cases?
> >
> > Fair enough; the patch has changed a lot so this
> > other_thread_is_waiting_for function doesn't exist anymore, but if I
> > re-add it or similar checks, I will check that the thread is alive.
>
> Thanks.
>
> > > What I had in mind was neither delete it nor deactivate it.  IOW, the
> > > loop which goes through the list of processes should only pay
> > > attention to processes locked to this thread and those that are not
> > > locked to any live thread.  It should leave the other processes to
> > > when their threads call status_notify.
> >
> > There are too many ways in which we delete or deactivate processes to d=
o
> > that.
>
> We should audit all of them and prevent that from happening when the
> process is locked to another thread.
>
> > In particular, Lisp programs should be able to call
> > delete-process on a process without getting an error just because some
> > thread is currently calling select and happens to be monitoring that
> > process.  To emit such an error would make programming with processes
> > and threads extremely difficult.
>
> This is a problem two orders of magnitude smaller than what we are
> trying to fix here.  So my suggestion is to leave it for later.
> There's no special reason why Lisp programs "should be able" to delete
> a process locked to another thread.  If we don't want to signal an
> error (though I don't see why not), we could perhaps do nothing
> silently, until we get to dealing with that later.
>

Silently doing nothing is also an unacceptably confusing result, in my
opinion.

But it's okay: I believe it's possible to solve the problem without
introducing such bad behavior.

For now, sabotaging threads that wait for some sub-process is a much
> more grave problem, and we must fix it if we want threads and
> processes to live in peace in Emacs.
>

Of course.

> > Hmm... I don't understand: why do we need to do this?  The calls to
> > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
> > > set bits in the Available mask passed top pselect already avoid
> > > setting bits for descriptors on which some other thread waits.  So
> > > pselect cannot possibly indicate any output ready on any descriptors
> > > belonging to other threads, and this loop you suggest to add:
> > >
> > >> +      /* Save the fds we're currently waiting_thread for.  */
> > >> +      fd_set waiting_fds;
> > >> +      FD_ZERO (&waiting_fds);
> > >> +      int max_waiting_fds =3D 0;
> > >> +      for (int fd =3D 0; fd <=3D max_desc; ++fd)
> > >> +        {
> > >> +    if (fd_callback_info[fd].waiting_thread =3D=3D current_thread)
> > >> +      {
> > >> +        FD_SET (fd, &waiting_fds);
> > >> +        max_waiting_fds =3D fd;
> > >> +      }
> > >> +  }
> > >
> > > should be unnecessary.  Or what am I missing?
> >
> > That's exactly the problem.  A file descriptor for which we set
> > waiting_thread can have delete_read_fd or delete_write_fd called on it
> > by another thread, or by a signal handler, while we're waiting in
> > pselect.
>
> We need to prevent that from happening, not to go along with it.  But
> anyway, I don't see why you need another fd_set for this.  Why not
> just look at the descriptors in the Available set?
>

The Available set is cleared by the select call, and afterwards only
contains descriptors which has events.  In that case we could just look at
the descriptors in Available and check waiting_thread for each of them.

But if there was an EBADF, then Available is left in an undefined state
(see pselect(1)) and we no longer know what descriptors we originally set
waiting_thread on.  So we can't check to see if one of our descriptors was
"stolen" (had waiting_thread set to a different value), which would mean
the EBADF is spurious and we should continue running.

Though, I guess we could alternatively just assume that EBADF is always
spurious.

> BTW, the fact that this problematic case is also possible via signal
> > handlers (namely the SIGCHLD handler) is a separate bug introduced by
> > the refactoring which added fd_callback_info.  It means that with a
> > certain interleaving, I think it should be possible to get buggy
> > behavior by just using processes without threads.  It's quite rare
> > though and I haven't been able to figure out what exact path that would
> > cause a problem.
>
> Let's not attempt to solve too many problems at the same time.  How to
> make SIGCHLD safer when threads are present is an outstanding issue,
> and the solution will IMO depend in whether signals are delivered to
> the main thread and whether or not the build uses the child signal
> pipe.
>

You misread me: I said SIGCHLD is unsafe even without threads.

>

--0000000000005dc3f4063dc28cb0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"auto"><div><br><br><div class=3D"gmail_quote gmail_quote_contai=
ner"><div dir=3D"ltr" class=3D"gmail_attr">On Mon, Sep 1, 2025, 3:25=E2=80=
=AFPM Eli Zaretskii &lt;<a href=3D"mailto:eliz@gnu.org">eliz@gnu.org</a>&gt=
; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .=
8ex;border-left:1px #ccc solid;padding-left:1ex">&gt; From: Spencer Baugh &=
lt;<a href=3D"mailto:sbaugh@janestreet.com" target=3D"_blank" rel=3D"norefe=
rrer">sbaugh@janestreet.com</a>&gt;<br>
&gt; Cc: <a href=3D"mailto:79334@debbugs.gnu.org" target=3D"_blank" rel=3D"=
noreferrer">79334@debbugs.gnu.org</a>,=C2=A0 <a href=3D"mailto:eggert@cs.uc=
la.edu" target=3D"_blank" rel=3D"noreferrer">eggert@cs.ucla.edu</a>,=C2=A0 =
<a href=3D"mailto:dmitry@gutov.dev" target=3D"_blank" rel=3D"noreferrer">dm=
itry@gutov.dev</a><br>
&gt; Date: Mon, 01 Sep 2025 12:18:37 -0400<br>
&gt; <br>
&gt; &gt; You assume that the waiting_thread field will be cleared when a t=
hread<br>
&gt; &gt; dies, but that is not guaranteed.=C2=A0 Why isn&#39;t it better t=
o be defensive<br>
&gt; &gt; and make sure the thread is still alive?=C2=A0 If you are right, =
that test<br>
&gt; &gt; will always be true, but what if you are wrong in some rare cases=
?<br>
&gt; <br>
&gt; Fair enough; the patch has changed a lot so this<br>
&gt; other_thread_is_waiting_for function doesn&#39;t exist anymore, but if=
 I<br>
&gt; re-add it or similar checks, I will check that the thread is alive.<br=
>
<br>
Thanks.<br>
<br>
&gt; &gt; What I had in mind was neither delete it nor deactivate it.=C2=A0=
 IOW, the<br>
&gt; &gt; loop which goes through the list of processes should only pay<br>
&gt; &gt; attention to processes locked to this thread and those that are n=
ot<br>
&gt; &gt; locked to any live thread.=C2=A0 It should leave the other proces=
ses to<br>
&gt; &gt; when their threads call status_notify.<br>
&gt; <br>
&gt; There are too many ways in which we delete or deactivate processes to =
do<br>
&gt; that.<br>
<br>
We should audit all of them and prevent that from happening when the<br>
process is locked to another thread.<br>
<br>
&gt; In particular, Lisp programs should be able to call<br>
&gt; delete-process on a process without getting an error just because some=
<br>
&gt; thread is currently calling select and happens to be monitoring that<b=
r>
&gt; process.=C2=A0 To emit such an error would make programming with proce=
sses<br>
&gt; and threads extremely difficult.<br>
<br>
This is a problem two orders of magnitude smaller than what we are<br>
trying to fix here.=C2=A0 So my suggestion is to leave it for later.<br>
There&#39;s no special reason why Lisp programs &quot;should be able&quot; =
to delete<br>
a process locked to another thread.=C2=A0 If we don&#39;t want to signal an=
<br>
error (though I don&#39;t see why not), we could perhaps do nothing<br>
silently, until we get to dealing with that later.<br></blockquote></div></=
div><div dir=3D"auto"><br></div><div dir=3D"auto">Silently doing nothing is=
 also an unacceptably confusing result, in my opinion.</div><div dir=3D"aut=
o"><br></div><div dir=3D"auto">But it&#39;s okay: I believe it&#39;s possib=
le to solve the problem without introducing such bad behavior.</div><div di=
r=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote gmail_quot=
e_container"><blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;b=
order-left:1px #ccc solid;padding-left:1ex">
For now, sabotaging threads that wait for some sub-process is a much<br>
more grave problem, and we must fix it if we want threads and<br>
processes to live in peace in Emacs.<br></blockquote></div></div><div dir=
=3D"auto"><br></div><div dir=3D"auto">Of course.</div><div dir=3D"auto"><br=
></div><div dir=3D"auto"><div class=3D"gmail_quote gmail_quote_container"><=
blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px=
 #ccc solid;padding-left:1ex">
&gt; &gt; Hmm... I don&#39;t understand: why do we need to do this?=C2=A0 T=
he calls to<br>
&gt; &gt; compute_input_wait_mask, compute_non_keyboard_wait_mask etc., whi=
ch<br>
&gt; &gt; set bits in the Available mask passed top pselect already avoid<b=
r>
&gt; &gt; setting bits for descriptors on which some other thread waits.=C2=
=A0 So<br>
&gt; &gt; pselect cannot possibly indicate any output ready on any descript=
ors<br>
&gt; &gt; belonging to other threads, and this loop you suggest to add:<br>
&gt; &gt;<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 /* Save the fds we&#39;re currently wai=
ting_thread for.=C2=A0 */<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 fd_set waiting_fds;<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 FD_ZERO (&amp;waiting_fds);<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 int max_waiting_fds =3D 0;<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 for (int fd =3D 0; fd &lt;=3D max_desc;=
 ++fd)<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 {<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 if (fd_callback_info[fd].waiting_thread =3D=3D=
 current_thread)<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 {<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 FD_SET (fd, &amp;waiting_fds);<b=
r>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_waiting_fds =3D fd;<br>
&gt; &gt;&gt; +=C2=A0 =C2=A0 =C2=A0 }<br>
&gt; &gt;&gt; +=C2=A0 }<br>
&gt; &gt;<br>
&gt; &gt; should be unnecessary.=C2=A0 Or what am I missing?<br>
&gt; <br>
&gt; That&#39;s exactly the problem.=C2=A0 A file descriptor for which we s=
et<br>
&gt; waiting_thread can have delete_read_fd or delete_write_fd called on it=
<br>
&gt; by another thread, or by a signal handler, while we&#39;re waiting in<=
br>
&gt; pselect.<br>
<br>
We need to prevent that from happening, not to go along with it.=C2=A0 But<=
br>
anyway, I don&#39;t see why you need another fd_set for this.=C2=A0 Why not=
<br>
just look at the descriptors in the Available set?<br></blockquote></div></=
div><div dir=3D"auto"><br></div><div dir=3D"auto">The Available set is clea=
red by the select call, and afterwards only contains descriptors which has =
events.=C2=A0 In that case we could just look at the descriptors in Availab=
le and check waiting_thread for each of them.</div><div dir=3D"auto"><br></=
div><div dir=3D"auto">But if there was an EBADF, then Available is left in =
an undefined state (see pselect(1)) and we no longer know what descriptors =
we originally set waiting_thread on.=C2=A0 So we can&#39;t check to see if =
one of our descriptors was &quot;stolen&quot; (had waiting_thread set to a =
different value), which would mean the EBADF is spurious and we should cont=
inue running.</div><div dir=3D"auto"><br></div><div dir=3D"auto">Though, I =
guess we could alternatively just assume that EBADF is always spurious.</di=
v><div dir=3D"auto"><br></div><div dir=3D"auto"><div class=3D"gmail_quote g=
mail_quote_container"><blockquote class=3D"gmail_quote" style=3D"margin:0 0=
 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
&gt; BTW, the fact that this problematic case is also possible via signal<b=
r>
&gt; handlers (namely the SIGCHLD handler) is a separate bug introduced by<=
br>
&gt; the refactoring which added fd_callback_info.=C2=A0 It means that with=
 a<br>
&gt; certain interleaving, I think it should be possible to get buggy<br>
&gt; behavior by just using processes without threads.=C2=A0 It&#39;s quite=
 rare<br>
&gt; though and I haven&#39;t been able to figure out what exact path that =
would<br>
&gt; cause a problem.<br>
<br>
Let&#39;s not attempt to solve too many problems at the same time.=C2=A0 Ho=
w to<br>
make SIGCHLD safer when threads are present is an outstanding issue,<br>
and the solution will IMO depend in whether signals are delivered to<br>
the main thread and whether or not the build uses the child signal<br>
pipe.<br></blockquote></div></div><div dir=3D"auto"><br></div><div dir=3D"a=
uto">You misread me: I said SIGCHLD is unsafe even without threads.</div><d=
iv dir=3D"auto"><div class=3D"gmail_quote gmail_quote_container"><blockquot=
e class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px #ccc sol=
id;padding-left:1ex">
</blockquote></div></div></div>

--0000000000005dc3f4063dc28cb0--


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 07:18:44 2025
Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 11:18:44 +0000
Received: from localhost ([127.0.0.1]:60513 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utP2C-0005py-3B
	for submit@debbugs.gnu.org; Tue, 02 Sep 2025 07:18:44 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:43994)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1utP29-0005pl-8p
 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 07:18:42 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1utP23-0007eA-54; Tue, 02 Sep 2025 07:18:35 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=3EsvoDTrpakJXeHmJ3bLk+DwCSBxCLY4p2VHUqJV6Sg=; b=LRAICAfuZ3Qd
 QU35EOgQxmviHX4+85qHzv/gylG/MLJNBGKC6DPIz5prhiYgXkTGoUCt+9a4jkgdOOw6XllElpXAZ
 RqJm87xOHWUdIzp6D1kjO1SvSUEmWjLoMDbiBuc7Gpi4GzhqAKXdzgxdkypl5/FTZlPnijxxn7zae
 B7ix97S59YRc9NoXzBKc+8AZql6xtnCKoCZq/244/yJ8kOBwCLddTwKR3f1xZaLFZIggxz1muVXoX
 oHjNTp+5JukA9C2MPYrvs/7La4MBVW0CaAzTgbr0m9QT+2TtfsTI8pWufXQOAMwsVWUGveZZudYsQ
 klBGsSCRfBpc1HTmlDg04Q==;
Date: Tue, 02 Sep 2025 14:18:31 +0300
Message-Id: <86bjntm7aw.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
 (message from Spencer Baugh on Mon, 1 Sep 2025 15:40:37 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com>
 <ier5xe57vrr.fsf@janestreet.com> <ier34997u4c.fsf_-_@janestreet.com>
 <86zfbhqptt.fsf@gnu.org> <ierldn0olcg.fsf@janestreet.com>
 <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
 <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Date: Mon, 1 Sep 2025 15:40:37 -0400
> Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, Dmitry Gutov <dmitry@gutov.dev>
> 
>  > In particular, Lisp programs should be able to call
>  > delete-process on a process without getting an error just because some
>  > thread is currently calling select and happens to be monitoring that
>  > process.  To emit such an error would make programming with processes
>  > and threads extremely difficult.
> 
>  This is a problem two orders of magnitude smaller than what we are
>  trying to fix here.  So my suggestion is to leave it for later.
>  There's no special reason why Lisp programs "should be able" to delete
>  a process locked to another thread.  If we don't want to signal an
>  error (though I don't see why not), we could perhaps do nothing
>  silently, until we get to dealing with that later.
> 
> Silently doing nothing is also an unacceptably confusing result, in my opinion.

Why?  Ignoring invalid actions is not something new to Emacs.

>  > > Hmm... I don't understand: why do we need to do this?  The calls to
>  > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
>  > > set bits in the Available mask passed top pselect already avoid
>  > > setting bits for descriptors on which some other thread waits.  So
>  > > pselect cannot possibly indicate any output ready on any descriptors
>  > > belonging to other threads, and this loop you suggest to add:
>  > >
>  > >> +      /* Save the fds we're currently waiting_thread for.  */
>  > >> +      fd_set waiting_fds;
>  > >> +      FD_ZERO (&waiting_fds);
>  > >> +      int max_waiting_fds = 0;
>  > >> +      for (int fd = 0; fd <= max_desc; ++fd)
>  > >> +        {
>  > >> +    if (fd_callback_info[fd].waiting_thread == current_thread)
>  > >> +      {
>  > >> +        FD_SET (fd, &waiting_fds);
>  > >> +        max_waiting_fds = fd;
>  > >> +      }
>  > >> +  }
>  > >
>  > > should be unnecessary.  Or what am I missing?
>  > 
>  > That's exactly the problem.  A file descriptor for which we set
>  > waiting_thread can have delete_read_fd or delete_write_fd called on it
>  > by another thread, or by a signal handler, while we're waiting in
>  > pselect.
> 
>  We need to prevent that from happening, not to go along with it.  But
>  anyway, I don't see why you need another fd_set for this.  Why not
>  just look at the descriptors in the Available set?
> 
> The Available set is cleared by the select call, and afterwards only contains descriptors which has events.  In
> that case we could just look at the descriptors in Available and check waiting_thread for each of them.

Yes, and we don't care about descriptors that are not ready to be
read.

> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
> know what descriptors we originally set waiting_thread on.

In that case pselect returns a negative result, no?  And if that
happens, we don't examine the descriptors at all, right?  So why is
this situation a problem we need to handle with this additional fd_set?

> Though, I guess we could alternatively just assume that EBADF is always spurious.

Assume and do what?

>  > BTW, the fact that this problematic case is also possible via signal
>  > handlers (namely the SIGCHLD handler) is a separate bug introduced by
>  > the refactoring which added fd_callback_info.  It means that with a
>  > certain interleaving, I think it should be possible to get buggy
>  > behavior by just using processes without threads.  It's quite rare
>  > though and I haven't been able to figure out what exact path that would
>  > cause a problem.
> 
>  Let's not attempt to solve too many problems at the same time.  How to
>  make SIGCHLD safer when threads are present is an outstanding issue,
>  and the solution will IMO depend in whether signals are delivered to
>  the main thread and whether or not the build uses the child signal
>  pipe.
> 
> You misread me: I said SIGCHLD is unsafe even without threads.

Please describe how.


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 09:38:27 2025
Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 13:38:28 +0000
Received: from localhost ([127.0.0.1]:33720 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utRDO-00047K-Ui
	for submit@debbugs.gnu.org; Tue, 02 Sep 2025 09:38:27 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:46961)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1utRDK-00046n-GH
 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 09:38:23 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
In-Reply-To: <86bjntm7aw.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Sep
 2025 14:18:31 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
 <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
 <86bjntm7aw.fsf@gnu.org>
Date: Tue, 02 Sep 2025 09:38:15 -0400
Message-ID: <ierv7m1m0u0.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756820295;
 bh=/GVeROQ20HPMbm6tHn3GvxCNKE8SN5BQeTZPunZC2P4=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=uygAzPTvAssN161KRXtPBMiwK5NcDbJMsAGDjj9ZtK7akoXaFOJzYseX+b1RJJ6ss
 3jDh3UN8+bcItXwMjSjXIEXhS5EWc3X/P5CcTHOd+xZrBKTVb0wr5o2hXsVEFDm3zi
 wc72YlX4aHFJK0Y0U+qHjE+xVhlnuw0KOu7FfBHa2GYxsrFE9hNGOcrIgWbmnGoeU5
 Gfhe345ZnRdVnkQsaVZ0sItoZ3ou8sDMTZzt+xmtEx9pV1YMF8GkZ7ZY1Lg3lFuE72
 xOqACYq5q8+1sqxmyMM/3z4LedXY7/vvYVRGzb/upQPFAVno+WyarTVqeXzvW/+Yoy
 u3sSWxAz65MMA==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:
>>  > > Hmm... I don't understand: why do we need to do this?  The calls to
>>  > > compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
>>  > > set bits in the Available mask passed top pselect already avoid
>>  > > setting bits for descriptors on which some other thread waits.  So
>>  > > pselect cannot possibly indicate any output ready on any descriptors
>>  > > belonging to other threads, and this loop you suggest to add:
>>  > >
>>  > >> +      /* Save the fds we're currently waiting_thread for.  */
>>  > >> +      fd_set waiting_fds;
>>  > >> +      FD_ZERO (&waiting_fds);
>>  > >> +      int max_waiting_fds = 0;
>>  > >> +      for (int fd = 0; fd <= max_desc; ++fd)
>>  > >> +        {
>>  > >> +    if (fd_callback_info[fd].waiting_thread == current_thread)
>>  > >> +      {
>>  > >> +        FD_SET (fd, &waiting_fds);
>>  > >> +        max_waiting_fds = fd;
>>  > >> +      }
>>  > >> +  }
>>  > >
>>  > > should be unnecessary.  Or what am I missing?
>>  > 
>>  > That's exactly the problem.  A file descriptor for which we set
>>  > waiting_thread can have delete_read_fd or delete_write_fd called on it
>>  > by another thread, or by a signal handler, while we're waiting in
>>  > pselect.
>> 
>>  We need to prevent that from happening, not to go along with it.  But
>>  anyway, I don't see why you need another fd_set for this.  Why not
>>  just look at the descriptors in the Available set?
>> 
>> The Available set is cleared by the select call, and afterwards only contains descriptors which has events.  In
>> that case we could just look at the descriptors in Available and check waiting_thread for each of them.
>
> Yes, and we don't care about descriptors that are not ready to be
> read.

Right.

>> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
>> know what descriptors we originally set waiting_thread on.
>
> In that case pselect returns a negative result, no?  And if that
> happens, we don't examine the descriptors at all, right?  So why is
> this situation a problem we need to handle with this additional fd_set?

Because we currently call emacs_abort when we get EBADF:

if (nfds < 0)
  {
    if (xerrno == EINTR)
      no_avail = 1;
    else if (xerrno == EBADF)
      emacs_abort ();
    else
      report_file_errno ("Failed select", Qnil, xerrno);
  }

>> Though, I guess we could alternatively just assume that EBADF is always spurious.
>
> Assume and do what?

Follow one of the other two branches in the above snippet of code:

- Treat EBADF the same way we do EINTR: there were no events on any fds, so
  we continue running wait_reading_process_output.

- Treat EBADF the same way we treat other errno values: signal an error.
  That would at least be better than calling emacs_abort.

>>  > BTW, the fact that this problematic case is also possible via signal
>>  > handlers (namely the SIGCHLD handler) is a separate bug introduced by
>>  > the refactoring which added fd_callback_info.  It means that with a
>>  > certain interleaving, I think it should be possible to get buggy
>>  > behavior by just using processes without threads.  It's quite rare
>>  > though and I haven't been able to figure out what exact path that would
>>  > cause a problem.
>> 
>>  Let's not attempt to solve too many problems at the same time.  How to
>>  make SIGCHLD safer when threads are present is an outstanding issue,
>>  and the solution will IMO depend in whether signals are delivered to
>>  the main thread and whether or not the build uses the child signal
>>  pipe.
>> 
>> You misread me: I said SIGCHLD is unsafe even without threads.
>
> Please describe how.

1. Call wait_reading_process_output
2. wait_reading_process_output calls compute_input_wait_mask, including
   SOME-FD in the mask.
3. Receive SIGCHLD, calling handle_child_signal
4. handle_child_signal calls delete_read_fd for SOME-FD
5. Signal handler finishes
6. wait_reading_process_output calls select and returns events for
   SOME-FD, even though SOME-FD has been deleted from fd_callback_info
   and the process associated with it has died.  It seems possible that
   this can cause an issue in some code path.

This couldn't happen before the refactoring which added
fd_callback_info, because handle_child_signal directly mutated the
fd_set which is passed to pselect.


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 10:40:15 2025
Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 14:40:16 +0000
Received: from localhost ([127.0.0.1]:35363 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utSBB-0001qW-Fh
	for submit@debbugs.gnu.org; Tue, 02 Sep 2025 10:40:15 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:46420)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1utSB3-0001lX-B1
 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 10:40:09 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1utSAr-0002If-Fr; Tue, 02 Sep 2025 10:39:54 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=nUxB0KzW8g+efBzs3LBuf7v8X2SJvK2Uzzh5l3zb+qU=; b=NnOZNbB25iA9
 2BYDQXi+qJSmUZUHNGRae3AV5UqXiB6ailMvYFPt6+sx4xyMvCb1CaBIZMVhehGNHytVITBMLg1ZS
 OZEcwLTNp0Xk2nYVauDY66ZPiD8EjPvea20fsXW5gz2K1pJuswpHD8W7loJMcAHWhzkPTFrwRr1aE
 g15aus96FrJjRtW8iAslsExmXCsQs/yv4AdPmAEX7pjdXFK+6kPbC7cvxkpQGWh+3fH48lMOC4c5e
 vz/SYkED23qkXs63xuVVMLxcAEp6rXBgQflmPqou633kadMEko3R9YWQ1fBAtQVz8lMyFjfNF4iEm
 lTOH4Vj2RnrnqpDP2Ee7CQ==;
Date: Tue, 02 Sep 2025 17:39:39 +0300
Message-Id: <86y0qwlxzo.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ierv7m1m0u0.fsf@janestreet.com> (message from Spencer Baugh on
 Tue, 02 Sep 2025 09:38:15 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
 <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
 <86bjntm7aw.fsf@gnu.org> <ierv7m1m0u0.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Tue, 02 Sep 2025 09:38:15 -0400
> 
> >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
> >> know what descriptors we originally set waiting_thread on.
> >
> > In that case pselect returns a negative result, no?  And if that
> > happens, we don't examine the descriptors at all, right?  So why is
> > this situation a problem we need to handle with this additional fd_set?
> 
> Because we currently call emacs_abort when we get EBADF:

So, if the danger of getting EBADF due to threads-and-processes
mishaps is real (I'm saying "if" because I don't think we had bug
reports about that), we should change how we treat EBADF in that case.
Which should probably be a separate discussion anyway.  In any case,
the distinct return value of pselect in that case means that none of
the code which handles other return values should be bothered about
this possibility.

> >> Though, I guess we could alternatively just assume that EBADF is always spurious.
> >
> > Assume and do what?
> 
> Follow one of the other two branches in the above snippet of code:
> 
> - Treat EBADF the same way we do EINTR: there were no events on any fds, so
>   we continue running wait_reading_process_output.
> 
> - Treat EBADF the same way we treat other errno values: signal an error.
>   That would at least be better than calling emacs_abort.

Let's discuss this issue separately, okay?  We are already trying to
cover too many different scenarios, which makes this discussion hard
to follow and the probability of losing focus high.

> >> You misread me: I said SIGCHLD is unsafe even without threads.
> >
> > Please describe how.
> 
> 1. Call wait_reading_process_output
> 2. wait_reading_process_output calls compute_input_wait_mask, including
>    SOME-FD in the mask.
> 3. Receive SIGCHLD, calling handle_child_signal
> 4. handle_child_signal calls delete_read_fd for SOME-FD
> 5. Signal handler finishes
> 6. wait_reading_process_output calls select and returns events for
>    SOME-FD, even though SOME-FD has been deleted from fd_callback_info
>    and the process associated with it has died.  It seems possible that
>    this can cause an issue in some code path.
> 
> This couldn't happen before the refactoring which added
> fd_callback_info, because handle_child_signal directly mutated the
> fd_set which is passed to pselect.

Are you talking about systems that don't use the "child signal pipe"?
Because if that pipe is in use, then the SIGCHLD handler writes to it,
and that will cause pselect to return with that descriptor in
Available, and we then handle the death of the process.  In addition,
delete_read_fd just clears the data from the fd_callback_info of the
descriptor, and the loop that examines the bits in Available is AFAICT
careful enough to test these fields of the callback info before doing
anything because the descriptor was returned by pselect as ready.  So
what kind of problems could have happened in step 6 above?

And once again, this is a separate issue, so let's discuss it
separately.  In this bug, let's focus on what happens in status_notify
and in general when we deactivate/delete a process which exited.  Can
we return to that now?


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 10:59:51 2025
Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 14:59:51 +0000
Received: from localhost ([127.0.0.1]:35597 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utSUA-0003Wg-Je
	for submit@debbugs.gnu.org; Tue, 02 Sep 2025 10:59:51 -0400
Received: from mxout5.mail.janestreet.com ([64.215.233.18]:54347)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <sbaugh@janestreet.com>)
 id 1utSU5-0003W3-Q7
 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 10:59:46 -0400
From: Spencer Baugh <sbaugh@janestreet.com>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
In-Reply-To: <86y0qwlxzo.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 02 Sep
 2025 17:39:39 +0300")
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
 <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
 <86bjntm7aw.fsf@gnu.org> <ierv7m1m0u0.fsf@janestreet.com>
 <86y0qwlxzo.fsf@gnu.org>
Date: Tue, 02 Sep 2025 10:59:39 -0400
Message-ID: <ieriki0nbms.fsf@janestreet.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com;
 s=waixah; t=1756825180;
 bh=BoWAyrOubgQHvj4bFXEbVVhM8ZA+ck1/a+F63bO9td0=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date;
 b=C4I/3NkCFg1OpBy1POMla5gRVOLnujDz422EDP1ClmqGN9L7UBhSLwwEj4Tf5SxEr
 C+yj+v/Sj3wtxdi7GVVH6juVwcMMwRJv0shoEKxefG39MhfkBPnwXCVDDNhjxb91bF
 6z8AcmwjEg+f0qk4sl4m7LZSVXi+m0DzQfSFWSs7lbDTSgAEQcVF8Oxp1PT9xQRA3c
 0oMVCXlDUv9opit0k81o/J2nROrWCEJQfagEJcw9tuJOgzRNXIAtNnJ+fSaYCoa0pS
 wM8G5zumUlcNSBpbfVR8kh0K8jtPJsuGMFlXtU4P9JLUswSkU2a//y87DZ8sGgH3YO
 Fmw5s8U8Cqwow==
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
>> Date: Tue, 02 Sep 2025 09:38:15 -0400
>> 
>> >> But if there was an EBADF, then Available is left in an undefined state (see pselect(1)) and we no longer
>> >> know what descriptors we originally set waiting_thread on.
>> >
>> > In that case pselect returns a negative result, no?  And if that
>> > happens, we don't examine the descriptors at all, right?  So why is
>> > this situation a problem we need to handle with this additional fd_set?
>> 
>> Because we currently call emacs_abort when we get EBADF:
>
> So, if the danger of getting EBADF due to threads-and-processes
> mishaps is real (I'm saying "if" because I don't think we had bug
> reports about that),

It is real.  I have seen it plenty at my site.  I described this issue
in my initial email.  It is the issue I care about solving.  This bug is
my report of the issue.  Solving it is my highest priority.

> we should change how we treat EBADF in that case.  Which should
> probably be a separate discussion anyway.  In any case, the distinct
> return value of pselect in that case means that none of the code which
> handles other return values should be bothered about this possibility.
>
>> >> Though, I guess we could alternatively just assume that EBADF is always spurious.
>> >
>> > Assume and do what?
>> 
>> Follow one of the other two branches in the above snippet of code:
>> 
>> - Treat EBADF the same way we do EINTR: there were no events on any fds, so
>>   we continue running wait_reading_process_output.
>> 
>> - Treat EBADF the same way we treat other errno values: signal an error.
>>   That would at least be better than calling emacs_abort.
>
> Let's discuss this issue separately, okay?  We are already trying to
> cover too many different scenarios, which makes this discussion hard
> to follow and the probability of losing focus high.

It's not a separate issue, it's the entire point of this bug.  So yes,
let's stay focused on this issue and not other issues.

Which of these two options do you think we should go with?

Or there are other options.  For example, we could detect the cause of
the EBADF by remembering the original Available set before calling
pselect, and checking if the fd_callback_info entry for any of those
file descriptors was cleared.

>> >> You misread me: I said SIGCHLD is unsafe even without threads.
>> >
>> > Please describe how.
>> 
>> 1. Call wait_reading_process_output
>> 2. wait_reading_process_output calls compute_input_wait_mask, including
>>    SOME-FD in the mask.
>> 3. Receive SIGCHLD, calling handle_child_signal
>> 4. handle_child_signal calls delete_read_fd for SOME-FD
>> 5. Signal handler finishes
>> 6. wait_reading_process_output calls select and returns events for
>>    SOME-FD, even though SOME-FD has been deleted from fd_callback_info
>>    and the process associated with it has died.  It seems possible that
>>    this can cause an issue in some code path.
>> 
>> This couldn't happen before the refactoring which added
>> fd_callback_info, because handle_child_signal directly mutated the
>> fd_set which is passed to pselect.
>
> Are you talking about systems that don't use the "child signal pipe"?

No.  This is on GNU/Linux.

> Because if that pipe is in use, then the SIGCHLD handler writes to it,
> and that will cause pselect to return with that descriptor in
> Available, and we then handle the death of the process.  In addition,
> delete_read_fd just clears the data from the fd_callback_info of the
> descriptor, and the loop that examines the bits in Available is AFAICT
> careful enough to test these fields of the callback info before doing
> anything because the descriptor was returned by pselect as ready.  So
> what kind of problems could have happened in step 6 above?

I'm not sure.

> And once again, this is a separate issue, so let's discuss it
> separately.

Okay.  Let me just say again that I suspect there are bugs here which
can be triggered by both threads and signal handlers.  We can leave it
at that.


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 02 11:59:43 2025
Received: (at 79334) by debbugs.gnu.org; 2 Sep 2025 15:59:43 +0000
Received: from localhost ([127.0.0.1]:36095 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1utTQ6-0007Dx-RU
	for submit@debbugs.gnu.org; Tue, 02 Sep 2025 11:59:43 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:55112)
 by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.84_2) (envelope-from <eliz@gnu.org>) id 1utTQ2-0007DZ-TX
 for 79334@debbugs.gnu.org; Tue, 02 Sep 2025 11:59:40 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eliz@gnu.org>)
 id 1utTPu-0001RR-NZ; Tue, 02 Sep 2025 11:59:30 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date:
 mime-version; bh=sng6gW7pioCvoWFjt62CZcwRHn9LYqx7aZJ+SCQAkPM=; b=XNT6UMi1CvfH
 Mq+WnDXfbHi4Ge6nv/VFdLp2/rigdB9bTKYP5eyAGe66xZY0/iRq2hpCL+jP74y4AHNxYCyPrKpVv
 mpK0/ToH6zMzrqNINnXVaVwQqeGIvrGhDMb1v6GXEXlpnSvbK2gmv5KbFR9ejY+cFrye0pSMuuT4D
 kHMp2u8QYzaRna07pUQfmhZHj6IVR4FP60DHZLmzUCcR2BKnCKdXZIDCKRnaXOIPJzLr/WfSpzvLf
 ET6UEGm401ENzBiQM+XHK3VHXSXpAKal8X+NTjeHjC4edo5mVl+1t+jxRxN6ymT7N3dKgSPt/DgJb
 z/eaC7QqXeozdgX/61ATwQ==;
Date: Tue, 02 Sep 2025 18:59:28 +0300
Message-Id: <86ms7cluan.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Spencer Baugh <sbaugh@janestreet.com>
In-Reply-To: <ieriki0nbms.fsf@janestreet.com> (message from Spencer Baugh on
 Tue, 02 Sep 2025 10:59:39 -0400)
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
References: <ierwm6n6tno.fsf@janestreet.com> <86cy8es4bo.fsf@gnu.org>
 <ierqzwu6ztl.fsf@janestreet.com> <86a53is0zw.fsf@gnu.org>
 <ier8qj26nlj.fsf@janestreet.com> <ier5xe57vrr.fsf@janestreet.com>
 <ier34997u4c.fsf_-_@janestreet.com> <86zfbhqptt.fsf@gnu.org>
 <ierldn0olcg.fsf@janestreet.com> <86jz2kpy1w.fsf@gnu.org>
 <iera53eno2q.fsf@janestreet.com> <86ldmym0ux.fsf@gnu.org>
 <CAO=BR8PurEeiexhE-_enihOrJBoXSBQ0pQyH9eiC-SYe+HZtOg@mail.gmail.com>
 <86bjntm7aw.fsf@gnu.org> <ierv7m1m0u0.fsf@janestreet.com>
 <86y0qwlxzo.fsf@gnu.org> <ieriki0nbms.fsf@janestreet.com>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79334
Cc: 79334@debbugs.gnu.org, eggert@cs.ucla.edu, dmitry@gutov.dev
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: 79334@debbugs.gnu.org,  eggert@cs.ucla.edu,  dmitry@gutov.dev
> Date: Tue, 02 Sep 2025 10:59:39 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Because we currently call emacs_abort when we get EBADF:
> >
> > So, if the danger of getting EBADF due to threads-and-processes
> > mishaps is real (I'm saying "if" because I don't think we had bug
> > reports about that),
> 
> It is real.  I have seen it plenty at my site.  I described this issue
> in my initial email.  It is the issue I care about solving.  This bug is
> my report of the issue.  Solving it is my highest priority.

OK, so let's see if fixing deletion of processes fixes that problem.
If you see that in your usage, it should be relatively easy to test.
Then we can take it from there.

> > Let's discuss this issue separately, okay?  We are already trying to
> > cover too many different scenarios, which makes this discussion hard
> > to follow and the probability of losing focus high.
> 
> It's not a separate issue, it's the entire point of this bug.  So yes,
> let's stay focused on this issue and not other issues.
> 
> Which of these two options do you think we should go with?

I would like fix the process deletion and closing of the descriptors
first, because that can definitely be one reason for EBADF.  We can
then discuss additional measures.  From where I stand, it is best to
completely eliminate EBADF due to these problems; if we succeed, the
code which deals with EBADF could maybe be left as it is, since this
should then be some spurious problem with no known reasons.  If it
turns out that we cannot eliminate EBADF completely by fixing the
process-deactivation code, then yes, we will need to do something when
we get EBADF as result.

So what happened to the changes in the code that handles deactivation
of a process? can we return to that and arrive at some agreed-upon
solution?  AFAIR, last time I asked you why not fix these problems
inside status_notify (and the few other places which use
FOR_EACH_PROCESS).

> > Because if that pipe is in use, then the SIGCHLD handler writes to it,
> > and that will cause pselect to return with that descriptor in
> > Available, and we then handle the death of the process.  In addition,
> > delete_read_fd just clears the data from the fd_callback_info of the
> > descriptor, and the loop that examines the bits in Available is AFAICT
> > careful enough to test these fields of the callback info before doing
> > anything because the descriptor was returned by pselect as ready.  So
> > what kind of problems could have happened in step 6 above?
> 
> I'm not sure.

That's my reading of the code.  Of course, I could be wrong, but then
we need a description of the scenario details which causes such
problems, including the control and data flow which make it possible.

> > And once again, this is a separate issue, so let's discuss it
> > separately.
> 
> Okay.  Let me just say again that I suspect there are bugs here which
> can be triggered by both threads and signal handlers.  We can leave it
> at that.

Yes, I agree that signals add a non-trivial complexity to this stuff,
due to their asynchronous nature and their ability to run code on the
"wrong" thread.