GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

Full log


Message #53 received at 79334 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
Date: Sat, 30 Aug 2025 11:55:27 -0400
[Message part 1 (text/plain, inline)]
(I'm responding to both your emails with review feedback here)

Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79334 <at> debbugs.gnu.org,  eggert <at> cs.ucla.edu,  dmitry <at> gutov.dev
>> Date: Fri, 29 Aug 2025 16:25:39 -0400
>> +      for (int fd = 0; fd <= max_desc; ++fd)
>> +	{
>> +	  if (fd_callback_info[fd].waiting_thread == current_thread
>> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
>                                              ^^^^^^^^^^^^^^^^^^^^^^^
> Shouldn't this test that the WAITING_FOR_CLOSE_FD bit is set, not that
> it is the _only_ bit set?  Even if the code is correct today, it might
> not be future-proof if we add some additional flags at some point.

True.  Fixed.

>> +	      /* Ignore any events which happened on this fd. */
>> +	      FD_CLR (fd, &Available);
>> +	      FD_CLR (fd, &Writeok);
>
> Is this wise?  How can we be sure that these events were already
> handled (presumably, in another thread)?  If they were not, we will
> lose events.  What will happen if we don't ignore them here?

Any events which might appear at this point were already events we were
going to lose by closing this file descriptor.  For example, if a
process produces output just as we do delete-process on it, we never
read that output.  The only change is that we now have to explicitly
ignore those events; before, we silently dropped them at the time we
closed the file descriptor.

>> Many different pieces of code in Emacs can close file descriptors.  We
>> need to be generically careful to not close file descriptors which some
>> other thread is currently waiting on.  In my patch, we do this by
>> letting the waiting_thread itself close those file descriptors, after it
>> returns from select.
>
> Not literally "when returning from pselect", I hope, but when the
> process is being deleted _and_ the waiting thread returns from
> pselect, right?  IOW, we don't close the descriptor each time pselecft
> returns in the right thread, then reopen it before the next call to
> pselect, because that won't work.

Right.

>> +static bool
>> +other_thread_is_waiting_for (struct fd_callback_data* elem)
>> +{
>> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
>> +}
>
> This should also check that the waiting thread is still alive.  If it
> isn't, it's okay to close the descriptor.  Otherwise, descriptors
> belonging to threads that exited might never be closed.

waiting_thread always points to a living thread, because it's only set
while a thread is inside wait_reading_process_output.

(In fact, when a thread sees waiting_thread set to a value which is
neither NULL nor current_thread, it can only ever point to a thread
which is currently inside thread_select)

>> @@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
>>    if (0 <= fd)
>>      {
>>        *fd_addr = -1;
>> -      emacs_close (fd);
>> +      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
>> +	/* Let waiting_thread handle closing the fd. */
>                                                    ^^
> Style: 2 spaces between the period and "*/".

Fixed.

>> +      /* Close fds which other threads wanted to close.  */
>> +      for (int fd = 0; fd <= max_desc; ++fd)
>> +	{
>> +	  if (fd_callback_info[fd].waiting_thread == current_thread
>> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
>> +	    {
>> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
>> +	      emacs_close (fd);
>> +	      /* Ignore any events which happened on this fd. */
>> +	      FD_CLR (fd, &Available);
>> +	      FD_CLR (fd, &Writeok);
>> +	      fd_callback_info[fd].flags = 0;
>
> Why do it here and not when we deactivate processes?

Because that would close file descriptors that another thread is
currently selecting on, causing EBADF.

> Until now we never closed any descriptors inside
> wait_reading_process_output.

Just to be clear, we can call deactivate_process from
wait_reading_process_output, through status_notify, through process
filters, or other similar.  That closes descriptors.

> More generally, why this design and not a simpler change which only
> deactivates processes whose I/O is waited by the current thread, as
> discussed previously?

I looked into that, but I determined that it's too complicated.  It
would require us to keep track of a new kind of process state: a process
which has been deleted, but hasn't been deactivated yet.  This would
need to be supported for each kind of process, and we'd need to record
new kinds of state to finish the deactivation later.

It's simpler to keep the ability to delete any process at any time, and
have that fully deactivate the process as it does today.  Doing that
only causes one issue: if another thread is selecting on the process's
fds, that thread will break if it tries to handle events on the fds, or
the thread will simply get EBADF from select.  So, we detect that by
checking waiting_thread, and marking the fds so that the other thread
knows not to touch them, and to instead just close them to finish
deleting the process.

> I think this alternative would be easier to understand and reason
> about, because (as established in another discussion)
> wait_reading_process_output frequently loops more than we expect it
> to.

I'm not sure what you mean by that, why does wait_reading_process_output
looping more than we expect cause issues?  I don't think it would.

> Looking at all the callers of close_process_fd, it sounds like it's
> too low-level to do this.  For example, it is called from
> create_process (including in the forked child) to close the unneeded
> ends of the pipe, where we probably don't want this.  And
> clear_fd_callback_data is also used for the keyboard input descriptor,
> which is probably also not relevant.

True.

I looked into fixing these, and that suggested to me an altogether
simpler approach.  We can keep the logic entirely contained in
wait_reading_process_output, just adding a bit of code around the select
call.

And we don't need to defer deleting the fds at all: just recognize when
it's happened, and be careful to not touch those fds afterwards.

Attached.

[0001-Ignore-descriptors-deleted-during-select.patch (text/x-patch, inline)]
From b37e71f47913e1011c4a7857e166e78eddc2a184 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Sat, 30 Aug 2025 11:42:19 -0400
Subject: [PATCH] Ignore descriptors deleted during select

* src/process.c (wait_reading_process_output): Save the fds
we're waiting on and check that we're still waiting after
select. (bug#79334)
---
 src/process.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/src/process.c b/src/process.c
index 1612248a11d..11af2f55f63 100644
--- a/src/process.c
+++ b/src/process.c
@@ -5643,6 +5643,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	  nfds = read_kbd ? 0 : 1;
 	  no_avail = 1;
 	  FD_ZERO (&Available);
+	  xerrno = 0;
 	}
       else
 	{
@@ -5756,6 +5757,18 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 	      && timespec_cmp (short_timeout, timeout) < 0)
 	    timeout = short_timeout;
 #endif
+      /* Save the fds we're currently waiting_thread for.  */
+      fd_set waiting_fds;
+      FD_ZERO (&waiting_fds);
+      int max_waiting_fds = 0;
+      for (int fd = 0; fd <= max_desc; ++fd)
+        {
+	  if (fd_callback_info[fd].waiting_thread == current_thread)
+	    {
+	      FD_SET (fd, &waiting_fds);
+	      max_waiting_fds = fd;
+	    }
+	}
 
 	  /* Android requires using a replacement for pselect in
 	     android.c to poll for events.  */
@@ -5809,10 +5822,30 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
 		    FD_SET (channel, &Available);
 	    }
 #endif
+	  xerrno = errno;
+	  /* Check if any of waiting_fds was closed by another thread or
+	     a signal handler. */
+	  for (int fd = 0; fd <= max_waiting_fds; ++fd)
+	    {
+	      if (FD_ISSET (fd, &waiting_fds)
+		  && fd_callback_info[fd].waiting_thread != current_thread)
+		{
+		  /* Ignore any events which we happened to see on this
+		     fd before it was closed; we can't do anything for
+		     it now, since it's closed.  */
+		  FD_CLR (fd, &Available);
+		  FD_CLR (fd, &Writeok);
+		  /* If we got an EBADF and no events, this is probably
+		     why; clear it so we can try again.  */
+		  if (nfds < 0 && xerrno == EBADF)
+		    {
+		      nfds = 0;
+		      xerrno = 0;
+		    }
+		}
+	    }
 	}
 
-      xerrno = errno;
-
       /* Make C-g and alarm signals set flags again.  */
       clear_waiting_for_input ();
 
-- 
2.43.7


This bug report was last modified 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.