GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

Full log


Message #44 received at 79334 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
Subject: Re: bug#79334: [PATCH] Don't release thread select lock
 unnecessarily, [PATCH] Defer closing file descriptors used by other
 threads
Date: Fri, 29 Aug 2025 16:25:39 -0400
[Message part 1 (text/plain, inline)]
Spencer Baugh <sbaugh <at> janestreet.com> writes:
> Spencer Baugh <sbaugh <at> janestreet.com> writes:
>> While doing this, I had another thought...
>>
>> Do you know if we have any code which handles the fact that
>> delete-process might be called on a process while another thread is
>> calling pselect on its fds?
>>
>> I don't think we do, which suggests that might also be problematic.
>> I'll work on a fix which encompasses that as well.
>
> OK, I think this patch fixes the issue.
>
> Many different pieces of code in Emacs can close file descriptors.  We
> need to be generically careful to not close file descriptors which some
> other thread is currently waiting on.  In my patch, we do this by
> letting the waiting_thread itself close those file descriptors, after it
> returns from select.
>
> Unfortunately I don't have a reliable test which produces the failure
> case, but in this patch I added a fprintf when we close a deferred fd.
> This prints whenever some thread would have closed an fd that another
> thread was waiting on.  If I run the following program (with emacs -Q
> --batch):
>
> (defun my-break-thread ()
>   (let ((proc (make-process
>                :name "foo"
>                :command '("sleep" "1"))))
>     (while (process-live-p proc)
>       (accept-process-output proc 0.05))))
> (while t
>   (make-thread #'my-break-thread "thread1")
>   (thread-join (make-thread #'my-break-thread "thread2")))
>
> I get about one print every 10 seconds.  Each print is a problem case
> which has been averted, so I do think this patch fixes the bug.
>
> (Obviously, we should remove the fprintf before applying the change)

Of course, I immediately found a bug in my patch (I wasn't clearing
waiting_thread after closing a deferred fd).  Fixed in the attached.

In related news: running process-tests.el in a loop also triggers my
"Closing deferred fd" message.  Which suggests that src/process-tests.el
was probably also suffering from bugs in this area, which could have
been causing hangs or flaky failures.  Which should now hopefully be
fixed.

[0001-Defer-closing-file-descriptors-used-by-other-threads.patch (text/x-patch, inline)]
From 9e46aa50f5de7d719a6e427fe44bb6b8b8e226fc Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 29 Aug 2025 15:39:00 -0400
Subject: [PATCH] Defer closing file descriptors used by other threads

* src/process.c (other_thread_is_waiting_for): Add.
(clear_fd_callback_data): Don't clear WAITING_FOR_CLOSE_FD.
(close_process_fd): Set WAITING_FOR_CLOSE_FD for fds with a
waiting_thread.
(wait_reading_process_output): Close fds with
WAITING_FOR_CLOSE_FD. (bug#79334)
---
 src/process.c | 46 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 4 deletions(-)

diff --git a/src/process.c b/src/process.c
index b33bebb9a35..9180b92c61f 100644
--- a/src/process.c
+++ b/src/process.c
@@ -448,7 +448,9 @@ make_lisp_proc (struct Lisp_Process *p)
   /* This descriptor refers to a process.  */
   PROCESS_FD = 8,
   /* A non-blocking connect.  Only valid if FOR_WRITE is set.  */
-  NON_BLOCKING_CONNECT_FD = 16
+  NON_BLOCKING_CONNECT_FD = 16,
+  /* This fd is waiting to be closed. */
+  WAITING_FOR_CLOSE_FD = 32,
 };
 
 static struct fd_callback_data
@@ -466,14 +468,30 @@ make_lisp_proc (struct Lisp_Process *p)
   struct thread_state *waiting_thread;
 } fd_callback_info[FD_SETSIZE];
 
+static bool
+other_thread_is_waiting_for (struct fd_callback_data* elem)
+{
+  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
+}
+
 static void
 clear_fd_callback_data(struct fd_callback_data* elem)
 {
   elem->func = NULL;
   elem->data = NULL;
-  elem->flags = 0;
   elem->thread = NULL;
-  elem->waiting_thread = NULL;
+  if (other_thread_is_waiting_for (elem))
+    {
+      /* Preserve WAITING_FOR_CLOSE_FD; see process_close_fd.  Leaving
+	 flags non-zero means recompute_max_desc won't reduce max_desc
+	 past this element. */
+      elem->flags &= WAITING_FOR_CLOSE_FD;
+    }
+  else
+    {
+      elem->flags = 0;
+      elem->waiting_thread = NULL;
+    }
 }
 
 
@@ -2113,7 +2131,11 @@ close_process_fd (int *fd_addr)
   if (0 <= fd)
     {
       *fd_addr = -1;
-      emacs_close (fd);
+      if (other_thread_is_waiting_for (&fd_callback_info[fd]))
+	/* Let waiting_thread handle closing the fd. */
+	fd_callback_info[fd].flags = WAITING_FOR_CLOSE_FD;
+      else
+	emacs_close (fd);
     }
 }
 
@@ -5816,6 +5838,22 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
       /* Make C-g and alarm signals set flags again.  */
       clear_waiting_for_input ();
 
+      /* Close fds which other threads wanted to close.  */
+      for (int fd = 0; fd <= max_desc; ++fd)
+	{
+	  if (fd_callback_info[fd].waiting_thread == current_thread
+	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
+	    {
+	      fprintf (stderr, "closing deferred fd %d\n", fd);
+	      emacs_close (fd);
+	      /* Ignore any events which happened on this fd. */
+	      FD_CLR (fd, &Available);
+	      FD_CLR (fd, &Writeok);
+	      clear_fd_callback_data (&fd_callback_info[fd]);
+	    }
+	}
+      recompute_max_desc ();
+
       /*  If we woke up due to SIGWINCH, actually change size now.  */
       do_pending_window_change (0);
 
-- 
2.43.7


This bug report was last modified 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.