GNU bug report logs - #79201
30.1.90; set-process-thread can permanently break fd_callback_info slots

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Fri, 8 Aug 2025 17:07:02 UTC

Severity: normal

Found in version 30.1.90

To reply to this bug, email your comments to 79201 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Fri, 08 Aug 2025 17:07:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 08 Aug 2025 17:07:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Cc: dmitry <at> gutov.dev, app-emacs-dev <at> janestreet.com,
 John Wiegley <jwiegley <at> gmail.com>
Subject: 30.1.90; set-process-thread can permanently break fd_callback_info
 slots
Date: Fri, 08 Aug 2025 13:06:22 -0400
Using set-process-thread sets the fd_callback_info[fd].thread slot,
which is not cleared on process exit.  As a result
fd_callback_info[fd].thread can contain a dangling pointer to a dead
thread, which means that fd_callback_info[fd] will be permanently
broken, and any new process created which uses that slot will also be
broken.

Reproduction: the following code will hang forever when the
(delete-process proc1) line is present, but terminates just fine when
that line is deleted.

(setq my-thread (make-thread (lambda () (while t (sleep-for 60)))))
(setq proc1
      (make-process
       :name "proc1"
       :command '("sleep" "inf")))
; Set the process's thread to my-thread
(set-process-thread proc1 my-thread)
; Delete the process so fd_callback_info[fd].thread doesn't get cleared
; on thread exit.
(delete-process proc1)
; Kill my-thread, just for completeness.
(thread-signal my-thread 'error nil)
; Start up a seemingly completely unrelated process; Emacs will never be
; able to read from this process because it's reusing
; fd_callback_info[fd] and .thread is pointing to a dead thread.
(setq proc2
      (make-process
       :name "proc2"
       :command '("sh" "-c" "echo hi; sleep inf")))
(message "waiting on proc2")
(while (null (accept-process-output proc2)))

In fact, SIGINTing this process while it's hanging will cause Emacs to
segfault.  It looks like current_thread is NULL?  Not sure why that
happens, possibly an unrelated bug.

Anyway, probably we should be setting fd_callback_info[fd].thread to
NULL also when the process exits.  But, I suggest we should further also
set it to NULL (and also waiting_thread) when we start using a
fd_callback_info[fd] slot, e.g. in add_read_fd.  That avoids the
possibility of permanent contamination of fd_callback_info slots, which
I think is possible in some other ways, though I haven't been able to
reproduce it yet...


In GNU Emacs 30.1.90 (build 8, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2025-08-06 built on
 igm-qws-u22796a
Repository revision: a7392a6cea9cb7d77fab044a8e8cbcb012b5d6c7
Repository branch: emacs-30
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure --with-x-toolkit=lucid --without-gpm --without-gconf
 --without-gsettings --without-selinux --without-imagemagick
 --with-modules --with-gif=no --with-cairo --with-rsvg
 --without-compress-install --with-tree-sitter
 --with-native-compilation=aot
 PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS HARFBUZZ JPEG LIBSYSTEMD LIBXML2
MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER X11 XDBE XIM
XINPUT2 XPM LUCID ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Fri, 08 Aug 2025 18:23:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "John Wiegley" <johnw <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org, app-emacs-dev <at> janestreet.com
Subject: Re: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Fri, 08 Aug 2025 11:22:23 -0700
>>>>> Spencer Baugh <sbaugh <at> janestreet.com> writes:

> (while (null (accept-process-output proc2)))

Based on the what the manual says, shouldn’t this be?

    (while (accept-process-output proc2))

> Anyway, probably we should be setting fd_callback_info[fd].thread to
> NULL also when the process exits.  But, I suggest we should further also
> set it to NULL (and also waiting_thread) when we start using a
> fd_callback_info[fd] slot, e.g. in add_read_fd.  That avoids the
> possibility of permanent contamination of fd_callback_info slots, which
> I think is possible in some other ways, though I haven't been able to
> reproduce it yet...

This sounds like an important fix to me. I haven’t run into this problem yet
myself, but the logic above tracks.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Fri, 08 Aug 2025 20:35:02 GMT) Full text and rfc822 format available.

Message #11 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: "John Wiegley" <johnw <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Fri, 08 Aug 2025 16:34:10 -0400
[Message part 1 (text/plain, inline)]
"John Wiegley" <johnw <at> gnu.org> writes:

>>>>>> Spencer Baugh <sbaugh <at> janestreet.com> writes:
>
>> (while (null (accept-process-output proc2)))
>
> Based on the what the manual says, shouldn’t this be?
>
>     (while (accept-process-output proc2))

No, I'm trying to run accept-process-output repeatedly until proc2
produces any output at all, which will cause accept-process-output to
return non-nil.

>> Anyway, probably we should be setting fd_callback_info[fd].thread to
>> NULL also when the process exits.  But, I suggest we should further also
>> set it to NULL (and also waiting_thread) when we start using a
>> fd_callback_info[fd] slot, e.g. in add_read_fd.  That avoids the
>> possibility of permanent contamination of fd_callback_info slots, which
>> I think is possible in some other ways, though I haven't been able to
>> reproduce it yet...
>
> This sounds like an important fix to me. I haven’t run into this problem yet
> myself, but the logic above tracks.

Here's a patch to fix it by initializing these fields.

[0001-Initialize-fd_callback_info-thread-variables-on-reus.patch (text/x-patch, inline)]
From d97986f4a46a229c3a19a4e554e16b2104018388 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Fri, 8 Aug 2025 16:33:20 -0400
Subject: [PATCH] Initialize fd_callback_info thread variables on reuse

By using set-process-thread and then delete-process,
fd_callback_info[fd].thread will contain a dangling pointer to a
dead thread, making that fd number permanently broken and Emacs
unable to read output from it.

Handle all such issues by initializing .thread and
.waiting_thread to NULL at the time we start using an
fd_callback_info[fd] slot.

* src/process.c (initialize_fd_callback_info): Add.
(add_read_fd, add_process_read_fd, add_write_fd)
(add_non_blocking_write_fd, add_keyboard_wait_descriptor): Call
initialize_fd_callback_info. (bug#79201)
---
 src/process.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/process.c b/src/process.c
index e61ec425f7e..d5d0db1943e 100644
--- a/src/process.c
+++ b/src/process.c
@@ -467,6 +467,15 @@ make_lisp_proc (struct Lisp_Process *p)
 } fd_callback_info[FD_SETSIZE];
 
 
+static void
+initialize_fd_callback_info(int fd)
+{
+  eassert (0 <= fd && fd < FD_SETSIZE);
+
+  fd_callback_info[fd].thread = NULL;
+  fd_callback_info[fd].waiting_thread = NULL;
+}
+
 /* Add a file descriptor FD to be monitored for when read is possible.
    When read is possible, call FUNC with argument DATA.  */
 
@@ -475,7 +484,6 @@ add_read_fd (int fd, fd_callback func, void *data)
 {
   add_keyboard_wait_descriptor (fd);
 
-  eassert (0 <= fd && fd < FD_SETSIZE);
   fd_callback_info[fd].func = func;
   fd_callback_info[fd].data = data;
 }
@@ -490,14 +498,13 @@ add_non_keyboard_read_fd (int fd, fd_callback func, void *data)
 static void
 add_process_read_fd (int fd)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  initialize_fd_callback_info(fd);
   eassert (fd_callback_info[fd].func == NULL);
 
   fd_callback_info[fd].flags &= ~KEYBOARD_FD;
   fd_callback_info[fd].flags |= FOR_READ;
   if (fd > max_desc)
     max_desc = fd;
-  eassert (0 <= fd && fd < FD_SETSIZE);
   fd_callback_info[fd].flags |= PROCESS_FD;
 }
 
@@ -522,7 +529,7 @@ delete_read_fd (int fd)
 void
 add_write_fd (int fd, fd_callback func, void *data)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  initialize_fd_callback_info(fd);
 
   fd_callback_info[fd].func = func;
   fd_callback_info[fd].data = data;
@@ -534,7 +541,7 @@ add_write_fd (int fd, fd_callback func, void *data)
 static void
 add_non_blocking_write_fd (int fd)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  initialize_fd_callback_info(fd);
   eassert (fd_callback_info[fd].func == NULL);
 
   fd_callback_info[fd].flags |= FOR_WRITE | NON_BLOCKING_CONNECT_FD;
@@ -8296,7 +8303,7 @@ remove_slash_colon (Lisp_Object name)
 add_keyboard_wait_descriptor (int desc)
 {
 #ifdef subprocesses /* Actually means "not MSDOS".  */
-  eassert (desc >= 0 && desc < FD_SETSIZE);
+  initialize_fd_callback_info(desc);
   fd_callback_info[desc].flags &= ~PROCESS_FD;
   fd_callback_info[desc].flags |= (FOR_READ | KEYBOARD_FD);
   if (desc > max_desc)
-- 
2.39.3


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Sat, 09 Aug 2025 10:34:02 GMT) Full text and rfc822 format available.

Message #14 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90;
 set-process-thread can permanently break fd_callback_info slots
Date: Sat, 09 Aug 2025 13:33:40 +0300
> Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, app-emacs-dev <at> janestreet.com
> Date: Fri, 08 Aug 2025 16:34:10 -0400
> From:  Spencer Baugh via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> Here's a patch to fix it by initializing these fields.

Thanks, but I don't think this is right.  First,
update_processes_for_thread_death clears the .thread member of
fd_callback_info of those processes that were locked to the dying
thread.  Second, wait_reading_process_output_unwind clears the
.waiting_thread member of fd_callback_info for all descriptors, when
we exit wait_reading_process_output (which is called by
accept-process-output).  And finally, IMO it is wrong for the add_*_fd
functions to overwrite these two members, because they are set and
reset by other functions, mainly those which compute the wait mask for
the pselect call.  add_*_fd should not touch these members.

I believe in this case update_processes_for_thread_death doesn't do
its job because your scenario kills the process before the thread
dies, and update_processes_for_thread_death only looks at the
processes that are in Vprocess_alist.  So it is possible that all
that's missing is that delete_read_fd and delete_write_fd (which are
called from deactivate_process) should clear the .thread member of
fd_callback_info corresponding to the process's input and output
descriptors.

If that doesn't fix the issue with your reproducer, please investigate
why, and let's take it from there.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 16:04:01 GMT) Full text and rfc822 format available.

Message #17 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 12:03:16 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, app-emacs-dev <at> janestreet.com
>> Date: Fri, 08 Aug 2025 16:34:10 -0400
>> From:  Spencer Baugh via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>> Here's a patch to fix it by initializing these fields.
>
> Thanks, but I don't think this is right.  First,
> update_processes_for_thread_death clears the .thread member of
> fd_callback_info of those processes that were locked to the dying
> thread.  Second, wait_reading_process_output_unwind clears the
> .waiting_thread member of fd_callback_info for all descriptors, when
> we exit wait_reading_process_output (which is called by
> accept-process-output).

Yes.

> And finally, IMO it is wrong for the add_*_fd functions to overwrite
> these two members, because they are set and reset by other functions,
> mainly those which compute the wait mask for the pselect call.
> add_*_fd should not touch these members.

I think we should at least add an eassert in those functions which
checks that .waiting_thread is null.  It is never valid for that to be
non-null in add_*_fd.

> I believe in this case update_processes_for_thread_death doesn't do
> its job because your scenario kills the process before the thread
> dies, and update_processes_for_thread_death only looks at the
> processes that are in Vprocess_alist.  So it is possible that all
> that's missing is that delete_read_fd and delete_write_fd (which are
> called from deactivate_process) should clear the .thread member of
> fd_callback_info corresponding to the process's input and output
> descriptors.
>
> If that doesn't fix the issue with your reproducer, please investigate
> why, and let's take it from there.

Yes, that does fix the issue with my reproducer.

But I think it may cause other issues.  delete_read_fd and
delete_write_fd are called from many other places.  For example,
delete_write_fd is called before calling add_process_read_fd on (I
think?) the same fd in wait_reading_process_output.  Clearing the
.thread there would break the guarantees of set-process-thread.

Further complicating the issue: I think there's deeper issues with
set-process-thread.  There's also incorrect information in the manual
about it:

processes.texi:
> by default a process is locked to the thread that created it.  When a
> process is locked to a thread, output from the process can only be
> accepted by that thread.
...
> @defun set-process-thread process thread
> Set the locking thread of @var{process} to @var{thread}.  @var{thread}
> may be @code{nil}, in which case the process is unlocked.
> @end defun

This is not true.  By default .thread is NULL, i.e., processes are not
locked to any thread.

I think this behavior was just never implemented.  But, it doesn't seem
to have been needed, since no-one has complained about the lack of it in
the last 13 years since threads were released.

I did a quick survey of 300 ELPA packages and there's no use of
set-process-thread in any of them.  And set-process-thread is not useful
in practice if processes don't start out locked to a particular thread,
since trying to lock a process to a thread if it's already unlocked is
inherently racy.

So I think there are two good options.  Either:

A. Make the code match the manual by locking processes to threads by
   default, and somehow fix the issues this causes with breaking
   fd_callback_info slots.

Or:
B. Deprecate set-process-thread and make it a no-op.  Simply delete this
   incorrect section of the manual.

My preference is for B, since this also reduces complexity and will
slightly improve performance.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 16:52:01 GMT) Full text and rfc822 format available.

Message #20 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 19:51:08 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Mon, 11 Aug 2025 12:03:16 -0400
> 
> > And finally, IMO it is wrong for the add_*_fd functions to overwrite
> > these two members, because they are set and reset by other functions,
> > mainly those which compute the wait mask for the pselect call.
> > add_*_fd should not touch these members.
> 
> I think we should at least add an eassert in those functions which
> checks that .waiting_thread is null.  It is never valid for that to be
> non-null in add_*_fd.

Maybe, maybe not.  I'm not yet sure.  Let's discuss this later.

> > I believe in this case update_processes_for_thread_death doesn't do
> > its job because your scenario kills the process before the thread
> > dies, and update_processes_for_thread_death only looks at the
> > processes that are in Vprocess_alist.  So it is possible that all
> > that's missing is that delete_read_fd and delete_write_fd (which are
> > called from deactivate_process) should clear the .thread member of
> > fd_callback_info corresponding to the process's input and output
> > descriptors.
> >
> > If that doesn't fix the issue with your reproducer, please investigate
> > why, and let's take it from there.
> 
> Yes, that does fix the issue with my reproducer.

Then we should do that, I think.

> But I think it may cause other issues.  delete_read_fd and
> delete_write_fd are called from many other places.  For example,
> delete_write_fd is called before calling add_process_read_fd on (I
> think?) the same fd in wait_reading_process_output.

I don't see these calls to delete an fd before adding it, please point
to the code you had in mind.  When a descriptor is deleted, its
information is completely forgotten, so there's no reason to keep the
.thread and .waiting_thread members.

> Clearing the .thread there would break the guarantees of
> set-process-thread.

I don't think set-process-thread has anything to do with this, see
below.

> Further complicating the issue: I think there's deeper issues with
> set-process-thread.  There's also incorrect information in the manual
> about it:
> 
> processes.texi:
> > by default a process is locked to the thread that created it.  When a
> > process is locked to a thread, output from the process can only be
> > accepted by that thread.
> ...
> > @defun set-process-thread process thread
> > Set the locking thread of @var{process} to @var{thread}.  @var{thread}
> > may be @code{nil}, in which case the process is unlocked.
> > @end defun
> 
> This is not true.  By default .thread is NULL, i.e., processes are not
> locked to any thread.

set-process-thread doesn't set the .thread member of the descriptors,
it sets the .thread member of the process object.  Here's the code
from make-process that processes.texi has in mind:

  static Lisp_Object
  make_process (Lisp_Object name)
  {
    struct Lisp_Process *p = allocate_process ();
    /* Initialize Lisp data.  Note that allocate_process initializes all
       Lisp data to nil, so do it only for slots which should not be nil.  */
    pset_status (p, Qrun);
    pset_mark (p, Fmake_marker ());
    pset_thread (p, Fcurrent_thread ());  <<<<<<<<<<<<<<<<<<<<

And here's accept-process-output heeding this:

  if (! NILP (process))
    {
      CHECK_PROCESS (process);
      struct Lisp_Process *proc = XPROCESS (process);

      /* Can't wait for a process that is dedicated to a different
	 thread.  */
      if (!NILP (proc->thread) && !BASE_EQ (proc->thread, Fcurrent_thread ()))
	{
	  Lisp_Object proc_thread_name = XTHREAD (proc->thread)->name;

	  error ("Attempt to accept output from process %s locked to thread %s",
		 SDATA (proc->name),
		 STRINGP (proc_thread_name)
		 ? SDATA (proc_thread_name)
		 : SDATA (Fprin1_to_string (proc->thread, Qt, Qnil)));
	}

The .thread and .waiting_thread are set when we prepare descriptors
for passing them to pselect, they are a different (although related)
mechanism.

> I think this behavior was just never implemented.  But, it doesn't seem
> to have been needed, since no-one has complained about the lack of it in
> the last 13 years since threads were released.

See above: it's both implemented and working.  You are welcome to add
tests for that.

> So I think there are two good options.  Either:
> 
> A. Make the code match the manual by locking processes to threads by
>    default, and somehow fix the issues this causes with breaking
>    fd_callback_info slots.
> 
> Or:
> B. Deprecate set-process-thread and make it a no-op.  Simply delete this
>    incorrect section of the manual.
> 
> My preference is for B, since this also reduces complexity and will
> slightly improve performance.

Given the code I show above, do you still see a problem?  If yes,
please tell the details.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 17:21:02 GMT) Full text and rfc822 format available.

Message #23 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 13:20:05 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>>    app-emacs-dev <at> janestreet.com
>> Date: Mon, 11 Aug 2025 12:03:16 -0400
>> 
>> > And finally, IMO it is wrong for the add_*_fd functions to overwrite
>> > these two members, because they are set and reset by other functions,
>> > mainly those which compute the wait mask for the pselect call.
>> > add_*_fd should not touch these members.
>> 
>> I think we should at least add an eassert in those functions which
>> checks that .waiting_thread is null.  It is never valid for that to be
>> non-null in add_*_fd.
>
> Maybe, maybe not.  I'm not yet sure.  Let's discuss this later.
>
>> > I believe in this case update_processes_for_thread_death doesn't do
>> > its job because your scenario kills the process before the thread
>> > dies, and update_processes_for_thread_death only looks at the
>> > processes that are in Vprocess_alist.  So it is possible that all
>> > that's missing is that delete_read_fd and delete_write_fd (which are
>> > called from deactivate_process) should clear the .thread member of
>> > fd_callback_info corresponding to the process's input and output
>> > descriptors.
>> >
>> > If that doesn't fix the issue with your reproducer, please investigate
>> > why, and let's take it from there.
>> 
>> Yes, that does fix the issue with my reproducer.
>
> Then we should do that, I think.
>
>> But I think it may cause other issues.  delete_read_fd and
>> delete_write_fd are called from many other places.  For example,
>> delete_write_fd is called before calling add_process_read_fd on (I
>> think?) the same fd in wait_reading_process_output.
>
> I don't see these calls to delete an fd before adding it, please point
> to the code you had in mind.  When a descriptor is deleted, its
> information is completely forgotten, so there's no reason to keep the
> .thread and .waiting_thread members.
>
>> Clearing the .thread there would break the guarantees of
>> set-process-thread.
>
> I don't think set-process-thread has anything to do with this, see
> below.

set-process-thread is the only code which ever sets the .thread field of
fd_callback_info to a non-null value.

>> Further complicating the issue: I think there's deeper issues with
>> set-process-thread.  There's also incorrect information in the manual
>> about it:
>> 
>> processes.texi:
>> > by default a process is locked to the thread that created it.  When a
>> > process is locked to a thread, output from the process can only be
>> > accepted by that thread.
>> ...
>> > @defun set-process-thread process thread
>> > Set the locking thread of @var{process} to @var{thread}.  @var{thread}
>> > may be @code{nil}, in which case the process is unlocked.
>> > @end defun
>> 
>> This is not true.  By default .thread is NULL, i.e., processes are not
>> locked to any thread.
>
> set-process-thread doesn't set the .thread member of the descriptors,

No, it does.  Here's the code:

  DEFUN ("set-process-thread", Fset_process_thread, Sset_process_thread,
         2, 2, 0,
         doc: /* Set the locking thread of PROCESS to be THREAD.
  If THREAD is nil, the process is unlocked.  */)
    (Lisp_Object process, Lisp_Object thread)
  {
...
    if (proc->infd >= 0)
      fd_callback_info[proc->infd].thread = tstate;
    eassert (proc->outfd < FD_SETSIZE);
    if (proc->outfd >= 0)
      fd_callback_info[proc->outfd].thread = tstate;
  
    return thread;
  }

I think set-process-thread should not do this, though.  It should only
set the thread member of the process object.  It should not set the
thread member of the descriptors.

> it sets the .thread member of the process object.  Here's the code
> from make-process that processes.texi has in mind:
>
>   static Lisp_Object
>   make_process (Lisp_Object name)
>   {
>     struct Lisp_Process *p = allocate_process ();
>     /* Initialize Lisp data.  Note that allocate_process initializes all
>        Lisp data to nil, so do it only for slots which should not be nil.  */
>     pset_status (p, Qrun);
>     pset_mark (p, Fmake_marker ());
>     pset_thread (p, Fcurrent_thread ());  <<<<<<<<<<<<<<<<<<<<
>
> And here's accept-process-output heeding this:
>
>   if (! NILP (process))
>     {
>       CHECK_PROCESS (process);
>       struct Lisp_Process *proc = XPROCESS (process);
>
>       /* Can't wait for a process that is dedicated to a different
> 	 thread.  */
>       if (!NILP (proc->thread) && !BASE_EQ (proc->thread, Fcurrent_thread ()))
> 	{
> 	  Lisp_Object proc_thread_name = XTHREAD (proc->thread)->name;
>
> 	  error ("Attempt to accept output from process %s locked to thread %s",
> 		 SDATA (proc->name),
> 		 STRINGP (proc_thread_name)
> 		 ? SDATA (proc_thread_name)
> 		 : SDATA (Fprin1_to_string (proc->thread, Qt, Qnil)));
> 	}
>
> The .thread and .waiting_thread are set when we prepare descriptors
> for passing them to pselect, they are a different (although related)
> mechanism.
>
>> I think this behavior was just never implemented.  But, it doesn't seem
>> to have been needed, since no-one has complained about the lack of it in
>> the last 13 years since threads were released.
>
> See above: it's both implemented and working.  You are welcome to add
> tests for that.

Yes, sorry, I missed that there are two separate mechanisms.  I think
set-process-thread should continue setting the thread member of the
process object, but not the thread member of the fd_callback_info struct.

>> So I think there are two good options.  Either:
>> 
>> A. Make the code match the manual by locking processes to threads by
>>    default, and somehow fix the issues this causes with breaking
>>    fd_callback_info slots.
>> 
>> Or:
>> B. Deprecate set-process-thread and make it a no-op.  Simply delete this
>>    incorrect section of the manual.
>> 
>> My preference is for B, since this also reduces complexity and will
>> slightly improve performance.
>
> Given the code I show above, do you still see a problem?  If yes,
> please tell the details.

Yes.  Let's just make set-process-thread stop setting the thread member
of the fd_callback_info struct.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 17:28:02 GMT) Full text and rfc822 format available.

Message #26 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 20:26:38 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Mon, 11 Aug 2025 13:20:05 -0400
> 
> Yes, sorry, I missed that there are two separate mechanisms.  I think
> set-process-thread should continue setting the thread member of the
> process object, but not the thread member of the fd_callback_info struct.

Why not?  set-process-thread can be called when the descriptors for
the process are already set up.  That's unlike make-process, which
indeed doesn't set the .thread member of the descriptors.

> Yes.  Let's just make set-process-thread stop setting the thread member
> of the fd_callback_info struct.

I don't agree it shouldn't see above.  What problems do you see as the
result of that?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 17:59:02 GMT) Full text and rfc822 format available.

Message #29 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 13:58:02 -0400
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>>    app-emacs-dev <at> janestreet.com
>> Date: Mon, 11 Aug 2025 13:20:05 -0400
>> 
>> Yes, sorry, I missed that there are two separate mechanisms.  I think
>> set-process-thread should continue setting the thread member of the
>> process object, but not the thread member of the fd_callback_info struct.
>
> Why not?  set-process-thread can be called when the descriptors for
> the process are already set up.  That's unlike make-process, which
> indeed doesn't set the .thread member of the descriptors.

It's useless and causes this bug.

>> Yes.  Let's just make set-process-thread stop setting the thread member
>> of the fd_callback_info struct.
>
> I don't agree it shouldn't see above.  What problems do you see as the
> result of that?

It causes this bug.

Here's an attached patch which makes this change.  The thread member of
fd_callback_info is then never set, so it can be entirely deleted.

This doesn't need an announcement, because the behavior of "calling
accept-process-output on a process locked to a different thread" still
works the same way it always did.  There's no (non-bug) user-visible
change.

[0001-Remove-thread-member-of-fd_callback_info.patch (text/x-patch, inline)]
From b5311d3af0d020f718b62f239f9e360379276230 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Mon, 11 Aug 2025 13:54:14 -0400
Subject: [PATCH] Remove thread member of fd_callback_info

Previously, if set-process-thread was called on a process after its
creation, we would set the thread field in the fd_callback_info for that
process's file descriptors.

This behavior isn't useful, and cauases bugs since this field could stay
set to a dangling pointer if delete-process is called before thread
exit.  So simply remove it.

* src/process.c (compute_input_wait_mask)
(compute_non_process_wait_mask, compute_non_keyboard_wait_mask)
(compute_write_mask, update_processes_for_thread_death)
(Fset_process_thread): Remove thread member of fd_callback_info
(bug#79201)
---
 src/process.c | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/src/process.c b/src/process.c
index e61ec425f7e..1a5c99139a6 100644
--- a/src/process.c
+++ b/src/process.c
@@ -591,9 +591,6 @@ compute_input_wait_mask (fd_set *mask)
   eassert (max_desc < FD_SETSIZE);
   for (fd = 0; fd <= max_desc; ++fd)
     {
-      if (fd_callback_info[fd].thread != NULL
-	  && fd_callback_info[fd].thread != current_thread)
-	continue;
       if (fd_callback_info[fd].waiting_thread != NULL
 	  && fd_callback_info[fd].waiting_thread != current_thread)
 	continue;
@@ -614,9 +611,6 @@ compute_non_process_wait_mask (fd_set *mask)
   eassert (max_desc < FD_SETSIZE);
   for (fd = 0; fd <= max_desc; ++fd)
     {
-      if (fd_callback_info[fd].thread != NULL
-	  && fd_callback_info[fd].thread != current_thread)
-	continue;
       if (fd_callback_info[fd].waiting_thread != NULL
 	  && fd_callback_info[fd].waiting_thread != current_thread)
 	continue;
@@ -638,9 +632,6 @@ compute_non_keyboard_wait_mask (fd_set *mask)
   eassert (max_desc < FD_SETSIZE);
   for (fd = 0; fd <= max_desc; ++fd)
     {
-      if (fd_callback_info[fd].thread != NULL
-	  && fd_callback_info[fd].thread != current_thread)
-	continue;
       if (fd_callback_info[fd].waiting_thread != NULL
 	  && fd_callback_info[fd].waiting_thread != current_thread)
 	continue;
@@ -662,9 +653,6 @@ compute_write_mask (fd_set *mask)
   eassert (max_desc < FD_SETSIZE);
   for (fd = 0; fd <= max_desc; ++fd)
     {
-      if (fd_callback_info[fd].thread != NULL
-	  && fd_callback_info[fd].thread != current_thread)
-	continue;
       if (fd_callback_info[fd].waiting_thread != NULL
 	  && fd_callback_info[fd].waiting_thread != current_thread)
 	continue;
@@ -976,12 +964,6 @@ update_processes_for_thread_death (Lisp_Object dying_thread)
 	  struct Lisp_Process *proc = XPROCESS (process);
 
 	  pset_thread (proc, Qnil);
-	  eassert (proc->infd < FD_SETSIZE);
-	  if (proc->infd >= 0)
-	    fd_callback_info[proc->infd].thread = NULL;
-	  eassert (proc->outfd < FD_SETSIZE);
-	  if (proc->outfd >= 0)
-	    fd_callback_info[proc->outfd].thread = NULL;
 	}
     }
 }
@@ -1451,25 +1433,13 @@ DEFUN ("set-process-thread", Fset_process_thread, Sset_process_thread,
   (Lisp_Object process, Lisp_Object thread)
 {
   struct Lisp_Process *proc;
-  struct thread_state *tstate;
 
   CHECK_PROCESS (process);
-  if (NILP (thread))
-    tstate = NULL;
-  else
-    {
-      CHECK_THREAD (thread);
-      tstate = XTHREAD (thread);
-    }
+  if (!NILP (thread))
+    CHECK_THREAD (thread);
 
   proc = XPROCESS (process);
   pset_thread (proc, thread);
-  eassert (proc->infd < FD_SETSIZE);
-  if (proc->infd >= 0)
-    fd_callback_info[proc->infd].thread = tstate;
-  eassert (proc->outfd < FD_SETSIZE);
-  if (proc->outfd >= 0)
-    fd_callback_info[proc->outfd].thread = tstate;
 
   return thread;
 }
-- 
2.39.3


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 18:11:02 GMT) Full text and rfc822 format available.

Message #32 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 21:09:30 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Mon, 11 Aug 2025 13:58:02 -0400
> 
> >> Yes, sorry, I missed that there are two separate mechanisms.  I think
> >> set-process-thread should continue setting the thread member of the
> >> process object, but not the thread member of the fd_callback_info struct.
> >
> > Why not?  set-process-thread can be called when the descriptors for
> > the process are already set up.  That's unlike make-process, which
> > indeed doesn't set the .thread member of the descriptors.
> 
> It's useless and causes this bug.

You ignored my explanation why I think it is not useless.  Please
don't ignore it.  What happens if set-process-thread is called when
the descriptors are already set?

> >> Yes.  Let's just make set-process-thread stop setting the thread member
> >> of the fd_callback_info struct.
> >
> > I don't agree it shouldn't see above.  What problems do you see as the
> > result of that?
> 
> It causes this bug.

Didn't you agree that cleaning up in delete_read_fd and
delete_write_fd solves the problem?  The problem would not exists if
the thread dies before the process exits or is terminated, right?

> Here's an attached patch which makes this change.  The thread member of
> fd_callback_info is then never set, so it can be entirely deleted.

Sorry, I cannot accept such changes without a very good reason.  This
is not some random code or a typo, this stuff was deliberately written
like that, and written for a reason.  If you disagree with the design,
let's first make sure we understand the design and its rationale, and
then please explain why the design is wrong.  It is not enough to show
a single bug, especially if it can be fixed in other ways that don't
remove essential parts of the design.

> This doesn't need an announcement, because the behavior of "calling
> accept-process-output on a process locked to a different thread" still
> works the same way it always did.  There's no (non-bug) user-visible
> change.

Except if set-process-thread is called at some un-opportune moment.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 18:54:01 GMT) Full text and rfc822 format available.

Message #35 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: sbaugh <at> janestreet.com
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90;
 set-process-thread can permanently break fd_callback_info slots
Date: Mon, 11 Aug 2025 21:52:48 +0300
> Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
>  app-emacs-dev <at> janestreet.com
> Date: Mon, 11 Aug 2025 21:09:30 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> Sorry, I cannot accept such changes without a very good reason.  This
> is not some random code or a typo, this stuff was deliberately written
> like that, and written for a reason.  If you disagree with the design,
> let's first make sure we understand the design and its rationale, and
> then please explain why the design is wrong.  It is not enough to show
> a single bug, especially if it can be fixed in other ways that don't
> remove essential parts of the design.
> 
> > This doesn't need an announcement, because the behavior of "calling
> > accept-process-output on a process locked to a different thread" still
> > works the same way it always did.  There's no (non-bug) user-visible
> > change.
> 
> Except if set-process-thread is called at some un-opportune moment.

I actually think that if there is a bug, then it's in make-process: it
should set the .thread member of the process's input and output
descriptors with the current thread, like set-process-thread does.  In
addition to cleaning up in delete_read_fd and delete_write_fd, this
should fix the problems discussed here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Mon, 11 Aug 2025 22:33:02 GMT) Full text and rfc822 format available.

Message #38 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: "John Wiegley" <johnw <at> gnu.org>
To: dick.r.chiang <at> gmail.com, "Eli Zaretskii" <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, Spencer Baugh <sbaugh <at> janestreet.com>,
 app-emacs-dev <at> janestreet.com, dmitry <at> gutov.dev
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 15:32:09 -0700
This is neither helpful nor constructive. If there are changes to be made, please provide the evidence and justification asked for. Insulting maintainers will not motivate the result you seek.

John

On Mon, Aug 11, 2025, at 12:15 PM, dick.r.chiang <at> gmail.com wrote:
> EZ> This is not some random code or a typo
>
> Yeah it is.  You just don't know it yet.
>
> Brother Spencer, why bother arguing with an idiot?
>
> Incidentally, the fd_callback_info cruft admits sporadic hangs
> documented in Bug#49897.  They only get exposed when using multiple
> threads, which very few emacs programs do.  Fixed in my fork.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Tue, 12 Aug 2025 04:07:01 GMT) Full text and rfc822 format available.

Message #41 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: John Wiegley <johnw <at> gnu.org>
To: Dick <dick.r.chiang <at> gmail.com>
Cc: 79201 <at> debbugs.gnu.org, Spencer Baugh <sbaugh <at> janestreet.com>,
 Eli Zaretskii <eliz <at> gnu.org>, app-emacs-dev <at> janestreet.com, dmitry <at> gutov.dev
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Mon, 11 Aug 2025 21:06:25 -0700
>>>>> Dick  <dick.r.chiang <at> gmail.com> writes:

> I remember it taking weeks to trace the irreproducible hangs with
> non-blocking gnus. It would probably take a day to refresh my memory and
> generate an MRE.

I can see you’ve had this attitude toward the mailing list for many years,
Dick. I’m not interested. If you ever want to collaborate, we’re here.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Tue, 12 Aug 2025 22:17:01 GMT) Full text and rfc822 format available.

Message #44 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Tue, 12 Aug 2025 18:16:12 -0400
Anyway, I found a reproducer for the other bug I mentioned, where
waiting_thread could also get contaminated and break fd_callback_info
slots.  It seems to be quite a bad bug: it will happen spontaneously
just from running multiple threads interacting with processes.

If you run the following simple Lisp code after applying the following
patch (which is just to make the breakage more obvious; without the
patch, Emacs would be left in a silently broken state), Emacs will abort
in a few seconds - though it probably varies based on your system.

This indicates that .waiting_thread is somehow being left at a non-NULL
value even after the thread has left wait_reading_process_output.

Any ideas?

(defun my-break-thread ()
  (let ((proc (make-process
               :name "foo"
               :command '("sleep" "1"))))
    (while (process-live-p proc)
      (accept-process-output proc 0.05))))

(while t
  (make-thread #'my-break-thread "thread1")
  (thread-join (make-thread #'my-break-thread "thread2")))

diff --git a/src/process.c b/src/process.c
index 1a5c99139a6..294c204ae96 100644
--- a/src/process.c
+++ b/src/process.c
@@ -467,6 +467,16 @@ make_lisp_proc (struct Lisp_Process *p)
 } fd_callback_info[FD_SETSIZE];
 
 
+static void
+assert_fd_callback_info_unused(int fd)
+{
+  eassert (0 <= fd && fd < FD_SETSIZE);
+  if (fd_callback_info[fd].waiting_thread != NULL) {
+    fprintf (stderr, "waiting_thread non-NULL\n");
+    terminate_due_to_signal (SIGABRT, INT_MAX);
+  }
+}
+
 /* Add a file descriptor FD to be monitored for when read is possible.
    When read is possible, call FUNC with argument DATA.  */
 
@@ -475,7 +485,6 @@ add_read_fd (int fd, fd_callback func, void *data)
 {
   add_keyboard_wait_descriptor (fd);
 
-  eassert (0 <= fd && fd < FD_SETSIZE);
   fd_callback_info[fd].func = func;
   fd_callback_info[fd].data = data;
 }
@@ -490,14 +499,13 @@ add_non_keyboard_read_fd (int fd, fd_callback func, void *data)
 static void
 add_process_read_fd (int fd)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  assert_fd_callback_info_unused(fd);
   eassert (fd_callback_info[fd].func == NULL);
 
   fd_callback_info[fd].flags &= ~KEYBOARD_FD;
   fd_callback_info[fd].flags |= FOR_READ;
   if (fd > max_desc)
     max_desc = fd;
-  eassert (0 <= fd && fd < FD_SETSIZE);
   fd_callback_info[fd].flags |= PROCESS_FD;
 }
 
@@ -522,7 +530,7 @@ delete_read_fd (int fd)
 void
 add_write_fd (int fd, fd_callback func, void *data)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  assert_fd_callback_info_unused(fd);
 
   fd_callback_info[fd].func = func;
   fd_callback_info[fd].data = data;
@@ -534,7 +542,7 @@ add_write_fd (int fd, fd_callback func, void *data)
 static void
 add_non_blocking_write_fd (int fd)
 {
-  eassert (fd >= 0 && fd < FD_SETSIZE);
+  assert_fd_callback_info_unused(fd);
   eassert (fd_callback_info[fd].func == NULL);
 
   fd_callback_info[fd].flags |= FOR_WRITE | NON_BLOCKING_CONNECT_FD;
@@ -8266,7 +8274,7 @@ remove_slash_colon (Lisp_Object name)
 add_keyboard_wait_descriptor (int desc)
 {
 #ifdef subprocesses /* Actually means "not MSDOS".  */
-  eassert (desc >= 0 && desc < FD_SETSIZE);
+  assert_fd_callback_info_unused(desc);
   fd_callback_info[desc].flags &= ~PROCESS_FD;
   fd_callback_info[desc].flags |= (FOR_READ | KEYBOARD_FD);
   if (desc > max_desc)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Wed, 13 Aug 2025 11:53:02 GMT) Full text and rfc822 format available.

Message #47 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Wed, 13 Aug 2025 14:52:04 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Tue, 12 Aug 2025 18:16:12 -0400
> 
> 
> Anyway, I found a reproducer for the other bug I mentioned, where
> waiting_thread could also get contaminated and break fd_callback_info
> slots.  It seems to be quite a bad bug: it will happen spontaneously
> just from running multiple threads interacting with processes.
> 
> If you run the following simple Lisp code after applying the following
> patch (which is just to make the breakage more obvious; without the
> patch, Emacs would be left in a silently broken state), Emacs will abort
> in a few seconds - though it probably varies based on your system.
> 
> This indicates that .waiting_thread is somehow being left at a non-NULL
> value even after the thread has left wait_reading_process_output.
> 
> Any ideas?

I'm not sure I understand your idea behind
assert_fd_callback_info_unused.  wait_reading_process_output calls
thread_select in a loop, so a given thread which called
wait_reading_process_output could spend more than one loop iteration
without leaving the function.  Each iteration, such a thread will call
add_process_read_fd and other similar functions, to set up the
descriptors it will wait on, and then call thread_select.  When it
calls thread_select, it releases the global lock, so some other thread
could grab the lock and start running.  Then this other thread could
see the .waiting_thread member non-NULL, because the thread which
marked the descriptor as being waited on by it is stuck in a pselect
call, called from wait_reading_process_output via thread_select.

Could the above scenario be the reason for what you see?  IOW, why did
you expect to find the .waiting_thread members cleared whenever some
thread calls assert_fd_callback_info_unused, and why did you say "even
after the thread has left wait_reading_process_output"?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Wed, 13 Aug 2025 16:47:02 GMT) Full text and rfc822 format available.

Message #50 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Wed, 13 Aug 2025 12:46:39 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>>    app-emacs-dev <at> janestreet.com
>> Date: Tue, 12 Aug 2025 18:16:12 -0400
>> 
>> 
>> Anyway, I found a reproducer for the other bug I mentioned, where
>> waiting_thread could also get contaminated and break fd_callback_info
>> slots.  It seems to be quite a bad bug: it will happen spontaneously
>> just from running multiple threads interacting with processes.
>> 
>> If you run the following simple Lisp code after applying the following
>> patch (which is just to make the breakage more obvious; without the
>> patch, Emacs would be left in a silently broken state), Emacs will abort
>> in a few seconds - though it probably varies based on your system.
>> 
>> This indicates that .waiting_thread is somehow being left at a non-NULL
>> value even after the thread has left wait_reading_process_output.
>> 
>> Any ideas?
>
> I'm not sure I understand your idea behind
> assert_fd_callback_info_unused.  wait_reading_process_output calls
> thread_select in a loop, so a given thread which called
> wait_reading_process_output could spend more than one loop iteration
> without leaving the function.

Yes.

> Each iteration, such a thread will call add_process_read_fd and other
> similar functions, to set up the descriptors it will wait on, and then
> call thread_select.

That is incorrect.  add_process_read_fd is not usually called from
wait_reading_process_output or thread_select.  It's only called from
make-process.  There is one branch where add_process_read_fd is called
in wait_reading_process_output, for async network connections, but
that's not being hit here, since we aren't using async network
connections.

> When it calls thread_select, it releases the global lock, so some
> other thread could grab the lock and start running.

Yes.

> Then this other thread could see the .waiting_thread member non-NULL,
> because the thread which marked the descriptor as being waited on by
> it is stuck in a pselect call, called from wait_reading_process_output
> via thread_select.

Yes.  For existing file descriptors, the waiting_thread member should
often be non-NULL, and this tells other threads not to wait on that file
descriptor.

> Could the above scenario be the reason for what you see?

No, because the non-nil waiting_thread is seen in make-process, for a
process and file descriptor that is newly created.

If you run the reproducer, the assertion is hit in make-process.  Here's
the backtrace.

#0  0x0000000000422fbb in terminate_due_to_signal (sig=sig <at> entry=6, backtrace_limit=backtrace_limit <at> entry=2147483647) at emacs.c:443
#1  0x000000000061f59e in assert_fd_callback_info_unused (fd=9) at process.c:476
#2  0x000000000061f5e7 in assert_fd_callback_info_unused (fd=fd <at> entry=9) at process.c:509
#3  0x000000000061f5e7 in add_process_read_fd (fd=fd <at> entry=9) at process.c:502
#4  0x0000000000625138 in create_process (current_dir=XIL(0x7fffc4005684), new_argv=0x7fffcbbfd2c0, process=XIL(0x7fffc0000de5)) at process.c:2238
#5  0x0000000000625138 in Fmake_process (nargs=<optimized out>, args=<optimized out>) at process.c:2058
#6  0x00000000005d094c in eval_sub (form=<optimized out>) at lisp.h:2231
#7  0x00000000005d1c91 in Flet (args=XIL(0x7ffff73accc3)) at eval.c:1164
#8  0x00000000005d0a60 in eval_sub (form=<optimized out>) at lisp.h:2231
#9  0x00000000005d1130 in Fprogn (body=<optimized out>) at eval.c:445
#10 0x00000000005d1130 in funcall_lambda (fun=XIL(0x8ead4d), nargs=0, arg_vector=0x7fffcbbfd740) at eval.c:3429
#11 0x00000000005cd6e3 in Ffuncall (nargs=nargs <at> entry=1, args=args <at> entry=0x7fffcbbfd738) at eval.c:3172
#12 0x000000000064a560 in invoke_thread_function () at thread.c:747
#13 0x00000000005cbfb2 in internal_condition_case (bfun=bfun <at> entry=0x64a530 <invoke_thread_function>, handlers=handlers <at> entry=XIL(0x30), hfun=hfun <at> entry=0x649d80 <record_thread_error>) at eval.c:1690
#14 0x000000000064a448 in run_thread (state=0x8eb4b8) at lisp.h:1164
#15 0x00007fffee6081ea in start_thread () at /lib64/libpthread.so.0
#16 0x00007fffeae39953 in clone () at /lib64/libc.so.6

Lisp Backtrace:
"make-process" (0xcbbfd490)
"let" (0xcbbfd608)
"my-break-thread" (0xcbbfd740)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Wed, 13 Aug 2025 16:59:02 GMT) Full text and rfc822 format available.

Message #53 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Wed, 13 Aug 2025 19:57:41 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Wed, 13 Aug 2025 12:46:39 -0400
> 
> > Could the above scenario be the reason for what you see?
> 
> No, because the non-nil waiting_thread is seen in make-process, for a
> process and file descriptor that is newly created.

Then I think the descriptors was closed without clearing the
.waiting_thread member.  So that goes back to what I suggested
up-thread: delete_read_fd and delete_write_fd should clear these
members.  They are called either after closing the descriptor or when
we stop reading/writing to them, so their .waiting_thread member is no
longer useful.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Wed, 13 Aug 2025 18:37:01 GMT) Full text and rfc822 format available.

Message #56 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Wed, 13 Aug 2025 14:36:29 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>>    app-emacs-dev <at> janestreet.com
>> Date: Wed, 13 Aug 2025 12:46:39 -0400
>> 
>> > Could the above scenario be the reason for what you see?
>> 
>> No, because the non-nil waiting_thread is seen in make-process, for a
>> process and file descriptor that is newly created.
>
> Then I think the descriptors was closed without clearing the
> .waiting_thread member.  So that goes back to what I suggested
> up-thread: delete_read_fd and delete_write_fd should clear these
> members.  They are called either after closing the descriptor or when
> we stop reading/writing to them, so their .waiting_thread member is no
> longer useful.

That doesn't explain why this is happening, though.  Descriptors are
always closed without clearing the .waiting_thread member.  Clearing
.waiting_thread is clear_waiting_thread_info's job.  Why isn't it
happening?

One theory: clear_waiting_thread_info only clears .waiting_thread up to
max_desc.  That's not it, though, because the bug still happens (albeit
slower) if I do:

@@ -670,7 +678,7 @@ clear_waiting_thread_info (void)
   int fd;
 
   eassert (max_desc < FD_SETSIZE);
-  for (fd = 0; fd <= max_desc; ++fd)
+  for (fd = 0; fd < FD_SETSIZE; ++fd)
     {
       if (fd_callback_info[fd].waiting_thread == current_thread)
 	fd_callback_info[fd].waiting_thread = NULL;




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Thu, 14 Aug 2025 04:40:02 GMT) Full text and rfc822 format available.

Message #59 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Thu, 14 Aug 2025 07:39:34 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Wed, 13 Aug 2025 14:36:29 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> From: Spencer Baugh <sbaugh <at> janestreet.com>
> >> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
> >>    app-emacs-dev <at> janestreet.com
> >> Date: Wed, 13 Aug 2025 12:46:39 -0400
> >> 
> >> > Could the above scenario be the reason for what you see?
> >> 
> >> No, because the non-nil waiting_thread is seen in make-process, for a
> >> process and file descriptor that is newly created.
> >
> > Then I think the descriptors was closed without clearing the
> > .waiting_thread member.  So that goes back to what I suggested
> > up-thread: delete_read_fd and delete_write_fd should clear these
> > members.  They are called either after closing the descriptor or when
> > we stop reading/writing to them, so their .waiting_thread member is no
> > longer useful.
> 
> That doesn't explain why this is happening, though.  Descriptors are
> always closed without clearing the .waiting_thread member.  Clearing
> .waiting_thread is clear_waiting_thread_info's job.  Why isn't it
> happening?

clear_waiting_thread_info is called when some thread returns from
wait_reading_process_output.  If a thread didn't return, but is
instead stuck in the call to pselect, another thread can run and
access the descriptors.

I think if you want to understand exactly how this happens, you need
to trace the places that close the descriptors.  AFAIU, the scenario
where make-process reuses a descriptor already marked with
.waiting_thread of another thread is because that descriptor was
recently closed.  Otherwise, it's not possible for make-process to get
a descriptor which is already in use.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Thu, 14 Aug 2025 06:02:02 GMT) Full text and rfc822 format available.

Message #62 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: sbaugh <at> janestreet.com
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90;
 set-process-thread can permanently break fd_callback_info slots
Date: Thu, 14 Aug 2025 09:01:13 +0300
> Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
>  app-emacs-dev <at> janestreet.com
> Date: Thu, 14 Aug 2025 07:39:34 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > That doesn't explain why this is happening, though.  Descriptors are
> > always closed without clearing the .waiting_thread member.  Clearing
> > .waiting_thread is clear_waiting_thread_info's job.  Why isn't it
> > happening?
> 
> clear_waiting_thread_info is called when some thread returns from
> wait_reading_process_output.  If a thread didn't return, but is
> instead stuck in the call to pselect, another thread can run and
> access the descriptors.
> 
> I think if you want to understand exactly how this happens, you need
> to trace the places that close the descriptors.  AFAIU, the scenario
> where make-process reuses a descriptor already marked with
> .waiting_thread of another thread is because that descriptor was
> recently closed.  Otherwise, it's not possible for make-process to get
> a descriptor which is already in use.

And also note that handle_child_signal, which is the SIGCHLD handler,
and is therefore invoked asynchronously, calls delete_read_fd on the
input descriptor of the process that died.  Which might be highly
relevant for your recipe, because it calls "sleep 1", which dies after
1 sec without producing any output.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Thu, 14 Aug 2025 16:22:02 GMT) Full text and rfc822 format available.

Message #65 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Thu, 14 Aug 2025 12:21:15 -0400
[Message part 1 (text/plain, inline)]
Spencer Baugh <sbaugh <at> janestreet.com> writes:
> One theory: clear_waiting_thread_info only clears .waiting_thread up to
> max_desc.

OK, I figured it out.  This is indeed the sole issue.  With this patch:

diff --git a/src/process.c b/src/process.c
index 1a5c99139a6..5db427a4a99 100644
--- a/src/process.c
+++ b/src/process.c
@@ -670,10 +670,15 @@ clear_waiting_thread_info (void)
   int fd;
 
   eassert (max_desc < FD_SETSIZE);
-  for (fd = 0; fd <= max_desc; ++fd)
+  for (fd = 0; fd < FD_SETSIZE; ++fd)
     {
-      if (fd_callback_info[fd].waiting_thread == current_thread)
+      if (fd_callback_info[fd].waiting_thread == current_thread) {
+	if (fd > max_desc) {
+	  fprintf (stderr, "fd too high: %d > max_desc=%d\n", fd, max_desc);
+	  terminate_due_to_signal (SIGABRT, INT_MAX);
+	}
 	fd_callback_info[fd].waiting_thread = NULL;
+      }
     }
 }
 
And my earlier reproducer (but not my earlier patch), we quickly get an
abort, indicating that there's an fd_callback_info slot which would be
left with a non-NULL .waiting_thread.

The cause is that recompute_max_desc runs when a file descriptor is
deleted, and reduces max_desc.  But when it does so, that means
clear_waiting_thread_info will not get the chance to clear the
waiting_thread field in fd_callback_info slots above the new max_desc.

My initial thought was that recompute_max_desc should zero the
fd_callback_info[fd] entry when it reduces max_desc.  But it turns out
to be equivalent, and simpler, to zero fd_callback_info[desc] completely
instead of just setting flags to 0 right before we call
recompute_max_desc.

I do this in the attached patch, which fixes the bug.

[0001-Zero-fd_callback_info-when-deleting-an-fd.patch (text/x-patch, inline)]
From 975741cc141c45c4d857f51098a35d98885920f4 Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Thu, 14 Aug 2025 12:17:23 -0400
Subject: [PATCH] Zero fd_callback_info when deleting an fd

.waiting_thread and .thread could be left set to non-NULL values
in a deleted fd_callback_info entry.  These would never be
cleared by e.g. clear_waiting_thread_info since that only clears
fd_callback_info entries up to max_desc.  Clear fd_callback_info
entirely when deleting an entry.

* src/process.c (delete_write_fd)
(delete_keyboard_wait_descriptor): Set fd_callback_info
completely to zero.  (bug#79201)
(deactivate_process): Remove unnecessary recompute_max_desc.
---
 src/process.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/process.c b/src/process.c
index 1a5c99139a6..36717e39289 100644
--- a/src/process.c
+++ b/src/process.c
@@ -574,8 +574,7 @@ delete_write_fd (int fd)
   fd_callback_info[fd].flags &= ~(FOR_WRITE | NON_BLOCKING_CONNECT_FD);
   if (fd_callback_info[fd].flags == 0)
     {
-      fd_callback_info[fd].func = 0;
-      fd_callback_info[fd].data = 0;
+      fd_callback_info[fd] = (struct fd_callback_data) {};
 
       if (fd == max_desc)
 	recompute_max_desc ();
@@ -4809,8 +4808,6 @@ deactivate_process (Lisp_Object proc)
       delete_read_fd (inchannel);
       if ((fd_callback_info[inchannel].flags & NON_BLOCKING_CONNECT_FD) != 0)
 	delete_write_fd (inchannel);
-      if (inchannel == max_desc)
-	recompute_max_desc ();
     }
 }
 
@@ -8282,7 +8279,7 @@ delete_keyboard_wait_descriptor (int desc)
 #ifdef subprocesses
   eassert (desc >= 0 && desc < FD_SETSIZE);
 
-  fd_callback_info[desc].flags &= ~(FOR_READ | KEYBOARD_FD | PROCESS_FD);
+  fd_callback_info[desc] = (struct fd_callback_data) {};
 
   if (desc == max_desc)
     recompute_max_desc ();
-- 
2.39.3


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Fri, 15 Aug 2025 11:53:02 GMT) Full text and rfc822 format available.

Message #68 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Fri, 15 Aug 2025 14:52:04 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>    app-emacs-dev <at> janestreet.com
> Date: Thu, 14 Aug 2025 12:21:15 -0400
> 
> The cause is that recompute_max_desc runs when a file descriptor is
> deleted, and reduces max_desc.  But when it does so, that means
> clear_waiting_thread_info will not get the chance to clear the
> waiting_thread field in fd_callback_info slots above the new max_desc.

Makes sense, thanks.

> My initial thought was that recompute_max_desc should zero the
> fd_callback_info[fd] entry when it reduces max_desc.  But it turns out
> to be equivalent, and simpler, to zero fd_callback_info[desc] completely
> instead of just setting flags to 0 right before we call
> recompute_max_desc.
> 
> I do this in the attached patch, which fixes the bug.

Thanks, the patch seems OK with a minor comment:

> --- a/src/process.c
> +++ b/src/process.c
> @@ -574,8 +574,7 @@ delete_write_fd (int fd)
>    fd_callback_info[fd].flags &= ~(FOR_WRITE | NON_BLOCKING_CONNECT_FD);
>    if (fd_callback_info[fd].flags == 0)
>      {
> -      fd_callback_info[fd].func = 0;
> -      fd_callback_info[fd].data = 0;
> +      fd_callback_info[fd] = (struct fd_callback_data) {};

Please just clear the .thread and .waiting_thread members.  This way,
the code is easier to read, and it is also easier to find all the
places which assign something to these members.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79201; Package emacs. (Fri, 15 Aug 2025 14:23:01 GMT) Full text and rfc822 format available.

Message #71 received at 79201 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79201 <at> debbugs.gnu.org, dmitry <at> gutov.dev, johnw <at> gnu.org,
 app-emacs-dev <at> janestreet.com
Subject: Re: bug#79201: 30.1.90; set-process-thread can permanently break
 fd_callback_info slots
Date: Fri, 15 Aug 2025 10:22:20 -0400
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 79201 <at> debbugs.gnu.org,  dmitry <at> gutov.dev,  johnw <at> gnu.org,
>>    app-emacs-dev <at> janestreet.com
>> Date: Thu, 14 Aug 2025 12:21:15 -0400
>> 
>> The cause is that recompute_max_desc runs when a file descriptor is
>> deleted, and reduces max_desc.  But when it does so, that means
>> clear_waiting_thread_info will not get the chance to clear the
>> waiting_thread field in fd_callback_info slots above the new max_desc.
>
> Makes sense, thanks.
>
>> My initial thought was that recompute_max_desc should zero the
>> fd_callback_info[fd] entry when it reduces max_desc.  But it turns out
>> to be equivalent, and simpler, to zero fd_callback_info[desc] completely
>> instead of just setting flags to 0 right before we call
>> recompute_max_desc.
>> 
>> I do this in the attached patch, which fixes the bug.
>
> Thanks, the patch seems OK with a minor comment:
>
>> --- a/src/process.c
>> +++ b/src/process.c
>> @@ -574,8 +574,7 @@ delete_write_fd (int fd)
>>    fd_callback_info[fd].flags &= ~(FOR_WRITE | NON_BLOCKING_CONNECT_FD);
>>    if (fd_callback_info[fd].flags == 0)
>>      {
>> -      fd_callback_info[fd].func = 0;
>> -      fd_callback_info[fd].data = 0;
>> +      fd_callback_info[fd] = (struct fd_callback_data) {};
>
> Please just clear the .thread and .waiting_thread members.  This way,
> the code is easier to read, and it is also easier to find all the
> places which assign something to these members.

Agreed.  Done in the attached patch.

[0001-Zero-fd_callback_info-when-deleting-an-fd.patch (text/x-patch, inline)]
From e6bddffd8b8e11e598f2ca75d7915184d38def4e Mon Sep 17 00:00:00 2001
From: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Thu, 14 Aug 2025 12:17:23 -0400
Subject: [PATCH] Zero fd_callback_info when deleting an fd

.waiting_thread and .thread could be left set to non-NULL values
in a deleted fd_callback_info entry.  These would never be
cleared by e.g. clear_waiting_thread_info since that only clears
fd_callback_info entries up to max_desc.  Clear fd_callback_info
entirely when deleting an entry.

* src/process.c (clear_fd_callback_data): Add.
(delete_write_fd, delete_keyboard_wait_descriptor): Call
clear_fd_callback_data. (bug#79201)
(delete_read_fd): Remove duplicated clearing code.
(deactivate_process): Remove duplicate recompute_max_desc.
---
 src/process.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/process.c b/src/process.c
index 1a5c99139a6..042df7108eb 100644
--- a/src/process.c
+++ b/src/process.c
@@ -466,6 +466,16 @@ make_lisp_proc (struct Lisp_Process *p)
   struct thread_state *waiting_thread;
 } fd_callback_info[FD_SETSIZE];
 
+static void
+clear_fd_callback_data(struct fd_callback_data* elem)
+{
+  elem->func = NULL;
+  elem->data = NULL;
+  elem->flags = 0;
+  elem->thread = NULL;
+  elem->waiting_thread = NULL;
+}
+
 
 /* Add a file descriptor FD to be monitored for when read is possible.
    When read is possible, call FUNC with argument DATA.  */
@@ -507,13 +517,6 @@ add_process_read_fd (int fd)
 delete_read_fd (int fd)
 {
   delete_keyboard_wait_descriptor (fd);
-
-  eassert (0 <= fd && fd < FD_SETSIZE);
-  if (fd_callback_info[fd].flags == 0)
-    {
-      fd_callback_info[fd].func = 0;
-      fd_callback_info[fd].data = 0;
-    }
 }
 
 /* Add a file descriptor FD to be monitored for when write is possible.
@@ -574,8 +577,7 @@ delete_write_fd (int fd)
   fd_callback_info[fd].flags &= ~(FOR_WRITE | NON_BLOCKING_CONNECT_FD);
   if (fd_callback_info[fd].flags == 0)
     {
-      fd_callback_info[fd].func = 0;
-      fd_callback_info[fd].data = 0;
+      clear_fd_callback_data(&fd_callback_info[fd]);
 
       if (fd == max_desc)
 	recompute_max_desc ();
@@ -4809,8 +4811,6 @@ deactivate_process (Lisp_Object proc)
       delete_read_fd (inchannel);
       if ((fd_callback_info[inchannel].flags & NON_BLOCKING_CONNECT_FD) != 0)
 	delete_write_fd (inchannel);
-      if (inchannel == max_desc)
-	recompute_max_desc ();
     }
 }
 
@@ -8282,7 +8282,7 @@ delete_keyboard_wait_descriptor (int desc)
 #ifdef subprocesses
   eassert (desc >= 0 && desc < FD_SETSIZE);
 
-  fd_callback_info[desc].flags &= ~(FOR_READ | KEYBOARD_FD | PROCESS_FD);
+  clear_fd_callback_data(&fd_callback_info[desc]);
 
   if (desc == max_desc)
     recompute_max_desc ();
-- 
2.39.3


This bug report was last modified today.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.