GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

Full log


View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 79334 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, dmitry <at> gutov.dev
Subject: bug#79334: [PATCH] Don't release thread select lock unnecessarily, [PATCH] Defer closing file descriptors used by other threads
Date: Sat, 30 Aug 2025 19:35:39 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: 79334 <at> debbugs.gnu.org,  eggert <at> cs.ucla.edu,  dmitry <at> gutov.dev
> Date: Sat, 30 Aug 2025 11:55:27 -0400
> 
> >> +	      /* Ignore any events which happened on this fd. */
> >> +	      FD_CLR (fd, &Available);
> >> +	      FD_CLR (fd, &Writeok);
> >
> > Is this wise?  How can we be sure that these events were already
> > handled (presumably, in another thread)?  If they were not, we will
> > lose events.  What will happen if we don't ignore them here?
> 
> Any events which might appear at this point were already events we were
> going to lose by closing this file descriptor.  For example, if a
> process produces output just as we do delete-process on it, we never
> read that output.  The only change is that we now have to explicitly
> ignore those events; before, we silently dropped them at the time we
> closed the file descriptor.

Where did we close it?  There's this loop _after_ the place where you
add your loop which clears the bits in Available:

      for (channel = 0; channel <= max_desc; ++channel)
        {
          struct fd_callback_data *d = &fd_callback_info[channel];
          if (d->func
	      && ((d->flags & FOR_READ
		   && FD_ISSET (channel, &Available))
		  || ((d->flags & FOR_WRITE)
		      && FD_ISSET (channel, &Writeok))))
            d->func (channel, d->data);
	}

AFAIU, it would notice the set bits and act upon them.  Now with this
new addition we are clearing them ahead of that loop.  That doesn't
necessarily sound right to me.

> >> +static bool
> >> +other_thread_is_waiting_for (struct fd_callback_data* elem)
> >> +{
> >> +  return elem->waiting_thread != NULL && elem->waiting_thread != current_thread;
> >> +}
> >
> > This should also check that the waiting thread is still alive.  If it
> > isn't, it's okay to close the descriptor.  Otherwise, descriptors
> > belonging to threads that exited might never be closed.
> 
> waiting_thread always points to a living thread, because it's only set
> while a thread is inside wait_reading_process_output.
> 
> (In fact, when a thread sees waiting_thread set to a value which is
> neither NULL nor current_thread, it can only ever point to a thread
> which is currently inside thread_select)

You assume that the waiting_thread field will be cleared when a thread
dies, but that is not guaranteed.  Why isn't it better to be defensive
and make sure the thread is still alive?  If you are right, that test
will always be true, but what if you are wrong in some rare cases?

> >> +	  if (fd_callback_info[fd].waiting_thread == current_thread
> >> +	      && (fd_callback_info[fd].flags == WAITING_FOR_CLOSE_FD))
> >> +	    {
> >> +	      fprintf (stderr, "closing deferred fd %d\n", fd);
> >> +	      emacs_close (fd);
> >> +	      /* Ignore any events which happened on this fd. */
> >> +	      FD_CLR (fd, &Available);
> >> +	      FD_CLR (fd, &Writeok);
> >> +	      fd_callback_info[fd].flags = 0;
> >
> > Why do it here and not when we deactivate processes?
> 
> Because that would close file descriptors that another thread is
> currently selecting on, causing EBADF.

No, I mean to deactivate only the processes whose I/O channels are
locked to the current thread.

> > Until now we never closed any descriptors inside
> > wait_reading_process_output.
> 
> Just to be clear, we can call deactivate_process from
> wait_reading_process_output, through status_notify, through process
> filters, or other similar.  That closes descriptors.

Yes, and so I suggest doing this logic of deactivating only the
processes that are locked to the current thread there.

> > More generally, why this design and not a simpler change which only
> > deactivates processes whose I/O is waited by the current thread, as
> > discussed previously?
> 
> I looked into that, but I determined that it's too complicated.  It
> would require us to keep track of a new kind of process state: a process
> which has been deleted, but hasn't been deactivated yet.

What I had in mind was neither delete it nor deactivate it.  IOW, the
loop which goes through the list of processes should only pay
attention to processes locked to this thread and those that are not
locked to any live thread.  It should leave the other processes to
when their threads call status_notify.

> > Looking at all the callers of close_process_fd, it sounds like it's
> > too low-level to do this.  For example, it is called from
> > create_process (including in the forked child) to close the unneeded
> > ends of the pipe, where we probably don't want this.  And
> > clear_fd_callback_data is also used for the keyboard input descriptor,
> > which is probably also not relevant.
> 
> True.
> 
> I looked into fixing these, and that suggested to me an altogether
> simpler approach.  We can keep the logic entirely contained in
> wait_reading_process_output, just adding a bit of code around the select
> call.
> 
> And we don't need to defer deleting the fds at all: just recognize when
> it's happened, and be careful to not touch those fds afterwards.

Hmm... I don't understand: why do we need to do this?  The calls to
compute_input_wait_mask, compute_non_keyboard_wait_mask etc., which
set bits in the Available mask passed top pselect already avoid
setting bits for descriptors on which some other thread waits.  So
pselect cannot possibly indicate any output ready on any descriptors
belonging to other threads, and this loop you suggest to add:

> +      /* Save the fds we're currently waiting_thread for.  */
> +      fd_set waiting_fds;
> +      FD_ZERO (&waiting_fds);
> +      int max_waiting_fds = 0;
> +      for (int fd = 0; fd <= max_desc; ++fd)
> +        {
> +	  if (fd_callback_info[fd].waiting_thread == current_thread)
> +	    {
> +	      FD_SET (fd, &waiting_fds);
> +	      max_waiting_fds = fd;
> +	    }
> +	}

should be unnecessary.  Or what am I missing?




This bug report was last modified 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.