GNU bug report logs - #79334
[PATCH] Don't release thread select lock unnecessarily

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 21:10:02 UTC

Severity: normal

Tags: patch

Full log


View this message in rfc822 format

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79334 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: bug#79334: [PATCH] Don't release thread select lock unnecessarily
Date: Fri, 29 Aug 2025 09:07:50 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Thu, 28 Aug 2025 17:08:43 -0400
>> 
>> Previously, we were using thread_select to release the thread
>> lock around two different calls to select in
>> wait_reading_process_output.  The main call to select was fine,
>> but we also used thread_select for a pselect call which returns
>> immediately (with a timeout of 0) and is supposed to just check
>> if any file descriptors are active.
>> 
>> If we actually thread switch at that pselect, it's likely to
>> break internal state for Emacs: namely, the call to
>> status_notify immediately after can close file descriptors which
>> another thread is selecting on, causing that thread to get EBADF
>> from select and then call emacs_abort.
>> 
>> We don't need to thread switch here, so don't.
>> 
>> * src/process.c (wait_reading_process_output): Remove
>> unnecessary thread_select wrapper.
>
> Thanks.
>
> However, I don't think I follow the logic behind this change.  I agree
> that status_notify does things that can be dangerous (more about that
> below), but the call to status_notify will happen whether we call
> thread_select or pselect directly, so if status_notify can do some
> damage, it will do that regardless of whether we call pselect directly
> or not.  I think you assume that the problematic call to status_notify
> will happen in another thread, if we allow switching threads at this
> point, is that right?  IOW, I think you assume that, if we switch to
> another thread, that other thread could call status_notify, and thus
> close some descriptors on which the current thread, the one which
> calls thread_select at this point, is waiting.  If that is your
> assumption, then the actual situation could also be the opposite: the
> current thread, the one where you want to call pselect, will call
> status_notify after pselect returns, and that will close some
> descriptors used by other threads.  In that case, calling pselect
> directly will make things worse, not better, because it makes this
> closing of descriptors imminent.

Yes.  I think there's multiple problems in this code and I haven't
tracked them all down yet.

That being said, I'm confident that it's incorrect and unnecessary to be
using thread_select instead of pselect here, so we should stop doing
that in addition to any other fix.

> So, while I agree we have a problem here, I think the place and the
> way we should solve it are different.
>
> Which brings us to status_notify.  It indeed can close the descriptors
> of processes which terminated, so we should probably make it safer.
> For example, we could leave alone descriptors marked with non-NULL
> waiting_thread if that thread is still alive and is different from the
> current thread.  Would that fix the problem?  If yes, I think we
> should install such a change.

That was my initial idea for a fix as well.  But I'm not sure about the
specific details.  Namely, perhaps we need to be checking waiting_thread
everywhere that we're doing FOR_EACH_PROCESS, not just in status_notify.
And maybe some new functions/macros should be added to simplify this;
maybe a version of FOR_EACH_PROCESS which only iterates over processes
for which waiting_thread == current_thread.

> We should perhaps also understand better what happens with the current
> code if and when descriptors are closed by some other thread in this
> scenario.  A thread using such a descriptor could be either stuck in
> the pselect call, or it could be before or after the call to pselect.
> Paul, what happens if a descriptor passed to pselect becomes closed
> while pselect waits? does pselect return with EBADF, or does pselect
> handle that gracefully, like by considering such descriptors to be
> never "ready"?
>
> If the thread is before or after pselect, then either calling pselect
> or attempting to read from a closed descriptor after pselect returns
> will cause EBADF.  Is our code prepared to deal with such a situation
> (e.g., by ignoring the failed read), or will that cause some Lisp API
> to signal an error or produce unexpected results?  If the latter, we
> should probably handle EBADF specially, because I don't believe we
> could completely prevent this from happening.

(I'm more hopeful: I think we can completely prevent this from
happening, we just need to be more careful about our locking.)




This bug report was last modified 8 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.