GNU bug report logs - #79333
31.0.50; Processes (still) aren't actually locked to threads

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 28 Aug 2025 19:46:02 UTC

Severity: normal

Found in version 31.0.50

To reply to this bug, email your comments to 79333 AT debbugs.gnu.org.

Report forwarded to eliz <at> gnu.org, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Thu, 28 Aug 2025 19:46:02 GMT)

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to eliz <at> gnu.org, dmitry <at> gutov.dev, bug-gnu-emacs <at> gnu.org. (Thu, 28 Aug 2025 19:46:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Processes (still) aren't actually locked to threads
Date: Thu, 28 Aug 2025 15:45:18 -0400

1. emacs -Q
2. Eval this:

(define-advice shell-command-sentinel (:before (process signal))
  (message "process thread: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(make-thread
 (lambda ()
   (async-shell-command "sleep 2")
   ;; Do nothing.
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
(sit-for 3)

3. It hangs due to some thread bug, independent of the main bug I'm
   reporting here.  Just hit C-g.  (Notably, it doesn't hang when run in
   emacs --batch)

4. Observe in *Messages* a message like this:

process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>

The sentinel ran in a thread which is not process-thread.

So even after c93be71e45, processes aren't actually locked to threads.

(Once again, I think we should take the opportunity here to delete the
code for locking processes to threads, since IMO it is not useful, and
it is still broken)


In GNU Emacs 31.0.50 (build 88, x86_64-pc-linux-gnu, GTK+ Version
 3.22.30, cairo version 1.15.12) of 2025-08-28 built on igm-qws-u22796a
Repository revision: 1ebb6e8822b5fc635549be14a3d4f2dd6f2d77a4
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure -C --with-gif=no'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG LIBSELINUX
LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XINERAMA XINPUT2 XPM XRANDR GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr compile comint ansi-osc ansi-color ring emacsbug
lisp-mnt message mailcap yank-media puny dired dired-loaddefs rfc822 mml
mml-sec password-cache epa derived epg rfc6068 epg-config gnus-util
text-property-search time-date subr-x mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils comp-run bytecomp
byte-compile comp-common rx warnings icons cl-loaddefs cl-lib rmc
iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook
vc-hooks lisp-float-type elisp-mode mwheel term/x-win x-win
term/common-win x-dnd touch-screen tool-bar dnd fontset image regexp-opt
fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode
register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads dbusbind inotify dynamic-setting system-font-setting
font-render-setting cairo gtk x-toolkit xinput2 x multi-tty move-toolbar
make-network-process tty-child-frames native-compile emacs)

Memory information:
((conses 16 86231 9338) (symbols 48 6873 0) (strings 32 24499 2161)
 (string-bytes 1 811121) (vectors 16 15629)
 (vector-slots 8 186044 7873) (floats 8 30 1) (intervals 56 275 0)
 (buffers 984 11))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Thu, 28 Aug 2025 20:45:03 GMT)

Message #8 received at 79333 <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: 79333 <at> debbugs.gnu.org
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Thu, 28 Aug 2025 16:44:10 -0400

Spencer Baugh <sbaugh <at> janestreet.com> writes:
> 3. It hangs due to some thread bug, independent of the main bug I'm
>    reporting here.  Just hit C-g.  (Notably, it doesn't hang when run in
>    emacs --batch)

Extra detail about this: This hang happens for GTK3 Emacs, but doesn't
happen with --with-x-toolkit=lucid.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 07:36:02 GMT)

Message #11 received at 79333 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50;
 Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 10:35:15 +0300

> Cc: Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Thu, 28 Aug 2025 15:45:18 -0400
> 
> 
> 1. emacs -Q
> 2. Eval this:
> 
> (define-advice shell-command-sentinel (:before (process signal))
>   (message "process thread: %s, current thread: %s"
>            (process-thread process)
>            (current-thread)))
> (make-thread
>  (lambda ()
>    (async-shell-command "sleep 2")
>    ;; Do nothing.
>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
> (sit-for 3)
> 
> 3. It hangs due to some thread bug, independent of the main bug I'm
>    reporting here.  Just hit C-g.  (Notably, it doesn't hang when run in
>    emacs --batch)
> 
> 4. Observe in *Messages* a message like this:
> 
> process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>
> 
> The sentinel ran in a thread which is not process-thread.

I believe that's because of the way we process SIGCHLD on Posix
systems: we write to a special file descriptor to wake pselect.  That
file descriptor is used by all the subprocesses, so it cannot be made
thread-specific.  See child_signal_init and child_signal_notify.

Since this descriptor is shared by all the subprocesses, the sentinel
on Posix systems can run in the context of some random thread that
succeeds in grabbing the global lock after it returns from pselect.

Making sentinels run in the context of the thread that started the
process would require redesigning this part of Emacs.  Until then, we
will need to document this subtlety.

> So even after c93be71e45, processes aren't actually locked to threads.

Only the sentinel is not locked.

> (Once again, I think we should take the opportunity here to delete the
> code for locking processes to threads, since IMO it is not useful, and
> it is still broken)

I did explain why it is useful, and you haven't brought up any
arguments to the contrary.  And if "broken" means that the sentinel
can run in the context of a random thread, then how come you are
asking to leave it in this broken state?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 12:56:02 GMT)

Message #14 received at 79333 <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 08:55:41 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Thu, 28 Aug 2025 15:45:18 -0400
>> 
>> 
>> 1. emacs -Q
>> 2. Eval this:
>> 
>> (define-advice shell-command-sentinel (:before (process signal))
>>   (message "process thread: %s, current thread: %s"
>>            (process-thread process)
>>            (current-thread)))
>> (make-thread
>>  (lambda ()
>>    (async-shell-command "sleep 2")
>>    ;; Do nothing.
>>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
>> (sit-for 3)
>> 
>> 3. It hangs due to some thread bug, independent of the main bug I'm
>>    reporting here.  Just hit C-g.  (Notably, it doesn't hang when run in
>>    emacs --batch)
>> 
>> 4. Observe in *Messages* a message like this:
>> 
>> process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>
>> 
>> The sentinel ran in a thread which is not process-thread.
>
> I believe that's because of the way we process SIGCHLD on Posix
> systems: we write to a special file descriptor to wake pselect.  That
> file descriptor is used by all the subprocesses, so it cannot be made
> thread-specific.  See child_signal_init and child_signal_notify.
>
> Since this descriptor is shared by all the subprocesses, the sentinel
> on Posix systems can run in the context of some random thread that
> succeeds in grabbing the global lock after it returns from pselect.
>
> Making sentinels run in the context of the thread that started the
> process would require redesigning this part of Emacs.  Until then, we
> will need to document this subtlety.

Makes sense.

>> So even after c93be71e45, processes aren't actually locked to threads.
>
> Only the sentinel is not locked.

The filter can also be run in a different thread, because it's run for
any remaining output at the time the process terminates.

This code demonstrates that:

(define-advice shell-command-sentinel (:before (process signal))
  (message "process thread: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(define-advice comint-output-filter (:before (process string))
  (message "filter: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(make-thread
 (lambda ()
   (async-shell-command "sleep 2 && echo hi")
   ;; Do nothing.
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
(sit-for 3)

So, the following are unlocked:
- calls into the filter triggered by process state changes
- calls into the sentinel

And the following are locked:
- other calls into the filter

Seems hard to document.

>> (Once again, I think we should take the opportunity here to delete the
>> code for locking processes to threads, since IMO it is not useful, and
>> it is still broken)
>
> I did explain why it is useful, and you haven't brought up any
> arguments to the contrary.  And if "broken" means that the sentinel
> can run in the context of a random thread, then how come you are
> asking to leave it in this broken state?

I was working on an example to demonstrate how process locking can cause
problems for unrelated Lisp code when I found this bug.

Here's a finished example of the problem.

Suppose I have the following Lisp program which doesn't use threads:

(run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
(with-current-buffer (get-buffer-create shell-command-buffer-name-async)
  (while (string-empty-p (buffer-string))
    (message "waiting for some process output")
    (sit-for 1))
  (message "buffer contents: %s" (buffer-string)))

This is intended to represent some arbitrary package which calls
make-process in a timer or a hook.  The command "sleep 1 && echo foobar
&& sleep inf" is chosen to represent some interactive executable like a
shell or REPL.

This code runs just fine, with the output appearing in the buffer as
expected.

Now suppose I have some other unrelated package which runs the following
code using threads:

  (make-thread
   (lambda ()
     (accept-process-output nil 1)
     (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))

This is intended to represent thread 1 waiting for some other thread 2
to complete some long-running task.  Crucially, thread 1 is blocked in
thread-join, which doesn't run wait_reading_process_output, so thread 1
won't read output from any locked processes.

Running the second piece of code hangs the first piece of code, even
though neither of them are buggy, and they aren't visibly interacting,
and they can be written by totally different authors.

Specifically: The timer can be run in the thread created by the second
piece of code, and then the process will be created locked to that
thread.  Then the process's output will never be read by thread 1.  So
the first piece of code will hang.

Here's a complete example, with some logging.  (Note that the problem
doesn't occur on every run because it's a race condition.)

  (define-advice start-process (:filter-return (proc))
    (message "start-process: %s" (process-thread proc))
    proc)
  (run-at-time .5 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
  (make-thread
   (lambda ()
     (accept-process-output nil 1)
     (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
  (with-current-buffer (get-buffer-create shell-command-buffer-name-async)
    (while (string-empty-p (buffer-string))
      (message "waiting for some process output")
      (sit-for 1))
    (message "buffer contents: %s" (buffer-string)))

This isn't just an academic problem.  Most Lisp programs that start
processes will run into this problem if there are any Lisp threads
running.

One way to fix this would be to make it so that thread-join and
condition-wait both call wait_reading_process_output.  Then there
wouldn't be a way to block a thread without calling
wait_reading_process_output.  Unfortunately, that's difficult to do in a
portable way, because we would need to integrate waiting for a condition
variable into the wait_reading_process_output event loop, which is
impossible on GNU/Linux.
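
A rough Lisp-level approximation of the first idea is a join that keeps
calling accept-process-output while it waits.  This is only a sketch
under stated assumptions: my-thread-join-reading-output is a
hypothetical name, not an Emacs function, and it busy-polls at a fixed
interval instead of integrating the condition variable into
wait_reading_process_output.  It relies on thread-join returning the
thread function's value once THREAD has exited.

```elisp
;; Hypothetical helper, not part of Emacs: a polling substitute for
;; `thread-join' that keeps reading subprocess output while waiting.
(defun my-thread-join-reading-output (thread &optional poll-interval)
  "Wait for THREAD to exit, reading process output in the meantime.
POLL-INTERVAL is the number of seconds to wait per iteration; it
defaults to 0.1."
  (let ((interval (or poll-interval 0.1)))
    (while (thread-live-p thread)
      ;; With a nil PROCESS argument this waits for output from any
      ;; process this thread is allowed to read (unlocked processes,
      ;; or processes locked to this thread).
      (accept-process-output nil interval)))
  ;; THREAD has exited, so this returns without blocking.
  (thread-join thread))
```

Used in place of thread-join by thread 1 in the scenario above, the
waiting thread would still read output from a process that a timer
happened to lock to it, at the cost of waking up every POLL-INTERVAL
seconds.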

Another way to fix this would be to make timers run only in the thread
which started them.  However, this is insufficient, because the same
problem can occur with hooks.  Any hook can be run by an unrelated
thread, and processes started in that hook may hang if the thread is
doing work which doesn't involve calling wait_reading_process_output.

As far as I can tell, the only possible fix for this problem is to not
lock processes to threads.  This problem seems worse than the problems
prevented by locking processes to threads, so I think this is the right
fix.

(Especially because, as the original bug demonstrates, we aren't fully
locking processes to threads, neither sentinels nor filters, so we
aren't actually getting the benefits of that locking, only the costs)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 13:20:03 GMT)

Message #17 received at 79333 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 16:19:30 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev,  79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 08:55:41 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Making sentinels run in the context of the thread that started the
> > process would require redesigning this part of Emacs.  Until then, we
> > will need to document this subtlety.
> 
> Makes sense.
> 
> >> So even after c93be71e45, processes aren't actually locked to threads.
> >
> > Only the sentinel is not locked.
> 
> The filter can also be run in a different thread, because it's run for
> any remaining output at the time the process terminates.
> 
> This code demonstrates that:

In the discussion of bug#79334, I suggested to make a change in
status_notify which should prevent that, as long as the thread to
which the process is locked is alive.

> >> (Once again, I think we should take the opportunity here to delete the
> >> code for locking processes to threads, since IMO it is not useful, and
> >> it is still broken)
> >
> > I did explain why it is useful, and you haven't brought up any
> > arguments to the contrary.  And if "broken" means that the sentinel
> > can run in the context of a random thread, then how come you are
> > asking to leave it in this broken state?
> 
> I was working on an example to demonstrate how process locking can cause
> problems for unrelated Lisp code when I found this bug.
> 
> Here's a finished example of the problem.
> 
> Suppose I have the following Lisp program which doesn't use threads:
> 
> (run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> (with-current-buffer (get-buffer-create shell-command-buffer-name-async)
>   (while (string-empty-p (buffer-string))
>     (message "waiting for some process output")
>     (sit-for 1))
>   (message "buffer contents: %s" (buffer-string)))
> 
> This is intended to represent some arbitrary package which calls
> make-process in a timer or a hook.  The command "sleep 1 && echo foobar
> && sleep inf" is chosen to represent some interactive executable like a
> shell or REPL.
> 
> This code runs just fine, with the output appearing in the buffer as
> expected.
> 
> Now suppose I have some other unrelated package which runs the following
> code using threads:
> 
>   (make-thread
>    (lambda ()
>      (accept-process-output nil 1)
>      (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
> 
> This is intended to represent thread 1 waiting for some other thread 2
> to complete some long-running task.  Crucially, thread 1 is blocked in
> thread-join, which doesn't run wait_reading_process_output, so thread 1
> won't read output from any locked processes.
> 
> Running the second piece of code hangs the first piece of code, even
> though neither of them are buggy, and they aren't visibly interacting,
> and they can be written by totally different authors.
> 
> Specifically: The timer can be run in the thread created by the second
> piece of code, and then the process will be created locked to that
> thread.  Then the process's output will never be read by thread 1.  So
> the first piece of code will hang.

I don't understand why one thread starts a process, then another
thread waits for its output, and the program which arranges for that
doesn't unlock the process so the other thread could do its job.  This
is what set-process-thread is for, and in this (IMO rather unusual)
arrangement, calling it with a nil THREAD argument is exactly what
should be done.

> As far as I can tell, the only possible fix for this problem is to not
> lock processes to threads.

No, the fix is for the program to unlock the process using
set-process-thread.  Did you try that, and if so, did it help?
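
For reference, unlocking a process right after creating it could look
like the sketch below.  The helper name my-start-unlocked-process and
the "sh -c" command are illustrative assumptions; make-process,
set-process-thread and process-thread are the built-in primitives, and
a nil THREAD argument to set-process-thread unlocks the process.

```elisp
;; Hypothetical helper: start COMMAND and immediately unlock the
;; resulting process so that its output can be read by any thread,
;; not only by the thread that happened to create it.
(defun my-start-unlocked-process (name command)
  "Start COMMAND (a list of strings) as a process called NAME, unlocked."
  (let ((proc (make-process :name name
                            :buffer (generate-new-buffer (format "*%s*" name))
                            :command command)))
    ;; A nil THREAD means "not locked to any thread".
    (set-process-thread proc nil)
    proc))

;; Usage: the process is no longer tied to `current-thread'.
(let ((proc (my-start-unlocked-process "demo" '("sh" "-c" "echo foobar"))))
  (process-thread proc))  ; => nil
```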

> (Especially because, as the original bug demonstrates, we aren't fully
> locking processes to threads, neither sentinels nor filters, so we
> aren't actually getting the benefits of that locking, only the costs)

If we want programs using threads to be more deterministic and
predictable, we need to beef up the locking, not throw it away.  At
least that is my conclusion from all these discussions, and the above
doesn't contradict it, at least not yet.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 13:30:03 GMT)

Message #20 received at 79333 <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 09:29:37 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:
>> >> (Once again, I think we should take the opportunity here to delete the
>> >> code for locking processes to threads, since IMO it is not useful, and
>> >> it is still broken)
>> >
>> > I did explain why it is useful, and you haven't brought up any
>> > arguments to the contrary.  And if "broken" means that the sentinel
>> > can run in the context of a random thread, then how come you are
>> > asking to leave it in this broken state?
>> 
>> I was working on an example to demonstrate how process locking can cause
>> problems for unrelated Lisp code when I found this bug.
>> 
>> Here's a finished example of the problem.
>> 
>> Suppose I have the following Lisp program which doesn't use threads:
>> 
>> (run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
>> (with-current-buffer (get-buffer-create shell-command-buffer-name-async)
>>   (while (string-empty-p (buffer-string))
>>     (message "waiting for some process output")
>>     (sit-for 1))
>>   (message "buffer contents: %s" (buffer-string)))
>> 
>> This is intended to represent some arbitrary package which calls
>> make-process in a timer or a hook.  The command "sleep 1 && echo foobar
>> && sleep inf" is chosen to represent some interactive executable like a
>> shell or REPL.
>> 
>> This code runs just fine, with the output appearing in the buffer as
>> expected.
>> 
>> Now suppose I have some other unrelated package which runs the following
>> code using threads:
>> 
>>   (make-thread
>>    (lambda ()
>>      (accept-process-output nil 1)
>>      (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
>> 
>> This is intended to represent thread 1 waiting for some other thread 2
>> to complete some long-running task.  Crucially, thread 1 is blocked in
>> thread-join, which doesn't run wait_reading_process_output, so thread 1
>> won't read output from any locked processes.
>> 
>> Running the second piece of code hangs the first piece of code, even
>> though neither of them are buggy, and they aren't visibly interacting,
>> and they can be written by totally different authors.
>> 
>> Specifically: The timer can be run in the thread created by the second
>> piece of code, and then the process will be created locked to that
>> thread.  Then the process's output will never be read by thread 1.  So
>> the first piece of code will hang.
>
> I don't understand why one thread starts a process, then another
> thread waits for its output, and the program which arranges for that
> doesn't unlock the process so the other thread could do its job.

Yes, that's the bug.

It is not intentional that the process is started in a thread.  That is
what causes the bug.

> This is what set-process-thread is
> for, and in this (IMO rather unusual) arrangement, calling it with a
> nil THREAD argument is exactly what should be done.

The point is that these are two independent pieces of code, written by
different authors.

If they just happen to interleave in this way, then the process is
*accidentally, unintentionally* started in a thread.

How would the author of snippet 1 know to call set-process-thread in
this case?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 13:56:02 GMT)

Message #23 received at 79333 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 16:55:19 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev,  79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 09:29:37 -0400
> 
> > I don't understand why one thread starts a process, then another
> > thread waits for its output, and the program which arranges for that
> > doesn't unlock the process so the other thread could do its job.
> 
> Yes, that's the bug.
> 
> It is not intentional that the process is started in a thread.  That is
> what causes the bug.
> 
> > This is what set-process-thread is
> > for, and in this (IMO rather unusual) arrangement, calling it with a
> > nil THREAD argument is exactly what should be done.
> 
> The point is that these are two independent pieces of code, written by
> different authors.
> 
> If they just happen to interleave in this way, then the process is
> *accidentally, unintentionally* started in a thread.
> 
> How would the author of snippet 1 know to call set-process-thread in
> this case?

If you are saying that two arbitrary independently-written pieces of
code can get in trouble if they are lumped together to run by the same
Lisp program in two separate threads, then I agree.

However, having a function that starts a process, but doesn't process
its output, and another function that doesn't start any processes, but
does accept output from subprocesses, is an unusual thing to do.  This
could happen as a deliberate design of a program, but then we are not
talking about two snippets oblivious to one another, because the
person who brings them together like that in the same program does
that deliberately, and should understand that for it to work, the
process should be either unlocked or locked to the thread which wants
to read and process its output.

IOW, making such programs, where the threads are not independent, calls
for some adjustments in the code of each thread.

What I have in mind is a different case, which I think is much more
common, at least at this stage of using Lisp threads in Emacs.  It's a
case where one takes a single-threaded Lisp program, and runs it from
a separate thread so as to avoid blocking Emacs's main thread.  In
that case, the same thread will both start the process and expect to
be able to process its output (because that's how single-threaded Lisp
programs work), and therefore having the process locked by default
lets such code work as expected when it is run from a thread.
Especially if you take several such programs, each with its own
subprocess, and let them all run from several different threads at the
same time.
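
A minimal sketch of that pattern, assuming a POSIX sh is available;
my-single-threaded-job and the *my-job* buffer are hypothetical names.
Because the same thread both starts the process and reads its output,
the default lock to the creating thread matches what the job expects.

```elisp
;; Hypothetical single-threaded job: it starts a subprocess and then
;; reads that subprocess's output itself.
(defun my-single-threaded-job ()
  (let* ((buf (generate-new-buffer "*my-job*"))
         (proc (make-process :name "my-job"
                             :buffer buf
                             :command '("sh" "-c" "echo hi"))))
    ;; PROC is locked by default to the thread that created it, which
    ;; is also the thread that waits for its output here.
    (while (accept-process-output proc 1))
    (with-current-buffer buf
      (buffer-string))))

;; Run the unchanged job from a separate thread so the main thread is
;; not blocked while it runs; `thread-join' returns the job's result.
(thread-join (make-thread #'my-single-threaded-job "my-job-thread"))
```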

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 15:21:04 GMT)

Message #26 received at 79333 <at> debbugs.gnu.org:

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 11:20:32 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: dmitry <at> gutov.dev,  79333 <at> debbugs.gnu.org
>> Date: Fri, 29 Aug 2025 09:29:37 -0400
>> 
>> > I don't understand why one thread starts a process, then another
>> > thread waits for its output, and the program which arranges for that
>> > doesn't unlock the process so the other thread could do its job.
>> 
>> Yes, that's the bug.
>> 
>> It is not intentional that the process is started in a thread.  That is
>> what causes the bug.
>> 
>> > This is what set-process-thread is
>> > for, and in this (IMO rather unusual) arrangement, calling it with a
>> > nil THREAD argument is exactly what should be done.
>> 
>> The point is that these are two independent pieces of code, written by
>> different authors.
>> 
>> If they just happen to interleave in this way, then the process is
>> *accidentally, unintentionally* started in a thread.
>> 
>> How would the author of snippet 1 know to call set-process-thread in
>> this case?
>
> If you are saying that two arbitrary independently-written pieces of
> code can get in trouble if they are lumped together to run by the same
> Lisp program in two separate threads, then I agree.

I guess that's what I'm saying.  But the Lisp program here is just
"Emacs".  This combination of two independent pieces of code just
automatically happens when a user is using one package which is using
timers, and another package which is using threads.  Which of course
happens all the time without anyone choosing to do it.

For example, one package might add a find-file-hook which starts a
subprocess, then another package might add a find-file-hook which starts
a thread.  Then when the two hooks run in succession, it would cause
this problem.

> However, having a function that starts a process, but doesn't process
> its output, and another function that doesn't start any processes, but
> does accept output from subprocesses, is an unusual thing to do.

Ah, I guess you're referring to the explicit accept-process-output call.
I think that was a confusing part of my example, because it was not
necessary to cause the issue.

Here's a more refined example:

;; Package 1 (perhaps run in a find-file-hook)
(run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
;; Package 2 (perhaps run in a find-file-hook)
(make-thread
 (lambda ()
   (sit-for 1)
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))

The shell command started by package 1 will sometimes hang forever
without producing output.

> This could happen as a deliberate design of a program, but then we are
> not talking about two snippets oblivious to one another, because the
> person who brings them together like that in the same program does
> that deliberately, and should understand that for it to work, the
> process should be either unlocked or locked to the thread which wants
> to read and process its output.
>
> IOW, making such programs, where the threads are not independent, calls
> for some adjustments in the code of each thread.
>
> What I have in mind is a different case, which I think is much more
> common, at least at this stage of using Lisp threads in Emacs.  It's a
> case where one takes a single-threaded Lisp program, and runs it from
> a separate thread so as to avoid blocking Emacs's main thread.  In
> that case, the same thread will both start the process and expect to
> be able to process its output (because that's how single-threaded Lisp
> programs work), and therefore having the process locked by default
> lets such code work as expected when it is run from a thread.
> Especially if you take several such programs, each with its own
> subprocess, and let them all run from several different threads at the
> same time.

Yes, I definitely want that real-world case to work right.  I agree that
that is a very important case.  But I think it already works right with
processes not locked to threads for any non-buggy program.

(I personally have written or used lots of code like that with threads,
and the fact that processes were not fully locked to threads did not
cause problems.  As one public example, diff-hl-mode's
diff-hl-update-async.)

For example, a program like this will work correctly even in a thread:

(let ((proc (make-process ...)))
  (accept-process-output proc))

This would run the filter functions in the same thread, because output
from PROC can't be read by another thread until we do a thread switch,
which will only happen when we call accept-process-output.

If the program was instead something like:

(let ((proc (make-process ...)))
  (sit-for 1)
  (accept-process-output proc))

then the (accept-process-output proc) might block because the sit-for
can thread switch.  But this program is already buggy, since sit-for
runs wait_reading_process_output which could read the output from PROC.
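The first of the two programs above can be made concrete as a minimal sketch. It is not code from this thread: the function name my-demo-run, the command (sh -c "echo foobar"), the pipe connection, and the one-second timeout are assumptions for illustration. Following Spencer's reasoning, nothing between make-process and accept-process-output yields to another thread, so the filter should run in the thread that started the process; Eli disputes below that this is race-free without locking.

```elisp
;; -*- lexical-binding: t -*-
(defun my-demo-run ()
  "Start a process and read its output in the same thread.
Hypothetical example; the command and timeout are assumptions."
  (let* ((output "")
         (proc (make-process
                :name "my-demo"
                :command '("sh" "-c" "echo foobar")
                :connection-type 'pipe
                :noquery t
                ;; Accumulate whatever the process writes.
                :filter (lambda (_proc string)
                          (setq output (concat output string))))))
    ;; Keep waiting while output keeps arriving; stop once a wait
    ;; yields nothing (for example, after the process has exited).
    (while (accept-process-output proc 1))
    output))

;; Run the single-threaded function from a separate thread and
;; collect its return value.
(thread-join (make-thread #'my-demo-run))
```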

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 15:55:01 GMT) Full text and rfc822 format available.

Message #29 received at 79333 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Fri, 29 Aug 2025 18:53:49 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev,  79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 11:20:32 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > If you are saying that two arbitrary independently-written pieces of
> > code can get in trouble if they are lumped together to run by the same
> > Lisp program in two separate threads, then I agree.
> 
> I guess that's what I'm saying.  But the Lisp program here is just
> "Emacs".  This combination of two independent pieces of code just
> automatically happens when a user is using one package which is using
> timers, and another package which is using threads.  Which of course
> happens all the time without anyone choosing to do it.
> 
> For example, one package might add a find-file-hook which starts a
> subprocess, then another package might add a find-file-hook which starts
> a thread.  Then when the two hooks run in succession, it would cause
> this problem.

It's possible that we should have some guidelines for such situations.
But this is way far in the future, from where I stand: right now,
taking some processing which works single-threaded and making it run
from a separate thread doesn't work well, and we should first make
sure that's solved.

> > However, having a function that starts a process, but doesn't process
> > its output, and another function that doesn't start any processes, but
> > does accept output from subprocesses, is an unusual thing to do.
> 
> Ah, I guess you're referring to the explicit accept-process-output call.
> I think that was a confusing part of my example, because it was not
> necessary to cause the issue.
> 
> Here's a more refined example:
> 
> ;; Package 1 (perhaps run in a find-file-hook)
> (run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> ;; Package 2 (perhaps run in a find-file-hook)
> (make-thread
>  (lambda ()
>    (sit-for 1)
>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
> 
> The shell command started by package 1 will sometimes hang forever
> without producing output.

I believe this is because of that issue with status_notify.  At least,
we should fix that before we revisit the above and see if anything
else needs to be fixed there.

> > What I have in mind is a different case, which I think is much more
> > common, at least at this stage of using Lisp threads in Emacs.  It's a
> > case where one takes a single-threaded Lisp program, and runs it from
> > a separate thread so as to avoid blocking Emacs's main thread.  In
> > that case, the same thread will both start the process and expect to
> > be able to process its output (because that's how single-threaded Lisp
> > programs work), and therefore having the process locked by default
> > lets such code work as expected when it is run from a thread.
> > Especially if you take several such programs, each with its own
> > subprocess, and let them all run from several different threads at the
> > same time.
> 
> Yes, I definitely want that real-world case to work right.  I agree that
> that is a very important case.  But I think it already works right with
> processes not locked to threads for any non-buggy program.

That's not my experience.  If random threads get return values from
accept-process-output, you can easily have a thread whose
accept-process-output call never returns until timeout, because the
output was already read by another thread.

> (I personally have written or used lots of code like that with threads,
> and the fact that processes were not fully locked to threads did not
> cause problems.

And locking them does cause problems?

> For example, a program like this will work correctly even in a thread:
> 
> (let ((proc (make-process ...)))
>   (accept-process-output proc))
> 
> This would run the filter functions in the same thread, because output
> from PROC can't be read by another thread until we do a thread switch,
> which will only happen when we call accept-process-output.

What matters is which thread gets first to the pselect call.  That's
unpredictable, because it's racy.

> If the program was instead something like:
> 
> (let ((proc (make-process ...)))
>   (sit-for 1)
>   (accept-process-output proc))
> 
> then the (accept-process-output proc) might block because the sit-for
> can thread switch.  But this program is already buggy, since sit-for
> runs wait_reading_process_output which could read the output from PROC.

A program can easily call sit-for indirectly, because sit-for is
called all over the place in Emacs.

This is why locking processes is better: it makes the program more
predictable.
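A thread can make the locking Eli describes explicit, rather than relying on whatever default make-process uses, by calling set-process-thread on the process it starts. This is a minimal sketch under stated assumptions: the function name my-locked-worker and the command are hypothetical, and the bug reported here is precisely that such locking is not fully enforced for sentinels.

```elisp
;; -*- lexical-binding: t -*-
(defun my-locked-worker ()
  "Start a process, lock it to the current thread, and collect its output.
Hypothetical example; returns (LOCKED-TO-ME OUTPUT)."
  (let* ((output "")
         (proc (make-process
                :name "locked-demo"
                :command '("sh" "-c" "echo locked")
                :connection-type 'pipe
                :noquery t
                :filter (lambda (_proc string)
                          (setq output (concat output string))))))
    ;; Request that only this thread accept PROC's output,
    ;; whatever the default for new processes is.
    (set-process-thread proc (current-thread))
    (while (accept-process-output proc 1))
    (list (eq (process-thread proc) (current-thread)) output)))

(thread-join (make-thread #'my-locked-worker))
```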

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Fri, 29 Aug 2025 16:08:01 GMT) Full text and rfc822 format available.

Message #32 received at 79333 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50;
 Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 12:06:58 -0400

On Fri, Aug 29, 2025, 11:53 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Spencer Baugh <sbaugh <at> janestreet.com>
> > Cc: dmitry <at> gutov.dev,  79333 <at> debbugs.gnu.org
> > Date: Fri, 29 Aug 2025 11:20:32 -0400
> >
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> > > If you are saying that two arbitrary independently-written pieces of
> > > code can get in trouble if they are lumped together to run by the same
> > > Lisp program in two separate threads, then I agree.
> >
> > I guess that's what I'm saying.  But the Lisp program here is just
> > "Emacs".  This combination of two independent pieces of code just
> > automatically happens when a user is using one package which is using
> > timers, and another package which is using threads.  Which of course
> > happens all the time without anyone choosing to do it.
> >
> > For example, one package might add a find-file-hook which starts a
> > subprocess, then another package might add a find-file-hook which starts
> > a thread.  Then when the two hooks run in succession, it would cause
> > this problem.
>
> It's possible that we should have some guidelines for such situations.
> But this is way far in the future, from where I stand: right now,
> taking some processing which works single-threaded and making it run
> from a separate thread doesn't work well, and we should first make
> sure that's solved.
>

What would the guidelines be?  I don't believe there's any way to fix this
problem other than by unlocking every process you create.
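Unlocking, as described here, amounts to calling set-process-thread with nil on each newly created process. This is a minimal sketch; the helper name my-start-unlocked-process is hypothetical, and whether such a wrapper is advisable is exactly what this thread is debating.

```elisp
;; -*- lexical-binding: t -*-
(defun my-start-unlocked-process (name command)
  "Start COMMAND as a process called NAME, unlocked from any thread.
Hypothetical helper for illustration."
  (let ((proc (make-process :name name
                            :command command
                            :connection-type 'pipe
                            :noquery t)))
    ;; A nil thread means the process is unlocked: any thread may
    ;; accept its output.
    (set-process-thread proc nil)
    proc))

;; process-thread returns nil for an unlocked process.
(process-thread (my-start-unlocked-process "unlocked-demo" '("true")))
```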

> > > However, having a function that starts a process, but doesn't process
> > > its output, and another function that doesn't start any processes, but
> > > does accept output from subprocesses, is an unusual thing to do.
> >
> > Ah, I guess you're referring to the explicit accept-process-output call.
> > I think that was a confusing part of my example, because it was not
> > necessary to cause the issue.
> >
> > Here's a more refined example:
> >
> > ;; Package 1 (perhaps run in a find-file-hook)
> > (run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> > ;; Package 2 (perhaps run in a find-file-hook)
> > (make-thread
> >  (lambda ()
> >    (sit-for 1)
> >    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
> >
> > The shell command started by package 1 will sometimes hang forever
> > without producing output.
>
> I believe this is because of that issue with status_notify.  At least,
> we should fix that before we revisit the above and see if anything
> else needs to be fixed there.
>

This bug still happens even with my initial fixes for the status_notify
issue.  But sure, we can fix that first and then come back to this one.

(I want to make sure we don't release Emacs 32 with the change that I
believe breaks existing thread programs, but as long as we resolve the
issues before then, I'm in no rush.  I've just reverted the change at my
site anyway)

> > > What I have in mind is a different case, which I think is much more
> > > common, at least at this stage of using Lisp threads in Emacs.  It's a
> > > case where one takes a single-threaded Lisp program, and runs it from
> > > a separate thread so as to avoid blocking Emacs's main thread.  In
> > > that case, the same thread will both start the process and expect to
> > > be able to process its output (because that's how single-threaded Lisp
> > > programs work), and therefore having the process locked by default
> > > lets such code work as expected when it is run from a thread.
> > > Especially if you take several such programs, each with its own
> > > subprocess, and let them all run from several different threads at the
> > > same time.
> >
> > Yes, I definitely want that real-world case to work right.  I agree that
> > that is a very important case.  But I think it already works right with
> > processes not locked to threads for any non-buggy program.
>
> That's not my experience.  If random threads get return values from
> accept-process-output, you can easily have a thread whose
> accept-process-output call never returns until timeout, because the
> output was already read by another thread.
>

I know you don't have much time to work on this, but it would really help
if you could give a concrete example program that demonstrates this.

> > (I personally have written or used lots of code like that with threads,
> > and the fact that processes were not fully locked to threads did not
> > cause problems.
>
> And locking them does cause problems?
>

Yes.  Such as in the example I was describing above.

> > For example, a program like this will work correctly even in a thread:
> >
> > (let ((proc (make-process ...)))
> >   (accept-process-output proc))
> >
> > This would run the filter functions in the same thread, because output
> > from PROC can't be read by another thread until we do a thread switch,
> > which will only happen when we call accept-process-output.
>
> What matters is which thread gets first to the pselect call.  That's
> unpredictable, because it's racy.
>

It is not racy in this example.  Even without locking.

> > If the program was instead something like:
> >
> > (let ((proc (make-process ...)))
> >   (sit-for 1)
> >   (accept-process-output proc))
> >
> > then the (accept-process-output proc) might block because the sit-for
> > can thread switch.  But this program is already buggy, since sit-for
> > runs wait_reading_process_output which could read the output from PROC.
>
> A program can easily call sit-for indirectly, because sit-for is
> called all over the place in Emacs.
>

That's my point.  This second example program is buggy whether threads are
used or not.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Mon, 01 Sep 2025 01:25:01 GMT) Full text and rfc822 format available.

Message #35 received at 79333 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Mon, 1 Sep 2025 04:24:27 +0300

On 29/08/2025 19:06, Spencer Baugh wrote:
>      > If the program was instead something like:
>      >
>      > (let ((proc (make-process ...)))
>      >   (sit-for 1)
>      >   (accept-process-output proc))
>      >
>      > then the (accept-process-output proc) might block because the sit-for
>      > can thread switch.  But this program is already buggy, since sit-for
>      > runs wait_reading_process_output which could read the output from PROC.
> 
>     A program can easily call sit-for indirectly, because sit-for is
>     called all over the place in Emacs.
> 
> 
> That's my point.  This second example program is buggy whether threads 
> are used or not.

I wonder if we would consider comint-proc-query already problematic in 
this regard.

It does:

      (comint-send-string proc str) ; send the query
      (accept-process-output proc)  ; wait for some output

and comint-send-string -> process-send-string -> send_process, which has 
a 'wait_reading_process_output' call inside.

Is it at least theoretically possible that the latter call consumes the 
output from the process, making the subsequent accept-process-output 
call in the function hang?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79333; Package emacs. (Mon, 01 Sep 2025 15:00:02 GMT) Full text and rfc822 format available.

Message #38 received at 79333 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to
 threads
Date: Mon, 01 Sep 2025 17:59:06 +0300

> Date: Mon, 1 Sep 2025 04:24:27 +0300
> Cc: 79333 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> On 29/08/2025 19:06, Spencer Baugh wrote:
> >      > If the program was instead something like:
> >      >
> >      > (let ((proc (make-process ...)))
> >      >   (sit-for 1)
> >      >   (accept-process-output proc))
> >      >
> >      > then the (accept-process-output proc) might block because the sit-for
> >      > can thread switch.  But this program is already buggy, since sit-for
> >      > runs wait_reading_process_output which could read the output from PROC.
> > 
> >     A program can easily call sit-for indirectly, because sit-for is
> >     called all over the place in Emacs.
> > 
> > 
> > That's my point.  This second example program is buggy whether threads 
> > are used or not.
> 
> I wonder if we would consider comint-proc-query already problematic in 
> this regard.
> 
> It does:
> 
>        (comint-send-string proc str) ; send the query
>        (accept-process-output proc)  ; wait for some output
> 
> and comint-send-string -> process-send-string -> send_process, which has 
> a 'wait_reading_process_output' call inside.

Why do you think this could be problematic?

> Is it at least theoretically possible that the latter call consumes the 
> output from the process, making the subsequent accept-process-output 
> call in the function hang?

It shouldn't.  send_process only calls wait_reading_process_output if
it cannot write the whole string in one go, AFAIR, in which case the
last part of the process's output should still be available to
accept-process-output call.
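Independently of whether send_process can consume the reply, a caller can avoid hanging on a single accept-process-output by recording output in the filter and waiting on that result in a loop with a deadline. This is a minimal sketch, not how comint-proc-query is implemented; the helper name my-query-process, the 0.1-second poll interval, and the default 5-second limit are assumptions.

```elisp
;; -*- lexical-binding: t -*-
(defun my-query-process (proc string &optional timeout)
  "Send STRING to PROC and return the first output it produces.
Give up and return nil after TIMEOUT seconds (default 5).
Hypothetical helper; it replaces PROC's filter."
  (let ((reply nil)
        (deadline (+ (float-time) (or timeout 5))))
    ;; Record output in the filter, so it is seen regardless of which
    ;; wait (ours, or one nested inside process-send-string) reads it.
    (set-process-filter proc (lambda (_proc s) (setq reply (concat reply s))))
    (process-send-string proc string)
    ;; Test the recorded reply rather than the return value of
    ;; accept-process-output, and never wait past the deadline.
    (while (and (null reply) (< (float-time) deadline))
      (accept-process-output proc 0.1))
    reply))

;; Usage: cat echoes each line it receives.
(let ((proc (make-process :name "cat-demo" :command '("cat")
                          :connection-type 'pipe :noquery t)))
  (prog1 (my-query-process proc "hello\n")
    (delete-process proc)))
```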

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.