Package: emacs;
Reported by: Spencer Baugh <sbaugh <at> janestreet.com>
Date: Thu, 28 Aug 2025 19:46:02 UTC
Severity: normal
Found in version 31.0.50
To reply to this bug, email your comments to 79333 AT debbugs.gnu.org.

Message #5 received at submit <at> debbugs.gnu.org (Thu, 28 Aug 2025 19:46:02 GMT):
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Processes (still) aren't actually locked to threads
Date: Thu, 28 Aug 2025 15:45:18 -0400

1. emacs -Q
2. Eval this:

(define-advice shell-command-sentinel (:before (process signal))
  (message "process thread: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(make-thread
 (lambda ()
   (async-shell-command "sleep 2")
   ;; Do nothing.
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
(sit-for 3)

3. It hangs due to some thread bug, independent of the main bug I'm
reporting here. Just hit C-g. (Notably, it doesn't hang when run in
emacs --batch)

4. Observe in *Messages* a message like this:

process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>

The sentinel ran in a thread which is not process-thread.

So even after c93be71e45, processes aren't actually locked to threads.

(Once again, I think we should take the opportunity here to delete the
code for locking processes to threads, since IMO it is not useful, and
it is still broken)


In GNU Emacs 31.0.50 (build 88, x86_64-pc-linux-gnu, GTK+ Version
3.22.30, cairo version 1.15.12) of 2025-08-28 built on igm-qws-u22796a
Repository revision: 1ebb6e8822b5fc635549be14a3d4f2dd6f2d77a4
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure -C --with-gif=no'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG LIBSELINUX
LIBSYSTEMD LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XINERAMA XINPUT2 XPM XRANDR GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr compile comint ansi-osc ansi-color ring emacsbug lisp-mnt message mailcap yank-media puny dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068 epg-config gnus-util text-property-search time-date subr-x mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils comp-run bytecomp byte-compile comp-common rx warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd touch-screen tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic indonesian philippine cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads dbusbind inotify dynamic-setting system-font-setting font-render-setting cairo gtk x-toolkit xinput2 x multi-tty move-toolbar make-network-process tty-child-frames native-compile emacs)

Memory information:
((conses 16 86231 9338)
 (symbols 48 6873 0)
 (strings 32 24499 2161)
 (string-bytes 1 811121)
 (vectors 16 15629)
 (vector-slots 8 186044 7873)
 (floats 8 30 1)
 (intervals 56 275 0)
 (buffers 984 11))

Message #8 received at 79333 <at> debbugs.gnu.org (Thu, 28 Aug 2025 20:45:03 GMT):
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: 79333 <at> debbugs.gnu.org
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Thu, 28 Aug 2025 16:44:10 -0400

Spencer Baugh <sbaugh <at> janestreet.com> writes:

> 3. It hangs due to some thread bug, independent of the main bug I'm
> reporting here. Just hit C-g. (Notably, it doesn't hang when run in
> emacs --batch)

Extra detail about this: This hang happens for GTK3 Emacs, but doesn't
happen when Emacs is built with --with-x-toolkit=lucid.

Message #11 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 07:36:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 10:35:15 +0300

> Cc: Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Date: Thu, 28 Aug 2025 15:45:18 -0400
>
>
> 1. emacs -Q
> 2. Eval this:
>
> (define-advice shell-command-sentinel (:before (process signal))
>   (message "process thread: %s, current thread: %s"
>            (process-thread process)
>            (current-thread)))
> (make-thread
>  (lambda ()
>    (async-shell-command "sleep 2")
>    ;; Do nothing.
>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
> (sit-for 3)
>
> 3. It hangs due to some thread bug, independent of the main bug I'm
> reporting here. Just hit C-g. (Notably, it doesn't hang when run in
> emacs --batch)
>
> 4. Observe in *Messages* a message like this:
>
> process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>
>
> The sentinel ran in a thread which is not process-thread.

I believe that's because of the way we process SIGCHLD on Posix
systems: we write to a special file descriptor to wake pselect. That
file descriptor is used by all the subprocesses, so it cannot be made
thread-specific. See child_signal_init and child_signal_notify.

Since this descriptor is shared by all the subprocesses, the sentinel
on Posix systems can run in the context of some random thread that
succeeds in grabbing the global lock after it returns from pselect.

Making sentinels run in the context of the thread that started the
process would require redesigning this part of Emacs. Until then, we
will need to document this subtlety.

> So even after c93be71e45, processes aren't actually locked to threads.

Only the sentinel is not locked.

> (Once again, I think we should take the opportunity here to delete the
> code for locking processes to threads, since IMO it is not useful, and
> it is still broken)

I did explain why it is useful, and you haven't brought up any
arguments to the contrary. And if "broken" means that the sentinel
can run in the context of a random thread, then how come you are
asking to leave it in this broken state?
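
Until sentinels are locked, or the behavior is documented, a package can at least detect when its sentinel runs outside the process's locked thread. The following is a minimal diagnostic sketch, assuming Emacs is built with thread support. The my- functions and the property names stored with process-put are hypothetical and not part of Emacs.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-recording-sentinel (proc event)
  "Record which thread ran PROC's sentinel for EVENT.
Hypothetical diagnostic sentinel: it stores the running thread, and
whether it matches `process-thread', in PROC's property list."
  (let ((locked (process-thread proc)))
    (process-put proc 'sentinel-event event)
    (process-put proc 'sentinel-thread (current-thread))
    ;; A nil `process-thread' means the process is unlocked, so any
    ;; thread is an acceptable place to run the sentinel.
    (process-put proc 'sentinel-in-locked-thread
                 (or (null locked) (eq locked (current-thread))))))

(defun my-wait-for-sentinel (proc &optional timeout)
  "Wait until PROC's sentinel has run, or TIMEOUT seconds (default 5) pass.
Return the thread that ran the sentinel, or nil on timeout."
  (let ((deadline (+ (float-time) (or timeout 5))))
    (while (and (not (process-get proc 'sentinel-thread))
                (< (float-time) deadline))
      ;; Waiting with a nil PROCESS gives Emacs a chance to run
      ;; pending sentinels.
      (accept-process-output nil 0.05))
    (process-get proc 'sentinel-thread)))
```

In a program with several threads, a nil value for the sentinel-in-locked-thread property would indicate the behavior reported in this bug.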

Message #14 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 12:56:02 GMT):
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 08:55:41 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Cc: Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Date: Thu, 28 Aug 2025 15:45:18 -0400
>>
>>
>> 1. emacs -Q
>> 2. Eval this:
>>
>> (define-advice shell-command-sentinel (:before (process signal))
>>   (message "process thread: %s, current thread: %s"
>>            (process-thread process)
>>            (current-thread)))
>> (make-thread
>>  (lambda ()
>>    (async-shell-command "sleep 2")
>>    ;; Do nothing.
>>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
>> (sit-for 3)
>>
>> 3. It hangs due to some thread bug, independent of the main bug I'm
>> reporting here. Just hit C-g. (Notably, it doesn't hang when run in
>> emacs --batch)
>>
>> 4. Observe in *Messages* a message like this:
>>
>> process thread: #<thread 0x3a25bf60>, current thread: #<thread 0x7f87c009e920>
>>
>> The sentinel ran in a thread which is not process-thread.
>
> I believe that's because of the way we process SIGCHLD on Posix
> systems: we write to a special file descriptor to wake pselect. That
> file descriptor is used by all the subprocesses, so it cannot be made
> thread-specific. See child_signal_init and child_signal_notify.
>
> Since this descriptor is shared by all the subprocesses, the sentinel
> on Posix systems can run in the context of some random thread that
> succeeds in grabbing the global lock after it returns from pselect.
>
> Making sentinels run in the context of the thread that started the
> process would require redesigning this part of Emacs. Until then, we
> will need to document this subtlety.

Makes sense.

>> So even after c93be71e45, processes aren't actually locked to threads.
>
> Only the sentinel is not locked.

The filter can also be run in a different thread, because it's run for
any remaining output at the time the process terminates.

This code demonstrates that:

(define-advice shell-command-sentinel (:before (process signal))
  (message "process thread: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(define-advice comint-output-filter (:before (process string))
  (message "filter: %s, current thread: %s"
           (process-thread process)
           (current-thread)))
(make-thread
 (lambda ()
   (async-shell-command "sleep 2 && echo hi")
   ;; Do nothing.
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
(sit-for 3)

So, the following are unlocked:
- calls into the filter triggered by process state changes
- calls into the sentinel

And the following are locked:
- other calls into the filter

Seems hard to document.

>> (Once again, I think we should take the opportunity here to delete the
>> code for locking processes to threads, since IMO it is not useful, and
>> it is still broken)
>
> I did explain why it is useful, and you haven't brought up any
> arguments to the contrary. And if "broken" means that the sentinel
> can run in the context of a random thread, then how come you are
> asking to leave it in this broken state?

I was working on an example to demonstrate how process locking can cause
problems for unrelated Lisp code when I found this bug.

Here's a finished example of the problem.

Suppose I have the following Lisp program which doesn't use threads:

(run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
(with-current-buffer (get-buffer-create shell-command-buffer-name-async)
  (while (string-empty-p (buffer-string))
    (message "waiting for some process output")
    (sit-for 1))
  (message "buffer contents: %s" (buffer-string)))

This is intended to represent some arbitrary package which calls
make-process in a timer or a hook. The command "sleep 1 && echo foobar
&& sleep inf" is chosen to represent some interactive executable like a
shell or REPL.

This code runs just fine, with the output appearing in the buffer as
expected.

Now suppose I have some other unrelated package which runs the following
code using threads:

(make-thread
 (lambda ()
   (accept-process-output nil 1)
   (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))

This is intended to represent thread 1 waiting for some other thread 2
to complete some long-running task. Crucially, thread 1 is blocked in
thread-join, which doesn't run wait_reading_process_output, so thread 1
won't read output from any locked processes.

Running the second piece of code hangs the first piece of code, even
though neither of them are buggy, and they aren't visibly interacting,
and they can be written by totally different authors.

Specifically: The timer can be run in the thread created by the second
piece of code, and then the process will be created locked to that
thread. Then the process's output will never be read by thread 1. So
the first piece of code will hang.

Here's a complete example, with some logging. (Note that the problem
doesn't occur on every run because it's a race condition.)

(define-advice start-process (:filter-return (proc))
  (message "start-process: %s" (process-thread proc))
  proc)
(run-at-time .5 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
(make-thread
 (lambda ()
   (accept-process-output nil 1)
   (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
(with-current-buffer (get-buffer-create shell-command-buffer-name-async)
  (while (string-empty-p (buffer-string))
    (message "waiting for some process output")
    (sit-for 1))
  (message "buffer contents: %s" (buffer-string)))

This isn't just an academic problem. Most Lisp programs that start
processes will run into this problem if there are any Lisp threads
running.

One way to fix this would be to make it so that thread-join and
condition-wait both call wait_reading_process_output. Then there
wouldn't be a way to block a thread without calling
wait_reading_process_output.
Unfortunately, that's difficult to do in a portable way, because we
would need to integrate waiting for a condition variable into the
wait_reading_process_output event loop, which is impossible on
GNU/Linux.

Another way to fix this would be to make timers run only in the thread
which started them. However, this is insufficient, because the same
problem can occur with hooks. Any hook can be run by an unrelated
thread, and processes started in that hook may hang if the thread is
doing work which doesn't involve calling wait_reading_process_output.

As far as I can tell, the only possible fix for this problem is to not
lock processes to threads. This problem seems worse than the problems
prevented by locking processes to threads, so I think this is the right
fix.

(Especially because, as the original bug demonstrates, we aren't fully
locking processes to threads, neither sentinels nor filters, so we
aren't actually getting the benefits of that locking, only the costs)
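
When debugging a hang like the one above, it can help to list the live processes that are locked to some thread other than the one asking. The following is a minimal diagnostic sketch. The name my-processes-locked-elsewhere is hypothetical. The function only reports the locking; it cannot tell whether the owning thread is actually reading output.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-processes-locked-elsewhere ()
  "Return live processes locked to a thread other than the current one.
Hypothetical diagnostic: each element is (PROCESS . THREAD).  Output
from such a process is read only by THREAD, so if THREAD is blocked
somewhere that does not read process output (for instance in
`thread-join'), that output stays unread."
  (let (result)
    (dolist (proc (process-list))
      (let ((thread (process-thread proc)))
        (when (and thread
                   (process-live-p proc)
                   (not (eq thread (current-thread))))
          (push (cons proc thread) result))))
    (nreverse result)))
```

Evaluating this from a thread other than the timer's, for example with M-: in the main thread, would show the async shell process together with the thread it was accidentally locked to.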

Message #17 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 13:20:03 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 16:19:30 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 08:55:41 -0400
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > Making sentinels run in the context of the thread that started the
> > process would require redesigning this part of Emacs. Until then, we
> > will need to document this subtlety.
>
> Makes sense.
>
> >> So even after c93be71e45, processes aren't actually locked to threads.
> >
> > Only the sentinel is not locked.
>
> The filter can also be run in a different thread, because it's run for
> any remaining output at the time the process terminates.
>
> This code demonstrates that:

In the discussion of bug#79334, I suggested making a change in
status_notify which should prevent that, as long as the thread to
which the process is locked is alive.

> >> (Once again, I think we should take the opportunity here to delete the
> >> code for locking processes to threads, since IMO it is not useful, and
> >> it is still broken)
> >
> > I did explain why it is useful, and you haven't brought up any
> > arguments to the contrary. And if "broken" means that the sentinel
> > can run in the context of a random thread, then how come you are
> > asking to leave it in this broken state?
>
> I was working on an example to demonstrate how process locking can cause
> problems for unrelated Lisp code when I found this bug.
>
> Here's a finished example of the problem.
>
> Suppose I have the following Lisp program which doesn't use threads:
>
> (run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> (with-current-buffer (get-buffer-create shell-command-buffer-name-async)
>   (while (string-empty-p (buffer-string))
>     (message "waiting for some process output")
>     (sit-for 1))
>   (message "buffer contents: %s" (buffer-string)))
>
> This is intended to represent some arbitrary package which calls
> make-process in a timer or a hook. The command "sleep 1 && echo foobar
> && sleep inf" is chosen to represent some interactive executable like a
> shell or REPL.
>
> This code runs just fine, with the output appearing in the buffer as
> expected.
>
> Now suppose I have some other unrelated package which runs the following
> code using threads:
>
> (make-thread
>  (lambda ()
>    (accept-process-output nil 1)
>    (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
>
> This is intended to represent thread 1 waiting for some other thread 2
> to complete some long-running task. Crucially, thread 1 is blocked in
> thread-join, which doesn't run wait_reading_process_output, so thread 1
> won't read output from any locked processes.
>
> Running the second piece of code hangs the first piece of code, even
> though neither of them are buggy, and they aren't visibly interacting,
> and they can be written by totally different authors.
>
> Specifically: The timer can be run in the thread created by the second
> piece of code, and then the process will be created locked to that
> thread. Then the process's output will never be read by thread 1. So
> the first piece of code will hang.

I don't understand why one thread starts a process, then another
thread waits for its output, and the program which arranges for that
doesn't unlock the process so the other thread could do its job. This
is what set-process-thread is for, and in this (IMO rather unusual)
arrangement, calling it with a nil THREAD argument is exactly what
should be done.

> As far as I can tell, the only possible fix for this problem is to not
> lock processes to threads.

No, the fix is for the program to unlock the process using
set-process-thread. Did you try that, and if so, did it help?

> (Especially because, as the original bug demonstrates, we aren't fully
> locking processes to threads, neither sentinels nor filters, so we
> aren't actually getting the benefits of that locking, only the costs)

If we want programs using threads to be more deterministic and
predictable, we need to beef up the locking, not throw it away. At
least that is my conclusion from all these discussions, and the above
doesn't contradict it, at least not yet.
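
The unlocking Eli suggests can be wrapped around make-process. The following is a minimal sketch: make-process, set-process-thread and process-thread are real Emacs primitives, while my-make-unlocked-process is a hypothetical helper.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-make-unlocked-process (&rest args)
  "Like `make-process' with ARGS, but return the process unlocked.
Hypothetical wrapper: by default a process is locked to the thread
that created it; calling `set-process-thread' with a nil THREAD lets
any thread accept its output."
  (let ((proc (apply #'make-process args)))
    (set-process-thread proc nil)
    proc))
```

A package that starts processes from timers or hooks could use this helper in place of make-process. Output from the new process is then read by whichever thread happens to wait for it.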

Message #20 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 13:30:03 GMT):
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 09:29:37 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> >> (Once again, I think we should take the opportunity here to delete the
>> >> code for locking processes to threads, since IMO it is not useful, and
>> >> it is still broken)
>> >
>> > I did explain why it is useful, and you haven't brought up any
>> > arguments to the contrary. And if "broken" means that the sentinel
>> > can run in the context of a random thread, then how come you are
>> > asking to leave it in this broken state?
>>
>> I was working on an example to demonstrate how process locking can cause
>> problems for unrelated Lisp code when I found this bug.
>>
>> Here's a finished example of the problem.
>>
>> Suppose I have the following Lisp program which doesn't use threads:
>>
>> (run-at-time .1 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
>> (with-current-buffer (get-buffer-create shell-command-buffer-name-async)
>>   (while (string-empty-p (buffer-string))
>>     (message "waiting for some process output")
>>     (sit-for 1))
>>   (message "buffer contents: %s" (buffer-string)))
>>
>> This is intended to represent some arbitrary package which calls
>> make-process in a timer or a hook. The command "sleep 1 && echo foobar
>> && sleep inf" is chosen to represent some interactive executable like a
>> shell or REPL.
>>
>> This code runs just fine, with the output appearing in the buffer as
>> expected.
>>
>> Now suppose I have some other unrelated package which runs the following
>> code using threads:
>>
>> (make-thread
>>  (lambda ()
>>    (accept-process-output nil 1)
>>    (thread-join (make-thread (lambda () (while t (message "doing work") (sit-for 10)))))))
>>
>> This is intended to represent thread 1 waiting for some other thread 2
>> to complete some long-running task. Crucially, thread 1 is blocked in
>> thread-join, which doesn't run wait_reading_process_output, so thread 1
>> won't read output from any locked processes.
>>
>> Running the second piece of code hangs the first piece of code, even
>> though neither of them are buggy, and they aren't visibly interacting,
>> and they can be written by totally different authors.
>>
>> Specifically: The timer can be run in the thread created by the second
>> piece of code, and then the process will be created locked to that
>> thread. Then the process's output will never be read by thread 1. So
>> the first piece of code will hang.
>
> I don't understand why one thread starts a process, then another
> thread waits for its output, and the program which arranges for that
> doesn't unlock the process so the other thread could do its job.

Yes, that's the bug.

It is not intentional that the process is started in a thread. That is
what causes the bug.

> This is what set-process-thread is
> for, and in this (IMO rather unusual) arrangement, calling it with a
> nil THREAD argument is exactly what should be done.

The point is that these are two independent pieces of code, written by
different authors.

If they just happen to interleave in this way, then the process is
*accidentally, unintentionally* started in a thread.

How would the author of snippet 1 know to call set-process-thread in
this case?
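
One defensive option for the author of snippet 1 is to wrap the timer function so that any process it creates is unlocked afterwards, whichever thread the timer happens to run in. This is an assumption, not a fix proposed in this thread. The following is a minimal sketch; my-call-with-unlocked-processes is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-call-with-unlocked-processes (fn &rest args)
  "Call FN with ARGS, then unlock every process created during the call.
Hypothetical helper: processes that appear in `process-list' while FN
runs are passed to `set-process-thread' with a nil THREAD, so their
output can be read by whichever thread waits for it.  Return FN's value."
  (let ((before (process-list)))
    (prog1 (apply fn args)
      (dolist (proc (process-list))
        (unless (memq proc before)
          (set-process-thread proc nil))))))
```

For snippet 1, the timer call would become (run-at-time .1 nil #'my-call-with-unlocked-processes #'async-shell-command "sleep 1 && echo foobar && sleep inf"). There is one caveat. If FN yields to another thread while it runs, any processes that thread creates in the meantime are unlocked as well.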

Message #23 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 13:56:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 16:55:19 +0300

> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 09:29:37 -0400
>
> > I don't understand why one thread starts a process, then another
> > thread waits for its output, and the program which arranges for that
> > doesn't unlock the process so the other thread could do its job.
>
> Yes, that's the bug.
>
> It is not intentional that the process is started in a thread. That is
> what causes the bug.
>
> > This is what set-process-thread is
> > for, and in this (IMO rather unusual) arrangement, calling it with a
> > nil THREAD argument is exactly what should be done.
>
> The point is that these are two independent pieces of code, written by
> different authors.
>
> If they just happen to interleave in this way, then the process is
> *accidentally, unintentionally* started in a thread.
>
> How would the author of snippet 1 know to call set-process-thread in
> this case?

If you are saying that two arbitrary independently-written pieces of
code can get in trouble if they are lumped together to be run by the
same Lisp program in two separate threads, then I agree.

However, having a function that starts a process, but doesn't process
its output, and another function that doesn't start any processes, but
does accept output from subprocesses, is an unusual thing to do.

This could happen as a deliberate design of a program, but then we are
not talking about two snippets oblivious to one another, because the
person who brings them together like that in the same program does
that deliberately, and should understand that for it to work, the
process should be either unlocked or locked to the thread which wants
to read and process its output.

IOW, making such programs, where threads are not independent, calls
for some adjustments in the code of each thread.

What I have in mind is a different case, which I think is much more
common, at least at this stage of using Lisp threads in Emacs. It's a
case where one takes a single-threaded Lisp program, and runs it from
a separate thread so as to avoid blocking Emacs's main thread. In
that case, the same thread will both start the process and expect to
be able to process its output (because that's how single-threaded Lisp
programs work), and therefore having the process locked by default
lets such code work as expected when it is run from a thread.
Especially if you take several such programs, each with its own
subprocess, and let them all run from several different threads at the
same time.
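
The case Eli describes is a single-threaded program moved into a thread. In that case, the same code starts the process and reads its output, so locking the process to its creating thread does no harm. The following is a minimal sketch, assuming Emacs is built with thread support; my-process-output is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-process-output (program &rest args)
  "Run PROGRAM with ARGS and return everything it printed, as a string.
Hypothetical single-threaded helper: the same code starts the process
and reads its output, so it behaves the same whether it runs in the
main thread or in a thread created with `make-thread'."
  (let* ((chunks nil)
         (proc (make-process
                :name "output-demo"
                :command (cons program args)
                :noquery t
                :connection-type 'pipe
                :filter (lambda (_proc string) (push string chunks))
                :sentinel #'ignore)))
    ;; Keep reading until PROC has exited and no output is left.
    (while (accept-process-output proc))
    (apply #'concat (nreverse chunks))))
```

When this helper is run through make-thread, the process is created in the new thread and locked to it. All of its output is therefore read by the only thread that waits for it.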

Message #26 received at 79333 <at> debbugs.gnu.org (Fri, 29 Aug 2025 15:21:04 GMT):
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 11:20:32 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
>> Date: Fri, 29 Aug 2025 09:29:37 -0400
>>
>> > I don't understand why one thread starts a process, then another
>> > thread waits for its output, and the program which arranges for that
>> > doesn't unlock the process so the other thread could do its job.
>>
>> Yes, that's the bug.
>>
>> It is not intentional that the process is started in a thread. That is
>> what causes the bug.
>>
>> > This is what set-process-thread is
>> > for, and in this (IMO rather unusual) arrangement, calling it with a
>> > nil THREAD argument is exactly what should be done.
>>
>> The point is that these are two independent pieces of code, written by
>> different authors.
>>
>> If they just happen to interleave in this way, then the process is
>> *accidentally, unintentionally* started in a thread.
>>
>> How would the author of snippet 1 know to call set-process-thread in
>> this case?
>
> If you are saying that two arbitrary independently-written pieces of
> code can get in trouble if they are lumped together to be run by the
> same Lisp program in two separate threads, then I agree.

I guess that's what I'm saying. But the Lisp program here is just
"Emacs". This combination of two independent pieces of code just
automatically happens when a user is using one package which uses
timers, and another package which uses threads. Which of course
happens all the time without anyone choosing to do it.

For example, one package might add a find-file-hook which starts a
subprocess, then another package might add a find-file-hook which starts
a thread. Then when the two hooks run in succession, it would cause
this problem.

> However, having a function that starts a process, but doesn't process
> its output, and another function that doesn't start any processes, but
> does accept output from subprocesses, is an unusual thing to do.

Ah, I guess you're referring to the explicit accept-process-output call.
I think that was a confusing part of my example, because it was not
necessary to cause the issue.

Here's a more refined example:

;; Package 1 (perhaps run in a find-file-hook)
(run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
;; Package 2 (perhaps run in a find-file-hook)
(make-thread
 (lambda ()
   (sit-for 1)
   (thread-join (make-thread (lambda () (while t (sit-for 1)))))))

The shell command started by package 1 will sometimes hang forever
without producing output.

> This could happen as a deliberate design of a program, but then we are
> not talking about two snippets oblivious to one another, because the
> person who brings them together like that in the same program does
> that deliberately, and should understand that for it to work, the
> process should be either unlocked or locked to the thread which wants
> to read and process its output.
>
> IOW, making such programs, where threads are not independent, calls
> for some adjustments in the code of each thread.
>
> What I have in mind is a different case, which I think is much more
> common, at least at this stage of using Lisp threads in Emacs. It's a
> case where one takes a single-threaded Lisp program, and runs it from
> a separate thread so as to avoid blocking Emacs's main thread. In
> that case, the same thread will both start the process and expect to
> be able to process its output (because that's how single-threaded Lisp
> programs work), and therefore having the process locked by default
> lets such code work as expected when it is run from a thread.
> Especially if you take several such programs, each with its own
> subprocess, and let them all run from several different threads at the
> same time.

Yes, I definitely want that real-world case to work right. I agree that
that is a very important case.
But I think it already works right with processes not locked to
threads for any non-buggy program. (I personally have written or used
lots of code like that with threads, and the fact that processes were
not fully locked to threads did not cause problems. As one public
example, diff-hl-mode's diff-hl-update-async)

For example, a program like this will work correctly even in a thread:

(let ((proc (make-process ...)))
  (accept-process-output proc))

This would run the filter functions in the same thread, because output
from PROC can't be read by another thread until we do a thread switch,
which will only happen when we call accept-process-output.

If the program was instead something like:

(let ((proc (make-process ...)))
  (sit-for 1)
  (accept-process-output proc))

then the (accept-process-output proc) might block because the sit-for
can thread switch. But this program is already buggy, since sit-for
runs wait_reading_process_output which could read the output from PROC.
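
The sit-for pitfall above comes from waiting for the next chunk of output rather than for a condition on the output already received. The following sketch shows a more robust pattern that re-checks a predicate on the accumulated output. It is an illustration, not a pattern prescribed in this thread, and my-wait-until is hypothetical.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-wait-until (proc predicate &optional poll)
  "Read output from PROC until PREDICATE returns non-nil.
Return t if PREDICATE became true, nil if PROC exited first.
POLL is the per-call timeout in seconds (default 0.1).  Hypothetical
helper: because it re-checks PREDICATE instead of waiting for \"new\"
output, it does not hang when an earlier call such as `sit-for' has
already consumed the output."
  (catch 'done
    (while t
      (when (funcall predicate)
        (throw 'done t))
      (unless (process-live-p proc)
        ;; PROC has exited: drain whatever is still buffered, then decide.
        (accept-process-output proc 0)
        (throw 'done (and (funcall predicate) t)))
      (accept-process-output proc (or poll 0.1)))))
```

With this helper, the second program above would call my-wait-until with a predicate on the process buffer's contents instead of calling (accept-process-output proc) directly. It then returns as soon as the output is present, even if sit-for already read it.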
bug-gnu-emacs <at> gnu.org: bug#79333; Package emacs. (Fri, 29 Aug 2025 15:55:01 GMT)
Message #29 received at 79333 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 18:53:49 +0300
> From: Spencer Baugh <sbaugh <at> janestreet.com>
> Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
> Date: Fri, 29 Aug 2025 11:20:32 -0400
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > If you are saying that two arbitrary independently-written pieces of code can get in trouble if they are lumped together to run by the same Lisp program in two separate threads, then I agree.
>
> I guess that's what I'm saying. But the Lisp program here is just "Emacs". This combination of two independent pieces of code just automatically happens when a user is using one package which is using timers, and another package which is using threads. Which of course happens all the time without anyone choosing to do it.
>
> For example, one package might add a find-file-hook which starts a subprocess, then another package might add a find-file-hook which starts a thread. Then when the two hooks run in succession, it would cause this problem.

It's possible that we should have some guidelines for such situations. But this is way far in the future, from where I stand: right now, taking some processing, which works single-threaded, and making it run from a separate thread doesn't work well, and we should first make sure that's solved.

> > However, having a function that starts a process, but doesn't process its output, and another function that doesn't start any processes, but does accept output from subprocesses, is an unusual thing to do.
>
> Ah, I guess you're referring to the explicit accept-process-output call. I think that was a confusing part of my example, because it was not necessary to cause the issue.
>
> Here's a more refined example:
>
> ;; Package 1 (perhaps run in a find-file-hook)
> (run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> ;; Package 2 (perhaps run in a find-file-hook)
> (make-thread
>  (lambda ()
>    (sit-for 1)
>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
>
> The shell command started by package 1 will sometimes hang forever without producing output.

I believe this is because of that issue with status_notify. At least, we should fix that before we revisit the above and see if anything else needs to be fixed there.

> > What I have in mind is a different case, which I think is much more common, at least at this stage of using Lisp threads in Emacs. It's a case where one takes a single-threaded Lisp program, and runs it from a separate thread so as to avoid blocking Emacs's main thread. In that case, the same thread will both start the process and expect to be able to process its output (because that's how single-threaded Lisp programs work), and therefore having the process locked by default lets such code work as expected when it is run from a thread. Especially if you take several such programs, each with its own subprocess, and let them all run from several different threads at the same time.
>
> Yes, I definitely want that real-world case to work right. I agree that that is a very important case. But I think it already works right with processes not locked to threads for any non-buggy program.

That's not my experience. If random threads get return values from accept-process-output, you can easily have a thread whose accept-process-output call never returns until timeout, because the output was already read by another thread.

> (I personally have written or used lots of code like that with threads, and the fact that processes were not fully locked to threads did not cause problems.

And locking them does cause problems?
> For example, a program like this will work correctly even in a thread:
>
> (let ((proc (make-process ...)))
>   (accept-process-output proc))
>
> This would run the filter functions in the same thread, because output from PROC can't be read by another thread until we do a thread switch, which will only happen when we call accept-process-output.

What matters is which thread gets to the pselect call first. That's unpredictable, because it's racy.

> If the program was instead something like:
>
> (let ((proc (make-process ...)))
>   (sit-for 1)
>   (accept-process-output proc))
>
> then the (accept-process-output proc) might block because the sit-for can thread switch. But this program is already buggy, since sit-for runs wait_reading_process_output which could read the output from PROC.

A program can easily call sit-for indirectly, because sit-for is called all over the place in Emacs. This is why locking processes is better: it makes the program more predictable.
bug-gnu-emacs <at> gnu.org: bug#79333; Package emacs. (Fri, 29 Aug 2025 16:08:01 GMT)
Message #32 received at 79333 <at> debbugs.gnu.org:
From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Fri, 29 Aug 2025 12:06:58 -0400
On Fri, Aug 29, 2025, 11:53 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Spencer Baugh <sbaugh <at> janestreet.com>
> > Cc: dmitry <at> gutov.dev, 79333 <at> debbugs.gnu.org
> > Date: Fri, 29 Aug 2025 11:20:32 -0400
> >
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> > > If you are saying that two arbitrary independently-written pieces of code can get in trouble if they are lumped together to run by the same Lisp program in two separate threads, then I agree.
> >
> > I guess that's what I'm saying. But the Lisp program here is just "Emacs". This combination of two independent pieces of code just automatically happens when a user is using one package which is using timers, and another package which is using threads. Which of course happens all the time without anyone choosing to do it.
> >
> > For example, one package might add a find-file-hook which starts a subprocess, then another package might add a find-file-hook which starts a thread. Then when the two hooks run in succession, it would cause this problem.
>
> It's possible that we should have some guidelines for such situations. But this is way far in the future, from where I stand: right now, taking some processing, which works single-threaded, and making it run from a separate thread doesn't work well, and we should first make sure that's solved.

What would the guidelines be? I don't believe there's any way to fix this problem other than by unlocking every process you create.

> > > However, having a function that starts a process, but doesn't process its output, and another function that doesn't start any processes, but does accept output from subprocesses, is an unusual thing to do.
> >
> > Ah, I guess you're referring to the explicit accept-process-output call. I think that was a confusing part of my example, because it was not necessary to cause the issue.
> >
> > Here's a more refined example:
> >
> > ;; Package 1 (perhaps run in a find-file-hook)
> > (run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> > ;; Package 2 (perhaps run in a find-file-hook)
> > (make-thread
> >  (lambda ()
> >    (sit-for 1)
> >    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
> >
> > The shell command started by package 1 will sometimes hang forever without producing output.
>
> I believe this is because of that issue with status_notify. At least, we should fix that before we revisit the above and see if anything else needs to be fixed there.

This bug still happens even with my initial fixes for the status_notify issue. But sure, we can fix that first and then come back to this one. (I want to make sure we don't release Emacs 32 with the change that I believe breaks existing thread programs, but as long as we resolve the issues before then, I'm in no rush. I've just reverted the change at my site anyway.)

> > > What I have in mind is a different case, which I think is much more common, at least at this stage of using Lisp threads in Emacs. It's a case where one takes a single-threaded Lisp program, and runs it from a separate thread so as to avoid blocking Emacs's main thread. In that case, the same thread will both start the process and expect to be able to process its output (because that's how single-threaded Lisp programs work), and therefore having the process locked by default lets such code work as expected when it is run from a thread. Especially if you take several such programs, each with its own subprocess, and let them all run from several different threads at the same time.
> >
> > Yes, I definitely want that real-world case to work right. I agree that that is a very important case. But I think it already works right with processes not locked to threads for any non-buggy program.
>
> That's not my experience. If random threads get return values from accept-process-output, you can easily have a thread whose accept-process-output call never returns until timeout, because the output was already read by another thread.

I know you don't have much time to work on this, but it would really help if you could give a concrete example program that demonstrates this.

> > (I personally have written or used lots of code like that with threads, and the fact that processes were not fully locked to threads did not cause problems.
>
> And locking them does cause problems?

Yes, such as in the example I was describing above.

> > For example, a program like this will work correctly even in a thread:
> >
> > (let ((proc (make-process ...)))
> >   (accept-process-output proc))
> >
> > This would run the filter functions in the same thread, because output from PROC can't be read by another thread until we do a thread switch, which will only happen when we call accept-process-output.
>
> What matters is which thread gets to the pselect call first. That's unpredictable, because it's racy.

It is not racy in this example, even without locking.

> > If the program was instead something like:
> >
> > (let ((proc (make-process ...)))
> >   (sit-for 1)
> >   (accept-process-output proc))
> >
> > then the (accept-process-output proc) might block because the sit-for can thread switch. But this program is already buggy, since sit-for runs wait_reading_process_output which could read the output from PROC.
>
> A program can easily call sit-for indirectly, because sit-for is called all over the place in Emacs.

That's my point. This second example program is buggy whether threads are used or not.
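Unlocking every process, as proposed above, can be done with the existing set-process-thread function: a nil THREAD argument unlocks the process, after which process-thread returns nil. A new process is otherwise locked to the thread that created it. The wrapper name my-make-unlocked-process is hypothetical; this is a sketch of the workaround under discussion, not a recommendation from the maintainers.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-make-unlocked-process (&rest args)
  "Call `make-process' with ARGS and unlock the resulting process.
Hypothetical wrapper.  A new process is locked by default to the
thread that created it; calling `set-process-thread' with a nil
THREAD removes that lock, so any thread may accept its output."
  (let ((proc (apply #'make-process args)))
    ;; nil means "not locked to any thread".
    (set-process-thread proc nil)
    proc))
```

Passing a thread object instead of nil to set-process-thread locks the process to that thread, which corresponds to the default behavior Eli argues for.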
bug-gnu-emacs <at> gnu.org: bug#79333; Package emacs. (Mon, 01 Sep 2025 01:25:01 GMT)
Message #35 received at 79333 <at> debbugs.gnu.org:
From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Mon, 1 Sep 2025 04:24:27 +0300
On 29/08/2025 19:06, Spencer Baugh wrote:
> > If the program was instead something like:
> >
> > (let ((proc (make-process ...)))
> >   (sit-for 1)
> >   (accept-process-output proc))
> >
> > then the (accept-process-output proc) might block because the sit-for can thread switch. But this program is already buggy, since sit-for runs wait_reading_process_output which could read the output from PROC.
>
> A program can easily call sit-for indirectly, because sit-for is called all over the place in Emacs.
>
> That's my point. This second example program is buggy whether threads are used or not.

I wonder if we would consider comint-proc-query already problematic in this regard.

It does:

   (comint-send-string proc str) ; send the query
   (accept-process-output proc)  ; wait for some output

and comint-send-string -> process-send-string -> send_process, which has a 'wait_reading_process_output' call inside.

Is it at least theoretically possible that the latter call consumes the output from the process, making the subsequent accept-process-output call in the function hang?
bug-gnu-emacs <at> gnu.org: bug#79333; Package emacs. (Mon, 01 Sep 2025 15:00:02 GMT)
Message #38 received at 79333 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: sbaugh <at> janestreet.com, 79333 <at> debbugs.gnu.org
Subject: Re: bug#79333: 31.0.50; Processes (still) aren't actually locked to threads
Date: Mon, 01 Sep 2025 17:59:06 +0300
> Date: Mon, 1 Sep 2025 04:24:27 +0300
> Cc: 79333 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> On 29/08/2025 19:06, Spencer Baugh wrote:
> > > If the program was instead something like:
> > >
> > > (let ((proc (make-process ...)))
> > >   (sit-for 1)
> > >   (accept-process-output proc))
> > >
> > > then the (accept-process-output proc) might block because the sit-for can thread switch. But this program is already buggy, since sit-for runs wait_reading_process_output which could read the output from PROC.
> >
> > A program can easily call sit-for indirectly, because sit-for is called all over the place in Emacs.
> >
> > That's my point. This second example program is buggy whether threads are used or not.
>
> I wonder if we would consider comint-proc-query already problematic in this regard.
>
> It does:
>
>    (comint-send-string proc str) ; send the query
>    (accept-process-output proc)  ; wait for some output
>
> and comint-send-string -> process-send-string -> send_process, which has a 'wait_reading_process_output' call inside.

Why do you think this could be problematic?

> Is it at least theoretically possible that the latter call consumes the output from the process, making the subsequent accept-process-output call in the function hang?

It shouldn't. send_process only calls wait_reading_process_output if it cannot write the whole string in one go, AFAIR, in which case the last part of the process's output should still be available to the accept-process-output call.
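Independently of whether send_process can consume a reply, a query helper can be made immune to that possibility by re-checking the accumulated output before each wait, rather than relying on a single accept-process-output call as the comint-proc-query code quoted above does. This is a hedged sketch in that spirit, not how comint implements it; my-query, its REGEXP argument, and the 5-second default timeout are all hypothetical.

```elisp
;;; -*- lexical-binding: t -*-

(defun my-query (proc string regexp &optional timeout)
  "Send STRING to PROC and wait until its output matches REGEXP.
Hypothetical helper.  Return the accumulated output once it matches,
or nil if TIMEOUT seconds (default 5) elapse or PROC exits before a
match is seen.  The filter accumulates output on the process plist,
so a reply that was read by some other wait (for instance inside
`process-send-string') is still seen by the loop below."
  (let ((deadline (+ (float-time) (or timeout 5))))
    (process-put proc 'my-output "")
    (set-process-filter
     proc (lambda (p s)
            (process-put p 'my-output (concat (process-get p 'my-output) s))))
    (process-send-string proc string)
    ;; Test the condition before each wait, because the reply may
    ;; already have arrived.
    (while (and (not (string-match-p regexp (process-get proc 'my-output)))
                (process-live-p proc)
                (< (float-time) deadline))
      (accept-process-output proc 0.1))
    (let ((out (process-get proc 'my-output)))
      (and (string-match-p regexp out) out))))
```

With `cat` as the subprocess, sending "hello\n" and waiting for the regexp "hello\n" returns once the echoed line has passed through the filter, regardless of which wait actually read it.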
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.