On Fri, Aug 29, 2025, 11:53 AM Eli Zaretskii <eliz@gnu.org> wrote:
> From: Spencer Baugh <sbaugh@janestreet.com>
> Cc: dmitry@gutov.dev, 79333@debbugs.gnu.org
> Date: Fri, 29 Aug 2025 11:20:32 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > If you are saying that two arbitrary independently-written pieces of
> > code can get in trouble if they are lumped together to run by the same
> > Lisp program in two separate threads, then I agree.
>
> I guess that's what I'm saying.  But the Lisp program here is just
> "Emacs".  This combination of two independent pieces of code just
> automatically happens when a user is using one package which is using
> timers, and another package which is using threads.  Which of course
> happens all the time without anyone choosing to do it.
>
> For example, one package might add a find-file-hook which starts a
> subprocess, then another package might add a find-file-hook which starts
> a thread.  Then when the two hooks run in succession, it would cause
> this problem.

It's possible that we should have some guidelines for such situations.
But this is way far in the future, from where I stand: right now,
taking some processing which works single-threaded and making it run
from a separate thread doesn't work well, and we should first make
sure that's solved.

What would the guidelines be?  I don't believe there's any way to fix this problem other than by unlocking every process you create.
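
For concreteness, a minimal sketch of what "unlocking every process you create" could look like, assuming Emacs 26 or later built with thread support.  The helper name `my-make-unlocked-process` is hypothetical; `set-process-thread` with a nil THREAD is the documented way to let any thread accept output from a process.

```elisp
(defun my-make-unlocked-process (&rest args)
  "Hypothetical helper: like `make-process', but unlock the result.
ARGS are passed to `make-process' unchanged."
  (let ((proc (apply #'make-process args)))
    ;; A nil THREAD removes the lock, so output from PROC may be
    ;; accepted by whichever thread happens to be waiting.
    (set-process-thread proc nil)
    proc))
```

A caller would use it exactly like `make-process`, for example `(my-make-unlocked-process :name "demo" :command '("sleep" "1"))`.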

> > However, having a function that starts a process, but doesn't process
> > its output, and another function that doesn't start any processes, but
> > does accept output from subprocesses, is an unusual thing to do.
>
> Ah, I guess you're referring to the explicit accept-process-output call.
> I think that was a confusing part of my example, because it was not
> necessary to cause the issue.
>
> Here's a more refined example:
>
> ;; Package 1 (perhaps run in a find-file-hook)
> (run-at-time .3 nil #'async-shell-command "sleep 1 && echo foobar && sleep inf")
> ;; Package 2 (perhaps run in a find-file-hook)
> (make-thread
>  (lambda ()
>    (sit-for 1)
>    (thread-join (make-thread (lambda () (while t (sit-for 1)))))))
>
> The shell command started by package 1 will sometimes hang forever
> without producing output.

I believe this is because of that issue with status_notify.  At least,
we should fix that before we revisit the above and see if anything
else needs to be fixed there.

This bug still happens even with my initial fixes for the status_notify issue.  But sure, we can fix that first and then come back to this one.

(I want to make sure we don't release Emacs 32 with the change that I believe breaks existing thread programs, but as long as we resolve the issues before then, I'm in no rush.  I've just reverted the change at my site anyway.)

> > What I have in mind is a different case, which I think is much more
> > common, at least at this stage of using Lisp threads in Emacs.  It's a
> > case where one takes a single-threaded Lisp program, and runs it from
> > a separate thread so as to avoid blocking Emacs's main thread.  In
> > that case, the same thread will both start the process and expect to
> > be able to process its output (because that's how single-threaded Lisp
> > programs work), and therefore having the process locked by default
> > lets such code work as expected when it is run from a thread.
> > Especially if you take several such programs, each with its own
> > subprocess, and let them all run from several different threads at the
> > same time.
>
> Yes, I definitely want that real-world case to work right.  I agree that
> that is a very important case.  But I think it already works right with
> processes not locked to threads for any non-buggy program.

That's not my experience.  If random threads get return values from
accept-process-output, you can easily have a thread whose
accept-process-output call never returns until timeout, because the
output was already read by another thread.

I know you don't have much time to work on this, but it would really help if you could give a concrete example program that demonstrates this.

> (I personally have written or used lots of code like that with threads,
> and the fact that processes were not fully locked to threads did not
> cause problems.

And locking them does cause problems?

Yes.  Such as in the example I was describing above.

> For example, a program like this will work correctly even in a thread:
>
> (let ((proc (make-process ...)))
>   (accept-process-output proc))
>
> This would run the filter functions in the same thread, because output
> from PROC can't be read by another thread until we do a thread switch,
> which will only happen when we call accept-process-output.

What matters is which thread gets first to the pselect call.  That's
unpredictable, because it's racy.

It is not racy in this example.  Even without locking.

> If the program was instead something like:
>
> (let ((proc (make-process ...)))
>   (sit-for 1)
>   (accept-process-output proc))
>
> then the (accept-process-output proc) might block because the sit-for
> can thread switch.  But this program is already buggy, since sit-for
> runs wait_reading_process_output which could read the output from PROC.

A program can easily call sit-for indirectly, because sit-for is
called all over the place in Emacs.

That's my point.  This second example program is buggy whether threads are used or not.
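
As an illustration only, here is one way the second program could be written so that it does not depend on which call happens to read the output.  It is a sketch under the assumption that the caller just needs the accumulated text; the names `my-record-output`, `my-wait-for-output`, `my-demo` and the `my-output` process property are hypothetical.  The filter is installed when the process is created, so output consumed during `sit-for` is still recorded, and the wait loop checks the recorded state instead of assuming that one `accept-process-output` call will see the data.

```elisp
(defun my-record-output (proc string)
  "Process filter: append STRING to PROC's `my-output' property."
  (process-put proc 'my-output
               (concat (process-get proc 'my-output) string)))

(defun my-wait-for-output (proc seconds)
  "Wait up to SECONDS for PROC to have recorded output.
Return the recorded string, or nil if nothing arrived in time."
  (let ((deadline (+ (float-time) seconds)))
    ;; Loop on the recorded state, not on a single read: the output
    ;; may already have been delivered to the filter by an earlier
    ;; `sit-for' or by any other call that waits for process output.
    (while (and (null (process-get proc 'my-output))
                (< (float-time) deadline))
      (accept-process-output proc 0.1))
    (process-get proc 'my-output)))

(defun my-demo ()
  "Start a short-lived process, idle, then collect its output."
  (let ((proc (make-process :name "demo"
                            :command '("echo" "foobar")
                            :filter #'my-record-output)))
    (sit-for 1)                      ; may itself read PROC's output
    (my-wait-for-output proc 5)))
```

Whether the output is read inside `sit-for` or inside the wait loop, `my-demo` sees it, because the filter records it in both cases.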