GNU bug report logs - #79367
31.0.50; magit-commit sometimes doesn't work if diff-hl-update-async is t

Previous Next

Package: emacs;

Reported by: Zhengyi Fu <i <at> fuzy.me>

Date: Tue, 2 Sep 2025 06:21:01 UTC

Severity: normal

Found in version 31.0.50

Full log


Message #95 received at 79367 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: i <at> fuzy.me, dmitry <at> gutov.dev, 79367 <at> debbugs.gnu.org
Subject: Re: bug#79367: 31.0.50; magit-commit sometimes doesn't work if
 diff-hl-update-async is t
Date: Wed, 03 Sep 2025 14:04:42 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: Dmitry Gutov <dmitry <at> gutov.dev>,  i <at> fuzy.me,  79367 <at> debbugs.gnu.org
>> Date: Tue, 02 Sep 2025 15:33:46 -0400
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> > I can code the proposed fix in a few minutes, if there's agreement to
>> > what I proposed to do.  I'm still uncertain what I proposed is
>> > agreed-upon.  So let me reiterate that:
>> >
>> >   . if the process on behalf of which we called
>> >     server_accept_connection was not locked to some thread, undo what
>> >     make_process did to the new process when it called pset_thread
>> >   . otherwise, call set_proc_thread to lock the new process to the
>> >     same thread as the one which caused this call
>> 
>> Whatever you do, please post your change for review rather than just
>> pushing it.
>
> Will do.
>
>> >> I don't want us to spend a lot of time arguing revert-or-no-revert now, 
>> >> but could you, Eli, try to decide on a full example of a buggy scenario 
>> >> which is really solved by thread locking being?
>> >
>> > I already described that in so many words.
>> 
>> Can you send that full description again?  Perhaps I can translate it
>> from words into code for you.
>
> Look at my posts as part of discussing bug#79201.  For example:

OK, attempting to excerpt the parts where you describe what could go
wrong when a thread is unlocked:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79201#88

> Wouldn't we want the thread to be able to read the output
> from its process after the wait?  Without marking the descriptors with
> the thread, that output could have been read by some other thread
> during the wait, and then our thread will hang forever if it calls
> accept-process-output after the wait ended -- a much more serious
> problem, and in a program that has no bug per se.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79201#94

> > > An output from a process could have been read by a thread other than
> > > the one that started the process, thus "starving" the thread that
> > > started the process from reading its output.
> > 
> > That has been possible, and common, since threads were originally added.
> 
> And caused strange bugs, whereby a thread that started a process would
> hang in accept-process-output because some other thread inadvertently
> read the output from that process.

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79201#100

> If you study the code in wait_reading_process_output, you will
> immediately see what kind of godawful mess will that cause with
> reading output from several processes started by several threads.
> 
> wait_reading_process_output was written under an implicit assumption
> that there's only one thread that handles all the subprocesses.
> Therefore, it consistently checks all of the I/O descriptors for
> available input from the subprocesses, and consumes that as soon as
> it's available.  That cannot possibly work reliably, let alone
> predictably, when several threads are involved.  This is so obvious
> from reading the code that I'm astonished I need to even explain it.

Note that we need to make sure wait_reading_process_output works
reliably in this case for the sake of processes which aren't locked to
threads.  So defaulting to locking processes to threads doesn't actually
help with this.

>   https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79201#106

> Moreover, IME many weird problems and buggy behavior in Lisp programs
> using threads were caused by the wrong thread inadvertently reading
> output of a process, which then left the thread that waits for that
> process's output stuck waiting forever.  If anything, we should make
> process locking to its thread stronger, not weaker, if we want
> relatively simple threaded programs to work reliably.

So, to summarize, it sounds like your issue with not locking processes
to threads is:

  Without locking processes to the creating thread, a process's output
  can be read by some other thread during the wait, and then our thread
  will hang forever if it calls accept-process-output after the wait
  ended.

This is unfortunately not detailed enough to translate into code.  I've
never seen a real-world program which would be vulnerable to this
problem, as described here.  I can make up toy programs which meet this
description which would hang, but I'm not sure they're actually what
you're concerned about.  Especially because they aren't anything like
real-world programs.

Could you give some more details about the scenario, so I can try to
write a test demonstrating the problem?  Or could you point to a
real-world program which is vulnerable to this problem, so I can turn it
into a test?





This bug report was last modified 7 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.