GNU bug report logs - #56487
xgselect race condition leading to abort when USE_GTK not defined


Package: emacs;

Reported by: Tom Gillespie <tgbugs <at> gmail.com>

Date: Sun, 10 Jul 2022 21:07:02 UTC

Severity: normal

Tags: moreinfo, patch


From: Po Lu <luangruo <at> yahoo.com>
To: Tom Gillespie <tgbugs <at> gmail.com>
Cc: 56487 <at> debbugs.gnu.org
Subject: bug#56487: xgselect race condition leading to abort when USE_GTK not defined
Date: Mon, 11 Jul 2022 18:16:36 +0800

Tom Gillespie <tgbugs <at> gmail.com> writes:

>> Thanks.  Why did the code previously under !USE_GTK have to be removed?
>
> When the !USE_GTK code is used an abort in glib will happen
> stochastically due to an out-of-sync call to release_select_lock
> in thread.c. This happens on my system somewhere between
> approximately 1 in 10 and 1 in 10000 times that the test file
> is run.
>
> As far as I can tell from testing there is no difference in behavior
> between the USE_GTK and !USE_GTK code. Also, as far as
> I can tell from reading, the behavior should be almost identical.
> The only addition is to check for already_has_events before
> calling thread_select, which may be enough to shift the timing
> to prevent a race.
>
> I have not been able to figure out what the actual underlying
> cause is (I tried). All I can say for sure is that there is
> something that calls into g_main_context_release and
> context->owner_count underflows, wrapping around to 4294967295.
>
> I do not think that it is because something somehow sneaks
> in between the calls to the atomics in acquire_select_lock
> and release_select_lock. If you would like I can send along
> a couple of patches that include changes I made to try to
> see what is going on.
>
> The real underlying issue would seem to be that there is a
> missing lock somewhere and that the use of atomics is not
> sufficient, but I could be wrong about that.

So I suggest that someone find that problem instead.  Any attempt to
"fix" a race condition by moving code around so that the timings are
slightly different is simply deluding oneself.

Thanks.
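
The value 4294967295 quoted above is what a 32-bit unsigned counter holds after being decremented once from zero, which fits a release that has no matching acquire. A minimal sketch of that arithmetic follows. It assumes the counter is an unsigned int; toy_context, toy_acquire, toy_release and toy_unbalanced_release are hypothetical names for illustration, not GLib or Emacs code, and the sketch does not show where the extra release comes from.

```c
/* Hypothetical stand-in for a context that counts its owners.
   This is not the GLib structure; only the arithmetic is illustrated. */
struct toy_context {
  unsigned int owner_count;
};

/* Record one more owner. */
static void toy_acquire(struct toy_context *ctx) { ctx->owner_count++; }

/* Drop one owner, with no guard against releasing an unowned context. */
static void toy_release(struct toy_context *ctx) { ctx->owner_count--; }

/* One acquire followed by two releases: the second release decrements
   an unsigned zero, which wraps around to the largest unsigned value. */
unsigned int toy_unbalanced_release(void) {
  struct toy_context ctx = {0};
  toy_acquire(&ctx);
  toy_release(&ctx);
  toy_release(&ctx); /* unbalanced: no matching acquire */
  return ctx.owner_count;
}
```

Where unsigned int is 32 bits wide, toy_unbalanced_release() returns 4294967295. Unsigned wraparound is well defined in C, so the decrement itself raises no error; the mismatch only shows up later, when something checks the counter.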




This bug report was last modified 115 days ago.


