GNU bug report logs - #17561
24.4.50; Emacs can forget processes

Previous Next

Package: emacs;

Reported by: Jorgen Schaefer <contact <at> jorgenschaefer.de>

Date: Fri, 23 May 2014 15:54:02 UTC

Severity: normal

Tags: moreinfo

Merged with 17337, 17628

Found in version 24.4.50

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #26 received at 17561 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jorgen Schaefer <forcer <at> forcix.cx>
Cc: 17561 <at> debbugs.gnu.org
Subject: Re: Emacs can forget processes
Date: Mon, 26 May 2014 16:58:45 -0700
Jorgen Schaefer wrote:
> the-bug.txt.gz ... shows the strace of a
> running Emacs session which starts to exhibit the bug, so we now have a
> trace that includes the time when the bug actually happens. It shows me
> starting Python subprocesses (elpy RPC processes) and killing them, in
> an attempt to reproduce the problem. Eventually, this succeeds, and
> Emacs goes into the loop reading from fd 15.

Thanks for that trace; it's helpful.  A couple of things.  First, what 
platform are you running on, exactly?  I'm running on an AMD Phenom II 
X4 910e (according to /proc/cpuinfo), and using Fedora 20 x86-64 (Linux 
kernel 3.14.4-200.fc20.x86_64, glibc 2.18), and compiling with GCC 4.9.0.

Second, looking at the-bug.txt I'm suspicious about the interaction 
between vfork, pthread_sigmask (aka rt_sigprocmask) and rt_sigreturn. 
Could you please try building Emacs with fork rather than vfork and see 
whether that fixes the problem?  If it fixes things, great; if not I'd 
like to see another strace of the bug.  To use fork instead of vfork, 
you can run "./configure ac_cv_func_fork_works=no", or (faster but 
flimsier) edit src/config.h to add "#define fork vfork".

Here are some details as to why I think vfork is related to the problem.

You see a bug on line 7717 of the trace in the-bug.txt, where read(15, 
...) returns 0 and Emacs just keeps trying.  I see bugs before that, 
including:

* On trace line 1043, file descriptor 15 is created, as part of the 
preparation for vforking and execing subprocess 27012.  So far so good.

* On trace line 1048 vfork succeeds.  Still good.

* On trace line 1049 the parent process continues, and executes another 
system call!  The child process has neither execed nor terminated.  This 
is incorrect; if vfork is called, the Linux man page says "the calling 
thread is suspended until the child terminates (either normally, by 
calling _exit(2), or abnormally, after delivery of a fatal signal), or 
it makes a call to execve(2)".  I'm reasonably sure that Emacs relies on 
this and will fail badly if the parent and child can stomp on the same 
memory in parallel.  (Another possibility, alas, is that strace is buggy 
and its output unreliable....)

* On trace line 1049, the signal mask is cleared except for QUIT, ALRM, 
PROF.

* On trace line 5598, an INT is delivered just as the INT bit is removed 
from the signal mask.  This strongly suggests that the INT bit was set 
in the signal mask -- but that bit was cleared on line 1049, so what 
gives?  This suggests that the signal mask is messed up somehow.  One 
possibility is that the buggy vfork is doing it somehow -- perhaps the 
signal mask is part of the shared memory?

* On trace line 7706, the subprocess exits but no CHLD signal is 
delivered to Emacs.  Again, this suggests that the signal mask is messed 
up.  This last problem most likely leads to the symptoms you're 
observing: Emacs isn't told that a child died, so it doesn't check for it.

FYI when I try your test, I get something like the following strace 
output near the vfork.  This is the correct vfork behavior, i.e., the 
parent is suspended until the child execs:

6690  15:51:23.093780 rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0 
<0.000025>
6690  15:51:23.093873 vfork( <unfinished ...>
6692  15:51:23.094078 setsid()          = 6692 <0.000070>
6692  15:51:23.094202 open("/dev/tty", O_RDWR|O_CLOEXEC) = -1 ENXIO (No 
such device or address) <0.000048>
6692  15:51:23.094310 open("/dev/pts/9", O_RDWR|O_CLOEXEC) = 9 <0.000041>
6692  15:51:23.094409 rt_sigaction(SIGINT, {SIG_DFL, [INT], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {0x4ea440, [INT QUIT ALRM CHLD 
PROF WINCH IO], SA_RESTORER, 0x7f9de73d7750}, 8) = 0 <0.000026>
6692  15:51:23.094507 rt_sigaction(SIGQUIT, {SIG_DFL, [QUIT], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {0x4ea440, [INT QUIT ALRM CHLD 
PROF WINCH IO], SA_RESTORER, 0x7f9de73d7750}, 8) = 0 <0.000026>
6692  15:51:23.094602 rt_sigaction(SIGPROF, {SIG_DFL, [PROF], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {SIG_IGN, [PROF], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, 8) = 0 <0.000025>
6692  15:51:23.094694 rt_sigaction(SIGPIPE, {SIG_DFL, [PIPE], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, {SIG_IGN, [PIPE], 
SA_RESTORER|SA_RESTART, 0x7f9de6e2fcb0}, 8) = 0 <0.000025>
6692  15:51:23.094793 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 
<0.000026>
6692  15:51:23.094877 ioctl(9, SNDCTL_TMR_TIMEBASE or 
SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo 
...}) = 0 <0.000029>
6692  15:51:23.094963 ioctl(9, SNDCTL_TMR_STOP or 
SNDRV_TIMER_IOCTL_GINFO or TCSETSW, {B38400 opost isig icanon -echo 
...}) = 0 <0.000030>
6692  15:51:23.095080 ioctl(9, SNDCTL_TMR_TIMEBASE or 
SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon -echo 
...}) = 0 <0.000029>
6692  15:51:23.095170 getpid()          = 6692 <0.000025>
6692  15:51:23.095258 chdir("/home/eggert/src/gnu/emacs/emacs-24-sc") = 
0 <0.000038>
6692  15:51:23.095466 dup2(9, 0)        = 0 <0.000028>
6692  15:51:23.095545 dup2(9, 1)        = 1 <0.000027>
6692  15:51:23.095621 dup2(9, 2)        = 2 <0.000025>
6692  15:51:23.095696 setpgid(0, 0)     = -1 EPERM (Operation not 
permitted) <0.000026>
6692  15:51:23.095776 ioctl(0, SNDRV_TIMER_IOCTL_SELECT or TIOCSPGRP, 
[6692]) = 0 <0.000027>
6692  15:51:23.095872 
execve("/home/eggert/opt/Linux-x86_64/merged/bin/cat", 
["/home/eggert/opt/Linux-x86_64/me"...], [/* 100 vars */] <unfinished ...>
6690  15:51:23.096361 <... vfork resumed> ) = 6692 <0.002462>
6690  15:51:23.096440 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 
<0.000028>




This bug report was last modified 10 years and 351 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.