Package: emacs;
Reported by: Taylor R Campbell <campbell+gnu-emacs <at> mumble.net>
Date: Mon, 12 Sep 2011 20:28:02 UTC
Severity: normal
Tags: patch
Found in version 23.2
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #13 received at 9488 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9488 <at> debbugs.gnu.org, hanche <at> math.ntnu.no, 12980 <at> debbugs.gnu.org Subject: Re: 24.3.50; Zombie process left after call-process Date: Tue, 27 Nov 2012 18:00:56 +0200
> Date: Mon, 26 Nov 2012 14:38:41 -0800 > From: Paul Eggert <eggert <at> cs.ucla.edu> > CC: 12980 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, > 9488 <at> debbugs.gnu.org > > Harald, thanks for the bug report. This part of Emacs is a bit > messy, but I hacked away at it and came up with > the attached proposed patch. > > This patch touches some code for Microsoft platforms > so I'm CC'ing this to Eli. Thanks. > +/* Return true if CHILD is reapable. CHILD must be the process ID of > + a child process. As a side effect this function may reap CHILD, > + discarding its status, and therefore report that CHILD is not > + reapable. > + > + This function is intended for use after the caller was interrupted > + while trying to reap CHILD, and where the caller later tests > + whether CHILD was in fact reaped. The caller should not create > + another child process (via vfork, say) between the earlier attempt > + to reap CHILD and now, as that would cause a race if the newly > + created child process happened to have CHILD as its process-ID. */ I'm confused by this commentary, perhaps because it doesn't explain what is the definition of "reapable". The original naive interpretation (i.e. the child exited and its status is ready to be accessed) seems to contradict the latter part of the commentary, and also the fact that the function will return false both if waitpid returns a positive and a negative value. I would suggest to clarify this. > #ifdef MSDOS /* MW, July 1993 */ > /* Note that on MSDOS `child_setup' actually returns the child process > - exit status, not its PID, so we assign it to `synch_process_retcode' > - below. */ > + exit status, not its PID, so assign it to status below. */ > pid = child_setup (filefd, outfilefd, fd_error, (char **) new_argv, > 0, current_dir); > - > - /* Record that the synchronous process exited and note its > - termination status. */ > - synch_process_alive = 0; > - synch_process_retcode = pid; > - if (synch_process_retcode < 0) /* means it couldn't be exec'ed */ > - { > - synchronize_system_messages_locale (); > - synch_process_death = strerror (errno); > - } > + status = pid; The feature, whereby this function (call-process) could return a description of errno, is hereby lost. That's quite nasty in the MSDOS case, where errno is the only way to indicate to the user what happened, because the value of status in case of failure is always -1, which is not quite descriptive. Can we please not drop this feature on the floor? > + if (0 < pid) > + { > + if (INTEGERP (buffer)) > + record_deleted_pid (pid); > + else > + { > +#ifdef MSDOS > + /* MSDOS needs different cleanup information. */ > + cleanup_info_tail = build_string (tempfile ? tempfile : ""); > +#else > + cleanup_info_tail = INTEGER_TO_CONS (pid); > +#endif > + record_unwind_protect (call_process_cleanup, > + Fcons (Fcurrent_buffer (), > + Fcons (INTEGER_TO_CONS (fd[0]), > + cleanup_info_tail))); > + } > + } > + > + unblock_child_signal (); > unblock_input (); > > #endif /* not WINDOWSNT */ > @@ -658,17 +705,6 @@ > /* Enable sending signal if user quits below. */ > call_process_exited = 0; > > -#if defined (MSDOS) > - /* MSDOS needs different cleanup information. */ > - cleanup_info_tail = build_string (tempfile ? tempfile : ""); > -#else > - cleanup_info_tail = INTEGER_TO_CONS (pid); > -#endif /* not MSDOS */ > - record_unwind_protect (call_process_cleanup, > - Fcons (Fcurrent_buffer (), > - Fcons (INTEGER_TO_CONS (fd[0]), > - cleanup_info_tail))); > - This is wrong for MSDOS, because the temporary file is created before attempting to launch the process, and so it needs to be cleaned up even if we failed to launch the child process. > - if (synch_process_termsig) > + if (WIFSIGNALED (status)) > { > const char *signame; > > synchronize_system_messages_locale (); > - signame = strsignal (synch_process_termsig); > + signame = strsignal (WTERMSIG (status)); > > if (signame == 0) > signame = "unknown"; > > - synch_process_death = signame; > + return code_convert_string_norecord (build_string (signame), > + Vlocale_coding_system, 0); > } > > - if (synch_process_death) > - return code_convert_string_norecord (build_string (synch_process_death), > - Vlocale_coding_system, 0); > - return make_number (synch_process_retcode); > + eassert (WIFEXITED (status)); > + return make_number (WEXITSTATUS (status)); This seem to assume that the only trouble in call-process could be from some signal. Why is that true? > static bool > -process_status_retrieved (pid_t desired, pid_t have, int *status) > +process_status_retrieved (pid_t desired, int *status, int options) > { > - if (have < 0) > + /* Invoke waitpid only with a known process ID; do not invoke > + waitpid with a nonpositive argument. Otherwise, Emacs might > + reap an unwanted process by mistake. For example, invoking > + waitpid (-1, ...) can mess up glib by reaping glib's subprocesses, > + so that another thread running glib won't find them. */ > + eassert (0 < desired); > + > + while (1) > { > - /* Invoke waitpid only with a known process ID; do not invoke > - waitpid with a nonpositive argument. Otherwise, Emacs might > - reap an unwanted process by mistake. For example, invoking > - waitpid (-1, ...) can mess up glib by reaping glib's subprocesses, > - so that another thread running glib won't find them. */ > - do > - have = waitpid (desired, status, WNOHANG | WUNTRACED); > - while (have < 0 && errno == EINTR); > + pid_t pid = waitpid (desired, status, WNOHANG | options); > + if (0 <= pid) > + return 0 < pid; > + if (errno == ECHILD) > + return 0; > + eassert (errno == EINTR); Why is this eassert a good idea? We don't need to enforce the Posix spec at a price of crashing Emacs on users, do we? What bad things can happen if you see some other errno here? > --- src/w32proc.c 2012-11-24 01:57:09 +0000 > +++ src/w32proc.c 2012-11-26 22:06:24 +0000 > @@ -1273,34 +1273,7 @@ > DebPrint (("Wait signaled with process pid %d\n", cp->pid)); > #endif > > - if (status) > - { > - *status = retval; > - } > - else if (synch_process_alive) > - { > - synch_process_alive = 0; > - > - /* Report the status of the synchronous process. */ > - if (WIFEXITED (retval)) > - synch_process_retcode = WEXITSTATUS (retval); > - else if (WIFSIGNALED (retval)) > - { > - int code = WTERMSIG (retval); > - const char *signame; > - > - synchronize_system_messages_locale (); > - signame = strsignal (code); > - > - if (signame == 0) > - signame = "unknown"; > - > - synch_process_death = signame; > - } > - > - reap_subprocess (cp); > - } > - > + *status = retval; > reap_subprocess (cp); The condition testing that status is non-NULL must stay, because that argument can be NULL, even if Emacs currently never uses that option. I didn't see any other obvious problems. However, given the complexity of handling this, and the extent of changes in this changeset, it would be nice to have somewhere a commentary with an overview of how termination of child processes is handled. Reverse engineering that from the code is not really trivial. Would you please consider writing such an overview? TIA
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.