GNU bug report logs -
#8788
Weird testsuite failure on NetBSD (parallel tests, background processes)
Reference:
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8788>
[Adding bug-autoconf in CC]
On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
>
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
>
> [SNIP]
>
> Any idea of what's going on?
>
Ah ah, got it! (I think). The failure is due to an interaction between some
features of GNU make and some (mis)features of the NetBSD Korn shell. Let's
see the details.
[1] The Korn shell gets selected to run the Makefile recipes
-------------------------------------------------------------
On NetBSD, an autoconf-generated configure script will select /bin/ksh as
the $(SHELL) used to execute the Makefile recipes:
  $ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log
  tests/parallel-tests3.dir/parallel/config.log:SHELL='/bin/ksh'
  tests/parallel-tests3.dir/serial/config.log:SHELL='/bin/ksh'
[2] The Korn shell has quirks w.r.t. signal handling
----------------------------------------------------
NetBSD's Korn shell is one of those shells which try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of the
bleeding-edge (as of today still unreleased) autoconf manual; see also these
relevant links:
<http://lists.gnu.org/archive/html/autoconf-patches/2011-09/msg00005.html>
<https://lists.gnu.org/archive/html/bug-autoconf/2011-09/msg00004.html>
<http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2009-February/004121.html>
And in fact, NetBSD's Korn shell even seems to propagate a fatal signal
it has received *to its whole process group*! Let's see a few examples:
  $ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
  [1] Terminated /bin/sh -c "kill...
  alive
  $ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
  Terminated
  alive
  # ksh apparently terminates its parent
  $ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
  Terminated
  $ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
  Terminated
  Terminated
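The four experiments above follow one pattern, so they can be wrapped in a
small helper. This is only a sketch: the function name `parent_survives' is
made up for illustration, and on a shell that really signals its whole
process group the probe may take the calling shell down with it, so it is
best run from a throwaway terminal.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the report).
# Usage: parent_survives OUTER INNER
# Runs OUTER, which runs INNER; INNER sends SIGTERM to itself.
# Succeeds only if OUTER is still alive afterwards to print "alive".
parent_survives () {
  outer=$1
  inner=$2
  # \$\$ is left for INNER to expand, so INNER signals only its own PID.
  out=$("$outer" -c "\"$inner\" -c 'kill -15 \$\$'; echo alive" 2>/dev/null)
  test "$out" = alive
}

# Example: probe whichever of these shells are installed on this machine.
for inner in /bin/sh /bin/ksh; do
  test -x "$inner" || continue   # skip shells that are not installed
  if parent_survives /bin/sh "$inner"; then
    echo "/bin/sh survives a self-killing $inner"
  else
    echo "/bin/sh was terminated along with $inner"
  fi
done
```

Going by the transcript above, the `/bin/ksh' case would be expected to take
the second branch on the affected NetBSD system.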
Just to be sure, let's try to trace the system calls made by the Korn
shell:
  $ ktrace /bin/sh -c '
  > echo parent: $$
  > ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
  > echo alive
  '
  parent: 20429
  child: 4829
  Terminated
  $ kdump ktrace.out | grep -i sig | grep -v __sig
  4829 1 ksh CALL kill(0x12dd, SIGTERM)
  4829 1 ksh PSIG SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
  4829 1 ksh CALL kill(0, SIGTERM)
  4829 1 ksh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
  20429 1 sh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
(Note that `0x12dd' is decimal 4829).
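The second call in the trace, kill(0, SIGTERM), is the telling one: with a
pid argument of 0, kill() signals every process in the caller's process
group. The hexadecimal argument of the first call can be checked against the
child PID with printf; a tiny sketch, where `hex_to_dec' is an illustrative
name and not a standard utility:

```shell
#!/bin/sh
# kdump prints syscall arguments in hexadecimal; convert one to decimal.
hex_to_dec () {
  printf '%d\n' "$1"
}

hex_to_dec 0x12dd   # prints 4829, the PID reported as "child:" above
```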
[3] GNU make propagates signals to the running recipes
------------------------------------------------------
If GNU make receives a terminating signal while it's updating some target(s), it
propagates that signal to the currently-executing recipe(s):
  $ cat Makefile
  all: 1 2
  1 2:
  	@trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
  $ gmake -j2 &
  [1] 5980
  $ kill $!
  got SIGTERM
  got SIGTERM
  gmake: *** [2] Error 77
  gmake: *** [1] Error 77
(FWIW, I find this to be a helpful and rational behaviour).
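To see the recipe's side of this without involving make, the same trap can
be exercised by signalling a background subshell directly. A minimal sketch,
assuming a POSIX shell: the function name `recipe_trap_demo' is hypothetical,
and the subshell merely stands in for the shell that make would spawn.

```shell
#!/bin/sh
recipe_trap_demo () {
  # Same command line as the recipe above, run in a background subshell.
  ( trap 'echo got SIGTERM; exit 77' 15; while :; do :; done ) &
  pid=$!
  sleep 1            # crude: give the subshell time to install its trap
  kill -15 "$pid"    # stands in for make relaying the signal to the recipe
  wait "$pid"        # collect the status chosen by the trap
  echo "exit status: $?"
}

recipe_trap_demo
```

This should print `got SIGTERM' followed by `exit status: 77', matching the
`Error 77' that gmake reports for each target.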
[4] Putting it all together
---------------------------
So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:
 1) various setup/preparation commands get executed in this script; the
    Korn shell gets selected to run the recipes of the Makefile;
 2) "make -j1 check" is launched in the background:
      cd serial
      $MAKE -j1 check &
 3) some more commands get run, and they conclude before the background
    make process launched in (2) has finished;
 4) the shell executing `parallel-tests3.test' explicitly kills the still
    running background "make" process with a SIGTERM:
      cd ..
      kill $!
 5) GNU make "relays" the SIGTERM to the Korn shell executing the still
    running recipe(s);
 6) in turn, the Korn shell relays the SIGTERM to all processes in its
    process group;
 7) this includes the top-level make process that is running the automake
    testsuite (if any), which explains the crash that is the object of
    this bug report.
I'm not 100% positive that point (7) is completely correct, but I'm running
out of time now, so I'll settle for this explanation; kudos to anyone who
can give some confirmation about the correctness of point (7)!
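Steps (2) to (4) boil down to the pattern below. This is a sketch of the
pattern only, not the code of `parallel-tests3.test': `sleep 30' stands in
for `$MAKE -j1 check', and the function name is made up.

```shell
#!/bin/sh
background_kill_pattern () {
  sleep 30 &         # stand-in for: $MAKE -j1 check &
  pid=$!
  # ... the test would run its other commands here ...
  kill "$pid"        # default signal is SIGTERM, as in step (4)
  wait "$pid"        # reap the job and collect its exit status
  echo $?
}

background_kill_pattern
```

In most shells a status above 128 reports death by signal (commonly 143,
i.e. 128 + 15, for SIGTERM). With a harmless `sleep' the signal stops there;
the trouble described in steps (5) to (7) starts only when the killed process
is a GNU make whose recipes run under a shell that re-sends the signal to its
process group.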
-*-*-*-
Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing the testsuite to hang
on FreeBSD. Patch coming up shortly.
And it goes without saying that this horrendous incompatibility of NetBSD's
Korn shell should be documented in the autoconf manual; I will maybe give
it a shot in the next few days if nobody beats me to it.
Regards,
Stefano
This bug report was last modified 13 years and 188 days ago.