GNU bug report logs - #8788
Weird testsuite failure on NetBSD (parallel tests, background processes)


Package: automake;

Reported by: Stefano Lattarini <stefano.lattarini <at> gmail.com>

Date: Thu, 2 Jun 2011 16:46:02 UTC

Severity: normal

Tags: moreinfo, patch

Merged with 10447

Done: Stefano Lattarini <stefano.lattarini <at> gmail.com>

Bug is archived. No further changes may be made.


From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: 8788 <at> debbugs.gnu.org
Cc: bug-autoconf <at> gnu.org
Subject: bug#8788: Weird testsuite failure on NetBSD (parallel tests, background processes)
Date: Tue, 18 Oct 2011 23:16:25 +0200
Reference:
 <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8788>

[Adding bug-autoconf in CC]

On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
> 
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
> 
> [SNIP]
> 
> Any idea of what's going on?
> 
Ah ah, got it! (I think).  The failure is due to an interaction between some
features of GNU make and some (mis)features of the NetBSD Korn Shell.  Let's
see the details.

[1] The Korn shell gets selected to run the Makefile recipes
-------------------------------------------------------------

On NetBSD, an autoconf-generated configure script will select /bin/ksh as
the $(SHELL) used to execute the Makefile recipes:
 
  $ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log
  tests/parallel-tests3.dir/parallel/config.log:SHELL='/bin/ksh'
  tests/parallel-tests3.dir/serial/config.log:SHELL='/bin/ksh'

[2] The Korn shell has quirks w.r.t. signal handling
----------------------------------------------------

NetBSD's Korn Shell is one of those shells which try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of the
(as of today still unreleased) bleeding-edge autoconf manual; see also these
relevant links:

 <http://lists.gnu.org/archive/html/autoconf-patches/2011-09/msg00005.html>
 <https://lists.gnu.org/archive/html/bug-autoconf/2011-09/msg00004.html>
 <http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2009-February/004121.html>

And in fact, NetBSD's Korn Shell even seems to propagate a fatal signal
it has received *to its whole process group*!  Let's see a few examples:

 $ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
 [1]   Terminated              /bin/sh -c "kill...
 alive

 $ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
 Terminated 
 alive

 # ksh apparently terminates its parent
 $ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
 Terminated

 $ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
 Terminated 
 Terminated

Just to be sure, let's try to trace the system calls made by the Korn
shell:

  $ ktrace /bin/sh -c '
  > echo parent: $$
  > ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
  > echo alive
  '
  parent: 20429
  child: 4829
  Terminated

  $ kdump ktrace.out | grep -i sig | grep -v __sig
   4829  1 ksh  CALL  kill(0x12dd, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
   4829  1 ksh  CALL  kill(0, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
  20429  1 sh   PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)

(Note that `0x12dd' is decimal 4829).
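
For anyone wanting to double-check that conversion, here is a one-line
sketch.  It assumes a printf implementation that accepts C-style hexadecimal
constants as numeric arguments (as POSIX printf does); the variable name
`pid_dec' is made up for illustration.

```sh
# Convert the hexadecimal pid seen in the kdump output to decimal.
pid_dec=$(printf '%d' 0x12dd)
echo "$pid_dec"
```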

[3] GNU make propagates signals to the running recipes
------------------------------------------------------

If GNU make receives a terminating signal while it's updating some target(s), it
propagates that signal to the currently-executing recipe(s):

  $ cat Makefile 
  all: 1 2
  1 2:
       @trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
  $ gmake -j2 &
  [1] 5980
  $ kill $!
  got SIGTERM
  got SIGTERM
  gmake: *** [2] Error 77
  gmake: *** [1] Error 77

(FWIW, I find this to be a helpful and rational behaviour).

[4] Putting it all together
---------------------------

So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:

 1) various setup/preparation commands get executed in this script; the
    Korn shell gets selected to run the recipes of the Makefile;
 2) "make -j1 check" is launched in the background:
      cd serial
      $MAKE -j1 check &
 3) some more commands get run, and they conclude before the background
    make process launched in (2) has finished;
 4) the shell executing `parallel-tests3.test' explicitly kills the still
    running background "make" process with a SIGTERM:
      cd ..
      kill $!
 5) GNU make "relays" the SIGTERM to the Korn shell executing the still
    running recipe(s);
 6) in turn, the Korn shell relays the SIGTERM to all processes in its
    process group;
 7) this includes the top-level make process that is running the automake
    testsuite (if any); which explains the crash that is the object of
    this bug report.
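
Steps (2)-(4) boil down to the usual "start a job in the background, then
kill $!" pattern.  A minimal sketch of it follows, with `sleep 30' standing
in for "$MAKE -j1 check" so that it can be run anywhere.  This is not the
actual code of `parallel-tests3.test', and the names `bg_pid' and `status'
are made up for illustration.

```sh
# Stand-in for "$MAKE -j1 check &" (assumption: any long-running job will do).
sleep 30 &
bg_pid=$!

# ... the real test would run some more commands here ...

# With no signal specified, kill sends SIGTERM, as in step (4).
kill "$bg_pid"

# Reap the job; POSIX says the status of a signal-killed job is > 128.
status=0
wait "$bg_pid" || status=$?
echo "background job exit status: $status"
```

With a well-behaved shell only the background job dies here; the trouble
described in steps (5)-(7) starts once the signal reaches a recipe that is
being run by NetBSD's ksh.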

I'm not 100% positive that point (7) is completely correct, but I'm running
out of time now, so I'll settle for this explanation; kudos to anyone who
can give some confirmation about the correctness of point (7)!

-*-*-*-

Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing the testsuite to hang
on FreeBSD.  Patch coming up shortly.

And it goes without saying that this horrendous incompatibility of NetBSD's
Korn Shell should be documented in the autoconf manual; I will maybe give
it a shot in the next few days if nobody beats me to it.

Regards,
  Stefano



