GNU bug report logs - #10430
coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one test failed)

Package: coreutils

Reported by: Stefano Lattarini <stefano.lattarini <at> gmail.com>

Date: Wed, 4 Jan 2012 14:17:01 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 10430 in the body.
You can then email your comments to 10430 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#10430; Package coreutils. (Wed, 04 Jan 2012 14:17:01 GMT)

Acknowledgement sent to Stefano Lattarini <stefano.lattarini <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 04 Jan 2012 14:17:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one
	test failed)
Date: Wed, 04 Jan 2012 15:12:38 +0100
Running:

  $ VERBOSE=yes make distcheck 2>&1 | tee dc.log

failed:

  ...
  ==============================================================
  Testsuite summary for GNU coreutils 8.14.116-1e18d
  ==============================================================
  # TOTAL: 466
  # PASS:  345
  # SKIP:  120
  # XFAIL: 0
  # FAIL:  1
  # XPASS: 0
  # ERROR: 0
  ==============================================================
  See tests/test-suite.log
  Please report to bug-coreutils <at> gnu.org
  ==============================================================
  ...
  make[3]: *** [check] Error 2
  make[3]: Leaving directory `/tmp/coreutils-8.14.116-1e18d/tests/torture/taint/a b/coreutils-8.14.116-1e18d'
  make[2]: *** [taint-distcheck] Error 2
  make[2]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
  make[1]: *** [distcheck-hook] Error 2
  make[1]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
  make: *** [distcheck] Error 1

The only failed test was `misc/timeout-group'.

Attached (compressed) are the dc.log file and the config.log file found in
`tests/torture/taint/a b/coreutils-8.14.116-1e18d/config.log'.  Let me know
if you need more information.

Regards,
  Stefano

[config.log.xz (application/octet-stream, attachment)]
[dc.log.xz (application/octet-stream, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#10430; Package coreutils. (Wed, 04 Jan 2012 16:40:01 GMT)

Message #8 received at 10430 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 10430 <at> debbugs.gnu.org
Subject: Re: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure
	on Debian (one test failed)
Date: Wed, 04 Jan 2012 16:35:57 +0000
On 01/04/2012 02:12 PM, Stefano Lattarini wrote:

> The only failed test was `misc/timeout-group'.

This is either a race in the test or a bug in timeout, neither of which I can see.
Your system is running 2.6.30-2-686 SMP

Does this fail all the time?
(cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)

Does it fail if you force bash?
(cd tests && make check TESTS=misc/timeout-group VERBOSE=yes SHELL=/bin/bash)

If you change timeout.c to fprintf(stderr) that the first
send_sig (monitored_pid) call is made, does that happen in the failing case?

As an outside guess, maybe timer_create() causes threads to
be created on your system, which may in turn cause signal issues?
Does the failure still happen if you s/HAVE_TIMER_SETTIME/0/ in timeout.c?

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#10430; Package coreutils. (Wed, 04 Jan 2012 17:52:01 GMT)

Message #11 received at 10430 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 10430 <at> debbugs.gnu.org
Subject: Re: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure on
	Debian (one test failed)
Date: Wed, 04 Jan 2012 18:47:41 +0100
Stefano Lattarini wrote:
> Running:
>
>   $ VERBOSE=yes make distcheck 2>&1 | tee dc.log
>
> failed:
>
>   ...
>   ==============================================================
>   Testsuite summary for GNU coreutils 8.14.116-1e18d
>   ==============================================================
>   # TOTAL: 466
>   # PASS:  345
>   # SKIP:  120
>   # XFAIL: 0
>   # FAIL:  1
>   # XPASS: 0
>   # ERROR: 0
>   ==============================================================
>   See tests/test-suite.log
>   Please report to bug-coreutils <at> gnu.org
>   ==============================================================
>   ...
>   make[3]: *** [check] Error 2
>   make[3]: Leaving directory `/tmp/coreutils-8.14.116-1e18d/tests/torture/taint/a b/coreutils-8.14.116-1e18d'
>   make[2]: *** [taint-distcheck] Error 2
>   make[2]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
>   make[1]: *** [distcheck-hook] Error 2
>   make[1]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
>   make: *** [distcheck] Error 1
>
> The only failed test was `misc/timeout-group'.

Thanks for the report.
I wonder if that is kernel-related, since when I run "make check"
on a Debian unstable system with kernel 3.1.0-1-amd64,
I get no failures.

  # TOTAL: 466
  # PASS:  420
  # SKIP:  46
  # XFAIL: 0
  # FAIL:  0
  # XPASS: 0
  # ERROR: 0




Information forwarded to bug-coreutils <at> gnu.org:
bug#10430; Package coreutils. (Wed, 04 Jan 2012 18:36:01 GMT)

Message #14 received at 10430 <at> debbugs.gnu.org:

From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 10430 <at> debbugs.gnu.org
Subject: Re: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure
	on Debian (one test failed)
Date: Wed, 04 Jan 2012 19:31:56 +0100
On 01/04/2012 05:35 PM, Pádraig Brady wrote:
> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
> 
>> The only failed test was `misc/timeout-group'.
> 
> This is either a race in the test or a bug in timeout, neither of which I can see.
> Your system is running 2.6.30-2-686 SMP
> 
> Does this fail all the time?
> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>
No, it only fails ~ 6% of the time:

  $ cat foo.sh
  #!/bin/bash
  pass=0
  fail=0
  for ((i = 1; i <= 200; i++)); do
    if make -C tests check TESTS=misc/timeout-group VERBOSE=yes &>/dev/null
    then
      echo "- run $i: pass"
      let pass++
    else
      echo "- run $i: fail"
      let fail++
    fi
  done
  echo PASS: $pass
  echo FAIL: $fail
  $ ./foo.sh
  - run 1: fail
  - run 2: pass
  - run 3: pass
  - run 4: pass
  - run 5: pass
  - run 6: pass
  - run 7: pass
  - run 8: pass
  - run 9: pass
  - run 10: fail
  - run 11: pass
  ...
  - run 199: pass
  - run 200: pass
  PASS: 188
  FAIL: 12

> Does it fail if you force bash?
> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes SHELL=/bin/bash)
>
The behaviour is the same as above (as was predictable, since /bin/sh is linked
to /bin/bash).

> If you change timeout.c to fprintf(stderr) that the first
> send_sig (monitored_pid) call is made, does that happen in the failing case?
>
OK, so, in 'timeout.c:cleanup()' I've added this line:

   fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);

just before this line:

   send_sig (monitored_pid, sig);

The logs of a failing and a passing test run after this modification are
attached.

> As an outside guess, maybe timer_create() causes threads to
> be created on your system, which may in turn cause signal issues?
> Does the failure still happen if you s/HAVE_TIMER_SETTIME/0/ in timeout.c?
>
Yes (after 10 runs the first time I tried, after 2 the second time,
and after 13 the third and last time).

HTH,
  Stefano
[timeout-group.ko (text/plain, attachment)]
[timeout-group.ok (text/plain, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#10430; Package coreutils. (Wed, 04 Jan 2012 20:11:01 GMT)

Message #17 received at 10430 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 10430 <at> debbugs.gnu.org
Subject: Re: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure
	on Debian (one test failed)
Date: Wed, 04 Jan 2012 20:07:25 +0000
On 01/04/2012 06:31 PM, Stefano Lattarini wrote:
> On 01/04/2012 05:35 PM, Pádraig Brady wrote:
>> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
>>
>>> The only failed test was `misc/timeout-group'.
>>
>> This is either a race in the test or a bug in timeout, neither of which I can see.
>> Your system is running 2.6.30-2-686 SMP
>>
>> Does this fail all the time?
>> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>>
> No, it only fails ~ 6% of the time:

>> If you change timeout.c to fprintf(stderr) that the first
>> send_sig (monitored_pid) call is made, does that happen in the failing case?
>>
> OK, so, in 'timeout.c:cleanup()' I've added this line:
> 
>    fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);
> 
> just before this line:
> 
>    send_sig (monitored_pid, sig);
> 
> The logs of a failing and a passing test run after this modification are
> attached.

Great, thanks.
The fprintf (stderr) output should be unbuffered, and so displayed before the signal is sent.
I still don't see exactly what's going on though :(

In the non working case we have:

+ env kill -INT -- -4854
+ wait
^^^ send_sig (4856, 2)
+ test -e int.received
+ rm -f int.received timeout.running

Notice that the signal was reported as sent.
Now if the signal wasn't in fact propagated,
the script would wait 20s and return without touching the file,
and hence cause the test to fail.
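
To make the "kill -INT -- -PGID" followed by "wait" step concrete, here is a
minimal standalone sketch in C. It is not the misc/timeout-group test and
not code from timeout.c; the helper name signal_group_and_wait is made up
for illustration. It forks a child into its own process group, signals the
whole group, and reports which signal (if any) terminated the child. If
the signal were not propagated, the waitpid() call would block instead of
returning.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child into its own process group, send SIG
   to that whole group, and return the signal that terminated the child
   (0 if it exited normally, -1 on error).  */
static int
signal_group_and_wait (int sig)
{
  sigset_t block, old;
  sigemptyset (&block);
  sigaddset (&block, sig);

  /* Block SIG across fork() so the child cannot receive it before it has
     restored the default disposition.  */
  if (sigprocmask (SIG_BLOCK, &block, &old) != 0)
    return -1;

  pid_t child = fork ();
  if (child == 0)
    {
      /* Child: default disposition, own process group, then wait.  */
      signal (sig, SIG_DFL);
      setpgid (0, 0);
      sigprocmask (SIG_UNBLOCK, &block, NULL);
      for (;;)
        pause ();
    }

  /* Parent: restore the original signal mask.  */
  sigprocmask (SIG_SETMASK, &old, NULL);
  if (child < 0)
    return -1;

  /* Also set the group from the parent, so that it exists before kill().  */
  setpgid (child, child);

  /* A negative pid addresses the whole process group, like
     "kill -INT -- -PGID" in the shell.  */
  if (kill (-child, sig) != 0)
    {
      kill (child, SIGKILL);
      waitpid (child, NULL, 0);
      return -1;
    }

  int status;
  if (waitpid (child, &status, 0) != child)
    return -1;
  return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}
```

Calling signal_group_and_wait (SIGINT) from a small main() should return
SIGINT when propagation works as POSIX describes.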

However, I don't think that's what's happening, as the second
part of the test completes in less than a second.
I.e., the signal is reported as sent by the first timeout in the chain,
and no other process reports receiving it, yet wait returns immediately,
suggesting there may also be an issue with the wait system call.

I'm also confused by the log of the working run.
The first "send_sig" is not reported, even though
the test seems to complete as expected?

I've added various sleeps locally to try to trigger a race, but can't.

For my own reference, something to try if I get access to such a system:
one thing that could break signal propagation is the kernel sometimes
setting signal dispositions to SIG_IGN rather than SIG_DFL on exec()
(in contradiction to POSIX). One could test that hypothesis by
explicitly setting the signals listed in install_signal_handlers()
to SIG_DFL just before the execvp().
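
A minimal sketch of that experiment, assuming nothing about the real
timeout.c beyond what is said above: the helper name
reset_signals_to_default and the signal list are assumptions made for
illustration. The authoritative list is whatever install_signal_handlers()
actually registers, which may include other signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Assumed list of signals; replace with the set that
   install_signal_handlers() really installs.  */
static const int reset_signals[] =
  { SIGALRM, SIGINT, SIGQUIT, SIGHUP, SIGTERM };

/* Hypothetical helper: restore the default disposition for each listed
   signal.  Returns 0 on success, -1 if any sigaction() call failed.  */
static int
reset_signals_to_default (void)
{
  struct sigaction sa;
  sa.sa_handler = SIG_DFL;
  sa.sa_flags = 0;
  sigemptyset (&sa.sa_mask);

  int ret = 0;
  for (size_t i = 0; i < sizeof reset_signals / sizeof reset_signals[0]; i++)
    if (sigaction (reset_signals[i], &sa, NULL) != 0)
      ret = -1;
  return ret;
}
```

In the child branch, one would call reset_signals_to_default () immediately
before the execvp() call. If the intermittent failure disappears with that
change, it would support the SIG_IGN hypothesis.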

cheers,
Pádraig.




bug closed, send any further explanations to 10430 <at> debbugs.gnu.org and Stefano Lattarini <stefano.lattarini <at> gmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 11 Oct 2018 22:42:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 09 Nov 2018 12:24:06 GMT)

This bug report was last modified 6 years and 317 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.