GNU bug report logs - #10430
coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one test failed)
Report forwarded to bug-coreutils <at> gnu.org: bug#10430; Package coreutils. (Wed, 04 Jan 2012 14:17:01 GMT)
Acknowledgement sent to Stefano Lattarini <stefano.lattarini <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 04 Jan 2012 14:17:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Running:
$ VERBOSE=yes make distcheck 2>&1 | tee dc.log
failed:
...
==============================================================
Testsuite summary for GNU coreutils 8.14.116-1e18d
==============================================================
# TOTAL: 466
# PASS: 345
# SKIP: 120
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
==============================================================
See tests/test-suite.log
Please report to bug-coreutils <at> gnu.org
==============================================================
...
make[3]: *** [check] Error 2
make[3]: Leaving directory `/tmp/coreutils-8.14.116-1e18d/tests/torture/taint/a b/coreutils-8.14.116-1e18d'
make[2]: *** [taint-distcheck] Error 2
make[2]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
make[1]: *** [distcheck-hook] Error 2
make[1]: Leaving directory `/tmp/coreutils-8.14.116-1e18d'
make: *** [distcheck] Error 1
The only failed test was `misc/timeout-group'.
Attached (compressed) are the dc.log file and the config.log file found in
`tests/torture/taint/a b/coreutils-8.14.116-1e18d/config.log'. Let me know
if you need more information.
Regards,
Stefano
[config.log.xz (application/octet-stream, attachment)]
[dc.log.xz (application/octet-stream, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#10430; Package coreutils. (Wed, 04 Jan 2012 16:40:01 GMT)
Message #8 received at 10430 <at> debbugs.gnu.org:
On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
> The only failed test was `misc/timeout-group'.
This is either a race in the test or a bug in timeout, neither of which I can see.
Your system is running kernel 2.6.30-2-686 SMP.
Does this fail all the time?
(cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
Does it fail if you force bash?
(cd tests && make check TESTS=misc/timeout-group VERBOSE=yes SHELL=/bin/bash)
If you change timeout.c to print a message with fprintf(stderr) when the first
send_sig (monitored_pid) call is made, does that message appear in the failing case?
As an outside guess, maybe timer_create() causes threads to
be created on your system, which may in turn cause signal issues?
Does the failure still happen if you s/HAVE_TIMER_SETTIME/0/ in timeout.c?
cheers,
Pádraig.
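For reference, the s/HAVE_TIMER_SETTIME/0/ experiment suggested above could be scripted roughly as follows. This is a sketch, not part of the original exchange: it assumes a configured coreutils source tree with timeout in src/timeout.c, and disable_timer_settime is a hypothetical helper name. It also assumes (not confirmed in this thread) that with those blocks compiled out, timeout arms its timer with alarm() instead of timer_create()/timer_settime().

```shell
#!/bin/sh
# disable_timer_settime FILE (hypothetical helper): replace every
# occurrence of HAVE_TIMER_SETTIME with 0 in FILE, so that the
# "#if HAVE_TIMER_SETTIME" blocks are compiled out.
# Writes to a temporary file instead of using the non-portable "sed -i".
disable_timer_settime() {
  sed 's/HAVE_TIMER_SETTIME/0/g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example use from the top of the build tree (assumed layout):
#   disable_timer_settime src/timeout.c && make &&
#   (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
```

Since the failure is intermittent, a single passing run after the change says little; the test has to be repeated many times.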
Information forwarded to bug-coreutils <at> gnu.org: bug#10430; Package coreutils. (Wed, 04 Jan 2012 17:52:01 GMT)
Message #11 received at 10430 <at> debbugs.gnu.org:
Stefano Lattarini wrote:
> The only failed test was `misc/timeout-group'.
Thanks for the report.
I wonder if that is kernel-related, since when I run "make check"
on a Debian unstable system with kernel 3.1.0-1-amd64,
I get no failures:
# TOTAL: 466
# PASS: 420
# SKIP: 46
# XFAIL: 0
# FAIL: 0
# XPASS: 0
# ERROR: 0
Information forwarded to bug-coreutils <at> gnu.org: bug#10430; Package coreutils. (Wed, 04 Jan 2012 18:36:01 GMT)
Message #14 received at 10430 <at> debbugs.gnu.org:
On 01/04/2012 05:35 PM, Pádraig Brady wrote:
> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
>
>> The only failed test was `misc/timeout-group'.
>
> This is either a race in the test or a bug in timeout, neither of which I can see.
> Your system is running 2.6.30-2-686 SMP
>
> Does this fail all the time?
> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>
No, it only fails ~ 6% of the time:
$ cat foo.sh
#!/bin/bash
pass=0
fail=0
for ((i = 1; i <= 200; i++)); do
if make -C tests check TESTS=misc/timeout-group VERBOSE=yes &>/dev/null
then
echo "- run $i: pass"
let pass++
else
echo "- run $i: fail"
let fail++
fi
done
echo PASS: $pass
echo FAIL: $fail
$ ./foo.sh
- run 1: fail
- run 2: pass
- run 3: pass
- run 4: pass
- run 5: pass
- run 6: pass
- run 7: pass
- run 8: pass
- run 9: pass
- run 10: fail
- run 11: pass
...
- run 199: pass
- run 200: pass
PASS: 188
FAIL: 12
> Does it fail if you force bash?
> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes SHELL=/bin/bash)
>
The behaviour is the same as above (as expected, since /bin/sh is a symlink
to /bin/bash).
> If you change timeout.c to fprintf(stderr) that the first
> send_sig (monitored_pid) call is made, does that happen in the failing case?
>
OK, so, in 'timeout.c:cleanup()' I've added this line:
fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);
just before this line:
send_sig (monitored_pid, sig);
The logs of a failing and a passing test run after this modification are
attached.
> As an outside guess, maybe timer_create() causes threads to
> be created on your system, which may in turn cause signal issues?
> Does the failure still happen if you s/HAVE_TIMER_SETTIME/0/ in timeout.c?
>
Yes (it failed after 10 runs on my first attempt, after 2 on the second,
and after 13 on the third and last).
HTH,
Stefano
[timeout-group.ko (text/plain, attachment)]
[timeout-group.ok (text/plain, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#10430; Package coreutils. (Wed, 04 Jan 2012 20:11:01 GMT)
Message #17 received at 10430 <at> debbugs.gnu.org:
On 01/04/2012 06:31 PM, Stefano Lattarini wrote:
> On 01/04/2012 05:35 PM, Pádraig Brady wrote:
>> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
>>
>>> The only failed test was `misc/timeout-group'.
>>
>> This is either a race in the test or a bug in timeout, neither of which I can see.
>> Your system is running 2.6.30-2-686 SMP
>>
>> Does this fail all the time?
>> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>>
> No, it only fails ~ 6% of the time:
>> If you change timeout.c to fprintf(stderr) that the first
>> send_sig (monitored_pid) call is made, does that happen in the failing case?
>>
> OK, so, in 'timeout.c:cleanup()' I've added this line:
>
> fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);
>
> just before this line:
>
> send_sig (monitored_pid, sig);
>
> The logs of a failing and a passing test run after this modification are
> attached.
Great, thanks.
The fprintf (stderr) output should be unbuffered, and so displayed before the signal is sent.
I still don't see exactly what's going on though :(
In the non working case we have:
+ env kill -INT -- -4854
+ wait
^^^ send_sig (4856, 2)
+ test -e int.received
+ rm -f int.received timeout.running
Notice that the signal was reported as sent.
Now if the signal wasn't in fact propagated,
the script would wait 20s and return without touching the file,
and hence cause the test to fail.
However I don't think that's what's happening, as the second
part of the test completes in less than a second.
I.e., the signal is reported as sent by the first timeout in the chain,
but no other timeout reports receiving it; yet wait returns immediately,
suggesting there may also be an issue with the wait system call.
I'm also confused by the log of the working run.
The first "send_sig" is not reported, even though
the test seems to complete as expected?
I've put various sleeps locally here trying to trigger any races but can't.
For my own reference, something to try if I get access to such a system:
The breakdown in signal propagation could be explained if, on exec(),
the kernel sometimes set the signal handlers to SIG_IGN rather than
SIG_DFL (contrary to POSIX). One could test that hypothesis by
explicitly setting the signals listed in install_signal_handlers()
to SIG_DFL just before the execvp().
cheers,
Pádraig.
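On Linux, the SIG_IGN hypothesis above can also be checked from outside a process, without rebuilding timeout: /proc/PID/status contains a SigIgn: line, a hexadecimal mask in which bit N-1 is set when signal N is ignored. This is a different technique from the SIG_DFL reset suggested above, and sig_ignored is a hypothetical helper name; the idea would be to point it at the command timeout started and see whether SIGINT (2) or SIGTERM (15) was unexpectedly inherited as ignored.

```shell
#!/bin/bash
# sig_ignored PID SIGNUM (hypothetical helper, Linux only):
# exit status 0 if SIGNUM is set to SIG_IGN in process PID, 1 if it is
# not, 2 if the process status could not be read.
sig_ignored() {
  local mask
  # SigIgn: is a 64-bit hex mask; bit (SIGNUM - 1) marks an ignored signal.
  mask=$(awk '$1 == "SigIgn:" { print $2 }' "/proc/$1/status") || return 2
  [ -n "$mask" ] || return 2
  [ "$(( (0x$mask >> ($2 - 1)) & 1 ))" -eq 1 ]
}

# Example ($child_pid is a placeholder for the PID of the command
# that timeout started):
#   sig_ignored "$child_pid" 2 && echo "SIGINT is ignored in $child_pid"
```

Keep in mind that a non-interactive shell starts asynchronous (&) commands with SIGINT and SIGQUIT ignored, as POSIX requires when job control is off, so for those two signals a positive result on a process launched directly with & from a script is expected rather than evidence of a kernel problem.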
bug closed, send any further explanations to 10430 <at> debbugs.gnu.org and Stefano Lattarini <stefano.lattarini <at> gmail.com>
Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 11 Oct 2018 22:42:03 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 09 Nov 2018 12:24:06 GMT)
This bug report was last modified 6 years and 317 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.