GNU bug report logs - #10430
coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one test failed)


Package: coreutils

Reported by: Stefano Lattarini <stefano.lattarini <at> gmail.com>

Date: Wed, 4 Jan 2012 14:17:01 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: Pádraig Brady <P <at> draigBrady.com>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: 10430 <at> debbugs.gnu.org
Subject: bug#10430: coreutils-8.14.116-1e18d: "make distcheck" failure on Debian (one test failed)
Date: Wed, 04 Jan 2012 20:07:25 +0000
On 01/04/2012 06:31 PM, Stefano Lattarini wrote:
> On 01/04/2012 05:35 PM, Pádraig Brady wrote:
>> On 01/04/2012 02:12 PM, Stefano Lattarini wrote:
>>
>>> The only failed test was `misc/timeout-group'.
>>
>> This is either a race in the test or a bug in timeout, neither of which I can see.
>> Your system is running 2.6.30-2-686 SMP
>>
>> Does this fail all the time?
>> (cd tests && make check TESTS=misc/timeout-group VERBOSE=yes)
>>
> No, it only fails ~ 6% of the time:

>> If you change timeout.c to fprintf(stderr) that the first
>> send_sig (monitored_pid) call is made, does that happen in the failing case?
>>
> OK, so, in 'timeout.c:cleanup()' I've added this line:
> 
>    fprintf (stderr, "^^^ send_sig (%lu, %u)\n", monitored_pid, sig);
> 
> just before this line:
> 
>    send_sig (monitored_pid, sig);
> 
> The logs of a failing and a passing test run after this modification are
> attached.

Great, thanks.
The fprintf (stderr) should be unbuffered, and display before the signal is sent.
I still don't see exactly what's going on though :(
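
For reference, the kind of tracing discussed above can be sketched as follows.
This is only an illustration, not the code in timeout.c: the helper name
traced_kill() is hypothetical, and the explicit fflush() is an extra precaution
in case stderr has been given a buffer somewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of timeout.c): report the signal on
   stderr first, then send it, so the trace line is already in the log
   by the time the target can react to the signal.  */
static int
traced_kill (pid_t pid, int sig)
{
  /* pid_t and int are cast/printed with matching conversions.  */
  fprintf (stderr, "^^^ send_sig (%ld, %d)\n", (long) pid, sig);
  fflush (stderr);   /* normally a no-op; stderr is not fully buffered */
  return kill (pid, sig);
}
```

Calling traced_kill (getpid (), 0) only performs the existence check that
signal 0 implies, so it is a harmless way to confirm the trace line appears.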

In the failing case we have:

+ env kill -INT -- -4854
+ wait
^^^ send_sig (4856, 2)
+ test -e int.received
+ rm -f int.received timeout.running

Notice that the signal was reported as sent.
Now if the signal wasn't in fact propagated,
the script would wait 20s and return without touching the file,
and hence cause the test to fail.

However, I don't think that's what's happening, as the second
part of the test completes in less than a second.
I.e., the signal is reported as sent by the first timeout in the chain,
and no other process reports receiving it, yet wait returns immediately,
suggesting there may also be an issue with the wait system call.

I'm also confused by the log of the working run.
The first "send_sig" is not reported, even though
the test seems to complete as expected?

I've added various sleeps locally to try to trigger a race, but can't.

For my reference, in case I get access to such a system:
signal propagation could break down if, on exec(), the kernel
sometimes set the signal handlers to SIG_IGN rather
than SIG_DFL (in contradiction to POSIX). One could test that hypothesis by
explicitly setting the signals listed in install_signal_handlers()
to SIG_DFL just before the execvp().
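
A rough sketch of that experiment, for whoever tries it. The signal list below
is an assumption about what install_signal_handlers() covers (any signal chosen
with --signal would need adding), and both helper names are hypothetical. The
idea is to call reset_signals_to_default() in the child just before execvp(),
and to use signal_is_ignored() from a small exec'd program to see which
disposition it really inherited.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* ASSUMPTION: the signals handled by install_signal_handlers();
   check this list against the timeout.c being tested.  */
static const int monitored_signals[] =
  { SIGALRM, SIGINT, SIGQUIT, SIGHUP, SIGTERM };

/* Hypothetical helper: force the default disposition for each listed
   signal.  Returns 0 on success, -1 if any sigaction() call fails.  */
static int
reset_signals_to_default (void)
{
  struct sigaction sa;
  size_t i;

  sa.sa_handler = SIG_DFL;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;

  for (i = 0; i < sizeof monitored_signals / sizeof monitored_signals[0]; i++)
    if (sigaction (monitored_signals[i], &sa, NULL) != 0)
      return -1;
  return 0;
}

/* Hypothetical helper: query (without changing) the disposition of SIG.
   Returns 1 if it is SIG_IGN, 0 if not, -1 on error.  */
static int
signal_is_ignored (int sig)
{
  struct sigaction sa;

  if (sigaction (sig, NULL, &sa) != 0)
    return -1;
  return sa.sa_handler == SIG_IGN;
}
```

If the failures disappear with the reset in place, or the exec'd program
reports SIG_IGN for a signal that only had a handler installed, that would
point at the disposition being inherited incorrectly.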

cheers,
Pádraig.




This bug report was last modified 6 years and 317 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.