GNU bug report logs - #9737
misc/timeout-group: spurious test failure on SLES 10.3 (coreutils 8.14)
On 10/13/2011 11:27 PM, Voelker, Bernhard wrote:
> Pádraig Brady wrote:
>
>> On 10/13/2011 04:58 PM, Voelker, Bernhard wrote:
>>> reopen 9737
>>> thanks
>>>
>>> Pádraig Brady wrote:
>>>
>>>> Bah, this is just a racy test I think.
>>>> Hopefully the attached fixes it.
>>>
>>> Thank you for the patch.
>>>
>>> I tried it 16 times:
>>>
>>> * 14x PASS, execution time real < 0.4s
>>>
>>> * 1x test failure (in the 5th run)
>>
>> So the command exited without receiving SIGINT.
>> Or perhaps the touch of the 'received.int' file
>> is being done asynch. Anything special about your
>> file system?
>
> It's a virtual host on a ESX server farm in our data center.
>
> ecs@mchp320a:~/berny/depot/coreutils-8.14/tests> uname -a
> Linux mchp320a 2.6.16.60-0.74.7-smp #1 SMP Fri Nov 26 09:16:10 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>
> ecs@mchp320a:~/berny/depot/coreutils-8.14/tests> df -h .
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lvol0
>                        50G   15G   33G  31% /user
>
> ecs@mchp320a:~/berny/depot/coreutils-8.14/tests> mount | grep /user
> /dev/mapper/vg01-lvol0 on /user type ext3 (rw,acl,user_xattr)
>
>>> * 1x the test lasted 20s (in the 16th run)
>>
>> But this one passed, which means the command
>> did receive the SIGINT, but then didn't exit?
>
> Sounds like one error is shadowing another.
>
>> I'm confused, sorry,
>> Pádraig.
>
> That's strange, indeed.
>
> I repeated the test with < 0.2 load 100 times:
> the run #5, #18, #28, #53, #58 and #71 resulted in FAIL as above,
> and the run #24 and #25 PASSed but took 20 seconds,
> all other PASSed within <=0.3s.

I reproduced this weirdness in OpenSuse 10.3 in a VM.
Much less frequently though.

  Delays in 10 out of 2750
  Signal handler call failure in 1 out of 2750

The delays might be due to bash, but I updated
to 4.2 and the issue still persists.
I suspect kernel issues too.

Anyway I've attached 2 patches to replace the previous one.
The first hopefully addresses any races in the test.
I don't think you hit any of these TBH.
The second should detect the signal issues and skip the test.

cheers,
Pádraig.
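
The failure and delay counts quoted in this thread come from running the same test many times. A minimal sketch of such a loop follows, assuming a `date` that supports `+%s` (as GNU date does). The function name `repeat_runs` and its arguments are hypothetical and are not part of coreutils or of the attached patches.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and tally failures
# and passes that took at least SLOW_SECS seconds.
# usage: repeat_runs N SLOW_SECS command [args...]
repeat_runs() {
  n=$1; slow_secs=$2; shift 2
  fail=0; slow=0; i=1
  while [ "$i" -le "$n" ]; do
    start=$(date +%s)            # whole seconds; assumes date supports +%s
    if "$@" >/dev/null 2>&1; then
      end=$(date +%s)
      if [ $((end - start)) -ge "$slow_secs" ]; then
        slow=$((slow + 1))
        echo "run #$i: PASS but slow ($((end - start))s)"
      fi
    else
      fail=$((fail + 1))
      echo "run #$i: FAIL"
    fi
    i=$((i + 1))
  done
  # Summary line, always printed last.
  echo "runs=$n fail=$fail slow=$slow"
}
```

It could be called as `repeat_runs 100 5 ./run-one-test`, where `./run-one-test` is a placeholder for whatever command runs the single test in your tree. Timing in whole seconds is coarse, but it is enough to separate sub-second passes from the 20-second ones reported above.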
[1-timeout-races.diff (text/plain, attachment)]
[2-timeout-skips.diff (text/plain, attachment)]
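
The idea of detecting unreliable signal handling before running a test can be illustrated with a small shell check. This is only a sketch of the general approach and does not reproduce the contents of the attached patches. The function name `signal_handler_runs` and the marker-file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does a trap handler actually run when the
# shell sends itself a signal?
# usage: signal_handler_runs SIGNAL MARKER_FILE
signal_handler_runs() {
  sig=$1
  marker=$2
  rm -f "$marker"
  # The child runs in the foreground. A background child of a
  # non-interactive shell starts with SIGINT ignored and could
  # not trap it.
  sh -c '
    trap "touch \"\$2\"; exit 0" "$1"   # handler leaves a marker file
    kill -s "$1" $$                     # signal ourselves
    sleep 1                             # reached only if the handler did not run
    exit 1
  ' sh "$sig" "$marker"
  # Succeed only if the handler created the marker file.
  [ -e "$marker" ]
}
```

A test script might call `signal_handler_runs INT ./received.int` and skip itself when the check fails. A single successful check cannot rule out a failure rate as low as 1 in 2750, so a real guard would probably need to repeat it.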
This bug report was last modified 13 years and 206 days ago.