GNU bug report logs - #47795
utils.test fails intermittently

Previous Next

Package: dejagnu;

Reported by: Rainer Orth <ro <at> CeBiTec.Uni-Bielefeld.DE>

Date: Thu, 15 Apr 2021 12:29:01 UTC

Owned by: jcb62281 <at> gmail.com

Severity: normal

To reply to this bug, email your comments to 47795 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-dejagnu <at> gnu.org:
bug#47795; Package dejagnu. (Thu, 15 Apr 2021 12:29:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Rainer Orth <ro <at> CeBiTec.Uni-Bielefeld.DE>:
New bug report received and forwarded. Copy sent to bug-dejagnu <at> gnu.org. (Thu, 15 Apr 2021 12:29:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Rainer Orth <ro <at> CeBiTec.Uni-Bielefeld.DE>
To: bug-dejagnu <at> gnu.org
Subject: utils.test fails intermittently
Date: Thu, 15 Apr 2021 14:27:44 +0200
I just noticed that one part of the runtest testsuite fails
intermittently on both Solaris 10 and 11 on the 1.6.3 branch.  It's easy
to miss because it only manifests itself in runtest.{sum,log} as

[...]
PASS: testsuite file creates new implied object directory
ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete

and

                === runtest Summary ===
 
-# of expected passes           268
+# of expected passes           294

Unless you know how many passes to expect, you likely miss the problem.

Even when running runtest -v -v instead, neither the runtest output

@@ -157,8 +157,7 @@
 Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/testsuite_file.test
 Running /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test ...
 Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
-WARNED: diff1.txt doesn't exist
-
+ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete

nor runtest.log show much:

Running /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test ...
Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test

interp delete test_case
interp create test_case
interp create test_case
expect1.30> test_case eval {foreach { n v }}  [list [array get default_vars]] {{ set $n $v }}
test_case eval {foreach { n v }}  [list [array get default_vars]] {{ set $n $v }}
test_case^M^M
expect1.31> test_case eval source /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
test_case eval source /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
expect1.32> ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete

The test does succeed when running

runtest --tool runtest

in a row about 2 out of 10 times.

I'm a bit at a loss how to investigate this further.




Information forwarded to bug-dejagnu <at> gnu.org:
bug#47795; Package dejagnu. (Fri, 30 Apr 2021 03:17:01 GMT) Full text and rfc822 format available.

Message #8 received at 47795 <at> debbugs.gnu.org (full text, mbox):

From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Rainer Orth <ro <at> CeBiTec.Uni-Bielefeld.DE>
Cc: 47795 <at> debbugs.gnu.org
Subject: Re: bug#47795: utils.test fails intermittently
Date: Thu, 29 Apr 2021 22:15:53 -0500
Rainer Orth wrote:
> I just noticed that one part of the runtest testsuite fails
> intermittently on both Solaris 10 and 11 on the 1.6.3 branch.  It's easy
> to miss because it only manifests itself in runtest.{sum,log} as
>
> [...]
> PASS: testsuite file creates new implied object directory
> ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete
>
> and
>
>                 === runtest Summary ===
>  
> -# of expected passes           268
> +# of expected passes           294
>
> Unless you know how many passes to expect, you likely miss the problem.
>   

This is a long-standing issue with DejaGnu:  the framework does not know 
how many tests to expect, so missing tests just go, well, /missing/.

This is an architectural limitation in DejaGnu, so the closest to an 
actual solution while preserving backwards compatibility is to improve 
error handling to more reliably report that /something/ went wrong, even 
if we have no way to state how many tests were not run.  Optional "test 
manifest" features, allowing testsuites to provide the framework with a 
list of expected tests, are on my local TODO list, but I have not yet 
found a good way to generalize them to DejaGnu proper.  It is possible 
that some other code consequent to the test grouping features that will 
accompany the new XML output format in 1.7 will pave the way to a clean 
API for this, but I am not yet sure of that.

> Even when running runtest -v -v instead, neither the runtest output
>
> @@ -157,8 +157,7 @@
>  Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/testsuite_file.test
>  Running /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test ...
>  Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
> -WARNED: diff1.txt doesn't exist
> -
> +ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete
>
> nor runtest.log show much:
>
> Running /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test ...
> Processing test /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
>
> interp delete test_case
> interp create test_case
> interp create test_case
> expect1.30> test_case eval {foreach { n v }}  [list [array get default_vars]] {{ set $n $v }}
> test_case eval {foreach { n v }}  [list [array get default_vars]] {{ set $n $v }}
> test_case^M^M
> expect1.31> test_case eval source /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
> test_case eval source /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test
> expect1.32> ERROR: /vol/src/gnu/dejagnu/dejagnu-1.6.3-branch/local/testsuite/runtest.libs/utils.test did not complete
>
> The test does succeed when running
>
> runtest --tool runtest
>
> in a row about 2 out of 10 times.
>   

There is a possible clue here:  it appears that we may be losing 
synchronization:  the "interp create test_case" line should be emitted 
in response to an expect1.*> prompt which seems to be missing in your 
log, yet the preceding "interp delete test_case" is included.  Do I 
guess correctly that the line preceding that snippet was another "interp 
delete test_case"?  Particularly interesting in that your log seems to 
show "test_case" returned for the "test_case eval ... foreach ..." line, 
but that line should not produce a response -- the "test_case" response 
should be from the previous "interp create test_case" command.

In fact, examining process_test in testsuite/runtest.libs/libs.exp which 
drives these tests, the error that you are seeing is exactly what is 
produced when an Expect prompt arrives when the driver is expecting to 
see test results.  Losing synchronization could cause this, although 
that check was added to detect the case where the slave interpreter 
aborts due to a Tcl error in the inner test script.

Assuming the "lost sync" hypothesis, when the failure occurs, the 
previous command is still "in-flight" somewhere... and the "1.32" prompt 
is when the "test_case eval source ..." command is given in my logs... 
but your logs show it being issued at the "1.31" prompt... is an 
"expect1./N/>" prompt duplicated somewhere earlier in the log?

It is also possible that an earlier "expect" (command in process_test) 
is timing out while waiting for an "expect*>" prompt from the child 
Expect.  My development box is relatively lightly loaded; what are the 
typical load averages on the system where the problem occurs?

The code in testsuite/runtest.libs/libs.exp is mostly my doing at this 
point, and is probably some of the earlier Expect code that I have 
written.  If there is a bug in there that only affects heavily-loaded 
systems, it is probably my then-inexperience showing.  :-)

> I'm a bit at a loss how to investigate this further.

I have only now found a possible explanation, after sitting on this for 
about two weeks.


-- Jacob




Owner recorded as jcb62281 <at> gmail.com. Request was from Jacob Bachmeyer <jcb62281 <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 04 May 2021 04:19:02 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 45 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.