Package: dejagnu;
Reported by: Tom de Vries <tdevries <at> suse.de>
Date: Fri, 12 Jun 2020 08:36:01 UTC
Owned by: jcb62281 <at> gmail.com
Severity: normal
Done: Jacob Bachmeyer <jcb62281 <at> gmail.com>
Bug is archived. No further changes may be made.
Message #79 received at 41824 <at> debbugs.gnu.org (full text, mbox):
From: Jacob Bachmeyer <jcb62281 <at> gmail.com> To: Pedro Alves <palves <at> redhat.com> Cc: Tom de Vries <tdevries <at> suse.de>, 41824 <at> debbugs.gnu.org Subject: Re: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case Date: Sat, 27 Jun 2020 12:21:34 -0500
Pedro Alves wrote: > On 6/26/20 11:37 PM, Jacob Bachmeyer wrote: > >> Pedro Alves wrote: >> >>> On 6/25/20 11:39 PM, Jacob Bachmeyer wrote: >>> >>> >>>> Pedro Alves wrote: >>>> >>>> > > [...] > > Could this be solved by putting each testcase in its own address space? > Yes. Do people actually do that? No -- it's a lot of make work and the > problem is rare. Is that reuse global as array really revealing such a MAJOR > issue with the GDB testcase that it warrants stopping the whole run? No... > > I would argue that instead it is the DejaGnu framework that should make > it so that each testcase is run under as much isolation as possible. > So I call that a design issue in DejaGnu itself. Whether that could > be fixed by something like sourcing each testcase under its own > Tcl slave interpreter, I'm not sure. > This is the current plan -- to eventually run test scripts in independent slave interpreters, which enforces the isolation necessary at the Tcl level for future native parallel testing and also prevents test scripts from accidentally clobbering globals that are important to DejaGnu, some of which have very generic names. > Another example of a Tcl bug that sometimes we trip on. Say we have a > testcase that needs to extract some address from the inferior > in its setup phase, something like this: > > set test "get address" > gdb_test_multiple "x /i \$pc" $test { > -re "($hex).*$gdb_prompt $" { > set addr $expect_out(1,string) > pass $test > } > } > > if {$addr == 0} { > # The above failed, so no use continuing. > return > } > > Now, gdb_test_multiple is a wrapper around expect, that internally catches > GDB crashes, internal GDB errors, timeouts and other things, so the bug above > is often missed. The bug is of course that when e.g., a timeout occurs, $addr > is left uninitialized, so you get a Tcl error referencing a nonexisting variable. > > But the thing is, we already issued a FAIL, and we want to bail out of > the testcase anyway. Crashing the whole testrun for a bug that typically > happens in already-failing code is too strict. And these are the > code paths that are not exercised in a normal run, so are > easy to miss, and are just impossible to have full coverage for. > Fine; I will loosen that, but DejaGnu has no way to know that it is bailing out of a failing testcase and that there are no more tests from that script, so an UNRESOLVED result complaining about the error will still be inserted. >>>>> I think it's safe to say that we all understand the general principle, >>>>> but IMO that applies more to continuing the same testcase (if somehow that >>>>> were possible, like "unknown" suppressing the error), rather than continuing >>>>> with a different testcase. >>>>> >>>>> Continuing the testrun when one testcase errors out does not mask any >>>>> bug in the program that is supposed to be tested. Why would it? The >>>>> testcase aborts, it doesn't continue. The remaining testcases start >>>>> afresh. >>>>> >>>>> >>>> Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp (which tests attaching to a process). Unknown to the GDB developers, the "attach" command does not work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point where "attach" is issued to the inferior GDB. If DejaGnu stops immediately, it is obvious that the testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found. If DejaGnu sweeps the Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the bug in GDB , because the developers think the "attach" command is being exercised, but those tests are being (almost) silently skipped. >>>> >>>> >>> First, that scenario doesn't normally cause a Tcl error. If GDB trips >>> an internal error, the GDB testsuite catches it with a FAIL. If GDB just dies >>> (whether it dumps core or not) the GDB testsuite catches that >>> with UNRESOLVED. >>> >>> >> In the hypothetical scenario, attach.exp crashes with a Tcl error *before* issuing the "attach" command to the GDB-under-test. Previous releases of DejaGnu mention a Tcl error in the log and carry on. With this patch, DejaGnu catches that attach.exp crashed and itself inserts an UNRESOLVED result. >> >> I also take this as more support for using the UNRESOLVED status here: the test script crashing is little different from GDB exiting unexpectedly. >> > > Well, I'm actually not sure why doesn't GDB FAIL in the crash case instead of > UNRESOLVED. To me a GDB internal error or the GDB process crashing doesn't > look very different. Both require pretty much the same level of attention > from a developer. But that decision predates me. Who knows, maybe GDB > internal error support was only added after the testsuite was originally > written, and thus the crash handling predates the internal error handling. > Considering that testing GDB was one of the motivations for writing DejaGnu and how old DejaGnu is, I suspect that the testsuite had to handle GDB crashes before GDB had support for reporting internal errors. In some sense, UNRESOLVED is more serious than FAIL; the GDB testsuite has about a hundred FAIL results but typically no UNRESOLVED results. >>> And again, people run the GDB testsuite in parallel mode, so your >>> assumption that the testsuite would stop immediately anyway doesn't >>> hold. One of the parallel runtest invocations bombs >>> out, while the others keep running, so by the time you get >>> back the prompt, other testcases already executed, and you don't >>> see the ERROR in the current terminal anymore. >>> >> This is a fundamental problem with slicing up the testsuite external to DejaGnu and running multiple instances of runtest -- output that is supposed to be "last at the end of the run" can be hidden away by other ongoing subset runs. >> > > That's like arguing against parallelization, and saying that everyone > should run the testsuite in serial mode, and be OK with it taking 2 hours > instead of 30 mins to complete a run... > It is a problem with parallelization, just like aborting a test run is a problem when running parallel tests. > The parallel mode aggregates the summaries from the different runtest > invocations as a single summary at the end, > > https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD > > so I don't see what problem here is unsolvable. Again, the issue is that Tcl > ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary > doesn't contain those either. > The ERRORs will be printed a second time, so all the script needs to do is collect text that appears after the summary and starts with "ERROR:". There seems to also be some confusion between Tcl errors and the ERROR messages produce by DejaGnu's perror procedure (or its internal equivalents). Would it be better to report Tcl errors with "CRASH" instead of "ERROR"? >>> You'll need to >>> look at the testsuite .sum .log files anyway. Only, you will >>> find out that one testcase bombed out, 50000 tests ran, but 1000 others >>> didn't run because of that one testcase. And now you're missing >>> those 1000 results, which maybe included those which you were >>> actually interested in, because you were doing a builbot try run >>> across a number of systems. >>> >>> >> This also applies on a smaller scale for which we have no workaround: gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did not run. The only real solution is for the test scripts to avoid causing Tcl errors in the first place; anything else is papering over the problem with duct tape. >> > > That's not the same thing. The remaining /N/ tests in the file depend > on the previous /M/ tests. E.g., > > gdb_test "step" > gdb_test "print foo" > > If the "step" fails, the "print foo" test is meaningless. > > However, there should be absolutely no dependency between foo.exp and > bar.exp. It should be possible to run those two different testcases > in any order. Thus, it should be irrelevant to foo.exp whether bar.exp > bombs out of not. Consequently, there's no real reason to not let > foo.exp run regardless of what happens to bar.exp. > This will need to be documented... another item for the local TODO list. >>> Now >>> you wish you had remembered to type --keep-going, but it's not >>> the default, and nobody ever writes Tcl bugs, so you didn't think >>> of it before. >>> >>> >> Note that the snapshot of GDB I have been using for testing this patch does have such a Tcl bug in its testsuite. I did not make any particular effort to choose this snapshot, it was just the current state of GDB master when I started. I have not even looked at *which* commit it is. >> > > Can you tell me what the error looks like? > With commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb and GDB built without Python support: ERROR: tcl error sourcing /home/jacob/arena/gdb-check/build-queue-test/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-format-string.exp. ERROR: invoked "continue" outside of a loop -- Jacob
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.