Package: coreutils;
Reported by: Peter Bray <pdb_ml <at> yahoo.com.au>
Date: Wed, 15 Jul 2015 09:24:01 UTC
Severity: normal
From: Peter Bray <pdb_ml <at> yahoo.com.au>
To: Pádraig Brady <P <at> draigBrady.com>, 21061 <at> debbugs.gnu.org
Subject: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh
Date: Thu, 16 Jul 2015 16:09:46 +1000
On 15/07/15 08:30 PM, Pádraig Brady wrote:
> On 15/07/15 10:22, Peter Bray wrote:
>> Greetings,
>>
>> N.B. This bug report is for reference only, and documents only a
>> partially reproducible check failure. No Action Requested.
>>
>> On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and
>> one Solaris 10 Update 10 (Non-VM) system, I see random "gmake check"
>> failures for "tests/misc/timeout-parameters.sh".
>>
>> Running the test by itself (with the command line below) on the same
>> VMs / real system will sometimes succeed and sometimes fail.
>>
>>   gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=.
>>
>> Looking through the attached "failure.log" file, I extracted the
>> following command line test, which may exhibit the failure without all
>> the make(1) and test infrastructure code:
>>
>>   failures=0
>>   for i in `./src/seq 1 100`
>>   do
>>     ./src/timeout 2.34e+5d sleep 0 \
>>       || { echo fail; failures=`expr ${failures} + 1`; }
>>   done
>>   echo "Total Failures: ${failures}"
>>
>> On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel,
>> failures are very rare (only 1 test harness failure seen, no failures
>> of the sample code above even with 1..1000 runs).
>>
>> On virtual machines (also using Xeon E3-1245v2 running VMware ESXi
>> 5.5d (latest patches) - two identical ESXi systems running similarly
>> configured VMs), test harness failures and failures in the above
>> command line check are rare for the 64-bit Solaris kernels.
>>
>> Failures on Solaris 10 32-bit kernels (on both of these ESXi servers),
>> are easily reproduced and vary between 5% (common) and 45% (rare).
>
> Interesting. I'm not reproducing that in 5000 loops in the above test script
> on 32 bit baremetal solaris 10 update 10.
>
> I presume the large timeout value is causing early timer firing
> on your systems for some reason? What does this return?
>
>   time src/timeout 2.34e+5d sleep inf
>
> Note on 32 bit, the 234000 days will be truncated to an itimerspec of:
>   { {0,0}, {2147483647,999999999} }
>
> A wild guess is that perhaps ntp is adjusting the system time
> which causes the above timer to be adjusted in the kernel
> and roll over to 0, thus triggering early?
>
> thanks,
> Pádraig.
>

Pádraig,

Here is the additional information you requested. Unfortunately I have
yet to install gdb(1), so I am using system tools for this response.

The installation of coreutils-8.24 has been completed on all compile
server VMs, so the commands now have a 'g' prefix.

  % gtimeout 2.34e+5d gsleep inf

No output and exit status of 124 [$?=124] (32-bit kernel S10U11 / GCC 4.9.3)

  % truss gtimeout 2.34e+5d gsleep inf 2>&1 | tee truss.out

File Attached "truss.out" (also exits with 124)

Note: Adding the -v option below on a separate run did not yield a
great deal of information on the data provided to the timer*() calls.

  % truss -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf 2>&1
  timer_create(3, 0x00000000, 0x080471AC) = 0
  timer_settime(0, 0, 0x080471B0, 0x00000000) = 0
      Received signal #14, SIGALRM, in waitid() [caught]
        siginfo: SIGALRM pid=12118 uid=100 code=-3
  waitid(P_PID, 12119, 0x08047130, WEXITED|WTRAPPED) Err#91 ERESTART

Also captured truss -l output via:

  % truss -l -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf \
      2>&1 | tee truss-l.out

Normal apptrace shows nothing of great value (it does not show actual
data, just addresses):

  % apptrace gtimeout 2.34e+5d gsleep inf

but it is attached as "apptrace.out".

Note that the following apptrace command coredumps on each invocation:

  % apptrace -v timer_settime gtimeout 2.34e+5d gsleep inf
  -> gtimeout -> librt.so.1:int timer_settime(timer_t = 0x0, int = 0x0, const struct itimerspec * = 0x8047130, struct itimerspec * = 0x0)
          arg0 = (timer_t) 0x0
          arg1 = (int) 0x0
          arg2 = (const struct itimerspec *) 0x8047130 (struct itimerspec) {
          it_interval: (struct timespec) {
                  tv_sec: (time_t) 0
                  tv_nsec: (long) 0
          it_value: (struct timespec) {
                  tv_sec: (time_t) 0x7fffffff
                  tv_nsec: (long) 0x3b9ac9ff
          }
          arg3 = (struct itimerspec *) 0x0 (struct itimerspec) {
          it_interval: (struct timespec) {
                  tv_sec: (time_t)
  apptrace: gtimeout: Segmentation Fault(Core dump)

This coredump occurs even on 64-bit systems where the gtimeout command
waits for the sleep command (to finish - which it won't). The
timer_settime(3RT) manual page states that the last argument is
permitted to be NULL, so that does not seem to be a problem.

And regarding the NTP question, all compile server VMs have NTP disabled.

  % svcs -a | grep -i ntp
  disabled       Jul_13   svc:/network/ntp:default
  disabled       Jul_13   svc:/network/ntp4:default

That is, NTP shows as disabled since the last boot, but it has actually
been disabled since installation. NTP is, however, running on both
ESXi 5.5 hosts.

Regards,

Peter

PS: Investigating with my limited mdb(1) skills shows that it's
apptrace(1) that is coredumping, not gtimeout(1).

  % mdb =gtimeout core
  > $C
  08046a98 LMc9bfeea8`apptrace.so.1`print_int+0xbd(7, 0, 8046cc0)
  08046bc4 LMc9bfeea8`apptrace.so.1`elt_print+0x137(c91e47b6, 2f, 0, 2, 8046cc0)
  08046bf4 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x56(c91f3930, 2f, c9b63588, 8046cc0, c91e47b6, 0)
  08046c2c LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 30, c9b63588, 8046cc0, c91e489b, 0)
  08046c64 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 44, c9b63588, 8046cc0, c9ad9048, 0)
  08046c8c LMc9bfeea8`libctf.so.1`ctf_type_visit+0x2c(c91f3930, 44, c9b63588, 8046cc0)
  08046ce0 LMc9bfeea8`apptrace.so.1`print_value+0x127(c91f3930, 45, 0)
  08046fac LMc9bfeea8`apptrace.so.1`la_i86_pltenter+0x3d1(8047030, 38, c9af06f0, c9af0e48, 8047084, c9940304)
  08047000 ld.so.1`_audit_pltenter+0x11e(c9af06c0, c9bfea18, c9af0ac0, 8047030, 38, 8047084)
  08047050 ld.so.1`audit_pltenter+0x98(c9bfea18, c9af0ac0, c9940308, 38, 8047084, c9940304)
  080470d8 ld.so.1`elf_plt_trace+0x4d(0, 0, 8047130, 0)
  08047158 settimeout+0x11d(60000000, 4212d440, 8047218, 8047218)
  080471e8 main+0x2c6(8052e50, 4, 8047218)
  0804720c _start+0x80(4, 80473d0, 80473d9, 80473e2, 80473e9, 0)
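
As a cross-check of the numbers in this message, the arithmetic behind the it_value seen in the apptrace output can be worked through in a few lines of shell. This is only a sketch: it assumes a shell whose arithmetic handles values above 2^31 (for example bash or dash on a 64-bit host), the variable names are illustrative, and it says nothing about how timeout(1) itself performs the conversion.

```shell
#!/bin/sh
# Sketch only: assumes shell arithmetic wider than 32 bits.
# This is not the coreutils code, just the arithmetic from the thread.

days=234000                     # 2.34e+5d, the duration passed to timeout
requested=$((days * 86400))     # the same duration in seconds
max32=2147483647                # largest signed 32-bit time_t (2^31 - 1)

echo "requested seconds: $requested"
echo "32-bit time_t max: $max32"
if [ "$requested" -gt "$max32" ]; then
    echo "the request does not fit in a 32-bit time_t"
fi

# Decode the it_value fields printed by apptrace.
tv_sec=$(printf '%d' 0x7fffffff)
tv_nsec=$(printf '%d' 0x3b9ac9ff)
echo "it_value: {$tv_sec,$tv_nsec}"

# Length of the clamped timer in whole days.
echo "clamped timer: $((max32 / 86400)) days"
```

The decoded it_value is {2147483647,999999999}, which matches the truncated itimerspec Pádraig described, and the clamped timer is still about 24855 days (roughly 68 years) long, so a timer armed with that value would not be expected to expire during the test.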
[apptrace.out (text/plain, attachment)]
[truss.out (text/plain, attachment)]
[truss-l.out (text/plain, attachment)]
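
For anyone repeating the reproduction loop quoted in this message, a slightly more general form may be convenient. The sketch below wraps the same idea in a helper, count_failures, which is a hypothetical name and not part of coreutils or its test suite; it runs an arbitrary command a given number of times and counts non-zero exits. It keeps to backticks and expr so that it should also run under older Bourne-style shells.

```shell
#!/bin/sh
# Hypothetical helper (not from the report or from coreutils):
# run a command N times and report how many runs exited non-zero.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Any non-zero exit status counts as one failure.
        "$@" || failures=`expr ${failures} + 1`
        i=`expr ${i} + 1`
    done
    echo "Total Failures: ${failures} of ${runs}"
}

# Example invocation from the top of a coreutils build tree:
#   count_failures 100 ./src/timeout 2.34e+5d sleep 0
```

Dividing the failure count by the number of runs gives a failure rate comparable to the 5% to 45% figures reported for the 32-bit kernels.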