Package: coreutils;
Reported by: Peter Bray <pdb_ml <at> yahoo.com.au>
Date: Wed, 15 Jul 2015 09:24:01 UTC
Severity: normal
To reply to this bug, email your comments to 21061 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Wed, 15 Jul 2015 09:24:02 GMT) Full text and rfc822 format available.Peter Bray <pdb_ml <at> yahoo.com.au>
:bug-coreutils <at> gnu.org
.
(Wed, 15 Jul 2015 09:24:02 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Peter Bray <pdb_ml <at> yahoo.com.au> To: bug-coreutils <at> gnu.org Subject: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Wed, 15 Jul 2015 19:22:52 +1000
[Message part 1 (text/plain, inline)]
Greetings, N.B. This bug report is for reference only, and documents only a partially reproducible check failure. No Action Requested. On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and one Solaris 10 Update 10 (Non-VM) system, I see random "gmake check" failures for "tests/misc/timeout-parameters.sh". Running the test by itself (with the command line below) on the same VMs / real system will sometimes succeed and sometimes fail. gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=. Looking through the attached "failure.log" file, I extracted the following command line test, which may exhibit the failure without all the make(1) and test infrastructure code: failures=0 for i in `./src/seq 1 100` do ./src/timeout 2.34e+5d sleep 0 \ || { echo fail; failures=`expr ${failures} + 1` } done echo "Total Failures: ${failures}" On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel, failures are very rare (only 1 test harness failure seen, no failures of the sample code above even with 1..1000 runs). On virtual machines (also using Xeon E3-1245v2 running VMware ESXi 5.5d (latest patches) - two identical ESXi systems running similarly configured VMs), test harness failures and failures in the above command line check are rare for the 64-bit Solaris kernels. Failures on Solaris 10 32-bit kernels (on both of these ESXi servers), are easily reproduced and vary between 5% (common) and 45% (rare). Regards, Peter Bray Sydney, Australia PS: Again, no action is requested, this is intended as a file for later reference bug report.
[failure.log (text/plain, attachment)]
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Wed, 15 Jul 2015 10:32:01 GMT) Full text and rfc822 format available.Message #8 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Peter Bray <pdb_ml <at> yahoo.com.au>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Wed, 15 Jul 2015 11:30:59 +0100
On 15/07/15 10:22, Peter Bray wrote: > Greetings, > > N.B. This bug report is for reference only, and documents only a > partially reproducible check failure. No Action Requested. > > On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and > one Solaris 10 Update 10 (Non-VM) system, I see random "gmake check" > failures for "tests/misc/timeout-parameters.sh". > > Running the test by itself (with the command line below) on the same > VMs / real system will sometimes succeed and sometimes fail. > > gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=. > > Looking through the attached "failure.log" file, I extracted the > following command line test, which may exhibit the failure without all > the make(1) and test infrastructure code: > > failures=0 > for i in `./src/seq 1 100` > do > ./src/timeout 2.34e+5d sleep 0 \ > || { echo fail; failures=`expr ${failures} + 1` } > done > echo "Total Failures: ${failures}" > > On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel, > failures are very rare (only 1 test harness failure seen, no failures > of the sample code above even with 1..1000 runs). > > On virtual machines (also using Xeon E3-1245v2 running VMware ESXi > 5.5d (latest patches) - two identical ESXi systems running similarly > configured VMs), test harness failures and failures in the above > command line check are rare for the 64-bit Solaris kernels. > > Failures on Solaris 10 32-bit kernels (on both of these ESXi servers), > are easily reproduced and vary between 5% (common) and 45% (rare). Interesting. I'm not reproducing that in 5000 loops in the above test script on 32 bit baremetal solaris 10 update 10. I presume the large timeout value is causing early timer firing on your systems for some reason? What does this return? time src/timeout 2.34e+5d sleep inf Note on 32 bit, the 234000 days will be truncated to a itimerspec of: { {0,0}, {2147483647,999999999} } A wild guess is that perhaps ntp is adjusting the system time which causes the above timer to be adjusted in the kernel and roll over to 0, thus triggering early? thanks, Pádraig.
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Thu, 16 Jul 2015 06:11:02 GMT) Full text and rfc822 format available.Message #11 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Peter Bray <pdb_ml <at> yahoo.com.au> To: Pádraig Brady <P <at> draigBrady.com>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Thu, 16 Jul 2015 16:09:46 +1000
[Message part 1 (text/plain, inline)]
On 15/07/15 08:30 PM, Pádraig Brady wrote: > On 15/07/15 10:22, Peter Bray wrote: >> Greetings, >> >> N.B. This bug report is for reference only, and documents only a >> partially reproducible check failure. No Action Requested. >> >> On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and >> one Solaris 10 Update 10 (Non-VM) system, I see random "gmake check" >> failures for "tests/misc/timeout-parameters.sh". >> >> Running the test by itself (with the command line below) on the same >> VMs / real system will sometimes succeed and sometimes fail. >> >> gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=. >> >> Looking through the attached "failure.log" file, I extracted the >> following command line test, which may exhibit the failure without all >> the make(1) and test infrastructure code: >> >> failures=0 >> for i in `./src/seq 1 100` >> do >> ./src/timeout 2.34e+5d sleep 0 \ >> || { echo fail; failures=`expr ${failures} + 1` } >> done >> echo "Total Failures: ${failures}" >> >> On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel, >> failures are very rare (only 1 test harness failure seen, no failures >> of the sample code above even with 1..1000 runs). >> >> On virtual machines (also using Xeon E3-1245v2 running VMware ESXi >> 5.5d (latest patches) - two identical ESXi systems running similarly >> configured VMs), test harness failures and failures in the above >> command line check are rare for the 64-bit Solaris kernels. >> >> Failures on Solaris 10 32-bit kernels (on both of these ESXi servers), >> are easily reproduced and vary between 5% (common) and 45% (rare). > > Interesting. I'm not reproducing that in 5000 loops in the above test script > on 32 bit baremetal solaris 10 update 10. > > I presume the large timeout value is causing early timer firing > on your systems for some reason? What does this return? > > time src/timeout 2.34e+5d sleep inf > > Note on 32 bit, the 234000 days will be truncated to a itimerspec of: > { {0,0}, {2147483647,999999999} } > > A wild guess is that perhaps ntp is adjusting the system time > which causes the above timer to be adjusted in the kernel > and roll over to 0, thus triggering early? > > thanks, > Pádraig. > Pádraig, The additional information you requested, but unfortunately I have yet to install gdb(1), so I using system tools for this response. The installation of coreutils-8.24 has been completed on all compile server VMs, so the commands now have a 'g' prefix. % gtimeout 2.34e+5d gsleep inf No output and exit status of 124 [$?=124] (32-bit kernel S10U11 / GCC 4.9.3) % truss gtimeout 2.34e+5d gsleep inf 2>&1 | tee truss.out File Attached "truss.out" (also exits with 124) Note: Adding the -v option below on a separate run, did not yield a great deal of information on the data provided to the timer*() calls. % truss -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf 2>&1 timer_create(3, 0x00000000, 0x080471AC) = 0 timer_settime(0, 0, 0x080471B0, 0x00000000) = 0 Received signal #14, SIGALRM, in waitid() [caught] siginfo: SIGALRM pid=12118 uid=100 code=-3 waitid(P_PID, 12119, 0x08047130, WEXITED|WTRAPPED) Err#91 ERESTART Also captured truss -l output via: % truss -l -tall -v timer_create,timer_settime gtimeout 2.34e+5d gsleep inf \ 2>&1 | tee truss-l.out Normal apptrace show nothing of great value (does not show actual data just addresses): % apptrace gtimeout 2.34e+5d gsleep inf but it is attached as "apptrace.out". Note that the following apptrace command coredumps on each invocation: % apptrace -v timer_settime gtimeout 2.34e+5d gsleep inf -> gtimeout -> librt.so.1:int timer_settime(timer_t = 0x0, int = 0x0, const struct itimerspec * = 0x8047130, struct itimerspec * = 0x0) arg0 = (timer_t) 0x0 arg1 = (int) 0x0 arg2 = (const struct itimerspec *) 0x8047130 (struct itimerspec) { it_interval: (struct timespec) { tv_sec: (time_t) 0 tv_nsec: (long) 0 it_value: (struct timespec) { tv_sec: (time_t) 0x7fffffff tv_nsec: (long) 0x3b9ac9ff } arg3 = (struct itimerspec *) 0x0 (struct itimerspec) { it_interval: (struct timespec) { tv_sec: (time_t) apptrace: gtimeout: Segmentation Fault(Core dump) This coredump occurs even on 64-bit systems where the gtimeout command waits for the sleep command (to finish - which it won't). The timer_settime(3RT) manual page states that the last argument is permitted to be NULL, so that does not seem to be a problem. And regarding the NTP question, all compile server VMs have NTP disabled. % svcs -a | grep -i ntp disabled Jul_13 svc:/network/ntp:default disabled Jul_13 svc:/network/ntp4:default That is, NTP is disabled (since last boot) but actually disabled since installation. Though NTP is runnning on both ESXi 5.5 hosts. Regards, Peter PS: Investigating with my limited mdb(1) skills shows that its apptrace(1) coredumping not gtimeout(1). % mdb =gtimeout core > $C 08046a98 LMc9bfeea8`apptrace.so.1`print_int+0xbd(7, 0, 8046cc0) 08046bc4 LMc9bfeea8`apptrace.so.1`elt_print+0x137(c91e47b6, 2f, 0, 2, 8046cc0) 08046bf4 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x56(c91f3930, 2f, c9b63588, 8046cc0, c91e47b6, 0) 08046c2c LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 30, c9b63588, 8046cc0, c91e489b, 0) 08046c64 LMc9bfeea8`libctf.so.1`ctf_type_rvisit+0x15e(c91f3930, 44, c9b63588, 8046cc0, c9ad9048, 0) 08046c8c LMc9bfeea8`libctf.so.1`ctf_type_visit+0x2c(c91f3930, 44, c9b63588, 8046cc0) 08046ce0 LMc9bfeea8`apptrace.so.1`print_value+0x127(c91f3930, 45, 0) 08046fac LMc9bfeea8`apptrace.so.1`la_i86_pltenter+0x3d1(8047030, 38, c9af06f0, c9af0e48, 8047084, c9940304) 08047000 ld.so.1`_audit_pltenter+0x11e(c9af06c0, c9bfea18, c9af0ac0, 8047030, 38, 8047084) 08047050 ld.so.1`audit_pltenter+0x98(c9bfea18, c9af0ac0, c9940308, 38, 8047084, c9940304) 080470d8 ld.so.1`elf_plt_trace+0x4d(0, 0, 8047130, 0) 08047158 settimeout+0x11d(60000000, 4212d440, 8047218, 8047218) 080471e8 main+0x2c6(8052e50, 4, 8047218) 0804720c _start+0x80(4, 80473d0, 80473d9, 80473e2, 80473e9, 0)
[apptrace.out (text/plain, attachment)]
[truss.out (text/plain, attachment)]
[truss-l.out (text/plain, attachment)]
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Thu, 16 Jul 2015 07:57:01 GMT) Full text and rfc822 format available.Message #14 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Peter Bray <pdb_ml <at> yahoo.com.au>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Thu, 16 Jul 2015 08:56:37 +0100
On 16/07/15 07:09, Peter Bray wrote: > On 15/07/15 08:30 PM, Pádraig Brady wrote: >> On 15/07/15 10:22, Peter Bray wrote: >>> Greetings, >>> >>> N.B. This bug report is for reference only, and documents only a >>> partially reproducible check failure. No Action Requested. >>> >>> On Solaris 10 (Update 8 and Update 11) and Solaris 11.2 X86 VMs, and >>> one Solaris 10 Update 10 (Non-VM) system, I see random "gmake check" >>> failures for "tests/misc/timeout-parameters.sh". >>> >>> Running the test by itself (with the command line below) on the same >>> VMs / real system will sometimes succeed and sometimes fail. >>> >>> gmake check TESTS=tests/misc/timeout-parameters.sh VERBOSE=yes SUBDIRS=. >>> >>> Looking through the attached "failure.log" file, I extracted the >>> following command line test, which may exhibit the failure without all >>> the make(1) and test infrastructure code: >>> >>> failures=0 >>> for i in `./src/seq 1 100` >>> do >>> ./src/timeout 2.34e+5d sleep 0 \ >>> || { echo fail; failures=`expr ${failures} + 1` } >>> done >>> echo "Total Failures: ${failures}" >>> >>> On a real hardware system (Xeon E3-1245v2) with a 64-bit kernel, >>> failures are very rare (only 1 test harness failure seen, no failures >>> of the sample code above even with 1..1000 runs). >>> >>> On virtual machines (also using Xeon E3-1245v2 running VMware ESXi >>> 5.5d (latest patches) - two identical ESXi systems running similarly >>> configured VMs), test harness failures and failures in the above >>> command line check are rare for the 64-bit Solaris kernels. >>> >>> Failures on Solaris 10 32-bit kernels (on both of these ESXi servers), >>> are easily reproduced and vary between 5% (common) and 45% (rare). >> >> Interesting. I'm not reproducing that in 5000 loops in the above test script >> on 32 bit baremetal solaris 10 update 10. >> >> I presume the large timeout value is causing early timer firing >> on your systems for some reason? What does this return? >> >> time src/timeout 2.34e+5d sleep inf >> >> Note on 32 bit, the 234000 days will be truncated to a itimerspec of: >> { {0,0}, {2147483647,999999999} } >> >> A wild guess is that perhaps ntp is adjusting the system time >> which causes the above timer to be adjusted in the kernel >> and roll over to 0, thus triggering early? >> >> thanks, >> Pádraig. >> > > Pádraig, > > The additional information you requested, but unfortunately I have yet > to install gdb(1), so I using system tools for this response. The > installation of coreutils-8.24 has been completed on all compile > server VMs, so the commands now have a 'g' prefix. > > % gtimeout 2.34e+5d gsleep inf > > No output and exit status of 124 [$?=124] (32-bit kernel S10U11 / GCC 4.9.3) I was looking to get an indication of how long it waits (I have a fair idea about what is sent to timer_settime(). To that end, the command above has a leading 'time' to return how long it actually took. I.E. something like this would be informative: for i in $(seq 20); do time gtimeout 2.34e+5d sleep inf; done thanks, Pádraig.
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Fri, 17 Jul 2015 03:23:02 GMT) Full text and rfc822 format available.Message #17 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Peter Bray <pdb_ml <at> yahoo.com.au> To: Pádraig Brady <P <at> draigBrady.com>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Fri, 17 Jul 2015 13:21:38 +1000
On 16/07/15 05:56 PM, Pádraig Brady wrote: > for i in $(seq 20); do time gtimeout 2.34e+5d sleep inf; done gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.015 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.009 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total Sorry, I didn't make that clear, when the command works (like it does mostly on 64-bit on 64-bit systems, as I poorly conveyed in the last post) it never returns and it returns "immediately" on the failure, which (as the above shows) was 20 out of 20 test cases. > This coredump occurs even on 64-bit systems where the gtimeout command > waits for the sleep command (to finish - which it won't). Sorry for the confusion, Peter
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Fri, 17 Jul 2015 08:57:02 GMT) Full text and rfc822 format available.Message #20 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Peter Bray <pdb_ml <at> yahoo.com.au>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Fri, 17 Jul 2015 09:56:07 +0100
On 17/07/15 04:21, Peter Bray wrote: > On 16/07/15 05:56 PM, Pádraig Brady wrote: >> for i in $(seq 20); do time gtimeout 2.34e+5d sleep inf; done > > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.015 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.009 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > gtimeout 2.34e+5d gsleep inf 0.00s user 0.00s system 0% cpu 0.010 total > > Sorry, I didn't make that clear, when the command works (like it does > mostly on 64-bit on 64-bit systems, as I poorly conveyed in the last > post) it never returns and it returns "immediately" on the failure, > which (as the above shows) was 20 out of 20 test cases. Is there ever a failure with non maximal values? For example is there ever an early return with: while true; do time gtimeout 0.1 sleep inf; test $? != 124 && break; done I'm guessing not, and the kernel is having rounding issues? Do you still get the problem if you change the clock in timeout.c from CLOCK_REALTIME to say CLOCK_MONOTONIC? cheers, Pádraig.
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Sat, 18 Jul 2015 06:36:02 GMT) Full text and rfc822 format available.Message #23 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Peter Bray <pdb_ml <at> yahoo.com.au> To: Pádraig Brady <P <at> draigBrady.com>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Sat, 18 Jul 2015 13:09:18 +1000
[Message part 1 (text/plain, inline)]
On 17/07/15 06:56 PM, Pádraig Brady wrote: > while true; do time gtimeout 0.1 sleep inf; test $? != 124 && break; done Run this code on S10U8/32 for almost 12 hours without it stopping itself. An almost 10 hours on S10U11/32 with the same result (not stopping itself). Made the requested change to src/timeout.c Backup : src/timeout.c.orig Editted : src/timeout.c Command : gdiff 139c139 < if (timer_create (CLOCK_REALTIME, NULL, &timerid) == 0) --- > if (timer_create (CLOCK_MONOTONIC, NULL, &timerid) == 0) Run as an unprivileged user, gives the following error summary: % egrep '(FAIL|ERROR) ' tests/test-suite.log | sed -e 's/FAIL /FAIL /' FAIL tests/misc/tee.sh (exit status: 1) FAIL tests/misc/timeout-blocked.pl (exit status: 1) FAIL tests/tail-2/wait.sh (exit status: 1) FAIL tests/tail-2/retry.sh (exit status: 1) FAIL tests/tail-2/symlink.sh (exit status: 1) FAIL tests/ls/infloop.sh (exit status: 1) FAIL tests/tail-2/follow-name.sh (exit status: 1) FAIL tests/tail-2/follow-stdin.sh (exit status: 1) On Solaris 10 Update 8 with Solaris 10 Update 11 and Solaris 11.2 have the additional tmpfs build problems mentioned in BUG-21062 With the following executive summary: % grep "timer_create: Not owner" ./tests/test-suite.log | sort | uniq -c 1 + timeout: warning: timer_create: Not owner 7 +timeout: warning: timer_create: Not owner 29 timeout: warning: timer_create: Not owner When run as "root" on Solaris 10 Update 8, the attached "test-suite.log-monotonic-root" shows that two tests FAIL: % egrep '(FAIL|ERROR) ' test-suite.log-*-root | sed -e 's/FAIL /FAIL /' FAIL tests/misc/tee.sh (exit status: 1) FAIL tests/misc/timeout-parameters.sh (exit status: 1) With the following section of the attached log, showing a sequence of failures (exit status = 124) on the root execution of the checks: +++ date +%s ++ expr 2147483647 - 1437184440 + 100 + KERNEL_OVERFLOW_LIMIT=710299307 + timeout 710299307 sleep 0 + test 0 '!=' 124 + timeout 4294967296 sleep 0 + test 124 = 0 + fail=1 ++ expr 4294967295 / 86400 + 1 + timeout 49711d sleep 0 + test 124 = 0 + fail=1 + timeout 999999999999999999999999999999999999999999999999999999999999d sleep 0 + test 124 = 0 + fail=1 + timeout 2.34e+5d sleep 0 + test 124 = 0 + fail=1 + timeout 2.34 sleep 0 + test 0 = 0 + timeout .999999999 sleep 0 + timeout --signal=invalid 1 sleep 0 timeout: invalid: invalid signal Try 'timeout --help' for more information. On Solaris 10 Update 11 as root, attached as "test-suite.log-monotonic-root-s10u11" shows a more promising result but still failure: +++ date +%s ++ expr 2147483647 - 1437187249 + 100 + KERNEL_OVERFLOW_LIMIT=710296498 + timeout 710296498 sleep 0 + test 0 '!=' 124 + timeout 4294967296 sleep 0 + test 124 = 0 + fail=1 ++ expr 4294967295 / 86400 + 1 + timeout 49711d sleep 0 + test 124 = 0 + fail=1 + timeout 999999999999999999999999999999999999999999999999999999999999d sleep 0 + test 124 = 0 + fail=1 + timeout 2.34e+5d sleep 0 + test 124 = 0 + fail=1 + timeout 2.34 sleep 0 + test 0 = 0 + timeout .999999999 sleep 0 + timeout --signal=invalid 1 sleep 0 timeout: invalid: invalid signal Try 'timeout --help' for more information. + test 125 = 125 The requirement for "root" privileges to use MONOTONIC is a non-starter in my books as the utility can not lower privileges for the command it runs - I hope you would agree. It seems that the tests fail even with the CLOCK_MONOTONIC option. The following test run in the monotonic change builds on S10U8 and S10U11, as root for only a minute or so seem to be a repetition of previously seen results, on both systems: # while true; do time ./src/timeout 0.1 ./src/sleep inf; test $? != 124 && break; done ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.105 total ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.107 total But the same code (monotonic build tree) as a non-privileged user, gives approximately a one second result from "time" in addition to the error message. % while true; do time ./src/timeout 0.1 ./src/sleep inf; test $? != 124 && break; done ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.996 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.999 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 1.000 total ./src/timeout: warning: timer_create: Not owner ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.999 total ./src/timeout: warning: timer_create: Not owner ^C ./src/timeout 0.1 ./src/sleep inf 0.00s user 0.00s system 0% cpu 0.971 total I hope I am provided the requested / required information this time :-) Regards, Peter PS: The tmpfs bug (21062) affects commands as simple as "./src/cp -a tests .." (a command I issued before running the "gmake check" as root on S10U11)
[test-suite.log-monotonic-normal_user (text/plain, attachment)]
[test-suite.log-monotonic-root (text/plain, attachment)]
[test-suite.log-monotonic-root-s10u11 (text/plain, attachment)]
bug-coreutils <at> gnu.org
:bug#21061
; Package coreutils
.
(Sat, 18 Jul 2015 11:11:01 GMT) Full text and rfc822 format available.Message #26 received at 21061 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Peter Bray <pdb_ml <at> yahoo.com.au>, 21061 <at> debbugs.gnu.org Subject: Re: bug#21061: coreutils-8.24 - Partially reproducible failures of tests/misc/timeout-parameters.sh Date: Sat, 18 Jul 2015 12:10:43 +0100
On 18/07/15 04:09, Peter Bray wrote: > On 17/07/15 06:56 PM, Pádraig Brady wrote: >> while true; do time gtimeout 0.1 sleep inf; test $? != 124 && break; done > > Run this code on S10U8/32 for almost 12 hours without it stopping > itself. An almost 10 hours on S10U11/32 with the same result (not > stopping itself). Cool. That suggests it's a kernel issue with large timeout values. > > Made the requested change to src/timeout.c > > Backup : src/timeout.c.orig > Editted : src/timeout.c > > Command : gdiff > > 139c139 > < if (timer_create (CLOCK_REALTIME, NULL, &timerid) == 0) > --- > > if (timer_create (CLOCK_MONOTONIC, NULL, &timerid) == 0) > > Run as an unprivileged user, gives the following error summary: > > % egrep '(FAIL|ERROR) ' tests/test-suite.log | sed -e 's/FAIL /FAIL /' > FAIL tests/misc/tee.sh (exit status: 1) > FAIL tests/misc/timeout-blocked.pl (exit status: 1) > FAIL tests/tail-2/wait.sh (exit status: 1) > FAIL tests/tail-2/retry.sh (exit status: 1) > FAIL tests/tail-2/symlink.sh (exit status: 1) > FAIL tests/ls/infloop.sh (exit status: 1) > FAIL tests/tail-2/follow-name.sh (exit status: 1) > FAIL tests/tail-2/follow-stdin.sh (exit status: 1) That's a strange restriction. Linux doesn't have that at least. > > On Solaris 10 Update 8 with Solaris 10 Update 11 and Solaris 11.2 have > the additional tmpfs build problems mentioned in BUG-21062 > > With the following executive summary: > > % grep "timer_create: Not owner" ./tests/test-suite.log | sort | uniq -c > 1 + timeout: warning: timer_create: Not owner > 7 +timeout: warning: timer_create: Not owner > 29 timeout: warning: timer_create: Not owner > > When run as "root" on Solaris 10 Update 8, the attached > "test-suite.log-monotonic-root" shows that two tests FAIL: > > % egrep '(FAIL|ERROR) ' test-suite.log-*-root | sed -e 's/FAIL /FAIL /' > FAIL tests/misc/tee.sh (exit status: 1) > FAIL tests/misc/timeout-parameters.sh (exit status: 1) > > With the following section of the attached log, showing a sequence of > failures (exit status = 124) on the root execution of the checks: > > +++ date +%s > ++ expr 2147483647 - 1437184440 + 100 > + KERNEL_OVERFLOW_LIMIT=710299307 > + timeout 710299307 sleep 0 > + test 0 '!=' 124 > + timeout 4294967296 sleep 0 > + test 124 = 0 > + fail=1 > ++ expr 4294967295 / 86400 + 1 > + timeout 49711d sleep 0 > + test 124 = 0 > + fail=1 > + timeout 999999999999999999999999999999999999999999999999999999999999d > sleep 0 > + test 124 = 0 > + fail=1 > + timeout 2.34e+5d sleep 0 > + test 124 = 0 > + fail=1 > + timeout 2.34 sleep 0 > + test 0 = 0 > + timeout .999999999 sleep 0 > + timeout --signal=invalid 1 sleep 0 > timeout: invalid: invalid signal > Try 'timeout --help' for more information. > > On Solaris 10 Update 11 as root, attached as > "test-suite.log-monotonic-root-s10u11" shows a more promising result but > still failure: > > +++ date +%s > ++ expr 2147483647 - 1437187249 + 100 > + KERNEL_OVERFLOW_LIMIT=710296498 Note we already check for similar 32 bit Linux/HPPA kernel issues in this test. If we found a maximal value that was problematic for solaris and ran the gating test in a loop 10 times say. The non determinism of this is awkward though. > The requirement for "root" privileges to use MONOTONIC is a non-starter > in my books as the utility can not lower privileges for the command it > runs - I hope you would agree. It seems that the tests fail even with > the CLOCK_MONOTONIC option. +! > I hope I am provided the requested / required information this time :-) Yes thanks for the extensive testing. Pádraig.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.