GNU bug report logs - #12820
gnulib testsuite failure in latest master

Previous Next

Package: coreutils;

Reported by: Stefano Lattarini <stefano.lattarini <at> gmail.com>

Date: Wed, 7 Nov 2012 08:56:02 UTC

Severity: normal

Tags: moreinfo

Merged with 13894

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Full log


Message #45 received at 12820 <at> debbugs.gnu.org (full text, mbox):

From: Nix <nix <at> esperi.org.uk>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12820 <at> debbugs.gnu.org
Subject: Re: bug#12820: FWIW, this is still happening as of gnulib 4a82904
Date: Wed, 27 Feb 2013 18:25:44 +0000
On 26 Feb 2013, Paul Eggert outgrape:

> On 02/26/13 13:48, Nix wrote:
>
>> #2  0x00000000004021e0 in test_utimens (print=print <at> entry=true, func=0x401890 <do_utimensat>) at test-utimens.h:35
>
> If that line number is right, the test program
> did a creat (file, 0600) that succeeded, followed
> by a stat (file, &st) that failed.

Looks like it.

Running test-utimensat in a tight loop shows frequent assertion
failures: running it under high load (e.g. while a 'make -j 9 check' is
running) shows fewer:

No load:

% START=$(date +%s); while [[ $(($(date +%s)-$START)) -lt 10 ]]; do ./test-utimensat; done
test-lutimens.h:162: assertion failed
test-lutimens.h:162: assertion failed
test-utimens.h:103: assertion failed
test-utimens.h:41: assertion failed
test-lutimens.h:162: assertion failed

Parallel make -j 9 check:

% START=$(date +%s); while [[ $(($(date +%s)-$START)) -lt 10 ]]; do ./test-utimensat; done
test-utimens.h:113: assertion failed
test-utimens.h:37: assertion failed
test-utimens.h:130: assertion failed
test-utimens.h:113: assertion failed
test-utimens.h:103: assertion failed
test-lutimens.h:64: assertion failed
test-utimens.h:128: assertion failed
test-utimens.h:103: assertion failed
test-lutimens.h:64: assertion failed

I note that this is not exactly healthy hardware: it is has a slowly
failing fan and is going in for repair tomorrow. However it isn't
overheating yet and *did* just complete a GCC LTO bootstrap-and-test
with no FAILs I wasn't expecting so I don't think the fan failure is
likely to be the cause of these assertions.

As is typical of timing-related faults, running it under strace makes
the frequency of the problems plunge. I've stuck a tarball of all the
distinct failing straces I could get in a few hours of tight loops under
strace up at
<http://www.esperi.org.uk/~nix/bugs/gnulib/utimensat.tar.gz>. (I
truncated the execve lines because dumping my entire process environment
out in public is not something I'm entirely happy doing. Paranoia,
maybe. :) )


Classifying the small set of assertion failures above (a set small
enough that I felt I could attack it without dying of boredom), we see:

test-utimens.h:37: assertion failed
  ASSERT (func (BASE "file", NULL) == 0);

test-utimens.h:113: assertion failed
  ASSERT (st3.st_atime == Y2K);

test-utimens.h:128: assertion failed
  ASSERT (get_stat_mtime_ns (&st3) == get_stat_mtime_ns (&st2));

test-lutimens.h:64: assertion failed
test-lutimens.h:162: assertion failed
test-utimens.h:41: assertion failed
test-utimens.h:103: assertion failed
test-utimens.h:130: assertion failed
  ASSERT (ctime_compare (&st1, &st2) < 0);
  ASSERT (ctime_compare (&st2, &st3) < 0);

I thought it was worth my analyzing at least the largest set of
failures, this one. This is simply a race. When these assertions are
hit, I see things like this:

st1.st_ctime: 1361977729; st2.st_ctime: 1361977729; compare: 0; get_stat_ctime_ns(st1): 613474299, get_stat_ctime_ns(st2): 613474299

i.e. the nsec values are identical. This is surely valid: it just means
the two stats were carried out in the same 1ns interval. All these
asserts should be <= 0. (I just saw an identical failure in
test-lchown.h:112.)




This bug report was last modified 6 years and 182 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.