GNU bug report logs - #68860
race condition with make recheck

Package: automake;

Reported by: Peter Johansson <trojkan <at> gmail.com>

Date: Thu, 1 Feb 2024 01:13:01 UTC

Severity: normal

To reply to this bug, email your comments to 68860 AT debbugs.gnu.org.


Report forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Thu, 01 Feb 2024 01:13:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Peter Johansson <trojkan <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-automake <at> gnu.org. (Thu, 01 Feb 2024 01:13:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Peter Johansson <trojkan <at> gmail.com>
To: bug-automake <at> gnu.org
Subject: race condition with make recheck
Date: Thu, 1 Feb 2024 11:11:37 +1000
[Message part 1 (text/plain, inline)]
Hi automakers,

I think I've found a race condition with 'make recheck' that results in 
a source file being compiled twice in parallel and resulting in a 
failure such as

mv: cannot stat '.deps/foo.Tpo': No such file or directory

In my trimmed down example my Makefile.am looks like:

lib_LIBRARIES = libfoo.a
libfoo_a_SOURCES = foo.cc
check_LIBRARIES = libtest.a
libtest_a_SOURCES = test.cc
TESTS = one.test two.test
TEST_EXTENSIONS = .test
AM_DEFAULT_SOURCE_EXT = .cc
EXTRA_PROGRAMS = $(TESTS)
libtest_a_LIBADD = libfoo.a
LDADD = libtest.a libfoo.a

The problem seems to be that both $(TESTS) and check_LIBRARIES depend on 
libfoo.a and trigger compilation of foo.cc. I haven't managed to get the 
same problem with 'make check', so I thought comparing the generated 
rules for check: and recheck: would be useful.

recheck: all $(check_LIBRARIES)

<long rule running failed TESTS>

all: config.h
    $(MAKE) $(AM_MAKEFLAGS) all-am
...

check-am: all-am
    $(MAKE) $(AM_MAKEFLAGS) $(check_LIBRARIES)
    $(MAKE) $(AM_MAKEFLAGS) check-TESTS
check: check-am

I can see how the "check-am: all-am" works as firewall against the race 
condition. OTOH, in the rule for recheck, 'all' triggers a sub-process 
that will build libfoo.a and in the meantime the main process will build 
$(check_LIBRARIES) which trigger the building of libfoo.a. My 
understanding of parallel make is a bit hazy, but I guess the main 
process and sub-process are only talking wrt how many workers they 
employ and are not talking about which rules to work on.
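
The suspected race can be sketched with a plain hand-written Makefile. This is only an illustration under assumptions: it is not automake output, the file names are borrowed from the example above, and the compile step relies on make's built-in rules rather than automake's dependency-tracking compile rule.

```make
# Hypothetical hand-written Makefile, not automake output. It mirrors the
# shape of the generated recheck rule: a sub-make and a sibling
# prerequisite that both need libfoo.a.

all:
	$(MAKE) all-am

all-am: libfoo.a

libfoo.a: foo.o
	ar cr $@ foo.o

libtest.a: test.o libfoo.a
	ar cr $@ test.o

# 'all' and 'libtest.a' are independent prerequisites of recheck, so with
# -j the sub-make started by 'all' and this make can both see foo.o as
# missing and compile it at the same time.
recheck: all libtest.a

.PHONY: all all-am recheck
```

On a clean tree, `make -j2 recheck` with this layout lets two make processes work on foo.o concurrently. In automake's generated rules the compile step also writes .deps/foo.Tpo and then renames it, which would be consistent with the mv error quoted above.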

I suppose this is not by design, or that I'm doing something illegal by 
having check_LIBRARIES depend on stuff that is built within 'make all'. I'm 
not sure what the best way to fix this would be. One idea would be to 
change the rule for recheck to

recheck: all
    $(MAKE) $(AM_MAKEFLAGS) $(check_LIBRARIES)
    <long rule running failed TESTS>

but personally I don't fancy these sub-processes because it feels like 
they are at the core of the problem for this sort of race condition.

I have tested with automake 1.16.5 (ubuntu) and 1.16i.

Please find attached a trimmed down example of the problem.


Best Regards,

Peter
[automake.sh (application/x-shellscript, attachment)]

Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Thu, 01 Feb 2024 22:26:01 GMT) Full text and rfc822 format available.

Message #8 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Karl Berry <karl <at> freefriends.org>
To: trojkan <at> gmail.com
Cc: 68860 <at> debbugs.gnu.org
Subject: Re: bug#68860: race condition with make recheck
Date: Thu, 1 Feb 2024 15:25:20 -0700
Hi Peter,

    The problem seems to be that both $(TESTS) and check_LIBRARIES depend on 
    libfoo.a and trigger compilation of foo.cc. 

Thanks much for the report and analysis. What you wrote looks sensible
to me.

    My understanding of parallel make is a bit hazy,

Me too :(. If anyone else here has a chance to look into this, that
would be great.

    One idea would be to change the rule for recheck to

It looks plausible. Another possibility that comes to mind is to make
the recheck target more parallel to all, i.e., with a recheck-am
target. I'm not sure.

    Please find attached a trimmed down example of the problem.

Thanks again. Will ponder. --best, karl.




Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Fri, 16 Aug 2024 22:23:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Bogdan <bogdro_rep <at> gmx.us>
To: automake-patches <at> gnu.org, 68860 <at> debbugs.gnu.org
Cc: Peter Johansson <trojkan <at> gmail.com>
Subject: [bug#68860] race condition with make recheck
Date: Sat, 17 Aug 2024 00:21:44 +0200
[Message part 1 (text/plain, inline)]
Hello.

Thank you for reporting the issue.

The attached patch should fix the problem. It may be a bit of
overkill; perhaps just one of the fixes would suffice, but it seems to
work at least.

I've re-made your useful script into an Automake test. Since
non-deterministic defects may be hard to find and fix, and certainly
harder to test if they're fixed, the new version simply runs parallel
'make recheck' a few times "just in case". Without the fix, the test
failed in the first or the second run. With the fix, the test (which
runs 'make recheck' 5 times) passed 5 times in a row. This *should* be
a decent sample.
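
A repeated parallel run of that kind might look like the following Makefile.am fragment. This is a hypothetical helper, not the test from the patch: the target name stress-recheck, the -j4 level, and the list of files removed before each run are all assumptions. Since recheck re-runs only tests that failed earlier, a previous 'make check' with at least one failing test is assumed as well.

```make
# Hypothetical helper target; not part of the attached patch.
# Each iteration removes the libraries and their objects so that
# 'make recheck' has to rebuild libfoo.a, then runs it in parallel.
stress-recheck:
	for i in 1 2 3 4 5; do \
	  rm -f foo.$(OBJEXT) test.$(OBJEXT) libfoo.a libtest.a; \
	  $(MAKE) $(AM_MAKEFLAGS) -j4 recheck || exit 1; \
	done

.PHONY: stress-recheck
```

A clean pass proves little for a non-deterministic defect, so the loop stops at the first failure and otherwise only raises confidence.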

All tests with "check" in the name pass.

The test and my patch can, of course, be adapted and further changed.

--
Regards - Bogdan ('bogdro') D.                 (GNU/Linux & FreeDOS)
X86 assembly (DOS, GNU/Linux):    http://bogdro.evai.pl/index-en.php
Soft(EN): http://bogdro.evai.pl/soft  http://bogdro.evai.pl/soft4asm
www.Xiph.org  www.TorProject.org  www.LibreOffice.org  www.GnuPG.org
[automake-recheck-race-mail.diff (text/x-patch, attachment)]

Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Sat, 17 Aug 2024 22:24:01 GMT) Full text and rfc822 format available.

Message #17 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Karl Berry <karl <at> freefriends.org>
To: bogdro_rep <at> gmx.us
Cc: 68860 <at> debbugs.gnu.org
Subject: Re: bug#68860: race condition with make recheck
Date: Sat, 17 Aug 2024 16:22:34 -0600
Thanks Bogdan! I will review as soon as I have a chance. --best, karl.




Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Fri, 23 Aug 2024 21:12:02 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Bogdan <bogdro_rep <at> gmx.us>
To: automake-patches <at> gnu.org, 68860 <at> debbugs.gnu.org, 26471 <at> debbugs.gnu.org
Subject: [bug#68860] race condition with make recheck
Date: Fri, 23 Aug 2024 23:10:07 +0200
Hi.

I've just noticed that bug #68860 (patched) may be a duplicate of
#26471. Different descriptions and error messages, but looks like the
same cause.






Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Sun, 25 Aug 2024 16:47:02 GMT) Full text and rfc822 format available.

Message #26 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Karl Berry <karl <at> freefriends.org>
To: bogdro_rep <at> gmx.us, 68860 <at> debbugs.gnu.org, trojkan <at> gmail.com
Subject: Re: bug#68860: race condition with make recheck
Date: Sun, 25 Aug 2024 10:45:34 -0600
Thanks much, Bogdan.

    -recheck: all %CHECK_DEPS%
    +recheck: all-am %CHECK_DEPS%

Do you have a grip on all-am? Looking at handle_all in bin/automake, I
admit I remain baffled as to what all those pieces of all-am are, and
why it's done as it is.

    -  $output_rules .= "check-am: all-am\n";
    +  $output_rules .= "check-am: all-am";
       if (@check)
         {
    -      pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
    +      $output_rules .= " @check";
    +      #pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
           depend ('.MAKE', 'check-am');
         }
    +  $output_rules .= "\n";

So I gather the basic fix is to output the check targets as dependencies of
check-am, instead of as sub-makes. That seems a plausible reason and fix
for the parallel bug to me.
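
Read against the rules quoted in the original report, the hunk above would change the generated output roughly as follows. This is a sketch under assumptions: @check is taken to expand to $(check_LIBRARIES) as in the reporter's example, and the check-TESTS line is assumed to be emitted unchanged.

```make
# Before (as quoted in the report): the check libraries are built by a
# sub-make in the recipe, after all-am has finished.
#
#   check-am: all-am
#   	$(MAKE) $(AM_MAKEFLAGS) $(check_LIBRARIES)
#   	$(MAKE) $(AM_MAKEFLAGS) check-TESTS

# After (assumed result of the patch): the check libraries become
# ordinary prerequisites, so a single make process schedules them.
check-am: all-am $(check_LIBRARIES)
	$(MAKE) $(AM_MAKEFLAGS) check-TESTS
```

With the libraries as prerequisites, one make process sees the whole dependency graph for them, rather than two processes each deciding independently that libfoo.a is out of date.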

Anyway, I will tweak a few words and install this soon. --thanks again, karl.




Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Sun, 25 Aug 2024 18:45:01 GMT) Full text and rfc822 format available.

Message #29 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Bogdan <bogdro_rep <at> gmx.us>
To: Karl Berry <karl <at> freefriends.org>, 68860 <at> debbugs.gnu.org, trojkan <at> gmail.com
Subject: Re: bug#68860: race condition with make recheck
Date: Sun, 25 Aug 2024 20:43:31 +0200
Karl Berry <karl <at> freefriends.org>, 2024-08-25 10:45:
> Thanks much, Bogdan.
>
>      -recheck: all %CHECK_DEPS%
>      +recheck: all-am %CHECK_DEPS%
>
> Do you have a grip on all-am? Looking at handle_all in bin/automake, I
> admit I remain baffled as to what all those pieces of all-am are, and
> why it's done as it is.


 To be honest, not really :). At least, not fully. As far as I
understand/remember, those "all-am" were the ones processed
recursively. But, I may be wrong, seeing this comment in handle_all:

	# We need to make sure config.h is built before we recurse.
	# We also want to make sure that built sources are built
	# before any ordinary 'all' targets are run.  We can't do this
	# by changing the order of dependencies to the "all" because
	# that breaks when using parallel makes.  Instead we handle
	# things explicitly.

So, "all" just checks/remakes config.h before starting "the real work"
in all-am (be it recursive or not, parallel or not).


>      -  $output_rules .= "check-am: all-am\n";
>      +  $output_rules .= "check-am: all-am";
>         if (@check)
>           {
>      -      pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
>      +      $output_rules .= " @check";
>      +      #pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
>             depend ('.MAKE', 'check-am');
>           }
>      +  $output_rules .= "\n";
>
> So I gather the basic fix is to output the check targets as dependencies of
> check-am, instead of as sub-makes. That seems a plausible reason and fix
> for the parallel bug to me.


 Yes, I'm adding the dependencies as I believe they should be. Here
and in check.am. Maybe the check.am is too much (especially seeing
that skipping the dependency on config.h may *not* be desired) and
fixing only the code will be enough.
 As is usual with non-deterministic problems, it's not 100% guaranteed
that this fixes the problem. But a few runs of parallel 'make
recheck' seem to support it.


> Anyway, I will tweak a few words and install this soon. --thanks again, karl.


 No problem. And thanks :)





Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Mon, 26 Aug 2024 01:19:02 GMT) Full text and rfc822 format available.

Message #32 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Karl Berry <karl <at> freefriends.org>
To: bogdro_rep <at> gmx.us
Cc: 68860 <at> debbugs.gnu.org, trojkan <at> gmail.com
Subject: Re: bug#68860: race condition with make recheck
Date: Sun, 25 Aug 2024 19:17:52 -0600
    >      -  $output_rules .= "check-am: all-am\n";
    >      +  $output_rules .= "check-am: all-am";
    >         if (@check)
    >           {
    >      -      pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
    >      +      $output_rules .= " @check";

Looking again, the comment before this code says:

  # The check target must depend on the local equivalent of
  # 'all', to ensure all the primary targets are built.  Then it
  # must build the local check rules.

.. which makes sense. We have to make all before we can make check.
Hence the check targets can't be dependencies, since then they would be
run in parallel with make, and the programs built by 'all' might not be
built yet. This explains why they made it a sub-make.
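
The ordering concern can be sketched with a plain hand-written Makefile. The names prog and check-helper are hypothetical, and none of this is automake output.

```make
# Hypothetical hand-written rules, not automake output.
all-am: prog

prog: prog.o
	$(CC) -o $@ prog.o

# check-helper runs ./prog but does not list it as a prerequisite.
check-helper:
	./prog > $@

# Under 'make -j2 check-am', all-am and check-helper are siblings, so
# make may start check-helper before prog has been linked.
check-am: all-am check-helper

.PHONY: all-am check-am
```

A serial make happens to process the prerequisites left to right and so hides the problem, which is one way a test suite could keep passing.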

So I'm puzzled as to how all the tests can still be passing. Maybe there
is no test specifically for this? --thanks, karl.




Information forwarded to bug-automake <at> gnu.org:
bug#68860; Package automake. (Mon, 26 Aug 2024 19:52:02 GMT) Full text and rfc822 format available.

Message #35 received at 68860 <at> debbugs.gnu.org (full text, mbox):

From: Bogdan <bogdro_rep <at> gmx.us>
To: Karl Berry <karl <at> freefriends.org>
Cc: 68860 <at> debbugs.gnu.org, trojkan <at> gmail.com
Subject: Re: bug#68860: race condition with make recheck
Date: Mon, 26 Aug 2024 21:49:53 +0200
Karl Berry <karl <at> freefriends.org>, 2024-08-25 19:17:
>      >      -  $output_rules .= "check-am: all-am\n";
>      >      +  $output_rules .= "check-am: all-am";
>      >         if (@check)
>      >           {
>      >      -      pretty_print_rule ("\t\$(MAKE) \$(AM_MAKEFLAGS)", "\t  ", @check);
>      >      +      $output_rules .= " @check";
>
> Looking again, the comment before this code says:
>
>    # The check target must depend on the local equivalent of
>    # 'all', to ensure all the primary targets are built.  Then it
>    # must build the local check rules.
>
> .. which makes sense. We have to make all before we can make check.
> Hence the check targets can't be dependencies, since then they would be
> run in parallel with make, and the programs built by 'all' might not be
> built yet. This explains why they made it a sub-make.


 Totally makes sense, and I'm not removing the dependency on all-am.
When I see that the first command of a target is a 'make', I start
thinking that something in dependency management is wrong. It
shouldn't be needed, right? That's one of the jobs 'make' does -
figure out what needs to be built and in what order. So, if the
dependencies were correct in the first place, maybe running 'make'
in a target wouldn't be needed (well, not in the beginning, at least).
That's why I'm adding @check to the dependency list instead of
building it manually as the first command. The dependencies /should/
be computed correctly and built just once (if needed, that is).
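
In plain make terms, the idea is to state the missing edge instead of forcing an order with a recursive call. A minimal sketch with hypothetical names (prog, check-helper), not automake output:

```make
# Hypothetical hand-written rules, not automake output.
all-am: prog

prog: prog.o
	$(CC) -o $@ prog.o

# check-helper names prog as a prerequisite, so make links prog once
# and only then runs this recipe, even under -j.
check-helper: prog
	./prog > $@

check-am: all-am check-helper

.PHONY: all-am check-am
```

This only works if every check target really declares what it uses; an undeclared use of something built by all-am would no longer be covered by the ordering that the sub-make provided.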

 But correct dependencies maybe exist only in a perfect world.

 There probably were reasons to do it this way, like parallel make
(which /should/ work correctly, but maybe not all implementations do)
or some implementations that e.g. don't follow the order and break the
builds because of that, or too many too complicated dependencies to
put on each target, or...

 So, what do we do? It has just become a bit scary to apply the
patch, but it looks like it's exactly the dependency list that should
be fixed...


> So I'm puzzled as to how all the tests can still be passing. Maybe there
> is no test specifically for this? --thanks, karl.


 Maybe. Or maybe tests pass on the well-behaving GNU Make, but not on
all 'make's. Or I didn't run the "right ones".






This bug report was last modified 290 days ago.
