GNU bug report logs - #32786
(system* ...) call somehow interferes with atomic-box on armv7l

Previous Next

Package: guile;

Reported by: Михаил Бахтерев <mob <at> k.imm.uran.ru>

Date: Thu, 20 Sep 2018 18:08:01 UTC

Severity: normal

Done: Mark H Weaver <mhw <at> netris.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Михаил Бахтерев
 <mob <at> k.imm.uran.ru>
Subject: bug#32786: closed (Re: bug#32786: (system* ...) call somehow
 interferes with atomic-box on armv7l)
Date: Fri, 05 Oct 2018 23:23:01 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#32786: (system* ...) call somehow interferes with atomic-box on armv7l

which was filed against the guile package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 32786 <at> debbugs.gnu.org.

-- 
32786: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=32786
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Mark H Weaver <mhw <at> netris.org>
To: Михаил Бахтерев <mob <at> k.imm.uran.ru>
Cc: 32786-done <at> debbugs.gnu.org
Subject: Re: bug#32786: (system* ...) call somehow interferes with atomic-box
 on armv7l
Date: Fri, 05 Oct 2018 19:22:13 -0400
Hi,

Михаил Бахтерев <mob <at> k.imm.uran.ru> writes:

> I've tested the patch (i've applied it to the v2.2.4 source code). The
> test sample works now as expected with both test «payloads»: (sleep
> N-seconds) and (system* "any" "command" "here") calls.

Thanks for letting us know the results of your testing, this is very
helpful.  I just pushed a similar fix (same code, improved comments) to
the stable-2.2 branch, commit 2d09e0513fc11e2305077ba3653f6e4c2f266ddb,
which will be in the upcoming guile-2.2.5 release.

    Thanks,
      Mark


>
> Thanks for fixing it.
>
> - MB. Respectfully
>
> On Thu, Sep 27, 2018 at 01:49:23AM -0400, Mark H Weaver wrote:
>> Hi,
>> 
>> Thanks for the additional details.  I was able to reproduce the bug, and
>> I believe I now see the problem.
>> 
>> 'atomic-box-compare-and-swap!' is implemented using
>> 'atomic_compare_exchange_weak' (if available), but neglects to handle
>> the case where 'atomic_compare_exchange_weak' may spuriously fail.  In
>> that case, the box is left unchanged, although the observed value was
>> equal to the expected value.
>> 
>> What's happening here is that the 'atomic-box-compare-and-swap!' in
>> 'sleep-loop' fails spuriously, leaving the box in state #:accepted
>> although it returns #:accepted as its result.  When the main loop
>> discovers this, it changes the state to #:need-to-sleep, although the
>> thread has already ended.
>> 
>> To confirm this hypothesis, I added a print statement to the main loop
>> showing the state of the box that it observed during the protocol
>> exchange, and indeed it sees #:accepted the first time it checks, and
>> #:need-to-sleep in all later iterations.
>> 
>> I've attached a proposed patch that I hope will fix this problem.  If
>> you'd be willing to test it, I'd be grateful, but otherwise I'll try to
>> test it in the next week or so.  My access to armv7l boxes is somewhat
>> limited.
>> 
>> Thanks for this report.
>> 
>>       Mark
>> 
>> 
>
>> From 958d37686a9ac65f48cb2ca32a341cf182c24b5a Mon Sep 17 00:00:00 2001
>> From: Mark H Weaver <mhw <at> netris.org>
>> Date: Thu, 27 Sep 2018 01:00:11 -0400
>> Subject: [PATCH] UNTESTED: Fix 'atomic-box-compare-and-swap!'.
>> 
>> Fixes <https://bugs.gnu.org/32786>.
>> 
>> 'scm_atomic_compare_and_swap_scm' uses 'atomic_compare_exchange_weak' if
>> it's available on the host platform.  'atomic_compare_exchange_weak' may
>> fail spuriously, leaving the atomic object unchanged even when it
>> contained the expected value.  'atomic-box-compare-and-swap!' uses
>> 'scm_atomic_compare_and_swap_scm' without checking its return value, and
>> therefore may return the expected value while leaving the box unchanged.
>> This contradicts the documentation, which explicitly states that "you
>> can know if the swap worked by checking if the return value is 'eq?' to
>> EXPECTED.".
>> 
>> * libguile/atomic.c (scm_atomic_box_compare_and_swap_x): If
>> 'scm_atomic_compare_and_swap_scm' returns false and the observed value
>> is equal to 'expected', then try again.
>> * libguile/vm-engine.c (atomic-box-compare-and-swap!): Ditto.
>> ---
>>  libguile/atomic.c    | 13 +++++++++----
>>  libguile/vm-engine.c | 13 ++++++++-----
>>  2 files changed, 17 insertions(+), 9 deletions(-)
>> 
>> diff --git a/libguile/atomic.c b/libguile/atomic.c
>> index 950874030..504643912 100644
>> --- a/libguile/atomic.c
>> +++ b/libguile/atomic.c
>> @@ -1,4 +1,4 @@
>> -/* Copyright (C) 2016 Free Software Foundation, Inc.
>> +/* Copyright (C) 2016, 2018 Free Software Foundation, Inc.
>>   *
>>   * This library is free software; you can redistribute it and/or
>>   * modify it under the terms of the GNU Lesser General Public License
>> @@ -95,10 +95,15 @@ SCM_DEFINE (scm_atomic_box_compare_and_swap_x,
>>              "if the return value is @code{eq?} to @var{expected}.")
>>  #define FUNC_NAME s_scm_atomic_box_compare_and_swap_x
>>  {
>> +  SCM result = expected;
>> +
>>    SCM_VALIDATE_ATOMIC_BOX (1, box);
>> -  scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
>> -                                   &expected, desired);
>> -  return expected;
>> +  while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (box),
>> +                                           &result, desired)
>> +         && scm_is_eq (result, expected))
>> +    {
>> +    }
>> +  return result;
>>  }
>>  #undef FUNC_NAME
>>  
>> diff --git a/libguile/vm-engine.c b/libguile/vm-engine.c
>> index 19ff3e498..9650e33eb 100644
>> --- a/libguile/vm-engine.c
>> +++ b/libguile/vm-engine.c
>> @@ -3868,16 +3868,19 @@ VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
>>      {
>>        scm_t_uint16 dst, box;
>>        scm_t_uint32 expected, desired;
>> -      SCM scm_box, scm_expected;
>> +      SCM scm_box, scm_expected, scm_result;
>>        UNPACK_12_12 (op, dst, box);
>>        UNPACK_24 (ip[1], expected);
>>        UNPACK_24 (ip[2], desired);
>>        scm_box = SP_REF (box);
>>        VM_VALIDATE_ATOMIC_BOX (scm_box, "atomic-box-compare-and-swap!");
>> -      scm_expected = SP_REF (expected);
>> -      scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
>> -                                       &scm_expected, SP_REF (desired));
>> -      SP_SET (dst, scm_expected);
>> +      scm_result = scm_expected = SP_REF (expected);
>> +      while (!scm_atomic_compare_and_swap_scm (scm_atomic_box_loc (scm_box),
>> +                                               &scm_result, SP_REF (desired))
>> +             && scm_is_eq (scm_result, scm_expected))
>> +        {
>> +        }
>> +      SP_SET (dst, scm_result);
>>        NEXT (3);
>>      }
>>  
>> -- 
>> 2.19.0
>> 

[Message part 3 (message/rfc822, inline)]
From: Михаил Бахтерев <mob <at> k.imm.uran.ru>
To: bug-guile <at> gnu.org
Subject: (system* ...) call somehow interferes with atomic-box on armv7l
Date: Thu, 20 Sep 2018 22:13:06 +0500
[Message part 4 (text/plain, inline)]
Greetings.

I attach the code sample, which expected behaviour is:

  1. to periodically restart thread, which executes «sleep 1s» with
  system* call;

  2. and then to check if it should repeat the execution.
  
The flag, which controls that re-execution is managed by another thread
through atomic-box primitive.

The code does not run as expected: the sleep executing sleep starts
once, and change seems to remain invisible to main thread.

The code works as expected, when:

  1. (sleep 1) is used instead (system* "sleep" "1s");
  2. running the code on x86_64 cpus.

The use of affinity mask

  taskset 0x01 guile test.scm

does not help.

- MB. Respectfully

P.S. Additional information:

$ guile --version  
guile (GNU Guile) 2.2.4
Copyright (C) 2018 Free Software Foundation, Inc.

License LGPLv3+: GNU LGPL 3 or later <http://gnu.org/licenses/lgpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ sh config.guess
armv7l-unknown-linux-gnueabihf

$ pacman -Qi guile 
Name            : guile
Version         : 2.2.4-1
Description     : Portable, embeddable Scheme implementation written in C
Architecture    : armv7h
URL             : https://www.gnu.org/software/guile/
Licenses        : GPL
Groups          : None
Provides        : None
Depends On      : gmp  libltdl  ncurses  texinfo  libunistring  gc  libffi
Optional Deps   : None
Required By     : make
Optional For    : gnutls
Conflicts With  : None
Replaces        : None
Installed Size  : 40.83 MiB
Packager        : Arch Linux ARM Build System <builder+xu9 <at> archlinuxarm.org>
Build Date      : Tue 03 Jul 2018 04:47:53 PM +05
Install Date    : Mon 09 Jul 2018 01:02:48 PM +05
Install Reason  : Installed as a dependency for another package
Install Script  : No
Validated By    : Signature
[test.scm (application/vnd.lotus-screencam, attachment)]
[cpuinfo (text/plain, attachment)]

This bug report was last modified 6 years and 287 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.