GNU bug report logs - #67502
[Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive


Package: guix

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Tue, 28 Nov 2023 09:11:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.





Report forwarded to bug-guix <at> gnu.org:
bug#67502; Package guix. (Tue, 28 Nov 2023 09:11:02 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 28 Nov 2023 09:11:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
Date: Tue, 28 Nov 2023 10:09:42 +0100

On the OverDrive (AArch64), ‘cuirass remote-worker’ (1.2.0-1.bdc1f9f) says:

  starting 2 workers (parallelism: 1 cores) for server at 10.0.0.1

Instead, each of the two workers should use two cores, since the machine has four:

--8<---------------cut here---------------start------------->8---
ludo <at> dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (current-processor-count))'

;;; (4)
ludo <at> dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (ceiling-quotient (current-processor-count) 2))'

;;; (2)
ludo <at> dover ~$ nproc
4
--8<---------------cut here---------------end--------------->8---
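
The arithmetic in the transcript above can be sketched in Python. This is only an illustration: it assumes the per-worker parallelism is the ceiling of the CPU count divided by the number of workers, which the Guile one-liner suggests but the report does not confirm from the Cuirass source. The name `cores_per_worker` is hypothetical.

```python
def cores_per_worker(cpu_count: int, workers: int) -> int:
    """Ceiling of cpu_count / workers, mirroring Guile's ceiling-quotient."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    # Negating twice turns Python's floor division into a ceiling division.
    return -(-cpu_count // workers)


# Four CPUs shared by two workers: the value the report expects.
print(cores_per_worker(4, 2))
# A CPU count of 1 with two workers: matches "parallelism: 1 cores" in the log.
print(cores_per_worker(1, 2))
```

Under that assumption, a reported CPU count of 1 is enough to explain the log line, which points at the CPU count rather than the division.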

Since ‘current-processor-count’ is implemented indirectly in terms of
‘sched_getaffinity’, this suggests that the process starts with a bogus
affinity mask.  (Time passes…)  That’s indeed the case:

--8<---------------cut here---------------start------------->8---
ludo <at> dover ~$ sudo herd status cuirass-remote-worker
Status of cuirass-remote-worker:
  It is started.
  Running value is 21279.
  It is enabled.
  Provides (cuirass-remote-worker).
  Requires (avahi-daemon guix-daemon networking).
  Will be respawned.
ludo <at> dover ~$ guile -c '(pk (getaffinity 21279))'

;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
--8<---------------cut here---------------end--------------->8---

Compare to the affinity mask on x86_64-linux-gnu:

--8<---------------cut here---------------start------------->8---
ludo <at> guix-hpc3 ~$ sudo guile -c '(pk (getaffinity 1817))'

;;; (#*1111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
--8<---------------cut here---------------end--------------->8---

Interesting that the initial affinity mask differs on aarch64-linux-gnu
compared to x86_64-linux-gnu.
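
For readers without Guile at hand, a rough Python equivalent of the `getaffinity` inspection above might look like the sketch below. It relies on `os.sched_getaffinity`, which is Linux-only, and assumes the bit strings printed above list CPU 0 first. The helpers `mask_bits` and `describe` are hypothetical names introduced for illustration.

```python
import os


def mask_bits(cpus, width):
    """Render a set of CPU indices as a bit string, CPU 0 first."""
    return "".join("1" if i in cpus else "0" for i in range(width))


def describe(pid=0):
    """Return (usable CPU count, bit string) for pid; 0 means the caller."""
    cpus = os.sched_getaffinity(pid)  # Linux-only
    width = max(max(cpus) + 1, os.cpu_count() or 1)
    return len(cpus), mask_bits(cpus, width)


print(mask_bits({0}, 8))            # only CPU 0 allowed, as on the OverDrive
print(mask_bits(set(range(4)), 8))  # CPUs 0 to 3 allowed
if hasattr(os, "sched_getaffinity"):
    print(describe())               # the current process; pass a PID to inspect another
```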

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 28 Nov 2023 15:30:02 GMT)

Notification sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
bug acknowledged by developer. (Tue, 28 Nov 2023 15:30:02 GMT)

Message #10 received at 67502-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 67502-done <at> debbugs.gnu.org
Subject: Re: bug#67502: [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
Date: Tue, 28 Nov 2023 16:28:50 +0100

Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:

> ludo <at> dover ~$ sudo herd status cuirass-remote-worker
> Status of cuirass-remote-worker:
>   It is started.
>   Running value is 21279.
>   It is enabled.
>   Provides (cuirass-remote-worker).
>   Requires (avahi-daemon guix-daemon networking).
>   Will be respawned.
> ludo <at> dover ~$ guile -c '(pk (getaffinity 21279))'
>
> ;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

This was due to ‘run-fibers’ binding one thread per CPU core.  Thus,
calling ‘getaffinity’ from within ‘run-fibers’ shows only one CPU and
likewise ‘current-processor-count’ returns 1.
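
The effect can be reproduced outside Guile. The Python sketch below is an analogy, not the Fibers or Cuirass code: it pins a helper thread to a single CPU, as the explanation above says ‘run-fibers’ does for its threads, and shows that a CPU count derived from the affinity mask then reads 1 from inside that thread. It is Linux-only, and `usable_cpus` and `pinned_count` are hypothetical names.

```python
import os
import threading


def usable_cpus():
    """Number of CPUs the calling thread may run on (Linux-only)."""
    return len(os.sched_getaffinity(0))


def pinned_count():
    """Pin a helper thread to one CPU and return the count it then sees."""
    result = []

    def worker():
        cpu = min(os.sched_getaffinity(0))
        # With pid 0 the underlying system call targets the calling thread,
        # so the main thread's mask is left alone.
        os.sched_setaffinity(0, {cpu})
        result.append(usable_cpus())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result[0]


before = usable_cpus()
inside = pinned_count()
after = usable_cpus()
print(before, inside, after)
```

On a multi-core machine the middle value is 1 while the outer two stay equal, which is the same symptom as the worker reporting a parallelism of 1.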

Fixed in Cuirass commit 87a6d6ea7ae79fdf487bbcfd44bb3dce2d7c6e82.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 27 Dec 2023 12:24:04 GMT)



