GNU bug report logs - #67502
[Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive


Package: guix

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Tue, 28 Nov 2023 09:11:02 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#67502: closed ([Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the
 OverDrive)
Date: Tue, 28 Nov 2023 15:30:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 28 Nov 2023 16:28:50 +0100
with message-id <87cyvtsxql.fsf <at> gnu.org>
and subject line Re: bug#67502: [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
has caused the debbugs.gnu.org bug report #67502,
regarding [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
67502: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=67502
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: bug-guix <at> gnu.org
Subject: [Cuirass] ‘cuirass remote-worker’ gets
 the CPU count wrong on the OverDrive
Date: Tue, 28 Nov 2023 10:09:42 +0100
On the OverDrive (AArch64), ‘cuirass remote-worker’ (1.2.0-1.bdc1f9f) says:

  starting 2 workers (parallelism: 1 cores) for server at 10.0.0.1

It should instead use two cores per worker, since the machine has four:

--8<---------------cut here---------------start------------->8---
ludo <at> dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (current-processor-count))'

;;; (4)
ludo <at> dover ~$ guile -c '(use-modules (ice-9 threads)) (pk (ceiling-quotient (current-processor-count) 2))'

;;; (2)
ludo <at> dover ~$ nproc
4
--8<---------------cut here---------------end--------------->8---

Since ‘current-processor-count’ is implemented indirectly in terms of
‘sched_getaffinity’, this suggests that the process starts with a bogus
affinity mask.  (Time passes…)  That’s indeed the case:

--8<---------------cut here---------------start------------->8---
ludo <at> dover ~$ sudo herd status cuirass-remote-worker
Status of cuirass-remote-worker:
  It is started.
  Running value is 21279.
  It is enabled.
  Provides (cuirass-remote-worker).
  Requires (avahi-daemon guix-daemon networking).
  Will be respawned.
ludo <at> dover ~$ guile -c '(pk (getaffinity 21279))'

;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
--8<---------------cut here---------------end--------------->8---

Compare to the affinity mask on x86_64-linux-gnu:

--8<---------------cut here---------------start------------->8---
ludo <at> guix-hpc3 ~$ sudo guile -c '(pk (getaffinity 1817))'

;;; (#*1111111111111111111111111111111100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)
--8<---------------cut here---------------end--------------->8---

Interestingly, the initial affinity mask on aarch64-linux-gnu differs
from the one on x86_64-linux-gnu.

Ludo’.
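
For readers less familiar with affinity masks, here is a minimal sketch of the relationship described above, written in Python rather than Guile and only as an illustration. It assumes Linux, where `os.sched_getaffinity` wraps the same `sched_getaffinity` call. The helper names (`ceiling_quotient`, `usable_cpu_count`, `per_worker_parallelism`) are hypothetical, and the "cores divided by workers, rounded up" rule is an assumption taken from the `ceiling-quotient` one-liner in the report, not from the Cuirass source.

```python
import os


def ceiling_quotient(n: int, d: int) -> int:
    """Integer division rounded up (matches Guile's ceiling-quotient
    for positive arguments)."""
    return -(-n // d)


def usable_cpu_count(pid: int = 0) -> int:
    """Number of CPUs in the affinity mask of `pid` (0 means the caller).

    This is the affinity-based count, which can be smaller than the
    number of CPUs installed in the machine.
    """
    return len(os.sched_getaffinity(pid))


def per_worker_parallelism(cpus: int, workers: int) -> int:
    """Assumed rule: split the visible CPUs evenly, rounding up."""
    return ceiling_quotient(cpus, workers)


if __name__ == "__main__":
    print("installed CPUs:", os.cpu_count())
    print("usable CPUs:   ", usable_cpu_count())
    # With a full 4-CPU mask and 2 workers, each worker gets 2 cores;
    # with a 1-CPU mask, the same rule yields 1.
    print("4 CPUs, 2 workers:", per_worker_parallelism(4, 2))
    print("1 CPU,  2 workers:", per_worker_parallelism(1, 2))
```

Under that assumed rule, a mask with a single bit set would be consistent with the "parallelism: 1 cores" message quoted at the top of the report.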


[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: 67502-done <at> debbugs.gnu.org
Subject: Re: bug#67502: [Cuirass] ‘cuirass remote-worker’ gets the CPU count wrong on the OverDrive
Date: Tue, 28 Nov 2023 16:28:50 +0100
Ludovic Courtès <ludovic.courtes <at> inria.fr> skribis:

> ludo <at> dover ~$ sudo herd status cuirass-remote-worker
> Status of cuirass-remote-worker:
>   It is started.
>   Running value is 21279.
>   It is enabled.
>   Provides (cuirass-remote-worker).
>   Requires (avahi-daemon guix-daemon networking).
>   Will be respawned.
> ludo <at> dover ~$ guile -c '(pk (getaffinity 21279))'
>
> ;;; (#*1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

This was due to ‘run-fibers’ pinning each of its threads to a single
CPU core.  Thus, calling ‘getaffinity’ from within ‘run-fibers’ shows
only one CPU, and likewise ‘current-processor-count’ returns 1.
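
The effect can be reproduced outside Guile. The following is a minimal sketch in Python, assuming Linux, where passing pid 0 to the affinity calls refers to the calling thread. It does not use Fibers and is not the Cuirass fix; the function name `affinity_seen_by_pinned_thread` is hypothetical. It only shows that a thread pinned to one CPU reports a one-CPU mask while the thread that spawned it keeps its original mask.

```python
import os
import threading


def affinity_seen_by_pinned_thread(cpu: int) -> int:
    """Pin a fresh thread to `cpu` and return how many CPUs it then sees."""
    seen = []

    def body() -> None:
        # On Linux, pid 0 designates the calling thread, so only this
        # thread is pinned; the rest of the process is left untouched.
        os.sched_setaffinity(0, {cpu})
        seen.append(len(os.sched_getaffinity(0)))

    thread = threading.Thread(target=body)
    thread.start()
    thread.join()
    return seen[0]


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    count_inside = affinity_seen_by_pinned_thread(min(before))
    after = os.sched_getaffinity(0)
    print("main thread mask before:", sorted(before))
    print("CPUs seen by the pinned thread:", count_inside)
    print("main thread mask after: ", sorted(after))
```

One consequence, stated as a general observation rather than as a description of the commit below: a CPU count sampled before any thread is pinned, or from a thread that is not pinned, reflects the mask of the process as a whole.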

Fixed in Cuirass commit 87a6d6ea7ae79fdf487bbcfd44bb3dce2d7c6e82.

Ludo’.


This bug report was last modified 1 year and 170 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.