GNU bug report logs - #47137
[PATCH] Adaptive substitute decompression selection

Package: guix-patches;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Sun, 14 Mar 2021 14:39:02 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Ludovic Courtès <ludo <at> gnu.org>
Subject: bug#47137: closed (Re: bug#47137: [PATCH] Adaptive substitute
 decompression selection)
Date: Sun, 21 Mar 2021 22:47:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#47137: [PATCH] Adaptive substitute decompression selection

which was filed against the guix-patches package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 47137 <at> debbugs.gnu.org.

-- 
47137: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=47137
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: 47137-done <at> debbugs.gnu.org
Subject: Re: bug#47137: [PATCH] Adaptive substitute decompression selection
Date: Sun, 21 Mar 2021 23:46:08 +0100
Hi!

Ludovic Courtès <ludo <at> gnu.org> skribis:

> The patch below is a followup to the thread started in December:
>
>   https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html
>
> It provides a naïve but apparently good enough way for ‘guix substitute’
> to choose the compression method that yields the best speed given the
> CPU and current networking conditions.
>
> On a recent x86_64 laptop with fast networking, using ci.guix.gnu.org,
> the effect so far is to choose gzip substitutes, which indeed provides
> slightly faster substitute installation.  When ci.guix provides zstd
> substitutes, the speedup will be higher.
>
> I have yet to check that it sticks to lzip when bandwidth is low.

I did that, using ‘tc’, and it works as expected, staying on lzip.

Pushed as 9da5ec7099b992a8969a17627548cd341c01bd90 with two minor
tweaks: lowered the low hysteresis threshold, and added a comment on how
to use ‘tc’ to test the behavior on “slow” networks.
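
As a rough illustration of the hysteresis mentioned above, here is a minimal Guile sketch, not the code that was pushed. The thresholds .8 and .4 are those of the patch as submitted; the pushed commit lowers the low one to a value this message does not give. The names ‘next-preference’ and ‘preference-trace’ are hypothetical and exist only for this example.

```scheme
;; Illustrative sketch only.  Thresholds are those of the patch as
;; submitted; the pushed commit uses a lower value for the low one.
(define %high-threshold .8)
(define %low-threshold .4)

(define (next-preference prefer-fast? cpu-usage)
  "Return the new 'prefer fast decompression' flag given CPU-USAGE, a
fraction between 0 and 1, and the current flag PREFER-FAST?."
  (cond ((> cpu-usage %high-threshold) #t)   ;CPU-bound: favor speed
        ((< cpu-usage %low-threshold) #f)    ;network-bound: favor ratio
        (else prefer-fast?)))                ;in between: keep current choice

(define (preference-trace samples)
  "Return the successive flag values for SAMPLES, starting from #f."
  (let loop ((samples samples)
             (prefer-fast? #f)
             (trace '()))
    (if (null? samples)
        (reverse trace)
        (let ((next (next-preference prefer-fast? (car samples))))
          (loop (cdr samples) next (cons next trace))))))

;; Made-up CPU usage samples, one per substitution.
(display (preference-trace '(.5 .9 .6 .5 .3 .7)))
(newline)
;; prints: (#f #t #t #t #f #f)
```

The flag flips only when a sample crosses one of the two thresholds; samples in between leave the previous choice untouched, which avoids oscillating between compression methods on every download.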

Rather than running ‘guix build’ followed by ‘guix gc’, I found that
manually invoking ‘guix substitute’ was nicer (long line ahead!):

  ( echo substitute /gnu/store/svv4826f8zfj8grl2qa17xnxk3acsppc-elixir-1.11.4 /tmp/t1; echo substitute /gnu/store/d9dk53m7pwx1dc1p97zm0q323gpk70f9-poezio-0.13.1 /tmp/t4; echo substitute /gnu/store/mra8i18y9gjavhmdlkbb10m4miinirgz-ocaml-4.11.1 /tmp/t2; echo substitute /gnu/store/ay2j5mp20j9vbhibcwp5lmmcmhqkdnga-vim-full-8.2.2632 /tmp/t3; echo substitute /gnu/store/svv4826f8zfj8grl2qa17xnxk3acsppc-elixir-1.11.4 /tmp/t5; echo substitute /gnu/store/ay2j5mp20j9vbhibcwp5lmmcmhqkdnga-vim-full-8.2.2632 /tmp/t6) | GUIX_ALLOW_UNAUTHENTICATED_SUBSTITUTES=yes ./pre-inst-env guix substitute --substitute 4>&2

Note that this change won’t take effect until we update the ‘guix’
package.

Ludo’.

[Message part 3 (message/rfc822, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-patches <at> gnu.org
Subject: [PATCH] Adaptive substitute decompression selection
Date: Sun, 14 Mar 2021 15:38:47 +0100
[Message part 4 (text/plain, inline)]
Hi!

The patch below is a followup to the thread started in December:

  https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

It provides a naïve but apparently good enough way for ‘guix substitute’
to choose the compression method that yields the best speed given the
CPU and current networking conditions.

On a recent x86_64 laptop with fast networking, using ci.guix.gnu.org,
the effect so far is to choose gzip substitutes, which indeed provides
slightly faster substitute installation.  When ci.guix provides zstd
substitutes, the speedup will be higher.

I have yet to check that it sticks to lzip when bandwidth is low.

Thoughts?

Thanks,
Ludo’.

[0001-substitute-Choose-compression-method-based-on-past-C.patch (text/x-patch, inline)]
From 3f95a1ac04c5e178a7fedfc2d03c07bcb1075ead Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org>
Date: Sun, 14 Mar 2021 15:05:30 +0100
Subject: [PATCH] substitute: Choose compression method based on past CPU
 usage.

This stems from the observation that substitute download can be
CPU-bound when high-speed networks are in use:

  https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html

* guix/narinfo.scm (decompresses-faster?): New procedure.
(narinfo-best-uri): Add #:fast-decompression?.
* guix/scripts/substitute.scm (%prefer-fast-decompression?): New
variable.
(call-with-cpu-usage-monitoring): New procedure.
(with-cpu-usage-monitoring): New macro.
(display-narinfo-data, process-substitution): Pass #:fast-decompression?
to 'narinfo-best-uri'.
(process-substitution): Wrap 'restore-file' call in
'with-cpu-usage-monitoring'.  Set '%prefer-fast-decompression?'.
---
 guix/narinfo.scm            | 27 ++++++++++++++++---
 guix/scripts/substitute.scm | 53 ++++++++++++++++++++++++++++++++-----
 2 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/guix/narinfo.scm b/guix/narinfo.scm
index 2d06124017..72e0f75fda 100644
--- a/guix/narinfo.scm
+++ b/guix/narinfo.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo <at> gnu.org>
 ;;; Copyright © 2014 Nikita Karetnikov <nikita <at> karetnikov.org>
 ;;; Copyright © 2018 Kyle Meyer <kyle <at> kyleam.com>
 ;;;
@@ -297,9 +297,21 @@ this is a rough approximation."
     (_      (or (string=? compression2 "none")
                 (string=? compression2 "gzip")))))
 
-(define (narinfo-best-uri narinfo)
+(define (decompresses-faster? compression1 compression2)
+  "Return true if COMPRESSION1 generally has a higher decompression throughput
+than COMPRESSION2."
+  (match compression1
+    ("none" #t)
+    ("zstd" #t)
+    ("gzip" (string=? compression2 "lzip"))
+    (_      #f)))
+
+(define* (narinfo-best-uri narinfo #:key fast-decompression?)
   "Select the \"best\" URI to download NARINFO's nar, and return three values:
-the URI, its compression method (a string), and the compressed file size."
+the URI, its compression method (a string), and the compressed file size.
+When FAST-DECOMPRESSION? is true, prefer substitutes with faster
+decompression (typically zstd) rather than substitutes with a higher
+compression ratio (typically lzip)."
   (define choices
     (filter (match-lambda
               ((uri compression file-size)
@@ -321,6 +333,13 @@ the URI, its compression method (a string), and the compressed file size."
           (compresses-better? compression1 compression2))))
       (_ #f)))                                    ;we can't tell
 
-  (match (sort choices file-size<?)
+  (define (speed<? c1 c2)
+    (match c1
+      ((uri1 compression1 . _)
+       (match c2
+         ((uri2 compression2 . _)
+          (decompresses-faster? compression2 compression1))))))
+
+  (match (sort choices (if fast-decompression? (negate speed<?) file-size<?))
     (((uri compression file-size) _ ...)
      (values uri compression file-size))))
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index 6892aa999b..b213e6da06 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -257,6 +257,27 @@ Internal tool to substitute a pre-built binary to a local build.\n"))
 ;;; Daemon/substituter protocol.
 ;;;
 
+(define %prefer-fast-decompression?
+  ;; Whether to prefer fast decompression over good compression ratios.  This
+  ;; serves in particular to choose between lzip (high compression ratio but
+  ;; low decompression throughput) and zstd (lower compression ratio but high
+  ;; decompression throughput).
+  #f)
+
+(define (call-with-cpu-usage-monitoring proc)
+  (let ((before (times)))
+    (proc)
+    (let ((after (times)))
+      (if (= (tms:clock after) (tms:clock before))
+          0
+          (/ (- (tms:utime after) (tms:utime before))
+             (- (tms:clock after) (tms:clock before))
+             1.)))))
+
+(define-syntax-rule (with-cpu-usage-monitoring exp ...)
+  "Evaluate EXP...  Return its CPU usage as a fraction between 0 and 1."
+  (call-with-cpu-usage-monitoring (lambda () exp ...)))
+
 (define (display-narinfo-data narinfo)
   "Write to the current output port the contents of NARINFO in the format
 expected by the daemon."
@@ -269,7 +290,10 @@ expected by the daemon."
   (for-each (cute format #t "~a/~a~%" (%store-prefix) <>)
             (narinfo-references narinfo))
 
-  (let-values (((uri compression file-size) (narinfo-best-uri narinfo)))
+  (let-values (((uri compression file-size)
+                (narinfo-best-uri narinfo
+                                  #:fast-decompression?
+                                  %prefer-fast-decompression?)))
     (format #t "~a\n~a\n"
             (or file-size 0)
             (or (narinfo-size narinfo) 0))))
@@ -438,7 +462,9 @@ the current output port."
            store-item))
 
   (let-values (((uri compression file-size)
-                (narinfo-best-uri narinfo)))
+                (narinfo-best-uri narinfo
+                                  #:fast-decompression?
+                                  %prefer-fast-decompression?)))
     (unless print-build-trace?
       (format (current-error-port)
               (G_ "Downloading ~a...~%") (uri->string uri)))
@@ -476,11 +502,24 @@ the current output port."
                   ((hashed get-hash)
                    (open-hash-input-port algorithm input)))
       ;; Unpack the Nar at INPUT into DESTINATION.
-      (restore-file hashed destination
-                    #:dump-file (if (and destination-in-store?
-                                         deduplicate?)
-                                    dump-file/deduplicate*
-                                    dump-file))
+      (define cpu-usage
+        (with-cpu-usage-monitoring
+         (restore-file hashed destination
+                       #:dump-file (if (and destination-in-store?
+                                            deduplicate?)
+                                       dump-file/deduplicate*
+                                       dump-file))))
+
+      ;; Create a hysteresis: depending on CPU usage, favor compression
+      ;; methods with faster decompression (like zstd) or methods with better
+      ;; compression ratios (like lzip).  This stems from the observation that
+      ;; substitution can be CPU-bound when high-speed networks are used:
+      ;; <https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html>.
+      (when (> cpu-usage .8)
+        (set! %prefer-fast-decompression? #t))
+      (when (< cpu-usage .4)
+        (set! %prefer-fast-decompression? #f))
+
       (close-port hashed)
       (close-port input)
 
-- 
2.30.2
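
To make the effect of #:fast-decompression? easier to picture, the sketch below picks the fastest-to-decompress entry from a list of (URI COMPRESSION FILE-SIZE) choices. It is only an approximation of the idea, not the patch's method: it swaps the pairwise ‘decompresses-faster?’ predicate and ‘sort’ for explicit numeric ranks, and it assumes "none" ranks above "zstd". ‘decompression-rank’, ‘fastest-choice’, the rank numbers, the URIs, and the sizes are all made up for this example.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical ranks: higher means faster decompression.  The order
;; loosely follows 'decompresses-faster?'; the numbers are assumptions.
(define (decompression-rank compression)
  (match compression
    ("none" 3)
    ("zstd" 2)
    ("gzip" 1)
    (_      0)))                        ;lzip and anything unknown

(define (fastest-choice choices)
  "Return the element of CHOICES, a non-empty list of (URI COMPRESSION
FILE-SIZE) lists, whose compression method has the highest rank.  On a
tie, the earlier element wins."
  (reduce (lambda (choice best)
            (match (list choice best)
              (((_ c1 . _) (_ c2 . _))
               (if (> (decompression-rank c1) (decompression-rank c2))
                   choice
                   best))))
          #f
          choices))

;; Made-up URIs and sizes, for illustration only.
(define example-choices
  '(("https://example.org/nar/lzip/item" "lzip" 1000)
    ("https://example.org/nar/gzip/item" "gzip" 1500)
    ("https://example.org/nar/zstd/item" "zstd" 1300)))

(match (fastest-choice example-choices)
  ((uri compression size)
   (format #t "~a (~a, ~a bytes)~%" uri compression size)))
```

With these sample choices the zstd entry is selected even though the lzip one is smaller, which is the trade-off the patch makes when CPU usage was high during the previous substitution.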


This bug report was last modified 4 years and 58 days ago.