GNU bug report logs - #43518
(guix status) occasionally crashes when max-jobs > 1

Previous Next

Package: guix;

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Sat, 19 Sep 2020 17:36:01 UTC

Severity: important

To reply to this bug, email your comments to 43518 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Sat, 19 Sep 2020 17:36:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 19 Sep 2020 17:36:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: Guix substitute crash in procedure raise-exception: wrong type
 agument in position 1: #f
Date: Sat, 19 Sep 2020 13:36:01 -0400
Seems like a second exception raised in the process of raising an
exception :-).

The original crash reason is unkown, the network was operating normally.

downloading from http://127.0.0.1:8080/nar/gzip/8sdx7qyb4l3ni6yy902jlhamb26lqb6f-spice-gtk-0.37 ...


0.0 http://127.0.0.1:8080/nar/gzip/954b3kw59mdvanp33yx93q5pzzll4wg9-qemu-5.0.0 - 29605888
 qemu-5.0.0                                                           1.6MiB/s 00:26 | 41.8MiB transferredBacktrace:
In ice-9/boot-9.scm:
  1731:15 19 (with-exception-handler #<procedure 7f485337bd50 at ice-9/boot-9.scm:1815:7 (exn)> _ # _ # …)
In guix/scripts/package.scm:
   966:10 18 (_)
In guix/status.scm:
    776:4 17 (call-with-status-report _ _)
In guix/store.scm:
   1300:8 16 (call-with-build-handler _ _)
   1300:8 15 (call-with-build-handler #<procedure 7f485337b000 at guix/ui.scm:1166:2 (continue store …> …)
In guix/scripts/package.scm:
    889:2 14 (_)
In guix/build/syscalls.scm:
   1167:4 13 (call-with-file-lock/no-wait _ _ _)
In guix/scripts/package.scm:
   143:19 12 (build-and-use-profile #<store-connection 256.99 7f4854803190> "/var/guix/profiles/per-u…" …)
In guix/store.scm:
  2042:24 11 (run-with-store #<store-connection 256.99 7f4854803190> _ #:guile-for-build _ #:system _ # …)
In guix/profiles.scm:
   1586:2 10 (_ _)
    348:2  9 (_ _)
In guix/store.scm:
  1924:12  8 (_ #<store-connection 256.99 7f4854803190>)
   1357:5  7 (map/accumulate-builds #<store-connection 256.99 7f4854803190> _ _)
  1368:15  6 (_ #<store-connection 256.99 7f4854803190> _ _)
   740:13  5 (process-stderr _ _)
In unknown file:
           4 (display "@ substituter-succeeded /gnu/store/277yzwcx6fshsxlqnwvyv94q66miv3ii-diffoscope…" …)
In guix/status.scm:
   699:16  3 (write! _ _ _)
    613:6  2 (_ (download-progress "/gnu/store/954b3kw59mdvanp33yx93q5pzzll4wg9-qemu-5.0.0" "http:…" …) …)
In guix/progress.scm:
   213:14  1 (display-download-progress "qem@" _ #:start-time _ #:transferred _ #:log-port _)
In ice-9/boot-9.scm:
  1669:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure =: Wrong type argument in position 1: #f

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Tue, 29 Sep 2020 17:00:02 GMT) Full text and rfc822 format available.

Message #8 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Tue, 29 Sep 2020 13:00:30 -0400
Hello,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

[...]

> 0.0 http://127.0.0.1:8080/nar/gzip/954b3kw59mdvanp33yx93q5pzzll4wg9-qemu-5.0.0 - 29605888
>  qemu-5.0.0                                                           1.6MiB/s 00:26 | 41.8MiB transferredBacktrace:
> In ice-9/boot-9.scm:
>   1731:15 19 (with-exception-handler #<procedure 7f485337bd50 at ice-9/boot-9.scm:1815:7 (exn)> _ # _ # …)
> In guix/scripts/package.scm:
>    966:10 18 (_)
> In guix/status.scm:
>     776:4 17 (call-with-status-report _ _)
> In guix/store.scm:
>    1300:8 16 (call-with-build-handler _ _)
>    1300:8 15 (call-with-build-handler #<procedure 7f485337b000 at guix/ui.scm:1166:2 (continue store …> …)
> In guix/scripts/package.scm:
>     889:2 14 (_)
> In guix/build/syscalls.scm:
>    1167:4 13 (call-with-file-lock/no-wait _ _ _)
> In guix/scripts/package.scm:
>    143:19 12 (build-and-use-profile #<store-connection 256.99 7f4854803190> "/var/guix/profiles/per-u…" …)
> In guix/store.scm:
>   2042:24 11 (run-with-store #<store-connection 256.99 7f4854803190> _ #:guile-for-build _ #:system _ # …)
> In guix/profiles.scm:
>    1586:2 10 (_ _)
>     348:2  9 (_ _)
> In guix/store.scm:
>   1924:12  8 (_ #<store-connection 256.99 7f4854803190>)
>    1357:5  7 (map/accumulate-builds #<store-connection 256.99 7f4854803190> _ _)
>   1368:15  6 (_ #<store-connection 256.99 7f4854803190> _ _)
>    740:13  5 (process-stderr _ _)
> In unknown file:
>            4 (display "@ substituter-succeeded /gnu/store/277yzwcx6fshsxlqnwvyv94q66miv3ii-diffoscope…" …)
> In guix/status.scm:
>    699:16  3 (write! _ _ _)
>     613:6  2 (_ (download-progress "/gnu/store/954b3kw59mdvanp33yx93q5pzzll4wg9-qemu-5.0.0" "http:…" …) …)
> In guix/progress.scm:
>    213:14  1 (display-download-progress "qem@" _ #:start-time _ #:transferred _ #:log-port _)
> In ice-9/boot-9.scm:
>   1669:16  0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure =: Wrong type argument in position 1: #f

Here's a second instance of that problem, that I could get just now,
from Guix commit d48b17adb91d68acf6fb3f321c05102fcc8c39eb:

--8<---------------cut here---------------start------------->8---
guix environment --system=armhf-linux --ad-hoc gcc-toolchain
substitute: updating substitutes from 'http://127.0.0.1:8080'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
0.4 MB will be downloaded
downloading from https://ci.guix.gnu.org/nar/lzip/jf2x64p5z1ykg4px2plh0v5a7r20pc50-glibc-utf8-locales-2.31 ...
 glibc-utf8-locales-2.31  376KiB                                683KiB/s 00:01 [##################] 100.0%

substitute: updating substitutes from 'http://127.0.0.1:8080'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
   /gnu/store/2iswjr97v4kfpb8q0jwzjwwjm9rvcq1q-profile.drv
   /gnu/store/anx317lak54y5lg3hw8i8fn5hy485ayx-gcc-toolchain-10.2.0.drv
   /gnu/store/6vqbb2jks246cjbqclxaq6x1j4zcvvg1-gcc-10.2.0.drv
   /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv

10 items will be downloaded
building CA certificate bundle...
building fonts directory...
building directory of Info manuals...
building database for manual pages...
substitute: updating substitutes from 'http://127.0.0.1:8080'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
process 18399 acquired build slot '/var/guix/offload/127.0.0.1:6666/1'
load on machine '127.0.0.1' is 0.78 (normalized: 0.39)
building /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv...
The following builds are still in progress:
  /gnu/store/dzchr45vhqwa538skm6h5mjqb73ly01i-manual-database.drv
  /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv
  /gnu/store/s12hk0x8khglmqky7035a904ryjf4hbd-info-dir.drv
  /gnu/store/zafx1kh4jnycqxhy2kfm6jajq3dgp3cm-fonts-dir.drv

The following builds are still in progress:
  /gnu/store/dzchr45vhqwa538skm6h5mjqb73ly01i-manual-database.drv
  /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv
  /gnu/store/s12hk0x8khglmqky7035a904ryjf4hbd-info-dir.drv

The following builds are still in progress:
  /gnu/store/dzchr45vhqwa538skm6h5mjqb73ly01i-manual-database.drv
  /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv

The following build is still in progress:
  /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv

|offloading build of /gnu/store/ih15jlds75wh246dnb6s9sixqz93mqrp-module-import-compiled.drv to '127.0.0.1'
downloading from https://ci.guix.gnu.org/nar/lzip/1nr37qqaqmcn3dbrjg5z7r0mfkaaf2aq-glibc-bootstrap-0 ...
downloading from https://ci.guix.gnu.org/nar/lzip/idr9bk61lar8i6fqpmkvpg8ph5wdbapr-isl-0.22.1 ...
downloading from https://ci.guix.gnu.org/nar/lzip/dnikscj9hksx4g4fs4qd2ynj1rzv6fz9-libelf-0.8.13 ...
downloading from https://ci.guix.gnu.org/nar/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz ...
|

substitution of /gnu/store/dnikscj9hksx4g4fs4qd2ynj1rzv6fz9-libelf-0.8.13 complete
 80% [###############################################################################                    ]

substitution of /gnu/store/idr9bk61lar8i6fqpmkvpg8ph5wdbapr-isl-0.22.1 complete
 90% [#########################################################################################          ]

downloading from https://ci.guix.gnu.org/nar/lzip/ws1p3jb5pjlfmsmklq2b6288ny9nwm55-libstdc%2B%2B-7.5.0 ...
substitution of /gnu/store/1nr37qqaqmcn3dbrjg5z7r0mfkaaf2aq-glibc-bootstrap-0 complete
100% [###################################################################################################]

downloading from https://ci.guix.gnu.org/nar/lzip/ldwybsllicl9h433aixm1cv28g70ifz2-libstdc%2B%2B-boot0-4.9.4 ...


4
substitution of /gnu/store/ws1p3jb5pjlfmsmklq2b6288ny9nwm55-libstdc++-7.5.0 complete
substitution of /gnu/store/ldwybsllicl9h433aixm1cv28g70ifz2-libstdc++-boot0-4.9.4 complete
 gcc-10.2.0.tar.xz  74.3MiB                                     812KiB/s 00:25 [####              ]  27.2%

xz 77898736 21561440
 gcc-10.2.0.tar.xz  74.3MiB                                     1.1MiB/s 00:25 [#######           ]  39.1%Backtrace:
In ice-9/boot-9.scm:
  1736:10 19 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1731:15 18 (with-exception-handler #<procedure 7f5866dd6240 at ice-9/boot-9.scm:1815:7 (exn)> _ # _ # …)
  1736:10 17 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/store.scm:
   631:37 16 (thunk)
   1300:8 15 (call-with-build-handler _ _)
   1300:8 14 (call-with-build-handler _ _)
   1300:8 13 (call-with-build-handler #<procedure 7f5866e10b70 at guix/ui.scm:1166:2 (continue store …> …)
In guix/status.scm:
    776:4 12 (call-with-status-report _ _)
In guix/scripts/environment.scm:
   741:14 11 (_)
In guix/store.scm:
  2042:24 10 (run-with-store #<store-connection 256.99 7f586aec4410> _ #:guile-for-build _ #:system _ # …)
In guix/scripts/environment.scm:
   754:18  9 (_ _)
In guix/store.scm:
  1911:38  8 (_ #<store-connection 256.99 7f5864a456e0>)
  1368:15  7 (_ #<store-connection 256.99 7f5864a456e0> _ _)
   740:13  6 (process-stderr _ _)
In unknown file:
           5 (display "@ substituter-succeeded /gnu/store/r06j3ms57z4mzfpdzfclsi3i9hr4184g-module-imp…" …)
In guix/status.scm:
   699:16  4 (write! _ _ _)
    613:6  3 (_ (download-progress "/gnu/store/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz" # …) …)
In guix/progress.scm:
   213:14  2 (display-download-progress "6m9zimmw8p6gbc1yfbg454c1r58@" _ #:start-time _ #:transferred _ …)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure =: Wrong type argument in position 1: #f
--8<---------------cut here---------------end--------------->8---

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Wed, 30 Sep 2020 17:20:02 GMT) Full text and rfc822 format available.

Message #11 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Wed, 30 Sep 2020 19:19:34 +0200
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> downloading from https://ci.guix.gnu.org/nar/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz ...
[…]
>  gcc-10.2.0.tar.xz  74.3MiB                                     1.1MiB/s 00:25 [#######           ]  39.1%Backtrace:
[…]
> In unknown file:
>            5 (display "@ substituter-succeeded /gnu/store/r06j3ms57z4mzfpdzfclsi3i9hr4184g-module-imp…" …)
> In guix/status.scm:
>    699:16  4 (write! _ _ _)
>     613:6  3 (_ (download-progress "/gnu/store/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz" # …) …)
> In guix/progress.scm:
>    213:14  2 (display-download-progress "6m9zimmw8p6gbc1yfbg454c1r58@" _ #:start-time _ #:transferred _ …)
> In ice-9/boot-9.scm:
>   1669:16  1 (raise-exception _ #:continuable? _)
>   1669:16  0 (raise-exception _ #:continuable? _)
>
> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
> In procedure =: Wrong type argument in position 1: #f

If you have a case that’s reproducible, please take advantage of it and
add ‘pk’ calls in (guix progress) to see what happens.  Here it seems
that ‘transferred’ (in ‘display-download-progress’) is #f.

The #f would come from this clause in (guix status):

     ('download-progress item uri
                         (= string->number size)
                         (= string->number transferred))  ;<- here

That in turn comes from ‘progress-reporter/trace’, called from (guix
scripts substitute).

It looks like this issue happens when max-jobs is (noticeably) greater
than 1, as if the client was receiving intermingled “@
download-progress” traces, and thus not getting the right ‘transferred’
value.

Could you try to reproduce it while ‘guix’ is running under strace?
Like:

  strace -o log -s 200 guix build whatever --max-jobs=20

?

Then, if you reproduce the error, we’ll look for intermingled traces in
the log.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Fri, 02 Oct 2020 17:50:01 GMT) Full text and rfc822 format available.

Message #14 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Fri, 02 Oct 2020 13:50:33 -0400
Hello!

Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:
>
>> downloading from https://ci.guix.gnu.org/nar/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz ...
> […]
>>  gcc-10.2.0.tar.xz  74.3MiB                                     1.1MiB/s 00:25 [#######           ]  39.1%Backtrace:
> […]
>> In unknown file:
>>            5 (display "@ substituter-succeeded /gnu/store/r06j3ms57z4mzfpdzfclsi3i9hr4184g-module-imp…" …)
>> In guix/status.scm:
>>    699:16  4 (write! _ _ _)
>>     613:6  3 (_ (download-progress "/gnu/store/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz" # …) …)
>> In guix/progress.scm:
>>    213:14  2 (display-download-progress "6m9zimmw8p6gbc1yfbg454c1r58@" _ #:start-time _ #:transferred _ …)
>> In ice-9/boot-9.scm:
>>   1669:16  1 (raise-exception _ #:continuable? _)
>>   1669:16  0 (raise-exception _ #:continuable? _)
>>
>> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
>> In procedure =: Wrong type argument in position 1: #f
>
> If you have a case that’s reproducible, please take advantage of it and
> add ‘pk’ calls in (guix progress) to see what happens.  Here it seems
> that ‘transferred’ (in ‘display-download-progress’) is #f.
>
> The #f would come from this clause in (guix status):
>
>      ('download-progress item uri
>                          (= string->number size)
>                          (= string->number transferred))  ;<- here
>
> That in turn comes from ‘progress-reporter/trace’, called from (guix
> scripts substitute).

It's a bit difficult for me to follow the calls here :-).  I thought
it'd have to be in 'progress-reporter/file' because that's the only one
that ends up calling display-download-progress.  The line 213 of (guix
progress) is:

      (unless (zero? transferred)

and if transferred is #f that would indeed fail with the error message
shown in the backtrace, so it seems you are correct in your analysis.

> It looks like this issue happens when max-jobs is (noticeably) greater
> than 1, as if the client was receiving intermingled “@
> download-progress” traces, and thus not getting the right ‘transferred’
> value.
>
> Could you try to reproduce it while ‘guix’ is running under strace?
> Like:
>
>   strace -o log -s 200 guix build whatever --max-jobs=20

I've reconfigured my offload machine to support 20 jobs, and
instrumented my local checkouts with the following pk calls:

--8<---------------cut here---------------start------------->8---
3 files changed, 18 insertions(+), 11 deletions(-)
guix/progress.scm           |  7 ++++++-
guix/scripts/substitute.scm | 19 ++++++++++---------
guix/status.scm             |  3 ++-

modified   guix/progress.scm
@@ -194,6 +194,7 @@ throughput."
      (time-difference (current-time (time-type start-time))
                       start-time)))
 
+  (pk 'display-download-progress 'file= file 'size= size 'transferred= transferred)
   (if (and (number? size) (not (zero? size)))
       (let* ((%  (* 100.0 (/ transferred size)))
              (throughput (/ transferred elapsed))
@@ -247,7 +248,10 @@ ABBREVIATION used to shorten FILE for display."
      (report
       (let ((rate-limited-render (rate-limited render %progress-interval)))
         (lambda (value)
+          (pk 'called-render/transferred= transferred)
+          (pk 'file: file 'size: size)
           (set! transferred value)
+          (pk 'transferred-set-to value)
           (rate-limited-render))))
      ;; Don't miss the last report.
      (stop render))))
@@ -307,6 +311,7 @@ progress reports, write \"build trace\" lines to be processed elsewhere."
                      log-port)))
    (report (let ((report (rate-limited report-progress %progress-interval)))
              (lambda (transferred)
+               (pk 'progress-reporter/trace 'total= total 'transferred= transferred)
                (set! total transferred)
                (report transferred))))
    (stop (lambda ()
@@ -348,7 +353,7 @@ REPORTER, which should be a <progress-reporter> object."
                                 ((? eof-object?) 0)
                                 (x x))))
                        (set! total (+ total n))
-                       (report total)
+                       (report (pk 'progress-report-port:read!:total= total))
                        n))))
        (start)
        (make-custom-binary-input-port "progress-port-proc"
modified   guix/scripts/substitute.scm
@@ -991,15 +991,16 @@ DESTINATION as a nar file.  Verify the substitute against ACL."
                    (let* ((dl-size  (or download-size
                                         (and (equal? compression "none")
                                              (narinfo-size narinfo))))
-                          (reporter (if print-build-trace?
-                                        (progress-reporter/trace
-                                         destination
-                                         (uri->string uri) dl-size
-                                         (current-error-port))
-                                        (progress-reporter/file
-                                         (uri->string uri) dl-size
-                                         (current-error-port)
-                                         #:abbreviation nar-uri-abbreviation))))
+                          (reporter (pk 'reporter (if print-build-trace?
+                                                      (progress-reporter/trace
+                                                       destination
+                                                       (uri->string uri) dl-size
+                                                       (current-error-port))
+                                                      (progress-reporter/file
+                                                       (uri->string uri) dl-size
+                                                       (current-error-port)
+                                                       #:abbreviation nar-uri-abbreviation)))))
+                     (error 'quit)
                      (progress-report-port reporter raw)))
                   ((input pids)
                    ;; NOTE: This 'progress' port of current process will be
modified   guix/status.scm
@@ -532,7 +532,8 @@ addition to build events."
                          (= string->number transferred))
      ;; Print a progress bar, but only if there's only one on-going
      ;; job--otherwise the output would be intermingled.
-     (when (= 1 (simultaneous-jobs status))
+     (pk 'print-build-event:download-progress 'transferred= transferred)
+     (when (= 1 (pk 'simultaneous-jobs= (simultaneous-jobs status)))
        (match (find (matching-download item)
                     (build-status-downloading status))
          (#f #f)                                  ;shouldn't happen!
--8<---------------cut here---------------end--------------->8---

It seems that the only pk messages I see are the ones I put in (guix
status).  Perhaps the other modules are not taken from the client but
from the daemon?  I'm confused :-).

--8<---------------cut here---------------start------------->8---
;;; (print-build-event:download-progress transferred= 4980832)                                                       ;;; (simultaneous-jobs= 3)
--8<---------------cut here---------------end--------------->8---

But so far haven't been able to reproduce the error.  I'll keep
recording strace log when building things in the hope of catching the
bug!

To be continued...

Thanks for the help!

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Sat, 03 Oct 2020 09:39:02 GMT) Full text and rfc822 format available.

Message #17 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Sat, 03 Oct 2020 11:37:47 +0200
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Hi,
>>
>> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:
>>
>>> downloading from https://ci.guix.gnu.org/nar/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz ...
>> […]
>>>  gcc-10.2.0.tar.xz  74.3MiB                                     1.1MiB/s 00:25 [#######           ]  39.1%Backtrace:
>> […]
>>> In unknown file:
>>>            5 (display "@ substituter-succeeded /gnu/store/r06j3ms57z4mzfpdzfclsi3i9hr4184g-module-imp…" …)
>>> In guix/status.scm:
>>>    699:16  4 (write! _ _ _)
>>>     613:6  3 (_ (download-progress "/gnu/store/6m9zimmw8p6gbc1yfbg454c1r587b7h4-gcc-10.2.0.tar.xz" # …) …)
>>> In guix/progress.scm:
>>>    213:14  2 (display-download-progress "6m9zimmw8p6gbc1yfbg454c1r58@" _ #:start-time _ #:transferred _ …)
>>> In ice-9/boot-9.scm:
>>>   1669:16  1 (raise-exception _ #:continuable? _)
>>>   1669:16  0 (raise-exception _ #:continuable? _)
>>>
>>> ice-9/boot-9.scm:1669:16: In procedure raise-exception:
>>> In procedure =: Wrong type argument in position 1: #f
>>
>> If you have a case that’s reproducible, please take advantage of it and
>> add ‘pk’ calls in (guix progress) to see what happens.  Here it seems
>> that ‘transferred’ (in ‘display-download-progress’) is #f.
>>
>> The #f would come from this clause in (guix status):
>>
>>      ('download-progress item uri
>>                          (= string->number size)
>>                          (= string->number transferred))  ;<- here
>>
>> That in turn comes from ‘progress-reporter/trace’, called from (guix
>> scripts substitute).
>
> It's a bit difficult for me to follow the calls here :-).  I thought
> it'd have to be in 'progress-reporter/file' because that's the only one
> that ends up calling display-download-progress.  The line 213 of (guix
> progress) is:
>
>       (unless (zero? transferred)
>
> and if transferred is #f that would indeed fail with the error message
> shown in the backtrace, so it seems you are correct in your analysis.

That’s correct.

The “@ download-progress” line is printed by (guix scripts substitute)
and later consumed by (guix status) in the client, which is why I
mentioned ‘progress-reporter/trace’ above.

I think the problem we’re looking at could occur if those traces are not
printed in an atomic way, and thus (guix status) gets to see
truncated/mixed up traces.  So I tried this:

  _NIX_OPTIONS=print-extended-build-trace=1 sudo -E \
    ./pre-inst-env strace -s 200 -o ,,s  guix substitute \
       --substitute /gnu/store/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 /tmp/t.drv

It shows that traces are printed in a single write(2) call:

--8<---------------cut here---------------start------------->8---
write(2, "@ download-progress /tmp/t.drv http://ci.guix.gnu.org/nar/lzip/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 4843 4843\n", 127) = 127
--8<---------------cut here---------------end--------------->8---

So this side of things seems to be good.  But then traces could be
mangled/truncated by the daemon maybe.  An strace log of the failing
case would be very helpful.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Sun, 04 Oct 2020 03:20:01 GMT) Full text and rfc822 format available.

Message #20 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Sat, 03 Oct 2020 23:19:32 -0400
Hello!

Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

> The “@ download-progress” line is printed by (guix scripts substitute)
> and later consumed by (guix status) in the client, which is why I
> mentioned ‘progress-reporter/trace’ above.
>
> I think the problem we’re looking at could occur if those traces are not
> printed in an atomic way, and thus (guix status) gets to see
> truncated/mixed up traces.  So I tried this:
>
>   _NIX_OPTIONS=print-extended-build-trace=1 sudo -E \
>     ./pre-inst-env strace -s 200 -o ,,s  guix substitute \
>        --substitute /gnu/store/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 /tmp/t.drv
>
> It shows that traces are printed in a single write(2) call:
>
> write(2, "@ download-progress /tmp/t.drv http://ci.guix.gnu.org/nar/lzip/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 4843 4843\n", 127) = 127
>
> So this side of things seems to be good.  But then traces could be
> mangled/truncated by the daemon maybe.  An strace log of the failing
> case would be very helpful.

Not sure this matters or not, but thought I'd add the information here
in case:  the ntpd service was stopped for unkown reasons on my local
machine, leading to 'guix offload status' to output the following
warning:

guix offload: warning: machine '127.0.0.1' is 106 seconds behind

I've since restarted the ntpd service and that warning disappeared.

Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Sun, 04 Oct 2020 11:54:02 GMT) Full text and rfc822 format available.

Message #23 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Sun, 04 Oct 2020 07:53:47 -0400
[Message part 1 (text/plain, inline)]
Hello again!

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Hello!
>
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
> [...]
>
>> The “@ download-progress” line is printed by (guix scripts substitute)
>> and later consumed by (guix status) in the client, which is why I
>> mentioned ‘progress-reporter/trace’ above.
>>
>> I think the problem we’re looking at could occur if those traces are not
>> printed in an atomic way, and thus (guix status) gets to see
>> truncated/mixed up traces.  So I tried this:
>>
>>   _NIX_OPTIONS=print-extended-build-trace=1 sudo -E \
>>     ./pre-inst-env strace -s 200 -o ,,s  guix substitute \
>>        --substitute /gnu/store/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 /tmp/t.drv
>>
>> It shows that traces are printed in a single write(2) call:
>>
>> write(2, "@ download-progress /tmp/t.drv http://ci.guix.gnu.org/nar/lzip/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 4843 4843\n", 127) = 127
>>
>> So this side of things seems to be good.  But then traces could be
>> mangled/truncated by the daemon maybe.  An strace log of the failing
>> case would be very helpful.

[...]

I managed to capture two instances of 'transferred= #f' from my pk
output in the attached logs.  Curiously, they didn't lead to the crash.
Attached is a pruned version of the strace log of a command like
'./pre-inst-env guix package -m my-manifest --max-jobs=20'.  Offloading
was in use.

[log.pruned.lz (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]
Thanks,

Maxim

Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Tue, 01 Dec 2020 18:07:01 GMT) Full text and rfc822 format available.

Message #26 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Tue, 01 Dec 2020 12:23:29 -0500
[Message part 1 (text/plain, inline)]
Hello Ludovic,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Hello again!
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>
>> Hello!
>>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>
>> [...]
>>
>>> The “@ download-progress” line is printed by (guix scripts substitute)
>>> and later consumed by (guix status) in the client, which is why I
>>> mentioned ‘progress-reporter/trace’ above.
>>>
>>> I think the problem we’re looking at could occur if those traces are not
>>> printed in an atomic way, and thus (guix status) gets to see
>>> truncated/mixed up traces.  So I tried this:
>>>
>>>   _NIX_OPTIONS=print-extended-build-trace=1 sudo -E \
>>>     ./pre-inst-env strace -s 200 -o ,,s  guix substitute \
>>>        --substitute /gnu/store/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 /tmp/t.drv
>>>
>>> It shows that traces are printed in a single write(2) call:
>>>
>>> write(2, "@ download-progress /tmp/t.drv http://ci.guix.gnu.org/nar/lzip/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 4843 4843\n", 127) = 127
>>>
>>> So this side of things seems to be good.  But then traces could be
>>> mangled/truncated by the daemon maybe.  An strace log of the failing
>>> case would be very helpful.
>
> [...]
>
> I managed to capture two instances of 'transferred= #f' from my pk
> output in the attached logs.  Curiously, they didn't lead to the crash.
> Attached is a pruned version of the strace log of a command like
> './pre-inst-env guix package -m my-manifest --max-jobs=20'.  Offloading
> was in use.

The trace attached is even better, in that it triggered the problem!  I
don't have time to take a look now, but I hope it'll be useful in
understanding the issue better.  It's rather precious (quite some luck
seems to be needed to reproduce) :-).

[log.precious (text/plain, attachment)]
[Message part 3 (text/plain, inline)]
Thanks,

Maxim

Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Tue, 19 Jan 2021 03:44:02 GMT) Full text and rfc822 format available.

Message #29 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 43518 <at> debbugs.gnu.org
Subject: Re: bug#43518: Guix substitute crash in procedure raise-exception:
 wrong type agument in position 1: #f
Date: Mon, 18 Jan 2021 22:43:06 -0500
Hello,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Hello Ludovic,
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>
>> Hello again!
>>
>> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>>
>>> Hello!
>>>
>>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>>
>>> [...]
>>>
>>>> The “@ download-progress” line is printed by (guix scripts substitute)
>>>> and later consumed by (guix status) in the client, which is why I
>>>> mentioned ‘progress-reporter/trace’ above.
>>>>
>>>> I think the problem we’re looking at could occur if those traces are not
>>>> printed in an atomic way, and thus (guix status) gets to see
>>>> truncated/mixed up traces.  So I tried this:
>>>>
>>>>   _NIX_OPTIONS=print-extended-build-trace=1 sudo -E \
>>>>     ./pre-inst-env strace -s 200 -o ,,s  guix substitute \
>>>>        --substitute /gnu/store/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 /tmp/t.drv
>>>>
>>>> It shows that traces are printed in a single write(2) call:
>>>>
>>>> write(2, "@ download-progress /tmp/t.drv http://ci.guix.gnu.org/nar/lzip/pknm43xsza6nlc7bn27djip8fis92akd-gcc-toolchain-10.2.0 4843 4843\n", 127) = 127
>>>>
>>>> So this side of things seems to be good.  But then traces could be
>>>> mangled/truncated by the daemon maybe.  An strace log of the failing
>>>> case would be very helpful.
>>
>> [...]
>>
>> I managed to capture two instances of 'transferred= #f' from my pk
>> output in the attached logs.  Curiously, they didn't lead to the crash.
>> Attached is a pruned version of the strace log of a command like
>> './pre-inst-env guix package -m my-manifest --max-jobs=20'.  Offloading
>> was in use.
>
> The trace attached is even better, in that it triggered the problem!  I
> don't have time to take a look now, but I hope it'll be useful in
> understanding the issue better.  It's rather precious (quite some luck
> seems to be needed to reproduce) :-).
>

[...]

> write(1, ";;; (print-build-event:download-progress transferred= #f)\n", 58) = 58
> write(1, "\n", 1)                       = 1
> write(1, ";;; (simultaneous-jobs= 1)\n", 27) = 27
> write(1, "\n", 1)                       = 1
> write(1, ";;; (display-download-progress file= \"https:@\" size= #f transferred= #f)\n", 73) = 73
> write(2, "Backtrace:\n", 11)            = 11
> futex(0x7f81fbe97648, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> write(2, "In ice-9/boot-9.scm:\n", 21)  = 21
> write(2, "  1731:15 19 (with-exception-handler #<procedure 7f81f8667210 at ice-9/boot-9.scm:1815:7 (exn)> _ # _ # \342\200\246)\n", 109) = 109
> write(2, "In guix/scripts/package.scm:\n", 29) = 29
> write(2, "   972:10 18 (_)\n", 17)      = 17
> write(2, "In guix/status.scm:\n", 20)   = 20
> write(2, "    777:4 17 (call-with-status-report _ _)\n", 43) = 43
> write(2, "In guix/store.scm:\n", 19)    = 19
> write(2, "   1301:8 16 (call-with-build-handler _ _)\n", 43) = 43
> write(2, "   1301:8 15 (call-with-build-handler #<procedure 7f81f866d5d0 at guix/ui.scm:1169:2 (continue store \342\200\246> \342\200\246)\n", 111) = 111
> write(2, "In guix/scripts/package.scm:\n", 29) = 29
> write(2, "    895:2 14 (_)\n", 17)      = 17
> write(2, "In guix/build/syscalls.scm:\n", 28) = 28
> write(2, "   1167:4 13 (call-with-file-lock/no-wait _ _ _)\n", 49) = 49
> write(2, "In guix/scripts/package.scm:\n", 29) = 29
> write(2, "   144:19 12 (build-and-use-profile #<store-connection 256.99 7f81f6934870> \"/var/guix/profiles/per-u\342\200\246\" \342\200\246)\n", 111) = 111
> mmap(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f81e1d3f000
> write(2, "In guix/store.scm:\n", 19)    = 19
> write(2, "  2049:24 11 (run-with-store #<store-connection 256.99 7f81f6934870> _ #:guile-for-build _ #:system _ # \342\200\246)\n", 109) = 109
> write(2, "In guix/profiles.scm:\n", 22) = 22
> write(2, "   1598:2 10 (_ _)\n", 19)    = 19
> write(2, "    348:2  9 (_ _)\n", 19)    = 19
> write(2, "In guix/store.scm:\n", 19)    = 19
> write(2, "  1931:12  8 (_ #<store-connection 256.99 7f81f6934870>)\n", 57) = 57
> write(2, "   1358:5  7 (map/accumulate-builds #<store-connection 256.99 7f81f6934870> _ _)\n", 81) = 81
> write(2, "  1369:15  6 (_ #<store-connection 256.99 7f81f6934870> _ _)\n", 61) = 61
> write(2, "   741:13  5 (process-stderr _ _)\n", 34) = 34
> write(2, "In unknown file:\n", 17)      = 17
> write(2, "           4 (display \"@ substituter-succeeded /gnu/store/srcpjs17wygd5f7fa251j716n2lvdlrz-gwenhywfar\342\200\246\" \342\200\246)\n", 111) = 111
> write(2, "In guix/status.scm:\n", 20)   = 20
> write(2, "   700:16  3 (write! _ _ _)\n", 28) = 28
> write(2, "    614:6  2 (_ (download-progress \"/gnu/store/ypz9jjlv4gkfd4m2n359d9ixma0sfydz-ghc-8.6.5\" \"https:@\" \342\200\246) \342\200\246)\n", 111) = 111
> write(2, "In guix/progress.scm:\n", 22) = 22
> write(2, "   214:14  1 (display-download-progress \"https:@\" _ #:start-time _ #:transferred _ #:log-port _)\n", 97) = 97
> write(2, "In ice-9/boot-9.scm:\n", 21)  = 21
> write(2, "  1669:16  0 (raise-exception _ #:continuable? _)\n", 50) = 50
> write(2, "\n", 1)                       = 1
> futex(0x7f81fbe97650, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> futex(0x7f81fbe97ce8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> write(2, "ice-9/boot-9.scm:1669:16: In procedure raise-exception:\n", 56) = 56
> write(2, "In procedure =: Wrong type argument in position 1: #f\n", 54) = 54
> fcntl(18, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
> close(18)                               = 0
> exit_group(1)                           = ?
> +++ exited with 1 +++

[...]

Today I stumbled upon this issue while building stuff on core-updates,
with the failure happening at a different place:

--8<---------------cut here---------------start------------->8---
jx934sgqx1qmybvn1yi1yfrgjk-rustc-1.24.1-src.tar.gz 67512512 4259936
Backtrace:
In guix/ui.scm:
  2154:12 19 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1736:10 18 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
  1731:15 17 (with-exception-handler #<procedure 7f2203d53fc0 at ice-9/boot-9.scm:1815:7 (exn)> _ # _ # ?)
  1736:10 16 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/store.scm:
   632:37 15 (thunk)
   1301:8 14 (call-with-build-handler _ _)
   1301:8 13 (call-with-build-handler #<procedure 7f2203d538a0 at guix/ui.scm:1169:2 (continue store ?> ?)
In guix/status.scm:
    780:4 12 (call-with-status-report _ _)
In guix/scripts/environment.scm:
   751:14 11 (_)
In guix/store.scm:
  2049:24 10 (run-with-store #<store-connection 256.99 7f2205598690> _ #:guile-for-build _ #:system _ # ?)
In guix/scripts/environment.scm:
   764:18  9 (_ _)
In guix/store.scm:
  1918:38  8 (_ #<store-connection 256.99 7f2200b33d70>)
  1369:15  7 (_ #<store-connection 256.99 7f2200b33d70> _ _)
   741:13  6 (process-stderr _ _)
In unknown file:
           5 (display "gz https://ci.guix.gnu.org/nar/9dlfwa8qza8k6jj9m5a6sw2rkv73nx2m-rustc-1.20.0-s?" ?)
In guix/status.scm:
   703:16  4 (write! _ _ _)
   616:15  3 (_ (download-succeeded "/gnu/store/9dlfwa8qza8k6jj9m5a6sw2rkv73nx2m-rustc-1.20.0-src.?" ?) ?)
   273:33  2 (compute-status _ #<<build-status> building: (#<<build> derivation: "/gnu/store/wmm8ghla?> ?)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure struct-vtable: Wrong type argument in position 1 (expecting struct): #f
--8<---------------cut here---------------end--------------->8---

Maxim




Changed bug title to '(guix status) occasionally crashes when max-jobs > 1' from 'Guix substitute crash in procedure raise-exception: wrong type agument in position 1: #f' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 05 Feb 2021 09:50:02 GMT) Full text and rfc822 format available.

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 05 Feb 2021 09:53:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#43518; Package guix. (Fri, 22 Nov 2024 18:31:02 GMT) Full text and rfc822 format available.

Message #36 received at 43518 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Enge <andreas <at> enge.fr>
To: 43518 <at> debbugs.gnu.org
Subject: Action?
Date: Fri, 22 Nov 2024 19:30:49 +0100
I just came along this very old bug, tagged as important;
is it still relevant, or can it be closed?

Andreas





This bug report was last modified 201 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.