GNU bug report logs -
#54723
'guix publish' without '--cache' can send garbled narinfo responses
Previous Next
Full log
Message #38 received at 54723 <at> debbugs.gnu.org (full text, mbox):
Hi,
(+Cc: Mathieu.)
Guillaume Le Vaillant <glv <at> posteo.net> skribis:
> With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the
> issue very often when trying to upgrade a profile with around 400
> packages. The logs of the publish server show hundreds of narinfo
> requests and tens of "broken pipe" errors.
>
> With an additional commit reverting
> f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue
> anymore. The logs still show hundreds of narinfo requests, but no
> errors.
For the record, a simple way to reproduce it is to start ‘guix publish’:
sudo -E ./pre-inst-env guix publish -u nobody -p 9999
and to spawn ‘guix weather’, for example with:
guix weather $(guix package -I. -p ~/.guix-home/profile |cut -f1) \
--substitute-urls=http://localhost:9999
which crashes with the dreaded backtrace:
--8<---------------cut here---------------start------------->8---
Backtrace:
11 (primitive-load "/home/ludo/.config/guix/current/bin/guix")
In guix/ui.scm:
2230:7 10 (run-guix . _)
2193:10 9 (run-guix-command _ . _)
In ice-9/boot-9.scm:
1752:10 8 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/scripts/weather.scm:
595:9 7 (_)
In guix/build/utils.scm:
677:23 6 (every* #<procedure 7f4af9621de0 at guix/scripts/weather.scm:595:17 (server)> _)
In guix/scripts/weather.scm:
597:21 5 (_ "http://localhost:9999")
120:17 4 (report-server-coverage _ ("/gnu/store/428bzp9325mfapyr4ywzwsz4ic7ssx2b-shepherd-0.9.0" "/gnu/store/sll9nfmqk8lkrraqbkyp3y…" …) …)
In unknown file:
3 (_ #<procedure 7f4af896ca40 at guix/scripts/weather.scm:201:2 ()> #<procedure list _> . #w())
In guix/substitutes.scm:
322:31 2 (lookup-narinfos "http://localhost:9999" _ #:open-connection _ #:make-progress-reporter _)
245:26 1 (fetch-narinfos _ _ #:open-connection _ #:make-progress-reporter _)
In ice-9/boot-9.scm:
1685:16 0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type (expecting exact integer): #f
--8<---------------cut here---------------end--------------->8---
I looked more closely at f743f2046be2c5a338ab871ae8666d8f6de7440b and I
think it had a logical flaw: if you pipeline 100 GET requests, then,
with this commit, ‘guix publish’ would spawn 100 threads that would all
reply concurrently (more or less). This is clearly wrong: replies
should be sent sequentially.
So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and
reverted it that commit. I also added a test that reproduces the issue
above.
Now, commit f743f2046be2c5a338ab871ae8666d8f6de7440b was itself an
attempt to fix a bug whereby ‘narinfo-string’ would take too long,
thereby preventing ‘guix publish’ from accepting connections (since it’s
single-threaded).
I think the only reasonable way to fix it is by using Fibers to make
‘guix publish’ concurrent (using a fiberized server like in Cuirass).
We should do that at some point. That’ll also allow us to remove some
of the hacks we’ve accumulated.
Now, the only way ‘narinfo-string’ can take too long these days is (I
think) if the store GC lock is held (we should check that hypothesis,
but I believe that if the GC lock is held, then the ‘query-path-info’
RPC made from ‘narinfo-string’ might block until the lock is released).
The GC lock is no longer held for hours on berlin, so there’s less
pressure to address that.
To summarize: I think ‘guix publish’ is okay as-is but we should
fiberize it sometime.
Thoughts?
Ludo’.
This bug report was last modified 3 years and 17 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.