GNU bug report logs - #36630
[PATCH] guix: parallelize building the manual-database
Reported by: arne_bab <at> web.de
Date: Fri, 12 Jul 2019 21:44:01 UTC
Severity: normal
Tags: patch
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Message #14 received at 36630 <at> debbugs.gnu.org (full text, mbox):
Hello,
Arne Babenhauserheide <arne_bab <at> web.de> writes:
> Ludovic Courtès <ludo <at> gnu.org> writes:
[...]
>> I picked the manual-database derivation returned for:
>> guix environment --ad-hoc jupyter python-ipython python-ipykernel -n
>> (It has 3,046 entries.)
>
> How exactly did you run the derivation? I’d like to check it if you can
> give me the exact command line to run (a command I can run repeatedly).
If you run the command above, it’ll list
/gnu/store/…-manual-database.drv. So you can just run:
guix build /gnu/store/…-manual-database.drv
or:
guix build /gnu/store/…-manual-database.drv --check
if it had already been built before.
>> On a SSD and with a hot cache, on my 4-core laptop, I get 74s with
>> ‘master’, and 53s with this patch.
>
> I’m using a machine with 6 physical cores, hyperthreading, and an NVMe
> M.2 disk, so it is likely that it would not be disk-bound for me at 4
> threads.
The result may be entirely different with a spinning disk. :-)
I’m not saying we should optimize for spinning disks, just that what you
see is at one end of the spectrum.
>> However, it will definitely not scale linearly, so we should probably
>> cap at 2 or 4 threads. WDYT?
>
> Looking at the underlying action, this seems to be a task that scales
> pretty well. It just unpacks files into the disk-cache.
>
> It should also not consume much memory, so I don’t see a reason to
> artificially limit the number of threads.
On a many-core machine like we have in our build farm, with spinning
disks, I believe that using one thread per core would be
counterproductive.
>> Another issue with the patch is that the [n/total] counter does not grow
>> monotonically now: it might temporarily go backwards. Consequently, at
>> -v1, users will see a progress bar that hesitates and occasionally goes
>> backward, which isn’t great.
>
> It typically jumps forward in the beginning and then stalls until the
> first manual page is finished.
>
> Since par-map uses a global queue of futures to process, and since the
> output is the first part of (compute-entry …), I don’t expect the
> progress to move backwards in ways a user sees: It could only move
> backwards during the initial step where all threads start at the same
> time, and there the initial output should be overwritten fast enough to
> not be noticeable.
Hmm, maybe. I’m sure we’ll get reports saying this looks weird and
Something Must Absolutely Be Done About It. :-)
But anyway, another issue is that we would need to honor
‘parallel-job-count’, which means using ‘n-par-map’, which doesn’t use
futures.
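For reference, the two primitives being compared come from Guile’s (ice-9 threads). A minimal runnable sketch, with a stand-in ‘compute-entry’ (the real one builds a manual-database entry for a package’s man pages):

```scheme
(use-modules (ice-9 threads))   ;provides par-map and n-par-map

;; Stand-in for the real work: the actual 'compute-entry' processes
;; man pages; here it just squares a number so the sketch runs.
(define (compute-entry n) (* n n))
(define entries (iota 8))

;; par-map: one future per element, drawn from a global queue;
;; results come back in list order.
(par-map compute-entry entries)      ;=> (0 1 4 9 16 25 36 49)

;; n-par-map: same result, but spawns at most N plain threads (no
;; futures), so N can be capped at the configured job count.
(n-par-map 4 compute-entry entries)  ;=> (0 1 4 9 16 25 36 49)
```

Here ‘parallel-job-count’ (a Guix-side helper, not part of Guile proper) would supply N in the ‘n-par-map’ case, which is what honoring the user’s job count requires.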
> Given that building manual pages is the most time-consuming part when
> installing a small tool into my profile, I think it is worth the
> complexity. Especially because most of the complexity is being taken
> care of by (ice-9 threads par-map).
Just today I realized that the example above (with Jupyter) has so many
entries because of propagated inputs; in particular libxext alone brings
1,000+ man pages. We should definitely do something about these
packages.
Needs more thought…
Thanks,
Ludo’.