Package: guix;
Reported by: ludo <at> gnu.org (Ludovic Courtès)
Date: Sun, 13 Nov 2016 17:42:02 UTC
Severity: important
Message #19 received at 24937 <at> debbugs.gnu.org (full text, mbox):
From: ludo <at> gnu.org (Ludovic Courtès) To: Mark H Weaver <mhw <at> netris.org> Cc: 24937 <at> debbugs.gnu.org Subject: Re: bug#24937: "deleting unused links" GC phase is too slow Date: Sun, 11 Dec 2016 19:02:42 +0100
Mark H Weaver <mhw <at> netris.org> skribis: > ludo <at> gnu.org (Ludovic Courtès) writes: > >> Here’s a proposed patch that follows your suggestion, Mark, but places >> an upper bound on the number of directory entries loaded in memory. >> >> On my laptop, which has ~500k entries in /gnu/store/.links, the result >> is something like this (notice the inode numbers in ‘lstat’ calls): >> >> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48 >> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8 >> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, st_ctime=2016/12/11-09:39:45}) = 0 >> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."} >> [...] >> 13738 lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc", {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0 >> 13738 lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px", {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:49}) = 0 >> 13738 lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2", {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0 >> [...] >> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492, >> [...] >> 13738 lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m", {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/07-00:05:19}) = 0 >> 13738 lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb", {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:21}) = 0 >> 13738 lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2", {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, st_ctime=2016/01/05-11:35:49}) = 0 >> [...] >> 13738 getdents(8, {{d_ino=328672, >> [...] >> 13738 lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz", {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, st_ctime=2016/12/11-01:44:47}) = 0 >> 13738 lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg", {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, st_ctime=2015/11/25-11:29:20}) = 0 >> 13738 lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0", {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, st_ctime=2016/11/07-00:01:57}) = 0 >> >> I can’t tell exactly how this affects performance because my machine has >> an SSD where this operation takes ~3 seconds on a cold cache. I suspect >> it has performance comparable to that of doing all the ‘readdir’ at once >> followed by all the ‘lstat’. >> >> Mark, how does that sound? > > I think we should sort the entire directory using merge sort backed to > disk files. If we load chunks of the directory, sort them and process > them individually, I expect that this will increase the amount of I/O > required by a non-trivial factor. In each pass, we would load blocks of > inodes from disk, almost all of which are likely to be present in the > store and thus linked from the directory, but in this scheme we will > process only a small number of them and drop the rest on the floor to be > read again in the next pass. Given that even my fairly optimal > implementation takes about 35 minutes to run on Hydra, I'd prefer to > avoid multiplying that by a non-trivial factor. Sure, though it’s not obvious to me how much of a difference it makes; my guess is that processing in large chunks is already a win, but we’d have to measure. > Why not just use GNU sort? It already exists, and does exactly what we > need. Does ‘sort’ manage to avoid reading whole files in memory? If not, I think it wouldn’t buy us anything to use it. > If you object to using an external program for some reason, I would > prefer to re-implement a similar algorithm in the daemon. Yeah, I’d rather avoid serializing the list of file names/inode number pairs just to invoke ‘sort’ on that. Also, what algorithm are you referring to? Thanks for the quick feedback! Ludo’.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.