#10281 - du: hard-links counting with multiple arguments (commit

GNU bug report logs - #10281
du: hard-links counting with multiple arguments (commit

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 12 Dec 2011 18:02:02 UTC

Severity: wishlist

Tags: wontfix

Merged with 10282, 11526

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Message #61 received at 10281 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "Alan Curry" <pacman-cu <at> kosh.dhis.org>
Cc: 10281 <at> debbugs.gnu.org
Subject: Re: bug#10281: change in behavior of du with multiple arguments (commit
Date: Sat, 17 Dec 2011 10:20:09 +0100

Alan Curry wrote:
...
> By comparison to a proper tool which doesn't do any unnecessary traversals of
> extra directories, your use of du is slow and brittle (if the user forgets
> an alternate directory containing a link, the result is wrong) and has only
> the slight advantage of already being implemented.
>
> Here's a working outline of the single-traversal method. I wouldn't suggest
> that du should contain equivalent code. A single-purpose perl script, even
> without pretty output formatting, feels clean enough to me. Since I've gone
> to the trouble (not much) of writing it, I'll keep it as ~/bin/predict_rm_rf
> for future use.
>
> #!/usr/bin/perl -W
> use strict;
> use File::Find;
>
> @ARGV or die "Usage: $0 directory [directory ...]\n";
>
> my $total = 0;
> my %pending = ();
>
> File::Find::find({wanted => sub {
>   my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
>   if(-d _ || $nlink==1) {
>     $total += $blocks;
>     return;
>   }
>   if($nlink == ++$pending{"$dev.$ino"}) {
>     delete $pending{"$dev.$ino"};
>     $total += $blocks;
>   }
> }}, @ARGV);
>
> print "$total blocks would be freed by rm -rf @ARGV\n";

That seems useful.
However, the number it prints is too large whenever it processes
a file or directory more than $nlink times, e.g., when invoked as
predict_rm_rf F F
it prints double the correct number.

To account for that, the script must record every dev/ino pair
it processes, say via:

    File::Find::find({wanted => sub {
      my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
      defined $pending{"$dev.$ino"} && $pending{"$dev.$ino"} < 0
        and return;
      if(-d _ || $nlink==1 || $nlink == ++$pending{"$dev.$ino"}) {
        $total += $blocks;
        $pending{"$dev.$ino"} = -1;
        return;
      }
    }}, @ARGV);

Note that for a large tree, the perl code will be far less efficient
than C code like du because:
  - the perl script must call lstat for every single entry (du can
    use dirent.d_ino on some file systems).  When I checked about a year
    ago, Perl still had no good way to get something like dirent.d_ino.
  - du uses a compact representation for a device/inode pair, so may
    use a lot less memory.
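
The dev/ino bookkeeping described in the message above can also be sketched in Python. This is an illustration only, not code from the thread: the function name predict_freed_blocks is hypothetical, and the sketch assumes a POSIX system where st_blocks and st_nlink are meaningful. It makes one design choice that is an assumption of this sketch and not something the message proposes: for multiply-linked files it records each link as (parent directory dev/ino, name) rather than incrementing a counter, so reaching the same name twice is not mistaken for a second link.

```python
import os
import stat


def predict_freed_blocks(paths):
    """Estimate how many st_blocks units `rm -rf paths...` would free.

    A file with several hard links is counted only once every one of its
    links has been seen under the given paths. Hypothetical helper written
    for illustration; it is not part of du or of the script in the thread.
    """
    total = 0
    counted = set()   # (dev, ino) pairs whose blocks are already in total
    pending = {}      # (dev, ino) -> set of distinct links seen so far

    stack = list(reversed(paths))  # explicit stack avoids deep recursion
    while stack:
        path = stack.pop()
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in counted:
            continue  # already counted (and, for a directory, traversed)

        if stat.S_ISDIR(st.st_mode):
            total += st.st_blocks
            counted.add(key)
            with os.scandir(path) as entries:
                stack.extend(entry.path for entry in entries)
        elif st.st_nlink == 1:
            total += st.st_blocks
            counted.add(key)
        else:
            # Identify this link by its parent directory and name, so the
            # same name reached twice is stored only once.
            parent, name = os.path.split(os.path.abspath(path))
            pst = os.stat(parent)
            links = pending.setdefault(key, set())
            links.add((pst.st_dev, pst.st_ino, name))
            if len(links) == st.st_nlink:
                # Every link is inside the removed set: blocks are freed.
                total += st.st_blocks
                counted.add(key)
                del pending[key]
    return total
```

Called as predict_freed_blocks(["F", "F"]), the sketch counts F once when F has a single link, and not at all when F has another link outside the listed paths. Like the Perl outline, it still calls lstat on every entry, so the efficiency caveats in the message apply to it as well.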

This bug report was last modified 6 years and 303 days ago.

