GNU bug report logs - #10281
du: hard-links counting with multiple arguments (commit
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Mon, 12 Dec 2011 18:02:02 UTC
Severity: wishlist
Tags: wontfix
Merged with 10282, 11526
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Message #61 received at 10281 <at> debbugs.gnu.org:
Alan Curry wrote:
...
> By comparison to a proper tool which doesn't do any unnecessary traversals of
> extra directories, your use of du is slow and brittle (if the user forgets
> an alternate directory containing a link, the result is wrong) and has only
> the slight advantage of already being implemented.
>
> Here's a working outline of the single-traversal method. I wouldn't suggest
> that du should contain equivalent code. A single-purpose perl script, even
> without pretty output formatting, feels clean enough to me. Since I've gone
> to the trouble (not much) of writing it, I'll keep it as ~/bin/predict_rm_rf
> for future use.
>
> #!/usr/bin/perl -W
> use strict;
> use File::Find;
>
> @ARGV or die "Usage: $0 directory [directory ...]\n";
>
> my $total = 0;
> my %pending = ();
>
> File::Find::find({wanted => sub {
>   my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
>   if(-d _ || $nlink==1) {
>     $total += $blocks;
>     return;
>   }
>   if($nlink == ++$pending{"$dev.$ino"}) {
>     delete $pending{"$dev.$ino"};
>     $total += $blocks;
>   }
> }}, @ARGV);
>
> print "$total blocks would be freed by rm -rf @ARGV\n";
That seems useful.
However, the number it prints is too large whenever it processes
a file or directory more than $nlink times, e.g., when invoked as
  predict_rm_rf F F
it prints double the correct number.
To account for that, the script must record every dev/ino pair
it processes, say via:
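  File::Find::find({wanted => sub {
    my ($dev,$ino,$nlink,$blocks) = (lstat($_))[0,1,3,12];
    # A negative entry marks a dev/ino pair that was already counted.
    defined $pending{"$dev.$ino"} && $pending{"$dev.$ino"} < 0
      and return;
    # Count directories and singly linked files at once; count other
    # files only when the last of their links has been seen.
    if(-d _ || $nlink==1 || $nlink == ++$pending{"$dev.$ino"}) {
      $total += $blocks;
      $pending{"$dev.$ino"} = -1;
      return;
    }
  }}, @ARGV);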
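For anyone who wants to try this, here is a minimal sketch that wraps the revised callback in a function so that the totals for "F" and "F F" can be compared directly. The name predict_freed_blocks, the check for a failed lstat, and the small driver at the end are assumptions added for illustration; they are not part of either script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();

# Hypothetical wrapper (not in the original script): returns the number of
# blocks, as reported by lstat, that removing the given trees would free.
sub predict_freed_blocks {
    my @roots = @_;
    my $total = 0;
    my %pending;    # "$dev.$ino" => links seen so far, or -1 once counted

    File::Find::find({ wanted => sub {
        my ($dev, $ino, $nlink, $blocks) = (lstat($_))[0, 1, 3, 12];
        return unless defined $dev;    # assumption: skip entries lstat cannot read

        my $key = "$dev.$ino";
        # Already counted: ignore any repeat visit.
        return if defined $pending{$key} && $pending{$key} < 0;

        # Directories and singly linked files count immediately; other
        # files count once the number of visits reaches the link count.
        if (-d _ || $nlink == 1 || $nlink == ++$pending{$key}) {
            $total += $blocks;
            $pending{$key} = -1;
        }
    } }, @roots);

    return $total;
}

# Driver sketch: repeating every argument should leave the total unchanged.
if (@ARGV) {
    my $once  = predict_freed_blocks(@ARGV);
    my $twice = predict_freed_blocks(@ARGV, @ARGV);
    print "$once blocks would be freed by rm -rf @ARGV\n";
    print "same total when every argument is repeated\n" if $once == $twice;
}
```

One caveat with this sketch: a multiply linked file that is reached twice by the same path, before all of its links have been seen, still advances its counter on each visit, so trees with links to files outside the arguments can still be overcounted when an argument is repeated.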
Note that for a large tree, the perl code will be far less efficient
than C code like du because:
- the Perl script must call lstat for every single entry (du can
  use dirent.d_ino on some file systems). When I checked about a year
  ago, Perl still had no good way to get something like dirent.d_ino.
- du uses a compact representation for a device/inode pair, so
  it may use a lot less memory.
This bug report was last modified 6 years and 303 days ago.