Package: coreutils
Reported by: Paul Eggert <eggert <at> CS.UCLA.EDU>
Date: Sat, 3 Jul 2010 06:42:01 UTC
Severity: normal
Fixed in version 8.6
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: 6557 <at> debbugs.gnu.org
Subject: bug#6557: du sometimes miscounts directories, and files whose link count equals 1
Date: Sat, 03 Jul 2010 10:36:00 +0200

Jim Meyering wrote:
> Paul Eggert wrote:
>> (I found this bug by code inspection while doing the du performance
>> improvement reported in:
>> http://lists.gnu.org/archive/html/bug-coreutils/2010-07/msg00014.html
>> )
>>
>> Unless -l is given, du is not supposed to count the same file more
>> than once. It optimizes this test by not bothering to put a file into
>> the hash table if its link count is 1, or if it is a directory. But
>> this optimization is not correct if -L is given (because the same
>> link-count-1 file, or directory, can be seen via symbolic links) or if
>> two or more arguments are given (because the same such file can be
>> seen under multiple arguments). The optimization should be suppressed
>> if -L is given, or if multiple arguments are given.
>>
>> Here is a patch, with a couple of test cases for it. This patch
>> assumes the du performance fix, but I can prepare an independent
>> patch if you like.
>
> Thanks!
> Actually, that patch applies just fine, as-is.
> However, it induces this new "make check" test failure:
...
> This is the additional patch we'd need to make the
> failing test accept your new output. You're welcome to merge
> it into yours.

Actually I did that.
Here's the adjusted patch, for review.
Note the "du: " prefix on the one-line log summary -- that's the
part that goes into the Subject below. Plus, I shortened it.
Also, I added a log line for the tests/du/files0-from change.

(BTW, the following is the output from "git format-patch --stdout -1".
It's easy to apply that by saving it in a FILE, then running "git am FILE")

From efe53cc72b599979ea292754ecfe8abf7c839d22 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> CS.UCLA.EDU>
Date: Fri, 2 Jul 2010 23:41:08 -0700
Subject: [PATCH] du: don't miscount duplicate directories or link-count-1 files

* NEWS: Mention this.
* src/du.c (hash_all): New static var.
(process_file): Use it.
(main): Set it.
* tests/du/hard-link: Add a couple of test cases to help make
sure this bug stays squashed.
* tests/du/files0-from: Adjust existing tests to reflect
change in semantics with duplicate arguments.
---
 NEWS                 |    5 +++++
 src/du.c             |   15 +++++++++++++--
 tests/du/files0-from |    8 ++++----
 tests/du/hard-link   |   44 ++++++++++++++++++++++++++++++--------------
 4 files changed, 52 insertions(+), 20 deletions(-)

diff --git a/NEWS b/NEWS
index 3a24925..b02a223 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,11 @@ GNU coreutils NEWS -*- outline -*-
   Also errors are no longer suppressed for unsupported file types, and
   relative sizes are restricted to supported file types.
 
+** Bug fixes
+
+  du no longer multiply counts a file that is a directory or whose
+  link count is 1, even if the file is reached multiple times by
+  following symlinks or via multiple arguments.
 
 * Noteworthy changes in release 8.5 (2010-04-23) [stable]
 
diff --git a/src/du.c b/src/du.c
index a90568e..4d6e03a 100644
--- a/src/du.c
+++ b/src/du.c
@@ -132,6 +132,9 @@ static bool apparent_size = false;
 /* If true, count each hard link of files with multiple links. */
 static bool opt_count_all = false;
 
+/* If true, hash all files to look for hard links. */
+static bool hash_all;
+
 /* If true, output the NUL byte instead of a newline at the end of each line. */
 static bool opt_nul_terminate_output = false;
 
@@ -518,8 +521,7 @@ process_file (FTS *fts, FTSENT *ent)
      via a hard link, then don't let it contribute to the sums. */
   if (skip
       || (!opt_count_all
-          && ! S_ISDIR (sb->st_mode)
-          && 1 < sb->st_nlink
+          && (hash_all || (! S_ISDIR (sb->st_mode) && 1 < sb->st_nlink))
           && ! hash_ins (sb->st_ino, sb->st_dev)))
     {
       /* Note that we must not simply return here.
@@ -937,11 +939,20 @@ main (int argc, char **argv)
                quote (files_from));
 
       ai = argv_iter_init_stream (stdin);
+
+      /* It's not easy here to count the arguments, so assume the
+         worst. */
+      hash_all = true;
     }
   else
     {
       char **files = (optind < argc ? argv + optind : cwd_only);
       ai = argv_iter_init_argv (files);
+
+      /* Hash all dev,ino pairs if there are multiple arguments, or if
+         following non-command-line symlinks, because in either case a
+         file with just one hard link might be seen more than once. */
+      hash_all = (optind + 1 < argc || symlink_deref_bits == FTS_LOGICAL);
     }
 
   if (!ai)
diff --git a/tests/du/files0-from b/tests/du/files0-from
index 620246d..860fc6a 100755
--- a/tests/du/files0-from
+++ b/tests/du/files0-from
@@ -70,15 +70,15 @@ my @Tests =
     {IN=>{f=>"g\0"}}, {AUX=>{g=>''}},
     {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
 
-   # two file names, no final NUL
+   # two identical file names, no final NUL
    ['2', '--files0-from=-', '<',
     {IN=>{f=>"g\0g"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
 
-   # two file names, with final NUL
+   # two identical file names, with final NUL
    ['2a', '--files0-from=-', '<',
     {IN=>{f=>"g\0g\0"}}, {AUX=>{g=>''}},
-    {OUT=>"0\tg\n0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
+    {OUT=>"0\tg\n"}, {OUT_SUBST=>'s/^\d+/0/'} ],
 
    # Ensure that $prog processes FILEs following a zero-length name.
    ['zero-len', '--files0-from=-', '<',
diff --git a/tests/du/hard-link b/tests/du/hard-link
index 7e4f51a..e22320b 100755
--- a/tests/du/hard-link
+++ b/tests/du/hard-link
@@ -26,24 +26,40 @@ fi
 . $srcdir/test-lib.sh
 
 mkdir -p dir/sub
-( cd dir && { echo non-empty > f1; ln f1 f2; echo non-empty > sub/F; } )
-
-
-# Note that for this first test, we transform f1 or f2
-# (whichever name we find first) to f_. That is necessary because,
-# depending on the type of file system, du could encounter either of those
-# two hard-linked files first, thus listing that one and not the other.
-du -a --exclude=sub dir \
-  | sed 's/^[0-9][0-9]* //' | sed 's/f[12]/f_/' > out || fail=1
-echo === >> out
-du -a --exclude=sub --count-links dir \
-  | sed 's/^[0-9][0-9]* //' | sort -r >> out || fail=1
+( cd dir &&
+  { echo non-empty > f1
+    ln f1 f2
+    ln -s f1 f3
+    echo non-empty > sub/F; } )
+
+du -a -L --exclude=sub --count-links dir \
+  | sed 's/^[0-9][0-9]* //' | sort -r > out || fail=1
+
+# For these tests, transform f1 or f2 or f3 (whichever name is find
+# first) to f_. That is necessary because, depending on the type of
+# file system, du could encounter any of those linked files first,
+# thus listing that one and not the others.
+for args in '-L' 'dir' '-L dir'
+do
+  echo === >> out
+  du -a --exclude=sub $args dir \
+    | sed 's/^[0-9][0-9]* //' | sed 's/f[123]/f_/' >> out || fail=1
+done
+
 cat <<\EOF > exp
+dir/f3
+dir/f2
+dir/f1
+dir
+===
 dir/f_
 dir
 ===
-dir/f2
-dir/f1
+dir/f_
+dir/f_
+dir
+===
+dir/f_
 dir
 EOF
 
--
1.7.2.rc1.192.g262ff
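
For readers following the patch, here is a minimal sketch in C of the decision it changes. It is not the code in src/du.c: `struct file_id`, `struct seen_set`, `seen_insert`, `should_count`, and `compute_hash_all` are hypothetical names invented for illustration, and the linear-scan set merely stands in for du's real hash table. The sketch only assumes what the patch itself shows: a file is skipped when it has already been recorded, and the "link count 1 or directory" shortcut is bypassed whenever hash_all is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the (st_dev, st_ino) pair that identifies a file. */
struct file_id { unsigned long dev; unsigned long ino; };

/* Toy fixed-size set of seen files; an assumption made to keep the sketch short. */
enum { SEEN_MAX = 64 };
struct seen_set { struct file_id ids[SEEN_MAX]; size_t count; };

/* Record ID in SET.  Return true if it was newly inserted,
   false if it was already present.  */
static bool
seen_insert (struct seen_set *set, struct file_id id)
{
  for (size_t i = 0; i < set->count; i++)
    if (set->ids[i].dev == id.dev && set->ids[i].ino == id.ino)
      return false;
  assert (set->count < SEEN_MAX);
  set->ids[set->count++] = id;
  return true;
}

/* Decide whether to hash every file: when names come from a stream
   (count unknown), when there are several arguments, or when all
   symlinks are followed.  */
static bool
compute_hash_all (int n_args, bool follow_all_symlinks, bool names_from_stream)
{
  return names_from_stream || 1 < n_args || follow_all_symlinks;
}

/* Return true if the file should contribute to the totals.  This is the
   negation of the patched skip condition: without HASH_ALL, directories
   and link-count-1 files bypass the set entirely.  */
static bool
should_count (struct seen_set *set, struct file_id id, bool is_dir,
              unsigned long nlink, bool count_all, bool hash_all)
{
  if (count_all)
    return true;
  if (!hash_all && (is_dir || nlink <= 1))
    return true;
  return seen_insert (set, id);
}
```

Calling should_count twice for the same link-count-1 file with hash_all false returns true both times, which is the double counting described in the report; with hash_all true the second call returns false.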