On 12/13/2011 09:46 AM, Kamil Dudka wrote:
>> I agree that printing "0 X" for these seems inconsistent with the
>> elision mandated for the second and subsequent encounter of a file,
>> but I suppose command line arguments are intrinsically different
>> enough that handling them specially makes sense. Maybe even as
>> the default.
>>
>>> Perhaps 'du' needs a new option to control what to do with
>>> files that 'du' has already seen before. something that
>>> generalizes --count-links.
>>
>> That sounds like a good way to do it.
>> Anyone interested?
>
> Thank all of you for looking at the issue. If I understand it correctly, the
> old behavior was violating POSIX whereas the current default behavior is
> correct.

Not quite. The POSIX wording does not match historical practice, and appears to be contradictory (or at least ambiguous), so we may need to ask the Austin Group for clarification.

The problem is that POSIX says that if an inode is encountered more than once, it is listed only once (without saying whether those encounters come from recursion within a single command-line argument, from recursion across multiple command-line arguments, or from duplicates on the command line itself); but it also says that with '-s', a listing is output for every command-line argument.

Historically, du implementations elided output for inode duplication found within a single command-line argument, but not across multiple command-line arguments.

The coreutils behavior was changed to elide duplicates across multiple command-line arguments, particularly so that in the -s case you can sum the per-argument totals and get an accurate figure, no matter which order the command-line arguments were listed in. But in doing so, we also elided duplicate command-line arguments, which goes against the POSIX wording that -s lists a summary for every argument. Hence our proposal of printing '0' for a directory that was already counted.
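To make the difference concrete, here is a minimal Python sketch (not coreutils code) that models each command-line argument as the list of (inode, blocks) pairs a recursive walk of it would visit. The function name `summarize` and the data layout are hypothetical; real du keys its table on the (device, inode) pair from stat(), and the directories themselves also have inodes, which this model ignores.

```python
# Hypothetical model of du's duplicate-inode elision, not coreutils code.
# Each command-line argument is represented as the list of (inode, blocks)
# pairs that a recursive walk of it would visit.

def summarize(args, reset_per_arg):
    """Return one -s style total per command-line argument.

    reset_per_arg=True:  the seen-set is cleared for every argument
                         (historical du behavior).
    reset_per_arg=False: one seen-set is shared by the whole run
                         (current coreutils behavior).
    """
    seen = set()
    totals = []
    for entries in args:
        if reset_per_arg:
            seen.clear()
        total = 0
        for inode, blocks in entries:
            if inode in seen:
                continue  # already counted: elide this encounter
            seen.add(inode)
            total += blocks
        totals.append(total)
    return totals


a = [(1, 4), (2, 8)]
b = [(2, 8), (3, 2)]  # inode 2 is hard-linked into both trees

print(summarize([a, b], reset_per_arg=True))   # [12, 10]: inode 2 counted twice
print(summarize([a, b], reset_per_arg=False))  # [12, 2]
print(summarize([b, a], reset_per_arg=False))  # [10, 4]: same sum, 14
print(summarize([a, a], reset_per_arg=False))  # [12, 0]: repeated argument
```

With a shared seen-set the column sums to 14 in either order, while the per-argument reset overcounts the shared inode. Because this model always emits a total, a repeated argument shows up as 0, which is the output proposed above; the coreutils behavior under discussion omits that line instead.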
> I tried du --count-links with the original reproducer and it seemed
> to work fine. So what would be the point in adding a new option?

I think the proposal is to add a new option that forces du to reset its duplicate-inode hash table for each command-line argument, making its behavior more like traditional du. The catch is that with -s, summing the first column can then give a larger total than the default behavior would, whenever a command-line argument duplicates an inode already traversed earlier on the command line.

--count-links isn't quite right, because you still want to elide links within the hierarchy of a single command-line argument.

Or maybe --count-links gains an optional argument that says how to count links:

--count-links=none -> POSIX behavior (if POSIX requires elision across command-line arguments)
--count-links=per-directory -> traditional behavior, resetting the hash between command-line arguments
--count-links == --count-links=all -> count every file on every encounter

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org