GNU bug report logs - #10281
du: hard-links counting with multiple arguments (commit

Package: coreutils

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 12 Dec 2011 18:02:02 UTC

Severity: wishlist

Tags: wontfix

Merged with 10282, 11526

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Message #64 received at 10281 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>
Cc: Don Cragun <dcragun <at> sonic.net>, 10281 <at> debbugs.gnu.org,
	austin-group-l <austin-group-l <at> opengroup.org>
Subject: Re: [1003.1(2008)/Issue 7 0000527]: du and files found via multiple
	command line arguments
Date: Sun, 18 Dec 2011 14:03:49 -0800

Eric Blake's Option 1 does not appear to be tenable, as du
has traditionally kept its record of already-counted files
across all of its operands.  7th Edition Unix 'du' did that,
and (as Jilles Tjoelker pointed out) so do at least two
current 'du' implementations, namely, FreeBSD and GNU.

The idea behind Eric's Option 2 is better, but its wording
is unclear partly because of another issue Jilles raised:
whether a file's disk space should be counted multiple times
if the file occurs multiple times and its link count is 1.
For example:

  mkdir d
  cd d
  cp /bin/sh w
  cp w y
  ln y ../y
  ln -s w x
  ln -s y z
  du -aL

This analyzes a directory with two regular files, 'w' and
'y'.  GNU and Solaris du count these files once each, with
an accurate sum of non-symlink disk usage under the current
directory.  But w's link count is 1 so FreeBSD counts 'w'
twice, thus overcounting disk usage.

The current POSIX wording does not say what to do for this
example, but the intent is to avoid overcounting disk usage,
and the GNU and Solaris behavior supports this intent better.
(The 7th Edition Unix behavior agrees with FreeBSD, but this
predates symbolic links so the behavior is now dubious.)
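
The difference between the two behaviors can be modeled in a few
lines.  The following is only a sketch, not code from any du
implementation: it recreates the example tree in a temporary
directory (a small file stands in for /bin/sh), follows symlinks as
-L does, and compares a policy that remembers every file against one
that remembers only files whose link count exceeds 1.  The names
build_example, counted_names and only_multilink are hypothetical, and
entries are visited in sorted order, which a real du need not do.

```python
import os
import tempfile


def build_example(root):
    """Recreate the example layout under root and return the path of 'd'."""
    d = os.path.join(root, "d")
    os.mkdir(d)
    for name in ("w", "y"):          # 'cp /bin/sh w; cp w y' (stand-in data)
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"x" * 4096)
    os.link(os.path.join(d, "y"), os.path.join(root, "y"))  # ln y ../y
    os.symlink("w", os.path.join(d, "x"))                   # ln -s w x
    os.symlink("y", os.path.join(d, "z"))                   # ln -s y z
    return d


def counted_names(d, only_multilink):
    """Return the names in d whose disk usage would be counted under -L.

    only_multilink=False: every file is remembered, so each is counted once.
    only_multilink=True:  only files with link count > 1 are remembered
                          (an assumption about the FreeBSD-style behavior).
    """
    seen = set()
    counted = []
    for name in sorted(os.listdir(d)):
        st = os.stat(os.path.join(d, name))   # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if st.st_nlink > 1 or not only_multilink:
            if key in seen:
                continue                      # already counted, skip
            seen.add(key)
        counted.append(name)
    return counted


with tempfile.TemporaryDirectory() as root:
    d = build_example(root)
    hash_all = counted_names(d, only_multilink=False)
    hash_multi = counted_names(d, only_multilink=True)

print(hash_all)    # ['w', 'y']
print(hash_multi)  # ['w', 'x', 'y']
```

Under the second policy 'w' is reached twice (directly and through
the symlink 'x') and is counted both times because its link count is
1, whereas 'y' has a link count of 2 and is counted only once.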

Given all the above, the standard's wording could be
improved in several different ways, all elaborations of
Option 2.  Here are two possibilities:

  Option 2A - require that files be hashed among all
  operands, and that disk usage be counted at most once.

    Change line 84170 [du DESCRIPTION] from:

      Files with multiple links shall be counted and written
      for only one entry.

    to:

      A file that occurs multiple times shall be counted and
      written for only one entry, even if the occurrences
      are under different file operands.

  Option 2B - leave unspecified whether files are hashed
  among all operands, and leave unspecified whether disk
  usage is counted multiple times for files whose link
  count does not exceed 1.  From the user's point of view,
  this means du's output is a reliable count of disk usage
  only if du is invoked without -L and with -x and with at
  most one operand.

    Change line 84170 [du DESCRIPTION] from:

      Files with multiple links shall be counted and written
      for only one entry.

    to:

      A file that occurs multiple times under one file
      operand and that has a link count greater than 1 shall
      be counted and written for only one entry.  It is
      implementation-defined whether a file that has a link
      count no greater than 1 is counted and written just
      once, or is counted and written for each occurrence.
      It is implementation-defined whether a file that
      occurs under one file operand is counted for other
      file operands.

Option 2A is simpler and clearer, but it invalidates many
existing implementations.  Option 2B modifies the standard
to describe how existing implementations actually work, but
is more complicated and more of a hassle to use reliably.
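
To make the trade-off concrete, here is a small model of the two
choices that Option 2B leaves open.  It is a sketch under stated
assumptions rather than a description of any real du: the Entry type,
the du_totals function, its across_operands and hash_single_link
parameters, and the sample data are all hypothetical.  Option 2A
corresponds to both parameters being true; Option 2B permits any
combination.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    file_id: int   # stands in for a (device, inode) pair
    nlink: int     # link count
    blocks: int    # disk usage of the file


def du_totals(operands, across_operands, hash_single_link):
    """Return per-operand block totals under a de-duplication policy."""
    seen = set()
    totals = {}
    for operand, entries in operands.items():
        if not across_operands:
            seen = set()              # forget files seen under earlier operands
        total = 0
        for e in entries:
            if e.nlink > 1 or hash_single_link:
                if e.file_id in seen:
                    continue          # already counted, skip
                seen.add(e.file_id)
            total += e.blocks
        totals[operand] = total
    return totals


# Hypothetical data: A/f and B/f are hard links to the same 8-block file.
operands = {
    "A": [Entry("A/f", 1, 2, 8), Entry("A/g", 2, 1, 4)],
    "B": [Entry("B/f", 1, 2, 8), Entry("B/h", 3, 1, 16)],
}

strict = du_totals(operands, across_operands=True, hash_single_link=True)
loose = du_totals(operands, across_operands=False, hash_single_link=False)

print(strict)  # {'A': 12, 'B': 16}
print(loose)   # {'A': 12, 'B': 24}
```

In this model the three distinct files occupy 28 blocks.  The strict
totals sum to 28, while the loose totals sum to 36 because the shared
file is counted again under the second operand.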

Eric raised one other issue: the description of the -a
option implies that "du A B" must always list B.  This
implication is incorrect for 7th Edition Unix du, GNU du,
and (I expect) FreeBSD du, which omit B when it is another
link to a file already counted under A, so it should be
fixed as well.  Here's one possible fix, which is
independent of the above-mentioned changes.

  Change line ????? [du OPTIONS] from:

    Regardless of the presence of the -a option,
    non-directories given as file operands shall always
    be listed.

  to:

    The -a option does not affect whether
    non-directories given as file operands are listed.

(Sorry, I don't know the line number here; I don't have a
PDF copy of the current standard and don't know offhand how
to get one.)





This bug report was last modified 6 years and 303 days ago.
