GNU bug report logs - #38621
gdu showing different sizes

Package: coreutils;

Reported by: TJ Luoma <luomat <at> gmail.com>

Date: Sun, 15 Dec 2019 08:42:02 UTC

Severity: normal

Tags: notabug

Done: Bernhard Voelker <mail <at> bernhard-voelker.de>

Bug is archived. No further changes may be made.

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: TJ Luoma <luomat <at> gmail.com>
Cc: 38621 <at> debbugs.gnu.org
Subject: bug#38621: gdu showing different sizes
Date: Mon, 16 Dec 2019 08:47:10 +0100
On 2019-12-16 07:25, TJ Luoma wrote:
> I sort of followed most of the technical part of that but I still don’t
> understand why it’s not a bug to show different information about two
> identical files.
> 
> Which may indicate that I didn’t understand the technical part very well.
> 
> As an end user, it’s hard to understand how that inconsistency isn’t both
> undesirable and a bug.
> 
> I could maybe see if they were two files with the same byte-count but
> different composition that made the calculations off by 1, but this is an
> identical file and it’s showing up with two different sizes, in a tool
> meant to report sizes.
> 
> That just seems “obviously” wrong even if it’s somehow technically
> explainable.

Thanks for following up on this for further clarifications.

I think the problem is the word "size":
while 'ls' and 'du --apparent-size' show the length of the content of
a file, 'du' (without '--apparent-size') reports the space the file
occupies on disk.
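
To see both numbers for a single file side by side, something like the
following sketch may help.  It assumes GNU 'stat' (on systems where the
GNU tools carry a 'g' prefix, as 'gdu' does here, that would presumably
be 'gstat'); the temporary file is only a hypothetical example.

```shell
#!/bin/sh
# Hypothetical demo file; any regular file works.
f=$(mktemp)
printf 'hello\n' > "$f"

# %s = length of the content in bytes (what 'ls -l' and
#      'du --apparent-size' are based on)
# %b = number of blocks allocated on disk
# %B = size in bytes of each block counted by %b
length=$(stat -c %s "$f")
blocks=$(stat -c %b "$f")
blksize=$(stat -c %B "$f")

# Space actually allocated on disk (what plain 'du' is based on).
allocated=$((blocks * blksize))

echo "content length: $length bytes"
echo "disk usage:     $allocated bytes"

rm -f "$f"
```

The content length is 6 bytes here, while the allocated space depends
on the filesystem and is usually a multiple of its block size.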

  $ du --help | sed 3q
  Usage: du [OPTION]... [FILE]...
    or:  du [OPTION]... --files0-from=F
  Summarize disk usage of the set of FILEs, recursively for directories.
____________^^^^^^^^^^

One reason for those sizes to differ is "holes": ranges of a file which
read back as NUL bytes but for which no disk blocks are allocated.
As an extreme case, one can create a 4 Terabyte file (just NULs) on a
filesystem which is much smaller than that:

  # Filesystem size.
  $ df -h --out=size,target .
   Size Mounted on
   591G /mnt

  # Create a NUL-only file of size 4 Terabyte.
  $ truncate -s4T f2

  # 'ls' shows the 4T of file size.
  $ ls -logh f2
  -rw-r--r-- 1 4.0T Dec 16 08:36 f2

  # 'du' shows that the file does not occupy any disk space at all.
  $ du -h f2
  0	f2

  # ... but with '--apparent-size' it reports the length of the content.
  $ du -h --apparent-size f2
  4.0T	f2

  # Any program will see the 4T content transparently.
  $ wc -c < f2
  4398046511104

In your case, the file was a mixture of regular data and holes,
and 'cp' (without '--sparse=always') tried to determine automatically
whether the target file should have holes or not (see 'man cp').
Therefore, your two files have different disk usage, although the
length of their content is identical, of course.
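
A small-scale way to reproduce this effect might look like the sketch
below.  It assumes GNU 'cp', 'truncate', 'du' and 'stat', and a
filesystem which supports holes; the directory and file names are
hypothetical, and the actual disk usage figures depend on the filesystem.

```shell
#!/bin/sh
# Hypothetical scratch directory.
dir=$(mktemp -d)

# 1 MiB of real data, then extend the file to 16 MiB: the rest is a hole.
head -c 1048576 /dev/urandom > "$dir/src"
truncate -s 16M "$dir/src"

# One copy with holes wherever possible, one with every byte written out.
cp --sparse=always "$dir/src" "$dir/sparse_copy"
cp --sparse=never  "$dir/src" "$dir/dense_copy"

# The content of all three files is byte-for-byte identical ...
same=no
cmp "$dir/src" "$dir/sparse_copy" && cmp "$dir/src" "$dir/dense_copy" && same=yes
echo "identical content: $same"

# ... and so is the apparent size ...
sparse_len=$(stat -c %s "$dir/sparse_copy")
dense_len=$(stat -c %s "$dir/dense_copy")
du -k --apparent-size "$dir/src" "$dir/sparse_copy" "$dir/dense_copy"

# ... but the disk usage may differ between the copies.
du -k "$dir/src" "$dir/sparse_copy" "$dir/dense_copy"
```

With '--sparse=never' the NUL ranges are written out as real data, so
that copy typically needs more disk space than the sparse one.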

Have a nice day,
Berny



