GNU bug report logs - #38621
gdu showing different sizes
Reported by: TJ Luoma <luomat <at> gmail.com>
Date: Sun, 15 Dec 2019 08:42:02 UTC
Severity: normal
Tags: notabug
Done: Bernhard Voelker <mail <at> bernhard-voelker.de>
Bug is archived. No further changes may be made.
Message #8 received at 38621 <at> debbugs.gnu.org (full text, mbox):
tag 38621 notabug
close 38621
stop
On 2019-12-15 06:15, TJ Luoma wrote:
> I ended up with two versions of the same file
> 'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
> wanted to check to see if they were the same file.
>
> I checked the size with `gdu` like so:
>
> % /usr/local/bin/gdu --si -s *pkg
> 101M StreamDeck-4.4.2.12189.pkg
> 102M Stream_Deck_4.4.2.12189.pkg
>
> Which led me to think they were different files / sizes. But when I
> used `ls -l` I was surprised to see this:
>
> % command ls -l *pkg
> -rw-r--r-- 1 tjluoma staff 88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma staff 88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg
>
> So they _are_ the same size. Are they the same file? I used `md5` to check
>
> % command md5 -r *pkg
> 98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
> 98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg
>
> OK, so these are exactly the same file. So… why did `gdu` tell me they
> are different sizes?
>
> % gdu --version
> du (GNU coreutils) 8.31
> Copyright (C) 2019 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
> and Jim Meyering.
>
> I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.
>
> Any help would be appreciated.
This is a "sparse" file, i.e., a file containing long runs of zeroes
which the file system can store more efficiently on disk.
Any application reading the data still gets the correct number of zeroes,
while some disk space is saved.
E.g., the following creates a 300M file in which the first and the last 100M
hold random data, and the 100M in between is a "hole":
# Write the 1st 100M (as usual).
$ dd bs=1M count=100 if=/dev/urandom of=f
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.466356 s, 225 MB/s
# Write another 100M, but starting at a position of 200M,
# thus leaving Zeroes in between.
$ dd bs=1M seek=200 count=100 if=/dev/urandom of=f
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.462072 s, 227 MB/s
$ ls -logh f
-rw-r--r-- 1 300M Dec 15 18:17 f
$ du -h f # shows the space occupied on disk.
200M f
$ du --apparent-size -h f # shows the size applications would read.
300M f
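The same two numbers can also be read per file with stat(1). The following is only a minimal sketch, assuming GNU coreutils (truncate, stat -c, mktemp); with Homebrew on macOS these tools are typically installed with a "g" prefix (gtruncate, gstat). The scratch file and the variable names are illustrative, not part of the original report:

```shell
#!/bin/sh
# Sketch: compare the apparent size of a file with the space allocated for it.
# Assumes GNU coreutils; adjust the command names (e.g. gstat) as needed.
set -eu

f=$(mktemp)                      # hypothetical scratch file
trap 'rm -f "$f"' EXIT           # clean up when the script exits

truncate -s 1M "$f"              # extend to 1 MiB without writing any data

apparent=$(stat -c %s "$f")      # size in bytes an application would read
blocks=$(stat -c %b "$f")        # number of blocks allocated
blksize=$(stat -c %B "$f")       # size in bytes of each block reported by %b
allocated=$((blocks * blksize))  # bytes actually occupied on disk

echo "apparent=$apparent allocated=$allocated"
```

On a file system that supports holes, allocated is typically much smaller than apparent for this file. The opposite case, allocated being larger than apparent, is also possible, because disk space is handed out in whole blocks.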
See the documentation of 'cp' and 'du':
https://www.gnu.org/software/coreutils/cp (the --sparse option)
https://www.gnu.org/software/coreutils/du (the --apparent-size option)
As this is not a bug in du(1), I'm marking it as such and closing the ticket
in our bug tracker. The discussion can continue, of course.
Have a nice day,
Berny
This bug report was last modified 5 years and 157 days ago.