From: bug#38621 <38621@debbugs.gnu.org>
To: bug#38621 <38621@debbugs.gnu.org>
Subject: Status: gdu showing different sizes
Date: Sat, 21 Jun 2025 19:19:29 +0000

retitle 38621 gdu showing different sizes
reassign 38621 coreutils
submitter 38621 TJ Luoma
severity 38621 normal
tag 38621 notabug
thanks

From: TJ Luoma
To: bug-coreutils@gnu.org
Date: Sun, 15 Dec 2019 00:15:05 -0500
Subject: gdu showing different sizes

I ended up with two versions of the same file
'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
wanted to check to see if they were the same file.

I checked the size with `gdu` like so:

% /usr/local/bin/gdu --si -s *pkg
101M     StreamDeck-4.4.2.12189.pkg
102M     Stream_Deck_4.4.2.12189.pkg

Which led me to think they were different files / sizes. But when I
used `ls -l` I was surprised to see this:

% command ls -l *pkg
-rw-r--r--  1 tjluoma  staff  88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
-rw-r--r--@ 1 tjluoma  staff  88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg

So they _are_ the same size. Are they the same file? I used `md5` to check:

% command md5 -r *pkg
98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg

OK, so these are exactly the same file. So… why did `gdu` tell me they
are different sizes?

% gdu --version
du (GNU coreutils) 8.31
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
and Jim Meyering.

I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.

Any help would be appreciated.

Tj

From: Bernhard Voelker
To: TJ Luoma, 38621@debbugs.gnu.org
Date: Sun, 15 Dec 2019 22:19:29 +0100
Subject: Re: bug#38621: gdu showing different sizes

tag 38621 notabug
close 38621
stop

On 2019-12-15 06:15, TJ Luoma wrote:
> I ended up with two versions of the same file
> 'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
> wanted to check to see if they were the same file.
>
> I checked the size with `gdu` like so:
>
> % /usr/local/bin/gdu --si -s *pkg
> 101M     StreamDeck-4.4.2.12189.pkg
> 102M     Stream_Deck_4.4.2.12189.pkg
>
> Which led me to think they were different files / sizes. But when I
> used `ls -l` I was surprised to see this:
>
> % command ls -l *pkg
> -rw-r--r--  1 tjluoma  staff  88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma  staff  88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg
>
> So they _are_ the same size. Are they the same file? I used `md5` to check:
>
> % command md5 -r *pkg
> 98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
> 98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg
>
> OK, so these are exactly the same file. So… why did `gdu` tell me they
> are different sizes?
>
> % gdu --version
> du (GNU coreutils) 8.31
> Copyright (C) 2019 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
> and Jim Meyering.
>
> I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.
>
> Any help would be appreciated.

This is a "sparse" file, i.e., a file with longer sequences of zeroes
somewhere in between, which can be stored more efficiently on the disk.
Any application reading the data will get the correct number of zeroes,
while some disk space is saved.

E.g., the following creates a 300M file whose first 100M and last 100M
contain random data, with a 100M "hole" in between:

  # Write the first 100M (as usual).
  $ dd bs=1M count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.466356 s, 225 MB/s

  # Write another 100M, but starting at a position of 200M,
  # thus leaving zeroes in between.
  $ dd bs=1M seek=200 count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.462072 s, 227 MB/s

  $ ls -logh f
  -rw-r--r-- 1 300M Dec 15 18:17 f

  $ du -h f  # shows the space occupied on disk.
  200M  f

  $ du --apparent-size -h f  # shows the size applications would read.
  300M  f

See the documentation of 'cp' and 'du':
https://www.gnu.org/software/coreutils/cp  (the --sparse option)
https://www.gnu.org/software/coreutils/du  (the --apparent-size option)

As this is not a bug in du(1), I'm marking it as such and closing the
ticket in our bug tracker. The discussion can continue, of course.

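For the two packages in this report, a small POSIX sh helper can print both kinds of "size" side by side and compare the bytes directly, so that a difference in disk usage is not mistaken for a difference in content. This is a minimal sketch, not part of coreutils: the function name compare_files is hypothetical, and it assumes only wc, du -k, awk and cmp, which exist on both GNU/Linux and macOS. Apparent size comes from `wc -c` (the same length that `ls -l` and `du --apparent-size` show), disk usage from `du -k` (allocated space in 1024-byte units), and `cmp -s` checks the content.

```shell
#!/bin/sh
# compare_files A B  (hypothetical helper, not part of coreutils)
# For each file, print the apparent size (length of the content, as shown
# by `ls -l`, `wc -c` and `du --apparent-size`) next to the disk usage
# (allocated space, as shown by plain `du`; here in KiB via POSIX `du -k`),
# then check whether the two files have byte-for-byte identical content.
compare_files() {
    for f in "$1" "$2"; do
        # BSD wc pads the count with spaces; strip them.
        apparent=$(wc -c < "$f" | tr -d ' ')
        # du prints "<usage> <name>"; keep only the number.
        on_disk=$(du -k -- "$f" | awk '{print $1}')
        printf '%s: %s bytes of content, %s KiB on disk\n' \
            "$f" "$apparent" "$on_disk"
    done
    # cmp -s prints nothing; exit status 0 means identical content.
    if cmp -s -- "$1" "$2"; then
        echo "content: identical"
    else
        echo "content: different (or a file could not be read)"
    fi
}
```

Run as `compare_files StreamDeck-4.4.2.12189.pkg Stream_Deck_4.4.2.12189.pkg`, it should report the same byte count for both files and "content: identical", while the KiB figures are free to differ.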
Have a nice day,
Berny

From: TJ Luoma
To: Bernhard Voelker
Cc: 38621@debbugs.gnu.org
Date: Mon, 16 Dec 2019 01:25:40 -0500
Subject: Re: bug#38621: gdu showing different sizes

I sort of followed most of the technical part of that but I still don’t
understand why it’s not a bug to show different information about two
identical files.

Which may indicate that I didn’t understand the technical part very well.

As an end user, it’s hard to understand how that inconsistency isn’t both
undesirable and a bug.

I could maybe see if they were two files with the same byte-count but
different composition that made the calculations off by 1, but this is an
identical file and it’s showing up with two different sizes, in a tool
meant to report sizes.

That just seems “obviously” wrong even if it’s somehow technically
explainable.

TjL

From: Bernhard Voelker
To: TJ Luoma
Cc: 38621@debbugs.gnu.org
Date: Mon, 16 Dec 2019 08:47:10 +0100
Subject: Re: bug#38621: gdu showing different sizes

On 2019-12-16 07:25, TJ Luoma wrote:
> I sort of followed most of the technical part of that but I still don’t
> understand why it’s not a bug to show different information about two
> identical files.
>
> Which may indicate that I didn’t understand the technical part very well.
>
> As an end user, it’s hard to understand how that inconsistency isn’t both
> undesirable and a bug.
>
> I could maybe see if they were two files with the same byte-count but
> different composition that made the calculations off by 1, but this is an
> identical file and it’s showing up with two different sizes, in a tool
> meant to report sizes.
>
> That just seems “obviously” wrong even if it’s somehow technically
> explainable.

Thanks for following up on this for further clarification.

I think the problem is the word "size":
while 'ls' and 'du --apparent-size' show the length of the content of
a file, 'du' (without '--apparent-size') reports the space the file
needs on disk.

  $ du --help | sed 3q
  Usage: du [OPTION]... [FILE]...
    or:  du [OPTION]... --files0-from=F
  Summarize disk usage of the set of FILEs, recursively for directories.
  ____________^^^^^^^^^^

One reason for those sizes to differ is "holes". As an extreme case,
one can create a 4 Terabyte file (just NULs) on a filesystem which is
much smaller than that:

  # Filesystem size.
  $ df -h --out=size,target .
  Size  Mounted on
  591G  /mnt

  # Create a NUL-only file of size 4 Terabyte.
  $ truncate -s4T f2

  # 'ls' shows the 4T of file size.
  $ ls -logh f2
  -rw-r--r-- 1 4.0T Dec 16 08:36 f2

  # 'du' shows that the file does not even require any disk usage.
  $ du -h f2
  0  f2

  # ... but with '--apparent-size' it reports the real (content) size.
  $ du -h --apparent-size f2
  4.0T  f2

  # Any program will see the 4T content transparently.
  $ wc -c < f2
  4398046511104

In your case, the file was a mixture of regular data and holes,
and 'cp' (without --sparse=always) tried to automatically determine
whether the target file should have holes or not (see 'man cp').
Therefore, your 2 files had a different disk usage, but the net length
of the content is identical, of course.

Have a nice day,
Berny

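To see how two files with identical content can end up with different disk usage, the following sketch copies one file twice, once with GNU cp's --sparse=always and once with --sparse=never. It assumes GNU coreutils (with Homebrew on macOS these are installed with a g prefix, e.g. gcp, gtruncate and gdu, like the gdu in the report) and a filesystem that supports holes; the demo directory comes from mktemp and the file names are arbitrary. The exact disk-usage figures depend on the filesystem.

```shell
#!/bin/sh
# Sketch: same content, different disk usage (assumes GNU coreutils and a
# filesystem that supports holes, e.g. ext4 or tmpfs).
set -e
demo=$(mktemp -d)

# 10 MiB of NUL bytes; truncate only sets the length, so no data is written.
truncate -s 10M "$demo/orig"

# Two copies of the same content with different on-disk layouts.
cp --sparse=always "$demo/orig" "$demo/sparse"  # store runs of zeroes as holes
cp --sparse=never  "$demo/orig" "$demo/dense"   # never create holes

cmp "$demo/sparse" "$demo/dense" && echo "content: identical"

# Apparent size: 10240 KiB for both copies.
du -k --apparent-size "$demo/sparse" "$demo/dense"

# Disk usage: typically close to 0 for the sparse copy and about
# 10240 KiB for the dense copy, though the numbers depend on the filesystem.
du -k "$demo/sparse" "$demo/dense"
```

The design choice here is to force both extremes explicitly; in its default --sparse=auto mode, cp makes that decision heuristically, which is how copies of one file can occupy different amounts of disk as explained above.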
From: TJ Luoma
To: Bernhard Voelker
Cc: 38621@debbugs.gnu.org
Date: Mon, 16 Dec 2019 14:43:37 -0500
Subject: Re: bug#38621: gdu showing different sizes

AHA! Ok, now I understand a little better. I have seen the difference
between "size" and "size on disk" and did not realize that applied here.

I'm still not 100% clear on _why_ two "identical" files would have
different results for "size on disk" (it _seems_ like those should be
identical) but I suspect that the answer is probably of a technical
nature that would be "over my head" so to speak, and truthfully, all I
really need to know is "sometimes that happens" rather than
understanding the technical details of why.

I appreciate you taking the time to educate me further about this.

Cheers

Tj

On Mon, Dec 16, 2019 at 2:47 AM Bernhard Voelker wrote:
>
> On 2019-12-16 07:25, TJ Luoma wrote:
> > I sort of followed most of the technical part of that but I still don’t
> > understand why it’s not a bug to show different information about two
> > identical files.
> >
> > Which may indicate that I didn’t understand the technical part very well.
> >
> > As an end user, it’s hard to understand how that inconsistency isn’t both
> > undesirable and a bug.
> >
> > I could maybe see if they were two files with the same byte-count but
> > different composition that made the calculations off by 1, but this is an
> > identical file and it’s showing up with two different sizes, in a tool
> > meant to report sizes.
> >
> > That just seems “obviously” wrong even if it’s somehow technically
> > explainable.
>
> Thanks for following up on this for further clarifications.
>
> I think the problem is the word "size":
> while 'ls' and 'du --apparent-size' show the length of the content of
> a file, 'du' (without '--apparent-size') reports the space the file
> needs on disk.
>
>   $ du --help | sed 3q
>   Usage: du [OPTION]... [FILE]...
>     or:  du [OPTION]... --files0-from=F
>   Summarize disk usage of the set of FILEs, recursively for directories.
> ____________^^^^^^^^^^
>
> One reason for those sizes to differ are "holes".
> As an extreme case,
> one can create a 4 Terabyte file (just NULs) on a filesystem which is
> much smaller than that:
>
>   # Filesystem size.
>   $ df -h --out=size,target .
>   Size Mounted on
>   591G /mnt
>
>   # Create a NUL-only file of size 4 Terabyte.
>   $ truncate -s4T f2
>
>   # 'ls' shows the 4T of file size.
>   $ ls -logh f2
>   -rw-r--r-- 1 4.0T Dec 16 08:36 f2
>
>   # 'du' shows that the file does not even require any disk usage.
>   $ du -h f2
>   0       f2
>
>   # ... but with '--apparent-size' reports the real (content) size.
>   $ du -h --apparent-size f2
>   4.0T    f2
>
>   # Any program will see the 4T content transparently.
>   $ wc -c < f2
>   4398046511104
>
> In your case, the file was a mixture of regular data and holes,
> and 'cp' (without --sparse=always) tried to automatically determine
> if the target file should have holes or not (see 'man cp').
> Therefore, your 2 files had a different disk usage, but the net length
> of the content is identical, of course.
>
> Have a nice day,
> Berny

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 16 15:51:47 2019
Date: Mon, 16 Dec 2019 13:51:38 -0700
From: Bob Proulx
To: TJ Luoma
Cc: 38621@debbugs.gnu.org
Subject: Re: bug#38621: gdu showing different sizes
Message-ID: <20191216130815268937474@bob.proulx.com>
References: <75bb079a-519b-129b-50c4-167d62c16698@bernhard-voelker.de>

TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.
>
> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those should be
> identical) but I suspect that the answer is probably of a technical
> nature that would be "over my head" so to speak, and truthfully, all I
> really need to know is "sometimes that happens" rather than
> understanding the technical details of why.

I think the confusion began at the very start, because the commands
are named to show that they were intended to show different things.

  'du' is named for showing disk usage
  'ls' is named for listing files

And those are rather different things!  Let's dig into the details.
The info documentation for the long format says:

  ‘-l’
  ‘--format=long’
  ‘--format=verbose’
       In addition to the name of each file, print the file type, file
       mode bits, number of hard links, owner name, group name, size, and
       timestamp (*note Formatting file timestamps::), normally the
       modification timestamp (the mtime, *note File timestamps::).  Print
       question marks for information that cannot be determined.

So we know that ls lists the size of the file.
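To see the file size reported by 'ls -l' next to the disk allocation reported by 'ls -s' and 'du', a small shell sketch may help. It assumes GNU coreutils ('ls --block-size', 'du --apparent-size', 'stat -c'). The file name tiny is hypothetical, and the allocated figure depends on the file system and its block size (often 4096 bytes, but this is not guaranteed).

```shell
#!/usr/bin/env bash
# Sketch: file size (what 'ls -l' shows) vs. disk allocation (what 'ls -s'
# and 'du' show) for a 1-byte file. Assumes GNU coreutils; the allocation
# depends on the file system and its block size.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'x' > tiny                         # content is exactly 1 byte

ls -l tiny                                # size column: 1
ls -s --block-size=1 tiny                 # allocation in bytes, e.g. 4096
du --apparent-size --block-size=1 tiny    # content length: 1
du --block-size=1 tiny                    # disk usage, should match 'ls -s'

# The same two numbers via stat: %s is the size in bytes, %b the number of
# allocated blocks, and %B the size in bytes of each block counted by %b.
size=$(stat -c %s tiny)
alloc=$(( $(stat -c %b tiny) * $(stat -c %B tiny) ))
echo "size=$size bytes, allocated=$alloc bytes"
```

A 1-byte file typically occupies one whole file-system block, which illustrates the "final block" rounding described here. A file with holes can go the other way and be allocated less than its size.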
But let me specifically say that this is tagged to the *file*.  It's
file centric.  There is also the -s option.

  ‘-s’
  ‘--size’
       Print the disk allocation of each file to the left of the file
       name.  This is the amount of disk space used by the file, which is
       usually a bit more than the file’s size, but it can be less if the
       file has holes.

This displays how much disk space the file consumes instead of the
size of the file.  The two are different things.  And then the 'du'
documentation says:

  ‘du’ reports the amount of disk space used by the set of specified
  files

So du reports the disk space used by the file.  But the amount of disk
used depends on the file system holding the file.  Different file
systems have different storage methods, so the amount of disk space
consumed by a file will differ between them and is only loosely related
to the size of the file.  The disk space consumed to hold a file can be
larger or smaller than the file size.

In particular, if the file is sparse then there are "holes" in the
middle that are all zero data and do not need to be stored, which saves
space.  In that case the disk usage will be smaller.  Or, since files
are stored in blocks, the final block will have some fragment of space
at the end that is past the end of the file but too small to be used
for other files.  In that case the disk usage will be larger.

Therefore it is not surprising that the numbers displayed for disk
usage are not the same as the file content size.  They only line up
exactly if the file content size is a multiple of the file system
storage block size and every block is fully represented on disk.
Otherwise they will always be at least somewhat different.

As long as I am here I should mention 'df', which shows disk free space
information.  One sometimes thinks that adding up the file content
sizes should add up to the du disk usage size, but it doesn't.
And one sometimes thinks that adding up all of the du disk usage sizes
should add up to the sizes df reports, but it doesn't.  That is for a
similar reason.  File systems reserve a min-free amount of space for
superuser-level processes to ensure continued operation even if the
disk is filling up from non-privileged processes.  Also, file system
efficiency and performance drop dramatically as the file system fills
up.  Therefore the file system reports space with the min-free reserved
space in mind.  And once again this differs between file systems.

But let me return to your first bit of information: the ls long
listing of the files.  Your version of ls gave an indication that
something was different about the second file.

> % command ls -l *pkg
> -rw-r--r--  1 tjluoma  staff  88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma  staff  88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg

See that '@' in that position?  The GNU ls coreutils 8.30
documentation I am looking at says:

     Following the file mode bits is a single character that specifies
     whether an alternate access method such as an access control list
     applies to the file.  When the character following the file mode
     bits is a space, there is no alternate access method.  When it is a
     printing character, then there is such a method.

     GNU ‘ls’ uses a ‘.’ character to indicate a file with a security
     context, but no other alternate access method.

     A file with any other combination of alternate access methods is
     marked with a ‘+’ character.

I did not see anywhere that documented what an '@' means.  Therefore it
is likely something applied in a downstream patch, likely a software
distribution specific modification.  But I don't really know.  I live
under a rock and don't get out much.  But it likely means that the
second file, listed with an '@' after its file mode, is not stored on
disk in a typical way.  That's probably the first clue that it is
different.
But actually I do not know as I do not see files listed that way here.

Bob

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 16 18:38:21 2019
Subject: Re: bug#38621: gdu showing different sizes
To: TJ Luoma
References: <75bb079a-519b-129b-50c4-167d62c16698@bernhard-voelker.de>
From: Bernhard Voelker
Date: Tue, 17 Dec 2019 00:38:11 +0100
Cc: 38621@debbugs.gnu.org

On 2019-12-16 20:43, TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.

Thanks for confirming.

> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those should be
> identical) but I suspect that the answer is probably of a technical
> nature that would be "over my head" so to speak, and truthfully, all I
> really need to know is "sometimes that happens" rather than
> understanding the technical details of why.

Actually the difference is a matter of choice, i.e., how the user wants
to save the file (obviously, most programs come with a certain default
preference).

Suppose one writes a file with an "A" at the beginning, then e.g.
1.000.000 NUL characters, and then a "B".
Then the storing algorithm may decide to either explicitly write all
NULs separately (here displayed as '.') to disk; e.g.
'cp --sparse=never' would do so:

  - write "A",
  - write 1.000.000 times a NUL,
  - write "B".

or to try to save some disk space by writing it as a "sparse" file; e.g.
'cp --sparse=always' would (try to) do so:

  - write an "A",
  - then tell the filesystem that there are 1.000.000 NULs
    (which takes just a few bytes physically),
  - write a "B".

The latter method needs support from both the tool and the file system
where the file is stored.

Or in your words: "sometimes that happens".  ;-)

> I appreciate you taking the time to educate me further about this.

No worries.  If there's one user who got confused, then there is a
chance that others might fall into the same issue, too.  Therefore, if
you think we could improve something, e.g. a clarifying word in the
documentation, then this would help us all.

Thanks & have a nice day,
Berny

From unknown Sat Jun 21 12:19:29 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 14 Jan 2020 12:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator