From unknown Sat Aug 16 15:53:21 2025
From: bug#11950 <11950@debbugs.gnu.org>
To: bug#11950 <11950@debbugs.gnu.org>
Subject: Status: cp: Recursively copy ordered for maximal reading speed
Reply-To: bug#11950 <11950@debbugs.gnu.org>
Date: Sat, 16 Aug 2025 22:53:21 +0000

retitle 11950 cp: Recursively copy ordered for maximal reading speed
reassign 11950 coreutils
submitter 11950 Michael
severity 11950 normal
tag 11950 notabug moreinfo
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 16 11:25:23 2012
Date: Mon, 16 Jul 2012 01:53:06 +0200
From: Michael
To: bug-coreutils@gnu.org
Subject: cp: Recursively copy ordered for maximal reading speed
Message-ID: <20120716015306.4ae4e664@mirrors.kernel.org>
User-Agent: claws-mail.org

Hello,

After coding several backup tools there's something in my mind since years. When 'cp' copies files from magnetic harddisks (commonly called after their adapter or bus - SATA, IDE, and the like, i'm not talking about solid state) recursively, it seems to pick up the files in 'raw' order, just as the disk buffer spit them out (like 'in one head move'). Or so.
It does not resemble any alphabetical order, for example, it does not even stay within the same parent folder (flingering hither and forth, as the files come in). I suppose that's the fastest order, fastest for reading.

However, one could consider another 'maximal speed': the (later) read access to the copied files. (Among the reasons that files are not physically sorted on disk are the gap-optimizing code in FS drivers, and user actions like deleting single files or moving them to another place. It could be called 'physical folder fragmentation', something that happens sooner or later anyway if you work on files. I'd like to propose a way to avoid this specific fragmentation when copying.)

For example, take a large image gallery, sorted into several folders, with all files sorted alphanumerically. This is a standard example. Now what will file managers or image viewers do with these files? They will read in one folder's contents and display the files sorted alphanumerically. Usually they even create thumbnails, so they really access every file separately, and in that order. This causes quite a few disk head moves, because the files are not stored in that order 'physically' on disk. Meaning, it is slow, even if the disk is fast and has a fast buffer, compared to the rare case where the files are physically stored in just their access order. I hope the idea is clear....

Now my proposal is to have a recursive 'ordered' mode, where cp copies the files of one folder in alphanumeric order (which should be the view mode in 99% of all cases out there). It would slow down the copy process a bit, for the benefit of later reading speed.

Now you may ask what it is good for. Aren't backups just that, with no one ever opening them regularly in file managers or viewers? But 'cp' is not only used for backups. It is also used to copy the files from the camera chip to the harddisk in the first place, or to copy over to network drives.
I believe it is mostly used as a backend in most desktop applications anyway, and probably in most servers too.

It is still true that most people want maximal copy speed, not maximal reading speed. But maybe that's partly just because they don't know the choice even exists. If there were such a recursive option, then backup or download tools could at least offer it in their settings too. I would certainly use it in my backup code, because I'm dealing with massive backups, where (maybe not obviously) speed does not matter so much, exactly for that reason: because it takes hours anyway. I do not need speed when backing up. I need speed when reading.

I'm a DJ with a huge music collection, and also a heavy photographer who makes lots of movie clips too. I have been doing backups for more than 10 years, and I am absolutely sure about this choice.

I just think that there is a grain of meaning in my proposal.

I'm not on any bug list; I hope this can be accepted just as a mail. Let me know if and how I can do it better.

Kind regards,
Michael

From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 16 16:58:42 2012
Message-ID: <20120716205243.3848.qmail@kosh.dhis.org>
From: "Alan Curry"
Subject: Re: bug#11950: cp: Recursively copy ordered for maximal reading speed
To: codejodler@gmx.ch (Michael)
Cc: 11950@debbugs.gnu.org
Date: Mon, 16 Jul 2012 15:52:42 -0500 (GMT+5)
In-Reply-To: <20120716015306.4ae4e664@mirrors.kernel.org>

Michael writes:
>
> Hello,
>
> After coding several backup tools there's something in my mind since years. When 'cp' copies files from magnetic harddisks (commonly called after their adapter or bus - SATA, IDE, and the like, i'm not talking about solid state) recursively, it seems to pick up the files in 'raw' order, just as the disk buffer spit them out (like 'in one head move'). Or so.
> It does not resemble any alphabetical order, for example, it does not even stay within the same parent folder (flingering hither and forth, as the files come in).

[grumble at User-Agent: claws-mail.org: One line per paragraph isn't good mail formatting!]

It's called directory order. It used to be simply order of creation of files, with deletions creating gaps that could be filled by later creations with same-length or shorter names.

But on most new filesystems, directories are stored in a non-linear structure so that lookups in a large directory don't have to scan through every name. For ext2/ext3/ext4, run tune2fs -l on the block device and look for the dir_index option.

If you're copying files onto a filesystem with dir_index enabled, the order in which cp creates them should have little effect on the directory's layout afterward.

If you're not using dir_index on the destination filesystem, there's your problem! Enable dir_index and all directory lookups will be fast.

None of this has anything to do with where the actual data blocks of the file will be allocated. There's no way to control that. If you think that the second file created is going to be adjacent to the first file created... that's never been guaranteed. Filesystem block allocators are way more mysterious than that.

If you really think there's something to be gained here, prove it: start with a directory with a lot of files but no subdirectories. Do an alphabetical-order copy like this:

  $ mkdir other_directory ; cp ./* other_directory

(The glob returns the names in sorted order so this gives you the creation order you want, unlike cp -r)

Then get it all out of cache so the read test will hit the disk as much as possible:

  $ sync ; echo 3 > /proc/sys/vm/drop_caches

And read back the files:

  $ cd other_directory ; time cat ./* > /dev/null

Now repeat, but using cp -r to create the other directory so the files get copied in the source directory order.

And repeat again, but using

  $ find . -type f -exec cat '{}' + > /dev/null

instead of the cat ./* (the glob will cat the files in sorted order, the find will use directory order).

If there are any significant differences in the times, and dir_index is enabled, you're onto something. With dir_index disabled, you should get worse times all around, but not a lot worse if the files are big enough that the time spent reading their contents overshadows the time spent on directory lookups.

--
Alan Curry

From debbugs-submit-bounces@debbugs.gnu.org Tue Jul 17 00:32:38 2012
From: Jim Meyering
To: "Alan Curry"
Cc: Michael, 11950@debbugs.gnu.org
Subject: Re: bug#11950: cp: Recursively copy ordered for maximal reading speed
In-Reply-To: <20120716205243.3848.qmail@kosh.dhis.org> (Alan Curry's message of "Mon, 16 Jul 2012 15:52:42 -0500 (GMT+5)")
References: <20120716015306.4ae4e664@mirrors.kernel.org> <20120716205243.3848.qmail@kosh.dhis.org>
Date: Tue, 17 Jul 2012 06:26:35 +0200
Message-ID: <87k3y3j9ms.fsf@rho.meyering.net>

tags 11950 moreinfo
thanks

Alan Curry wrote:
...
> It's called directory order. It used to be simply order of creation of
> files, with deletions creating gaps that could be filled by later
> creations with same-length or shorter names.

Thanks for the report Michael, and thanks for replying, Alan.

Michael, you may have noticed that your email automatically created an "issue" in our bug tracker: any email discussion on this thread ends up being archived here: http://bugs.gnu.org/11950

Please let us know how your experiments go.

From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 15 06:15:49 2012
From: Jim Meyering
To: 11950@debbugs.gnu.org
Subject: Re: bug#11950: cp: Recursively copy ordered for maximal reading speed
In-Reply-To: <20120717062944.5378.qmail@kosh.dhis.org> (Alan Curry's message of "Tue, 17 Jul 2012 01:29:44 -0500 (GMT+5)")
References: <20120717062944.5378.qmail@kosh.dhis.org>
Date: Sat, 15 Sep 2012 12:14:43 +0200
Message-ID: <878vcbwq24.fsf@rho.meyering.net>

tags 11950 notabug
close 11950
thanks

Thanks for your interest. Since this is not a bug in coreutils, I'm marking this issue as such (notabug) and closing it. Any additional discussion is still fine and will be archived along with the rest at http://bugs.gnu.org/11950.

From unknown Sat Aug 16 15:53:21 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 13 Oct 2012 11:24:03 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
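
The timing experiment Alan Curry describes in this thread could be scripted roughly as below. This is only a sketch under stated assumptions, not part of the original messages and not an existing cp option: copy_sorted, read_sorted, read_dir_order, drop_caches and has_dir_index are hypothetical helper names, byte-wise sorting (LC_ALL=C) is an assumed stand-in for the "alphanumeric" order Michael asks for, and file names containing newlines are not handled. Unlike the cp ./* command in the thread, copy_sorted also descends into subdirectories.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.

# Copy every regular file under SRC into DST, creating the files in
# sorted path order (byte-wise, LC_ALL=C) instead of directory order.
# Assumption: no file name contains a newline.
copy_sorted() {
    src=$1
    dst=$2
    [ -d "$src" ] || return 1
    mkdir -p "$dst" || return 1
    ( cd "$src" && find . -type f | LC_ALL=C sort ) |
    while IFS= read -r rel; do
        # Recreate the parent directory, then copy one file at a time.
        mkdir -p "$dst/$(dirname "$rel")" || exit 1
        cp -p "$src/$rel" "$dst/$rel" || exit 1
    done
}

# Read back every file under DIR in sorted order, discarding the data.
read_sorted() {
    (
        cd "$1" || exit 1
        find . -type f | LC_ALL=C sort | while IFS= read -r f; do
            cat "$f"
        done
    ) > /dev/null
}

# Read back every file under DIR in the order find returns them
# (directory order), as in the find command from the thread.
read_dir_order() {
    find "$1" -type f -exec cat '{}' + > /dev/null
}

# Flush dirty pages and drop the page cache so that the next read hits
# the disk. Linux only, and it must be run as root.
drop_caches() {
    sync && echo 3 > /proc/sys/vm/drop_caches
}

# Succeed if the ext2/ext3/ext4 filesystem on block device DEV lists
# the dir_index feature in the output of tune2fs -l.
has_dir_index() {
    tune2fs -l "$1" 2>/dev/null | grep '^Filesystem features:' | grep -q dir_index
}
```

Assuming a source directory named photos (a placeholder), one might run copy_sorted photos photos_sorted and cp -r photos photos_dirorder, then call drop_caches as root before each timed read_sorted or read_dir_order run (for example with the time keyword in bash) and compare the results. Whether any difference shows up is exactly what the experiment is meant to find out.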