GNU bug report logs - #11950
cp: Recursively copy ordered for maximal reading speed
Reported by: Michael <codejodler <at> gmx.ch>
Date: Mon, 16 Jul 2012 15:26:02 UTC
Severity: normal
Tags: moreinfo, notabug
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Hello,
After writing several backup tools, there is something that has been on my mind for years. When 'cp' recursively copies files from magnetic hard disks (commonly named after their adapter or bus: SATA, IDE, and the like; I'm not talking about solid state), it seems to pick up the files in 'raw' order, just as the disk buffer spits them out (as if 'in one head move'), or something like that. The order does not resemble any alphabetical order, for example; it does not even stay within the same parent folder, but jumps back and forth as the files come in.
I suppose that's the fastest order, fastest for reading. However, one could consider another kind of 'maximal speed': the later read access to the copied files.
(Among the reasons that files are not sorted physically on disk are the FS driver's gap-optimizing code and user actions such as deleting single files or moving them to another place. It could be called 'physical folder fragmentation', something that happens sooner or later anyway if you work on files. I'd like to propose a way to avoid this specific fragmentation when copying.)
For example, take a large image gallery, sorted into several folders, with all files sorted alphanumerically. This is a standard example. Now what will file managers or image viewers do with these files? They will read in one folder's contents and display the files sorted alphanumerically. Usually they even create thumbnails, so they really access every file separately, and in that order.
This causes quite a few disk head moves, because the files are not stored in that order 'physically' on disk. That means it is slow, even if the disk is fast and has a fast buffer, compared to the rare case in which the files are stored physically in exactly their access order. I hope the idea is clear.
Now my proposal is to have a recursive 'ordered' mode, where cp copies the files of each folder in alphanumeric order (which should be the view mode in 99% of all cases out there). It would slow down the copy process a bit, for the benefit of later reading speed.
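To make the proposal concrete, here is a minimal sketch in Python of what such an 'ordered' recursive copy could look like. It is not how cp works and not an existing cp option; the function name copy_tree_sorted is hypothetical. It sorts names by plain string comparison (not a 'natural' numeric sort) and skips symbolic links and special files for brevity. Writing files in sorted order only requests that order from the filesystem; whether the blocks really end up adjacent on disk depends on the filesystem's allocator.

```python
import os
import shutil


def copy_tree_sorted(src, dst):
    """Recursively copy src into dst, visiting each directory's entries
    in sorted name order (hypothetical helper, for illustration only).

    Returns the copied file paths, relative to src, in copy order.
    """
    copied = []
    os.makedirs(dst, exist_ok=True)

    # os.scandir() yields entries in arbitrary directory order,
    # so sort them by name before copying anything.
    with os.scandir(src) as it:
        entries = sorted(it, key=lambda e: e.name)

    for entry in entries:
        target = os.path.join(dst, entry.name)
        if entry.is_dir(follow_symlinks=False):
            # Finish a whole subfolder before moving on to the next entry.
            for rel in copy_tree_sorted(entry.path, target):
                copied.append(os.path.join(entry.name, rel))
        elif entry.is_file(follow_symlinks=False):
            shutil.copy2(entry.path, target)  # copies data and metadata
            copied.append(entry.name)
        # Symlinks and special files are skipped in this sketch.
    return copied
```

Calling copy_tree_sorted("gallery", "backup/gallery") (both paths made up for this example) would copy each folder's files in name order and return that order, so a backup tool could log or verify it.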
Now you may ask what this is good for. Aren't backups just that, and no one ever opens them regularly with file managers or viewers?
But 'cp' is not only used for backups. It is also used to copy the files from the camera chip to the hard disk in the first place, or to copy them over to network drives. I believe it is used as the backend in most desktop applications anyway, and probably on most servers too.
It is still true that most people want maximal copy speed, not maximal reading speed. But maybe that's partly just because they don't know the choice even exists. If there were such a recursive option, then backup or download tools could at least offer it in their settings too. I would certainly use it in my backup code, because I'm dealing with massive backups, where (perhaps not obviously) speed does not matter so much, exactly for that reason: it takes hours anyway. I do not need speed when backing up. I need speed when reading.
I'm a DJ with a huge music collection, and also a prolific photographer who makes lots of movie clips too. I have been doing backups for more than 10 years, and I am absolutely sure about this choice. I just think that there is a grain of meaning in my proposal.
I'm not on any bug list; I hope this can be accepted just as a mail. Let me know if and how I can do it better.
Kind regards, Michael
This bug report was last modified 12 years and 309 days ago.