Package: coreutils
Reported by: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Date: Fri, 10 Oct 2014 17:30:02 UTC
Severity: normal
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
From: Bob Proulx <bob <at> proulx.com>
To: Linda Walsh <bash <at> tlinx.org>
Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 17:53:31 -0600

Linda Walsh wrote:
> Bob Proulx wrote:
> > Also consider that if cp were to acquire all of the enhancements
> > that have been requested for cp as time has gone by then cp would
> > be just as featureful (bloated!) as rsync and likely just as slow
> > as rsync too.
>
> Nope...rsync is slow because it does everything over a client
> server model --- even when it is local. So everything is written through
> a pipe .. that's why it can't come close to cp -- and why cp would never
> be so slow -- I can't imagine it using a pipe to copy a file anywhere!

The client-server structure of rsync is required for copying between
systems. Saying that cp doesn't have it isn't fair if cp were to add
every requested feature. I am sure that if I search the archives I
would find a request to add client-server structure to cp to support
copying from system to system. :-)

Now I will proactively agree that it would be nice if rsync detected
that it was all running locally and didn't fork and instead ran
everything in one process like cp does. But I could see that coming
to rsync at some time in the future. It is an often requested feature.

> > This is something to consider every time someone asks for a
> > creeping feature to cp. Especially if they say they want the feature
> > in cp because it is faster than rsync. The natural progression is
> > that cp would become rsync.
>
> Not even! Note. cp already has a comparison function
> built in that it uses during "cp -u"...

I am not convinced of the robustness of 'cp -u ...' interrupt, repeat,
interrupt repeat. It wasn't intended for that mode. I am suspicious.
Is there any code path that could leave a new file in the target area
that would avoid copy? Not sure. Newer meets the -u test but isn't
an exact copy if the time stamp were older in the original. But with
rsync I know it will correct for this during a subsequent run.

> built in that it uses during "cp -u"... but it doesn't go through
> pipes.
> It used to use larger buffer sizes or maybe tell posix
> to pre-alloc the destination space, dunno, but it used to be
> faster.. I can't say for certain, but it seems to be using

Often the data sizes we work with grow larger over time, making the
same task feel slower because we are actually dealing with more data
now. Files include audio. Files include video. Standard def becomes
high def. "Difficult to see. Always in motion is the future."

> smaller buffer sizes. Another reason rsync is so slow -- uses
> a relatively small i/o size 1-4k last I looked. I've asked them
> to increase it, but going through a pipe it won't help alot.

Nod. Rsync was designed for the network use case. It could benefit
from some tuning for the local case. A topic for the rsync list.

> Also in rsync, they've added the posix calls to reserve
> space in the target location for a file being copied in.
> Specifically, this is to lower disk fragmentation (does
> cp do anything like that, been a while since I looked).

I don't know. It would be worth a look.

> > The advantage of rsync is that it can be interrupted and restarted and
> > the restarted process will efficiently avoid doing work that is
> > already done. An interrupted and restarted cp will perform the same
> > work again from start to finish.
>
> I wouldn't trust that it would. If you interrupt it at exactly
> the wrong time, I'd be afraid some file might get set with the right
> data but the wrong Meta info (acls, primarily).

The design of rsync is to copy the file to a temporary name beside the
intended target. After the copy the timestamps are set. After the
timestamps are set the file is renamed into place. An interrupt that
happens before that rename will cause the temporary file to be
removed. An interrupt that happens after the rename is, well, after
that and the copy is already done. Since rename on the local file
system is atomic this is guaranteed to function robustly.
(As long as you aren't using a buggy file system that changes the
order of operations. That isn't cool. But of course it was famously
seen in ext4 for a while. Fortunately sanity has prevailed and ext4
doesn't do that for this operation anymore. Okay to use now.)

> > If I am doing a simple copy from A to B then I use 'cp -av A B'. If I
> > am doing it the second time then I will use rsync to avoid repeating
> > previously done work 'rsync -av A B'.
>
> Wouldn't cp -auv A B do the same?

Do I have to go look at the source code to verify that it doesn't? :-(

I assume it doesn't without looking. I assume cp copies in place. I
assume that cp does not make a temporary file off to the side and
rename it into place once it is done and has set the timestamps. I
assume that cp copies to the named destination directly and updates
the timestamps afterward. That creates a window of time when the file
is in place but has not had the timestamp placed on it yet.

Which means that if the cp is interrupted on a large file, it will
have started the copy but will not have finished it at the moment that
it is interrupted. The new file will be in place with a new
timestamp. The second run with cp -u will avoid overwriting the file
because the timestamp is newer. However the contents of the file will
be incomplete, or at least not matching the source copy at the time of
the second copy.

If my assumptions in the above are wrong please correct me. I will
learn something. But the operating model would need to be the same
across all systems covered by POSIX before I would consider it
actually safe to use.

> > If I want progress indication... If I want placement of backup files
> > in a particular directory... If I want other fancy features that are
> > provided by rsync then it is worth it to use rsync.
> > ...trimmed simple benchmark...
> > $ time cp -a coreutils junk/
>
> By default cp -a transfers acls and ext-attrs and preserves
> hard links.
> Rsync doesn't do any of that by default.
> You need to use "-aHAX" to compare them ...

Good catch. :-)

> you have to call them
> out as 'extra' with rsync, so the above test may not be what it seems.
> Though if you don't use ACL's (which I do), then maybe the above
> is almost reasonable. Still.. should use -aHAX

I didn't have any hard links, ACLs, or extended attributes in the test
case, so it shouldn't matter for the above.

> Is your rsync newer? i.e. does it have the posix-pre-alloc
> hints?... Mine has a pre-alloc patch, but I think that was
> suse-added and not the one in the mainline code. Not sure.
> > rsync --version
> rsync version 3.1.0 protocol version 31
> 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
> socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
> append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I happened to run that test on Debian Sid and it is 3.1.1. However
Debian Stable, which I have most widely deployed, has 3.0.9. So you
are both ahead of and behind me at the same time. :-)

> Throw a few TB copies at rsync -- where all the data
> won't fit in memory.... it also, I'm told, has problems with
> hardlinks, acls and xattrs slowing it down, so it may be a
> matter of usage...

I have had problems running rsync with -H for large data sets. Bad
enough that I recommend against it. Don't do it! I don't know
anything about -A and -X. But rsync -a is fine for very large data
sets.

> BUT all that said... note that I DO USE it... for the
> job I'm doing in my snapper script, nothing else will.

Yes. It is too useful to be without!

> (don't ya just love performance talk?)

Except that we should have moved all of this to the discussion list.
I feel guilty to have continued it. We have drifted well away from
the original bug report. The one with the terrible title. If this
continues let's take it over to the coreutils discussion list for
further conversation about it.

Bob
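
The difference between the two copy strategies discussed in this
message can be sketched in a few lines of Python. This is only a
simulation of the behaviour assumed above, not code taken from cp or
rsync. The helper names (needs_update, copy_in_place, copy_via_rename)
and the interrupt_after parameter are hypothetical, and needs_update
merely mimics the documented "source newer than destination, or
destination missing" rule of cp -u.

```python
import os
import tempfile


def needs_update(src, dst):
    """Timestamp-only test: copy when dst is missing or older than src."""
    if not os.path.exists(dst):
        return True
    return os.stat(src).st_mtime_ns > os.stat(dst).st_mtime_ns


def copy_in_place(src, dst, interrupt_after=None):
    """Write directly to dst and set the timestamp afterward.

    interrupt_after simulates the process being killed after N bytes.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        data = fin.read()
        if interrupt_after is not None:
            fout.write(data[:interrupt_after])
            return False  # partial file stays at dst with a fresh mtime
        fout.write(data)
    st = os.stat(src)
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
    return True


def copy_via_rename(src, dst, interrupt_after=None):
    """Write to a temporary name beside dst, set the timestamp, then rename."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            data = fin.read()
            if interrupt_after is not None:
                fout.write(data[:interrupt_after])
                return False  # dst itself was never touched
            fout.write(data)
        st = os.stat(src)
        os.utime(tmp, ns=(st.st_atime_ns, st.st_mtime_ns))
        os.replace(tmp, dst)  # atomic when tmp and dst share a file system
        tmp = None
        return True
    finally:
        # Remove the temporary file if the rename never happened.
        if tmp is not None and os.path.exists(tmp):
            os.unlink(tmp)


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 1000)
        # Give the source an old timestamp so a fresh destination is "newer".
        os.utime(src, ns=(10**18, 10**18))

        dst_a = os.path.join(d, "a.bin")
        copy_in_place(src, dst_a, interrupt_after=100)
        in_place_skipped = not needs_update(src, dst_a)
        in_place_size = os.path.getsize(dst_a)

        dst_b = os.path.join(d, "b.bin")
        copy_via_rename(src, dst_b, interrupt_after=100)
        rename_retries = needs_update(src, dst_b)
        leftovers = [n for n in os.listdir(d) if n.startswith(".tmp-")]
        return in_place_skipped, in_place_size, rename_retries, leftovers


if __name__ == "__main__":
    print(demo())
```

Under these assumptions demo() returns (True, 100, True, []): the
truncated in-place copy carries a newer timestamp and is skipped by a
timestamp-only update pass, while the interrupted rename-based copy
leaves nothing at the destination and is simply redone on the next run.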