Package: coreutils;
Reported by: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Date: Fri, 10 Oct 2014 17:30:02 UTC
Severity: normal
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18681 in the body.
You can then email your comments to 18681 AT debbugs.gnu.org in the normal way.
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 17:30:02 GMT) Full text and rfc822 format available.
"Polehn, Mike A" <mike.a.polehn <at> intel.com>: bug-coreutils <at> gnu.org. (Fri, 10 Oct 2014 17:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: "Polehn, Mike A" <mike.a.polehn <at> intel.com> To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org> Subject: The Linux cp command has bugs Date: Fri, 10 Oct 2014 17:25:23 +0000
cp --version: 8.21, running on Fedora 20, kernel 3.16.3-200.fc20.x86_64, with latest updates.

The Linux copy command (cp) has problems.

Problem need to copy a tree of 1000s of files to another directory that is a git directory that has a whole bunch of additional build files, so diff between the directories will not do any good. If the files are copied over the git directory I can do what I need to do, since I need to see if there are any differences in any of the files.

Using:

    cp -f -r <dir a> <dir b>

For each file being copied it asked:

    cp: overwrite XXXXXXXXXXXXXXXXX?

So the force option does not work, since it should skip asking about doing an overwrite. If the force option is supposed to act differently, then there should be an additional argument, because answering yes 1000s of times is not very smart...

Also, since there are a lot of files, if I accidentally hit Return before y, cp moves on to the next file, which implies to me that the file was not copied; that gets to be a problem when 1000s of files are copied. I also assumed that 'y' implies the data was copied.
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:03:01 GMT) Full text and rfc822 format available.

Message #8 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: "Polehn, Mike A" <mike.a.polehn <at> intel.com> To: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org> Subject: cp Specific fail example Date: Fri, 10 Oct 2014 18:01:59 +0000
######### get and check out version
[root <at> F20-v3 ~]# cd /usr/src
[root <at> F20-v3 src]# git clone git://dpdk.org/dpdk
[root <at> F20-v3 src]# cd dpdk
[root <at> F20-v3 dpdk]# git tag
v1.2.3r0 v1.2.3r1 v1.2.3r2 v1.2.3r3 v1.2.3r4 v1.3.0r0 v1.3.1r0 v1.3.1r1 v1.3.1r2 v1.3.1r3 v1.4.0r0 v1.4.1r0 v1.4.1r1 v1.4.1r2 v1.5.0r0 v1.5.0r1 v1.5.0r2 v1.5.1r0 v1.5.1r1 v1.5.1r2 v1.5.2r0 v1.5.2r1 v1.5.2r2 v1.6.0r0 v1.6.0r1 v1.6.0r2 v1.7.0 v1.7.0-rc1 v1.7.0-rc2 v1.7.0-rc3 v1.7.0-rc4 v1.7.1 v1.8.0-rc1
[root <at> F20-test dpdk]# git checkout -b map_v1.7.1 v1.7.1
Switched to a new branch 'map_v1.7.1'

### download dpdk 1.7.1 files from http://dpdk.org/download
### put in /usr/src directory and untar:
[root <at> F20-v3 src]# tar -xf dpdk-1.7.1.tar.gz
[root <at> F20-v3 src]# dir
dpdk  dpdk-1.7.1  dpdk-1.7.1.tar.gz
[root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/
cp: overwrite ‘dpdk/app/test/test_lpm6.c’? y
cp: overwrite ‘dpdk/app/test/test_rwlock.c’? y
cp: overwrite ‘dpdk/app/test/test_table_ports.h’? y
cp: overwrite ‘dpdk/app/test/test_logs.c’? y
cp: overwrite ‘dpdk/app/test/test_pmd_ring.c’? y
cp: overwrite ‘dpdk/app/test/test_table_tables.h’?
cp: overwrite ‘dpdk/app/test/test_lpm.c’?
cp: overwrite ‘dpdk/app/test/test_malloc.c’?
cp: overwrite ‘dpdk/app/test/test_errno.c’? y
cp: overwrite ‘dpdk/app/test/test_hash.c’?
cp: overwrite ‘dpdk/app/test/test_table_acl.h’? y

note: asking question on each file and moving to next file even when not entering n or y
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:14:02 GMT) Full text and rfc822 format available.

Message #11 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org Subject: Re: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 11:13:44 -0700
Polehn, Mike A wrote:
> Using: cp -f -r <dir a> <dir b>
>
> For each file being copied it asked:
>
> cp: overwrite XXXXXXXXXXXXXXXXX?

That's not what I observe here (see below). Perhaps there's something else going on, maybe an alias. For example, I couldn't get the cp to work without also using -T. Can you please give an exact recipe for reproducing the problem on your platform?

$ mkdir a b
$ echo a >a/f
$ echo b >b/f
$ cp -f -r -T a b
$ cat b/f
a
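Paul's remark about -T can be reproduced in a scratch directory. This is a minimal sketch that assumes GNU cp, where -T is the long option --no-target-directory. When the destination directory already exists, plain `cp -r a b` copies `a` into `b` as `b/a`, so nothing in `b` is overwritten; with -T, `b` itself is treated as the destination, and `a/f` lands on `b/f`. Mike's `dpdk-1.7.1/* dpdk/` glob form gets the same merge effect by naming each top-level entry separately.

```shell
#!/bin/sh
# Sketch: why "cp -r a b" and "cp -r -T a b" differ when b already exists.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir a b
echo a > a/f
echo b > b/f

# Without -T: b exists, so the directory a is copied *into* it (as b/a/f).
cp -f -r a b
without_T=$(cat b/f)
nested=$(if [ -f b/a/f ]; then echo yes; else echo no; fi)

# With -T (GNU --no-target-directory): b is the destination itself,
# so a/f is copied over b/f.
rm -rf b
mkdir b
echo b > b/f
cp -f -r -T a b
with_T=$(cat b/f)

echo "without -T: b/f=$without_T, b/a/f exists=$nested"
echo "with -T:    b/f=$with_T"
```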
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:18:01 GMT) Full text and rfc822 format available.

Message #14 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org> Subject: Re: bug#18681: cp Specific fail example Date: Fri, 10 Oct 2014 11:17:34 -0700
I do not observe the symptoms that you report. See below. My guess is that you've aliased 'cp' to 'cp -i', which is probably a mistake.

$ git clone git://dpdk.org/dpdk
Cloning into 'dpdk'...
remote: Counting objects: 16249, done.
remote: Compressing objects: 100% (3976/3976), done.
remote: Total 16249 (delta 12964), reused 15109 (delta 12122)
Receiving objects: 100% (16249/16249), 12.79 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (12964/12964), done.
Checking connectivity... done.
$ cd dpdk
$ git checkout -b map_v1.7.1 v1.7.1
Switched to a new branch 'map_v1.7.1'
$ pwd
/tmp/d/dpdk
$ cd ..
$ wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
--2014-10-10 11:15:44-- http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
Resolving dpdk.org (dpdk.org)... 92.243.14.124
Connecting to dpdk.org (dpdk.org)|92.243.14.124|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘dpdk-1.7.1.tar.gz’

    [ <=> ] 8,281,609 1.17MB/s in 7.5s

2014-10-10 11:15:52 (1.06 MB/s) - ‘dpdk-1.7.1.tar.gz’ saved [8281609]

$ tar -xf dpdk-1.7.1.tar.gz
$ cp -f -r dpdk-1.7.1/* dpdk/
$
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:14:02 GMT) Full text and rfc822 format available.

Message #17 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Assaf Gordon <assafgordon <at> gmail.com> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org Subject: Re: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 15:13:44 -0400
Hello Mike,

On 10/10/2014 01:25 PM, Polehn, Mike A wrote:
>
> Problem need to copy a tree of 1000s of files to another directory
> that is a git directory that has a whole bunch of additional build
> files, so diff between the directories will not do any good.
>

This is slightly off-topic, but if you want to compare only files managed by git (ignoring other files in current directory), perhaps the following would help:

    # Download and extract the tarball
    wget -q http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
    tar -xf dpdk-1.7.1.tar.gz

    # Clone the git repo with specific branch, checkout the relevant branch
    # (or go to an existing checked-out repository directory)
    git clone git://dpdk.org/dpdk
    cd dpdk
    git checkout -b map_v1.7.1 v1.7.1

    # For each file managed by git (with 'git ls'),
    # compare it to the corresponding file in the other directory:
    git ls -0 | xargs -0 -I% diff -q % ../dpdk-1.7.1/%

Regards,
 -gordon
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:16:01 GMT) Full text and rfc822 format available.

Message #20 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Assaf Gordon <assafgordon <at> gmail.com> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org Subject: Re: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 15:15:18 -0400
Sorry, had a typo:

On 10/10/2014 03:13 PM, Assaf Gordon wrote:
> # For each file managed by git (with 'git ls'),
> # compare it to the corresponding file in the other directory:
> git ls -0 | xargs -0 -I% diff -q % ../dpdk-1.7.1/%
>

Should be:

    git ls -z | xargs -0 -I% diff -q % ../dpdk-1.7.1/%
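A self-contained sketch of the same comparison, runnable in a scratch directory. Stock git ships no `ls` subcommand, so `git ls` in the messages above was presumably a local alias; the sketch assumes it stands for `git ls-files`, whose `-z` flag NUL-terminates each name to pair with `xargs -0`. The `release/` and `checkout/` trees are hypothetical stand-ins for `dpdk-1.7.1/` and `dpdk/`.

```shell
#!/bin/sh
# Sketch: diff only the git-tracked files of a checkout against an
# unpacked release tree; untracked build output is never looked at.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical release tree (stands in for dpdk-1.7.1/).
mkdir -p "$work/release/lib"
echo same    > "$work/release/lib/a.c"
echo changed > "$work/release/lib/b.c"

# Hypothetical checkout (stands in for dpdk/) plus an untracked build file.
mkdir -p "$work/checkout/lib"
cd "$work/checkout"
git init -q .
echo same     > lib/a.c
echo original > lib/b.c
git add lib/a.c lib/b.c      # staging is enough for ls-files to list them
echo object   > lib/a.o      # untracked: ls-files never reports it

# -z / -0 pass names NUL-separated, so spaces or newlines in names are safe.
# diff -q exits 1 when files differ, which makes xargs exit nonzero; tolerate it.
report=$(git ls-files -z | xargs -0 -I% diff -q % ../release/% || true)
printf '%s\n' "$report"
```

Only `lib/b.c` is reported; `lib/a.o` is never compared because git does not track it, which is the property the original request needed.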
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:47:02 GMT) Full text and rfc822 format available.

Message #23 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: "Polehn, Mike A" <mike.a.polehn <at> intel.com> To: Paul Eggert <eggert <at> cs.ucla.edu>, "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org> Subject: RE: bug#18681: cp Specific fail example Date: Fri, 10 Oct 2014 19:46:25 +0000
Hi Paul!

Thank you for your quick response!

You were logged in as a normal user. I was logged in as root. I tried as normal user and it worked the same as you. However, logged in as root and the error occurred as before.

Did a search for 'cp -I' and found it for root:

[root <at> F20-v3 ~]# find /root -type f -print |xargs grep -Hn "cp i"
/root/.bashrc:6:alias cp='cp -i'
/root/.cshrc:6:alias cp 'cp -i'
/root/.tcshrc:6:alias cp 'cp -i'
[root <at> F20-v3 ~]# find /etc -type f -print |xargs grep -Hn "cp i"
[root <at> F20-v3 ~]# find /home/mike -type f -print |xargs grep -Hn "cp i"
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2456: "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2474: "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2478: "mbuf_destroy_zcp is : %d\n",
/home/mike/dpdk/examples/vhost/main.c:2456: "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk/examples/vhost/main.c:2474: "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk/examples/vhost/main.c:2478: "mbuf_destroy_zcp is : %d\n",

But there is still an error for interactive:

[root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/
cp: overwrite ‘dpdk/app/test/test_lpm6.c’? y
cp: overwrite ‘dpdk/app/test/test_rwlock.c’? y
cp: overwrite ‘dpdk/app/test/test_table_ports.h’? y
cp: overwrite ‘dpdk/app/test/test_logs.c’? y
cp: overwrite ‘dpdk/app/test/test_pmd_ring.c’? y
cp: overwrite ‘dpdk/app/test/test_table_tables.h’?
cp: overwrite ‘dpdk/app/test/test_lpm.c’?
cp: overwrite ‘dpdk/app/test/test_malloc.c’?
cp: overwrite ‘dpdk/app/test/test_errno.c’? y
cp: overwrite ‘dpdk/app/test/test_hash.c’?
cp: overwrite ‘dpdk/app/test/test_table_acl.h’? y

Didn't answer yes or no for some of these and they moved on anyway, indicating the interactive mode is not operating as expected.

It is a good idea as root not to be overwriting files, so I can understand the "cp -i" usage for root. However, some of the reason for using root is to do something that you may not be able to do as a normal user.
So being able to override the -i with a -f would be highly desirable.

Mike
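The "moved on anyway" behaviour can be reproduced in a scratch directory. This is a minimal sketch assuming GNU cp, where any reply that does not match the locale's "yes" pattern counts as "no". Pressing Return alone therefore declines and leaves the destination untouched, and only an affirmative reply such as `y` overwrites it. The replies are fed through a pipe instead of a terminal, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: how "cp -i" treats an empty reply versus "y".
set -eu
export LC_ALL=C                  # use the plain English "yes" pattern
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
echo new > "$work/src"

# Reply with just Return: not a "yes", so cp skips this file.
echo old > "$work/dst"
printf '\n' | cp -i "$work/src" "$work/dst" 2>/dev/null || true
after_enter=$(cat "$work/dst")

# Reply "y": cp overwrites the destination.
echo old > "$work/dst"
printf 'y\n' | cp -i "$work/src" "$work/dst" 2>/dev/null
after_yes=$(cat "$work/dst")

echo "after Return: $after_enter"   # the old contents survive
echo "after y:      $after_yes"     # now holds the new contents
```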
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 20:02:01 GMT) Full text and rfc822 format available.

Message #26 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: "Polehn, Mike A" <mike.a.polehn <at> intel.com> To: Assaf Gordon <assafgordon <at> gmail.com>, "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org> Subject: RE: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 20:00:52 +0000
Hi Assaf!

Thank you for your quick response!

There are always multiple ways to do things. The git tool has a diff tool built in that makes file comparison easy. I have run across multiple cases where copying one tree over another is desirable.

In another bug message thread, we found that the actual cause was cp being aliased to 'cp -i' for the root user. This still left the incorrect operation of the interactive operation when both -i and -f is used.

I think that in some cases the need to override '-i' with '-f' may be very desirable. So maybe having '-f' cancel or override the '-i' operation might be a good change.

Thanks!

Mike
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 20:56:02 GMT) Full text and rfc822 format available.

Message #29 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, Assaf Gordon <assafgordon <at> gmail.com>, "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org> Subject: Re: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 14:54:59 -0600
On 10/10/2014 02:00 PM, Polehn, Mike A wrote:
> This still left the incorrect operation of the interactive operation when both -i and -f is used.

The behavior of -i vs. -f interaction is required by POSIX; in particular, POSIX is explicit that -i and -f are NOT a toggle switch of one another, but each turns on slightly different, somewhat overlapping, changes in behavior (so specifying both is different from specifying one in isolation). We can't change what either one of those flags means. If there is another mode of operation that is also useful, then it needs yet another flag.

At one point in the past, we had --reply={yes,no,query} to try and offer a third mode, but it had confusing semantics and we ended up pulling it because of the confusion it could cause. At the time we pulled it, we admitted that 'rsync' has some modes of operation that might be better suited for the particular modes that people seemed to be requesting when they thought that --reply would do the trick (and usually, what they thought --reply would do and what it actually did were different, which is why we removed it to avoid confusion). We have also added a --no-clobber option, which is somewhat of a compromise (what some people thought --reply=no would do, --no-clobber actually does better).

So adding a new option is not out of the question, but you'd have to have well-defined semantics of what it should do, and how it differs from either normal mode, '-i' mode, '-f' mode, '-i -f' mode, or '--no-clobber' mode.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
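Eric's point that -i and -f are not a toggle for cp can be checked directly. This is a minimal sketch that assumes GNU coreutils and uses hypothetical file names in a scratch directory. `cp -i -f` still prompts, so an empty reply still declines. For `mv`, by contrast, only the last of -i, -f and -n takes effect, so `mv -i -f` overwrites without asking.

```shell
#!/bin/sh
# Sketch: -f does not cancel -i for cp, but the last of -i/-f wins for mv.
set -eu
export LC_ALL=C                  # English prompt text, so grep can find it
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# cp -i -f: the prompt still appears, and an empty reply declines.
echo new > "$work/src"
echo old > "$work/dst"
printf '\n' | cp -i -f "$work/src" "$work/dst" 2>"$work/cp.err" || true
cp_prompted=$(grep -c 'overwrite' "$work/cp.err" || true)
cp_result=$(cat "$work/dst")

# mv -i -f: -f comes last, so mv overwrites without prompting.
echo new > "$work/src"
echo old > "$work/dst"
mv -i -f "$work/src" "$work/dst" </dev/null 2>"$work/mv.err"
mv_prompted=$(grep -c 'overwrite' "$work/mv.err" || true)
mv_result=$(cat "$work/dst")

echo "cp -i -f: prompts=$cp_prompted dst=$cp_result"   # prompted, dst still old
echo "mv -i -f: prompts=$mv_prompted dst=$mv_result"   # no prompt, dst now new
```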
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 23:37:01 GMT) Full text and rfc822 format available.

Message #32 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Bob Proulx <bob <at> proulx.com> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com> Cc: 18681 <at> debbugs.gnu.org Subject: Re: bug#18681: cp Specific fail example Date: Fri, 10 Oct 2014 17:36:16 -0600
Polehn, Mike A wrote:
> Did a search for 'cp -I' and found it for root:
>
> [root <at> F20-v3 ~]# find /root -type f -print |xargs grep -Hn "cp i"
> /root/.bashrc:6:alias cp='cp -i'
> /root/.cshrc:6:alias cp 'cp -i'
> /root/.tcshrc:6:alias cp 'cp -i'

It might be easier to guess that there is an alias and look for it. :-)

  # alias cp
  alias cp='cp -i'
  # type cp
  cp is aliased to `cp -i'

> But there is still an error for interactive:
>
> [root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/

Since you know that "cp" in the above is "cp -i" then you know the command is actually "cp -i -f -r dpdk-1.7.1/* dpdk/" which you don't want there. Try it without the alias in play.

The normal way in a /bin/sh derived environment is to simply quote the command. If you quote the command then it won't do alias expansion. The usual method of quoting is with a backslash.

  # \cp -f -r dpdk-1.7.1/* dpdk/

However the canonical method is to use "env" since the above doesn't work in csh derived shells. Therefore you will find suggestions to use env to wrap the command and avoid alias expansion like this. It is often offered when we don't know if you are using a sh or csh derived command line shell. (This env trick is one I learned on this list some years ago.)

  # env cp -f -r dpdk-1.7.1/* dpdk/

And of course you can always unalias the command too.

  # unalias cp

> It is a good idea as root not to be overwriting files, so I can
> understand the "cp -i" usage for root.

Personally I simply realize that the tools are sharp kitchen knives and I always handle sharp kitchen knives carefully. Trying to put safety shields on them simply gets in the way. It tends to cause problems such as you are seeing here. I usually remove those aliases on systems I administer.

> However, some of the reason for using root is to do something that
> you may not be able to do as a normal user. So being able to
> override the -i with a -f would be highly desirable.

Right. And you can. You have the power. Just do it. Avoid the alias with \cp (or the env trick) and then you won't have the -i in play. Or remove the alias from the environment.

There is the burden upon the root superuser that they have great power. With great power comes great responsibility. Being root means you are a pilot, not a passenger. There is an old saying in flying, "Fly the airplane. Don't let the airplane fly you." Hopefully the meaning is obvious even to the non-pilot.

Meanwhile... I would be one of those suggesting that perhaps you should try using rsync instead of cp. The cp command is lean and mean by comparison to rsync (and should stay that way). But rsync has many attractive features for doing large copies.

Bob
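The alias-bypass methods above can be tried in a scratch directory. This is a minimal sketch assuming a POSIX sh or bash. Bash ignores aliases in scripts unless `expand_aliases` is set, so the sketch enables it when available. Each variant starts from an "old" destination with stdin at end-of-file, so if a hidden -i were still active, cp would read EOF, decline, and leave "old" in place. `command cp` is included because `command` suppresses alias and function lookup for the word after it; the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: running cp while an alias cp='cp -i' is defined.
set -eu
shopt -s expand_aliases 2>/dev/null || true   # bash scripts skip aliases otherwise
alias cp='cp -i'

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
echo new > "$work/src"

# Backslash-quoting the command word suppresses alias expansion.
echo old > "$work/dst"
\cp -f "$work/src" "$work/dst" </dev/null
via_backslash=$(cat "$work/dst")

# 'command' is the first word, so the following cp is not alias-expanded.
echo old > "$work/dst"
command cp -f "$work/src" "$work/dst" </dev/null
via_command=$(cat "$work/dst")

# env is an external program; it runs the cp found in PATH, never the alias.
echo old > "$work/dst"
env cp -f "$work/src" "$work/dst" </dev/null
via_env=$(cat "$work/dst")

echo "backslash=$via_backslash command=$via_command env=$via_env"
```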
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Fri, 10 Oct 2014 23:50:02 GMT) Full text and rfc822 format available.

Message #35 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp> To: "Polehn, Mike A" <mike.a.polehn <at> intel.com> Cc: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>, Assaf Gordon <assafgordon <at> gmail.com> Subject: Re: bug#18681: The Linux cp command has bugs Date: Sat, 11 Oct 2014 08:48:54 +0900
Hi Polehn,

The -f option isn't `suppress interactive' in cp. It removes a destination file that cannot be opened for overwriting, and then tries again. It's different from the option in mv. As the behavior is clearly defined in POSIX, as Eric says, we won't be able to change it.

BTW, I don't like the alias `cp -i', so I always remove it from .bashrc immediately after installing a distribution. (^_^)

If you temporarily want to cancel the alias, you can define another alias such as `cpf', and/or use one of these instead of `cp':

 - command cp -f
 - /bin/cp -f
 - ( unalias cp; cp -f ... )

Even if a new option `-F' to suppress interactive mode were added to cp, we would need to use -F for cp but -f for mv to do it.
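The -f behaviour Norihiro describes can be illustrated with a read-only destination. This is a minimal sketch assuming GNU cp and hypothetical names in a scratch directory. As an ordinary user, plain `cp` cannot open the read-only file and fails, while `cp -f` removes it (the directory is writable) and creates a fresh copy. As root, opening succeeds either way, so only the `-f` outcome is checked.

```shell
#!/bin/sh
# Sketch: what cp -f does when the destination cannot be opened for writing.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
echo new > "$work/src"

# Plain cp against a read-only destination.
echo old > "$work/dst"
chmod a-w "$work/dst"
if cp "$work/src" "$work/dst" 2>/dev/null; then
  plain=copied                   # e.g. when running as root
else
  plain=failed                   # "Permission denied" for ordinary users
fi

# cp -f against a fresh read-only destination.
rm -f "$work/dst"
echo old > "$work/dst"
chmod a-w "$work/dst"
cp -f "$work/src" "$work/dst"    # if opening fails: unlink dst, then retry
forced=$(cat "$work/dst")

echo "plain cp: $plain; cp -f result: $forced"
```

Note that none of this touches prompting: -f changes what happens after an open failure, which is why it does not undo -i.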
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Sat, 11 Oct 2014 00:39:02 GMT) Full text and rfc822 format available.

Message #38 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Jon Stanley <jonstanley <at> gmail.com> To: Norihiro Tanaka <noritnk <at> kcn.ne.jp> Cc: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>, Assaf Gordon <assafgordon <at> gmail.com>, "Polehn, Mike A" <mike.a.polehn <at> intel.com> Subject: Re: bug#18681: The Linux cp command has bugs Date: Fri, 10 Oct 2014 20:38:06 -0400
On Fri, Oct 10, 2014 at 7:48 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> If you temporarily want to cancel the the alias, you can define an another
> alias as `cpf', and/or can use below instead of `cp'

Note that (in bash at least) you can prefix the command with a backslash (\) to override an alias for that invocation, which is what I typically do:

$ \cp <blah>
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Sat, 11 Oct 2014 17:28:02 GMT) Full text and rfc822 format available.

Message #41 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Linda Walsh <bash <at> tlinx.org> To: Bob Proulx <bob <at> proulx.com> Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com> Subject: Re: bug#18681: cp Specific fail example Date: Sat, 11 Oct 2014 02:24:31 -0700
Bob Proulx wrote:
> Meanwhile... I would be one of those suggesting that perhaps you
> should try using rsync instead of cp. The cp command is lean and mean
> by comparison to rsync (and should stay that way). But rsync has many
> attractive features for doing large copies.
>
---- fwiw...---
Like large execution times... from the latest snapshot on my system --
I use rsync to only move differences between yesterday and "today[whenever
new snap is taken]"... it was a larger than normal snap -- most only
take 75-90 minutes...but rsync (these are the script messages) with some
debugging output still turned on... even an rm over the resulting diff
took 101 seconds... then cp comes along.. even w/a sync it would
still be under a minute.

I.e. rsync copied just the diffs to "/home.diff", then
find with "-empty -delete" is used to get rid of empty dirs (rsync
creates many of these). then a static partition is created to hold
the "diff" output -- and cp took walked and copied the tree in 12s.
(output wasn't flushed, but it's not that long.. <a minute...).

If rsync wasn't so slow at local I/O...*sigh*....

rsync took 110m, 14s
Empty-directory removal took 1m, 41s
Find used space for /home.diff...sz=4.3GB, min=5.4GB, extsz=4.0MB, n-ext'=1388
target extents num=1388, size=4.0M
Old volume active: Deactivated. Removed.
Create vol. Home-2014.10.08-03.07.05, size 5.4G
{L=>141008030705, /dev/Data/Home-2014.10.08-03.07.05=>CODE(0xbf24a0), f=>CODE(0xbf24e8), d=>{su=>"64k", sw=>1}, i=>{maxpct=>10, size=>256}, s=>{size=>4096}}
About to copy base-diff dir to static
Copying diffs to dated static snap...Time: 0m, 12s.
mklabel@ /home/.snapdir/@GMT-2014.10.08-03.07.05/./._snapdat_=snap_copy_complete
after copy2staticsnap: complete
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Mon, 13 Oct 2014 00:55:01 GMT) Full text and rfc822 format available.

Message #44 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Bob Proulx <bob <at> proulx.com> To: Linda Walsh <bash <at> tlinx.org> Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com> Subject: Re: bug#18681: cp Specific fail example Date: Sun, 12 Oct 2014 18:54:03 -0600
Linda Walsh wrote:
> Bob Proulx wrote:
> > Meanwhile... I would be one of those suggesting that perhaps you
> > should try using rsync instead of cp. The cp command is lean and
> > mean by comparison to rsync (and should stay that way). But rsync
> > has many attractive features for doing large copies.
> >
> ---- fwiw...---
> Like large execution times... from the latest snapshot on my system --
> I use rsync to only move differences between yesterday and "today[whenever
> new snap is taken]"... it was a larger than normal snap -- most only
> take 75-90 minutes...but rsync (these are the script messages) with some
> debugging output still turned on... even an rm over the resulting diff
> took 101 seconds... then cp comes along.. even w/a sync it would
> still be under a minute.

Wow. Just to be clear an rsync copy took 75 to 90 minutes but a cp copy took less than 1 minute? I find that very suspicious. I never see that much difference between them. Are you sure the difference wasn't that the data was cached into ram by the rsync and therefore the second run with cp just ran with the warmed up cache? With a large data set and a large ram that is plausible.

> I.e. rsync copied just the diffs to "/home.diff", then
> find with "-empty -delete" is used to get rid of empty dirs (rsync
> creates many of these). then a static partition is created to hold
> the "diff" output -- and cp took walked and copied the tree in 12s.
> (output wasn't flushed, but it's not that long.. <a minute...).

It appears that you are using features from rsync that do not exist in cp. Therefore the work being done in the task isn't equivalent work. In that case it is probably quite reasonable for rsync to be slower than cp.

Also consider that if cp were to acquire all of the enhancements that have been requested for cp as time has gone by then cp would be just as featureful (bloated!) as rsync and likely just as slow as rsync too. This is something to consider every time someone asks for a creeping feature to cp. Especially if they say they want the feature in cp because it is faster than rsync. The natural progression is that cp would become rsync.

> If rsync wasn't so slow at local I/O...*sigh*....

The advantage of rsync is that it can be interrupted and restarted and the restarted process will efficiently avoid doing work that is already done. An interrupted and restarted cp will perform the same work again from start to finish.

If I am doing a simple copy from A to B then I use 'cp -av A B'. If I am doing it the second time then I will use rsync to avoid repeating previously done work 'rsync -av A B'.

If I want progress indication... If I want placement of backup files in a particular directory... If I want other fancy features that are provided by rsync then it is worth it to use rsync.

  $ du -s coreutils
  238920  coreutils
  $ find coreutils -type f | wc -l
  15013

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time cp -a coreutils junk/
  real    1m2.137s
  user    0m0.140s
  sys     0m1.724s

  $ rm -rf junk/coreutils
  $ time cp -a coreutils junk/
  real    0m2.492s
  user    0m0.060s
  sys     0m1.064s

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time rsync -a coreutils junk/
  real    1m5.473s
  user    0m1.280s
  sys     0m2.112s

  $ rm -rf junk/coreutils
  $ time rsync -a coreutils junk/
  real    0m3.215s
  user    0m1.184s
  sys     0m1.536s

For normal use cp is a little faster than rsync. Or rather rsync is a little slower than cp. But not enough to make a difference for typical operations. Having the file system cache warmed up makes a *HUGE* difference. Much larger than any other difference. For copies that take hours to run I am probably going to value the restart ability more than raw speed. YMMV.

Bob
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Mon, 13 Oct 2014 02:15:02 GMT) Full text and rfc822 format available.

Message #47 received at 18681 <at> debbugs.gnu.org (full text, mbox):
From: Leslie S Satenstein <lsatenstein <at> yahoo.com> To: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>, "Polehn, Mike A" <mike.a.polehn <at> intel.com> Subject: Re: bug#18681: cp Specific fail example Date: Sun, 12 Oct 2014 19:11:45 -0700
Further to Bob's explanation, If you were to copy a 10gig file across the internet. cp would work just fine and could take several hours. But suppose there was an error in the transmission (bad block) or you had to stop and restart. you would need to redo cp and copy the file from the beginning. Rsync would take a checksum of the parts of the file on the remote, and compare it to the host. It would restart at the first detected bad file offset. Regards Leslie Mr. Leslie Satenstein Montreal, Quebec, Canada >________________________________ > From: Bob Proulx <bob <at> proulx.com> >To: Linda Walsh <bash <at> tlinx.org> >Cc: 18681 <at> debbugs.gnu.org; "Polehn, Mike A" <mike.a.polehn <at> intel.com> >Sent: Sunday, October 12, 2014 8:54 PM >Subject: bug#18681: cp Specific fail example > > >Linda Walsh wrote: >> Bob Proulx wrote: >> > Meanwhile... I would be one of those suggesting that perhaps you >> > should try using rsync instead of cp. The cp command is lean and >> > mean by comparison to rsync (and should stay that way). But rsync >> > has many attractive features for doing large copies. >> >> ---- fwiw...--- >> Like large execution times... from the latest snapshot on my system -- >> I use rsync to only move differences between yesterday and "today[whenever >> new snap is taken]"... it was a larger than normal snap -- most only >> take 75-90 minutes...but rsync (these are the script messages) with some >> debugging output still turned on... even an rm over the resulting diff >> took 101 seconds... then cp comes along.. even w/a sync it would >> still be under a minute. > >Wow. Just to be clear an rsync copy took 75 to 90 minutes but a cp >copy took less than 1 minute? I find that very suspicious. I never >see that much difference between them. Are you sure the difference >wasn't that the data was cached into ram by the rsync and therefore >the second run with cp just ran with the warmed up cache? With a >large data set and a large ram that is plausible. > >> I.e. 
rsync copied just the diffs to "/home.diff", then >> find with "-empty -delete" is used to get rid of empty dirs (rsync >> creates many of these). then a static partition is created to hold >> the "diff" output -- and cp took walked and copied the tree in 12s. >> (output wasn't flushed, but it's not that long.. <a minute...). > >It appears that you are using features from rsync that do not exist in >cp. Therefore the work being done in the task isn't equivalent work. >In that case it is probably quite reasonable for rsync to be slower >than cp. > >Also consider that if cp were to acquire all of the enhancements that >have been requested for cp as time has gone by then cp would be just >as featureful (bloated!) as rsync and likely just as slow as rsync >too. This is something to consider every time someone asks for a >creeping feature to cp. Especially if they say they want the feature >in cp because it is faster than rsync. The natural progression is >that cp would become rsync. > >> If rsync wasn't so slow at local I/O...*sigh*.... > >The advantage of rsync is that it can be interrupted and restarted and >the restarted process will efficiently avoid doing work that is >already done. An interrupted and restarted cp will perform the same >work again from start to finish. > >If I am doing a simple copy from A to B then I use 'cp -av A B'. If I >am doing it the second time then I will use rsync to avoid repeating >previously done work 'rsync -av A B'. > >If I want progress indication... If I want placement of backup files >in a particular directory... If I want other fancy features that are >provided by rsync then it is worth it to use rsync. 
>
> $ du -s coreutils
> 238920 coreutils
> $ find coreutils -type f | wc -l
> 15013
>
> $ rm -rf junk/coreutils
> # echo 3 > /proc/sys/vm/drop_caches
> $ time cp -a coreutils junk/
> real 1m2.137s
> user 0m0.140s
> sys 0m1.724s
>
> $ rm -rf junk/coreutils
> $ time cp -a coreutils junk/
> real 0m2.492s
> user 0m0.060s
> sys 0m1.064s
>
> $ rm -rf junk/coreutils
> # echo 3 > /proc/sys/vm/drop_caches
> $ time rsync -a coreutils junk/
> real 1m5.473s
> user 0m1.280s
> sys 0m2.112s
>
> $ rm -rf junk/coreutils
> $ time rsync -a coreutils junk/
> real 0m3.215s
> user 0m1.184s
> sys 0m1.536s
>
>For normal use cp is a little faster than rsync. Or rather, rsync is a little slower than cp. But not enough to make a difference for typical operations. Having the file system cache warmed up makes a *HUGE* difference. Much larger than any other difference. For copies that take hours to run I am probably going to value the restart ability more than raw speed. YMMV.
>
>Bob
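Leslie's description of resuming after a bad block can be illustrated with a simplified sketch: split both files into fixed-size blocks, checksum each block with POSIX `cksum`, and report the offset of the first block that differs. This is an illustration under stated assumptions, not rsync's algorithm: the helper `first_diff_block` is hypothetical and reads both files locally, whereas rsync exchanges block checksums over the connection and uses a rolling checksum so that matching blocks can be found at any offset.

```shell
#!/bin/sh
# first_diff_block FILE_A FILE_B [BLOCK_SIZE]
# Hypothetical helper: prints the byte offset of the first BLOCK_SIZE block
# whose checksum differs between the two files, or prints nothing if every
# block matches. Illustration only; this is not rsync's implementation.
first_diff_block() {
    a=$1 b=$2 bs=${3:-4096}
    # $(( )) strips any padding that some wc implementations add.
    size_a=$(($(wc -c < "$a")))
    size_b=$(($(wc -c < "$b")))
    # Walk up to the end of the longer file; a block missing from the
    # shorter file checksums as empty input and therefore differs.
    if [ "$size_a" -gt "$size_b" ]; then size=$size_a; else size=$size_b; fi
    i=0
    while [ $((i * bs)) -lt "$size" ]; do
        sum_a=$(dd if="$a" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        sum_b=$(dd if="$b" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        if [ "$sum_a" != "$sum_b" ]; then
            echo $((i * bs))
            return 0
        fi
        i=$((i + 1))
    done
}
```

For example, `first_diff_block big.iso big.iso.copy 1048576` (hypothetical file names) would print the offset of the first mismatching 1 MiB block. For actually resuming an interrupted transfer, rsync's `--partial` option keeps a partially transferred file so that a rerun can use it as the basis for the transfer instead of starting from scratch.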
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Mon, 13 Oct 2014 02:47:01 GMT)

Message #50 received at 18681 <at> debbugs.gnu.org:
From: Linda Walsh <bash <at> tlinx.org>
To: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 12 Oct 2014 19:45:19 -0700
Bob Proulx wrote:
> Wow. Just to be clear an rsync copy took 75 to 90 minutes but a cp
---
Actually, in the case I used for illustration it was 110 minutes, but that was longer than normal. Last night's figures:

: rsync took 87m, 34s [which is fairly quick given the size of the diffs.]
: Empty-directory removal took 1m, 58s
: Find used space for /home.diff...sz=2.5GB, min=3.1GB, extsz=4.0MB, n-ext'=806
: Copying diffs to dated static snap...Time: 0m, 17s.

It wasn't a copy, but a diff between 2 volumes (the same volume, but one is a ~24+ hour snapshot started on the previous run). So I look at the differences between two temporal copies, then copy that to a 3rd partition that starts out empty.

So rsync is comparing file times (it doesn't do file reads, _by_ _default_, unless it needs to move the data, as indicated by size and timestamps) -- it examines all file times/dates on my 'home' partition and compares those against a mostly-the-same active LVM snapshot. Out of 871G, on the long day, it found ~5G of changes -- last night was only 3G... it varies based on how much change happened to the volume over the period... the smallest size now is 600m, the largest I've seen has been about 18G.

Once the *difference* is on the 3rd volume ("home.diff"), I destroy the active snapshot created 'yesterday', then recreate it as a dynamically sized static -- enough to hold the diff. Then cp is used to move whatever "diffs" were put on the "diff" volume by rsync.

So those diffs -- most of them are _likely_ to be in memory -- AND as I mentioned, I didn't do a sync after the copy (it happens automatically, but isn't included in the timing). But if I used rsync to do that exact same copy, it would take at least 2-3 times as long... actually... hold on... I can copy it from that partition made yesterday... into the diff partition... but will tar up the source to prime the cache... This is the volume:

> df .
Filesystem                          Size  Used Avail Use% Mounted on
/dev/Data/Home-2014.10.08-03.07.05  5.5G  4.4G  1.1G  81%\
    /home/.snapdir/@GMT-2014.10.08-03.07.05
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> du -sh .
4.4G    .

ok... running cp 1st, then remove, then rsync...:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
time sudo cp -a . /home.diff/.
6.39sec 0.15usr 6.23sys (99.81% cpu)
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
time sudo rm -fr /home.diff/.
1.69sec 0.03usr 1.64sys (99.43% cpu)
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
time sudo rsync -aHAX . /home.diff/.
20.83sec 27.02usr 11.68sys (185.84% cpu)
----
185% cpu!... hey! that's cheating and still 3x slower... here's 1 core:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
time sudo rm -fr /home.diff/.
1.73sec 0.03usr 1.69sys (99.39% cpu)
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
time sudo taskset -a 02 rsync -aHAX . /home.diff/.
38.52sec 25.92usr 11.90sys (98.18% cpu)
---
So limiting it to 1 cpu... 6x slower. (Remember, this is all buffered in memory.)

Note... rsync has been sped up slightly over the past couple of years and 'cp' has slowed down somewhat over the same time period, so these diffs used to be worse.

Then 'cp' is used to copy the image on 'home.diff' to the dynamically sized

> copy took less than 1 minute? I find that very suspicious.
---
Well, hopefully the above explanation is clearer and highlights what we wanted to measure.

>
> It appears that you are using features from rsync that do not exist in cp. Therefore the work being done in the task isn't equivalent work. In that case it is probably quite reasonable for rsync to be slower than cp.
----
Yup... I would never argue differently, but for what it does, rsync is still pig slow; but when the amount of data you need to move is hundreds of times smaller than the total, it can't be beat!

>
> Also consider that if cp were to acquire all of the enhancements that have been requested for cp as time has gone by then cp would be just as featureful (bloated!) as rsync and likely just as slow as rsync too.
----
Nope... rsync is slow because it does everything over a client-server model -- even when it is local. So everything is written through a pipe... that's why it can't come close to cp -- and why cp would never be so slow -- I can't imagine it using a pipe to copy a file anywhere!

> This is something to consider every time someone asks for a creeping feature to cp. Especially if they say they want the feature in cp because it is faster than rsync. The natural progression is that cp would become rsync.
----
Not even! Note: cp already has a comparison function built in that it uses during "cp -u"... but it doesn't go through pipes. It used to use larger buffer sizes, or maybe tell posix to pre-alloc the destination space, dunno, but it used to be faster... I can't say for certain, but it seems to be using smaller buffer sizes. Another reason rsync is so slow -- it uses a relatively small I/O size, 1-4k last I looked. I've asked them to increase it, but going through a pipe it won't help a lot.

This is from a different email on the rsync list from 7/26:

One might ask why rsync is so slow -- copying 800G from 1 partition to another via xfsdump/restore takes a bit under 2 hours, or about 170MB/s, but with rsync, on the same partition, with rsync transferring less than 1/1000th as much (700MB [in a differential as I mentioned above]), it took ~70-80 minutes... or about 163kB/s.

Transfer speeds depend on many factors. One of the largest is transfer size (how much is transferred with one write/read). Transferring 1GB, 1 MB at a time, took 2.08s to read and 1.56s to write (using direct I/O). Transferring it in 4K chunks took 37.28s to read and 43.02s to write. 1k buffers are 4x slower than that!

Also in rsync, they've added the posix calls to reserve space in the target location for a file being copied in. Specifically, this is to lower disk fragmentation. (Does cp do anything like that? It's been a while since I looked.)
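The transfer-size point comes down to how many read and write calls a copy needs: a 1 GiB file takes 1,024 reads at 1 MiB per call but 262,144 reads at 4 KiB per call, and each call carries a fixed overhead. A minimal sketch that makes the call count visible uses `dd`, whose POSIX-specified summary line `<full>+<partial> records in` counts the input blocks it read; the helper name `read_calls` is hypothetical, and it counts calls rather than measuring time.

```shell
#!/bin/sh
# read_calls SRC DST BLOCK_SIZE
# Hypothetical helper: copies SRC to DST with dd using BLOCK_SIZE-byte I/O
# and prints how many input records (full + partial reads) dd reported.
read_calls() {
    # dd writes "<full>+<partial> records in" to stderr; LC_ALL=C keeps
    # that message untranslated so the sed pattern can match it.
    LC_ALL=C dd if="$1" of="$2" bs="$3" 2>&1 |
        sed -n 's/^\([0-9]*\)+\([0-9]*\) records in$/\1 \2/p' |
        { read -r full partial; echo $((full + partial)); }
}
```

Wrapping a call in the shell's `time` turns this into a rough timing test; with GNU dd, adding `iflag=direct` or `oflag=direct` bypasses the page cache, as the direct-I/O figures above did. Results will vary with hardware and caching.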
>
>> If rsync wasn't so slow at local I/O...*sigh*....
>
> The advantage of rsync is that it can be interrupted and restarted and the restarted process will efficiently avoid doing work that is already done. An interrupted and restarted cp will perform the same work again from start to finish.
----
I wouldn't trust that it would. If you interrupt it at exactly the wrong time, I'd be afraid some file might get set with the right data but the wrong meta info (ACLs, primarily).

>
> If I am doing a simple copy from A to B then I use 'cp -av A B'. If I am doing it the second time then I will use rsync to avoid repeating previously done work 'rsync -av A B'.
---
Wouldn't cp -auv A B do the same?

>
> If I want progress indication... If I want placement of backup files in a particular directory... If I want other fancy features that are provided by rsync then it is worth it to use rsync.
>
> $ du -s coreutils
> 238920 coreutils
> $ find coreutils -type f | wc -l
> 15013
>
> $ rm -rf junk/coreutils
> # echo 3 > /proc/sys/vm/drop_caches
> $ time cp -a coreutils junk/
> real 1m2.137s
> user 0m0.140s
> sys 0m1.724s
>
> $ rm -rf junk/coreutils
> $ time cp -a coreutils junk/
> real 0m2.492s
> user 0m0.060s
> sys 0m1.064s
>
> $ rm -rf junk/coreutils
> # echo 3 > /proc/sys/vm/drop_caches
> $ time rsync -a coreutils junk/
> real 1m5.473s
> user 0m1.280s
> sys 0m2.112s
>
> $ rm -rf junk/coreutils
> $ time rsync -a coreutils junk/
> real 0m3.215s
> user 0m1.184s
> sys 0m1.536s
---
By default cp -a transfers ACLs and extended attributes and preserves hard links. Rsync doesn't do any of that by default. You need to use "-aHAX" to compare them... you have to call them out as 'extra' with rsync, so the above test may not be what it seems. Though if you don't use ACLs (which I do), then maybe the above is almost reasonable. Still... you should use -aHAX.

Is your rsync newer? I.e., does it have the posix pre-alloc hints?... Mine has a pre-alloc patch, but I think that was SUSE-added and not the one in the mainline code. Not sure.

rsync --version
rsync version 3.1.0 protocol version 31
64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I don't think mine does yet...

>
> For normal use cp is a little faster than rsync. Or rather rsync is a little slower than cp. But not enough to make a difference for typical operations. Having the file system cache warmed up makes a *HUGE* difference. Much larger than any other difference. For copies that take hours to run I am probably going to value the restart ability more than raw speed. YMMV.
----
I'll value the accuracy of xfsdump/restore... Throw a few TB copies at rsync -- where all the data won't fit in memory.... It also, I'm told, has problems with hardlinks, ACLs and xattrs slowing it down, so it may be a matter of usage...

BUT all that said... note that I DO USE it... for the job I'm doing in my snapper script, nothing else will do.

Cheers!
Linda (don't ya just love performance talk?)
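Linda's objection is that `cp -a` and `rsync -a` do not do equivalent work: GNU `cp -a` also preserves hard links (and ACLs and extended attributes), while rsync only does so with `-H`, `-A` and `-X`. Before timing the two, it can help to check what each command actually preserved. The sketch below does that for hard links only; the helper names are hypothetical, it assumes GNU cp, and the rsync runs are skipped when rsync is not installed.

```shell
#!/bin/sh
# report_links LABEL DIR
# Hypothetical helper: DIR is a copy of a tree whose files "a" and "b" were
# hard links to each other; report whether the copy kept them linked.
report_links() {
    if [ "$2/a" -ef "$2/b" ]; then
        echo "$1: hard links preserved"
    else
        echo "$1: hard links NOT preserved"
    fi
}

# check_equivalent_work SRC
# Hypothetical helper: copies SRC with each tool into a scratch directory,
# reports what happened to the hard-linked pair, then cleans up.
check_equivalent_work() {
    out=$(mktemp -d)
    cp -a "$1" "$out/cp" && report_links "cp -a" "$out/cp"
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$1/" "$out/rsync-a/" && report_links "rsync -a" "$out/rsync-a"
        rsync -aH "$1/" "$out/rsync-aH/" && report_links "rsync -aH" "$out/rsync-aH"
    fi
    rm -rf "$out"
}
```

With GNU cp and rsync both installed, a source directory holding a hard-linked pair `a`/`b` should produce `cp -a: hard links preserved`, `rsync -a: hard links NOT preserved` and `rsync -aH: hard links preserved`.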
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Sun, 19 Oct 2014 23:54:01 GMT)

Message #53 received at 18681 <at> debbugs.gnu.org:
From: Bob Proulx <bob <at> proulx.com>
To: Linda Walsh <bash <at> tlinx.org>
Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 17:53:31 -0600
Linda Walsh wrote:
> Bob Proulx wrote:
> > Also consider that if cp were to acquire all of the enhancements that have been requested for cp as time has gone by then cp would be just as featureful (bloated!) as rsync and likely just as slow as rsync too.
>
> Nope...rsync is slow because it does everything over a client server model --- even when it is local. So everything is written through a pipe .. that's why it can't come close to cp -- and why cp would never be so slow -- I can't imagine it using a pipe to copy a file anywhere!

The client-server structure of rsync is required for copying between systems. Saying that cp doesn't have it isn't fair if cp were to add every requested feature. I am sure that if I search the archives I would find a request to add client-server structure to cp to support copying from system to system. :-)

Now I will proactively agree that it would be nice if rsync detected that it was all running locally and didn't fork, and instead ran everything in one process like cp does. But I could see that coming to rsync at some time in the future. It is an often requested feature.

> > This is something to consider every time someone asks for a creeping feature to cp. Especially if they say they want the feature in cp because it is faster than rsync. The natural progression is that cp would become rsync.
>
> Not even! Note. cp already has a comparison function built in that it uses during "cp -u"...

I am not convinced of the robustness of 'cp -u ...' interrupt, repeat, interrupt, repeat. It wasn't intended for that mode. I am suspicious. Is there any code path that could leave a new file in the target area that would avoid the copy? Not sure. Newer meets the -u test but isn't an exact copy if the timestamp were older in the original. But with rsync I know it will correct for this during a subsequent run.

> built in that it uses during "cp -u"... but it doesn't go through pipes. It used to use larger buffer sizes or maybe tell posix to pre-alloc the destination space, dunno, but it used to be faster.. I can't say for certain, but it seems to be using

Often the data sizes we work with grow larger over time, making the same task feel slower because we are actually dealing with more data now. Files include audio. Files include video. Standard def becomes high def. "Difficult to see. Always in motion is the future."

> smaller buffer sizes. Another reason rsync is so slow -- uses a relatively small i/o size 1-4k last I looked. I've asked them to increase it, but going through a pipe it won't help a lot.

Nod. Rsync was designed for the network use case. It could benefit from some tuning for the local case. A topic for the rsync list.

> Also in rsync, they've added the posix calls to reserve space in the target location for a file being copied in. Specifically, this is to lower disk fragmentation (does cp do anything like that, been a while since I looked).

I don't know. It would be worth a look.

> > The advantage of rsync is that it can be interrupted and restarted and the restarted process will efficiently avoid doing work that is already done. An interrupted and restarted cp will perform the same work again from start to finish.
>
> I wouldn't trust that it would. If you interrupt it at exactly the wrong time, I'd be afraid some file might get set with the right data but the wrong Meta info (acls, primarily).

The design of rsync is to copy the file to a temporary name beside the intended target. After the copy, the timestamps are set. After the timestamps are set, the file is renamed into place. An interrupt that happens before the rename will cause the temporary file to be removed. An interrupt that happens after the rename is, well, after that, and the copy is already done. Since rename on the local file system is atomic, this is guaranteed to function robustly. (As long as you aren't using a buggy file system that changes the order of operations. That isn't cool. But of course it was famously seen in ext4 for a while. Fortunately sanity has prevailed and ext4 doesn't do that for this operation anymore. Okay to use now.)

> > If I am doing a simple copy from A to B then I use 'cp -av A B'. If I am doing it the second time then I will use rsync to avoid repeating previously done work 'rsync -av A B'.
>
> Wouldn't cp -auv A B do the same?

Do I have to go look at the source code to verify that it doesn't? :-( I assume it doesn't without looking. I assume cp copies in place. I assume that cp does not make a temporary file off to the side and rename it into place once it is done and has set the timestamps. I assume that cp copies to the named destination directly and updates the timestamps afterward. That creates a window of time when the file is in place but has not had the timestamp placed on it yet.

Which means that if the cp is interrupted on a large file, it will have started the copy but will not have finished it at the moment that it is interrupted. The new file will be in place with a new timestamp. The second run with cp -u will avoid overwriting the file because the timestamp is newer. However, the contents of the file will be incomplete, or at least not matching the source copy at the time of the second copy.

If my assumptions in the above are wrong please correct me. I will learn something. But the operating model would need to be the same across all systems covered by POSIX before I would consider it actually safe to use.

> > If I want progress indication... If I want placement of backup files in a particular directory... If I want other fancy features that are provided by rsync then it is worth it to use rsync.
> > ...trimmed simple benchmark...
> > $ time cp -a coreutils junk/
>
> By default cp -a transfers acls and ext-attrs and preserves hard links. Rsync doesn't do any of that by default. You need to use "-aHAX" to compare them ...

Good catch. :-)

> you have to call them out as 'extra' with rsync, so the above test may not be what it seems. Though if you don't use ACL's (which I do), then maybe the above is almost reasonable. Still.. should use -aHAX

I didn't have any hard links, ACLs, or extended attributes in the test case, so it shouldn't matter for the above.

> Is your rsync newer? i.e. does it have the posix-pre-alloc hints?... Mine has a pre-alloc patch, but I think that was suse-added and not the one in the mainline code. Not sure.
>
> rsync --version
> rsync version 3.1.0 protocol version 31
> 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I happened to run that test on Debian Sid and it is 3.1.1. However, Debian Stable, which I have most widely deployed, has 3.0.9. So you are both ahead of and behind me at the same time. :-)

> Throw a few TB copies at rsync -- where all the data won't fit in memory.... it also, I'm told, has problems with hardlinks, acls and xattrs slowing it down, so it may be a matter of usage...

I have had problems running rsync with -H for large data sets. Bad enough that I recommend against it. Don't do it! I don't know anything about -A and -X. But rsync -a is fine for very large data sets.

> BUT all that said... note that I DO USE it... for the job I'm doing in my snapper script, nothing else will.

Yes. It is too useful to be without!

> (don't ya just love performance talk?)

Except that we should have moved all of this to the discussion list. I feel guilty for having continued it. We have drifted well away from the original bug report. The one with the terrible title. If this continues, let's take it over to the coreutils discussion list for further conversation about it.

Bob
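Bob's description of rsync's copy-beside-then-rename design can be sketched in plain shell. This is a minimal illustration under stated assumptions, not rsync's actual code: the helper `atomic_copy` is hypothetical, it relies on the widely available (but not POSIX) `mktemp`, and it uses `cp -p` so that the mode and timestamps are applied after the data is written. Because the temporary file is created in the destination's own directory, the final `mv` is a rename within one file system, so a reader sees either the old destination or the complete new one.

```shell
#!/bin/sh
# atomic_copy SRC DST
# Hypothetical helper sketching the "copy beside the target, then rename
# into place" pattern. It is not rsync's implementation.
atomic_copy() {
    src=$1 dst=$2
    # Create the temporary file in DST's own directory so that the final
    # mv is a rename(2) within one file system, which is atomic.
    tmp=$(mktemp "$(dirname "$dst")/.$(basename "$dst").XXXXXX") || return 1
    # If interrupted, remove the temporary file; DST itself is never touched.
    trap 'rm -f "$tmp"; exit 1' INT TERM
    # cp -p writes the data first, then applies the source's mode and
    # timestamps to the temporary file.
    if cp -p "$src" "$tmp"; then
        # Readers see either the old DST or the complete new one.
        mv -f "$tmp" "$dst"
        status=$?
    else
        rm -f "$tmp"
        status=1
    fi
    trap - INT TERM
    return "$status"
}
```

Unlike the in-place copy Bob describes for cp, an interrupted run of this sketch leaves the destination untouched, so a later run has nothing half-written to mistake for a finished file. An uncatchable SIGKILL can still leave a stray `.NAME.XXXXXX` temporary file behind.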
bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Mon, 20 Oct 2014 00:05:02 GMT)

Message #56 received at 18681 <at> debbugs.gnu.org:
From: Bob Proulx <bob <at> proulx.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, Assaf Gordon <assafgordon <at> gmail.com>, "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Sun, 19 Oct 2014 18:04:17 -0600
close 18681
thanks

Eric Blake wrote:
> Polehn, Mike A wrote:
> > This still left the incorrect operation of the interactive operation when both -i and -f are used.
>
> The behavior of -i vs. -f interaction is required by POSIX; in particular, POSIX is explicit that -i and -f are NOT a toggle switch of one another, but each turns on slightly different, somewhat overlapping, changes in behavior (so specifying both is different from specifying one in isolation). We can't change what either one of those flags means.

This bug log included some serious topic drift, to which I contributed myself. In order to atone for that, I am going to triage this as saying that the behavior is intended and standardized and therefore won't be changed. Now that we understand this, the bug ticket can be closed. Further discussion can still continue, and it will all be logged and read by subscribers.

> If there is another mode of operation that is also useful, then it needs yet another flag. At one point in the past, we had --reply={yes,no,query} to try and offer a third mode, but it had confusing semantics and we ended up pulling it because of the confusion it could cause. At the time we pulled it, we admitted that 'rsync' has some modes of operations that might be better suited for the particular modes that people seemed to be requesting when they thought that --reply would do the trick (and usually, what they thought --reply would do and what it actually did were different, which is why we removed it to avoid confusion). We have also added a --no-clobber option, which is somewhat of a compromise (what some people thought --reply=no would do, --no-clobber actually does better).

Good summary!

> So adding a new option is not out of the question, but you'd have to have well-defined semantics of what it should do, and how it differs from either normal mode, '-i' mode, '-f' mode, '-i -f' mode, or '--no-clobber' mode.
If the readers of this ticket think there is an enhancement request to be filed for cp, then please file a wishlist bug with the proposal. A reference to this log can be made if desired. Let me suggest that the proposal first be made on the coreutils discussion list, where it can be discussed and shaped. After that has been done, file a wishlist bug of the result in order to track its progress through the code and release.

Bob
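Eric's point that `-i` and `-f` are not toggles can be checked directly. A minimal sketch, assuming GNU coreutils cp (BSD cp documents that `-f` overrides an earlier `-i`, so the first result differs there), with a shell function `cp_interactive` as a hypothetical stand-in for an interactive default such as `alias cp='cp -i'`:

```shell
#!/bin/sh
# Hypothetical stand-in for an interactive default like  alias cp='cp -i'.
cp_interactive() {
    command cp -i "$@"
}

demo_force_vs_interactive() {
    dir=$(mktemp -d)
    echo new > "$dir/src"
    echo old > "$dir/dst"
    # With GNU cp, -f does not cancel -i, so cp still prompts. stdin is at
    # EOF here, which counts as "no", so dst is left as it was.
    cp_interactive -f "$dir/src" "$dir/dst" </dev/null 2>/dev/null || true
    echo "cp -i -f: $(cat "$dir/dst")"
    # Bypassing the wrapper runs cp without -i, so there is no prompt.
    command cp -f "$dir/src" "$dir/dst"
    echo "command cp -f: $(cat "$dir/dst")"
    rm -rf "$dir"
}
```

In an interactive shell where `cp` really is an alias, `\cp -rf A B` or `command cp -rf A B` likewise skips the alias. The `--no-clobber` option Eric mentions covers the opposite case of never overwriting existing files.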
Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Mon, 20 Oct 2014 00:05:03 GMT)

bug-coreutils <at> gnu.org: bug#18681; Package coreutils. (Mon, 20 Oct 2014 06:21:01 GMT)

Message #61 received at submit <at> debbugs.gnu.org:
From: Linda Walsh <coreutils <at> tlinx.org>
To: Bob Proulx <bob <at> proulx.com>
Cc: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, bug-coreutils <at> gnu.org
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 23:20:00 -0700
Bob Proulx wrote:
> Linda Walsh wrote:
>> Bob Proulx wrote:
>>> Also consider that if cp were to acquire all of the enhancements that have been requested for cp as time has gone by then cp would be just as featureful (bloated!) as rsync and likely just as slow as rsync too.
>> Nope...rsync is slow because it does everything over a client server model --- even when it is local. So everything is written through a pipe .. that's why it can't come close to cp -- and why cp would never be so slow -- I can't imagine it using a pipe to copy a file anywhere!
>
> The client-server structure of rsync is required for copying between systems. Saying that cp doesn't have it isn't fair if cp were to add every requested feature.
---
cp was designed for local->local copy. rsync was designed for local->remote synchronization (thus 'r(emote) sync'). Comparing code quality between a Java->'native code' compiler and a compiler developed for a native platform is entirely fair -- both started out with different design goals, so each ends up with pluses and minuses that are an effect of that goal. If you claim comparing such effects isn't fair, then it's not fair to compare any algorithm with another, because algorithms inherently have their pluses and minuses and are often chosen for use in a particular situation because of those pluses and minuses.

So let's compare using 'cp' with rsync in copying a remote file. The choice of tools depends on the quality of the remote connection, but for most remote connections "today", reliability isn't usually an issue, as they flow over TCP, and file transfer protocols like NFS or CIFS also have checks to allow users to reconnect after an interruption (like a machine reboot). Depending on timeout settings, 'cp' already has a restart-over-remote ability when used with NFS or CIFS. CIFS doesn't tolerate a system reboot in the middle of a copy, whereas NFS can recover from such if the client uses hard mounts. But for a local network, I regularly use 'cp' with CIFS and it does a faster job than rsync -- over a reliable local network.

> I am sure that if I search the archives I would find a request to add client-server structure to cp to support copying from system to system. :-)
----
We are comparing where the tools are at, _not_ where they _could_ have been had previous algorithm choices been ignored. We are talking about a local->local copy (in the base note), so glossing over the slowness of rsync in doing such is entirely fair.

If you want some level of recovery after interrupt, NFS is a better choice for a local network -- client connections can continue even after a server reboot. But if we are talking local->local reliability, the simple, close solution would be SMB/CIFS.

Using a 1GB file as an example (and throwing in a 'dd' for comparison):

> time rsync 1G ishtar:/home/law/1G
20.13sec 1.29usr 2.68sys (19.73% cpu)
> time cp 1G /h/.
6.94sec 0.01usr 1.10sys (16.16% cpu)
> time dd if=1G of=/h/1G bs=256M oflag=direct
4+0 records in
4+0 records out
1073741824 bytes (1.1 GB) copied, 3.4694 s, 309 MB/s
3.50sec 0.00usr 0.51sys (14.64% cpu)

Here again, we see rsync doing the same job as cp and taking about 3x the time. For a single file over a local net, 'dd' is a better bet.

>
> Now I will proactively agree that it would be nice if rsync detected that it was all running locally and didn't fork and instead ran everything in one process like cp does. But I could see that coming to rsync at some time in the future. It is an often requested feature.
---
For many years.

>>> This is something to consider every time someone asks for a creeping feature to cp. Especially if they say they want the feature in cp because it is faster than rsync. The natural progression is that cp would become rsync.
>> Not even! Note. cp already has a comparison function built in that it uses during "cp -u"...
>
> I am not convinced of the robustness of 'cp -u ...' interrupt, repeat, interrupt, repeat. It wasn't intended for that mode.
---
Neither is rsync in its default mode. It compares timestamps and size, nothing more. I'd be suspicious of either rsync's OR cp's chances in such a situation. But USUALLY, people don't interrupt a copy many times -- or even once, so cp is usually faster...

> Is there any code path that could leave a new file in the target area that would avoid the copy? Not sure. Newer meets the -u test but isn't an exact copy if the timestamp were older in the original. But with rsync I know it will correct for this during a subsequent run.
---
Not necessarily. It doesn't do checksumming by default. Certainly, if you used rsync with '-u', rsync would not be much better in recovery, since target files with more recent timestamps may be left in the target dir. I don't think rsync or cp trap a control-C abort to clean up target files.

>
>> built in that it uses during "cp -u"... but it doesn't go through pipes. It used to use larger buffer sizes or maybe tell posix to pre-alloc the destination space, dunno, but it used to be faster.. I can't say for certain, but it seems to be using
>
> Often the data sizes we work with grow larger over time, making the same task feel slower because we are actually dealing with more data now.
---
I was comparing copy times with the same files, not from years ago to now.

>> Another reason rsync is so slow -- uses a relatively small i/o size 1-4k last I looked. I've asked them to increase it, but going through a pipe it won't help a lot.
>
> Nod. Rsync was designed for the network use case. It could benefit from some tuning for the local case. A topic for the rsync list.
---
Been there, done that. Still comparing current-to-current, not hypotheticals.
>
>> Also in rsync, they've added the posix calls to reserve space in the target location for a file being copied in. Specifically, this is to lower disk fragmentation (does cp do anything like that, been a while since I looked).
>
> I don't know. It would be worth a look.
>
>>> The advantage of rsync is that it can be interrupted and restarted and the restarted process will efficiently avoid doing work that is already done. An interrupted and restarted cp will perform the same work again from start to finish.
>> I wouldn't trust that it would. If you interrupt it at exactly the wrong time, I'd be afraid some file might get set with the right data but the wrong Meta info (acls, primarily).
>
> The design of rsync is to copy the file to a temporary name beside the intended target. After the copy, the timestamps are set. After the timestamps are set, the file is renamed into place. An interrupt that happens before the rename will cause the temporary file to be removed. An interrupt that happens after the rename is, well, after that, and the copy is already done. Since rename on the local file system is atomic, this is guaranteed to function robustly. (As long as you aren't using a buggy file system that changes the order of operations. That isn't cool. But of course it was famously seen in ext4 for a while. Fortunately sanity has prevailed and ext4 doesn't do that for this operation anymore. Okay to use now.)
>
>>> If I am doing a simple copy from A to B then I use 'cp -av A B'. If I am doing it the second time then I will use rsync to avoid repeating previously done work 'rsync -av A B'.
>> Wouldn't cp -auv A B do the same?
>
> Do I have to go look at the source code to verify that it doesn't? :-(
---
My timing says cp is 20x faster for that 1G file case. It also shows that rsync doesn't use a tmp file in the update case:

> time cp -au 1G /h
0.03sec 0.00usr 0.03sys (79.47% cpu)
> cp -au 1G /h
> time rsync -au 1G ishtar:/home/law/1G
0.60sec 0.06usr 0.09sys (25.12% cpu)

>
> I assume it doesn't without looking. I assume cp copies in place. I assume that cp does not make a temporary file off to the side and rename it into place once it is done and has set the timestamps.
---
I assume rsync doesn't either -- if it is comparing against a file already in place, for it to transfer the whole file... nope.

> I assume that cp copies to the named destination directly and updates the timestamps afterward. That creates a window of time when the file is in place but has not had the timestamp placed on it yet.
>
> Which means that if the cp is interrupted on a large file, it will have started the copy but will not have finished it at the moment that it is interrupted. The new file will be in place with a new timestamp. The second run with cp -u will avoid overwriting the file because the timestamp is newer. However, the contents of the file will be incomplete, or at least not matching the source copy at the time of the second copy.
>
> If my assumptions in the above are wrong please correct me. I will learn something. But the operating model would need to be the same across all systems covered by POSIX before I would consider it actually safe to use.
---
Same happens in rsync -- no tmp file is involved. It compares timestamps and doesn't copy.

>
>>> If I want progress indication... If I want placement of backup files in a particular directory... If I want other fancy features that are provided by rsync then it is worth it to use rsync.
>>> ...trimmed simple benchmark...
>>> $ time cp -a coreutils junk/
>> By default cp -a transfers acls and ext-attrs and preserves hard links. Rsync doesn't do any of that by default. You need to use "-aHAX" to compare them ...
>
> Good catch. :-)
>
>> you have to call them out as 'extra' with rsync, so the above test may not be what it seems. Though if you don't use ACL's (which I do), then maybe the above is almost reasonable. Still.. should use -aHAX
>
> I didn't have any hard links, ACLs, or extended attributes in the test case, so it shouldn't matter for the above.
>
>> Is your rsync newer? i.e. does it have the posix-pre-alloc hints?... Mine has a pre-alloc patch, but I think that was suse-added and not the one in the mainline code. Not sure.
>>
>> rsync --version
>> rsync version 3.1.0 protocol version 31
>> 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes, prealloc, SLP
>
> I happened to run that test on Debian Sid and it is 3.1.1. However, Debian Stable, which I have most widely deployed, has 3.0.9. So you are both ahead of and behind me at the same time. :-)
>
>> Throw a few TB copies at rsync -- where all the data won't fit in memory.... it also, I'm told, has problems with hardlinks, acls and xattrs slowing it down, so it may be a matter of usage...
>
> I have had problems running rsync with -H for large data sets. Bad enough that I recommend against it. Don't do it! I don't know anything about -A and -X. But rsync -a is fine for very large data sets.
----
But then you can't compare it to 'cp', which does handle that case.

>> (don't ya just love performance talk?)
>
> Except that we should have moved all of this to the discussion list.
---
:-( ?discussion list? -- bug-coreutils? (don't know about others)... 'sides, I didn't bring up rsync; all I added was "If rsync wasn't so slow at local I/O...*sigh*...." It's good for when you need "diffs", but not as a general replacement for 'cp'.
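The `cp -u` hazard Bob raised (and which Linda argues applies equally to `rsync -u`) can be reproduced without actually interrupting anything. In this sketch, which assumes GNU cp's `-u` semantics and uses a hypothetical helper `simulate_interrupted_update`, `dd` writes only the first 512 bytes to stand in for a copy killed partway through. The truncated destination is newer than the source, so the rerun with `cp -u` skips it.

```shell
#!/bin/sh
# simulate_interrupted_update SRC DST
# Hypothetical helper. Writes a truncated DST as if a copy had been killed
# partway through, reruns "cp -u", and reports whether DST now matches SRC.
simulate_interrupted_update() {
    src=$1 dst=$2
    rm -f "$dst"
    # Stand-in for an interrupted copy: only the first 512 bytes arrive,
    # and DST gets the current time as its modification time.
    dd if="$src" of="$dst" bs=512 count=1 2>/dev/null
    # -u copies only when SRC is newer than DST (or DST is missing),
    # so a truncated but newer DST is skipped.
    cp -u "$src" "$dst"
    if cmp -s "$src" "$dst"; then
        echo "destination complete"
    else
        echo "destination still truncated"
    fi
}
```

With a source file older than the current time, this reports `destination still truncated`. Rerunning without `-u`, or verifying with `cmp`, catches the problem. `rsync -u` would skip the file as well, since `--update` skips files that are newer on the receiver; without `-u`, rsync's default quick check compares size and modification time, so the size mismatch here would trigger a new transfer.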
Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 17 Nov 2014 12:24:04 GMT)