GNU bug report logs - #18681
The Linux cp command has bugs

Reported by: "Polehn, Mike A" <mike.a.polehn <at> intel.com>

Date: Fri, 10 Oct 2014 17:30:02 UTC

Severity: normal

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18681 in the body.
You can then email your comments to 18681 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 17:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Polehn, Mike A" <mike.a.polehn <at> intel.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 10 Oct 2014 17:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org>
Subject: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 17:25:23 +0000

cp --version:	8.21

running on Fedora 20, version 3.16.3-200.fc20.x86_64 with latest updates 

The Linux copy command (cp) has problems 

Problem: I need to copy a tree of 1000s of files into another directory that is a git directory containing a whole bunch of additional build files, so a diff between the directories will not do any good.

If the files are copied over the git directory I can do what I need to do, since I need to see if there are any differences in any of the files.

Using: cp -f -r <dir a> <dir b>

For each file being copied it asked:

cp: overwrite XXXXXXXXXXXXXXXXX?

So the force option does not work, since it should skip asking about the overwrite. If the force option is supposed to act differently, then there should be an additional argument, because answering yes 1000s of times is not very smart...

Also, since there are a lot of files, if I accidentally hit return before y, cp moves on to the next file, which implies to me that the file was not copied; that becomes a problem when 1000s of files are copied. I also assumed that 'y' implies the data was copied.

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:03:01 GMT) Full text and rfc822 format available.

Message #8 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
To: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>
Subject: cp Specific fail example
Date: Fri, 10 Oct 2014 18:01:59 +0000

######### get and check out version

[root <at> F20-v3 ~]# cd /usr/src
[root <at> F20-v3 src]# git clone git://dpdk.org/dpdk
[root <at> F20-v3 src]# cd dpdk

[root <at> F20-v3 dpdk]# git tag
v1.2.3r0
v1.2.3r1
v1.2.3r2
v1.2.3r3
v1.2.3r4
v1.3.0r0
v1.3.1r0
v1.3.1r1
v1.3.1r2
v1.3.1r3
v1.4.0r0
v1.4.1r0
v1.4.1r1
v1.4.1r2
v1.5.0r0
v1.5.0r1
v1.5.0r2
v1.5.1r0
v1.5.1r1
v1.5.1r2
v1.5.2r0
v1.5.2r1
v1.5.2r2
v1.6.0r0
v1.6.0r1
v1.6.0r2
v1.7.0
v1.7.0-rc1
v1.7.0-rc2
v1.7.0-rc3
v1.7.0-rc4
v1.7.1
v1.8.0-rc1


[root <at> F20-test dpdk]# git checkout -b map_v1.7.1 v1.7.1
Switched to a new branch 'map_v1.7.1'

### download dpdk 1.7.1 files from http://dpdk.org/download
### put in /usr/src directory and untar:


[root <at> F20-v3 src]# tar -xf dpdk-1.7.1.tar.gz

[root <at> F20-v3 src]# dir
dpdk        dpdk-1.7.1	  dpdk-1.7.1.tar.gz 


[root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/
cp: overwrite ‘dpdk/app/test/test_lpm6.c’? y
cp: overwrite ‘dpdk/app/test/test_rwlock.c’? y
cp: overwrite ‘dpdk/app/test/test_table_ports.h’? y
cp: overwrite ‘dpdk/app/test/test_logs.c’? y
cp: overwrite ‘dpdk/app/test/test_pmd_ring.c’? y
cp: overwrite ‘dpdk/app/test/test_table_tables.h’?
cp: overwrite ‘dpdk/app/test/test_lpm.c’?
cp: overwrite ‘dpdk/app/test/test_malloc.c’?
cp: overwrite ‘dpdk/app/test/test_errno.c’? y
cp: overwrite ‘dpdk/app/test/test_hash.c’?
cp: overwrite ‘dpdk/app/test/test_table_acl.h’? y

note: asking question on each file and moving to next file even when not entering n or y

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:14:02 GMT) Full text and rfc822 format available.

Message #11 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 11:13:44 -0700

Polehn, Mike A wrote:
> Using: cp -f -r <dir a> <dir b>
>
> For each file being copied it asked:
>
> cp: overwrite XXXXXXXXXXXXXXXXX?

That's not what I observe here (see below).  Perhaps there's something else 
going on, maybe an alias.  For example, I couldn't get the cp to work without 
also using -T.  Can you please give an exact recipe for reproducing the problem 
on your platform?

$ mkdir a b
$ echo a >a/f
$ echo b >b/f
$ cp -f -r -T a b
$ cat b/f
a

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 18:18:01 GMT) Full text and rfc822 format available.

Message #14 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 
 "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>
Subject: Re: bug#18681: cp Specific fail example
Date: Fri, 10 Oct 2014 11:17:34 -0700

I do not observe the symptoms that you report.  See below.  My guess is that 
you've aliased 'cp' to 'cp -i', which is probably a mistake.

$ git clone git://dpdk.org/dpdk
Cloning into 'dpdk'...
remote: Counting objects: 16249, done.
remote: Compressing objects: 100% (3976/3976), done.
remote: Total 16249 (delta 12964), reused 15109 (delta 12122)
Receiving objects: 100% (16249/16249), 12.79 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (12964/12964), done.
Checking connectivity... done.
$ cd dpdk
$ git checkout -b map_v1.7.1 v1.7.1
Switched to a new branch 'map_v1.7.1'
$ pwd
/tmp/d/dpdk
$ cd ..
$ wget http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
--2014-10-10 11:15:44--  http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
Resolving dpdk.org (dpdk.org)... 92.243.14.124
Connecting to dpdk.org (dpdk.org)|92.243.14.124|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘dpdk-1.7.1.tar.gz’

    [                   <=>                 ] 8,281,609   1.17MB/s   in 7.5s

2014-10-10 11:15:52 (1.06 MB/s) - ‘dpdk-1.7.1.tar.gz’ saved [8281609]

$ tar -xf dpdk-1.7.1.tar.gz
$ cp -f -r dpdk-1.7.1/* dpdk/
$

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:14:02 GMT) Full text and rfc822 format available.

Message #17 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 15:13:44 -0400

Hello Mike,

On 10/10/2014 01:25 PM, Polehn, Mike A wrote:
> Problem need to copy a tree of 1000s of files to another directory
> that is a git directory that has a whole bunch of additional build
> files, so diff between the directories will not do any good.
>

This is slightly off-topic, but if you want to compare only files managed by git (ignoring other files in current directory), perhaps the following would help:

    # Download and extract the tarball
    wget -q http://dpdk.org/browse/dpdk/snapshot/dpdk-1.7.1.tar.gz
    tar -xf dpdk-1.7.1.tar.gz

    # Clone the git repo with specific branch, checkout the relevant branch
    # (or go to an existing checked-out repository directory)
    git clone git://dpdk.org/dpdk
    cd dpdk
    git checkout -b map_v1.7.1 v1.7.1

    # For each file managed by git (with 'git ls'),
    # compare it to the corresponding file in the other directory:
    git ls -0 | xargs -0 -I% diff -q % ../dpdk-1.7.1/%


Regards,
 -gordon

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:16:01 GMT) Full text and rfc822 format available.

Message #20 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, 18681 <at> debbugs.gnu.org
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 15:15:18 -0400

Sorry, had a typo:

On 10/10/2014 03:13 PM, Assaf Gordon wrote:
>      # For each file managed by git (with 'git ls'),
>      # compare it to the corresponding file in the other directory:
>      git ls -0 | xargs -0 -I% diff -q % ../dpdk-1.7.1/%
>

Should be:
 git ls -z | xargs -0 -I% diff -q % ../dpdk-1.7.1/%
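
Stock git has no `ls` subcommand, so the command above presumably relies on a personal alias. A minimal sketch of the same idea with the standard `git ls-files -z`, run against throwaway directories (the names `repo` and `snapshot` and the sample files are made up for illustration and stand in for the dpdk checkout and the unpacked tarball):

```shell
# Build a tiny stand-in for the git checkout and the unpacked tarball.
work=$(mktemp -d)
mkdir "$work/repo" "$work/snapshot"
printf 'same\n'    > "$work/repo/a.c"
printf 'same\n'    > "$work/snapshot/a.c"
printf 'git\n'     > "$work/repo/b.c"
printf 'tarball\n' > "$work/snapshot/b.c"
printf 'junk\n'    > "$work/repo/build.o"   # untracked build product

cd "$work/repo"
git init -q .
git add a.c b.c                              # build.o stays untracked

# Compare every tracked file with its counterpart in the snapshot.
# -z / -0 keep file names containing spaces or newlines intact.
# xargs exits non-zero when diff reports a difference, hence "|| true".
report=$(git ls-files -z | xargs -0 -I% diff -q % ../snapshot/% || true)
printf '%s\n' "$report"
```

Only `b.c` should be reported; `build.o` is never compared because git does not track it.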

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 19:47:02 GMT) Full text and rfc822 format available.

Message #23 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, "18681 <at> debbugs.gnu.org"
 <18681 <at> debbugs.gnu.org>
Subject: RE: bug#18681: cp Specific fail example
Date: Fri, 10 Oct 2014 19:46:25 +0000

Hi Paul!

Thank you for your quick response!

You were logged in as a normal user.
I was logged in as root.

I tried as normal user and it worked the same as you.

However, logged in as root and the error occurred as before.

Did a search for 'cp -i' and found it for root:

[root <at> F20-v3 ~]# find /root -type f -print |xargs grep -Hn "cp i"
/root/.bashrc:6:alias cp='cp -i'
/root/.cshrc:6:alias cp 'cp -i'
/root/.tcshrc:6:alias cp 'cp -i'
[root <at> F20-v3 ~]# find /etc -type f -print |xargs grep -Hn "cp i"
[root <at> F20-v3 ~]# find /home/mike -type f -print |xargs grep -Hn "cp i"
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2456:               "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2474:               "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk-1.7.1/examples/vhost/main.c:2478:               "mbuf_destroy_zcp is : %d\n",
/home/mike/dpdk/examples/vhost/main.c:2456:             "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk/examples/vhost/main.c:2474:             "mbuf_destroy_zcp is: %d\n",
/home/mike/dpdk/examples/vhost/main.c:2478:             "mbuf_destroy_zcp is : %d\n",

But there is still an error for interactive:

[root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/
cp: overwrite ‘dpdk/app/test/test_lpm6.c’? y
cp: overwrite ‘dpdk/app/test/test_rwlock.c’? y
cp: overwrite ‘dpdk/app/test/test_table_ports.h’? y
cp: overwrite ‘dpdk/app/test/test_logs.c’? y
cp: overwrite ‘dpdk/app/test/test_pmd_ring.c’? y
cp: overwrite ‘dpdk/app/test/test_table_tables.h’?
cp: overwrite ‘dpdk/app/test/test_lpm.c’?
cp: overwrite ‘dpdk/app/test/test_malloc.c’?
cp: overwrite ‘dpdk/app/test/test_errno.c’? y
cp: overwrite ‘dpdk/app/test/test_hash.c’?
cp: overwrite ‘dpdk/app/test/test_table_acl.h’? y

Didn't answer yes or no for some of these and they moved on anyway, indicating the interactive mode is not operating as expected.

It is a good idea as root not to be overwriting files, so I can understand the "cp -i" usage for root.

However, some of the reason for using root is to do something that you may not be able to do as a normal user. So being able to override the -i with a -f would be highly desirable.

Mike

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 20:02:01 GMT) Full text and rfc822 format available.

Message #26 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
To: Assaf Gordon <assafgordon <at> gmail.com>, "18681 <at> debbugs.gnu.org"
 <18681 <at> debbugs.gnu.org>
Subject: RE: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 20:00:52 +0000

Hi Assaf!

Thank you for your quick response!

There are always multiple ways to do things. The git tool has a diff tool built in that makes file comparison easy.

I have run across multiple cases where copying one tree over another is desirable.

In another bug message thread, we found that the actual cause was the cp alias to 'cp -i' for the root user.

This still left the incorrect operation of the interactive operation when both -i and -f is used.

I think that in some cases being able to override the '-i' with '-f' may be very desirable. So having the '-f' cancel or override the '-i' operation might be a good change.

Thanks!
Mike

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 20:56:02 GMT) Full text and rfc822 format available.

Message #29 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>,
 Assaf Gordon <assafgordon <at> gmail.com>,
 "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 14:54:59 -0600

On 10/10/2014 02:00 PM, Polehn, Mike A wrote:

> This still left the incorrect operation of the interactive operation when both -i and -f is used.

The behavior of -i vs. -f interaction is required by POSIX; in
particular, POSIX is explicit that -i and -f are NOT a toggle switch of
one another, but each turns on slightly different, somewhat overlapping,
changes in behavior (so specifying both is different from specifying one
in isolation). We can't change what either one of those flags means.
If there is another mode of operation that is also useful, then it needs
yet another flag. At one point in the past, we had
--reply={yes,no,query} to try and offer a third mode, but it had
confusing semantics and we ended up pulling it because of the confusion
it could cause. At the time we pulled it, we admitted that 'rsync' has
some modes of operations that might be better suited for the particular
modes that people seemed to be requesting when they thought that
--reply would do the trick (and usually, what they thought --reply would
do and what it actually did were different, which is why we removed it
to avoid confusion). We have also added a --no-clobber option, which is
somewhat of a compromise (what some people thought --reply=no would do,
--no-clobber actually does better).

So adding a new option is not out of the question, but you'd have to
have well-defined semantics of what it should do, and how it differs
from either normal mode, '-i' mode, '-f' mode, '-i -f' mode, or
'--no-clobber' mode.
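
The differences between those modes can be sketched with a pair of scratch files. This assumes GNU coreutils cp; the file names and the helper `fresh` are made up for illustration, and the prompt is answered from a pipe rather than the keyboard:

```shell
# Sketch assuming GNU coreutils cp; src/dst are scratch files.
work=$(mktemp -d)
cd "$work"
fresh() { printf 'new\n' > src; printf 'old\n' > dst; }

# -f alone: no prompt, dst is overwritten.
fresh
cp -f src dst
f_only=$(cat dst)

# -i -f (what an alias cp='cp -i' turns "cp -f" into): still prompts.
# An empty line, i.e. just pressing Return, counts as "no", so the
# file is skipped and cp moves on.
fresh
printf '\n' | cp -i -f src dst 2>/dev/null || true
if_return=$(cat dst)

# Answering "y" lets the copy go ahead.
fresh
printf 'y\n' | cp -i -f src dst 2>/dev/null || true
if_yes=$(cat dst)

# -n (--no-clobber) never prompts and never overwrites.
fresh
cp -n src dst 2>/dev/null || true
n_only=$(cat dst)

printf '%s\n' "-f: $f_only" "-i -f, Return: $if_return" \
              "-i -f, y: $if_yes" "-n: $n_only"
```

The `|| true` guards are there because some cp versions report a declined or skipped copy through the exit status.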

--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 23:37:01 GMT) Full text and rfc822 format available.

Message #32 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Cc: 18681 <at> debbugs.gnu.org
Subject: Re: bug#18681: cp Specific fail example
Date: Fri, 10 Oct 2014 17:36:16 -0600

Polehn, Mike A wrote:
> Did a search for 'cp -i' and found it for root:
> 
> [root <at> F20-v3 ~]# find /root -type f -print |xargs grep -Hn "cp i"
> /root/.bashrc:6:alias cp='cp -i'
> /root/.cshrc:6:alias cp 'cp -i'
> /root/.tcshrc:6:alias cp 'cp -i'

It might be easier to guess that there is an alias and look for it. :-)

  # alias cp
  alias cp='cp -i'

  # type cp
  cp is aliased to `cp -i'

> But there is still an error for interactive:
> 
> [root <at> F20-v3 src]# cp -f -r dpdk-1.7.1/* dpdk/

Since you know that "cp" in the above is "cp -i" then you know the
command is actually "cp -i -f -r dpdk-1.7.1/* dpdk/" which you don't
want there.  Try it without the alias in play.

The normal way in a /bin/sh derived environment is to simply quote the
command.  If you quote the command then it won't do alias expansion.
The usual method of quoting is with a backslash.

  # \cp -f -r dpdk-1.7.1/* dpdk/

However the canonical method is to use "env" since the above doesn't
work in csh derived shells.  Therefore you will find suggestions to
use env to wrap the command and avoid alias expansion like this.  It
is often offered when we don't know if you are using a sh or csh
derived command line shell.  (This env trick is one I learned on this
list some years ago.)

  # env cp -f -r dpdk-1.7.1/* dpdk/

And of course you can always unalias the command too.

  # unalias cp

> It is a good idea as root not to be overwriting files, so I can
> understand the "cp -i" usage for root.

Personally I simply realize that the tools are sharp kitchen knives
and I always handle sharp kitchen knives carefully.  Trying to put
safety shields on them simply gets in the way.  It tends to cause
problems such as you are seeing here.  I usually remove those aliases
on systems I administer.

> However, some of the reason for using root is to do something that
> you may not be able to do as a normal user. So being able to
> override the -i with a -f would be highly desirable.

Right.  And you can.  You have the power.  Just do it: avoid the
alias with \cp (or the env trick) and then you won't have the -i
in play.  Or remove the alias from the environment.
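
As a rough illustration of the alias-avoiding forms mentioned above, here is a sketch for bash or another POSIX-style shell. It assumes GNU cp; the scratch files and the helper `fresh` are made up, and the `shopt` line is only needed because bash does not expand aliases in scripts by default:

```shell
# Recreate the situation from the report: cp is aliased to "cp -i".
shopt -s expand_aliases 2>/dev/null || true   # bash scripts only
alias cp='cp -i'

work=$(mktemp -d)
cd "$work"
fresh() { printf 'new\n' > src; printf 'old\n' > dst; }

fresh
\cp -f src dst          # a quoted command word is not alias-expanded
via_backslash=$(cat dst)

fresh
command cp -f src dst   # 'command' skips aliases and shell functions
via_command=$(cat dst)

fresh
env cp -f src dst       # env finds cp via PATH; aliases are shell-only
via_env=$(cat dst)

unalias cp              # or drop the alias for the rest of the session
fresh
cp -f src dst
via_unalias=$(cat dst)

printf '%s\n' "$via_backslash" "$via_command" "$via_env" "$via_unalias"
```

In each case dst ends up with the new contents and no prompt appears, because -i never reaches cp.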

There is the burden upon the root superuser that they have great
power.  With great power comes great responsibility.  Being root means
you are a pilot not a passenger.  There is an old saying in flying,
"Fly the airplane.  Don't let the airplane fly you."  Hopefully the
meaning is obvious even to the non-pilot.

Meanwhile...  I would be one of those suggesting that perhaps you
should try using rsync instead of cp.  The cp command is lean and mean
by comparison to rsync (and should stay that way).  But rsync has many
attractive features for doing large copies.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Fri, 10 Oct 2014 23:50:02 GMT) Full text and rfc822 format available.

Message #35 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Cc: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>,
 Assaf Gordon <assafgordon <at> gmail.com>
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Sat, 11 Oct 2014 08:48:54 +0900

Hi Polehn,

The -f option does not mean `suppress interactive' in cp.  If a
destination file cannot be opened for writing, -f attempts to unlink it
and then tries again.  That is different from the -f option in mv.

As the behavior is clearly defined in POSIX as Eric says, we won't be
able to change it.

BTW, I don't like the alias `cp -i'.  So I remove it from .bashrc always
immediately after an installation of a distribution. (^_^)

If you temporarily want to cancel the alias, you can define another
alias such as `cpf', and/or use one of the following instead of `cp':

  - command cp -f
  - /bin/cp -f
  - ( unalias cp; cp -f ... )

Even if a new option `-F' to suppress interactive mode were added to cp,
we would need to use -F for cp and -f for mv to do it.
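
What -f actually does in cp can be sketched with a read-only destination. This assumes GNU cp and a writable scratch directory; the file names are made up for illustration:

```shell
# Sketch assuming GNU cp; src/dst are scratch files.
work=$(mktemp -d)
cd "$work"
printf 'new\n' > src
printf 'old\n' > dst
chmod 444 dst            # destination cannot be opened for writing

# Without -f the copy fails for an ordinary user.
# (root can open the file anyway, so the copy succeeds there.)
if cp src dst 2>/dev/null; then
    plain="overwrote dst (running as root?)"
else
    plain="failed, dst is read-only"
fi

# With -f, cp unlinks the unopenable dst and creates it afresh.
cp -f src dst
forced=$(cat dst)

printf '%s\n' "plain cp: $plain" "cp -f: dst now contains '$forced'"
```

No prompt is involved in either call; -f here only changes what happens after a failed open.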

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Sat, 11 Oct 2014 00:39:02 GMT) Full text and rfc822 format available.

Message #38 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Jon Stanley <jonstanley <at> gmail.com>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>,
 Assaf Gordon <assafgordon <at> gmail.com>, "Polehn,
 Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Fri, 10 Oct 2014 20:38:06 -0400

On Fri, Oct 10, 2014 at 7:48 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> If you temporarily want to cancel the alias, you can define an another
> alias as `cpf', and/or can use below instead of `cp'

Note that (in bash at least) you can prefix the command with a
backslash (\) to override an alias for that invocation, which is what I
typically do:

$ \cp <blah>

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Sat, 11 Oct 2014 17:28:02 GMT) Full text and rfc822 format available.

Message #41 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <bash <at> tlinx.org>
To: Bob Proulx <bob <at> proulx.com>
Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sat, 11 Oct 2014 02:24:31 -0700

Bob Proulx wrote:
> Meanwhile...  I would be one of those suggesting that perhaps you
> should try using rsync instead of cp.  The cp command is lean and mean
> by comparison to rsync (and should stay that way).  But rsync has many
> attractive features for doing large copies.
>   
---- fwiw...---
Like large execution times... from the latest snapshot on my system --
I use rsync to only move differences between  yesterday and "today[whenever
new snap is taken]"... it was a larger than normal snap -- most only
take 75-90 minutes...but rsync (these are the script messages) with some
debugging output still turned on... even an rm over the resulting diff
took 101 seconds... then cp comes along.. even w/a sync it would
still be under a minute.

I.e. rsync copied just the diffs to "/home.diff", then
find with "-empty -delete" is used to get rid of empty dirs (rsync
creates many of these).  Then a static partition is created to hold
the "diff" output -- and cp walked and copied the tree in 12s.
(output wasn't flushed, but it's not that long.. <a minute...).

If rsync wasn't so slow at local I/O...*sigh*....

rsync took 110m, 14s
Empty-directory removal took 1m, 41s
Find used space for /home.diff...sz=4.3GB, min=5.4GB, extsz=4.0MB, n-ext'=1388
target extents num=1388, size=4.0M
Old volume active:  Deactivated. Removed.
Create vol. Home-2014.10.08-03.07.05, size 5.4G
{L=>141008030705, /dev/Data/Home-2014.10.08-03.07.05=>CODE(0xbf24a0), f=>CODE(0xbf24e8), d=>{su=>"64k", sw=>1}, i=>{maxpct=>10, size=>256}, s=>{size=>4096}}
About to copy base-diff dir to static
Copying diffs to dated static snap...Time: 0m, 12s.
mklabel@ /home/.snapdir/@GMT-2014.10.08-03.07.05/./._snapdat_=snap_copy_complete
after copy2staticsnap: complete

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Mon, 13 Oct 2014 00:55:01 GMT) Full text and rfc822 format available.

Message #44 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Linda Walsh <bash <at> tlinx.org>
Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 12 Oct 2014 18:54:03 -0600

Linda Walsh wrote:
> Bob Proulx wrote:
> > Meanwhile...  I would be one of those suggesting that perhaps you
> > should try using rsync instead of cp.  The cp command is lean and
> > mean by comparison to rsync (and should stay that way).  But rsync
> > has many attractive features for doing large copies.
>
> ---- fwiw...---
> Like large execution times... from the latest snapshot on my system --
> I use rsync to only move differences between  yesterday and "today[whenever
> new snap is taken]"... it was a larger than normal snap -- most only
> take 75-90 minutes...but rsync (these are the script messages) with some
> debugging output still turned on... even an rm over the resulting diff
> took 101 seconds... then cp comes along.. even w/a sync it would
> still be under a minute.

Wow.  Just to be clear an rsync copy took 75 to 90 minutes but a cp
copy took less than 1 minute?  I find that very suspicious.  I never
see that much difference between them.  Are you sure the difference
wasn't that the data was cached into ram by the rsync and therefore
the second run with cp just ran with the warmed up cache?  With a
large data set and a large ram that is plausible.

> I.e. rsync copied just the diffs to "/home.diff", then
> find with "-empty -delete" is used to get rid of empty dirs (rsync
> creates many of these).  then a static partition is created to hold
> the "diff" output -- and cp took walked and copied the tree in 12s.
> (output wasn't flushed, but it's not that long.. <a minute...).

It appears that you are using features from rsync that do not exist in
cp.  Therefore the work being done in the task isn't equivalent work.
In that case it is probably quite reasonable for rsync to be slower
than cp.

Also consider that if cp were to acquire all of the enhancements that
have been requested for cp as time has gone by then cp would be just
as featureful (bloated!) as rsync and likely just as slow as rsync
too.  This is something to consider every time someone asks for a
creeping feature to cp.  Especially if they say they want the feature
in cp because it is faster than rsync.  The natural progression is
that cp would become rsync.

> If rsync wasn't so slow at local I/O...*sigh*....

The advantage of rsync is that it can be interrupted and restarted and
the restarted process will efficiently avoid doing work that is
already done.  An interrupted and restarted cp will perform the same
work again from start to finish.

If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
am doing it the second time then I will use rsync to avoid repeating
previously done work 'rsync -av A B'.

If I want progress indication...  If I want placement of backup files
in a particular directory...  If I want other fancy features that are
provided by rsync then it is worth it to use rsync.

  $ du -s coreutils
  238920  coreutils
  $ find coreutils -type f | wc -l
  15013

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time cp -a coreutils junk/
  real    1m2.137s
  user    0m0.140s
  sys     0m1.724s

  $ rm -rf junk/coreutils
  $ time cp -a coreutils junk/
  real    0m2.492s
  user    0m0.060s
  sys     0m1.064s

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time rsync -a coreutils junk/
  real    1m5.473s
  user    0m1.280s
  sys     0m2.112s

  $ rm -rf junk/coreutils
  $ time rsync -a coreutils junk/
  real    0m3.215s
  user    0m1.184s
  sys     0m1.536s

For normal use cp is a little faster than rsync.  Or rather rsync is a
little slower than cp.  But not enough to make a difference for
typical operations.  Having the file system cache warmed up makes a
*HUGE* difference.  Much larger than any other difference.  For copies
that take hours to run I am probably going to value the restart
ability more than raw speed.  YMMV.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Mon, 13 Oct 2014 02:15:02 GMT) Full text and rfc822 format available.

Message #47 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Leslie S Satenstein <lsatenstein <at> yahoo.com>
To: "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>,
 "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 12 Oct 2014 19:11:45 -0700

Further to Bob's explanation,
If you were to copy a 10 GB file across the internet, cp would work just fine and could take several hours.  But suppose there was an error in the transmission (a bad block) or you had to stop and restart: you would need to redo the cp and copy the file from the beginning.  rsync would take a checksum of the parts of the file on the remote and compare them to the host's, then restart at the first bad file offset it detected.

 


Regards 

 Leslie

Mr. Leslie Satenstein
Montreal, Quebec, Canada




>  user    0m0.060s
>  sys     0m1.064s
>
>  $ rm -rf junk/coreutils
>  # echo 3 > /proc/sys/vm/drop_caches
>  $ time rsync -a coreutils junk/
>  real    1m5.473s
>  user    0m1.280s
>  sys     0m2.112s
>
>  $ rm -rf junk/coreutils
>  $ time rsync -a coreutils junk/
>  real    0m3.215s
>  user    0m1.184s
>  sys     0m1.536s
>
>For normal use cp is a little faster than rsync.  Or rather rsync is a
>little slower than cp.  But not enough to make a difference for
>typical operations.  Having the file system cache warmed up makes a
>*HUGE* difference.  Much larger than any other difference.  For copies
>that take hours to run I am probably going to value the restart
>ability more than raw speed.  YMMV.
>
>
>
>
>
>Bob
>
>
>
>
>
>


Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Mon, 13 Oct 2014 02:47:01 GMT) Full text and rfc822 format available.

Message #50 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <bash <at> tlinx.org>
To: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 12 Oct 2014 19:45:19 -0700

Bob Proulx wrote:
> Wow.  Just to be clear an rsync copy took 75 to 90 minutes but a cp
---
	Actually in the case I used for illustration, it was 110 minutes,
but that was longer than normal.  Last night's figures:

: rsync took 87m, 34s      [which is fairly quick given the size of the diffs.]
: Empty-directory removal took 1m, 58s
: Find used space for /home.diff...sz=2.5GB, min=3.1GB, extsz=4.0MB, n-ext'=806
: Copying diffs to dated static snap...Time: 0m, 17s.

It wasn't a copy, but a diff between 2 volumes (the same volume, but one
is a ~24+hour-old snapshot started on the previous run).  So I look at
the differences between two temporal copies, then copy that to a 3rd
partition that starts out empty.  So rsync is comparing file times (it doesn't
do file reads, _by_ _default_, unless it needs to move the data, as indicated
by size and timestamps) -- it examines all file times/dates on my 'home'
partition, and compares those against a mostly-the-same active LVM
snapshot.  Out of 871G, on the long day, it found ~5G of changes --
last night was only 3G... it varies based on how much change happened to
the volume over the period... the smallest size so far is 600M, the largest
I've seen has been about 18G.

Once the *difference* is on the 3rd volume ("home.diff"), I destroy
the active snapshot created 'yesterday', then recreate it as a dynamically
sized static one -- big enough to hold the diff.  Then cp is used to move
whatever "diffs" were put on the "diff" volume by rsync.  So
most of those diffs are _likely_ to be in memory -- AND as
I mentioned, I didn't do a sync after the copy (it happens automatically,
but isn't included in the timing).

But if I used rsync to do that exact same copy, it would take at least 2-3
times as long... actually... hold on... I can copy it from that partition made
yesterday ... into the diff partition.. but will tar up the source
to prime the cache...

This is the volume:
> df .
Filesystem                          Size  Used Avail Use% Mounted on
/dev/Data/Home-2014.10.08-03.07.05  5.5G  4.4G  1.1G  81%\
                                    /home/.snapdir/@GMT-2014.10.08-03.07.05
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> du -sh .
4.4G  .

ok... running cp 1st, then remove, then rsync...:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo cp -a . /home.diff/.
6.39sec 0.15usr 6.23sys (99.81% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rm -fr /home.diff/.
1.69sec 0.03usr 1.64sys (99.43% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rsync -aHAX . /home.diff/.
20.83sec 27.02usr 11.68sys (185.84% cpu)

----185% cpu!... hey! that's cheating and still 3x slower... here's 1 core:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rm -fr /home.diff/.
1.73sec 0.03usr 1.69sys (99.39% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo taskset -a 02 rsync -aHAX . /home.diff/.
38.52sec 25.92usr 11.90sys (98.18% cpu)
---
so limiting it to 1 cpu... 6x slower. (remember this is all
in memory buffered)

Note... rsync has been sped up slightly over the past couple of years
and 'cp' has slowed down somewhat over the same time period, so these
diffs used to be worse.

Then 'cp' is used to copy the image on 'home.diff' to the dynamically
sized
> copy took less than 1 minute?  I find that very suspicious.
---
	Well, hopefully the above explanation is more clear and
highlights what we wanted to measure.

> 
> It appears that you are using features from rsync that do not exist in
> cp.  Therefore the work being done in the task isn't equivalent work.
> In that case it is probably quite reasonable for rsync to be slower
> than cp.
----
Yup... Never would argue differently, but for what it does, rsync is
still pig slow, but when the amount of data you need to move is hundreds
of times smaller than the total, it can't be beat!

> 
> Also consider that if cp were to acquire all of the enhancements that
> have been requested for cp as time has gone by then cp would be just
> as featureful (bloated!) as rsync and likely just as slow as rsync
> too. 
----
	Nope...rsync is slow because it does everything over a client
server model --- even when it is local.  So everything is written through
a pipe .. that's why it can't come close to cp -- and why cp would never
be so slow -- I can't imagine it using a pipe to copy a file anywhere!

> This is something to consider every time someone asks for a
> creeping feature to cp.  Especially if they say they want the feature
> in cp because it is faster than rsync.  The natural progression is
> that cp would become rsync.
----
	Not even!  Note.  cp already has a comparison function
built in that it uses during "cp -u"... but it doesn't go through
pipes.  It used to use larger buffer sizes or maybe tell posix
to pre-alloc the destination space, dunno, but it used to be
faster.. I can't say for certain, but it seems to be using
smaller buffer sizes.  Another reason rsync is so slow -- uses
a relatively small i/o size 1-4k last I looked. I've asked them
to increase it, but going through a pipe it won't help a lot.

This is from a different email on the rsync list from 7/26:

One might ask why rsync is so slow --
copying 800G from 1 partition to another via xfsdump/restore takes a bit
under 2 hours, or about 170MB/s, but with rsync, on the same partition, with
rsync transferring less than 1/1000th as much (700MB [in a differential as I
mentioned above]), it took ~70-80 minutes... or about 163kB/s.

Transfer speeds depend on many factors.  One of the largest is
transfer size (how much is transferred with 1 write/read).
Transferring 1GB,  @ 1-meg at a time, took 2.08s read, and
1.56s to write (using direct io).

Transfer it in 4K chunks: 37.28s, to read, and 43.02s to write.
1k buffers are 4x slower than that!
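
The effect of transfer size described above can be tried with a rough sketch along these lines.  It assumes GNU dd on Linux; the 16 MiB test file and the 1M and 4k block sizes are illustrative choices, not the figures from the measurement quoted above.  Without direct I/O the page cache will hide much of the difference, so treat the numbers as indicative only.

```shell
#!/bin/sh
# Sketch: copy the same file with two block sizes and let dd report throughput.
# File size and block sizes are assumptions chosen to keep the run short.
set -eu
workdir=$(mktemp -d)
src="$workdir/src.bin"

# Create a 16 MiB test file of zeros.
dd if=/dev/zero of="$src" bs=1M count=16 2>/dev/null

copies_match=yes
for bs in 1M 4k; do
    printf 'bs=%s: ' "$bs"
    # dd prints its summary on stderr; the last line holds the throughput.
    dd if="$src" of="$workdir/copy.$bs" bs="$bs" 2>&1 | tail -n 1
    # Both copies must be byte-identical to the source.
    cmp -s "$src" "$workdir/copy.$bs" || copies_match=no
done
rm -rf "$workdir"
```

On a real disk, adding oflag=direct to the copy (as in the dd run later in this thread) takes the cache out of the picture, at the cost of needing a file system that supports direct I/O.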

Also in rsync, they've added the posix calls to reserve
space in the target location for a file being copied in.
Specifically, this is to lower disk fragmentation.  (Does
cp do anything like that?  It's been a while since I looked.)

> 
>> If rsync wasn't so slow at local I/O...*sigh*....
> 
> The advantage of rsync is that it can be interrupted and restarted and
> the restarted process will efficiently avoid doing work that is
> already done.  An interrupted and restarted cp will perform the same
> work again from start to finish.
----
	I wouldn't trust that it would.  If you interrupt it at exactly
the wrong time, I'd be afraid some file might get set with the right
data but the wrong Meta info (acls, primarily).

> 
> If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
> am doing it the second time then I will use rsync to avoid repeating
> previously done work 'rsync -av A B'.
---
	Wouldn't cp -auv A B do the same?

> 
> If I want progress indication...  If I want placement of backup files
> in a particular directory...  If I want other fancy features that are
> provided by rsync then it is worth it to use rsync.
> 
>   $ du -s coreutils
>   238920  coreutils
>   $ find coreutils -type f | wc -l
>   15013
> 
>   $ rm -rf junk/coreutils
>   # echo 3 > /proc/sys/vm/drop_caches
>   $ time cp -a coreutils junk/
>   real    1m2.137s
>   user    0m0.140s
>   sys     0m1.724s
> 
>   $ rm -rf junk/coreutils
>   $ time cp -a coreutils junk/
>   real    0m2.492s
>   user    0m0.060s
>   sys     0m1.064s
> 
>   $ rm -rf junk/coreutils
>   # echo 3 > /proc/sys/vm/drop_caches
>   $ time rsync -a coreutils junk/
>   real    1m5.473s
>   user    0m1.280s
>   sys     0m2.112s
> 
>   $ rm -rf junk/coreutils
>   $ time rsync -a coreutils junk/
>   real    0m3.215s
>   user    0m1.184s
>   sys     0m1.536s
---
By default cp -a transfers acls and ext-attrs and preserves
hard links.   Rsync doesn't do any of that by default.
You need to  use "-aHAX" to compare them ...

you have to call them
out as 'extra' with rsync, so the above test may not be what it seems.
Though if you don't use ACL's (which I do), then maybe the above
is almost reasonable.  Still.. should use -aHAX
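
One way to check the hard-link part of this claim is a small test like the following.  It is only a sketch, assuming GNU cp and GNU stat; the directory and file names are made up for the example.

```shell
#!/bin/sh
# Sketch: verify that 'cp -a' keeps two hard-linked names linked in the copy.
set -eu
workdir=$(mktemp -d)
mkdir "$workdir/tree"
printf 'data\n' > "$workdir/tree/one"
ln "$workdir/tree/one" "$workdir/tree/two"     # second name, same inode

cp -a "$workdir/tree" "$workdir/copy"

# GNU stat: %h = hard link count, %i = inode number.
links=$(stat -c '%h' "$workdir/copy/one")
if [ "$(stat -c '%i' "$workdir/copy/one")" = "$(stat -c '%i' "$workdir/copy/two")" ]; then
    shared=yes
else
    shared=no
fi
echo "link count: $links, same inode: $shared"
rm -rf "$workdir"
# For rsync, the thread's point is that -H has to be added to -a
# to get the same treatment of hard links.
```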

Is your rsync newer? i.e. does it have the posix-pre-alloc
hints?... Mine has a pre-alloc patch, but I think that was
suse-added and not the one in the mainline code.  Not sure.

rsync --version
rsync  version 3.1.0  protocol version 31
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I don't think mine does yet...
> 
> For normal use cp is a little faster than rsync.  Or rather rsync is a
> little slower than cp.  But not enough to make a difference for
> typical operations.  Having the file system cache warmed up makes a
> *HUGE* difference.  Much larger than any other difference.  For copies
> that take hours to run I am probably going to value the restart
> ability more than raw speed.  YMMV.
----
	I'll value the accuracy of xfsdump/restore...

	Throw a few TB copies at rsync -- where all the data
won't fit in memory.... it also, I'm told, has problems with
hardlinks, acls and xattrs slowing it down, so it may be a
matter of usage...

	BUT all that said... note that I DO USE it... for the
job I'm doing in my snapper script, nothing else will.

Cheers!
Linda

(don't ya just love performance talk?)

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Sun, 19 Oct 2014 23:54:01 GMT) Full text and rfc822 format available.

Message #53 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Linda Walsh <bash <at> tlinx.org>
Cc: 18681 <at> debbugs.gnu.org, "Polehn, Mike A" <mike.a.polehn <at> intel.com>
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 17:53:31 -0600

Linda Walsh wrote:
> Bob Proulx wrote:
> > Also consider that if cp were to acquire all of the enhancements
> > that have been requested for cp as time has gone by then cp would
> > be just as featureful (bloated!) as rsync and likely just as slow
> > as rsync too.
>
> 	Nope...rsync is slow because it does everything over a client
> server model --- even when it is local.  So everything is written through
> a pipe .. that's why it can't come close to cp -- and why cp would never
> be so slow -- I can't imagine it using a pipe to copy a file anywhere!

The client-server structure of rsync is required for copying between
systems.  Saying that cp doesn't have it isn't fair if cp were to add
every requested feature.  I am sure that if I search the archives I
would find a request to add client-server structure to cp to support
copying from system to system. :-)

Now I will proactively agree that it would be nice if rsync detected
that it was all running locally and didn't fork and instead ran
everything in one process like cp does.  But I could see that coming
to rsync at some time in the future.  It is an often requested
feature.

> > This is something to consider every time someone asks for a
> > creeping feature to cp.  Especially if they say they want the feature
> > in cp because it is faster than rsync.  The natural progression is
> > that cp would become rsync.
>
> 	Not even!  Note.  cp already has a comparison function
> built in that it uses during "cp -u"...

I am not convinced of the robustness of 'cp -u ...' interrupt, repeat,
interrupt repeat.  It wasn't intended for that mode.  I am suspicious.
Is there any code path that could leave a new file in the target area
that would avoid copy?  Not sure.  Newer meets the -u test but isn't
an exact copy if the time stamp were older in the original.  But with
rsync I know it will correct for this during a subsequent run.

> built in that it uses during "cp -u"... but it doesn't go through
> pipes.  It used to use larger buffer sizes or maybe tell posix
> to pre-alloc the destination space, dunno, but it used to be
> faster.. I can't say for certain, but it seems to be using

Often the data sizes we work with grow larger over time making the
same task feel slower because we are actually dealing with more data
now.  Files include audio.  Files include video.  Standard def becomes
high def.  "Difficult to see.  Always in motion is the future."

> smaller buffer sizes.  Another reason rsync is so slow -- uses
> a relatively small i/o size 1-4k last I looked. I've asked them
> to increase it, but going through a pipe it won't help a lot.

Nod.  Rsync was designed for the network use case.  It could benefit
with some tuning for the local case.  A topic for the rsync list.

> Also in rsync, they've added the posix calls to reserve
> space in the target location for a file being copied in.
> Specifically, this is to lower disk fragmentation (does
> cp do anything like that, been a while since I looked).

I don't know.  It would be worth a look.

> > The advantage of rsync is that it can be interrupted and restarted and
> > the restarted process will efficiently avoid doing work that is
> > already done.  An interrupted and restarted cp will perform the same
> > work again from start to finish.
>
> 	I wouldn't trust that it would.  If you interrupt it at exactly
> the wrong time, I'd be afraid some file might get set with the right
> data but the wrong Meta info (acls, primarily).

The design of rsync is to copy the file to a temporary name beside the
intended target.  After the copy the timestamps are set.  After
the timestamps are set the file is renamed into place.  An interrupt
that happens before that rename time will cause the temporary file to
be removed.  An interrupt that happens after the rename is, well,
after that and the copy is already done.  Since rename on the local
file system is atomic this is guaranteed to function robustly.  (As
long as you aren't using a buggy file system that changes the order of
operations.  That isn't cool.  But of course it was famously seen in
ext4 for a while.  Fortunately sanity has prevailed and ext4 doesn't
do that for this operation anymore.  Okay to use now.)
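
The copy-then-rename pattern described above can be sketched in shell as follows.  This is not rsync's code; atomic_copy is a hypothetical helper written for illustration, and it assumes GNU mktemp, cp and mv on a local file system.

```shell
#!/bin/sh
# Sketch of the copy-to-temporary-then-rename pattern.
# atomic_copy is a hypothetical helper, not part of rsync or coreutils.
set -eu

atomic_copy() {
    src=$1
    dst=$2
    # Temporary file beside the target, so the final rename stays
    # within one file system.
    tmp=$(mktemp "$(dirname "$dst")/.tmp.XXXXXX")
    # Copy data and timestamps (-p) into the temporary file first.
    if cp -p "$src" "$tmp"; then
        mv -f "$tmp" "$dst"     # rename puts the finished file in place
    else
        rm -f "$tmp"            # failed copy: any old target is untouched
        return 1
    fi
}

workdir=$(mktemp -d)
printf 'new contents\n' > "$workdir/source"
printf 'old contents\n' > "$workdir/target"
atomic_copy "$workdir/source" "$workdir/target"

final=$(cat "$workdir/target")
entries=$(ls -A "$workdir" | wc -l)    # expect only source and target
echo "target now holds: $final"
rm -rf "$workdir"
```

A reader of the target sees either the old file or the complete new one, never a half-written file, which is the property being relied on in the paragraph above.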

> > If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
> > am doing it the second time then I will use rsync to avoid repeating
> > previously done work 'rsync -av A B'.
>
> 	Wouldn't cp -auv A B do the same?

Do I have to go look at the source code to verify that it doesn't? :-(

I assume it doesn't without looking.  I assume cp copies in place.  I
assume that cp does not make a temporary file off to the side and
rename it into place once it is done and has set the timestamps.  I
assume that cp copies to the named destination directly and updates
the timestamps afterward.  That creates a window of time when the file
is in place but has not had the timestamp placed on it yet.

Which means that if the cp is interrupted on a large file that it will
have started the copy but will not have finished it at the moment that
it is interrupted.  The new file will be in place with a new
timestamp.  The second run with cp -u will avoid overwriting the file
because the timestamp is newer.  However the contents of the file will
be incomplete, or at least not matching the source copy at the time of
the second copy.
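
The failure mode assumed above can be simulated without actually interrupting anything.  The sketch below fakes the state an interrupted copy would leave behind (a partial destination with a newer timestamp) and then reruns 'cp -u'.  It assumes GNU cp and GNU touch; the file names and the date are arbitrary.

```shell
#!/bin/sh
# Sketch: simulate an interrupted copy, then rerun with 'cp -u'.
set -eu
workdir=$(mktemp -d)
src="$workdir/src"
dst="$workdir/dst"

printf 'complete file contents\n' > "$src"
# Give the source an old modification time.
touch -d '2014-10-01 00:00:00' "$src"

# Simulated leftover of the interrupted first run: partial data, written "now".
printf 'complete' > "$dst"

# Second run: -u skips the copy because the destination is newer.
cp -u "$src" "$dst"

if cmp -s "$src" "$dst"; then
    outcome=repaired
else
    outcome=still-partial
fi
echo "$outcome"
rm -rf "$workdir"
```

With GNU cp this leaves the destination still partial, since -u only looks at the timestamps and never at the contents.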

If my assumptions in the above are wrong please correct me.  I will
learn something.  But the operating model would need to be the same
portably across all portable systems covered by posix before I would
consider it actually safe to use.

> > If I want progress indication...  If I want placement of backup files
> > in a particular directory...  If I want other fancy features that are
> > provided by rsync then it is worth it to use rsync.
> > ...trimmed simple benchmark...
> >  $ time cp -a coreutils junk/
>
> By default cp -a transfers acls and ext-attrs and preserves
> hard links.   Rsync doesn't do any of that by default.
> You need to  use "-aHAX" to compare them ...

Good catch.  :-)

> you have to call them
> out as 'extra' with rsync, so the above test may not be what it seems.
> Though if you don't use ACL's (which I do), then maybe the above
> is almost reasonable.  Still.. should use -aHAX

I didn't have any hard links, ACLs, or extended attributes in the test
case, so it shouldn't matter for the above.

> Is your rsync newer? i.e. does it have the posix-pre-alloc
> hints?... Mine has a pre-alloc patch, but I think that was
> suse-added and not the one in the mainline code.  Not sure.
> 
> rsync --version
> rsync  version 3.1.0  protocol version 31
>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, symtimes, prealloc, SLP

I happened to run that test on Debian Sid and it is 3.1.1.  However
Debian Stable, which I have most widely deployed, has 3.0.9.  So you
are both ahead of and behind me at the same time. :-)

> 	Throw a few TB copies at rsync -- where all the data
> won't fit in memory.... it also, I'm told, has problems with
> hardlinks, acls and xattrs slowing it down, so it may be a
> matter of usage...

I have had problems running rsync with -H for large data sets.  Bad
enough that I recommend against it.  Don't do it!  I don't know
anything about -A and -X.  But rsync -a is fine for very large data
sets.

> 	BUT all that said... note that I DO USE it... for the
> job I'm doing in my snapper script, nothing else will.

Yes.  It is too useful to be without!

> (don't ya just love performance talk?)

Except that we should have moved all of this to the discussion list.
I feel guilty to have continued it.  We have drifted well away from
the original bug report.  The one with the terrible title.  If this
continues let's take it over to the coreutils discussion list for
further conversation about it.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Mon, 20 Oct 2014 00:05:02 GMT) Full text and rfc822 format available.

Message #56 received at 18681 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: "Polehn, Mike A" <mike.a.polehn <at> intel.com>,
 Assaf Gordon <assafgordon <at> gmail.com>,
 "18681 <at> debbugs.gnu.org" <18681 <at> debbugs.gnu.org>
Subject: Re: bug#18681: The Linux cp command has bugs
Date: Sun, 19 Oct 2014 18:04:17 -0600

close 18681
thanks

Eric Blake wrote:
> Polehn, Mike A wrote:
> > This still left the incorrect operation of the interactive
> > operation when both -i and -f is used.
> 
> The behavior of -i vs. -f interaction is required by POSIX; in
> particular, POSIX is explicit that -i and -f are NOT a toggle switch of
> one another, but each turns on slightly different, somewhat overlapping,
> changes in behavior (so specifying both is different from specifying one
> in isolation).  We can't change what either one of those flags means.

This bug log included some serious topic drift, to which I contributed
myself.  In order to atone for that I am going to triage this as
saying that the behavior is intended and standardized and therefore
won't be changed.  Now that we understand this the bug ticket can be
closed.  Further discussion can be continued and it will all be logged
and read by the subscribed.

> If there is another mode of operation that is also useful, then it needs
> yet another flag.  At one point in the past, we had
> --reply={yes,no,query} to try and offer a third mode, but it had
> confusing semantics and we ended up pulling it because of the confusion
> it could cause.  At the time we pulled it, we admitted that 'rsync' has
> some modes of operations that might be better suited for the particular
> modes that people seemed to be requesting when they thought that
> --reply would do the trick (and usually, what they thought --reply would
> do and what it actually did were different, which is why we removed it
> to avoid confusion).  We have also added a --no-clobber option, which is
> somewhat of a compromise (what some people thought --reply=no would do,
> --no-clobber actually does better).

Good summary!

> So adding a new option is not out of the question, but you'd have to
> have well-defined semantics of what it should do, and how it differs
> from either normal mode, '-i' mode, '-f' mode, '-i -f' mode, or
> '--no-clobber' mode.

If the readers of this ticket think there is an enhancement request to
be filed for cp then please file a wishlist bug with the proposal.  A
reference to this log can be made if desired.  Let me suggest that the
proposal first be made on the coreutils discussion list where it can
be discussed and shaped and then after that has been done file a
wishlist bug of the result in order to track its progress through the
code and release.

Bob

bug closed, send any further explanations to 18681 <at> debbugs.gnu.org and "Polehn, Mike A" <mike.a.polehn <at> intel.com> Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Mon, 20 Oct 2014 00:05:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#18681; Package coreutils. (Mon, 20 Oct 2014 06:21:01 GMT) Full text and rfc822 format available.

Message #61 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <coreutils <at> tlinx.org>
To: Bob Proulx <bob <at> proulx.com>
Cc: "Polehn, Mike A" <mike.a.polehn <at> intel.com>, bug-coreutils <at> gnu.org
Subject: Re: bug#18681: cp Specific fail example
Date: Sun, 19 Oct 2014 23:20:00 -0700


Bob Proulx wrote:
> Linda Walsh wrote:
>> Bob Proulx wrote:
>>> Also consider that if cp were to acquire all of the enhancements
>>> that have been requested for cp as time has gone by then cp would
>>> be just as featureful (bloated!) as rsync and likely just as slow
>>> as rsync too.
>> 	Nope...rsync is slow because it does everything over a client
>> server model --- even when it is local.  So everything is written through
>> a pipe .. that's why it can't come close to cp -- and why cp would never
>> be so slow -- I can't imagine it using a pipe to copy a file anywhere!
> 
> The client-server structure of rsync is required for copying between
> systems.  Saying that cp doesn't have it isn't fair if cp were to add
> every requested feature.
---
	cp was designed for local->local copy.
	rsync was designed for local->remote synchronization (thus
'r(emote) sync').  Comparing the two is entirely fair, just as it is fair to
compare code quality between a java->'native code compiler' and a compiler
developed for a native platform -- both started out with different design
goals, and thus each ends up with pluses and minuses that are an effect of
that goal.  If you claim comparing such effects isn't fair, then it's not fair
to compare any algorithm with another, because algorithms inherently have
their pluses and minuses and are often chosen for use in a particular
situation because of those pluses and minuses.

	So let's compare using 'cp' with rsync in copying a remote file.
The choice of tools depends on the quality of the remote connection, but
in most remote connections, "today", reliability isn't usually an issue as
they flow over TCP and file transfer protocols like NFS or CIFS also have
checks to allow users to reconnect after an interruption (like a machine reboot).
Depending on timeout settings, 'cp' already has a restart-over-remote
ability when used with NFS or CIFS. CIFS doesn't tolerate a system reboot
in the middle of a copy, whereas NFS can recover from such if the client
uses hard mounts.  But for a local network, I regularly use 'cp' with
CIFS and it does a faster job than rsync -- over a reliable local network.


> I am sure that if I search the archives I
> would find a request to add client-server structure to cp to support
> copying from system to system. :-)
----
	We are comparing where the tools are at  _not_ where they _could_
have been had previous algorithm choices been ignored.  We are talking
about a local->local copy (in the base note), so glossing over the slowness
of rsync in doing such is entirely fair.  If you want some level of
recovery after interrupt, NFS is a better choice for a local network --
client connections can continue even after a server reboot.  But if we
are talking local->local reliability, the simple, close solution would be
SMB/CIFS.

Using a 1GB file as an example (and throwing in a 'dd' for
comparison):

> time rsync 1G ishtar:/home/law/1G
20.13sec 1.29usr 2.68sys (19.73% cpu)
> time cp 1G /h/.      
6.94sec 0.01usr 1.10sys (16.16% cpu)
> time dd if=1G of=/h/1G bs=256M oflag=direct
4+0 records in
4+0 records out
1073741824 bytes (1.1 GB) copied, 3.4694 s, 309 MB/s
3.50sec 0.00usr 0.51sys (14.64% cpu)

Here again, we see rsync doing the same job as cp,
taking about 3x the time.

For a single file over a local net 'dd' is a better bet.

> 
> Now I will proactively agree that it would be nice if rsync detected
> that it was all running locally and didn't fork and instead ran
> everything in one process like cp does.  But I could see that coming
> to rsync at some time in the future.  It is an often requested
> feature.
---
	For many years.


>>> This is something to consider every time someone asks for a
>>> creeping feature to cp.  Especially if they say they want the feature
>>> in cp because it is faster than rsync.  The natural progression is
>>> that cp would become rsync.
>> 	Not even!  Note.  cp already has a comparison function
>> built in that it uses during "cp -u"...
> 
> I am not convinced of the robustness of 'cp -u ...' interrupt, repeat,
> interrupt repeat.  It wasn't intended for that mode.
---
	Neither is rsync in its default mode.  It compares
timestamps and size, nothing more.  I'd be suspicious of either
rsync OR cp's chances in such a situation.  But USUALLY, people
don't interrupt a copy many times -- or even once, so cp is usually
faster...


> Is there any code path that could leave a new file in the target area
> that would avoid copy?  Not sure.  Newer meets the -u test but isn't
> an exact copy if the time stamp were older in the original.  But with
> rsync I know it will correct for this during a subsequent run.
---
	Not necessarily.  It doesn't do checksumming by default.  Certainly,
if you used rsync with '-u', rsync will not be much better in recovery,
since target files with more recent timestamps may be left in the
target dir.  I don't think rsync or cp trap a control-c-abort to cleanup
target files.
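
The size-and-timestamp comparison mentioned above can be illustrated with a small sketch.  quick_check_same is a hypothetical helper that only mimics the idea; it is not rsync code, and it assumes GNU stat and touch.

```shell
#!/bin/sh
# Sketch of a size-and-mtime "quick check"; not taken from rsync itself.
set -eu

quick_check_same() {
    # GNU stat: %s = size in bytes, %Y = mtime in seconds since the epoch.
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

workdir=$(mktemp -d)
printf 'AAAA\n' > "$workdir/a"
printf 'BBBB\n' > "$workdir/b"          # same size, different contents
touch -r "$workdir/a" "$workdir/b"      # give b the same mtime as a

if quick_check_same "$workdir/a" "$workdir/b"; then
    verdict=looks-unchanged
else
    verdict=needs-transfer
fi
echo "$verdict"
rm -rf "$workdir"
```

Two files with different contents pass this check as long as size and mtime agree, which is why a content comparison (rsync's -c / --checksum option) is a separate, slower mode.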


> 
>> built in that it uses during "cp -u"... but it doesn't go through
>> pipes.  It used to use larger buffer sizes or maybe tell posix
>> to pre-alloc the destination space, dunno, but it used to be
>> faster.. I can't say for certain, but it seems to be using
> 
> Often the data sizes we work with grow larger over time making the
> same task feel slower because we are actually dealing with more data
> now.
---
	I was comparing copy times with same files,
 not from years ago to now.


>>  Another reason rsync is so slow -- uses
>> a relatively small i/o size 1-4k last I looked. I've asked them
>> to increase it, but going through a pipe it won't help a lot.
> 
> Nod.  Rsync was designed for the network use case.  It could benefit
> with some tuning for the local case.  A topic for the rsync list.
---
Been there, done that.  Still comparing current-to-current, not
hypotheticals.


> 
>> Also in rsync, they've added the posix calls to reserve
>> space in the target location for a file being copied in.
>> Specifically, this is to lower disk fragmentation (does
>> cp do anything like that, been a while since I looked).
> 
> I don't know.  It would be worth a look.
> 
>>> The advantage of rsync is that it can be interrupted and restarted and
>>> the restarted process will efficiently avoid doing work that is
>>> already done.  An interrupted and restarted cp will perform the same
>>> work again from start to finish.
>> 	I wouldn't trust that it would.  If you interrupt it at exactly
>> the wrong time, I'd be afraid some file might get set with the right
>> data but the wrong Meta info (acls, primarily).
> 
> The design of rsync is to copy the file to a temporary name beside the
> intended target.  After the copy the timestamps are set.  After
> the timestamps are set the file is renamed into place.  An interrupt
> that happens before that rename time will cause the temporary file to
> be removed.  An interrupt that happens after the rename is, well,
> after that and the copy is already done.  Since rename on the local
> file system is atomic this is guaranteed to function robustly.  (As
> long as you aren't using a buggy file system that changes the order of
> operations.  That isn't cool.  But of course it was famously seen in
> ext4 for a while.  Fortunately sanity has prevailed and ext4 doesn't
> do that for this operation anymore.  Okay to use now.)
> 
>>> If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
>>> am doing it the second time then I will use rsync to avoid repeating
>>> previously done work 'rsync -av A B'.
>> 	Wouldn't cp -auv A B do the same?
> 
> Do I have to go look at the source code to verify that it doesn't? :-(
---
	My timing says cp is 20x faster for that 1G file case.  It also
shows that rsync doesn't use a tmp file in the update case:
>  time cp -au 1G /h
0.03sec 0.00usr 0.03sys (79.47% cpu)
> cp -au 1G /h     
> time rsync -au 1G ishtar:/home/law/1G
0.60sec 0.06usr 0.09sys (25.12% cpu)

> 
> I assume it doesn't without looking.  I assume cp copies in place.  I
> assume that cp does not make a temporary file off to the side and
> rename it into place once it is done and has set the timestamps.
---
	I assume rsync doesn't either -- if it is comparing against
a file already in place, for it to transfer the whole file... nope.
> I assume that cp copies to the named destination directly and updates
> the timestamps afterward.  That creates a window of time when the file
> is in place but has not had the timestamp placed on it yet.
> 
> Which means that if the cp is interrupted on a large file that it will
> have started the copy but will not have finished it at the moment that
> it is interrupted.  The new file will be in place with a new
> timestamp.  The second run with cp -u will avoid overwriting the file
> because the timestamp is newer.  However the contents of the file will
> be incomplete, or at least not matching the source copy at the time of
> the second copy.
> 
> If my assumptions in the above are wrong please correct me.  I will
> learn something.  But the operating model would need to be the same
> portably across all portable systems covered by posix before I would
> consider it actually safe to use.
---
	Same happens in rsync -- no tmp file is involved.  It compares
time stamps and doesn't copy.
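
The window being argued about can be modeled with a short Python sketch. It
encodes the documented `cp -u` rule (copy only when the source is newer than
the destination, or the destination is missing) and applies it to a simulated
interrupted copy. It is not cp's or rsync's code, and it does not settle how
rsync behaves for any particular set of options. The function names and file
sizes are hypothetical, and the size comparison is included only for contrast.

```python
import os
import tempfile


def skipped_by_mtime_update(src, dst):
    """True if a 'copy only when the source is newer' rule would skip dst."""
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        return False  # a missing destination always gets copied
    return os.stat(src).st_mtime_ns <= dst_stat.st_mtime_ns


def skipped_by_size_and_mtime(src, dst):
    """Stricter variant for contrast: also require matching sizes."""
    if not skipped_by_mtime_update(src, dst):
        return False
    return os.stat(src).st_size == os.stat(dst).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(b"x" * 1000)
        # Give the source an old timestamp, as a long-lived file would have.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        # Simulate an interrupted copy: only the first 100 bytes arrived,
        # and the partial file carries the current time as its mtime.
        with open(dst, "wb") as f:
            f.write(b"x" * 100)
        print("timestamp-only check skips it:",
              skipped_by_mtime_update(src, dst))
        print("size+timestamp check skips it:",
              skipped_by_size_and_mtime(src, dst))
```

A timestamp-only rule leaves the truncated destination alone because its
mtime is newer than the source's, which is exactly the failure described in
the quoted text.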


> 
>>> If I want progress indication...  If I want placement of backup files
>>> in a particular directory...  If I want other fancy features that are
>>> provided by rsync then it is worth it to use rsync.
>>> ...trimmed simple benchmark...
>>>  $ time cp -a coreutils junk/
>> By default cp -a transfers acls and ext-attrs and preserves
>> hard links.   Rsync doesn't do any of that by default.
>> You need to  use "-aHAX" to compare them ...
> 
> Good catch.  :-)
> 
>> you have to call them
>> out as 'extra' with rsync, so the above test may not be what it seems.
>> Though if you don't use ACL's (which I do), then maybe the above
>> is almost reasonable.  Still.. should use -aHAX
> 
> I didn't have any hard links, ACLs, or extended attributes in the test
> case, so it shouldn't matter for the above.
> 
>> Is your rsync newer? i.e. does it have the posix-pre-alloc
>> hints?... Mine has a pre-alloc patch, but I think that was
>> suse-added and not the one in the mainline code.  Not sure.
>>
>> rsync --version
>> rsync  version 3.1.0  protocol version 31
>>     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
>>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>>     append, ACLs, xattrs, iconv, symtimes, prealloc, SLP
> 
> I happened to run that test on Debian Sid and it is 3.1.1.  However
> Debian Stable, which I have most widely deployed, has 3.0.9.  So you
> are both ahead of and behind me at the same time. :-)
> 
>> 	Throw a few TB copies at rsync -- where all the data
>> won't fit in memory.... it also, I'm told, has problems with
>> hardlinks, acls and xattrs slowing it down, so it may be a
>> matter of usage...
> 
> I have had problems running rsync with -H for large data sets.  Bad
> enough that I recommend against it.  Don't do it!  I don't know
> anything about -A and -X.  But rsync -a is fine for very large data
> sets.
----
	But then you can't compare to 'cp' which does handle
that case.


>> (don't ya just love performance talk?)
> 
> Except that we should have moved all of this to the discussion list.
---
:-( ?discussion list? -- bug-coreutils? (don't know about others)...

'sides, I didn't bring up rsync, all I added was
"If rsync wasn't so slow at local I/O...*sigh*.... "


It's good for when you need "diffs", but not as a general replacement
for 'cp'.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 17 Nov 2014 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 10 years and 268 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
