Self-reply, following some discussion with my colleague.

It is possible that, strictly speaking, -s doesn't make sense alongside
-a: by using `cp -as` we're asking to preserve hardlinks AND to create
symbolic links instead of copying.

From the man page:

    -a, --archive
           same as -dR --preserve=all
    -d     same as --no-dereference --preserve=links
    --preserve[=ATTR_LIST]
           preserve the specified attributes (default:
           mode,ownership,timestamps), if possible additional
           attributes: context, links, xattr, all
    -s, --symbolic-link
           make symbolic links instead of copying

As an alternative invocation to get the "expected" behaviour:

    $ cp -RPs --preserve=all --no-preserve=links "$(pwd)"/copy_from/ copy_to
    $ ls -l copy_to/file*
    copy_to/file1 -> /gnu_cp_bug/copy_from/file1
    copy_to/file2 -> /gnu_cp_bug/copy_from/file2

So, if the combination of -a/-s doesn't make sense, can this be better
documented? Otherwise, please consider fixing the default behaviour
(e.g. via my previous tentative diff; or perhaps -s should always imply
--no-preserve=links).

Kind Regards,
Martin

-----Original Message-----
From: Martin Ramsdale (mramsdal)
Sent: 12 October 2020 11:45
To: bug-coreutils@gnu.org
Subject: `cp --archive --symbolic-link` non-reproducible and creates hardlinks between symbolic links that dereference to the same inode

Dear coreutils maintainers,

I've encountered what I consider a bug in GNU cp: if you do a recursive
copy, any source files that share an inode are created in the
destination sharing a single new inode.

For example:

    $ mkdir copy_from
    $ echo aaa > copy_from/file1
    $ ln copy_from/file1 copy_from/file2
    $ cp -as "$(pwd)"/copy_from/ copy_to
    $ stat -c '%n %i' copy_to/file*
    copy_to/file1 42615790
    copy_to/file2 42615790
    $ ls -l copy_to/file*
    copy_to/file1 -> /gnu_cp_bug/copy_from/file1
    copy_to/file2 -> /gnu_cp_bug/copy_from/file1

Whereas the expected result is:

    $ ls -l copy_to/file*
    copy_to/file1 -> /gnu_cp_bug/copy_from/file1
    copy_to/file2 -> /gnu_cp_bug/copy_from/file2

Issues this can cause include:

1) Incorrect file usage: whilst initially any usage of
   copy_to/file[1,2] is as expected, if any of copy_from/file[1,2] is
   *replaced* (rather than modified), then usage of copy_to/file[1,2]
   will yield unexpected results. For example:

    $ rm copy_from/file2
    $ echo bbb > copy_from/file2
    $ cat copy_from/file*
    aaa
    bbb
    $ cat copy_to/file*
    aaa
    aaa

2) Non-reproducible behaviour: the symlinks created may point to any of
   the source names that share an inode, and so for the same input
   directory the output of `cp -as` may differ. For example,
   invocations on different systems could yield either:

    $ ls -l copy_to/file*
    copy_to/file1 -> /gnu_cp_bug/copy_from/file1
    copy_to/file2 -> /gnu_cp_bug/copy_from/file1

   OR

    $ ls -l copy_to/file*
    copy_to/file1 -> /gnu_cp_bug/copy_from/file2
    copy_to/file2 -> /gnu_cp_bug/copy_from/file2

From brief code inspection, I believe this issue originates from the
earlier_file lookup in copy_internal(). I haven't had the opportunity to
build/validate this as a fix, but propose the following for
consideration:

diff --git a/src/copy.c b/src/copy.c
index 4050f6953..74c1e7499 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -2513,6 +2513,8 @@ copy_internal (char const *src_name, char const *dst_name,
         {
           if (command_line_arg)
             earlier_file = remember_copied (dst_name, src_sb.st_ino, src_sb.st_dev);
+          else if (x->symbolic_link)
+            earlier_file = NULL;
           else
             earlier_file = src_to_dest_lookup (src_sb.st_ino, src_sb.st_dev);
         }

Kind Regards,
Martin
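
PS: The steps in this thread can be gathered into one script for anyone wishing to compare the two invocations on their own system. This is only a sketch, assuming GNU cp and a POSIX shell; the scratch directory and the names workdir, copy_as and copy_alt are illustrative and not part of the report. It prints the link targets rather than asserting them, since the result of `cp -as` may differ between systems.

```shell
#!/bin/sh
# Sketch: run both invocations from the report in a scratch directory.
# Assumes GNU cp; workdir, copy_as and copy_alt are illustrative names.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir copy_from
echo aaa > copy_from/file1
ln copy_from/file1 copy_from/file2   # file1 and file2 now share one inode

# Invocation from the report: -a implies --preserve=links.
cp -as "$(pwd)"/copy_from/ copy_as

# Alternative invocation: same options, but hard links are not preserved.
cp -RPs --preserve=all --no-preserve=links "$(pwd)"/copy_from/ copy_alt

# Show where each symlink in each destination points.
for d in copy_as copy_alt; do
    for f in file1 file2; do
        printf '%s/%s -> %s\n' "$d" "$f" "$(readlink "$d/$f")"
    done
done
```

If the copy_as lines show both links pointing at the same source name while the copy_alt lines do not, the system exhibits the behaviour described above.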