GNU bug report logs - #5915
[coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1) v2

Previous Next

Package: coreutils;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Fri, 9 Apr 2010 15:50:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 5915 in the body.
You can then email your comments to 5915 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Fri, 09 Apr 2010 15:50:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jim Meyering <jim <at> meyering.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 09 Apr 2010 15:50:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: [coreutils] Feature: Introduce fiemap copy to cp(1) and mv(1) v2
Date: Fri, 09 Apr 2010 17:49:22 +0200
jeff.liu wrote:
> [...explanation...]

Thanks.

...
>>     Run "make syntax-check", or even "make distcheck"
...
> Sorry to make this stupidly mistake again, I will do double check for furture patches submition.

It's not a big deal.
Notice that you can probably tell your editor
not to do that anymore.

I've just adjusted the Cc list to include bug-coreutils.
Let's keep the discussion there, for the sake of the bug tracker.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 08:28:01 GMT) Full text and rfc822 format available.

Message #8 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Sat, 17 Apr 2010 16:27:08 +0800
Hello All,

This is the revised version.

Both '--fiemap' and '--fiemap-sync' options were eliminated in this version.
Now cp(1) will perform fiemap optimization whenever possible if cp(1) invoked with --sparse=[WHEN],
otherwise, it will fall back to the standard copy.

I have run 'make syntax-check' to ensure the code style is ok.

The tests were run against 'btrfs', 'ocfs2' and 'ext4', each mounted as physical partitions.
/dev/sda9 on /btrfs type btrfs (rw)
/dev/sda8 on /ocfs2 type ocfs2 (rw,_netdev,nointr,user_xattr,acl,heartbeat=local)
/dev/sda11 on /ext4 type ext4 (rw,acl,user_xattr)

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ uname -a
Linux jeff-laptop 2.6.33-rc5-00238-gb04da8b-dirty #11 SMP Sat Dec 19 22:02:01 CST 2009 i686 GNU/Linux

For the following performance compares, './src/cp' means the fiemap enabled `cp', and '/bin/cp' is
the normal one which build against the git upstream source code.

The performance are good for 'ocfs2' and 'ext4'.  but for 'btrfs', it does not shown not as expected.

Tests:
======
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ dd if=/dev/zero of=/ocfs2/sparse_2 bs=1k count=1 seek=10022
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000382239 s, 2.7 MB/s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /btrfs/sparse_2
/btrfs/sparse_2_fiemap

real	0m0.252s
user	0m0.004s
sys	0m0.232s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /btrfs/sparse_2
/btrfs/sparse_2_normal

real	0m0.069s
user	0m0.004s
sys	0m0.036s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /ocfs2/sparse_2
/ocfs2/sparse_2_fiemap

real	0m0.019s
user	0m0.000s
sys	0m0.012s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /ocfs2/sparse_2
/ocfs2/sparse_2_normal

real	0m0.103s
user	0m0.000s
sys	0m0.092s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /ext4/sparse_2
/ext4/sparse_2_normal

real	0m0.103s
user	0m0.000s
sys	0m0.064s
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /ext4/sparse_2
/ext4/sparse_2_fiemap

real	0m0.012s
user	0m0.000s
sys	0m0.008s


I also tried to test a sparse file which fills with holes in the middle.

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ls -l /btrfs/sparse_4
-rw-r-xr-- 1 jeff jeff 838860800 Apr 17 14:24 /btrfs/sparse_4

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /btrfs/sparse_4
/btrfs/sparse_4_fiemap

real    0m43.227s
user    0m0.156s
sys     0m21.749s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /btrfs/sparse_4
/btrfs/sparse_4_normal

real    0m18.282s
user    0m0.232s
sys     0m14.301s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time sudo /bin/cp --sparse=always /ocfs2/sparse_4
/ocfs2/sparse_4_normal

real    0m7.583s
user    0m0.216s
sys     0m7.292s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time sudo src/cp --sparse=always /ocfs2/sparse_4
/ocfs2/sparse_4_fiemap

real    0m0.133s
user    0m0.000s
sys     0m0.076s

jeff <at> jeff-laptop:~/opensource_dev/du_enhance_project/coreutils$ ls -l /ocfs2/sparse_4*
-rw-r-xr-- 1 root root 838860800 Apr 16 22:52 /ocfs2/sparse_4
-rw-r-xr-- 1 jeff jeff 838860800 Apr 17 14:26 /ocfs2/sparse_4_fiemap
-rw-r-xr-- 1 jeff jeff 838860800 Apr 17 14:27 /ocfs2/sparse_4_normal


jeff <at> jeff-laptop:~/opensource_dev/coreutils$ rdiff signature /ocfs2/sparse_4 /ocfs2/sparse_4_fiemap
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ echo $?
0

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /ext4/sparse_4
/ext4/sparse_4_normal

real    0m7.258s
user    0m0.244s
sys     0m4.468s

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /ext4/sparse_4
/ext4/sparse_4_fiemap

real    0m0.173s
user    0m0.000s
sys     0m0.052s


From 95f26596e932b87b2da1695086af156f24ceadfa Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Sat, 17 Apr 2010 16:19:49 +0800
Subject: [PATCH 1/1] Introduct fiemap ioctl(2) for efficient sparse file copy v3

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c   |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 243 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index 3c32fa3..b754d9e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -65,6 +65,10 @@
 # include <sys/ioctl.h>
 #endif

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -151,6 +155,130 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
+   excepts holes.  So the overhead to deal with holes with lseek(2) in
+   normal copy could be saved.  This would result in much faster backups
+   for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name)
+{
+  int last = 0;
+  unsigned int i;
+  bool return_val = true;
+  char fiemap_buf[4096] = "";
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  uint64_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+
+  memset (fiemap, 0, sizeof (*fiemap));
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+          return false;
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+          return true;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          uint64_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return_val = false;
+            }
+
+          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return_val = false;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = 1;
+            }
+
+          char buf[buf_size];
+          while (0 < ext_len)
+            {
+              memset (buf, 0, sizeof (buf));
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return_val = false;
+                }
+
+              if (n_read == 0)
+                {
+                   /* Figure out how many bytes read from the last extent.  */
+                   last_read_size = last_ext_len - ext_len;
+                   break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return_val = false;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (last == 0);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read returned size should shorter than the actual size of the file.
+     We should sets the file size to ((struct stat) st_buf.st_size).  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("truncating %s"), quote (dst_name));
+          return_val = false;
+        }
+    }
+
+  return return_val;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +807,16 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy if fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc,
+                              buf_size, src_open_sb.st_size,
+                              src_name, dst_name))
+            goto preserve_metadata;
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +945,8 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
+
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
@@ -897,6 +1037,7 @@ close_src_desc:

   free (buf_alloc);
   free (name_alloc);
+
   return return_val;
 }

diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extented attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending, Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED		0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
-- 
1.5.4.3

Jim Meyering wrote:
> jeff.liu wrote:
>> [...explanation...]
> 
> Thanks.
> 
> ...
>>>     Run "make syntax-check", or even "make distcheck"
> ...
>> Sorry to make this stupidly mistake again, I will do double check for furture patches submition.
> 
> It's not a big deal.
> Notice that you can probably tell your editor
> not to do that anymore.
> 
> I've just adjusted the Cc list to include bug-coreutils.
> Let's keep the discussion there, for the sake of the bug tracker.
> 
> 
> 
> 
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 09:39:01 GMT) Full text and rfc822 format available.

Message #11 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and
	mv(1) v2
Date: Sat, 17 Apr 2010 11:38:34 +0200
jeff.liu wrote:
> Hello All,
>
> This is the revised version.

Thanks for persevering and for the examples.

Would you please maintain a script that automates some of the
commands you've been running, so that we can ensure they
continue to work as advertised?  I keep hoping you'll contribute
a test script or two, too, so I don't have write those, but if
you can at least provide the basis for the tests, I'll massage
that into the required form.  I.e., instead of running the many
commands as you outlined in your mail, you would run a single
script that encapsulates those, where you would be free to assume
(specific-to-your-setup) hard-coded absolute directory names for
each of the different file system types.
Also, hard-coded paths for $cp_orig and $cp_new would be fine,
since you'd use those to demonstrate the performance improvements.

> Both '--fiemap' and '--fiemap-sync' options were eliminated in this version.
> Now cp(1) will perform fiemap optimization whenever possible if cp(1) invoked with --sparse=[WHEN],
> otherwise, it will fall back to the standard copy.
...
> jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time sudo /bin/cp --sparse=always /ocfs2/sparse_4

Hmm... you're using sudo here.
Does that mean the fiemap ioctl works only as root?
Otherwise, why bother with sudo?

> Subject: [PATCH 1/1] Introduct fiemap ioctl(2) for efficient sparse file copy v3
>
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  src/copy.c   |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 243 insertions(+), 0 deletions(-)
>  create mode 100644 src/fiemap.h
>
> diff --git a/src/copy.c b/src/copy.c
...
> +static bool
> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
> +                off_t src_total_size, char const *src_name,
> +                char const *dst_name)
> +{
> +  int last = 0;

Use "bool" for booleans:

    bool last = false;

> +  unsigned int i;
> +  bool return_val = true;

Please don't use a name like "return_val".
In this case, using "ok" is more descriptive.
(or reverse the meaning and use "fail")

> +  char fiemap_buf[4096] = "";

This is a dead store.  Remove the ' = ""';

> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
> +                    sizeof (struct fiemap_extent);
> +  uint64_t last_ext_logical = 0;
> +  uint64_t last_ext_len = 0;
> +  uint64_t last_read_size = 0;
> +
> +  memset (fiemap, 0, sizeof (*fiemap));

What is the point of initializing this buffer?
I suspect you can safely remove it.

> +  do
> +    {
> +      fiemap->fm_start = 0ULL;
> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
> +      fiemap->fm_extent_count = count;
> +
> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
> +          return false;
> +
> +      /* If 0 extents are returned, then more ioctls are not needed.  */
> +      if (fiemap->fm_mapped_extents == 0)
> +          return true;
> +
> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
> +        {
> +          uint64_t ext_logical = fm_ext[i].fe_logical;

There is no point in declaring this as uint64_t when the
sole uses are cast to "off_t".
Instead declare it as off_t, and add an assertion
to ensure that the RHS value was in range.

             off_t ext_logical = fm_ext[i].fe_logical;
             assert (fm_ext[i].fe_logical <= OFF_T_MAX);

> +          uint64_t ext_len = fm_ext[i].fe_length;

> +
> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
> +              return_val = false;
> +            }
> +
> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
> +              return_val = false;
> +            }

If either lseek fails, then we must not proceed to read or write.
Setting "return_val" is insufficient.

This makes me think we'll have to redesign a little, to avoid
the possibility of corruption.

Consider this change:

  If the fiemap ioctl succeeds: use only this specialized code to
  perform the copy and do not fall back on normal copy.  However, if
  any operation fails here (like these lseeks, read or write), we
  diagnose it as usual.  Not falling back upon failure has the added
  benefit that there's no risk of a duplicate diagnostic if the same
  failure occurs again during the regular copy.

Otherwise, imagine that lseek advances the source FD,
but the destination lseek fails.  Now the FDs are inconsistent.
We would have to realign (or restore) them, or risk writing
data at the wrong offset: corruption.

> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
> +            {
> +              last_ext_logical = ext_logical;
> +              last_ext_len = ext_len;
> +              last = 1;

                 last = true;

> +            }
> +
> +          char buf[buf_size];

"buf" is used only within the loop, so please declare it there.

> +          while (0 < ext_len)
> +            {
> +              memset (buf, 0, sizeof (buf));

I see no need to initialize "buf".
The only data you use from it is put there by "read".

> +              /* Avoid reading into the holes if the left extent
> +                 length is shorter than the buffer size.  */
> +              if (ext_len < buf_size)
> +                buf_size = ext_len;
> +
> +              ssize_t n_read = read (src_fd, buf, buf_size);
> +              if (n_read < 0)
> +                {
> +#ifdef EINTR
> +                  if (errno == EINTR)
> +                    continue;
> +#endif
> +                  error (0, errno, _("reading %s"), quote (src_name));
> +                  return_val = false;
> +                }
> +
> +              if (n_read == 0)
> +                {
> +                   /* Figure out how many bytes read from the last extent.  */
> +                   last_read_size = last_ext_len - ext_len;
> +                   break;

Each of the above three lines is indented one space too far.

> +                }
> +
> +              if (full_write (dest_fd, buf, n_read) != n_read)
> +                {
> +                  error (0, errno, _("writing %s"), quote (dst_name));
> +                  return_val = false;
> +                }
> +
> +              ext_len -= n_read;
> +            }
> +
> +          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
> +        }
> +    } while (last == 0);

       } while (!last);

> +
> +  /* If a file ends up with holes, the sum of the last extent logical offset
> +     and the read returned size should shorter than the actual size of the file.
> +     We should sets the file size to ((struct stat) st_buf.st_size).  */

     /* If a file ends with a hole, the sum of the last extent logical offset
        and the read-returned size will be shorter than the actual size of the
        file.  Use ftruncate to extend the length of the destination file.  */

> +  if (last_ext_logical + last_read_size < src_total_size)
> +    {
> +      if (ftruncate (dest_fd, src_total_size) < 0)
> +        {
> +          error (0, errno, _("truncating %s"), quote (dst_name));

s/truncating/extending/

> +          return_val = false;
> +        }
> +    }
> +
> +  return return_val;
> +}
> +#else
> +static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
> +#endif
> +
>  /* FIXME: describe */
>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>     performance hit that's probably noticeable only on trees deeper
> @@ -679,6 +807,16 @@ copy_reg (char const *src_name, char const *dst_name,
>  #endif
>          }
>
> +      if (make_holes)
> +        {
> +          /* Perform efficient FIEMAP copy for sparse files, fall back to the
> +             standard copy if fails.  */
> +          if (fiemap_copy_ok (source_desc, dest_desc,
> +                              buf_size, src_open_sb.st_size,
> +                              src_name, dst_name))
> +            goto preserve_metadata;
> +        }




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 12:39:02 GMT) Full text and rfc822 format available.

Message #14 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Sat, 17 Apr 2010 20:38:12 +0800
Hi Jim,

Thank you so much for your so quick response!

Jim Meyering wrote:
> jeff.liu wrote:
>> Hello All,
>>
>> This is the revised version.
> 
> Thanks for persevering and for the examples.
> 
> Would you please maintain a script that automates some of the
> commands you've been running, so that we can ensure they
> continue to work as advertised?  I keep hoping you'll contribute
> a test script or two, too, so I don't have write those, but if
> you can at least provide the basis for the tests, I'll massage
> that into the required form.  I.e., instead of running the many
> commands as you outlined in your mail, you would run a single
> script that encapsulates those, where you would be free to assume
> (specific-to-your-setup) hard-coded absolute directory names for
> each of the different file system types.
> Also, hard-coded paths for $cp_orig and $cp_new would be fine,
> since you'd use those to demonstrate the performance improvements.
Yes, I kept in mind for your suggestions about how to meet the requirements for new features. :)

Actually, I already working on a test script, the reason why I didn't submit it along with the
current patch is I have concern if the current test coverage is enough, so I send out the patch
first.  The test script will be submit in my next try.
> 
>> Both '--fiemap' and '--fiemap-sync' options were eliminated in this version.
>> Now cp(1) will perform fiemap optimization whenever possible if cp(1) invoked with --sparse=[WHEN],
>> otherwise, it will fall back to the standard copy.
> ...
>> jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time sudo /bin/cp --sparse=always /ocfs2/sparse_4
> 
> Hmm... you're using sudo here.
> Does that mean the fiemap ioctl works only as root?
sorry! I also forgot why I using sudo at that time. :(
But I am sure fiemap ioctl(2) could works without root privilage.

> Otherwise, why bother with sudo?
> 
>> Subject: [PATCH 1/1] Introduct fiemap ioctl(2) for efficient sparse file copy v3
>>
>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>> ---
>>  src/copy.c   |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 243 insertions(+), 0 deletions(-)
>>  create mode 100644 src/fiemap.h
>>
>> diff --git a/src/copy.c b/src/copy.c
> ...
>> +static bool
>> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>> +                off_t src_total_size, char const *src_name,
>> +                char const *dst_name)
>> +{
>> +  int last = 0;
> 
> Use "bool" for booleans:
> 
>     bool last = false;
>> +  unsigned int i;
>> +  bool return_val = true;
> 
> Please don't use a name like "return_val".
> In this case, using "ok" is more descriptive.
> (or reverse the meaning and use "fail")
Thanks for the point out.
> 
>> +  char fiemap_buf[4096] = "";
> 
> This is a dead store.  Remove the ' = ""';
> 
>> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
>> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
>> +                    sizeof (struct fiemap_extent);
>> +  uint64_t last_ext_logical = 0;
>> +  uint64_t last_ext_len = 0;
>> +  uint64_t last_read_size = 0;
>> +
>> +  memset (fiemap, 0, sizeof (*fiemap));
> 
> What is the point of initializing this buffer?
> I suspect you can safely remove it.
I'll double check for this point.
> 
>> +  do
>> +    {
>> +      fiemap->fm_start = 0ULL;
>> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> +      fiemap->fm_extent_count = count;
>> +
>> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
>> +          return false;
>> +
>> +      /* If 0 extents are returned, then more ioctls are not needed.  */
>> +      if (fiemap->fm_mapped_extents == 0)
>> +          return true;
>> +
>> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
>> +        {
>> +          uint64_t ext_logical = fm_ext[i].fe_logical;
> 
> There is no point in declaring this as uint64_t when the
> sole uses are cast to "off_t".
> Instead declare it as off_t, and add an assertion
> to ensure that the RHS value was in range.
> 
>              off_t ext_logical = fm_ext[i].fe_logical;
>              assert (fm_ext[i].fe_logical <= OFF_T_MAX);
> 
Yes, it should be declared to 'off_t' here.

>> +          uint64_t ext_len = fm_ext[i].fe_length;
> 
>> +
>> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>> +            {
>> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
>> +              return_val = false;
>> +            }
>> +
>> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>> +            {
>> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
>> +              return_val = false;
>> +            }
> 
> If either lseek fails, then we must not proceed to read or write.
> Setting "return_val" is insufficient.
> 
> This makes me think we'll have to redesign a little, to avoid
> the possibility of corruption.
> 
> Consider this change:
> 
>   If the fiemap ioctl succeeds: use only this specialized code to
>   perform the copy and do not fall back on normal copy.  However, if
>   any operation fails here (like these lseeks, read or write), we
>   diagnose it as usual.  Not falling back upon failure has the added
>   benefit that there's no risk of a duplicate diagnostic if the same
>   failure occurs again during the regular copy.
> 
> Otherwise, imagine that lseek advances the source FD,
> but the destination lseek fails.  Now the FDs are inconsistent.
> We would have to realign (or restore) them, or risk writing
> data at the wrong offset: corruption.
So, do you means if fiemap ioctl(2) succeeds, no matter what error(read, write, lseek) occurred
afterwards, we should always diagnosing it and then abort with the corresponding error msg?

My original design really cause the file corruption if the lseek(2) already advances the source FD
but the destination lseek(2) fails.

But is it make sense to seek back to the start offset for both source and destination FDs, then fall
back to the regular copy?

> 
>> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
>> +            {
>> +              last_ext_logical = ext_logical;
>> +              last_ext_len = ext_len;
>> +              last = 1;
> 
>                  last = true;
> 
>> +            }
>> +
>> +          char buf[buf_size];
> 
> "buf" is used only within the loop, so please declare it there.
> 
>> +          while (0 < ext_len)
>> +            {
>> +              memset (buf, 0, sizeof (buf));
> 
> I see no need to initialize "buf".
> The only data you use from it is put there by "read".
This initizalize really do not need here.
> 
>> +              /* Avoid reading into the holes if the left extent
>> +                 length is shorter than the buffer size.  */
>> +              if (ext_len < buf_size)
>> +                buf_size = ext_len;
>> +
>> +              ssize_t n_read = read (src_fd, buf, buf_size);
>> +              if (n_read < 0)
>> +                {
>> +#ifdef EINTR
>> +                  if (errno == EINTR)
>> +                    continue;
>> +#endif
>> +                  error (0, errno, _("reading %s"), quote (src_name));
>> +                  return_val = false;
>> +                }
>> +
>> +              if (n_read == 0)
>> +                {
>> +                   /* Figure out how many bytes read from the last extent.  */
>> +                   last_read_size = last_ext_len - ext_len;
>> +                   break;
> 
> Each of the above three lines is indented one space too far.
oh, I do such stupid thing again!!!!
> 
>> +                }
>> +
>> +              if (full_write (dest_fd, buf, n_read) != n_read)
>> +                {
>> +                  error (0, errno, _("writing %s"), quote (dst_name));
>> +                  return_val = false;
>> +                }
>> +
>> +              ext_len -= n_read;
>> +            }
>> +
>> +          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
>> +        }
>> +    } while (last == 0);
> 
>        } while (!last);
> 
>> +
>> +  /* If a file ends up with holes, the sum of the last extent logical offset
>> +     and the read returned size should shorter than the actual size of the file.
>> +     We should sets the file size to ((struct stat) st_buf.st_size).  */
> 
>      /* If a file ends with a hole, the sum of the last extent logical offset
>         and the read-returned size will be shorter than the actual size of the
>         file.  Use ftruncate to extend the length of the destination file.  */
> 
>> +  if (last_ext_logical + last_read_size < src_total_size)
>> +    {
>> +      if (ftruncate (dest_fd, src_total_size) < 0)
>> +        {
>> +          error (0, errno, _("truncating %s"), quote (dst_name));
> 
> s/truncating/extending/
thanks for above 3 corrections!
> 
>> +          return_val = false;
>> +        }
>> +    }
>> +
>> +  return return_val;
>> +}
>> +#else
>> +static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
>> +#endif
>> +
>>  /* FIXME: describe */
>>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>>     performance hit that's probably noticeable only on trees deeper
>> @@ -679,6 +807,16 @@ copy_reg (char const *src_name, char const *dst_name,
>>  #endif
>>          }
>>
>> +      if (make_holes)
>> +        {
>> +          /* Perform efficient FIEMAP copy for sparse files, fall back to the
>> +             standard copy if fails.  */
>> +          if (fiemap_copy_ok (source_desc, dest_desc,
>> +                              buf_size, src_open_sb.st_size,
>> +                              src_name, dst_name))
>> +            goto preserve_metadata;
>> +        }

Best Regards,
-Jeff




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 13:02:01 GMT) Full text and rfc822 format available.

Message #17 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Sat, 17 Apr 2010 21:00:41 +0800
jeff.liu wrote:
> Hi Jim,
> 
> Thank you so much for your so quick response!
> 
> Jim Meyering wrote:
>> jeff.liu wrote:
>>> Hello All,
>>>
>>> This is the revised version.
>> Thanks for persevering and for the examples.
>>
>> Would you please maintain a script that automates some of the
>> commands you've been running, so that we can ensure they
>> continue to work as advertised?  I keep hoping you'll contribute
>> a test script or two, too, so I don't have write those, but if
>> you can at least provide the basis for the tests, I'll massage
>> that into the required form.  I.e., instead of running the many
>> commands as you outlined in your mail, you would run a single
>> script that encapsulates those, where you would be free to assume
>> (specific-to-your-setup) hard-coded absolute directory names for
>> each of the different file system types.
>> Also, hard-coded paths for $cp_orig and $cp_new would be fine,
>> since you'd use those to demonstrate the performance improvements.
> Yes, I kept in mind for your suggestions about how to meet the requirements for new features. :)
> 
> Actually, I already working on a test script, the reason why I didn't submit it along with the
> current patch is I have concern if the current test coverage is enough, so I send out the patch
> first.  The test script will be submit in my next try.
>>> Both '--fiemap' and '--fiemap-sync' options were eliminated in this version.
>>> Now cp(1) will perform fiemap optimization whenever possible if cp(1) invoked with --sparse=[WHEN],
>>> otherwise, it will fall back to the standard copy.
>> ...
>>> jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time sudo /bin/cp --sparse=always /ocfs2/sparse_4
>> Hmm... you're using sudo here.
>> Does that mean the fiemap ioctl works only as root?
> sorry! I also forgot why I using sudo at that time. :(
> But I am sure fiemap ioctl(2) could works without root privilage.
> 
>> Otherwise, why bother with sudo?
>>
>>> Subject: [PATCH 1/1] Introduct fiemap ioctl(2) for efficient sparse file copy v3
>>>
>>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>>> ---
>>>  src/copy.c   |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 243 insertions(+), 0 deletions(-)
>>>  create mode 100644 src/fiemap.h
>>>
>>> diff --git a/src/copy.c b/src/copy.c
>> ...
>>> +static bool
>>> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>>> +                off_t src_total_size, char const *src_name,
>>> +                char const *dst_name)
>>> +{
>>> +  int last = 0;
>> Use "bool" for booleans:
>>
>>     bool last = false;
>>> +  unsigned int i;
>>> +  bool return_val = true;
>> Please don't use a name like "return_val".
>> In this case, using "ok" is more descriptive.
>> (or reverse the meaning and use "fail")
> Thanks for the point out.
>>> +  char fiemap_buf[4096] = "";
>> This is a dead store.  Remove the ' = ""';
>>
>>> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
>>> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>>> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
>>> +                    sizeof (struct fiemap_extent);
>>> +  uint64_t last_ext_logical = 0;
>>> +  uint64_t last_ext_len = 0;
>>> +  uint64_t last_read_size = 0;
>>> +
>>> +  memset (fiemap, 0, sizeof (*fiemap));
>> What is the point of initializing this buffer?
>> I suspect you can safely remove it.
> I'll double check for this point.
>>> +  do
>>> +    {
>>> +      fiemap->fm_start = 0ULL;
>>> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
>>> +      fiemap->fm_extent_count = count;
>>> +
>>> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
>>> +          return false;
>>> +
>>> +      /* If 0 extents are returned, then more ioctls are not needed.  */
>>> +      if (fiemap->fm_mapped_extents == 0)
>>> +          return true;
>>> +
>>> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
>>> +        {
>>> +          uint64_t ext_logical = fm_ext[i].fe_logical;
>> There is no point in declaring this as uint64_t when the
>> sole uses are cast to "off_t".
>> Instead declare it as off_t, and add an assertion
>> to ensure that the RHS value was in range.
>>
>>              off_t ext_logical = fm_ext[i].fe_logical;
>>              assert (fm_ext[i].fe_logical <= OFF_T_MAX);
>>
> Yes, it should be declared to 'off_t' here.
> 
>>> +          uint64_t ext_len = fm_ext[i].fe_length;
>>> +
>>> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>> +            {
>>> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
>>> +              return_val = false;
>>> +            }
>>> +
>>> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>> +            {
>>> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
>>> +              return_val = false;
>>> +            }
>> If either lseek fails, then we must not proceed to read or write.
>> Setting "return_val" is insufficient.
>>
>> This makes me think we'll have to redesign a little, to avoid
>> the possibility of corruption.
>>
>> Consider this change:
>>
>>   If the fiemap ioctl succeeds: use only this specialized code to
>>   perform the copy and do not fall back on normal copy.  However, if
>>   any operation fails here (like these lseeks, read or write), we
>>   diagnose it as usual.  Not falling back upon failure has the added
>>   benefit that there's no risk of a duplicate diagnostic if the same
>>   failure occurs again during the regular copy.
>>
>> Otherwise, imagine that lseek advances the source FD,
>> but the destination lseek fails.  Now the FDs are inconsistent.
>> We would have to realign (or restore) them, or risk writing
>> data at the wrong offset: corruption.
> So, do you means if fiemap ioctl(2) succeeds, no matter what error(read, write, lseek) occurred
> afterwards, we should always diagnosing it and then abort with the corresponding error msg?
> 
> My original design really cause the file corruption if the lseek(2) already advances the source FD
> but the destination lseek(2) fails.
> 
> But is it make sense to seek back to the start offset for both source and destination FDs, then fall
> back to the regular copy?
Oh, My thought to seek back to the start offset of both FDs is wrong, it most possibibly encounter
the same issue (read, write, lseek) in regular copy again.

> 
>>> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
>>> +            {
>>> +              last_ext_logical = ext_logical;
>>> +              last_ext_len = ext_len;
>>> +              last = 1;
>>                  last = true;
>>
>>> +            }
>>> +
>>> +          char buf[buf_size];
>> "buf" is used only within the loop, so please declare it there.
>>
>>> +          while (0 < ext_len)
>>> +            {
>>> +              memset (buf, 0, sizeof (buf));
>> I see no need to initialize "buf".
>> The only data you use from it is put there by "read".
> This initizalize really do not need here.
>>> +              /* Avoid reading into the holes if the left extent
>>> +                 length is shorter than the buffer size.  */
>>> +              if (ext_len < buf_size)
>>> +                buf_size = ext_len;
>>> +
>>> +              ssize_t n_read = read (src_fd, buf, buf_size);
>>> +              if (n_read < 0)
>>> +                {
>>> +#ifdef EINTR
>>> +                  if (errno == EINTR)
>>> +                    continue;
>>> +#endif
>>> +                  error (0, errno, _("reading %s"), quote (src_name));
>>> +                  return_val = false;
>>> +                }
>>> +
>>> +              if (n_read == 0)
>>> +                {
>>> +                   /* Figure out how many bytes read from the last extent.  */
>>> +                   last_read_size = last_ext_len - ext_len;
>>> +                   break;
>> Each of the above three lines is indented one space too far.
> oh, I do such stupid thing again!!!!
>>> +                }
>>> +
>>> +              if (full_write (dest_fd, buf, n_read) != n_read)
>>> +                {
>>> +                  error (0, errno, _("writing %s"), quote (dst_name));
>>> +                  return_val = false;
>>> +                }
>>> +
>>> +              ext_len -= n_read;
>>> +            }
>>> +
>>> +          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
>>> +        }
>>> +    } while (last == 0);
>>        } while (!last);
>>
>>> +
>>> +  /* If a file ends up with holes, the sum of the last extent logical offset
>>> +     and the read returned size should shorter than the actual size of the file.
>>> +     We should sets the file size to ((struct stat) st_buf.st_size).  */
>>      /* If a file ends with a hole, the sum of the last extent logical offset
>>         and the read-returned size will be shorter than the actual size of the
>>         file.  Use ftruncate to extend the length of the destination file.  */
>>
>>> +  if (last_ext_logical + last_read_size < src_total_size)
>>> +    {
>>> +      if (ftruncate (dest_fd, src_total_size) < 0)
>>> +        {
>>> +          error (0, errno, _("truncating %s"), quote (dst_name));
>> s/truncating/extending/
> thanks for above 3 corrections!
>>> +          return_val = false;
>>> +        }
>>> +    }
>>> +
>>> +  return return_val;
>>> +}
>>> +#else
>>> +static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
>>> +#endif
>>> +
>>>  /* FIXME: describe */
>>>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>>>     performance hit that's probably noticeable only on trees deeper
>>> @@ -679,6 +807,16 @@ copy_reg (char const *src_name, char const *dst_name,
>>>  #endif
>>>          }
>>>
>>> +      if (make_holes)
>>> +        {
>>> +          /* Perform efficient FIEMAP copy for sparse files, fall back to the
>>> +             standard copy if fails.  */
>>> +          if (fiemap_copy_ok (source_desc, dest_desc,
>>> +                              buf_size, src_open_sb.st_size,
>>> +                              src_name, dst_name))
>>> +            goto preserve_metadata;
>>> +        }
> 
> Best Regards,
> -Jeff
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 14:43:02 GMT) Full text and rfc822 format available.

Message #20 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and
	mv(1) v2
Date: Sat, 17 Apr 2010 16:42:41 +0200
jeff.liu wrote:
...
>>> +          uint64_t ext_len = fm_ext[i].fe_length;
>>
>>> +
>>> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>> +            {
>>> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
>>> +              return_val = false;
>>> +            }
>>> +
>>> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>> +            {
>>> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
>>> +              return_val = false;
>>> +            }
>>
>> If either lseek fails, then we must not proceed to read or write.
>> Setting "return_val" is insufficient.
>>
>> This makes me think we'll have to redesign a little, to avoid
>> the possibility of corruption.
>>
>> Consider this change:
>>
>>   If the fiemap ioctl succeeds: use only this specialized code to
>>   perform the copy and do not fall back on normal copy.  However, if
>>   any operation fails here (like these lseeks, read or write), we
>>   diagnose it as usual.  Not falling back upon failure has the added
>>   benefit that there's no risk of a duplicate diagnostic if the same
>>   failure occurs again during the regular copy.
>>
>> Otherwise, imagine that lseek advances the source FD,
>> but the destination lseek fails.  Now the FDs are inconsistent.
>> We would have to realign (or restore) them, or risk writing
>> data at the wrong offset: corruption.
> So, do you means if fiemap ioctl(2) succeeds, no matter what error(read, write, lseek) occurred
> afterwards, we should always diagnosing it and then abort with the corresponding error msg?
>
> My original design really cause the file corruption if the lseek(2) already advances the source FD
> but the destination lseek(2) fails.
>
> But is it make sense to seek back to the start offset for both source and destination FDs, then fall
> back to the regular copy?

Your changes should make cp handle these cases:

  ioctl succeeds and fiemap-copying succeeds
    nothing more to do for this pair of files
  ioctl succeeds and any aspect of fiemap-copying fails
    fail: do not try to copy any other way
  ioctl fails
    attempt to copy as usual

In each case, continue with any remaining files.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Sat, 17 Apr 2010 15:11:02 GMT) Full text and rfc822 format available.

Message #23 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Sat, 17 Apr 2010 23:10:22 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> ...
>>>> +          uint64_t ext_len = fm_ext[i].fe_length;
>>>> +
>>>> +          if (lseek (src_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>>> +            {
>>>> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
>>>> +              return_val = false;
>>>> +            }
>>>> +
>>>> +          if (lseek (dest_fd, (off_t) ext_logical, SEEK_SET) < 0LL)
>>>> +            {
>>>> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
>>>> +              return_val = false;
>>>> +            }
>>> If either lseek fails, then we must not proceed to read or write.
>>> Setting "return_val" is insufficient.
>>>
>>> This makes me think we'll have to redesign a little, to avoid
>>> the possibility of corruption.
>>>
>>> Consider this change:
>>>
>>>   If the fiemap ioctl succeeds: use only this specialized code to
>>>   perform the copy and do not fall back on normal copy.  However, if
>>>   any operation fails here (like these lseeks, read or write), we
>>>   diagnose it as usual.  Not falling back upon failure has the added
>>>   benefit that there's no risk of a duplicate diagnostic if the same
>>>   failure occurs again during the regular copy.
>>>
>>> Otherwise, imagine that lseek advances the source FD,
>>> but the destination lseek fails.  Now the FDs are inconsistent.
>>> We would have to realign (or restore) them, or risk writing
>>> data at the wrong offset: corruption.
>> So, do you means if fiemap ioctl(2) succeeds, no matter what error(read, write, lseek) occurred
>> afterwards, we should always diagnosing it and then abort with the corresponding error msg?
>>
>> My original design really cause the file corruption if the lseek(2) already advances the source FD
>> but the destination lseek(2) fails.
>>
>> But is it make sense to seek back to the start offset for both source and destination FDs, then fall
>> back to the regular copy?
> 
> Your changes should make cp handle these cases:
> 
>   ioctl succeeds and fiemap-copying succeeds
>     nothing more to do for this pair of files
>   ioctl succeeds and any aspect of fiemap-copying fails
>     fail: do not try to copy any other way
>   ioctl fails
>     attempt to copy as usual
> 
> In each case, continue with any remaining files.
Thanks for the detailed info!

-Jeff




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Mon, 19 Apr 2010 09:48:02 GMT) Full text and rfc822 format available.

Message #26 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Mon, 19 Apr 2010 17:45:55 +0800
Hi Jim,

Thanks for your comments for my last patch submition!

Now the fiemap copy logical changed to:
1. If fiemap copy succeeds, go ahead with the meta data preserve procedure if they are user specified.
2. If fiemap_copy fails and the 'normal_copy_required' is true, which means the ioctl(2) fails.
In this case, fall back to normal copy.  'bool normal_copy_required' in fiemap_copy_ok() is a new
argument added in this version used to determine whether normal copy require or not.
3. If fiemap_copy fails for other reason, go to 'close_src_and_dst_desc' directly.

I also add a test script to 'tests/cp/sparse-fiemap-copy'(I can not find out a better name at the
moment), it create a 41M sparse file on btrfs/ext4/ocfs2 separately, and then try copy them in those
2 different ways.
It supply two result verify steps, the first is to compare the sum of 'sys and user' time by run
each copy via '/usr/bin/time', the fiemap copy should run faster than normal copy.
2nd is compare the file size in bytes, the fiemap copied file size should be equal to the orignally.

Would you please review the following patches at your convenient time?

From 561b7b3413de16595fc71f0cba5bfa918b42f9e5 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Sun, 18 Apr 2010 14:31:20 +0800
Subject: [PATCH 1/2] Introduct fiemap ioctl(2) for efficient sparse file copy.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c   |  153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 255 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index 3c32fa3..e7adecc 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -65,6 +65,10 @@
 # include <sys/ioctl.h>
 #endif

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -151,6 +155,133 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
+   excepts holes.  So the overhead to deal with holes with lseek(2) in
+   normal copy could be saved.  This would result in much faster backups
+   for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  unsigned int i;
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails for any reason, fall back to the normal copy.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+        {
+          *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+          return true;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +810,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +957,8 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
+
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
@@ -897,6 +1049,7 @@ close_src_desc:

   free (buf_alloc);
   free (name_alloc);
+
   return return_val;
 }

diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extented attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending, Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED		0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
-- 
1.5.4.3


From 99927649578934e3436d8370e3a7d34da5ee9195 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Mon, 19 Apr 2010 15:40:05 +0800
Subject: [PATCH 2/2] Add fiemap copy test script to tests/cp/sparse-fiemap-copy.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 tests/Makefile.am           |    1 +
 tests/cp/sparse-fiemap-copy |   71 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/sparse-fiemap-copy

diff --git a/tests/Makefile.am b/tests/Makefile.am
index db1610d..21c1f32 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -315,6 +315,7 @@ TESTS =						\
   cp/same-file					\
   cp/slink-2-slink				\
   cp/sparse					\
+  cp/sparse-fiemap-copy                         \
   cp/special-f					\
   cp/src-base-dot				\
   cp/symlink-slash				\
diff --git a/tests/cp/sparse-fiemap-copy b/tests/cp/sparse-fiemap-copy
new file mode 100755
index 0000000..8ae31da
--- /dev/null
+++ b/tests/cp/sparse-fiemap-copy
@@ -0,0 +1,71 @@
+#!/bin/sh
+# Test cp --sparse=always with fiemap copy
+
+# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+test -x "/bin/cp" && cp_orig="/bin/cp" || test -x "/usr/bin/cp" && cp_orig="/usr/bin/cp" \
+|| test -x "/usr/local/bin/cp" && cp_orig="/usr/local/bin/cp" \
+|| test -x "$abs_top_builddir/src/cp" && cp_new="$abs_top_builddir/src/cp" || Exit 1
+
+# Please forgive me for the hard code
+ocfs2_mount_point="/ocfs2"
+ocfs2_sparse=$ocfs2_mount_point/sparse_file
+ocfs2_sparse_normal=$ocfs2_mount_point/sparse_normal_copy
+ocfs2_sparse_fiemap=$ocfs2_mount_point/sparse_fiemap_copy
+
+btrfs_mount_point="/btrfs"
+btrfs_sparse=$btrfs_mount_point/sparse_file
+btrfs_sparse_normal=$btrfs_mount_point/sparse_normal_copy
+btrfs_sparse_fiemap=$btrfs_mount_point/sparse_fiemap_copy
+
+ext4_mount_point="/ext4"
+ext4_sparse=$ext4_mount_point/sparse_file
+ext4_sparse_normal=$ext4_mount_point/sparse_normal_copy
+ext4_sparse_fiemap=$ext4_mount_point/sparse_fiemap_copy
+
+# Create a 41Mbytes sparse file for each file sysytem.
+size=`expr 10 \* 1024`
+dd if=/dev/zero bs=4k count=1 seek=$size of=$ocfs2_sparse > /dev/null 2>&1 || fail=1
+dd if=/dev/zero bs=4k count=1 seek=$size of=$btrfs_sparse > /dev/null 2>&1 || fail=1
+dd if=/dev/zero bs=4k count=1 seek=$size of=$ext4_sparse > /dev/null 2>&1 || fail=1
+
+my_time="/usr/bin/time"
+t1=$($my_time -f "%U + %S" $cp_orig --sparse=always $ocfs2_sparse $ocfs2_sparse_normal 2>&1 | bc)
+t2=$($my_time -f "%U + %S" $cp_new --sparse=always $ocfs2_sparse $ocfs2_sparse_fiemap 2>&1 | bc)
+# Ensure that the fiemap copy has the same size in bytes as the original.
+test `stat --printf %s $ocfs2_sparse` -eq `stat --printf %s $ocfs2_sparse_fiemap` || fail=1
+echo "$t2 < $t1" | bc || fail=1
+
+t1=$($my_time -f "%U + %S" $cp_orig --sparse=always $btrfs_sparse $btrfs_sparse_normal 2>&1 | bc)
+t2=$($my_time -f "%U + %S" $cp_new --sparse=always $btrfs_sparse $btrfs_sparse_fiemap 2>&1 | bc)
+# Ensure that the fiemap copy has the same size in bytes as the original.
+test `stat --printf %s $btrfs_sparse` -eq `stat --printf %s $btrfs_sparse_fiemap` || fail=1
+echo "$t2 < $t1" | bc || fail=1
+
+t1=$($my_time -f "%U + %S" $cp_orig --sparse=always $ext4_sparse $ext4_sparse_normal 2>&1 | bc)
+t2=$($my_time -f "%U + %S" $cp_new --sparse=always $ext4_sparse $ext4_sparse_fiemap 2>&1 | bc)
+# Ensure that the fiemap copy has the same size in bytes as the original.
+test `stat --printf %s $ext4_sparse` -eq `stat --printf %s $ext4_sparse_fiemap` || fail=1
+echo "$t2 < $t1" | bc || fail=1
+
+Exit $fail
-- 
1.5.4.3



Best Regards,
-Jeff




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Thu, 22 Apr 2010 03:05:02 GMT) Full text and rfc822 format available.

Message #29 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Thu, 22 Apr 2010 11:04:09 +0800
jeff.liu wrote:
> Hi Jim,
> 
> Thanks for your comments for my last patch submition!
> 
> Now the fiemap copy logical changed to:
> 1. If fiemap copy succeeds, go ahead with the meta data preserve procedure if they are user specified.
> 2. If fiemap_copy fails and the 'normal_copy_required' is true, which means the ioctl(2) fails.
> In this case, fall back to normal copy.  'bool normal_copy_required' in fiemap_copy_ok() is a new
> argument added in this version used to determine whether normal copy require or not.
> 3. If fiemap_copy fails for other reason, go to 'close_src_and_dst_desc' directly.
> 
> I also add a test script to 'tests/cp/sparse-fiemap-copy'(I can not find out a better name at the
> moment), it create a 41M sparse file on btrfs/ext4/ocfs2 separately, and then try copy them in those
> 2 different ways.
> It supply two result verify steps, the first is to compare the sum of 'sys and user' time by run
> each copy via '/usr/bin/time', the fiemap copy should run faster than normal copy.
> 2nd is compare the file size in bytes, the fiemap copied file size should be equal to the orignally.
> 
> Would you please review the following patches at your convenient time?
> 
> From 561b7b3413de16595fc71f0cba5bfa918b42f9e5 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Sun, 18 Apr 2010 14:31:20 +0800
> Subject: [PATCH 1/2] Introduct fiemap ioctl(2) for efficient sparse file copy.
> 
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  src/copy.c   |  153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 255 insertions(+), 0 deletions(-)
>  create mode 100644 src/fiemap.h
> 
> diff --git a/src/copy.c b/src/copy.c
> index 3c32fa3..e7adecc 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -65,6 +65,10 @@
>  # include <sys/ioctl.h>
>  #endif
> 
> +#ifndef HAVE_FIEMAP
> +# include "fiemap.h"
> +#endif
> +
>  #ifndef HAVE_FCHOWN
>  # define HAVE_FCHOWN false
>  # define fchown(fd, uid, gid) (-1)
> @@ -151,6 +155,133 @@ clone_file (int dest_fd, int src_fd)
>  #endif
>  }
> 
> +#ifdef __linux__
> +# ifndef FS_IOC_FIEMAP
> +#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
> +# endif
> +/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
> +   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
> +   excepts holes.  So the overhead to deal with holes with lseek(2) in
> +   normal copy could be saved.  This would result in much faster backups
> +   for any kind of sparse file.  */
> +static bool
> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
> +                off_t src_total_size, char const *src_name,
> +                char const *dst_name, bool *normal_copy_required)
> +{
> +  bool fail = false;
> +  bool last = false;
> +  char fiemap_buf[4096];
> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
> +                    sizeof (struct fiemap_extent);
> +  unsigned int i;
> +  off_t last_ext_logical = 0;
> +  uint64_t last_ext_len = 0;
> +  uint64_t last_read_size = 0;
> +
> +  do
> +    {
> +      fiemap->fm_start = 0ULL;
> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
> +      fiemap->fm_extent_count = count;
> +
> +      /* When ioctl(2) fails for any reason, fall back to the normal copy.  */
> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
> +        {
> +          *normal_copy_required = true;
> +          return false;
> +        }
> +
> +      /* If 0 extents are returned, then more ioctls are not needed.  */
> +      if (fiemap->fm_mapped_extents == 0)
> +          return true;
This is a bug I just tried out. 'return true' is not enough if '0' extents returned.
Below is the test case could reveal this issue, it just seek(2) but not write any byte, its extent
count is *zero* because there is no data blocks allocated from the extent tree.  In this case,
I should truncate it to the src_desc.st_size, this issue will be fixed in my next post.
dd bs=1 seek=10240 of=/ocfs2/sparse </dev/null 2>/dev/null

Besides, I have tried this case on ocfs2,ext4 and btrfs, the first 2 file systems could handle this
kind of sparse file properly, but btrfs actually allocaed an extent for it.  IMHO, maybe this is a
btrfs issue, I will post it to the mail-list for further discussion.
> +
> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
> +        {
> +          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
> +
> +          off_t ext_logical = fm_ext[i].fe_logical;
> +          uint64_t ext_len = fm_ext[i].fe_length;
> +
> +          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
> +              return fail;
> +            }
> +
> +          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
> +              return fail;
> +            }
> +
> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
> +            {
> +              last_ext_logical = ext_logical;
> +              last_ext_len = ext_len;
> +              last = true;
> +            }
> +
> +          while (0 < ext_len)
> +            {
> +              char buf[buf_size];
> +
> +              /* Avoid reading into the holes if the left extent
> +                 length is shorter than the buffer size.  */
> +              if (ext_len < buf_size)
> +                buf_size = ext_len;
> +
> +              ssize_t n_read = read (src_fd, buf, buf_size);
> +              if (n_read < 0)
> +                {
> +#ifdef EINTR
> +                  if (errno == EINTR)
> +                    continue;
> +#endif
> +                  error (0, errno, _("reading %s"), quote (src_name));
> +                  return fail;
> +                }
> +
> +              if (n_read == 0)
> +                {
> +                  /* Figure out how many bytes read from the last extent.  */
> +                  last_read_size = last_ext_len - ext_len;
> +                  break;
> +                }
> +
> +              if (full_write (dest_fd, buf, n_read) != n_read)
> +                {
> +                  error (0, errno, _("writing %s"), quote (dst_name));
> +                  return fail;
> +                }
> +
> +              ext_len -= n_read;
> +            }
> +
> +          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
> +        }
> +    } while (! last);
> +
goto Here if fiemap->fm_mapped_extents == 0.
> +  /* If a file ends up with holes, the sum of the last extent logical offset
> +     and the read-returned size will be shorter than the actual size of the
> +     file.  Use ftruncate to extend the length of the destination file.  */
> +  if (last_ext_logical + last_read_size < src_total_size)
> +    {
> +      if (ftruncate (dest_fd, src_total_size) < 0)
> +        {
> +          error (0, errno, _("extending %s"), quote (dst_name));
> +          return fail;
> +        }
> +    }
> +
> +  return ! fail;
> +}
> +#else
> +static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
> +#endif
> +
>  /* FIXME: describe */
>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>     performance hit that's probably noticeable only on trees deeper
> @@ -679,6 +810,25 @@ copy_reg (char const *src_name, char const *dst_name,
>  #endif
>          }
> 
> +      if (make_holes)
> +        {
> +          bool require_normal_copy = false;
> +          /* Perform efficient FIEMAP copy for sparse files, fall back to the
> +             standard copy only if the ioctl(2) fails.  */
> +          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
> +                              src_open_sb.st_size, src_name,
> +                              dst_name, &require_normal_copy))
> +            goto preserve_metadata;
> +          else
> +            {
> +              if (! require_normal_copy)
> +                {
> +                  return_val = false;
> +                  goto close_src_and_dst_desc;
> +                }
> +            }
> +        }
> +
>        /* If not making a sparse file, try to use a more-efficient
>           buffer size.  */
>        if (! make_holes)
> @@ -807,6 +957,8 @@ copy_reg (char const *src_name, char const *dst_name,
>          }
>      }
> 
> +preserve_metadata:
> +
>    if (x->preserve_timestamps)
>      {
>        struct timespec timespec[2];
> @@ -897,6 +1049,7 @@ close_src_desc:
> 
>    free (buf_alloc);
>    free (name_alloc);
> +
>    return return_val;
>  }
> 
> diff --git a/src/fiemap.h b/src/fiemap.h
> new file mode 100644
> index 0000000..d33293b
> --- /dev/null
> +++ b/src/fiemap.h
> @@ -0,0 +1,102 @@
> +/* FS_IOC_FIEMAP ioctl infrastructure.
> +   Some portions copyright (C) 2007 Cluster File Systems, Inc
> +   Authors: Mark Fasheh <mfasheh <at> suse.com>
> +            Kalpak Shah <kalpak.shah <at> sun.com>
> +            Andreas Dilger <adilger <at> sun.com>.  */
> +
> +/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
> +
> +#ifndef _LINUX_FIEMAP_H
> +# define _LINUX_FIEMAP_H
> +
> +# include <linux/types.h>
> +
> +struct fiemap_extent
> +{
> +  /* Logical offset in bytes for the start of the extent
> +     from the beginning of the file.  */
> +  uint64_t fe_logical;
> +
> +  /* Physical offset in bytes for the start of the extent
> +     from the beginning of the disk.  */
> +  uint64_t fe_physical;
> +
> +  /* Length in bytes for this extent.  */
> +  uint64_t fe_length;
> +
> +  uint64_t fe_reserved64[2];
> +
> +  /* FIEMAP_EXTENT_* flags for this extent.  */
> +  uint32_t fe_flags;
> +
> +  uint32_t fe_reserved[3];
> +};
> +
> +struct fiemap
> +{
> +  /* Logical offset(inclusive) at which to start mapping(in).  */
> +  uint64_t fm_start;
> +
> +  /* Logical length of mapping which userspace wants(in).  */
> +  uint64_t fm_length;
> +
> +  /* FIEMAP_FLAG_* flags for request(in/out).  */
> +  uint32_t fm_flags;
> +
> +  /* Number of extents that were mapped(out).  */
> +  uint32_t fm_mapped_extents;
> +
> +  /* Size of fm_extents array(in).  */
> +  uint32_t fm_extent_count;
> +
> +  uint32_t fm_reserved;
> +
> +  /* Array of mapped extents(out).  */
> +  struct fiemap_extent fm_extents[0];
> +};
> +
> +/* The maximum offset can be mapped for a file.  */
> +# define FIEMAP_MAX_OFFSET       (~0ULL)
> +
> +/* Sync file data before map.  */
> +# define FIEMAP_FLAG_SYNC        0x00000001
> +
> +/* Map extented attribute tree.  */
> +# define FIEMAP_FLAG_XATTR       0x00000002
> +
> +# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
> +
> +/* Last extent in file.  */
> +# define FIEMAP_EXTENT_LAST              0x00000001
> +
> +/* Data location unknown.  */
> +# define FIEMAP_EXTENT_UNKNOWN           0x00000002
> +
> +/* Location still pending, Sets EXTENT_UNKNOWN.  */
> +# define FIEMAP_EXTENT_DELALLOC          0x00000004
> +
> +/* Data can not be read while fs is unmounted.  */
> +# define FIEMAP_EXTENT_ENCODED           0x00000008
> +
> +/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
> +# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
> +
> +/* Extent offsets may not be block aligned.  */
> +# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
> +
> +/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
> +# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
> +
> +/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
> +# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
> +
> +/* Space allocated, but not data (i.e. zero).  */
> +# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
> +
> +/* File does not natively support extents.  Result merged for efficiency.  */
> +# define FIEMAP_EXTENT_MERGED		0x00001000
> +
> +/* Space shared with other files.  */
> +# define FIEMAP_EXTENT_SHARED            0x00002000
> +
> +#endif





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Thu, 22 Apr 2010 14:46:02 GMT) Full text and rfc822 format available.

Message #32 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, 5915 <at> debbugs.gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>, chris.mason <at> oracle.com
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Thu, 22 Apr 2010 22:44:14 +0800
jeff.liu wrote:
> jeff.liu wrote:
>> Hi Jim,
>>
>> Thanks for your comments for my last patch submition!
>>
>> Now the fiemap copy logical changed to:
>> 1. If fiemap copy succeeds, go ahead with the meta data preserve procedure if they are user specified.
>> 2. If fiemap_copy fails and the 'normal_copy_required' is true, which means the ioctl(2) fails.
>> In this case, fall back to normal copy.  'bool normal_copy_required' in fiemap_copy_ok() is a new
>> argument added in this version used to determine whether normal copy require or not.
>> 3. If fiemap_copy fails for other reason, go to 'close_src_and_dst_desc' directly.
>>
>> I also add a test script to 'tests/cp/sparse-fiemap-copy'(I can not find out a better name at the
>> moment), it create a 41M sparse file on btrfs/ext4/ocfs2 separately, and then try copy them in those
>> 2 different ways.
>> It supply two result verify steps, the first is to compare the sum of 'sys and user' time by run
>> each copy via '/usr/bin/time', the fiemap copy should run faster than normal copy.
>> 2nd is compare the file size in bytes, the fiemap copied file size should be equal to the orignally.
>>
>> Would you please review the following patches at your convenient time?
>>
>> From 561b7b3413de16595fc71f0cba5bfa918b42f9e5 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <jeff.liu <at> oracle.com>
>> Date: Sun, 18 Apr 2010 14:31:20 +0800
>> Subject: [PATCH 1/2] Introduct fiemap ioctl(2) for efficient sparse file copy.
>>
>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>> ---
>>  src/copy.c   |  153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 255 insertions(+), 0 deletions(-)
>>  create mode 100644 src/fiemap.h
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index 3c32fa3..e7adecc 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -65,6 +65,10 @@
>>  # include <sys/ioctl.h>
>>  #endif
>>
>> +#ifndef HAVE_FIEMAP
>> +# include "fiemap.h"
>> +#endif
>> +
>>  #ifndef HAVE_FCHOWN
>>  # define HAVE_FCHOWN false
>>  # define fchown(fd, uid, gid) (-1)
>> @@ -151,6 +155,133 @@ clone_file (int dest_fd, int src_fd)
>>  #endif
>>  }
>>
>> +#ifdef __linux__
>> +# ifndef FS_IOC_FIEMAP
>> +#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
>> +# endif
>> +/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
>> +   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
>> +   excepts holes.  So the overhead to deal with holes with lseek(2) in
>> +   normal copy could be saved.  This would result in much faster backups
>> +   for any kind of sparse file.  */
>> +static bool
>> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>> +                off_t src_total_size, char const *src_name,
>> +                char const *dst_name, bool *normal_copy_required)
>> +{
>> +  bool fail = false;
>> +  bool last = false;
>> +  char fiemap_buf[4096];
>> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
>> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
>> +                    sizeof (struct fiemap_extent);
>> +  unsigned int i;
>> +  off_t last_ext_logical = 0;
>> +  uint64_t last_ext_len = 0;
>> +  uint64_t last_read_size = 0;
>> +
>> +  do
>> +    {
>> +      fiemap->fm_start = 0ULL;
>> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> +      fiemap->fm_extent_count = count;
>> +
>> +      /* When ioctl(2) fails for any reason, fall back to the normal copy.  */
>> +      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
>> +        {
>> +          *normal_copy_required = true;
>> +          return false;
>> +        }
>> +
>> +      /* If 0 extents are returned, then more ioctls are not needed.  */
>> +      if (fiemap->fm_mapped_extents == 0)
>> +          return true;
> This is a bug I just tried out. 'return true' is not enough if '0' extents returned.
> Below is the test case could reveal this issue, it just seek(2) but not write any byte, its extent
> count is *zero* because there is no data blocks allocated from the extent tree.  In this case,
> I should truncate it to the src_desc.st_size, this issue will be fixed in my next post.
> dd bs=1 seek=10240 of=/ocfs2/sparse </dev/null 2>/dev/null
> 
> Besides, I have tried this case on ocfs2,ext4 and btrfs, the first 2 file systems could handle this
> kind of sparse file properly, but btrfs actually allocaed an extent for it.  IMHO, maybe this is a
> btrfs issue, I will post it to the mail-list for further discussion.
FYI, Tao has submitted a patch against the issue which I mentioned above.
For detail, please check the following link:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/5416

I have tried this fix, it works fine.  Besides that, looks this fix also solved the performance
issue for fiemap copy vs normal copy on btrfs:
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ls -l /btrfs/sparse_file
-rw-r--r-- 1 jeff jeff 41947136 Apr 19 16:54 /btrfs/sparse_file
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time /bin/cp --sparse=always /btrfs/sparse_file
/btrfs/sparse_normal_copy

real	0m0.208s
user	0m0.016s
sys	0m0.164s
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ time ./src/cp --sparse=always /btrfs/sparse_file
/btrfs/sparse_fiemap_copy

real	0m0.050s
user	0m0.000s
sys	0m0.040s

Below is the updated patch, it fixed 2 issues:
1. If the extent count is 0, we should truncated it to the "src_desc (struct stat)buf.st_size" if
possible instead of return.
2. I just think of another possible issue, if ioctl(2) fails, but it is not the first time in the
loop.  In this point, we also need to stop the copy rather than fall back to the normal one.
Otherwise, the destination file will be corrupted as well, Am I right?


From 9d049ceef3cfefbf3a85254044458dd9a39eb944 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 22 Apr 2010 21:48:09 +0800
Subject: [PATCH 1/1] Add fiemap support for efficient sparse file copy.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c |  158 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 158 insertions(+), 0 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 3c32fa3..5f2da52 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -65,6 +65,10 @@
 # include <sys/ioctl.h>
 #endif

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -151,6 +155,138 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
+   excepts holes.  So the overhead to deal with holes with lseek(2) in
+   normal copy could be saved.  This would result in much faster backups
+   for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int n_ioctl = 0;
+  unsigned int i;
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, if this is the frist time we met, fall back
+         to the normal copy.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+        {
+          if (n_ioctl < 1)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      ++n_ioctl;
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +815,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +962,8 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
+
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
@@ -897,6 +1054,7 @@ close_src_desc:

   free (buf_alloc);
   free (name_alloc);
+
   return return_val;
 }

-- 
1.5.4.3



Best Regards,
-Jeff




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Tue, 31 May 2011 21:11:02 GMT) Full text and rfc822 format available.

Notification sent to Jim Meyering <jim <at> meyering.net>:
bug acknowledged by developer. (Tue, 31 May 2011 21:11:02 GMT) Full text and rfc822 format available.

Message #37 received at 5915-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: 5915-done <at> debbugs.gnu.org
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1) and
	mv(1) v2
Date: Tue, 31 May 2011 23:10:44 +0200
Hi Jeff,
Just a heads up that I've closed this issue.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5915; Package coreutils. (Wed, 29 Jun 2011 11:07:02 GMT) Full text and rfc822 format available.

Message #40 received at 5915 <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: 5915 <at> debbugs.gnu.org, jim <at> meyering.net
Subject: Re: bug#5915: [coreutils] Feature: Introduce fiemap copy to cp(1)
	and mv(1) v2
Date: Wed, 29 Jun 2011 19:06:28 +0800
Hi Jim,

Pretty sorry for my late response and thanks for your head up!


Cheers,
-Jeff
Jim Meyering wrote:
> Hi Jeff,
> Just a heads up that I've closed this issue.
> 
> 
> 





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 27 Jul 2011 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 26 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.