GNU bug report logs - #6131
[PATCH]: fiemap support for efficient sparse file copy

Package: coreutils;

Reported by: "jeff.liu" <jeff.liu <at> oracle.com>

Date: Fri, 7 May 2010 14:16:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6131 in the body.
You can then email your comments to 6131 AT debbugs.gnu.org in the normal way.

Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 07 May 2010 14:16:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "jeff.liu" <jeff.liu <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 07 May 2010 14:16:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: bug-coreutils <at> gnu.org
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Jim Meyering <jim <at> meyering.net>, Chris Mason <chris.mason <at> oracle.com>,
	Joel Becker <Joel.Becker <at> oracle.com>
Subject: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 07 May 2010 22:13:19 +0800
Hello All,

Adding fiemap ioctl(2) support to cp(1) for efficient sparse file copying has been discussed a few
times over the past few months; thanks, Jim, for the review comments.

I have just worked out a new patch set against the latest upstream code and rerun the tests I have
shown before; they all work well.

There is a minor code change in this post to handle the case where ioctl(2) fails in the middle of
the fiemap copy process.
My thought is: if the failure happens on the first call, fall back to the standard copy as usual.
Otherwise, we should abort the copy process to avoid corrupting the destination file.
To determine whether this is the first ioctl(2) call, the code checks whether the variable 'i',
the fiemap extent counter, is equal to 0; its value will have been increased if a previous
ioctl(2) call succeeded.

Would you please review the patches and consider applying them if there are no other issues?

From f8c78794a70f1fb45a2c61c8bbeca344087287ab Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 7 May 2010 20:48:45 +0800
Subject: [PATCH 1/3] Add fiemap.h for fiemap ioctl(2) support.
 It does not shipped by default, so I copy it from kernel at the moment.
 I have update its code style respect to GNU coding style.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 gnulib       |    2 +-
 src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 1 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/gnulib b/gnulib
index e6addf8..8df7efd 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit e6addf84d6331d634b5d76db03f59851f3de8894
+Subproject commit 8df7efddc8ffe398cde4106d32b39848e5948df9
diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extended attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending, Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED		0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
-- 
1.5.4.3


From 12618891cf4f3aff6b65463887b689c2ad99aa8e Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 7 May 2010 21:14:05 +0800
Subject: [PATCH 2/3] Add fiemap ioctl(2) support for efficient sparse file copy.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index c16cef6..960e5fb 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,6 +63,10 @@

 #include <sys/ioctl.h>

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -149,6 +153,136 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform a FIEMAP (available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the allocated
+   extents of a file, skipping its holes.  This saves the overhead of
+   dealing with holes via lseek(2) in a normal copy, and should result
+   in much faster backups of sparse files.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, fall back to the normal copy only if
+         this is the first call.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+        {
+          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +813,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +960,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
-- 
1.5.4.3


From 8822b8e3f3ee70b49efb8b8aebff373792956422 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 7 May 2010 21:31:56 +0800
Subject: [PATCH 3/3] Add test script for cp(1) fiemap copy.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 tests/cp/sparse-fiemap |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 58 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/sparse-fiemap

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
new file mode 100755
index 0000000..25a8fd6
--- /dev/null
+++ b/tests/cp/sparse-fiemap
@@ -0,0 +1,58 @@
+#!/bin/sh
+# Test cp --sparse=always through fiemap copy
+
+# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+cp_orig=cp
+cp_new="$abs_top_builddir/src/cp"
+
+test -d "/ext4"              \
+  && sparse="/ext4/sparse"   \
+  && normal="/ext4/sparse1"  \
+  && fiemap="/ext4/sparse2"  \
+  || skip=1
+
+test $skip = 1 && skip_test_ "/ext4 does not exists"
+
+size=`expr 10 \* 1024`
+dd if=/dev/zero bs=4k count=1 seek=$size of=$sparse > /dev/null 2>&1 || framework_failure
+
+# Using time(1) instead of shell built-in `time' command.
+# It support "--format" option which is more convinent to calculate
+# the expense time for different `cp' by combine with bc(1) for
+# the performance measurement.
+TIME=`which time` || skip_test_ "time(1) does not exists"
+
+x=$(echo "1+2" | bc)
+test $x = 3 || skip_test_ "bc(1) does not exists"
+
+t1=$($TIME -f "%U + %S" $cp_orig --sparse=always $sparse $normal 2>&1 | bc) || fail=1
+t2=$($TIME -f "%U + %S" $cp_new --sparse=always $sparse $fiemap 2>&1 | bc)  || fail=1
+
+test $fail = 1 && skip_test_ "at least one sparse file copy failed"
+
+# Ensure that the sparse file copied through fiemap has the same size in bytes as the original.
+test `stat --printf %s $sparse` -eq `stat --printf %s $fiemap` || fail=1
+echo "$t2 < $t1" | bc || fail=1
+
+Exit $fail
-- 
1.5.4.3



Best Regards,
-Jeff

-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 12 May 2010 08:50:03 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 12 May 2010 10:48:44 +0200
jeff.liu wrote:
> Hello All,
...
> Would you guys please review and consider apply the patches if there is no other issue?

Thanks for yet another iteration, and sorry it's taking so long.

> From f8c78794a70f1fb45a2c61c8bbeca344087287ab Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Fri, 7 May 2010 20:48:45 +0800
> Subject: [PATCH 1/3] Add fiemap.h for fiemap ioctl(2) support.
>  It does not shipped by default, so I copy it from kernel at the moment.
>  I have update its code style respect to GNU coding style.

Your log message should look like this:
(one-line summary, then ChangeLog-style after a blank line)

add fiemap.h for fiemap ioctl(2) support

* src/fiemap.h: Copied from linux's include/linux/fiemap.h,
with minor formatting changes.

> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  gnulib       |    2 +-

Please omit the gnulib part.
It is irrelevant to your change.

> From 8822b8e3f3ee70b49efb8b8aebff373792956422 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Fri, 7 May 2010 21:31:56 +0800
> Subject: [PATCH 3/3] Add test script for cp(1) fiemap copy.
>
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  tests/cp/sparse-fiemap |   58 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 58 insertions(+), 0 deletions(-)
>  create mode 100755 tests/cp/sparse-fiemap
>
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> new file mode 100755
> index 0000000..25a8fd6
> --- /dev/null
> +++ b/tests/cp/sparse-fiemap
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +# Test cp --sparse=always through fiemap copy
> +
> +# Copyright (C) 2006-2010 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +if test "$VERBOSE" = yes; then
> +  set -x
> +  cp --version
> +fi
> +
> +. $srcdir/test-lib.sh
> +
> +cp_orig=cp
> +cp_new="$abs_top_builddir/src/cp"
> +
> +test -d "/ext4"              \
> +  && sparse="/ext4/sparse"   \
> +  && normal="/ext4/sparse1"  \
> +  && fiemap="/ext4/sparse2"  \
> +  || skip=1

Be careful about writing to well-known files like these.
That's risky if /ext4 is world-writable.
Work in a temporary directory created by mktemp -d, just in case.

It'd be better still to make the test work when run as root, in which
case you can create and mount a file-backed ext4 partition, and then use
that, rather than relying on the existence of a preexisting, fixed-name
directory that may or may not be of the right FS type.

But don't worry about this part if you're not inclined.
We can make it do that later.
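
The temporary-directory suggestion could look roughly like the sketch below.
It is an illustration only: the variable name tmpdir and the trap-based
cleanup are assumptions, not part of the patch, and a directory created by
mktemp is not guaranteed to live on ext4 or any other FIEMAP-capable file
system.

```shell
#!/bin/sh
# Sketch: keep every test file inside a private, freshly created directory
# instead of writing to fixed names under /ext4.
# "tmpdir" is a hypothetical name chosen for this example.
tmpdir=$(mktemp -d) || exit 1

# Remove the directory and its contents when the script exits.
trap 'rm -rf "$tmpdir"' EXIT

# Same three file names as the original test, now under the private directory.
sparse="$tmpdir/sparse"
normal="$tmpdir/sparse1"
fiemap="$tmpdir/sparse2"

echo "working in $tmpdir"
```

Because mktemp -d creates the directory with a name that did not exist
before, another user cannot pre-create or replace the files the test is
about to write.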

> +test $skip = 1 && skip_test_ "/ext4 does not exists"

s/exists/exist/

> +size=`expr 10 \* 1024`

Here, just use size=10240.

In general, please use $(...), not `...`.
(multiple places)

> +dd if=/dev/zero bs=4k count=1 seek=$size of=$sparse > /dev/null 2>&1 || framework_failure
> +
> +# Using time(1) instead of shell built-in `time' command.
> +# It support "--format" option which is more convinent to calculate
> +# the expense time for different `cp' by combine with bc(1) for
> +# the performance measurement.
> +TIME=`which time` || skip_test_ "time(1) does not exists"

Don't use "which".  It has portability problems.  Instead,

  env time --version | grep GNU > /dev/null \
    || skip_test_ "you lack the GNU time program"

Another alternative: record $(date +%s.%N), before and after.
Elapsed time should be adequate.  Advantage: date is always available,
while time may not be.
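
A minimal sketch of the date-based alternative, under some assumptions: the
%N format is a GNU date extension, "sleep 1" merely stands in for the cp
invocation to be timed, and awk is used for the subtraction because shell
arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch: measure elapsed wall-clock time with date(1) instead of time(1).
start=$(date +%s.%N)

sleep 1    # placeholder for the command being timed, e.g. the cp under test

end=$(date +%s.%N)

# Subtract the two timestamps with awk, since the values are fractional.
elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }')
echo "elapsed: ${elapsed}s"
```

This measures elapsed time rather than user plus system CPU time, which is
the trade-off accepted above.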

> +x=$(echo "1+2" | bc)
> +test $x = 3 || skip_test_ "bc(1) does not exists"
> +
> +t1=$($TIME -f "%U + %S" $cp_orig --sparse=always $sparse $normal 2>&1 | bc) || fail=1
> +t2=$($TIME -f "%U + %S" $cp_new --sparse=always $sparse $fiemap 2>&1 | bc)  || fail=1
> +
> +test $fail = 1 && skip_test_ "at least one sparse file copy failed"
> +
> +# Ensure that the sparse file copied through fiemap has the same size in bytes as the original.
> +test `stat --printf %s $sparse` -eq `stat --printf %s $fiemap` || fail=1

s/-eq/=/

> +echo "$t2 < $t1" | bc || fail=1

At first I wrote this:

    Can't you require something stronger than merely
    that the new code is faster than the old?
    How do the times compare in general?
    Can you construct a worst-case scenario that
    makes the difference as large as possible, yet
    that still completes (with the new code) in say
    5 or 10 seconds on modern equipment?

    Use awk rather than bc for comparison.
    For an example that's similar, I made parted fail a test
    when an efficiency-sensitive operation takes more than a minute here:

    http://git.debian.org/?p=parted/parted.git;a=commitdiff;h=eedc6d77dc4b3488decd4dce9cb8cafaa95755ce

    You can depend on some version of awk being available.
    Invoke it via $AWK, but for that to work in the test script,
    you'll have to add AWK=$(AWK) to tests/Makefile.am's TESTS_ENVIRONMENT,
    as I did here:

    http://git.debian.org/?p=parted/parted.git;a=commitdiff;h=246c953b53c1bd49b1f835f84a1ca29a6d2fbc1c

Then I remembered that here we have timeout(1), so:
you may ignore the above and consider this a suggestion
to use timeout:

But that was in Parted, where we can't guarantee that the timeout
program is available.  Here in coreutils, you're guaranteed to
have timeout(1) (just built), so you might want to use it, too:
Contrive a test that takes a very long time without FIEMAP support
yet that runs in a couple seconds with it.  Then run cp via timeout
with a 10-second limit.  If timeout's exit status is not 0,
then make the test fail.

That has the advantage of letting you use an example that would take
far longer than we typically want to wait for a non-FIEMAP test.
I.e., perform only the FIEMAP-copy and ensure that it's "quick enough".
You don't have to perform a non-FIEMAP one.

Another advantage: if you don't do the old/slow sparse copy,
there's no need for comparison (and bc or awk) at all.
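
The timeout-based check might be sketched as follows. This is only an
illustration under stated assumptions: it runs whatever cp is first in PATH
rather than the just-built one, it uses a deliberately small file (4 KiB of
data at a 64 MiB offset) instead of a worst-case one, and the names src and
dst are made up for the example.

```shell
#!/bin/sh
# Sketch: run a single sparse copy under a time limit; no slow baseline copy
# and no bc/awk comparison is needed.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
src="$tmpdir/sparse"
dst="$tmpdir/copy"

# One 4 KiB block of data preceded by a 64 MiB hole.
dd if=/dev/zero of="$src" bs=4k count=1 seek=16384 2>/dev/null || exit 1

fail=0

# GNU timeout exits with status 124 if the limit expires, so any non-zero
# status (timeout or cp failure) marks the test as failed.
timeout 10 cp --sparse=always "$src" "$dst" || fail=1

# The copy must have the same apparent size as the original.
test "$(stat --printf %s "$src")" = "$(stat --printf %s "$dst")" || fail=1

echo "fail=$fail"
```

A real test would size the file so that a non-FIEMAP copy clearly exceeds
the limit while the FIEMAP copy finishes well inside it.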




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 13 May 2010 14:28:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 13 May 2010 22:25:16 +0800
Hi Jim,

Thanks for your kind advice!

I'd like to adopt the timeout(1) approach for the test work.

My thought is:
1. Create and mount a file-backed ext4 partition rather than relying on a hard-coded path.
2. Create a 2 GB sparse file with no extents allocated for it.
3. It takes nearly 30 seconds to transfer this file with a normal copy, yet less than 1 second
with a FIEMAP copy; is this a worst-case scenario that makes the difference as large as possible?
4. Run the FIEMAP copy under timeout(1) with a 1-second limit. I hope I understood your
suggestion correctly ;).
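
Step 2 of this plan could be sketched as below. The sketch is an assumption
about how such a file might be created, not the code in the patch: it uses
truncate(1) from coreutils and a mktemp directory instead of the mounted
ext4 image from step 1, which needs root, and the name big is hypothetical.

```shell
#!/bin/sh
# Sketch: create a 2 GiB file that consists of a single hole.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
big="$tmpdir/sparse-2g"

# Extend an empty file to 2 GiB without writing any data to it.
truncate -s 2G "$big" || exit 1

# Apparent size in bytes and number of blocks actually allocated.
size=$(stat --printf %s "$big")
blocks=$(stat --printf %b "$big")
echo "size=$size bytes, allocated blocks=$blocks"
```

On a file system that supports sparse files, the allocated block count
should stay far below what 2 GiB of real data would need.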

The revised patches follow:

From b683f930c5e70481c2b6e000a626734f975b99ac Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:09:30 +0800
Subject: [PATCH 1/1] cp: Add FIEMAP support for efficient sparse file copy

* src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
Copied from linux's include/linux/fiemap.h, with minor formatting changes.
* src/copy.c (copy_reg): When `cp' is invoked with the --sparse=[WHEN] option,
try a FIEMAP copy if the underlying file system supports it, and fall back
to a normal copy if it fails.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c   |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 256 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index c16cef6..960e5fb 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,6 +63,10 @@

 #include <sys/ioctl.h>

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -149,6 +153,136 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform a FIEMAP (available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the allocated
+   extents of a file, skipping its holes.  This saves the overhead of
+   dealing with holes via lseek(2) in a normal copy, and should result
+   in much faster backups of sparse files.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+
+  do
+    {
+      fiemap->fm_start = 0ULL;
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, fall back to the normal copy only if
+         this is the first call.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+        {
+          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+
+          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+        }
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +813,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +960,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset that can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extended attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending, Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED            0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
-- 
1.5.4.3



From f18e1801d1dfca9fa278572b8172a5f97da2adc1 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:17:53 +0800
Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy

* tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
loopbacked ext4 partition.
* tests/Makefile.am (sparse-fiemap): Reference the new test.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 tests/Makefile.am      |    2 +
 tests/cp/sparse-fiemap |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 0 deletions(-)
 create mode 100644 tests/cp/sparse-fiemap

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 46d388a..a76c6a7 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -25,6 +25,7 @@ root_tests =					\
   cp/special-bits				\
   cp/cp-mv-enotsup-xattr			\
   cp/capability					\
+  cp/sparse-fiemap                              \
   dd/skip-seek-past-dev				\
   install/install-C-root			\
   ls/capability					\
@@ -319,6 +320,7 @@ TESTS =						\
   cp/same-file					\
   cp/slink-2-slink				\
   cp/sparse					\
+  cp/sparse-fiemap                              \
   cp/special-f					\
   cp/src-base-dot				\
   cp/symlink-slash				\
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
new file mode 100644
index 0000000..f9d3a94
--- /dev/null
+++ b/tests/cp/sparse-fiemap
@@ -0,0 +1,61 @@
+#!/bin/sh
+# Test cp --sparse=always through fiemap copy
+
+# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+require_root_
+
+cwd=`pwd`
+cleanup_() { cd /; umount "$cwd/mnt"; }
+
+# Create an ext4 loopback file system
+dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \
+                                               || skip=1
+mkdir mnt
+mkfs -t ext4 -F blob ||
+  skip_test_ "failed to create ext4 file system"
+mount -oloop blob mnt                          || skip=1
+echo test > mnt/f                              || skip=1
+test -s mnt/f                                  || skip=1
+
+test $skip = 1 &&
+  skip_test_ "insufficient mount/ext4 support"
+
+rm -f mnt/f
+
+# Create a 2gb sparse file
+dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure
+
+# It take more than 20 seconds to transfer the created sparse file
+# through normal copy, by contrast, it take even less than 1 second
+# through FIEMAP-copy.
+timeout 1 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
+test $? = 124 && fail=1
+
+# Ensure that the sparse file copied through fiemap has the same size
+# in bytes as the original.
+test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1
+
+rm -f mnt/sparse
+rm -f mnt/sparse_fiemap
+
+Exit $fail
-- 
1.5.4.3



Thanks,
-Jeff

> 
> Then I remembered that here we have timeout(1), so:
> you may ignore the above and consider this a suggestion
> to use timeout:
> 
> But that was in Parted, where we can't guarantee that the timeout
> program is available.  Here in coreutils, you're guaranteed to
> have timeout(1) (just built), so you might want to use it, too:
> Contrive a test that takes a very long time without FIEMAP support
> yet that runs in a couple seconds with it.  Then run cp via timeout
> with a 10-second limit.  If timeout's exit status is not 0,
> then make the test fail.
> 
> That has the advantage of letting you use an example that would take
> far longer than we typically want to wait for a non-FIEMAP test.
> I.e., perform only the FIEMAP-copy and ensure that it's "quick enough".
> You don't have to perform a non-FIEMAP one.
> 
> Another advantage: if you don't do the old/slow sparse copy,
> there's no need for comparison (and bc or awk) at all.
> 
> 
> 






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 20 May 2010 19:25:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 20 May 2010 21:23:55 +0200
jeff.liu wrote:
> Hi Jim,
>
> Thanks for your kind advise!
>
> I'd like to adopt the timeout(1) approach for the test work.
>
> My thought is:
> 1. Create and mount a file-backed ext4 partition rather than relying on the HARD CODE path.
> 2. Create a 2gb sparse file without extent allocated for it.
> 3. It take nearly 30 seconds to transfer this file in normal copy, yet less than 1 second through
> FIEMAP-copy, is it a worst-case scenario that makes the difference as large as possible?
> 4. run FIEMAP-copy, use timeout(1) to limit it will complete in 1 second, I hope I understood your
> opinion correctly ;).
>
> The revised patches are shown as following:
>
> From f18e1801d1dfca9fa278572b8172a5f97da2adc1 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Thu, 13 May 2010 22:17:53 +0800
> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>
> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
> loopbacked ext4 partition.
> * tests/Makefile.am (sparse-fiemap): Reference the new test.
>
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  tests/Makefile.am      |    2 +
>  tests/cp/sparse-fiemap |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+), 0 deletions(-)
>  create mode 100644 tests/cp/sparse-fiemap
>
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 46d388a..a76c6a7 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -25,6 +25,7 @@ root_tests =					\
>    cp/special-bits				\
>    cp/cp-mv-enotsup-xattr			\
>    cp/capability					\
> +  cp/sparse-fiemap                              \
>    dd/skip-seek-past-dev				\
>    install/install-C-root			\
>    ls/capability					\
> @@ -319,6 +320,7 @@ TESTS =						\
>    cp/same-file					\
>    cp/slink-2-slink				\
>    cp/sparse					\
> +  cp/sparse-fiemap                              \
>    cp/special-f					\
>    cp/src-base-dot				\
>    cp/symlink-slash				\

I've applied your patches locally and have begun adjusting them.
First, I removed the addition of cp/sparse-fiemap to the TESTS list above.
Adding it to the root_tests is sufficient.

Then, I've made the following changes to your test script:
  - the original size of your test file of 2GiB was too small,
      in that the old (pre-fiemap) cp copied it for me in less than
      1 second when the backing file was on a tmpfs file system.
      I've made the new size be 2TiB.  The fiemap copy is still so
      quick that it completes in < .01 second.[*]
  - no point in discarding stdout/stderr, since it all goes to the log
  - raised timeout to 10 seconds to give more leeway on slow systems
  - remove those "rm -f" uses.  They're not needed, since the test is
      run in its own temp dir, which is removed automatically when done.
  - remove the $? = 124 test -- the preceding test for success is sufficient

[*] I tried to count syscalls with strace but got a segfault.
Using valgrind I get errors, so debugged enough to get a clean
run, but possibly at the expense of correctness.  We'll need more
tests to ensure that the non-sparse blocks in the copy all have
the same offset/length as in the original.  Details below.

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
old mode 100644
new mode 100755
index f9d3a94..814d537
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -28,8 +28,7 @@ cwd=`pwd`
 cleanup_() { cd /; umount "$cwd/mnt"; }

 # Create an ext4 loopback file system
-dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \
-                                               || skip=1
+dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
 mkdir mnt
 mkfs -t ext4 -F blob ||
   skip_test_ "failed to create ext4 file system"
@@ -42,20 +41,15 @@ test $skip = 1 &&

 rm -f mnt/f

-# Create a 2gb sparse file
-dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure
+# Create a 2TiB sparse file
+dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2G || framework_failure

-# It take more than 20 seconds to transfer the created sparse file
-# through normal copy, by contrast, it take even less than 1 second
-# through FIEMAP-copy.
-timeout 1 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
-test $? = 124 && fail=1
+# It takes many minutes to copy this sparse file using the old method.
+# By contrast, it takes far less than 1 second using FIEMAP-copy.
+timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1

 # Ensure that the sparse file copied through fiemap has the same size
 # in bytes as the original.
 test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1

-rm -f mnt/sparse
-rm -f mnt/sparse_fiemap
-
 Exit $fail

----------------------------------------
On F13, x86_64, ext4, I did this:

dd if=/dev/null of=big bs=1 seek=2G
valgrind ./cp --sparse=always big big2
==4771== Conditional jump or move depends on uninitialised value(s)
==4771==    at 0x40465A: fiemap_copy_ok (copy.c:205)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Syscall param lseek(offset) contains uninitialised byte(s)
==4771==    at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82)
==4771==    by 0x4046D4: fiemap_copy_ok (copy.c:210)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Syscall param lseek(offset) contains uninitialised byte(s)
==4771==    at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82)
==4771==    by 0x40472D: fiemap_copy_ok (copy.c:216)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Conditional jump or move depends on uninitialised value(s)
==4771==    at 0x404792: fiemap_copy_ok (copy.c:222)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Conditional jump or move depends on uninitialised value(s)
==4771==    at 0x40492B: fiemap_copy_ok (copy.c:229)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Conditional jump or move depends on uninitialised value(s)
==4771==    at 0x4047FA: fiemap_copy_ok (copy.c:235)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Syscall param read(count) contains uninitialised byte(s)
==4771==    at 0x3269CD41B0: __read_nocancel (syscall-template.S:82)
==4771==    by 0x404821: fiemap_copy_ok (copy.c:238)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==
==4771== Invalid read of size 8
==4771==    at 0x404952: fiemap_copy_ok (copy.c:265)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)
==4771==  Address 0x3ffeffdd68 is not stack'd, malloc'd or (recently) free'd
==4771==
==4771==
==4771== Process terminating with default action of signal 11 (SIGSEGV)
==4771==  Access not within mapped region at address 0x3FFEFFDD68
==4771==    at 0x404952: fiemap_copy_ok (copy.c:265)
==4771==    by 0x405B61: copy_reg (copy.c:822)
==4771==    by 0x408713: copy_internal (copy.c:2163)
==4771==    by 0x409237: copy (copy.c:2449)
==4771==    by 0x403AC9: do_copy (cp.c:754)
==4771==    by 0x4041E4: main (cp.c:1154)

===========================================================
The segv just above is due to hitting this line with i==0:

    fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);

the obvious fix is probably to do this instead:

    fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);

All of the used-uninitialized errors can be papered over by
clearing the fiemap_buf array, like this:

+  memset (fiemap_buf, 0, sizeof fiemap_buf);
   do
     {
       fiemap->fm_start = 0ULL;

However, if these are all due solely to F13's valgrind not yet knowing the
semantics of the FIEMAP ioctl, then that may be adequate.
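
As a minimal sketch of the zero-initialisation idea above: the request header and its trailing extent array can be allocated as one zeroed block, so that no field reaches ioctl(2) or the later offset arithmetic uninitialised. The names `demo_extent`, `demo_fiemap` and `demo_fiemap_alloc` are hypothetical stand-ins that mirror the layout in fiemap.h; they are not part of the patch, which uses a fixed buffer instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct fiemap_extent from fiemap.h.  */
struct demo_extent
{
  uint64_t fe_logical;
  uint64_t fe_physical;
  uint64_t fe_length;
  uint64_t fe_reserved64[2];
  uint32_t fe_flags;
  uint32_t fe_reserved[3];
};

/* Hypothetical mirror of struct fiemap, using a C99 flexible array
   member in place of the zero-length array.  */
struct demo_fiemap
{
  uint64_t fm_start;
  uint64_t fm_length;
  uint32_t fm_flags;
  uint32_t fm_mapped_extents;
  uint32_t fm_extent_count;
  uint32_t fm_reserved;
  struct demo_extent fm_extents[];
};

/* Return a zero-filled request with room for N_EXTENTS extents,
   or NULL on allocation failure.  The caller frees the result.  */
struct demo_fiemap *
demo_fiemap_alloc (uint32_t n_extents)
{
  size_t size = sizeof (struct demo_fiemap)
                + (size_t) n_extents * sizeof (struct demo_extent);
  struct demo_fiemap *fm = calloc (1, size);
  if (fm)
    {
      fm->fm_length = UINT64_MAX;       /* same value as FIEMAP_MAX_OFFSET */
      fm->fm_extent_count = n_extents;  /* capacity of fm_extents */
    }
  return fm;
}
```

With calloc, fm_start begins at 0 without an explicit assignment inside the loop, and every extent slot reads as zero until the kernel fills it in.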

Bottom line:
  - you may consider your test-script patch accepted, with the patch above
  - I'd like to see a new version of the copy.c-changing patch,
    including at least a fix for the fm_ext[-1] access bug.
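
One possible way to rule out the fm_ext[-1] access, sketched here under the assumption that the next request should start where the last mapped extent ends: derive the index from the number of extents actually returned, and refuse to compute anything when that number is zero. The type `ext_span` and the helper `next_map_start` are hypothetical names for illustration, not code from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced extent record: only the two fields needed here.  */
struct ext_span
{
  uint64_t fe_logical;
  uint64_t fe_length;
};

/* Store in *NEXT the offset at which the following mapping request
   should start, i.e. the end of the last extent in EXT[0..N_MAPPED-1].
   Return false and leave *NEXT untouched when N_MAPPED is 0, so the
   caller never indexes EXT[-1].  */
bool
next_map_start (struct ext_span const *ext, size_t n_mapped, uint64_t *next)
{
  if (n_mapped == 0)
    return false;
  *next = ext[n_mapped - 1].fe_logical + ext[n_mapped - 1].fe_length;
  return true;
}
```

For two extents {0, 4096} and {8192, 4096}, the helper sets *NEXT to 12288; for an empty result it reports failure instead of reading before the array.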

===========================================================
Solely for reference, here's the copy.c patch I used to avoid
the valgrind-spotted problems:

diff --git a/src/copy.c b/src/copy.c
index 960e5fb..e232eaa 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -179,6 +179,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
   uint64_t last_read_size = 0;
   unsigned int i = 0;

+  memset (fiemap_buf, 0, sizeof fiemap_buf);
   do
     {
       fiemap->fm_start = 0ULL;
@@ -187,7 +188,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,

       /* When ioctl(2) fails, fall back to the normal copy only if it
          is the first time we met.  */
-      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
+      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
         {
           /* If `i > 0', then at least one ioctl(2) has been performed before.  */
           if (i == 0)
@@ -261,7 +262,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
               ext_len -= n_read;
             }

-          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+          fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);
         }
     } while (! last);




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 12:08:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 13:04:28 +0100
On 13/05/10 15:25, jeff.liu wrote:
>
> diff --git a/src/copy.c b/src/copy.c
> index c16cef6..960e5fb 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -63,6 +63,10 @@
>
>  #include <sys/ioctl.h>
>
> +#ifndef HAVE_FIEMAP
> +# include "fiemap.h"
> +#endif

Is HAVE_FIEMAP ever defined anywhere?
In future will we use this to check for <linux/fiemap.h> ?

On 20/05/10 20:23, Jim Meyering wrote:
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> old mode 100644
> new mode 100755
> index f9d3a94..814d537
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -28,8 +28,7 @@ cwd=`pwd`
>  cleanup_() { cd /; umount "$cwd/mnt"; }
> 
>  # Create an ext4 loopback file system
> -dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \
> -                                               || skip=1
> +dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
>  mkdir mnt
>  mkfs -t ext4 -F blob ||
>    skip_test_ "failed to create ext4 file system"

There is the unlikely combination of ext4 without fiemap support I think?
If so then that dependency is worth a comment.

> @@ -42,20 +41,15 @@ test $skip = 1 &&
> 
>  rm -f mnt/f
> 
> -# Create a 2gb sparse file
> -dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure
> +# Create a 2TiB sparse file
> +dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2G || framework_failure

If we don't need any actual data in the files then one could use:
  truncate -s 2TB mnt/sparse

For my reference, I used TB rather than TiB because on ext3
the limit is 0x1FEFF7FC000 (2194719883264)
(0x1FF7FFFD000 (2196875759616) before 2.6.25)
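
The arithmetic behind the TB-versus-TiB choice can be checked with a short sketch. It assumes only the ext3 limit quoted above and the usual meaning of the suffixes (TB = 1000^4, TiB = 1024^4); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ext3 per-file size limit quoted above (2.6.25 and later).  */
#define EXT3_MAX_FILE_SIZE UINT64_C (0x1FEFF7FC000)          /* 2194719883264 */

#define TWO_TB  (UINT64_C (2) * 1000 * 1000 * 1000 * 1000)   /* 2000000000000 */
#define TWO_TIB (UINT64_C (2) << 40)                         /* 2199023255552 */

/* Return true if a file of SIZE bytes fits under the quoted limit.  */
bool
fits_ext3_limit (uint64_t size)
{
  return size <= EXT3_MAX_FILE_SIZE;
}
```

Here fits_ext3_limit (TWO_TB) is true while fits_ext3_limit (TWO_TIB) is false, since 2 TiB exceeds the limit by 4303372288 bytes.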

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 13:01:01 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 14:59:36 +0200
Pádraig Brady wrote:
> On 13/05/10 15:25, jeff.liu wrote:
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index c16cef6..960e5fb 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -63,6 +63,10 @@
>>
>>  #include <sys/ioctl.h>
>>
>> +#ifndef HAVE_FIEMAP
>> +# include "fiemap.h"
>> +#endif
>
> Is HAVE_FIEMAP ever defined anywhere?
> In future will we use this to check for <linux/fiemap.h> ?

I'll look.

> On 20/05/10 20:23, Jim Meyering wrote:
>>
>> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
>> old mode 100644
>> new mode 100755
>> index f9d3a94..814d537
>> --- a/tests/cp/sparse-fiemap
>> +++ b/tests/cp/sparse-fiemap
>> @@ -28,8 +28,7 @@ cwd=`pwd`
>>  cleanup_() { cd /; umount "$cwd/mnt"; }
>>
>>  # Create an ext4 loopback file system
>> -dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \
>> -                                               || skip=1
>> +dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
>>  mkdir mnt
>>  mkfs -t ext4 -F blob ||
>>    skip_test_ "failed to create ext4 file system"
>
> There is the unlikely combination of ext4 without fiemap support I think?
> If so then that dependency is worth a comment.

I don't know off hand.
Is there a shell-level way to test for that?

>> @@ -42,20 +41,15 @@ test $skip = 1 &&
>>
>>  rm -f mnt/f
>>
>> -# Create a 2gb sparse file
>> -dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure
>> +# Create a 2TiB sparse file
>> +dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2G || framework_failure
>
> If we don't need any actual data in the files then one could use:
>   truncate -s 2TB mnt/sparse
>
> For my reference, I used TB rather than TiB because on ext3
> the limit is 0x1FEFF7FC000 (2194719883264)
> (0x1FF7FFFD000 (2196875759616) before 2.6.25)

Thanks.
For now I'll limit it to 1TiB using dd.
That gives a slightly less uniform input.

  # Create a 1TiB sparse file
  dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
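
As a rough check of the sizes these dd invocations produce: dd skips `seek` output blocks of `bs` bytes and then writes `count` blocks, and its `k` and `G` suffixes are powers of 1024. The helper name `dd_sparse_size` is hypothetical, and the sketch assumes the product does not overflow 64 bits.

```c
#include <stdint.h>

/* Size in bytes of the file dd leaves behind when it skips SEEK output
   blocks of BS bytes and then writes COUNT full blocks.  */
uint64_t
dd_sparse_size (uint64_t bs, uint64_t seek, uint64_t count)
{
  return (seek + count) * bs;
}
```

With bs=1k, seek=1G and count=1 this gives 1 TiB plus one 1 KiB block of real data at the end; the original seek=2096128 gives 2146436096 bytes, just under 2 GiB.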




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 13:02:03 GMT) Full text and rfc822 format available.

Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 20:59:56 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> Hi Jim,
>>
>> Thanks for your kind advise!
>>
>> I'd like to adopt the timeout(1) approach for the test work.
>>
>> My thought is:
>> 1. Create and mount a file-backed ext4 partition rather than relying on the HARD CODE path.
>> 2. Create a 2gb sparse file without extent allocated for it.
>> 3. It take nearly 30 seconds to transfer this file in normal copy, yet less than 1 second through
>> FIEMAP-copy, is it a worst-case scenario that makes the difference as large as possible?
>> 4. run FIEMAP-copy, use timeout(1) to limit it will complete in 1 second, I hope I understood your
>> opinion correctly ;).
>>
>> The revised patches are shown as following:
>>
>> From f18e1801d1dfca9fa278572b8172a5f97da2adc1 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <jeff.liu <at> oracle.com>
>> Date: Thu, 13 May 2010 22:17:53 +0800
>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>
>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>> loopbacked ext4 partition.
>> * tests/Makefile.am (sparse-fiemap): Reference the new test.
>>
>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>> ---
>>  tests/Makefile.am      |    2 +
>>  tests/cp/sparse-fiemap |   61 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 63 insertions(+), 0 deletions(-)
>>  create mode 100644 tests/cp/sparse-fiemap
>>
>> diff --git a/tests/Makefile.am b/tests/Makefile.am
>> index 46d388a..a76c6a7 100644
>> --- a/tests/Makefile.am
>> +++ b/tests/Makefile.am
>> @@ -25,6 +25,7 @@ root_tests =					\
>>    cp/special-bits				\
>>    cp/cp-mv-enotsup-xattr			\
>>    cp/capability					\
>> +  cp/sparse-fiemap                              \
>>    dd/skip-seek-past-dev				\
>>    install/install-C-root			\
>>    ls/capability					\
>> @@ -319,6 +320,7 @@ TESTS =						\
>>    cp/same-file					\
>>    cp/slink-2-slink				\
>>    cp/sparse					\
>> +  cp/sparse-fiemap                              \
>>    cp/special-f					\
>>    cp/src-base-dot				\
>>    cp/symlink-slash				\
> 
> I've applied your patches locally and have begun adjusting them.
> First, I removed the addition of cp/sparse-fiemap to the TESTS list above.
> Adding it to the root_tests is sufficient.
Thank you for pointing that out.

> Then, I've made the following changes to your test script:
>   - the original size of your test file of 2GiB was too small,
>       in that the old (pre-fiemap) cp copied it for me in less than
>       1 second when the backing file was on a tmpfs file system.
>       I've made the new size be 2TiB.  The fiemap copy is still so
>       quick that it completes in < .01 second.[*]
>   - no point in discarding stdout/stderr, since it all goes to the log
>   - raised timeout to 10 seconds to give more leeway on slow systems
>   - remove those "rm -f" uses.  They're not needed, since the test is
>       run in its own temp dir, which is removed automatically when done.
>   - remove the $? = 124 test -- the preceding test for success is sufficient
> 
> [*] I tried to count syscalls with strace but got a segfault.
> Using valgrind I get errors, so debugged enough to get a clean
> run, but possibly at the expense of correctness.  We'll need more
> tests to ensure that the non-sparse blocks in the copy all have
> the same offset/length as in the original.  
Would it make sense to write a small C utility that uses FIEMAP to show a file's extent info,
and then call it from our current test script (or a new one) to compare the offsets and lengths
of the non-sparse blocks?

filefrag(8) (http://e2fsprogs.sourceforge.net/) can already do this, but maybe we could implement a
compact version for coreutils that focuses only on future extent-mapping tests.
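
The comparison step of such a utility might look like the sketch below. It assumes the extent lists have already been collected (for example via FIEMAP) and compares only logical offsets and lengths, since physical offsets differ between a file and its copy. The names `data_extent` and `same_data_layout` are hypothetical, and a real test might need to tolerate the file system splitting or merging extents differently in the copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one non-sparse region of a file.  */
struct data_extent
{
  uint64_t logical;   /* offset from the start of the file, in bytes */
  uint64_t length;    /* length in bytes */
};

/* Return true if A[0..NA-1] and B[0..NB-1] describe the same sequence
   of data regions, comparing logical offset and length only.  */
bool
same_data_layout (struct data_extent const *a, size_t na,
                  struct data_extent const *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (a[i].logical != b[i].logical || a[i].length != b[i].length)
      return false;
  return true;
}
```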

> Details below.
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> old mode 100644
> new mode 100755
> index f9d3a94..814d537
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -28,8 +28,7 @@ cwd=`pwd`
>  cleanup_() { cd /; umount "$cwd/mnt"; }
> 
>  # Create an ext4 loopback file system
> -dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \
> -                                               || skip=1
> +dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
>  mkdir mnt
>  mkfs -t ext4 -F blob ||
>    skip_test_ "failed to create ext4 file system"
> @@ -42,20 +41,15 @@ test $skip = 1 &&
> 
>  rm -f mnt/f
> 
> -# Create a 2gb sparse file
> -dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure
> +# Create a 2TiB sparse file
> +dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2G || framework_failure
> 
> -# It take more than 20 seconds to transfer the created sparse file
> -# through normal copy, by contrast, it take even less than 1 second
> -# through FIEMAP-copy.
> -timeout 1 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
> -test $? = 124 && fail=1
> +# It takes many minutes to copy this sparse file using the old method.
> +# By contrast, it takes far less than 1 second using FIEMAP-copy.
> +timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
> 
>  # Ensure that the sparse file copied through fiemap has the same size
>  # in bytes as the original.
>  test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1
> 
> -rm -f mnt/sparse
> -rm -f mnt/sparse_fiemap
> -
>  Exit $fail
> 
> ----------------------------------------
> On F13, x86_64, ext4, I did this:
> 
> dd if=/dev/null of=big bs=1 seek=2G
> valgrind ./cp --sparse=always big big2
> ==4771== Conditional jump or move depends on uninitialised value(s)
> ==4771==    at 0x40465A: fiemap_copy_ok (copy.c:205)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Syscall param lseek(offset) contains uninitialised byte(s)
> ==4771==    at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82)
> ==4771==    by 0x4046D4: fiemap_copy_ok (copy.c:210)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Syscall param lseek(offset) contains uninitialised byte(s)
> ==4771==    at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82)
> ==4771==    by 0x40472D: fiemap_copy_ok (copy.c:216)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Conditional jump or move depends on uninitialised value(s)
> ==4771==    at 0x404792: fiemap_copy_ok (copy.c:222)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Conditional jump or move depends on uninitialised value(s)
> ==4771==    at 0x40492B: fiemap_copy_ok (copy.c:229)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Conditional jump or move depends on uninitialised value(s)
> ==4771==    at 0x4047FA: fiemap_copy_ok (copy.c:235)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Syscall param read(count) contains uninitialised byte(s)
> ==4771==    at 0x3269CD41B0: __read_nocancel (syscall-template.S:82)
> ==4771==    by 0x404821: fiemap_copy_ok (copy.c:238)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==
> ==4771== Invalid read of size 8
> ==4771==    at 0x404952: fiemap_copy_ok (copy.c:265)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> ==4771==  Address 0x3ffeffdd68 is not stack'd, malloc'd or (recently) free'd
> ==4771==
> ==4771==
> ==4771== Process terminating with default action of signal 11 (SIGSEGV)
> ==4771==  Access not within mapped region at address 0x3FFEFFDD68
> ==4771==    at 0x404952: fiemap_copy_ok (copy.c:265)
> ==4771==    by 0x405B61: copy_reg (copy.c:822)
> ==4771==    by 0x408713: copy_internal (copy.c:2163)
> ==4771==    by 0x409237: copy (copy.c:2449)
> ==4771==    by 0x403AC9: do_copy (cp.c:754)
> ==4771==    by 0x4041E4: main (cp.c:1154)
> 
> ===========================================================
> The segv just above is due to hitting this line with i==0:
> 
>     fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
Strange: the code should break out of the loop if no extent is allocated for a file:
      /* If 0 extents are returned, then more ioctls are not needed.  */
      if (fiemap->fm_mapped_extents == 0)
        break;

> 
> the obvious fix is probably to do this instead:
> 
>     fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);
I just found a bug in how 'fiemap->fm_start' is handled; it may be the root cause of the
segmentation fault.  The line above still needs to be written as 'fm_ext[i-1].fe_logical +....' to
calculate the offset for the next ioctl(2).
> 
> All of the used-uninitialized errors can be papered over by
> clearing the fiemap_buf array, like this:
> 
> +  memset (fiemap_buf, 0, sizeof fiemap_buf);
I now recall why I initialized this buffer, which you asked about earlier: I intended to
initialize 'fiemap->fm_start'.  So the line 'fiemap->fm_start = 0ULL' below should be removed from
the loop.

>    do
>      {
>        fiemap->fm_start = 0ULL;
> 
> However, if these are all due solely to F13's valgrind not yet knowing the
> semantics of the FIEMAP ioctl, then that may be adequate.
As mentioned above, this line should be removed, or moved out of the loop if we do not
initialize the fiemap buffer.
> 
> Bottom line:
>   - you may consider your test-script patch accepted, with the patch above
>   - I'd like to see a new version of the copy.c-changing patch,
>     including at least a fix for the fm_ext[-1] access bug.
> 
> ===========================================================
> Solely for reference, here's the copy.c patch I used to avoid
> the valgrind-spotted problems:
> 
> diff --git a/src/copy.c b/src/copy.c
> index 960e5fb..e232eaa 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -179,6 +179,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>    uint64_t last_read_size = 0;
>    unsigned int i = 0;
> 
> +  memset (fiemap_buf, 0, sizeof fiemap_buf);
>    do
>      {
>        fiemap->fm_start = 0ULL;
> @@ -187,7 +188,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
> 
>        /* When ioctl(2) fails, fall back to the normal copy only if it
>           is the first time we met.  */
> -      if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0)
> +      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
>          {
>            /* If `i > 0', then at least one ioctl(2) has been performed before.  */
>            if (i == 0)
> @@ -261,7 +262,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
>                ext_len -= n_read;
>              }
> 
> -          fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
> +          fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);
>          }
>      } while (! last);
> 
> 
> 


-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 14:03:02 GMT) Full text and rfc822 format available.

Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 14:46:57 +0100
On 21/05/10 13:59, Jim Meyering wrote:
> Pádraig Brady wrote:
>> There is the unlikely combination of ext4 without fiemap support I think?
>> If so then that dependency is worth a comment.
> 
> I don't know off hand.
> Is there a shell-level way to test for that?

One could check if <linux/fiemap.h> is available, but that would add
a dependency on kernel-headers being installed.
Alternatively, one could skip the test on non-Linux systems and on Linux < 2.6.27.
But requiring ext4 is probably good enough to limit this to systems
that actually do have fiemap available. Adding a comment about Linux >= 2.6.27
might help us quickly eliminate any false positives that are reported.
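
A minimal sketch of a runtime probe, as an alternative to checking for the
header at build time.  It assumes a Linux system with the kernel headers
installed (the same dependency noted above); the helper name
fiemap_supported is hypothetical and not part of coreutils.  It issues a
count-only FIEMAP request (fm_extent_count == 0), so no extent array is needed:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_MAX_OFFSET */

/* Hypothetical helper: return 1 if the FIEMAP ioctl succeeds on PATH,
   0 if the ioctl fails (for example because the kernel or file system
   lacks support), and -1 if PATH cannot be opened.  */
static int
fiemap_supported (char const *path)
{
  struct fiemap fm;
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  memset (&fm, 0, sizeof fm);
  fm.fm_start = 0;
  fm.fm_length = FIEMAP_MAX_OFFSET;
  fm.fm_extent_count = 0;   /* count-only query: fm_extents[] is ignored */

  int ok = ioctl (fd, FS_IOC_FIEMAP, &fm) == 0;
  close (fd);
  return ok;
}
```

A test could compile and run such a probe against a file on the mounted
file system and skip when it does not return 1.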

cheers.
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 14:29:01 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 16:27:49 +0200
jeff.liu wrote:
...
>> [*] I tried to count syscalls with strace but got a segfault.
>> Using valgrind I get errors, so debugged enough to get a clean
>> run, but possibly at the expense of correctness.  We'll need more
>> tests to ensure that the non-sparse blocks in the copy all have
>> the same offset/length as in the original.
> Is it make sense if we write a utility in C through FIEMAP to show the extent info of a file?
> then wrap it in our current test scripts or a new test script to compare the non-sparse blocks
> offset and length?

If there were no adequate tool already available, that would be good.

> filefrag(8) can do such thing(http://e2fsprogs.sourceforge.net/), but maybe we can implement a
> compacted version focus on furture extent maping related testing only for coreutils.

Or maybe just use filefrag, when it's available.
On F13, with -v (verbose), it prints this:

    $ filefrag -v big
    Filesystem type is: ef53
    File size of big is 2147483648 (524288 blocks, blocksize 4096)
     ext logical physical expected length flags
       0       0   254527               1
    big: 1 extent found


>> ===========================================================
>> The segv just above is due to hitting this line with i==0:
>>
>>     fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
> strange, code should break if there is no extent allocated for a file.
>  /* If 0 extents are returned, then more ioctls are not needed.  */
>       if (fiemap->fm_mapped_extents == 0)
>         break;

There is one extent, and it is while processing that extent, with i == 0, that
referencing fm_ext[i-1] (aka fm_ext[-1]) triggers the failure.

>> the obvious fix is probably to do this instead:
>>
>>     fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);
> I just found a bug for dealing with the 'fiemap->fm_start', maybe it is the root cause of the
> segment fault.  above line still need to write as 'fm_ext[i-1].fe_logical +....' to calculate the
> offset for the next ioctl(2).

"i" can be 0 there, so it sounds like you're saying we need to
reference fm_ext[-1].  If you mean that, you'll have to demonstrate
how we guarantee that i > 0 there.

>> All of the used-uninitialized errors can be papered over by
>> clearing the fiemap_buf array, like this:
>>
>> +  memset (fiemap_buf, 0, sizeof fiemap_buf);
> I recalled why I initialized this buf before when you ask me the reason, I was intented to
> initialize the 'fiemap->fm_start', so below line 'fiemap->fm_start = 0ULL' should be removed from
> the loop.
>
>>    do
>>      {
>>        fiemap->fm_start = 0ULL;
>>
>> However, if these are all due solely to F13's valgrind not yet knowing the
>> semantics of the FIEMAP ioctl, then that may be adequate.
> as what I mentioned above, this line should be removed or remove out of the loop if we do not
> initialize the fiemap buf.

I agree.
Leaving the initialization in the loop would provoke an infinite loop,
for a file with many extents.

This demonstrates it:

    $ perl -e 'for (1..100) { sysseek(STDOUT,4096,1)' \
           -e '&& syswrite(STDOUT,"."x4096) or die "$!"}' > j
    $ ./cp --sparse=always j j2
    <INFLOOP!>
    ^C

With this statement "fiemap->fm_start = 0ULL;" in the do-while loop,
the use of ./cp above would infloop.  Without it, it works properly:

    $ env time -f %E ./cp --sparse=always j j2
    0:00.01
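
To illustrate why the reset provokes the loop, here is a small simulation;
it makes no real ioctl call.  The names fake_fiemap and count_calls and the
batch size of 4 are illustrative assumptions.  (The patch's 4096-byte buffer
should hold 72 extents per call, assuming a 32-byte struct fiemap header and
56-byte struct fiemap_extent records, which is why a 100-extent file is
enough to need a second call.)  Each call returns the extents that end beyond
`start'; if `start' is reset to 0 before every call, the first batch comes
back every time and the extent flagged as last is never seen:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ext { uint64_t logical; uint64_t length; bool last; };

/* Hypothetical stand-in for the FIEMAP ioctl: copy into OUT up to MAX
   extents from ALL that end beyond START.  Return the number copied.  */
static size_t
fake_fiemap (struct ext const *all, size_t n_all, uint64_t start,
             struct ext *out, size_t max)
{
  size_t n = 0;
  for (size_t k = 0; k < n_all && n < max; k++)
    if (start < all[k].logical + all[k].length)
      out[n++] = all[k];
  return n;
}

/* Walk all extents in batches.  Return the number of mapping calls made,
   or CAP if the walk has not terminated after CAP calls.  */
static unsigned int
count_calls (struct ext const *all, size_t n_all, bool reset_start,
             unsigned int cap)
{
  enum { BATCH = 4 };
  struct ext buf[BATCH];
  uint64_t start = 0;
  bool last = false;
  unsigned int calls = 0;

  do
    {
      if (reset_start)
        start = 0;          /* the bug: every call re-reads the first batch */
      if (calls == cap)
        return cap;
      calls++;

      size_t n = fake_fiemap (all, n_all, start, buf, BATCH);
      if (n == 0)
        break;

      for (size_t i = 0; i < n; i++)
        if (buf[i].last)
          last = true;

      /* n >= 1 here, so buf[n - 1] is always a valid element.  */
      start = buf[n - 1].logical + buf[n - 1].length;
    }
  while (!last);

  return calls;
}
```

With ten extents and a batch of four, the walk without the reset finishes
after three calls, while the variant with the reset stops only at the cap.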

And we can compare the extents in the two:
(the awk is mainly to exclude the physical block numbers,
which will always differ)

    $ diff -u <(filefrag -v j|awk '/^ / {print $1,$2,$NF}') \
              <(filefrag -v j2|awk '/^ / {print $1,$2,$NF}')
    $
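
The same check can be expressed in C once the extents of both files have
been read into arrays.  This is only a sketch: struct extent and same_layout
are hypothetical names, and, as with the awk filter, the physical offset is
deliberately ignored:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one extent of a file.  */
struct extent
{
  uint64_t logical;   /* logical offset within the file */
  uint64_t physical;  /* physical offset on the device; not compared */
  uint64_t length;    /* extent length */
};

/* Return true if A and B describe the same sequence of logical offsets
   and lengths.  Physical offsets are expected to differ between a file
   and its copy, so they are not compared.  */
static bool
same_layout (struct extent const *a, size_t na,
             struct extent const *b, size_t nb)
{
  if (na != nb)
    return false;
  for (size_t i = 0; i < na; i++)
    if (a[i].logical != b[i].logical || a[i].length != b[i].length)
      return false;
  return true;
}
```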

For reference, here's what filefrag -v output looks like,
given a file with a nontrivial list of extents:

  $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
         -e 'for (1..5) { sysseek(*F,$n,1)' \
         -e '&& syswrite *F,"."x$n or die "$!"}' > j
  $ filefrag -v j
  Filesystem type is: ef53
  File size of j is 163840 (40 blocks, blocksize 4096)
   ext logical physical expected length flags
     0       4  6258884               4
     1      12  6258892  6258887      4
     2      20  6258900  6258895      4
     3      28  6258908  6258903      4
     4      36  6258916  6258911      4 eof
  j: 6 extents found




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 14:45:04 GMT) Full text and rfc822 format available.

Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 16:44:39 +0200
jeff.liu wrote:
...
>>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>>
>>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>>> loopbacked ext4 partition.
>>> * tests/Makefile.am (sparse-fiemap): Reference the new test.

BTW, I've just made this additional change to your test,

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 6312a4c..bdc7ded 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -27,6 +27,7 @@ require_root_
 cwd=`pwd`
 cleanup_() { cd /; umount "$cwd/mnt"; }

+skip=0
 # Create an ext4 loopback file system
 dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
 mkdir mnt

And will push this correction to the one you appear to have used as a model:

From c9bcbc8f9fc791c97bc85678f5f22458a76689ac Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 21 May 2010 14:55:36 +0200
Subject: [PATCH] tests: fix cp-a-selinux to skip cleanly upon mkfs failure

* tests/cp/cp-a-selinux: Initialize skip, to avoid a syntax error
in subsequent "test".
---
 tests/cp/cp-a-selinux |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tests/cp/cp-a-selinux b/tests/cp/cp-a-selinux
index b65070a..5b9ff0f 100755
--- a/tests/cp/cp-a-selinux
+++ b/tests/cp/cp-a-selinux
@@ -45,7 +45,7 @@ test -s err && fail=1   #there must be no stderr output for -a
 ls -Z e | grep $ctx || fail=1
 ls -Z f | grep $ctx || fail=1

-
+skip=0
 # Create a file system, then mount it with the context=... option.
 dd if=/dev/zero of=blob bs=8192 count=200 > /dev/null 2>&1 \
                                              || skip=1
--
1.7.1.262.g5ef3d




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 15:04:01 GMT) Full text and rfc822 format available.

Message #35 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 23:01:41 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> ...
>>>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>>>
>>>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>>>> loopbacked ext4 partition.
>>>> * tests/Makefile.am (sparse-fiemap): Reference the new test.
> 
> BTW, I've just made this additional change to your test,
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> index 6312a4c..bdc7ded 100755
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -27,6 +27,7 @@ require_root_
>  cwd=`pwd`
>  cleanup_() { cd /; umount "$cwd/mnt"; }
> 
> +skip=0
>  # Create an ext4 loopback file system
>  dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
>  mkdir mnt
> 
> And will push this correction to the one you appear to have used as a model:
> 
> From c9bcbc8f9fc791c97bc85678f5f22458a76689ac Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Fri, 21 May 2010 14:55:36 +0200
> Subject: [PATCH] tests: fix cp-a-selinux to skip cleanly upon mkfs failure
> 
> * tests/cp/cp-a-selinux: Initialize skip, to avoid a syntax error
> in subsequent "test".
> ---
>  tests/cp/cp-a-selinux |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/tests/cp/cp-a-selinux b/tests/cp/cp-a-selinux
> index b65070a..5b9ff0f 100755
> --- a/tests/cp/cp-a-selinux
> +++ b/tests/cp/cp-a-selinux
> @@ -45,7 +45,7 @@ test -s err && fail=1   #there must be no stderr output for -a
>  ls -Z e | grep $ctx || fail=1
>  ls -Z f | grep $ctx || fail=1
> 
> -
> +skip=0
>  # Create a file system, then mount it with the context=... option.
>  dd if=/dev/zero of=blob bs=8192 count=200 > /dev/null 2>&1 \
>                                               || skip=1
Hmm, there is also a '2>&1' for `dd' here. :)
> --
> 1.7.1.262.g5ef3d


Thanks,
-Jeff




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 15:33:02 GMT) Full text and rfc822 format available.

Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 23:31:53 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> ...
>>> [*] I tried to count syscalls with strace but got a segfault.
>>> Using valgrind I get errors, so debugged enough to get a clean
>>> run, but possibly at the expense of correctness.  We'll need more
>>> tests to ensure that the non-sparse blocks in the copy all have
>>> the same offset/length as in the original.
>> Is it make sense if we write a utility in C through FIEMAP to show the extent info of a file?
>> then wrap it in our current test scripts or a new test script to compare the non-sparse blocks
>> offset and length?
> 
> If there were no adequate tool already available, that would be good.
> 
>> filefrag(8) can do such thing(http://e2fsprogs.sourceforge.net/), but maybe we can implement a
>> compacted version focus on furture extent maping related testing only for coreutils.
> 
> Or maybe just use filefrag, when it's available.
> On F13, with -v (verbose), it prints this:
> 
>     $ filefrag -v big
>     Filesystem type is: ef53
>     File size of big is 2147483648 (524288 blocks, blocksize 4096)
>      ext logical physical expected length flags
>        0       0   254527               1
>     big: 1 extent found
> 
> 
>>> ===========================================================
>>> The segv just above is due to hitting this line with i==0:
>>>
>>>     fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
>> strange, code should break if there is no extent allocated for a file.
>>  /* If 0 extents are returned, then more ioctls are not needed.  */
>>       if (fiemap->fm_mapped_extents == 0)
>>         break;
> 
> There is one extent, and it is while processing it, with i == 0 that
> would trigger the failure when referencing fm_ext[i-1] (aka fm_ext[-1]).
> 
>>> the obvious fix is probably to do this instead:
>>>
>>>     fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length);
>> I just found a bug for dealing with the 'fiemap->fm_start', maybe it is the root cause of the
>> segment fault.  above line still need to write as 'fm_ext[i-1].fe_logical +....' to calculate the
>> offset for the next ioctl(2).
> 
> "i" can be 0 there, so it sounds like you're saying we need to
> reference fm_ext[-1].  If you mean that, you'll have to demonstrate
> how we guarantee that i > 0 there.
Sorry for the lack of detail on this point.  Besides removing 'fiemap->fm_start = 0ULL' from the
do-while loop, I also need to move "fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);"
out of the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.
Then, even if there is only one extent, 'i == 1' when the for loop finishes, so we will not hit the
'fm_ext[-1]' issue.

My idea for the fix looks like this:

memset (fiemap, 0, sizeof fiemap_buf);
do
  {
    ioctl (...);

    for (i = 0; i < fiemap->fm_mapped_extents; i++)
      {
        ...
      }
    fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
  } while (! last);

> 
>>> All of the used-uninitialized errors can be papered over by
>>> clearing the fiemap_buf array, like this:
>>>
>>> +  memset (fiemap_buf, 0, sizeof fiemap_buf);
>> I recalled why I initialized this buf before when you ask me the reason, I was intented to
>> initialize the 'fiemap->fm_start', so below line 'fiemap->fm_start = 0ULL' should be removed from
>> the loop.
>>
>>>    do
>>>      {
>>>        fiemap->fm_start = 0ULL;
>>>
>>> However, if these are all due solely to F13's valgrind not yet knowing the
>>> semantics of the FIEMAP ioctl, then that may be adequate.
>> as what I mentioned above, this line should be removed or remove out of the loop if we do not
>> initialize the fiemap buf.
> 
> I agree.
> Leaving the initialization in the loop would provoke an infinite loop,
> for a file with many extents.
> 
> This demonstrates it:
> 
>     $ perl -e 'for (1..100) { sysseek(STDOUT,4096,1)' \
>            -e '&& syswrite(STDOUT,"."x4096) or die "$!"}' > j
>     $ ./cp --sparse=always j j2
>     <INFLOOP!>
>     ^C
> 
> With this statement "fiemap->fm_start = 0ULL;" in the do-while loop,
> the use of ./cp above would infloop.  Without it, it works properly:
> 
>     $ env time -f %E ./cp --sparse=always j j2
>     0:00.01
> 
> And we can compare the extents in the two:
> (the awk is mainly to exclude the physical block numbers,
> which will always differ)
> 
>     $ diff -u <(filefrag -v j|awk '/^ / {print $1,$2,$NF}') \
>               <(filefrag -v j2|awk '/^ / {print $1,$2,$NF}')
>     $
> 
> For reference, here's what filefrag -v output looks like,
> given a file with a nontrivial list of extents:
> 
>   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
>          -e 'for (1..5) { sysseek(*F,$n,1)' \
>          -e '&& syswrite *F,"."x$n or die "$!"}' > j
>   $ filefrag -v j
>   Filesystem type is: ef53
>   File size of j is 163840 (40 blocks, blocksize 4096)
>    ext logical physical expected length flags
>      0       4  6258884               4
>      1      12  6258892  6258887      4
>      2      20  6258900  6258895      4
>      3      28  6258908  6258903      4
>      4      36  6258916  6258911      4 eof
>   j: 6 extents found
Do we need another test script for this test if we choose `filefrag' to examine the extent info?
I'd like to handle it.
> 
> 
> 


Thanks,
-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 15:43:02 GMT) Full text and rfc822 format available.

Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 17:42:25 +0200
jeff.liu wrote:
...
> Sorry for the lack of detailed info for this point, except for removing the fiemap->fm_start from
> the loop, I need to remove "fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);"
> out of the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)" as well.
> So, if there is only one extent, at least 'i == 1' when the loop finished, we'll not hit the
> 'fm_ext[-1]' issue.
>
> my thoughts of the fix looks like below:
>
> memset (fiemap, 0, sizeof fiemap_buf);
> do
>   {
>     ioctl (...);
>
>     for (i = 0; i < fiemap->fm_mapped_extents; i++)
>       {
>         ...
>       }
>     fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
>   } while (! last);

That is better.
Equivalent semantics to my change, but yours avoids unnecessarily
updating fiemap->fm_start for each iteration of the for loop.

...
>> For reference, here's what filefrag -v output looks like,
>> given a file with a nontrivial list of extents:
>>
>>   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
>>          -e 'for (1..5) { sysseek(*F,$n,1)' \
>>          -e '&& syswrite *F,"."x$n or die "$!"}' > j
>>   $ filefrag -v j
>>   Filesystem type is: ef53
>>   File size of j is 163840 (40 blocks, blocksize 4096)
>>    ext logical physical expected length flags
>>      0       4  6258884               4
>>      1      12  6258892  6258887      4
>>      2      20  6258900  6258895      4
>>      3      28  6258908  6258903      4
>>      4      36  6258916  6258911      4 eof
>>   j: 6 extents found
> Do we need another test script for this test if we choose `filefrag' to examine the extent info?

Yes, that's why I took the time to do the above.
I've already written most of it.  Will post shortly.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 21 May 2010 15:52:01 GMT) Full text and rfc822 format available.

Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 21 May 2010 23:51:03 +0800
Hi Jim,

This is the revised version.  It fixes the fiemap->fm_start offset calculation by moving it out of
the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.

I do not have a 64-bit machine for testing at the moment.  Of the two cases below, the first
(copying a file with no extents) was run only on x86 under valgrind.  It works for me; could you
help verify on x64?

The second case checks that the logical offsets and lengths of the non-sparse extents in the copied
file are identical to those in the source file.  `ex' is a test tool I wrote in C to show the extent
info of a file through the FIEMAP ioctl(2); it steps through each extent and prints its logical
offset, extent length, and physical offset.
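
For readers without such a tool, here is a minimal sketch of an extent
lister.  It is not the `ex' source, which is not included in this message.
The helper name print_extents, the MAX_EXTENTS limit, and the output format
are assumptions, and it makes a single FIEMAP call, so a file with more than
MAX_EXTENTS extents would be listed only partially:

```c
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, struct fiemap_extent */

enum { MAX_EXTENTS = 64 };

/* Hypothetical helper: print up to MAX_EXTENTS extents of PATH to OUT.
   Return the number of extents printed, or -1 if PATH cannot be opened,
   memory cannot be allocated, or the FIEMAP ioctl fails.  */
static int
print_extents (char const *path, FILE *out)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  size_t size = (sizeof (struct fiemap)
                 + MAX_EXTENTS * sizeof (struct fiemap_extent));
  struct fiemap *fm = calloc (1, size);
  if (!fm)
    {
      close (fd);
      return -1;
    }

  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;
  fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush pending writes before mapping */
  fm->fm_extent_count = MAX_EXTENTS;

  int n = -1;
  if (ioctl (fd, FS_IOC_FIEMAP, fm) == 0)
    {
      n = (int) fm->fm_mapped_extents;
      for (int i = 0; i < n; i++)
        fprintf (out,
                 "Logical: %" PRIu64 "\tExt length: %" PRIu64
                 "\tPhysical: %" PRIu64 "\n",
                 (uint64_t) fm->fm_extents[i].fe_logical,
                 (uint64_t) fm->fm_extents[i].fe_length,
                 (uint64_t) fm->fm_extents[i].fe_physical);
    }

  free (fm);
  close (fd);
  return n;
}
```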

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ dd if=/dev/null of=/ext4/sp1 bs=1 seek=2G
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ valgrind --version
valgrind-3.3.0-Debian
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ valgrind ./src/cp --sparse=always /ext4/sp1 /ext4/sp2
==13678== Memcheck, a memory error detector.
==13678== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==13678== Using LibVEX rev 1804, a library for dynamic binary translation.
==13678== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==13678== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==13678== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==13678== For more details, rerun with: -v
==13678==
==13678==
==13678== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 23 from 1)
==13678== malloc/free: in use at exit: 0 bytes in 0 blocks.
==13678== malloc/free: 71 allocs, 71 frees, 10,255 bytes allocated.
==13678== For counts of detected errors, rerun with: -v
==13678== All heap blocks were freed -- no leaks are possible.

jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ./src/cp --sparse=always /ocfs2/sparse_dir/sparse_4 /ext4/sp2
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ./ex /ext4/sp2
Extents in file "/ext4/sp2":    14
Extents returned: 14
Logical: ###[       0]	Ext length: ###[   65536]	Physical: ###[352321536]	
Logical: ###[   98304]	Ext length: ###[   32768]	Physical: ###[352419840]	
Logical: ###[  229376]	Ext length: ###[   32768]	Physical: ###[352550912]	
Logical: ###[  458752]	Ext length: ###[   65536]	Physical: ###[352780288]	
Logical: ###[  950272]	Ext length: ###[   65536]	Physical: ###[353271808]	
Logical: ###[ 1966080]	Ext length: ###[   32768]	Physical: ###[354287616]	
Logical: ###[ 3932160]	Ext length: ###[   65536]	Physical: ###[356253696]	
Logical: ###[ 7897088]	Ext length: ###[   65536]	Physical: ###[360218624]	
Logical: ###[15826944]	Ext length: ###[   65536]	Physical: ###[384925696]	
Logical: ###[31719424]	Ext length: ###[   65536]	Physical: ###[1004797952]	
Logical: ###[63471616]	Ext length: ###[   65536]	Physical: ###[1011384320]	
Logical: ###[126976000]	Ext length: ###[   65536]	Physical: ###[1016168448]	
Logical: ###[254017536]	Ext length: ###[   65536]	Physical: ###[1025769472]	
Logical: ###[508100608]	Ext length: ###[   32768]	Physical: ###[1036582912]	
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ./src/cp --sparse=always /ext4/sp2 /ext4/sp2_fiemap
jeff <at> jeff-laptop:~/opensource_dev/coreutils$ ./ex /ext4/sp2_fiemap
Extents in file "/ext4/sp2_fiemap":    14
Extents returned: 14
Logical: ###[       0]	Ext length: ###[   65536]	Physical: ###[1040187392]	
Logical: ###[   98304]	Ext length: ###[   32768]	Physical: ###[1040285696]	
Logical: ###[  229376]	Ext length: ###[   32768]	Physical: ###[1040416768]	
Logical: ###[  458752]	Ext length: ###[   65536]	Physical: ###[1040646144]	
Logical: ###[  950272]	Ext length: ###[   65536]	Physical: ###[1041137664]	
Logical: ###[ 1966080]	Ext length: ###[   32768]	Physical: ###[1042153472]	
Logical: ###[ 3932160]	Ext length: ###[   65536]	Physical: ###[1044119552]	
Logical: ###[ 7897088]	Ext length: ###[   65536]	Physical: ###[1048084480]	
Logical: ###[15826944]	Ext length: ###[   65536]	Physical: ###[1056014336]	
Logical: ###[31719424]	Ext length: ###[   65536]	Physical: ###[1063518208]	
Logical: ###[63471616]	Ext length: ###[   65536]	Physical: ###[1070104576]	
Logical: ###[126976000]	Ext length: ###[   65536]	Physical: ###[1125220352]	
Logical: ###[254017536]	Ext length: ###[   65536]	Physical: ###[1134821376]	
Logical: ###[508100608]	Ext length: ###[   32768]	Physical: ###[1145634816]	


From 056bb15018466cc2b6b7ae2603fb41b6f61fa084 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 21 May 2010 22:49:03 +0800
Subject: [PATCH 1/1] cp: Add FIEMAP support for efficient sparse file copy

* src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
Copied from linux's include/linux/fiemap.h, with minor formatting changes.
* src/copy.c (copy_reg): Now, when `cp' is invoked with the --sparse=[WHEN] option,
we try to do a FIEMAP-copy if the underlying file system supports it, and fall back
to a normal copy if it fails.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c   |  153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 255 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index c16cef6..f32a676 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,6 +63,10 @@

 #include <sys/ioctl.h>

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -149,6 +153,135 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform a FIEMAP (available since mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the allocated
+   extents of a file, skipping its holes.  This saves the overhead of
+   dealing with holes via lseek(2) in a normal copy, and could result
+   in much faster backups for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+
+  memset (fiemap, 0, sizeof fiemap_buf);
+  do
+    {
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, fall back to the normal copy only if
+         this is the first ioctl(2) call.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+        {
+          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into a hole when the remaining extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Record how many bytes were read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+        }
+      fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length);
+    } while (! last);
+
+  /* If the file ends with a hole, the last extent's logical offset plus
+     the number of bytes read from it is less than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +812,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +959,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset (inclusive) at which to start mapping (in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants (in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request (in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped (out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array (in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents (out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset that can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extended attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending.  Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but no data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED            0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
-- 
1.5.4.3
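
As a rough illustration of what fiemap_copy_ok does with each extent once the
ioctl has reported it, here is a minimal standalone sketch in C.  It is not
part of the patch: copy_extent and extend_to_size are hypothetical helper
names, the 4096-byte buffer is an arbitrary choice, and error reporting is
reduced to perror.  It seeks both descriptors to the extent's logical offset,
copies only the mapped bytes, and then uses ftruncate to restore a trailing
hole.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): copy LEN bytes starting at
   OFFSET from SRC_FD to the same offset in DEST_FD.  Seeking the
   destination instead of writing zeros is what leaves the hole before
   OFFSET unallocated.  Return 0 on success, -1 on error.  */
static int
copy_extent (int src_fd, int dest_fd, off_t offset, off_t len)
{
  char buf[4096];

  assert (0 <= offset && 0 <= len);
  if (lseek (src_fd, offset, SEEK_SET) < 0
      || lseek (dest_fd, offset, SEEK_SET) < 0)
    {
      perror ("lseek");
      return -1;
    }

  while (0 < len)
    {
      /* Never request more than what remains of this extent.  */
      size_t want = len < (off_t) sizeof buf ? (size_t) len : sizeof buf;
      ssize_t n_read = read (src_fd, buf, want);
      if (n_read < 0)
        {
          if (errno == EINTR)
            continue;
          perror ("read");
          return -1;
        }
      if (n_read == 0)
        break;                  /* EOF inside the extent.  */

      /* write(2) may be partial; loop until the chunk is written.  */
      for (ssize_t done = 0; done < n_read; )
        {
          ssize_t n = write (dest_fd, buf + done, n_read - done);
          if (n < 0)
            {
              if (errno == EINTR)
                continue;
              perror ("write");
              return -1;
            }
          done += n;
        }
      len -= n_read;
    }
  return 0;
}

/* Hypothetical helper: when the source ends in a hole, the destination
   is still shorter than SRC_SIZE after the last extent; extend it.  */
static int
extend_to_size (int dest_fd, off_t src_size)
{
  struct stat st;
  if (fstat (dest_fd, &st) < 0)
    return -1;
  return st.st_size < src_size ? ftruncate (dest_fd, src_size) : 0;
}
```

A caller would invoke copy_extent once per extent returned by FS_IOC_FIEMAP,
passing fe_logical and fe_length, and then call extend_to_size with the
source's st_size.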



Cheers,
-Jeff

-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 24 May 2010 08:29:01 GMT) Full text and rfc822 format available.

Message #47 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 24 May 2010 16:27:36 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> Jim Meyering wrote:
>>> jeff.liu wrote:
>>> ...
>>>>>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>>>>>
>>>>>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>>>>>> loopbacked ext4 partition.
>>>>>> * tests/Makefile.am (sparse-fiemap): Reference the new test.
>>> BTW, I've just made this additional change to your test,
>>>
>>> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
>>> index 6312a4c..bdc7ded 100755
>>> --- a/tests/cp/sparse-fiemap
>>> +++ b/tests/cp/sparse-fiemap
>>> @@ -27,6 +27,7 @@ require_root_
>>>  cwd=`pwd`
>>>  cleanup_() { cd /; umount "$cwd/mnt"; }
>>>
>>> +skip=0
> 
> More fixes for the test:
> - remove useless rm
> - $sparse and $fiemap were not defined in that final test,
>     so cd into mnt/ and just use the file names instead.
> - I prefer $(...) to `...`, and it's portable in this context
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> index 32ca5fc..cec5224 100755
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -40,18 +40,18 @@ test -s mnt/f                                  || skip=1
>  test $skip = 1 &&
>    skip_test_ "insufficient mount/ext4 support"
> 
> -rm -f mnt/f
> -
>  # Create a 1TiB sparse file
>  dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
> 
> +cd mnt || fail=1
> +
>  # It takes many minutes to copy this sparse file using the old method.
>  # By contrast, it takes far less than 1 second using FIEMAP-copy.
> -timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
> +timeout 10 cp --sparse=always sparse fiemap || fail=1
> 
>  # Ensure that the sparse file copied through fiemap has the same size
>  # in bytes as the original.
> -test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1
> +test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
> 
>  # =================================================
>  # Ensure that we exercise the FIEMAP-copying code enough
> --
> 1.7.1.262.g5ef3d

Thanks for the info.

Regarding the point *- I prefer $(...) to `...`, and it's portable in this context*:

Could you check the tiny patch below?  It makes the same change to improve the portability of
'tests/cp/sparse', which I had used as a model.

From 9fbffcd0f4d4706f4b88e7ac73ca13b62c789047 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Mon, 24 May 2010 16:01:06 +0800
Subject: [PATCH 1/1] cp: improve the portability of test

* tests/cp/sparse: improve the portability using shell constructs.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 tests/cp/sparse |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/cp/sparse b/tests/cp/sparse
index 73c2924..dee6de2 100755
--- a/tests/cp/sparse
+++ b/tests/cp/sparse
@@ -28,14 +28,14 @@ require_sparse_support_
 # It has to be at least 128K in order to be sparse on some systems.
 # Make its size one larger than 128K, in order to tickle the
 # bug in coreutils-6.0.
-size=`expr 128 \* 1024 + 1`
+size=$((128 * 1024 + 1))
 dd bs=1 seek=$size of=sparse < /dev/null 2> /dev/null || framework_failure


 cp --sparse=always sparse copy || fail=1

 # Ensure that the copy has the same block count as the original.
-test `stat --printf %b copy` -le `stat --printf %b sparse` || fail=1
+test $(stat --printf %b copy) -le $(stat --printf %b sparse) || fail=1

 # Ensure that --sparse={always,never} with --reflink fail.
 cp --sparse=always --reflink sparse copy && fail=1
-- 
1.5.4.3


Thanks,
-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 24 May 2010 09:36:02 GMT) Full text and rfc822 format available.

Message #50 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 24 May 2010 17:34:01 +0800
jeff.liu wrote:
> Jim Meyering wrote:
>> jeff.liu wrote:
>>> Jim Meyering wrote:
>>>> jeff.liu wrote:
>>>> ...
>>>>>>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>>>>>>
>>>>>>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>>>>>>> loopbacked ext4 partition.
>>>>>>> * tests/Makefile.am (sparse-fiemap): Reference the new test.
>>>> BTW, I've just made this additional change to your test,
>>>>
>>>> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
>>>> index 6312a4c..bdc7ded 100755
>>>> --- a/tests/cp/sparse-fiemap
>>>> +++ b/tests/cp/sparse-fiemap
>>>> @@ -27,6 +27,7 @@ require_root_
>>>>  cwd=`pwd`
>>>>  cleanup_() { cd /; umount "$cwd/mnt"; }
>>>>
>>>> +skip=0
>> More fixes for the test:
>> - remove useless rm
>> - $sparse and $fiemap were not defined in that final test,
>>     so cd into mnt/ and just use the file names instead.
>> - I prefer $(...) to `...`, and it's portable in this context
>>
>> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
>> index 32ca5fc..cec5224 100755
>> --- a/tests/cp/sparse-fiemap
>> +++ b/tests/cp/sparse-fiemap
>> @@ -40,18 +40,18 @@ test -s mnt/f                                  || skip=1
>>  test $skip = 1 &&
>>    skip_test_ "insufficient mount/ext4 support"
>>
>> -rm -f mnt/f
>> -
>>  # Create a 1TiB sparse file
>>  dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
>>
>> +cd mnt || fail=1
>> +
>>  # It takes many minutes to copy this sparse file using the old method.
>>  # By contrast, it takes far less than 1 second using FIEMAP-copy.
>> -timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
>> +timeout 10 cp --sparse=always sparse fiemap || fail=1
>>
>>  # Ensure that the sparse file copied through fiemap has the same size
>>  # in bytes as the original.
>> -test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1
>> +test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
>>
>>  # =================================================
>>  # Ensure that we exercise the FIEMAP-copying code enough
>> --
>> 1.7.1.262.g5ef3d
> 
> Thanks for the info.
> 
> Regarding the point *- I prefer $(...) to `...`, and it's portable in this context*:
> 
> Could you check the tiny patch below?  It makes the same change to improve the portability of
> 'tests/cp/sparse', which I had used as a model.
> 
> From 9fbffcd0f4d4706f4b88e7ac73ca13b62c789047 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Mon, 24 May 2010 16:01:06 +0800
> Subject: [PATCH 1/1] cp: improve the portability of test
> 
> * tests/cp/sparse: improve the portability using shell constructs.
> 
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  tests/cp/sparse |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/cp/sparse b/tests/cp/sparse
> index 73c2924..dee6de2 100755
> --- a/tests/cp/sparse
> +++ b/tests/cp/sparse
> @@ -28,14 +28,14 @@ require_sparse_support_
>  # It has to be at least 128K in order to be sparse on some systems.
>  # Make its size one larger than 128K, in order to tickle the
>  # bug in coreutils-6.0.
> -size=`expr 128 \* 1024 + 1`
> +size=$((128 * 1024 + 1))
>  dd bs=1 seek=$size of=sparse < /dev/null 2> /dev/null || framework_failure
> 
> 
>  cp --sparse=always sparse copy || fail=1
> 
>  # Ensure that the copy has the same block count as the original.
> -test `stat --printf %b copy` -le `stat --printf %b sparse` || fail=1
> +test $(stat --printf %b copy) -le $(stat --printf %b sparse) || fail=1
> 
>  # Ensure that --sparse={always,never} with --reflink fail.
>  cp --sparse=always --reflink sparse copy && fail=1

Please ignore the patch above.  I just found another issue in 'tests/cp/sparse': the newly created
test file is also named 'sparse', so when the test runs, `cp/sparse' itself will be truncated to
`expr 128 \* 1024 + 1` bytes.

The patch below fixes this by creating a sparse file named 'sparse1' instead (I cannot think of a
better name for now), and applies s/-le/=/ when comparing the block counts.

From 0669ac6d0497a3c6abfc5d53202afc6bc47d0d07 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Mon, 24 May 2010 17:29:27 +0800
Subject: [PATCH 1/1] cp: enhance the sparse file copy test

* tests/cp/sparse: fix sparse file name to 'sparse1', improve
the portability using shell constructs.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 tests/cp/sparse |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tests/cp/sparse b/tests/cp/sparse
index 73c2924..cab8b9c 100755
--- a/tests/cp/sparse
+++ b/tests/cp/sparse
@@ -28,17 +28,17 @@ require_sparse_support_
 # It has to be at least 128K in order to be sparse on some systems.
 # Make its size one larger than 128K, in order to tickle the
 # bug in coreutils-6.0.
-size=`expr 128 \* 1024 + 1`
-dd bs=1 seek=$size of=sparse < /dev/null 2> /dev/null || framework_failure
+size=$((128 * 1024 + 1))
+dd bs=1 seek=$size of=sparse1 < /dev/null 2> /dev/null || framework_failure


-cp --sparse=always sparse copy || fail=1
+cp --sparse=always sparse1 copy || fail=1

 # Ensure that the copy has the same block count as the original.
-test `stat --printf %b copy` -le `stat --printf %b sparse` || fail=1
+test $(stat --printf %b copy) = $(stat --printf %b sparse1) || fail=1

 # Ensure that --sparse={always,never} with --reflink fail.
-cp --sparse=always --reflink sparse copy && fail=1
-cp --sparse=never --reflink sparse copy && fail=1
+cp --sparse=always --reflink sparse1 copy && fail=1
+cp --sparse=never --reflink sparse1 copy && fail=1

 Exit $fail
-- 
1.5.4.3






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 24 May 2010 10:02:02 GMT) Full text and rfc822 format available.

Message #53 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 24 May 2010 09:41:27 +0200
jeff.liu wrote:
> Jim Meyering wrote:
>> jeff.liu wrote:
>> ...
>>>>> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy
>>>>>
>>>>> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
>>>>> loopbacked ext4 partition.
>>>>> * tests/Makefile.am (sparse-fiemap): Reference the new test.
>>
>> BTW, I've just made this additional change to your test,
>>
>> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
>> index 6312a4c..bdc7ded 100755
>> --- a/tests/cp/sparse-fiemap
>> +++ b/tests/cp/sparse-fiemap
>> @@ -27,6 +27,7 @@ require_root_
>>  cwd=`pwd`
>>  cleanup_() { cd /; umount "$cwd/mnt"; }
>>
>> +skip=0

More fixes for the test:
- remove useless rm
- $sparse and $fiemap were not defined in that final test,
    so cd into mnt/ and just use the file names instead.
- I prefer $(...) to `...`, and it's portable in this context

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 32ca5fc..cec5224 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -40,18 +40,18 @@ test -s mnt/f                                  || skip=1
 test $skip = 1 &&
   skip_test_ "insufficient mount/ext4 support"

-rm -f mnt/f
-
 # Create a 1TiB sparse file
 dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure

+cd mnt || fail=1
+
 # It takes many minutes to copy this sparse file using the old method.
 # By contrast, it takes far less than 1 second using FIEMAP-copy.
-timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1
+timeout 10 cp --sparse=always sparse fiemap || fail=1

 # Ensure that the sparse file copied through fiemap has the same size
 # in bytes as the original.
-test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1
+test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

 # =================================================
 # Ensure that we exercise the FIEMAP-copying code enough
--
1.7.1.262.g5ef3d




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 May 2010 05:51:02 GMT) Full text and rfc822 format available.

Message #56 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 May 2010 07:50:27 +0200
jeff.liu wrote:
...
> Please ignore the patch above.  I just found another issue in 'tests/cp/sparse': the newly created
> test file is also named 'sparse', so when the test runs, `cp/sparse' itself will be truncated to
> `expr 128 \* 1024 + 1` bytes.
>
> The patch below fixes this by creating a sparse file named 'sparse1' instead (I cannot think of a
> better name for now), and applies s/-le/=/ when comparing the block counts.
>
> From 0669ac6d0497a3c6abfc5d53202afc6bc47d0d07 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Mon, 24 May 2010 17:29:27 +0800
> Subject: [PATCH 1/1] cp: enhance the sparse file copy test
>
> * tests/cp/sparse: fix sparse file name to 'sparse1', improve
> the portability using shell constructs.
>
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  tests/cp/sparse |   12 ++++++------
>  1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tests/cp/sparse b/tests/cp/sparse
> index 73c2924..cab8b9c 100755
> --- a/tests/cp/sparse
> +++ b/tests/cp/sparse
> @@ -28,17 +28,17 @@ require_sparse_support_
>  # It has to be at least 128K in order to be sparse on some systems.
>  # Make its size one larger than 128K, in order to tickle the
>  # bug in coreutils-6.0.
> -size=`expr 128 \* 1024 + 1`
> -dd bs=1 seek=$size of=sparse < /dev/null 2> /dev/null || framework_failure
> +size=$((128 * 1024 + 1))

Thank you, but I will not use this patch.

First, $((...)) is *not* portable.
Note that while $(...) is an improvement in coreutils tests, in most projects
converting `...` to $(...) would represent a portability *regression*.
It happens to be acceptable in coreutils tests (and thus preferred by me)
because coreutils ensures that tests are run using a shell that is
modern enough to accept $(...).

Second, while I prefer $(...), it's not worth converting them
one by one.  There are over 500 uses in tests/.

> +dd bs=1 seek=$size of=sparse1 < /dev/null 2> /dev/null || framework_failure

Are you worried about tests running in parallel, and thus
this test's "sparse" file colliding with the one by
the same name in the fiemap test?
That's not a problem, since each is run in its own
separate subdirectory, via the machinery in test-lib.sh.

> -cp --sparse=always sparse copy || fail=1
> +cp --sparse=always sparse1 copy || fail=1




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 May 2010 08:12:02 GMT) Full text and rfc822 format available.

Message #59 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 May 2010 16:11:33 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> ...
>> Please ignore the patch above.  I just found another issue in 'tests/cp/sparse': the newly created
>> test file is also named 'sparse', so when the test runs, `cp/sparse' itself will be truncated to
>> `expr 128 \* 1024 + 1` bytes.
>>
>> The patch below fixes this by creating a sparse file named 'sparse1' instead (I cannot think of a
>> better name for now), and applies s/-le/=/ when comparing the block counts.
>>
>> From 0669ac6d0497a3c6abfc5d53202afc6bc47d0d07 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <jeff.liu <at> oracle.com>
>> Date: Mon, 24 May 2010 17:29:27 +0800
>> Subject: [PATCH 1/1] cp: enhance the sparse file copy test
>>
>> * tests/cp/sparse: fix sparse file name to 'sparse1', improve
>> the portability using shell constructs.
>>
>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>> ---
>>  tests/cp/sparse |   12 ++++++------
>>  1 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/tests/cp/sparse b/tests/cp/sparse
>> index 73c2924..cab8b9c 100755
>> --- a/tests/cp/sparse
>> +++ b/tests/cp/sparse
>> @@ -28,17 +28,17 @@ require_sparse_support_
>>  # It has to be at least 128K in order to be sparse on some systems.
>>  # Make its size one larger than 128K, in order to tickle the
>>  # bug in coreutils-6.0.
>> -size=`expr 128 \* 1024 + 1`
>> -dd bs=1 seek=$size of=sparse < /dev/null 2> /dev/null || framework_failure
>> +size=$((128 * 1024 + 1))
> 
> Thank you, but I will not use this patch.
> 
> First, $((...)) is *not* portable.
> Note that while $(...) is an improvement in coreutils tests, in most projects
> converting `...` to $(...) would represent a portability *regression*.
> It happens to be acceptable in coreutils tests (and thus preferred by me)
> because coreutils ensures that tests are run using a shell that is
> modern enough to accept $(...).
Thanks for the clarification.
> 
> Second, while I prefer $(...), it's not worth converting them
> one by one.  There are over 500 uses in tests/.
> 
>> +dd bs=1 seek=$size of=sparse1 < /dev/null 2> /dev/null || framework_failure
> 
> Are you worried about tests running in parallel, and thus
> this test's "sparse" file colliding with the one by
> the same name in the fiemap test?
No.
When I first looked at the file 'cp/sparse', I thought it could be truncated by
the following dd(1) operation, because the output file is also named 'sparse', the same as
'cp/sparse', if the test ran in the same physical directory (i.e. tests/cp/):

++ expr 128 '*' 1024 + 1
+ size=131073
+ dd bs=1 seek=131073 of=sparse
+ cp --sparse=always sparse copy
++ stat --printf %b copy
++ stat --printf %b sparse
+ test 8 -le 8

However, the run log shows that each test runs safely in its own subdirectory.
Sorry for taking this point for granted.
+++ /home/jeff/opensource_dev/coreutils/src/mktemp -d
--tmp=/home/jeff/opensource_dev/coreutils/tests cu-sparse.XXXXXXXXXX

> That's not a problem, since each is run in its own
> separate subdirectory, via the machinery in test-lib.sh.
> 
>> -cp --sparse=always sparse copy || fail=1
>> +cp --sparse=always sparse1 copy || fail=1
> 
> 
> 

Thanks,
-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 10:32:02 GMT) Full text and rfc822 format available.

Message #62 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 12:30:58 +0200
jeff.liu wrote:
> This is the revised version.  It fixes the fiemap-start offset calculation
> by moving it out of the
> 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.

Hi Jeff,

I've included below the state of my local changes.
Unfortunately, with that 5-patch series, there is always a test failure
on F13/ext4.  Maybe someone who knows more about extents can provide an
explanation?

Here's a small example to demonstrate:

Create a file with many extents:

    perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
      -e 'for (1..100) { sysseek (*F, $n, 1)' \
      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1

Using the patched "cp", repeat the following 10 or 20 times:

    ./cp --sparse=always j1 j2; sync
    filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
    filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
    diff -u ff1 ff2 || fail=1

Usually there is no diff output, but occasionally it'll print this:
[hmm... today it consistently prints these differences every other time.]

$ diff -u ff1 ff2 || fail=1
--- ff1 2010-05-22 18:42:26.943501382 +0200
+++ ff2 2010-05-22 18:42:27.020876155 +0200
@@ -53,49 +53,50 @@ ext logical
 51 489
 52 498
 53 508
-54 517
-55 527
-56 536
-57 546
-58 555
-59 565
-60 574
-61 584
-62 593
-63 603
-64 612
-65 622
-66 631
-67 641
-68 650
-69 660
-70 669
-71 679
-72 688
-73 698
-74 707
-75 717
-76 726
-77 736
-78 745
-79 755
-80 764
-81 774
-82 783
-83 793
-84 802
-85 812
-86 821
-87 831
-88 840
-89 850
-90 859
-91 869
-92 878
-93 888
-94 897
-95 907
-96 916
-97 926
-98 935
-99 945
+54 512
+55 517
+56 527
+57 536
+58 546
+59 555
+60 565
+61 574
+62 584
+63 593
+64 603
+65 612
+66 622
+67 631
+68 641
+69 650
+70 660
+71 669
+72 679
+73 688
+74 698
+75 707
+76 717
+77 726
+78 736
+79 745
+80 755
+81 764
+82 774
+83 783
+84 793
+85 802
+86 812
+87 821
+88 831
+89 840
+90 850
+91 859
+92 869
+93 878
+94 888
+95 897
+96 907
+97 916
+98 926
+99 935
+100 945
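
For readers who want to look at the extent list directly rather than through
filefrag, the following is a minimal sketch in C of the same FS_IOC_FIEMAP
query loop, assuming a Linux system that provides <linux/fiemap.h> and
<linux/fs.h>.  list_extents is a hypothetical helper, not code from the patch,
and the batch size of 32 extents is arbitrary.  It prints byte offsets and
lengths, whereas the filefrag/awk output above is in blocks.  It advances
fm_start once per batch, after the per-extent loop, which is the offset
calculation the revised patch moved out of that loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */

/* Hypothetical helper: print "index logical-offset length" (in bytes)
   for each extent of FD to OUT.  Return the number of extents, or -1
   if memory runs out or the ioctl fails (for example, because the
   file system does not support FIEMAP).  */
static long
list_extents (int fd, FILE *out)
{
  enum { N_EXTENTS = 32 };
  size_t size = (sizeof (struct fiemap)
                 + N_EXTENTS * sizeof (struct fiemap_extent));
  struct fiemap *fm = malloc (size);
  uint64_t start = 0;
  long total = 0;
  int last = 0;

  assert (out != NULL);
  if (fm == NULL)
    return -1;

  while (!last)
    {
      memset (fm, 0, size);
      fm->fm_start = start;
      fm->fm_length = FIEMAP_MAX_OFFSET;
      fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush delayed allocation first */
      fm->fm_extent_count = N_EXTENTS;

      if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
        {
          free (fm);
          return -1;
        }

      /* No extents returned: nothing more to map.  */
      if (fm->fm_mapped_extents == 0)
        break;

      for (uint32_t i = 0; i < fm->fm_mapped_extents; i++)
        {
          struct fiemap_extent const *e = &fm->fm_extents[i];
          fprintf (out, "%ld %llu %llu\n", total++,
                   (unsigned long long) e->fe_logical,
                   (unsigned long long) e->fe_length);
          if (e->fe_flags & FIEMAP_EXTENT_LAST)
            last = 1;
        }

      /* Resume the next query just past the last extent of this batch.  */
      struct fiemap_extent const *tail
        = &fm->fm_extents[fm->fm_mapped_extents - 1];
      start = tail->fe_logical + tail->fe_length;
    }

  free (fm);
  return total;
}
```

Calling list_extents on descriptors for j1 and j2 and comparing the two
listings would give a byte-granular view of the same comparison; a return
value of -1 means the query itself failed and says nothing about the layout.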


From 2ef44bcb0bbb38cca738aa90e067caee312da939 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:09:30 +0800
Subject: [PATCH 1/5] cp: Add FIEMAP support for efficient sparse file copy

* src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
Copied from Linux's include/linux/fiemap.h, with minor formatting changes.
* src/copy.c (copy_reg): Now, when `cp' is invoked with the --sparse=[WHEN]
option, try to do a FIEMAP-copy if the underlying file system supports it,
and fall back to a normal copy if it fails.
---
 src/copy.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 +++++++++++++++++++++++++++++++++++++
 2 files changed, 261 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index c16cef6..0e54729 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,6 +63,10 @@

 #include <sys/ioctl.h>

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -149,6 +153,141 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform a FIEMAP (available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the file's
+   allocated extents, skipping holes.  This saves the overhead of
+   handling holes with lseek(2) in a normal copy, and should result in
+   much faster backups for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+
+  /* This is required at least to initialize fiemap->fm_start,
+     but also serves (in May 2010) to appease valgrind, which
+     appears not to know the semantics of the FIEMAP ioctl. */
+  memset (fiemap_buf, 0, sizeof fiemap_buf);
+
+  do
+    {
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* If ioctl(2) fails, fall back to the normal copy only when the
+         failure occurs on the very first call.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+        {
+          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the remaining extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+        }
+
+      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
+
+    } while (! last);
+
+  /* If a file ends with a hole, the sum of the last extent's logical offset
+     and the number of bytes read from it will be less than the actual size
+     of the file.  Use ftruncate to extend the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +818,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform an efficient FIEMAP copy for sparse files; fall back to
+             the standard copy only if the initial ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +965,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copied from the kernel; modified to follow GNU coding style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset that can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extended attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending.  Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data cannot be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED            0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
--
1.7.1.348.gb26ba
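
Note on patch 1/5: the last step of fiemap_copy_ok, extending the destination when the source ends with a hole, can be tried in isolation. The following is only a minimal sketch under stated assumptions; the helper name sketch_extend_to is hypothetical and does not appear in the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: grow the file open on FD
   to SIZE bytes if it is currently shorter.  On file systems that
   support sparse files this leaves a trailing hole.  The file is never
   shrunk.  Return 0 on success, -1 (with errno set) on failure.  */
static int
sketch_extend_to (int fd, off_t size)
{
  assert (0 <= size);

  /* Find the current end of the file.  */
  off_t cur = lseek (fd, 0, SEEK_END);
  if (cur < 0)
    return -1;

  /* Extend only when the file is shorter than requested.  */
  if (cur < size && ftruncate (fd, size) < 0)
    return -1;

  return 0;
}
```

Unlike the patch, which compares the end of the last extent with src_total_size, this sketch asks the destination for its current length, so it stays correct even if the caller has lost track of how many bytes were written.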


From 260b5b89e33da2b9a5ea5bcd9dba874f503d2937 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:17:53 +0800
Subject: [PATCH 2/5] tests: add a new test for FIEMAP-copy

* tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy on a
loopback-mounted ext4 file system.
* tests/Makefile.am (sparse-fiemap): Reference the new test.
---
 tests/Makefile.am      |    1 +
 tests/cp/sparse-fiemap |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/sparse-fiemap

diff --git a/tests/Makefile.am b/tests/Makefile.am
index c458574..f7840c8 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -25,6 +25,7 @@ root_tests =					\
   cp/special-bits				\
   cp/cp-mv-enotsup-xattr			\
   cp/capability					\
+  cp/sparse-fiemap                              \
   dd/skip-seek-past-dev				\
   install/install-C-root			\
   ls/capability					\
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
new file mode 100755
index 0000000..945c94b
--- /dev/null
+++ b/tests/cp/sparse-fiemap
@@ -0,0 +1,56 @@
+#!/bin/sh
+# Test cp --sparse=always through fiemap copy
+
+# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+require_root_
+
+cwd=`pwd`
+cleanup_() { cd /; umount "$cwd/mnt"; }
+
+skip=0
+# Create an ext4 loopback file system
+dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
+mkdir mnt
+mkfs -t ext4 -F blob ||
+  skip_test_ "failed to create ext4 file system"
+mount -oloop blob mnt                          || skip=1
+echo test > mnt/f                              || skip=1
+test -s mnt/f                                  || skip=1
+
+test $skip = 1 &&
+  skip_test_ "insufficient mount/ext4 support"
+
+# Create a 1TiB sparse file
+dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
+
+cd mnt || fail=1
+
+# It takes many minutes to copy this sparse file using the old method.
+# By contrast, it takes far less than 1 second using FIEMAP-copy.
+timeout 10 cp --sparse=always sparse fiemap || fail=1
+
+# Ensure that the sparse file copied through fiemap has the same size
+# in bytes as the original.
+test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
+
+Exit $fail
--
1.7.1.348.gb26ba
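
Note on patch 2/5: the test above checks only that the two files have the same size in bytes. A complementary check, whether a copy is actually stored sparsely, could compare allocated blocks with the apparent size. The sketch below is illustrative only. It assumes st_blocks counts 512-byte units, as on Linux, and the name sketch_looks_sparse is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Assumption: st_blocks is expressed in 512-byte units, as on Linux.  */
enum { SKETCH_ST_BLOCK_BYTES = 512 };

/* Hypothetical helper: return true if the file described by ST occupies
   fewer bytes on disk than its apparent size.  That usually means the
   file contains holes, although a compressing file system could give
   the same result.  */
static bool
sketch_looks_sparse (struct stat const *st)
{
  assert (st != NULL);
  long long allocated = (long long) st->st_blocks * SKETCH_ST_BLOCK_BYTES;
  return allocated < (long long) st->st_size;
}
```

A caller would fill the struct with fstat or stat and apply the check to both the source and the copy.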


From 738031aed78b6323969f799d3d5fbf19e0cfc91a Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 21 May 2010 18:28:42 +0200
Subject: [PATCH 3/5] tests: exercise more of the new FIEMAP copying code

* tests/cp/sparse-fiemap: Ensure that a file with many extents (more
than fit in copy.c's internal 4KiB buffer) is copied properly.
---
 tests/cp/sparse-fiemap |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 945c94b..b1643be 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -53,4 +53,42 @@ timeout 10 cp --sparse=always sparse fiemap || fail=1
 # in bytes as the original.
 test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

+# =================================================
+# Ensure that we exercise the FIEMAP-copying code enough
+# to provoke at least two iterations of the do...while loop
+# in which it calls ioctl (fd, FS_IOC_FIEMAP,...
+# This also verifies that non-trivial extents are preserved.
+
+$PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'
+
+$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
+      -e 'for (1..100) { sysseek (*F, $n, 1)' \
+      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
+
+cp --sparse=always j1 j2 || fail=1
+cmp j1 j2 || fail=1
+
+filefrag j1 | grep extent \
+  || skip_test_ 'skipping part of this test; you lack filefrag'
+
+# Here is sample filefrag output:
+#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
+#          -e 'for (1..5) { sysseek(*F,$n,1)' \
+#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
+#   $ filefrag -v j
+#   Filesystem type is: ef53
+#   File size of j is 163840 (40 blocks, blocksize 4096)
+#    ext logical physical expected length flags
+#      0       4  6258884               4
+#      1      12  6258892  6258887      4
+#      2      20  6258900  6258895      4
+#      3      28  6258908  6258903      4
+#      4      36  6258916  6258911      4 eof
+#   j: 6 extents found
+
+# exclude the physical block numbers; they always differ
+filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
+filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
+compare ff1 ff2 || fail=1
+
 Exit $fail
--
1.7.1.348.gb26ba
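
Note on patch 3/5: the phrase "more than fit in copy.c's internal 4KiB buffer" can be made concrete with a little arithmetic. The sketch below mirrors the two structures from the fiemap.h of this series under "sketch_" names (hypothetical, for illustration) and computes how many extents one ioctl call can return. It assumes the usual layout with no padding, and that each call returns a full batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct fiemap_extent from the patch's fiemap.h.  */
struct sketch_fiemap_extent
{
  uint64_t fe_logical;
  uint64_t fe_physical;
  uint64_t fe_length;
  uint64_t fe_reserved64[2];
  uint32_t fe_flags;
  uint32_t fe_reserved[3];
};

/* Local mirror of the fixed part of struct fiemap.  The trailing
   extent array is omitted because it adds nothing to the size.  */
struct sketch_fiemap
{
  uint64_t fm_start;
  uint64_t fm_length;
  uint32_t fm_flags;
  uint32_t fm_mapped_extents;
  uint32_t fm_extent_count;
  uint32_t fm_reserved;
};

/* Number of extents that fit in a request buffer of BUF_SIZE bytes,
   using the same expression as `count' in fiemap_copy_ok.  */
static uint32_t
sketch_extents_per_call (size_t buf_size)
{
  assert (sizeof (struct sketch_fiemap) <= buf_size);
  return (buf_size - sizeof (struct sketch_fiemap))
         / sizeof (struct sketch_fiemap_extent);
}

/* Number of ioctl calls needed to walk N_EXTENTS extents, assuming
   every call but the last returns a full batch.  At least one call
   is always made.  */
static uint32_t
sketch_calls_needed (uint32_t n_extents, size_t buf_size)
{
  uint32_t per_call = sketch_extents_per_call (buf_size);
  assert (0 < per_call);
  return n_extents == 0 ? 1 : (n_extents + per_call - 1) / per_call;
}
```

With a 32-byte header and 56-byte extents, a 4096-byte buffer holds 72 extents, so a file with 100 data runs (assuming each run maps to one extent) needs two passes through the do...while loop.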


From e84f881cb5c2eb92f3b6d5bddaef50c3e811bc30 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 May 2010 10:22:58 +0200
Subject: [PATCH 4/5] tests: require root access only if current partition is not ext4

* tests/cp/sparse-fiemap: Don't require root access if current
partition is ext4.
---
 tests/cp/sparse-fiemap |   44 +++++++++++++++++++++++++-------------------
 1 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index b1643be..371bced 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -22,28 +22,34 @@ if test "$VERBOSE" = yes; then
 fi

 . $srcdir/test-lib.sh
-require_root_

-cwd=`pwd`
-cleanup_() { cd /; umount "$cwd/mnt"; }
-
-skip=0
-# Create an ext4 loopback file system
-dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
-mkdir mnt
-mkfs -t ext4 -F blob ||
-  skip_test_ "failed to create ext4 file system"
-mount -oloop blob mnt                          || skip=1
-echo test > mnt/f                              || skip=1
-test -s mnt/f                                  || skip=1
-
-test $skip = 1 &&
-  skip_test_ "insufficient mount/ext4 support"
+if df -T -t ext4 . ; then
+  : # Current dir is on an ext4 partition.  Good!
+else
+  # It's not;  we need to create one, hence we need root access.
+  require_root_
+
+  cwd=$PWD
+  cleanup_() { cd /; umount "$cwd/mnt"; }
+
+  skip=0
+  # Create an ext4 loopback file system
+  dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
+  mkdir mnt
+  mkfs -t ext4 -F blob ||
+    skip_test_ "failed to create ext4 file system"
+  mount -oloop blob mnt                          || skip=1
+  echo test > mnt/f                              || skip=1
+  test -s mnt/f                                  || skip=1
+
+  test $skip = 1 &&
+    skip_test_ "insufficient mount/ext4 support"
+
+  cd mnt || fail=1
+fi

 # Create a 1TiB sparse file
-dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
-
-cd mnt || fail=1
+dd if=/dev/zero of=sparse bs=1k count=1 seek=1G || framework_failure

 # It takes many minutes to copy this sparse file using the old method.
 # By contrast, it takes far less than 1 second using FIEMAP-copy.
--
1.7.1.348.gb26ba


From 213c9247a406b267bc5819ccbe3b729e9557c675 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 May 2010 10:21:46 +0200
Subject: [PATCH 5/5] tests: fiemap test improvement

* tests/cp/sparse-fiemap: More tests.
---
 tests/cp/sparse-fiemap |   69 +++++++++++++++++++++++++-----------------------
 1 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 371bced..ef3742e 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -38,14 +38,14 @@ else
   mkdir mnt
   mkfs -t ext4 -F blob ||
     skip_test_ "failed to create ext4 file system"
-  mount -oloop blob mnt                          || skip=1
-  echo test > mnt/f                              || skip=1
-  test -s mnt/f                                  || skip=1
+  mount -oloop blob mnt   || skip=1
+  cd mnt                  || skip=1
+  echo test > f           || skip=1
+  test -s f               || skip=1

   test $skip = 1 &&
     skip_test_ "insufficient mount/ext4 support"

-  cd mnt || fail=1
 fi

 # Create a 1TiB sparse file
@@ -67,34 +67,37 @@ test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

 $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'

-$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
-      -e 'for (1..100) { sysseek (*F, $n, 1)' \
-      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
-
-cp --sparse=always j1 j2 || fail=1
-cmp j1 j2 || fail=1
-
-filefrag j1 | grep extent \
-  || skip_test_ 'skipping part of this test; you lack filefrag'
-
-# Here is sample filefrag output:
-#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
-#          -e 'for (1..5) { sysseek(*F,$n,1)' \
-#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
-#   $ filefrag -v j
-#   Filesystem type is: ef53
-#   File size of j is 163840 (40 blocks, blocksize 4096)
-#    ext logical physical expected length flags
-#      0       4  6258884               4
-#      1      12  6258892  6258887      4
-#      2      20  6258900  6258895      4
-#      3      28  6258908  6258903      4
-#      4      36  6258916  6258911      4 eof
-#   j: 6 extents found
-
-# exclude the physical block numbers; they always differ
-filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
-filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
-compare ff1 ff2 || fail=1
+for i in $(seq 20); do
+  for j in 1 2 31 100; do
+    $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \
+          -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
+          -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
+
+    cp --sparse=always j1 j2 || fail=1
+    cmp j1 j2 || fail=1
+    filefrag -v j1 | grep extent \
+      || skip_test_ 'skipping part of this test; you lack filefrag'
+
+    # Here is sample filefrag output:
+    #   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
+    #          -e 'for (1..5) { sysseek(*F,$n,1)' \
+    #          -e '&& syswrite *F,"."x$n or die "$!"}' > j
+    #   $ filefrag -v j
+    #   Filesystem type is: ef53
+    #   File size of j is 163840 (40 blocks, blocksize 4096)
+    #    ext logical physical expected length flags
+    #      0       4  6258884               4
+    #      1      12  6258892  6258887      4
+    #      2      20  6258900  6258895      4
+    #      3      28  6258908  6258903      4
+    #      4      36  6258916  6258911      4 eof
+    #   j: 6 extents found
+
+    # exclude the physical block numbers; they always differ
+    filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
+    filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
+    compare ff1 ff2 || fail=1
+  done
+done

 Exit $fail
--
1.7.1.348.gb26ba
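
Note on the test files: the perl one-liner used in patches 3/5 and 5/5 alternates "seek forward n bytes" with "write n bytes", which yields a file of alternating holes and data. For readers without perl, the same pattern can be sketched in C. This is an illustration only; sketch_write_sparse is a hypothetical name and is not part of the series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: starting at the current offset of FD, repeat
   REPS times: skip N bytes (leaving a hole), then write N bytes of '.'.
   Return 0 on success, -1 on failure.  */
static int
sketch_write_sparse (int fd, size_t n, unsigned int reps)
{
  assert (0 < n);

  char *block = malloc (n);
  if (block == NULL)
    return -1;
  memset (block, '.', n);

  int ret = 0;
  for (unsigned int k = 0; k < reps && ret == 0; k++)
    {
      /* Like perl's sysseek (*F, $n, 1): move forward without writing.  */
      if (lseek (fd, (off_t) n, SEEK_CUR) < 0)
        {
          ret = -1;
          break;
        }

      /* Like syswrite (*F, "."x$n), but retrying after short writes.  */
      size_t done = 0;
      while (done < n)
        {
          ssize_t w = write (fd, block + done, n - done);
          if (w <= 0)
            {
              ret = -1;
              break;
            }
          done += (size_t) w;
        }
    }

  free (block);
  return ret;
}
```

Calling it with n = 19 * 1024 and reps = 100 on a freshly created file gives a file of 2 * 19 * 1024 * 100 = 3891200 bytes, the same shape as the j1 file in the reproduction recipe.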




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 14:19:01 GMT) Full text and rfc822 format available.

Message #65 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 22:08:43 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> This is the revised version.  It fixes the fiemap-start offset calculation
>> by moving it out
>> of the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.
> 
> Hi Jeff,
> 
> I've included below the state of my local changes.
> Unfortunately, with that 5-patch series, there is always a test failure
> on F13/ext4.  Maybe someone who knows more about extents can provide an
> explanation?
> 
> Here's a small example to demonstrate:
> 
> Create a file with many extents:
> 
>     perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>       -e 'for (1..100) { sysseek (*F, $n, 1)' \
>       -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1
> 
> Using the patched "cp", repeat the following 10 or 20 times:
> 
>     ./cp --sparse=always j1 j2; sync
>     filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
>     filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
>     diff -u ff1 ff2 || fail=1
> 
> Usually there is no diff output, but occasionally it'll print this:
> [hmm... today it consistently prints these differences every other time.]
Woo!!!
I just ran this test on btrfs, ext4, and ocfs2 against a mainline kernel (Linux jeff-laptop
2.6.33-rc5-00238-gb04da8b-dirty) on my laptop.
Only btrfs works consistently for me; ext4 shows the same issue as yours.
jeff <at> jeff-laptop:/ext4/test$ for ((i=0; i < 100; i++)); do ./1.sh; done
--- ff1	2010-05-27 22:03:25.263480260 +0800
+++ ff2	2010-05-27 22:03:25.315476210 +0800
@@ -26,77 +26,78 @@
 24 232
 25 242
 26 251
-27 261
-28 270
-29 280
-30 289
-31 299
-32 308
-33 318
-34 327
-35 337
-36 346
-37 356
-38 365
-39 375
-40 384
-41 394
-42 403
-43 413
-44 422
-45 432
-46 441
-47 451
-48 460
-49 470
-50 479
-51 489
-52 498
-53 508
-54 512
-55 517
-56 527
-57 536
-58 546
-59 555
-60 565
-61 574
-62 584
-63 593
-64 603
-65 612
-66 622
-67 631
-68 641
-69 650
-70 660
-71 669
-72 679
-73 688
-74 698
-75 707
-76 717
-77 726
-78 736
-79 745
-80 755
-81 764
-82 774
-83 783
-84 793
-85 802
-86 812
-87 821
-88 831
-89 840
-90 850
-91 859
-92 869
-93 878
-94 888
-95 897
-96 907
-97 916
-98 926
-99 935
-100 945
+27 256
+28 261
+29 270
+30 280
+31 289
+32 299
+33 308
+34 318
+35 327
+36 337
+37 346
+38 356
+39 365
+40 375
+41 384
+42 394
+43 403
+44 413
+45 422
+46 432
+47 441
+48 451
+49 460
+50 470
+51 479
+52 489
+53 498
+54 508
+55 512
+56 517
+57 527
+58 536
+59 546
+60 555
+61 565
+62 574
+63 584
+64 593
+65 603
+66 612
+67 622
+68 631
+69 641
+70 650
+71 660
+72 669
+73 679
+74 688
+75 698
+76 707
+77 717
+78 726
+79 736
+80 745
+81 755
+82 764
+83 774
+84 783
+85 793
+86 802
+87 812
+88 821
+89 831
+90 840
+91 850
+92 859
+93 869
+94 878
+95 888
+96 897
+97 907
+98 916
+99 926
+100 935
+101 945

On OCFS2 the extent listings differ on many of the runs (out of 100 repetitions).
I cannot trace this issue back to the patch at the moment. :(

Heroes, can anyone give some hints from the kernel's point of view?

jeff <at> jeff-laptop:/ocfs2/test$ for ((i=0; i < 100; i++)); do ./1.sh; done
--- ff1	2010-05-27 21:49:58.288483759 +0800
+++ ff2	2010-05-27 21:49:58.304545778 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 519
--- ff1	2010-05-27 21:50:05.260477055 +0800
+++ ff2	2010-05-27 21:50:05.276532299 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 815
--- ff1	2010-05-27 21:50:13.048484948 +0800
+++ ff2	2010-05-27 21:50:13.076486205 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 161
--- ff1	2010-05-27 21:50:19.748471818 +0800
+++ ff2	2010-05-27 21:50:19.764488230 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 457
--- ff1	2010-05-27 21:50:25.384540820 +0800
+++ ff2	2010-05-27 21:50:25.396508694 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 753
--- ff1	2010-05-27 21:50:33.504500173 +0800
+++ ff2	2010-05-27 21:50:33.536480827 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 99
--- ff1	2010-05-27 21:50:40.012512743 +0800
+++ ff2	2010-05-27 21:50:40.028511067 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 395
--- ff1	2010-05-27 21:50:46.680505132 +0800
+++ ff2	2010-05-27 21:50:46.708501290 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 691
--- ff1	2010-05-27 21:50:55.576480477 +0800
+++ ff2	2010-05-27 21:50:55.596475937 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 37
--- ff1	2010-05-27 21:51:03.491482451 +0800
+++ ff2	2010-05-27 21:51:03.515503404 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 333
--- ff1	2010-05-27 21:51:11.567497188 +0800
+++ ff2	2010-05-27 21:51:11.603479937 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 629
--- ff1	2010-05-27 21:51:20.983471486 +0800
+++ ff2	2010-05-27 21:51:21.011472813 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 925
--- ff1	2010-05-27 21:51:29.548508415 +0800
+++ ff2	2010-05-27 21:51:29.588486206 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 271
--- ff1	2010-05-27 21:51:35.927492089 +0800
+++ ff2	2010-05-27 21:51:35.959488178 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 567
--- ff1	2010-05-27 21:51:40.200480478 +0800
+++ ff2	2010-05-27 21:51:40.240471818 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 863
--- ff1	2010-05-27 21:51:45.772499475 +0800
+++ ff2	2010-05-27 21:51:45.820498986 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 209
--- ff1	2010-05-27 21:51:49.919519607 +0800
+++ ff2	2010-05-27 21:51:49.947475956 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 505
--- ff1	2010-05-27 21:51:55.195482312 +0800
+++ ff2	2010-05-27 21:51:55.231483778 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 801
--- ff1	2010-05-27 21:52:00.911479588 +0800
+++ ff2	2010-05-27 21:52:00.939473232 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 147
--- ff1	2010-05-27 21:52:06.208488091 +0800
+++ ff2	2010-05-27 21:52:06.252482433 +0800
@@ -1,2 +1,3 @@
 ext logical
 0 0
+1 443

....
...
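
One way to look at what the ioctl itself reports, independently of filefrag, is to ask it only for the number of extents. The sketch below is a diagnostic idea, not a diagnosis. It relies on the documented behaviour that a request with fm_extent_count set to 0 returns just the count in fm_mapped_extents. The helper name sketch_count_extents is hypothetical, and the sketch assumes a Linux system with <linux/fiemap.h> installed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */

/* Hypothetical helper: return the number of extents the kernel reports
   for the whole file open on FD, or -1 if the ioctl fails (for example
   because the file system does not support FIEMAP).  FLAGS is passed
   through as fm_flags, e.g. 0 or FIEMAP_FLAG_SYNC.  */
static long
sketch_count_extents (int fd, uint32_t flags)
{
  struct fiemap fm;
  memset (&fm, 0, sizeof fm);

  fm.fm_start = 0;
  fm.fm_length = FIEMAP_MAX_OFFSET;
  fm.fm_flags = flags;
  fm.fm_extent_count = 0;   /* 0 means: only count, return no extents */

  if (ioctl (fd, FS_IOC_FIEMAP, &fm) < 0)
    return -1;

  return (long) fm.fm_mapped_extents;
}
```

Comparing sketch_count_extents (fd, 0) with sketch_count_extents (fd, FIEMAP_FLAG_SYNC) on j1 and j2 would show whether the reported extents change once the file data has been synced before mapping. Whether that explains the differences seen here is left open.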
> 
> $ diff -u ff1 ff2 || fail=1
> --- ff1 2010-05-22 18:42:26.943501382 +0200
> +++ ff2 2010-05-22 18:42:27.020876155 +0200
> @@ -53,49 +53,50 @@ ext logical
>  51 489
>  52 498
>  53 508
> -54 517
> -55 527
> -56 536
> -57 546
> -58 555
> -59 565
> -60 574
> -61 584
> -62 593
> -63 603
> -64 612
> -65 622
> -66 631
> -67 641
> -68 650
> -69 660
> -70 669
> -71 679
> -72 688
> -73 698
> -74 707
> -75 717
> -76 726
> -77 736
> -78 745
> -79 755
> -80 764
> -81 774
> -82 783
> -83 793
> -84 802
> -85 812
> -86 821
> -87 831
> -88 840
> -89 850
> -90 859
> -91 869
> -92 878
> -93 888
> -94 897
> -95 907
> -96 916
> -97 926
> -98 935
> -99 945
> +54 512
> +55 517
> +56 527
> +57 536
> +58 546
> +59 555
> +60 565
> +61 574
> +62 584
> +63 593
> +64 603
> +65 612
> +66 622
> +67 631
> +68 641
> +69 650
> +70 660
> +71 669
> +72 679
> +73 688
> +74 698
> +75 707
> +76 717
> +77 726
> +78 736
> +79 745
> +80 755
> +81 764
> +82 774
> +83 783
> +84 793
> +85 802
> +86 812
> +87 821
> +88 831
> +89 840
> +90 850
> +91 859
> +92 869
> +93 878
> +94 888
> +95 897
> +96 907
> +97 916
> +98 926
> +99 935
> +100 945
> 
> 
> From 2ef44bcb0bbb38cca738aa90e067caee312da939 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Thu, 13 May 2010 22:09:30 +0800
> Subject: [PATCH 1/5] cp: Add FIEMAP support for efficient sparse file copy
> 
> * src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
> Copied from linux's include/linux/fiemap.h, with minor formatting changes.
> * src/copy.c (copy_reg): Now, when `cp' is invoked with the --sparse=[WHEN]
> option, try to do a FIEMAP-copy if the underlying file system supports it,
> falling back to a normal copy if it fails.
> ---
>  src/copy.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  src/fiemap.h |  102 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 261 insertions(+), 0 deletions(-)
>  create mode 100644 src/fiemap.h
> 
> diff --git a/src/copy.c b/src/copy.c
> index c16cef6..0e54729 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -63,6 +63,10 @@
> 
>  #include <sys/ioctl.h>
> 
> +#ifndef HAVE_FIEMAP
> +# include "fiemap.h"
> +#endif
> +
>  #ifndef HAVE_FCHOWN
>  # define HAVE_FCHOWN false
>  # define fchown(fd, uid, gid) (-1)
> @@ -149,6 +153,141 @@ clone_file (int dest_fd, int src_fd)
>  #endif
>  }
> 
> +#ifdef __linux__
> +# ifndef FS_IOC_FIEMAP
> +#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
> +# endif
> +/* Perform a FIEMAP (available in mainline 2.6.27) copy if possible.
> +   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the file's allocated
> +   extents, skipping holes.  This saves the overhead of dealing with holes
> +   via lseek(2) in the normal copy, and can result in much faster backups
> +   for any kind of sparse file.  */
> +static bool
> +fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
> +                off_t src_total_size, char const *src_name,
> +                char const *dst_name, bool *normal_copy_required)
> +{
> +  bool fail = false;
> +  bool last = false;
> +  char fiemap_buf[4096];
> +  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
> +  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
> +  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
> +                    sizeof (struct fiemap_extent);
> +  off_t last_ext_logical = 0;
> +  uint64_t last_ext_len = 0;
> +  uint64_t last_read_size = 0;
> +  unsigned int i = 0;
> +
> +  /* This is required at least to initialize fiemap->fm_start,
> +     but also serves (in May 2010) to appease valgrind, which
> +     appears not to know the semantics of the FIEMAP ioctl. */
> +  memset (fiemap_buf, 0, sizeof fiemap_buf);
> +
> +  do
> +    {
> +      fiemap->fm_length = FIEMAP_MAX_OFFSET;
> +      fiemap->fm_extent_count = count;
> +
> +      /* When ioctl(2) fails, fall back to the normal copy only if this
> +         is the first call, i.e., before any data has been copied.  */
> +      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
> +        {
> +          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
> +          if (i == 0)
> +            *normal_copy_required = true;
> +          return false;
> +        }
> +
> +      /* If 0 extents are returned, then more ioctls are not needed.  */
> +      if (fiemap->fm_mapped_extents == 0)
> +        break;
> +
> +      for (i = 0; i < fiemap->fm_mapped_extents; i++)
> +        {
> +          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
> +
> +          off_t ext_logical = fm_ext[i].fe_logical;
> +          uint64_t ext_len = fm_ext[i].fe_length;
> +
> +          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (src_name));
> +              return fail;
> +            }
> +
> +          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
> +            {
> +              error (0, errno, _("cannot lseek %s"), quote (dst_name));
> +              return fail;
> +            }
> +
> +          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
> +            {
> +              last_ext_logical = ext_logical;
> +              last_ext_len = ext_len;
> +              last = true;
> +            }
> +
> +          while (0 < ext_len)
> +            {
> +              char buf[buf_size];
> +
> +              /* Avoid reading into the holes if the remaining extent
> +                 length is shorter than the buffer size.  */
> +              if (ext_len < buf_size)
> +                buf_size = ext_len;
> +
> +              ssize_t n_read = read (src_fd, buf, buf_size);
> +              if (n_read < 0)
> +                {
> +#ifdef EINTR
> +                  if (errno == EINTR)
> +                    continue;
> +#endif
> +                  error (0, errno, _("reading %s"), quote (src_name));
> +                  return fail;
> +                }
> +
> +              if (n_read == 0)
> +                {
> +                  /* Figure out how many bytes read from the last extent.  */
> +                  last_read_size = last_ext_len - ext_len;
> +                  break;
> +                }
> +
> +              if (full_write (dest_fd, buf, n_read) != n_read)
> +                {
> +                  error (0, errno, _("writing %s"), quote (dst_name));
> +                  return fail;
> +                }
> +
> +              ext_len -= n_read;
> +            }
> +        }
> +
> +      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
> +
> +    } while (! last);
> +
> +  /* If a file ends with a hole, the sum of the last extent's logical offset
> +     and the number of bytes read from it will be less than the actual size
> +     of the file.  Use ftruncate to extend the destination file.  */
> +  if (last_ext_logical + last_read_size < src_total_size)
> +    {
> +      if (ftruncate (dest_fd, src_total_size) < 0)
> +        {
> +          error (0, errno, _("extending %s"), quote (dst_name));
> +          return fail;
> +        }
> +    }
> +
> +  return ! fail;
> +}
> +#else
> +static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
> +#endif
> +
>  /* FIXME: describe */
>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>     performance hit that's probably noticeable only on trees deeper
> @@ -679,6 +818,25 @@ copy_reg (char const *src_name, char const *dst_name,
>  #endif
>          }
> 
> +      if (make_holes)
> +        {
> +          bool require_normal_copy = false;
> +          /* Perform an efficient FIEMAP copy for sparse files; fall back to
> +             the standard copy only if the initial ioctl(2) fails.  */
> +          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
> +                              src_open_sb.st_size, src_name,
> +                              dst_name, &require_normal_copy))
> +            goto preserve_metadata;
> +          else
> +            {
> +              if (! require_normal_copy)
> +                {
> +                  return_val = false;
> +                  goto close_src_and_dst_desc;
> +                }
> +            }
> +        }
> +
>        /* If not making a sparse file, try to use a more-efficient
>           buffer size.  */
>        if (! make_holes)
> @@ -807,6 +965,7 @@ copy_reg (char const *src_name, char const *dst_name,
>          }
>      }
> 
> +preserve_metadata:
>    if (x->preserve_timestamps)
>      {
>        struct timespec timespec[2];
> diff --git a/src/fiemap.h b/src/fiemap.h
> new file mode 100644
> index 0000000..d33293b
> --- /dev/null
> +++ b/src/fiemap.h
> @@ -0,0 +1,102 @@
> +/* FS_IOC_FIEMAP ioctl infrastructure.
> +   Some portions copyright (C) 2007 Cluster File Systems, Inc
> +   Authors: Mark Fasheh <mfasheh <at> suse.com>
> +            Kalpak Shah <kalpak.shah <at> sun.com>
> +            Andreas Dilger <adilger <at> sun.com>.  */
> +
> +/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */
> +
> +#ifndef _LINUX_FIEMAP_H
> +# define _LINUX_FIEMAP_H
> +
> +# include <linux/types.h>
> +
> +struct fiemap_extent
> +{
> +  /* Logical offset in bytes for the start of the extent
> +     from the beginning of the file.  */
> +  uint64_t fe_logical;
> +
> +  /* Physical offset in bytes for the start of the extent
> +     from the beginning of the disk.  */
> +  uint64_t fe_physical;
> +
> +  /* Length in bytes for this extent.  */
> +  uint64_t fe_length;
> +
> +  uint64_t fe_reserved64[2];
> +
> +  /* FIEMAP_EXTENT_* flags for this extent.  */
> +  uint32_t fe_flags;
> +
> +  uint32_t fe_reserved[3];
> +};
> +
> +struct fiemap
> +{
> +  /* Logical offset(inclusive) at which to start mapping(in).  */
> +  uint64_t fm_start;
> +
> +  /* Logical length of mapping which userspace wants(in).  */
> +  uint64_t fm_length;
> +
> +  /* FIEMAP_FLAG_* flags for request(in/out).  */
> +  uint32_t fm_flags;
> +
> +  /* Number of extents that were mapped(out).  */
> +  uint32_t fm_mapped_extents;
> +
> +  /* Size of fm_extents array(in).  */
> +  uint32_t fm_extent_count;
> +
> +  uint32_t fm_reserved;
> +
> +  /* Array of mapped extents(out).  */
> +  struct fiemap_extent fm_extents[0];
> +};
> +
> +/* The maximum offset can be mapped for a file.  */
> +# define FIEMAP_MAX_OFFSET       (~0ULL)
> +
> +/* Sync file data before map.  */
> +# define FIEMAP_FLAG_SYNC        0x00000001
> +
> +/* Map extended attribute tree.  */
> +# define FIEMAP_FLAG_XATTR       0x00000002
> +
> +# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
> +
> +/* Last extent in file.  */
> +# define FIEMAP_EXTENT_LAST              0x00000001
> +
> +/* Data location unknown.  */
> +# define FIEMAP_EXTENT_UNKNOWN           0x00000002
> +
> +/* Location still pending, Sets EXTENT_UNKNOWN.  */
> +# define FIEMAP_EXTENT_DELALLOC          0x00000004
> +
> +/* Data can not be read while fs is unmounted.  */
> +# define FIEMAP_EXTENT_ENCODED           0x00000008
> +
> +/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
> +# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
> +
> +/* Extent offsets may not be block aligned.  */
> +# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
> +
> +/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
> +# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
> +
> +/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
> +# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
> +
> +/* Space allocated, but not data (i.e. zero).  */
> +# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
> +
> +/* File does not natively support extents.  Result merged for efficiency.  */
> +# define FIEMAP_EXTENT_MERGED		0x00001000
> +
> +/* Space shared with other files.  */
> +# define FIEMAP_EXTENT_SHARED            0x00002000
> +
> +#endif
> --
> 1.7.1.348.gb26ba
> 
> 
> From 260b5b89e33da2b9a5ea5bcd9dba874f503d2937 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Thu, 13 May 2010 22:17:53 +0800
> Subject: [PATCH 2/5] tests: add a new test for FIEMAP-copy
> 
> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
> loopbacked ext4 partition.
> * tests/Makefile.am (sparse-fiemap): Reference the new test.
> ---
>  tests/Makefile.am      |    1 +
>  tests/cp/sparse-fiemap |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+), 0 deletions(-)
>  create mode 100755 tests/cp/sparse-fiemap
> 
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index c458574..f7840c8 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -25,6 +25,7 @@ root_tests =					\
>    cp/special-bits				\
>    cp/cp-mv-enotsup-xattr			\
>    cp/capability					\
> +  cp/sparse-fiemap                              \
>    dd/skip-seek-past-dev				\
>    install/install-C-root			\
>    ls/capability					\
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> new file mode 100755
> index 0000000..945c94b
> --- /dev/null
> +++ b/tests/cp/sparse-fiemap
> @@ -0,0 +1,56 @@
> +#!/bin/sh
> +# Test cp --sparse=always through fiemap copy
> +
> +# Copyright (C) 2006-2010 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +if test "$VERBOSE" = yes; then
> +  set -x
> +  cp --version
> +fi
> +
> +. $srcdir/test-lib.sh
> +require_root_
> +
> +cwd=`pwd`
> +cleanup_() { cd /; umount "$cwd/mnt"; }
> +
> +skip=0
> +# Create an ext4 loopback file system
> +dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
> +mkdir mnt
> +mkfs -t ext4 -F blob ||
> +  skip_test_ "failed to create ext4 file system"
> +mount -oloop blob mnt                          || skip=1
> +echo test > mnt/f                              || skip=1
> +test -s mnt/f                                  || skip=1
> +
> +test $skip = 1 &&
> +  skip_test_ "insufficient mount/ext4 support"
> +
> +# Create a 1TiB sparse file
> +dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
> +
> +cd mnt || fail=1
> +
> +# It takes many minutes to copy this sparse file using the old method.
> +# By contrast, it takes far less than 1 second using FIEMAP-copy.
> +timeout 10 cp --sparse=always sparse fiemap || fail=1
> +
> +# Ensure that the sparse file copied through fiemap has the same size
> +# in bytes as the original.
> +test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
> +
> +Exit $fail
> --
> 1.7.1.348.gb26ba
> 
> 
> From 738031aed78b6323969f799d3d5fbf19e0cfc91a Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Fri, 21 May 2010 18:28:42 +0200
> Subject: [PATCH 3/5] tests: exercise more of the new FIEMAP copying code
> 
> * tests/cp/sparse-fiemap: Ensure that a file with many extents (more
> than fit in copy.c's internal 4KiB buffer) is copied properly.
> ---
>  tests/cp/sparse-fiemap |   38 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 38 insertions(+), 0 deletions(-)
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> index 945c94b..b1643be 100755
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -53,4 +53,42 @@ timeout 10 cp --sparse=always sparse fiemap || fail=1
>  # in bytes as the original.
>  test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
> 
> +# =================================================
> +# Ensure that we exercise the FIEMAP-copying code enough
> +# to provoke at least two iterations of the do...while loop
> +# in which it calls ioctl (fd, FS_IOC_FIEMAP,...
> +# This also verifies that non-trivial extents are preserved.
> +
> +$PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'
> +
> +$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
> +      -e 'for (1..100) { sysseek (*F, $n, 1)' \
> +      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
> +
> +cp --sparse=always j1 j2 || fail=1
> +cmp j1 j2 || fail=1
> +
> +filefrag j1 | grep extent \
> +  || skip_test_ 'skipping part of this test; you lack filefrag'
> +
> +# Here is sample filefrag output:
> +#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
> +#          -e 'for (1..5) { sysseek(*F,$n,1)' \
> +#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
> +#   $ filefrag -v j
> +#   Filesystem type is: ef53
> +#   File size of j is 163840 (40 blocks, blocksize 4096)
> +#    ext logical physical expected length flags
> +#      0       4  6258884               4
> +#      1      12  6258892  6258887      4
> +#      2      20  6258900  6258895      4
> +#      3      28  6258908  6258903      4
> +#      4      36  6258916  6258911      4 eof
> +#   j: 6 extents found
> +
> +# exclude the physical block numbers; they always differ
> +filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
> +filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
> +compare ff1 ff2 || fail=1
> +
>  Exit $fail
> --
> 1.7.1.348.gb26ba
> 
> 
> From e84f881cb5c2eb92f3b6d5bddaef50c3e811bc30 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Sat, 22 May 2010 10:22:58 +0200
> Subject: [PATCH 4/5] tests: require root access only if current partition is not ext4
> 
> * tests/cp/sparse-fiemap: Don't require root access if current
> partition is ext4.
> ---
>  tests/cp/sparse-fiemap |   44 +++++++++++++++++++++++++-------------------
>  1 files changed, 25 insertions(+), 19 deletions(-)
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> index b1643be..371bced 100755
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -22,28 +22,34 @@ if test "$VERBOSE" = yes; then
>  fi
> 
>  . $srcdir/test-lib.sh
> -require_root_
> 
> -cwd=`pwd`
> -cleanup_() { cd /; umount "$cwd/mnt"; }
> -
> -skip=0
> -# Create an ext4 loopback file system
> -dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
> -mkdir mnt
> -mkfs -t ext4 -F blob ||
> -  skip_test_ "failed to create ext4 file system"
> -mount -oloop blob mnt                          || skip=1
> -echo test > mnt/f                              || skip=1
> -test -s mnt/f                                  || skip=1
> -
> -test $skip = 1 &&
> -  skip_test_ "insufficient mount/ext4 support"
> +if df -T -t ext4 . ; then
> +  : # Current dir is on an ext4 partition.  Good!
> +else
> +  # It's not;  we need to create one, hence we need root access.
> +  require_root_
> +
> +  cwd=$PWD
> +  cleanup_() { cd /; umount "$cwd/mnt"; }
> +
> +  skip=0
> +  # Create an ext4 loopback file system
> +  dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
> +  mkdir mnt
> +  mkfs -t ext4 -F blob ||
> +    skip_test_ "failed to create ext4 file system"
> +  mount -oloop blob mnt                          || skip=1
> +  echo test > mnt/f                              || skip=1
> +  test -s mnt/f                                  || skip=1
> +
> +  test $skip = 1 &&
> +    skip_test_ "insufficient mount/ext4 support"
> +
> +  cd mnt || fail=1
> +fi
> 
>  # Create a 1TiB sparse file
> -dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
> -
> -cd mnt || fail=1
> +dd if=/dev/zero of=sparse bs=1k count=1 seek=1G || framework_failure
> 
>  # It takes many minutes to copy this sparse file using the old method.
>  # By contrast, it takes far less than 1 second using FIEMAP-copy.
> --
> 1.7.1.348.gb26ba
> 
> 
> From 213c9247a406b267bc5819ccbe3b729e9557c675 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Sat, 22 May 2010 10:21:46 +0200
> Subject: [PATCH 5/5] tests: fiemap test improvement
> 
> * tests/cp/sparse-fiemap: More tests.
> ---
>  tests/cp/sparse-fiemap |   69 +++++++++++++++++++++++++-----------------------
>  1 files changed, 36 insertions(+), 33 deletions(-)
> 
> diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
> index 371bced..ef3742e 100755
> --- a/tests/cp/sparse-fiemap
> +++ b/tests/cp/sparse-fiemap
> @@ -38,14 +38,14 @@ else
>    mkdir mnt
>    mkfs -t ext4 -F blob ||
>      skip_test_ "failed to create ext4 file system"
> -  mount -oloop blob mnt                          || skip=1
> -  echo test > mnt/f                              || skip=1
> -  test -s mnt/f                                  || skip=1
> +  mount -oloop blob mnt   || skip=1
> +  cd mnt                  || skip=1
> +  echo test > f           || skip=1
> +  test -s f               || skip=1
> 
>    test $skip = 1 &&
>      skip_test_ "insufficient mount/ext4 support"
> 
> -  cd mnt || fail=1
>  fi
> 
>  # Create a 1TiB sparse file
> @@ -67,34 +67,37 @@ test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
> 
>  $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'
> 
> -$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
> -      -e 'for (1..100) { sysseek (*F, $n, 1)' \
> -      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
> -
> -cp --sparse=always j1 j2 || fail=1
> -cmp j1 j2 || fail=1
> -
> -filefrag j1 | grep extent \
> -  || skip_test_ 'skipping part of this test; you lack filefrag'
> -
> -# Here is sample filefrag output:
> -#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
> -#          -e 'for (1..5) { sysseek(*F,$n,1)' \
> -#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
> -#   $ filefrag -v j
> -#   Filesystem type is: ef53
> -#   File size of j is 163840 (40 blocks, blocksize 4096)
> -#    ext logical physical expected length flags
> -#      0       4  6258884               4
> -#      1      12  6258892  6258887      4
> -#      2      20  6258900  6258895      4
> -#      3      28  6258908  6258903      4
> -#      4      36  6258916  6258911      4 eof
> -#   j: 6 extents found
> -
> -# exclude the physical block numbers; they always differ
> -filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
> -filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
> -compare ff1 ff2 || fail=1
> +for i in $(seq 20); do
> +  for j in 1 2 31 100; do
> +    $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \
> +          -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
> +          -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
> +
> +    cp --sparse=always j1 j2 || fail=1
> +    cmp j1 j2 || fail=1
> +    filefrag -v j1 | grep extent \
> +      || skip_test_ 'skipping part of this test; you lack filefrag'
> +
> +    # Here is sample filefrag output:
> +    #   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
> +    #          -e 'for (1..5) { sysseek(*F,$n,1)' \
> +    #          -e '&& syswrite *F,"."x$n or die "$!"}' > j
> +    #   $ filefrag -v j
> +    #   Filesystem type is: ef53
> +    #   File size of j is 163840 (40 blocks, blocksize 4096)
> +    #    ext logical physical expected length flags
> +    #      0       4  6258884               4
> +    #      1      12  6258892  6258887      4
> +    #      2      20  6258900  6258895      4
> +    #      3      28  6258908  6258903      4
> +    #      4      36  6258916  6258911      4 eof
> +    #   j: 6 extents found
> +
> +    # exclude the physical block numbers; they always differ
> +    filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
> +    filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
> +    compare ff1 ff2 || fail=1
> +  done
> +done
> 
>  Exit $fail
> --
> 1.7.1.348.gb26ba
> 
> 
> 


-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 17:30:03 GMT) Full text and rfc822 format available.

Message #68 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sunil Mushran <sunil.mushran <at> oracle.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Tao Ma <tao.ma <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Chris Mason <chris.mason <at> oracle.com>,
	Joel Becker <Joel.Becker <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 10:28:10 -0700
Jim Meyering wrote:
>> Hi Jeff,
>>
>> I've included below the state of my local changes.
>> Unfortunately, with that 5-patch series, there is always a test failure
>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>> explanation?
>>
>> Here's a small example to demonstrate:
>>
>> Create a file with many extents:
>>
>>      perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>>        -e 'for (1..100) { sysseek (*F, $n, 1)' \
>>        -e '&&  syswrite (*F, "."x$n) or die "$!"}'>  j1
>>
>> Using the patched "cp", repeat the following 10 or 20 times:
>>
>>      ./cp --sparse=always j1 j2; sync
>>      filefrag -v j1 | awk '/^ / {print $1,$2}'>  ff1 || fail=1
>>      filefrag -v j2 | awk '/^ / {print $1,$2}'>  ff2 || fail=1
>>      diff -u ff1 ff2 || fail=1
>>
>> Usually there is no diff output, but occasionally it'll print this:
>> [hmm... today it consistently prints these differences every other time.]
>

The reason it does not work is because the sparse file created by cp
may not be sparse (or sparse enough). And that is because cp reads
in chunks of st_blocksize and skips the write only if the entire chunk
is zero. The perl script creates the file in 19K chunks (alternate writes
and holes).

So on a 4K fs, the file created by the script will have 4 blocks as holes
(avg). But when cp makes it, it could fill out those holes because the read
granularity could be coarser. For example, ocfs2 fills out st_blocksize
with the fs cluster size (allocation size) which could be larger than the
block size.

My suggestion is to not use filefrag but to use md5sum to compare the two
files. In the end, the aim of this feature is to reduce the number of reads.
Furthermore, the number of extents created is a function of not only cp but
also the file system. (A fs does not guarantee the number of extents a file
will have. It only guarantees that the data out is the same as data in.)
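
To make the granularity point concrete, here is a rough C sketch of a
copy loop that skips only all-zero chunks. It is an illustration only, not
the code in copy.c: the names is_zero_chunk and bytes_written are made up
for this example, and the chunk parameter stands in for st_blocksize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the N bytes at BUF are all zero.  */
bool
is_zero_chunk (const char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Return how many bytes of the LEN-byte buffer DATA a copy loop would
   write if it read in CHUNK-sized pieces and skipped only the pieces
   that are entirely zero.  CHUNK is a hypothetical stand-in for the
   read granularity (st_blocksize in the discussion above).  */
size_t
bytes_written (const char *data, size_t len, size_t chunk)
{
  assert (chunk > 0);

  size_t written = 0;
  for (size_t off = 0; off < len; off += chunk)
    {
      /* The last piece may be shorter than CHUNK.  */
      size_t n = len - off < chunk ? len - off : chunk;
      if (!is_zero_chunk (data + off, n))
        written += n;
    }
  return written;
}
```

For a 16-byte buffer laid out as 4 zero bytes, 4 data bytes, 4 zero bytes,
4 data bytes, a chunk size of 4 writes 8 bytes and keeps both holes, while
a chunk size of 8 writes all 16 bytes, because every chunk then contains
some data and the holes get filled in.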




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 18:44:02 GMT) Full text and rfc822 format available.

Message #71 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Sunil Mushran <sunil.mushran <at> oracle.com>
Cc: "jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 20:43:40 +0200
Sunil Mushran wrote:
> Jim Meyering wrote:
>>> Hi Jeff,
>>>
>>> I've included below the state of my local changes.
>>> Unfortunately, with that 5-patch series, there is always a test failure
>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>> explanation?
>>>
>>> Here's a small example to demonstrate:
>>>
>>> Create a file with many extents:
>>>
>>>      perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>>>        -e 'for (1..100) { sysseek (*F, $n, 1)' \
>>>        -e '&&  syswrite (*F, "."x$n) or die "$!"}'>  j1
>>>
>>> Using the patched "cp", repeat the following 10 or 20 times:
>>>
>>>      ./cp --sparse=always j1 j2; sync
>>>      filefrag -v j1 | awk '/^ / {print $1,$2}'>  ff1 || fail=1
>>>      filefrag -v j2 | awk '/^ / {print $1,$2}'>  ff2 || fail=1
>>>      diff -u ff1 ff2 || fail=1
>>>
>>> Usually there is no diff output, but occasionally it'll print this:
>>> [hmm... today it consistently prints these differences every other time.]
>
> The reason it does not work is because the sparse file created by cp
> may not be sparse (or sparse enough). And that is because cp reads
> in chunks of st_blocksize and skips the write only if the entire chunk
> is zero. The perl script creates the file in 19K chunks (alternate writes
> and holes).

Thanks for replying.

However, your description of how GNU cp works suggests that you're
looking at the pre-FIEMAP semantics.  Please refer to the patches here

    http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/20534/focus=20630

that make it use FIEMAP.

> So on a 4K fs, the file created by the script will have 4 blocks as holes
> (avg). But when cp makes it, it could fill out those holes because the read
> granularity could be coarser. For example, ocfs2 fills out st_blocksize
> with the fs cluster size (allocation size) which could be larger than the
> block size.
>
> My suggestion is to not use filefrag but to use md5sum to compare the two
> files.

That would be pointless.
The goal of the test is to determine that the FIEMAP copy
did indeed preserve the extents.  If I do as you suggest, even if
cp mistakenly filled in all holes, the test would still pass.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 18:46:02 GMT) Full text and rfc822 format available.

Message #74 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 20:45:24 +0200
jeff.liu wrote:
> Jim Meyering wrote:
>> jeff.liu wrote:
>>> This is the revised version, it fixed the fiemap-start offset calculation
>>> approach to remove it out
>>> of the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.
>>
>> Hi Jeff,
>>
>> I've included below the state of my local changes.
>> Unfortunately, with that 5-patch series, there is always a test failure
>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>> explanation?
>>
>> Here's a small example to demonstrate:
>>
>> Create a file with many extents:
>>
>>     perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>>       -e 'for (1..100) { sysseek (*F, $n, 1)' \
>>       -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1
>>
>> Using the patched "cp", repeat the following 10 or 20 times:
>>
>>     ./cp --sparse=always j1 j2; sync
>>     filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
>>     filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
>>     diff -u ff1 ff2 || fail=1
>>
>> Usually there is no diff output, but occasionally it'll print this:
>> [hmm... today it consistently prints these differences every other time.]
> Woo!!!
> I just run this test on btrfs/ext4/ocfs2 against mainline kernel(Linux jeff-laptop
> 2.6.33-rc5-00238-gb04da8b-dirty) on my laptop.
> Only btrfs always works well for me, Ext4 has the same issue like yours.

One more point of reference:
the tests all passed on an XFS file system.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 19:05:02 GMT) Full text and rfc822 format available.

Message #77 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sunil Mushran <sunil.mushran <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: "jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 12:03:09 -0700
On 05/27/2010 11:43 AM, Jim Meyering wrote:
> Sunil Mushran wrote:
>    
>> Jim Meyering wrote:
>>      
>>>> Hi Jeff,
>>>>
>>>> I've included below the state of my local changes.
>>>> Unfortunately, with that 5-patch series, there is always a test failure
>>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>>> explanation?
>>>>
>>>> Here's a small example to demonstrate:
>>>>
>>>> Create a file with many extents:
>>>>
>>>>       perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>>>>         -e 'for (1..100) { sysseek (*F, $n, 1)' \
>>>>         -e '&&   syswrite (*F, "."x$n) or die "$!"}'>   j1
>>>>
>>>> Using the patched "cp", repeat the following 10 or 20 times:
>>>>
>>>>       ./cp --sparse=always j1 j2; sync
>>>>       filefrag -v j1 | awk '/^ / {print $1,$2}'>   ff1 || fail=1
>>>>       filefrag -v j2 | awk '/^ / {print $1,$2}'>   ff2 || fail=1
>>>>       diff -u ff1 ff2 || fail=1
>>>>
>>>> Usually there is no diff output, but occasionally it'll print this:
>>>> [hmm... today it consistently prints these differences every other time.]
>>>>          
>> The reason it does not work is because the sparse file created by cp
>> may not be sparse (or sparse enough). And that is because cp reads
>> in chunks of st_blocksize and skips the write only if the entire chunk
>> is zero. The perl script creates the file in 19K chunks (alternate writes
>> and holes).
>>      
> Thanks for replying.
>
> However, your description of how GNU cp works suggests that you're
> looking at the pre-FIEMAP semantics.  Please refer to the patches here
>
>      http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/20534/focus=20630
>
> that make it use FIEMAP.
>
>    
>> So on a 4K fs, the file created by the script will have 4 blocks as holes
>> (avg). But when cp makes it, it could fill out those holes because the read
>> granularity could be coarser. For example, ocfs2 fills out st_blocksize
>> with the fs cluster size (allocation size) which could be larger than the
>> block size.
>>
>> My suggestion is to not use filefrag but to use md5sum to compare the two
>> files.
>>      
> That would be pointless.
> The goal of the test is to determine that the FIEMAP copy
> did indeed preserve the extents.  If I do as you suggest, even if
> cp mistakenly filled in all holes, the test would still pass.
>    

I wouldn't call it pointless. More efficient read is still a big win.
My problem with the filefrag compare is that it is expecting a
behavior that no filesystem guarantees.

But having said that, I am intrigued as to why it is not working
as expected.

Do you have a strace for me to look at? strace of the fiemap
enabled cp. Also, output of "stat -f j1; stat j1 j2;filefrag -v j1; 
filefrag -v j2;"

Thanks
Sunil




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 27 May 2010 20:28:02 GMT) Full text and rfc822 format available.

Message #80 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 27 May 2010 22:26:50 +0200
Jim Meyering wrote:
> jeff.liu wrote:
>> Jim Meyering wrote:
>>> jeff.liu wrote:
>>>> This is the revised version, it fixed the fiemap-start offset calculation
>>>> approach to remove it out
>>>> of the 'for (i = 0; i < fiemap->fm_mapped_extents; i++)' loop.
>>>
>>> Hi Jeff,
>>>
>>> I've included below the state of my local changes.
>>> Unfortunately, with that 5-patch series, there is always a test failure
>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>> explanation?
>>>
>>> Here's a small example to demonstrate:
>>>
>>> Create a file with many extents:
>>>
>>>     perl -e 'BEGIN { $n = 19 * 1024; *F = *STDOUT }' \
>>>       -e 'for (1..100) { sysseek (*F, $n, 1)' \
>>>       -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1
>>>
>>> Using the patched "cp", repeat the following 10 or 20 times:
>>>
>>>     ./cp --sparse=always j1 j2; sync
>>>     filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
>>>     filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
>>>     diff -u ff1 ff2 || fail=1
>>>
>>> Usually there is no diff output, but occasionally it'll print this:
>>> [hmm... today it consistently prints these differences every other time.]
>> Woo!!!
>> I just run this test on btrfs/ext4/ocfs2 against mainline kernel(Linux jeff-laptop
>> 2.6.33-rc5-00238-gb04da8b-dirty) on my laptop.
>> Only btrfs always works well for me, Ext4 has the same issue like yours.
>
> One more point of reference:
> the tests all passed on an XFS file system.

Since XFS is relatively stable, I'll use it instead of ext4.
This incremental will permit use of either XFS or btrfs
for this test, as non-root, and otherwise, create an XFS
loopback partition when run as root:


diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index ef3742e..763c3b7 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -23,8 +23,8 @@ fi

 . $srcdir/test-lib.sh

-if df -T -t ext4 . ; then
-  : # Current dir is on an ext4 partition.  Good!
+if df -T -t btrfs -t xfs . ; then
+  : # Current dir is on a partition with working extents.  Good!
 else
   # It's not;  we need to create one, hence we need root access.
   require_root_
@@ -33,18 +33,18 @@ else
   cleanup_() { cd /; umount "$cwd/mnt"; }

   skip=0
-  # Create an ext4 loopback file system
-  dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
+  # Create an XFS loopback file system
+  dd if=/dev/zero of=blob bs=32k count=1000 || skip=1
   mkdir mnt
-  mkfs -t ext4 -F blob ||
-    skip_test_ "failed to create ext4 file system"
+  mkfs -t xfs blob ||
+    skip_test_ "failed to create XFS file system"
   mount -oloop blob mnt   || skip=1
   cd mnt                  || skip=1
   echo test > f           || skip=1
   test -s f               || skip=1

   test $skip = 1 &&
-    skip_test_ "insufficient mount/ext4 support"
+    skip_test_ "insufficient mount/XFS support"

 fi




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 28 May 2010 08:09:02 GMT) Full text and rfc822 format available.

Message #83 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tao Ma <tao.ma <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 28 May 2010 15:52:26 +0800
Hi Jim

On 05/27/2010 06:30 PM, Jim Meyering wrote:
> jeff.liu wrote:
>> This is the revised version, it fixed the fiemap-start offset calculation
>> approach to remove it out
>> of the 'for (i = 0; i<  fiemap->fm_mapped_extents; i++)' loop.
>
> Hi Jeff,
>
> I've included below the state of my local changes.
> Unfortunately, with that 5-patch series, there is always a test failure
> on F13/ext4.  Maybe someone who knows more about extents can provide an
> explanation?
Just want to clarify why ocfs2 didn't work here. I guess the same reason
applies to ext4, since both ext4 and ocfs2 use block groups to organize
their blocks in the volume.

I checked the perl test script that creates the sparse src file. It writes
contiguous bytes (around 20-24k) at intervals of around 40k, so in
general each 20-24k run should be contiguous. But there are some
scenarios in which a run can be split into 2 extents. Suppose one
block group is being used to allocate blocks for this file. When that
block group has only 10K left and you request 20K, the file system will
use the remaining 10K in this group and allocate another 10K from a
different block group. That becomes 2 extents, since they can't be
contiguous.

So I guess the right step is to check the holes by using filefrag, if you
prefer this tool and want to make sure cp doesn't copy holes (I got this
point from another e-mail written by you). How to find holes with
filefrag? I guess it is quite simple, since it also uses fiemap and we can
calculate the holes easily by comparing each 2 consecutive records. I guess
we can get what you want with ext4 after this update.
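
The comparison of consecutive records can be sketched in shell and awk.
This is only an illustration under stated assumptions: the input has
already been reduced to one "logical length" pair per line, in block
units and sorted by logical offset, and the function name `holes' and
its argument (the file size in blocks) are hypothetical, not part of
any existing tool.

```shell
# holes NBLOCKS: read "logical length" pairs (in blocks, sorted by
# logical offset) on stdin and print each hole as "start length".
# NBLOCKS is the file size in blocks, so a trailing hole is reported too.
holes()
{
  awk -v nblocks="$1" '
    BEGIN { end = 0 }
    {
      # A gap between the end of the previous extent and the start
      # of this one is a hole.
      if ($1 > end) print end, $1 - end
      end = $1 + $2
    }
    END { if (nblocks > end) print end, nblocks - end }
  '
}

# Two 3-block extents at logical blocks 3 and 9 of a 12-block file:
# prints "0 3" and "6 3".
printf '%s\n' '3 3' '9 3' | holes 12
```

Because only logical offsets are compared, an extent that the file
system split across two block groups does not show up as a hole.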

Regards,
Tao




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 28 May 2010 16:45:02 GMT) Full text and rfc822 format available.

Message #86 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 28 May 2010 18:44:33 +0200
Tao Ma wrote:

> Hi Jim
>
> On 05/27/2010 06:30 PM, Jim Meyering wrote:
>> jeff.liu wrote:
>>> This is the revised version, it fixed the fiemap-start offset calculation
>>> approach to remove it out
>>> of the 'for (i = 0; i<  fiemap->fm_mapped_extents; i++)' loop.
>>
>> Hi Jeff,
>>
>> I've included below the state of my local changes.
>> Unfortunately, with that 5-patch series, there is always a test failure
>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>> explanation?
> Just want to clarify why ocfs2 didn't work here. I guess the reason
> also works for ext4 since both ext4 and ocfs2 use block group to
> organize their blocks in the volume.

Hi Tao,

Thank you for the explanation.
I'm glad to hear that there is no underlying problem.

> I checked the perl test script to create sparse src file, it will
> create contiguous bytes(around 20-24k) at an interval of around 40k.So
> in general, these 20-24k should be contiguous. But that does exist
> some scenario that they could be separately into 2 extents. Consider
> one block group is used to allocate blocks to this file, when the
> block group only has 10K left while you are requiring 20K, it will use
> the left 10K in this group and allocate 10K from another block
> group. That would become 2 extents since they can't be contiguous.

> So I guess the right step is to check the holes by using filefrag if
> you prefer this tool and want to make sure cp doesn't copy holes(I get

Do you know of a tool other than filefrag that I could use?

It looks like a small script could filter filefrag -v output, detect
split extents and rewrite to make the output match what's expected.
Probably not worth it, though, since this is already a very fragile test.

It would be nice to be able to perform the test in non-root
mode on any of ext4, ocfs2, btrfs, xfs file systems, but for
now, due to this difference, I can use only the latter two.

> this point from another e-mail written by you). How to find holes with
> filefrag? I guess it is quite simple since it also use fiemap and we
> can calculating holes easily by comparing the 2 consecutive records. I
> guess we can get what you want with ext4 after this update.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 30 May 2010 14:29:02 GMT) Full text and rfc822 format available.

Message #89 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tao Ma <tao.ma <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 30 May 2010 22:28:10 +0800
Hi Jim,

On 05/29/2010 12:44 AM, Jim Meyering wrote:
> Tao Ma wrote:
>
>> Hi Jim
>>
>> On 05/27/2010 06:30 PM, Jim Meyering wrote:
>>> jeff.liu wrote:
>>>> This is the revised version, it fixed the fiemap-start offset calculation
>>>> approach to remove it out
>>>> of the 'for (i = 0; i<   fiemap->fm_mapped_extents; i++)' loop.
>>>
>>> Hi Jeff,
>>>
>>> I've included below the state of my local changes.
>>> Unfortunately, with that 5-patch series, there is always a test failure
>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>> explanation?
>> Just want to clarify why ocfs2 didn't work here. I guess the reason
>> also works for ext4 since both ext4 and ocfs2 use block group to
>> organize their blocks in the volume.
>
> Hi Tao,
>
> Thank you for the explanation.
> I'm glad to hear that there is no underlying problem.
>
>> I checked the perl test script to create sparse src file, it will
>> create contiguous bytes(around 20-24k) at an interval of around 40k.So
>> in general, these 20-24k should be contiguous. But that does exist
>> some scenario that they could be separately into 2 extents. Consider
>> one block group is used to allocate blocks to this file, when the
>> block group only has 10K left while you are requiring 20K, it will use
>> the left 10K in this group and allocate 10K from another block
>> group. That would become 2 extents since they can't be contiguous.
>
>> So I guess the right step is to check the holes by using filefrag if
>> you prefer this tool and want to make sure cp doesn't copy holes(I get
>
> Do you know of a tool other than filefrag that I could use?
nope.
>
> It looks like a small script could filter filefrag -v output, detect
> split extents and rewrite to make the output match what's expected.
> Probably not worth it, though, since this is already a very fragile test.
yeah, actually I guess what we want to check is not the number of extents
but whether the extents match. So if we can find all of the src's
(v_cpos, len) pairs in the target as well, and there is no extra
pair in the target, that is good enough. The number of extents is really a
tricky thing and depends on what the file system does internally.
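
A rough shell sketch of such a check follows. Note that it is a
variation on the idea rather than the exact pair-by-pair match: it first
merges logically contiguous extents, so that an extent the file system
split in two still compares equal to the unsplit one. The names
`merge_extents' and `same_extents' are hypothetical, and the input is
assumed to be "logical length" pairs, one per line and sorted, which
would still have to be extracted from the filefrag output.

```shell
# merge_extents: read "logical length" pairs (one per line, sorted by
# logical offset) and join extents that are logically contiguous.
merge_extents()
{
  awk '
    NR == 1            { start = $1; len = $2; next }
    $1 == start + len  { len += $2; next }   # continues the previous extent
                       { print start, len; start = $1; len = $2 }
    END                { if (NR) print start, len }
  '
}

# same_extents SRC_PAIRS DST_PAIRS: each argument is a newline-separated
# list of pairs.  Succeed if both describe the same allocated ranges.
same_extents()
{
  test "$(printf '%s\n' "$1" | merge_extents)" \
     = "$(printf '%s\n' "$2" | merge_extents)"
}

src=$(printf '%s\n' '0 5' '10 5')
dst=$(printf '%s\n' '0 2' '2 3' '10 5')   # first extent split in two
same_extents "$src" "$dst" && echo same
```

An extra extent in the target (for example, a copied hole) changes the
merged list and makes the comparison fail.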
>
> It would be nice to be able to perform the test in non-root
> mode on any of ext4, ocfs2, btrfs, xfs file systems, but for
> now, due to this difference, I can use only the latter two.
I think with the above method we should be able to test ext4 and ocfs2
as well. So jeff, could you please write a small script to test whether
the above method works?

Regards,
Tao




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 30 May 2010 20:13:02 GMT) Full text and rfc822 format available.

Message #92 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 30 May 2010 22:12:17 +0200
Tao Ma wrote:
> Hi Jim,
> On 05/29/2010 12:44 AM, Jim Meyering wrote:
>> Tao Ma wrote:
>>
>>> Hi Jim
>>>
>>> On 05/27/2010 06:30 PM, Jim Meyering wrote:
>>>> jeff.liu wrote:
>>>>> This is the revised version, it fixed the fiemap-start offset calculation
>>>>> approach to remove it out
>>>>> of the 'for (i = 0; i<   fiemap->fm_mapped_extents; i++)' loop.
>>>>
>>>> Hi Jeff,
>>>>
>>>> I've included below the state of my local changes.
>>>> Unfortunately, with that 5-patch series, there is always a test failure
>>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>>> explanation?
>>> Just want to clarify why ocfs2 didn't work here. I guess the reason
>>> also works for ext4 since both ext4 and ocfs2 use block group to
>>> organize their blocks in the volume.
>>
>> Hi Tao,
>>
>> Thank you for the explanation.
>> I'm glad to hear that there is no underlying problem.
>>
>>> I checked the perl test script to create sparse src file, it will
>>> create contiguous bytes(around 20-24k) at an interval of around 40k.So
>>> in general, these 20-24k should be contiguous. But that does exist
>>> some scenario that they could be separately into 2 extents. Consider
>>> one block group is used to allocate blocks to this file, when the
>>> block group only has 10K left while you are requiring 20K, it will use
>>> the left 10K in this group and allocate 10K from another block
>>> group. That would become 2 extents since they can't be contiguous.
>>
>>> So I guess the right step is to check the holes by using filefrag if
>>> you prefer this tool and want to make sure cp doesn't copy holes(I get
>>
>> Do you know of a tool other than filefrag that I could use?
> nope.
>>
>> It looks like a small script could filter filefrag -v output, detect
>> split extents and rewrite to make the output match what's expected.
>> Probably not worth it, though, since this is already a very fragile test.

I went ahead and did it, after all.
Here's the script, filefrag-extent-compare.
With it, this test should pass when run on any of those four
file system types.


eval '(exit $?0)' && eval 'exec perl -wS "$0" ${1+"$@"}'
  & eval 'exec perl -wS "$0" $argv:q'
    if 0;
# Determine whether two files have the same extents by comparing
# the logical block numbers and lengths from filefrag -v for each.

# Invoke like this:
# This helper function, f, extracts logical block number and lengths.
# f() { awk '/^ *[0-9]/ {printf "%d %d ",$2,NF<5?$NF:$5} END {print ""}'; }
# { filefrag -v j1 | f; filefrag -v j2 | f; } | ./filefrag-extent-compare

use warnings;
use strict;
(my $ME = $0) =~ s|.*/||;

my @line = <>;
my $n_lines = @line;
$n_lines == 2
  or die "$ME: expected exactly two input lines; got $n_lines\n";

my @A = split ' ', $line[0];
my @B = split ' ', $line[1];
@A % 2 || @B % 2
  and die "$ME: unexpected input: odd number of numbers; expected even\n";

my @a;
my @b;
foreach my $i (0..@A/2-1) { $a[$i] = { L_BLK => $A[2*$i], LEN => $A[2*$i+1] } };
foreach my $i (0..@B/2-1) { $b[$i] = { L_BLK => $B[2*$i], LEN => $B[2*$i+1] } };

my $i = 0;
my $j = 0;
while (1)
  {
    !defined $a[$i] && !defined $b[$j]
      and exit 0;
    defined $a[$i] && defined $b[$j]
      or die "\@a and \@b have different lengths, even after adjustment\n";
    ($a[$i]->{L_BLK} == $b[$j]->{L_BLK}
     && $a[$i]->{LEN} == $b[$j]->{LEN})
      and next;
    ($a[$i]->{LEN} < $b[$j]->{LEN}
     && exists $a[$i+1] && $a[$i]->{LEN} + $a[$i+1]->{LEN} == $b[$j]->{LEN})
      and ++$i, next;
    exists $b[$j+1] && $a[$i]->{LEN} == $b[$j]->{LEN} + $b[$j+1]->{LEN}
      and ++$j, next;
    die "differing extent:\n"
      . "  [$i]=$a[$i]->{L_BLK} $a[$i]->{LEN}\n"
      . "  [$j]=$b[$j]->{L_BLK} $b[$j]->{LEN}\n"
  }
continue
  {
    ++$i;
    ++$j;
  }




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 31 May 2010 02:54:02 GMT) Full text and rfc822 format available.

Message #95 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>, Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 31 May 2010 10:51:51 +0800
Jim Meyering wrote:
> Tao Ma wrote:
>> Hi Jim,
>> On 05/29/2010 12:44 AM, Jim Meyering wrote:
>>> Tao Ma wrote:
>>>
>>>> Hi Jim
>>>>
>>>> On 05/27/2010 06:30 PM, Jim Meyering wrote:
>>>>> jeff.liu wrote:
>>>>>> This is the revised version, it fixed the fiemap-start offset calculation
>>>>>> approach to remove it out
>>>>>> of the 'for (i = 0; i<   fiemap->fm_mapped_extents; i++)' loop.
>>>>> Hi Jeff,
>>>>>
>>>>> I've included below the state of my local changes.
>>>>> Unfortunately, with that 5-patch series, there is always a test failure
>>>>> on F13/ext4.  Maybe someone who knows more about extents can provide an
>>>>> explanation?
>>>> Just want to clarify why ocfs2 didn't work here. I guess the reason
>>>> also works for ext4 since both ext4 and ocfs2 use block group to
>>>> organize their blocks in the volume.
>>> Hi Tao,
>>>
>>> Thank you for the explanation.
>>> I'm glad to hear that there is no underlying problem.
>>>
>>>> I checked the perl test script to create sparse src file, it will
>>>> create contiguous bytes(around 20-24k) at an interval of around 40k.So
>>>> in general, these 20-24k should be contiguous. But that does exist
>>>> some scenario that they could be separately into 2 extents. Consider
>>>> one block group is used to allocate blocks to this file, when the
>>>> block group only has 10K left while you are requiring 20K, it will use
>>>> the left 10K in this group and allocate 10K from another block
>>>> group. That would become 2 extents since they can't be contiguous.
>>>> So I guess the right step is to check the holes by using filefrag if
>>>> you prefer this tool and want to make sure cp doesn't copy holes(I get
>>> Do you know of a tool other than filefrag that I could use?
>> nope.
>>> It looks like a small script could filter filefrag -v output, detect
>>> split extents and rewrite to make the output match what's expected.
>>> Probably not worth it, though, since this is already a very fragile test.
> 
> I went ahead and did it, after all.
> Here's the script, filefrag-extent-compare.
> With it, this test should pass when run on any of those four
> file system types.
> 
> 
> eval '(exit $?0)' && eval 'exec perl -wS "$0" ${1+"$@"}'
>   & eval 'exec perl -wS "$0" $argv:q'
>     if 0;
> # Determine whether two files have the same extents by comparing
> # the logical block numbers and lengths from filefrag -v for each.
> 
> # Invoke like this:
> # This helper function, f, extracts logical block number and lengths.
> # f() { awk '/^ *[0-9]/ {printf "%d %d ",$2,NF<5?$NF:$5} END {print ""}'; }
> # { filefrag -v j1 | f; filefrag -v j2 | f; } | ./filefrag-extent-compare
> 
> use warnings;
> use strict;
> (my $ME = $0) =~ s|.*/||;
> 
> my @line = <>;
> my $n_lines = @line;
> $n_lines == 2
>   or die "$ME: expected exactly two input lines; got $n_lines\n";
> 
> my @A = split ' ', $line[0];
> my @B = split ' ', $line[1];
> @A % 2 || @B % 2
>   and die "$ME: unexpected input: odd number of numbers; expected even\n";
> 
> my @a;
> my @b;
> foreach my $i (0..@A/2-1) { $a[$i] = { L_BLK => $A[2*$i], LEN => $A[2*$i+1] } };
> foreach my $i (0..@B/2-1) { $b[$i] = { L_BLK => $B[2*$i], LEN => $B[2*$i+1] } };
> 
> my $i = 0;
> my $j = 0;
> while (1)
>   {
>     !defined $a[$i] && !defined $b[$j]
>       and exit 0;
>     defined $a[$i] && defined $b[$j]
>       or die "\@a and \@b have different lengths, even after adjustment\n";
>     ($a[$i]->{L_BLK} == $b[$j]->{L_BLK}
>      && $a[$i]->{LEN} == $b[$j]->{LEN})
>       and next;
>     ($a[$i]->{LEN} < $b[$j]->{LEN}
>      && exists $a[$i+1] && $a[$i]->{LEN} + $a[$i+1]->{LEN} == $b[$j]->{LEN})
>       and ++$i, next;
>     exists $b[$j+1] && $a[$i]->{LEN} == $b[$i]->{LEN} + $b[$i+1]->{LEN}
>       and ++$j, next;
>     die "differing extent:\n"
>       . "  [$i]=$a[$i]->{L_BLK} $a[$i]->{LEN}\n"
>       . "  [$j]=$b[$j]->{L_BLK} $b[$j]->{LEN}\n"
>   }
> continue
>   {
>     ++$i;
>     ++$j;
>   }
> 
> 
> 
Thanks Tao and Jim for helping this out!

-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 02 Jun 2010 07:29:01 GMT) Full text and rfc822 format available.

Message #98 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, "jeff.liu" <jeff.liu <at> oracle.com>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 02 Jun 2010 09:27:53 +0200
Jim Meyering wrote:
...
>>> Do you know of a tool other than filefrag that I could use?
>> nope.
>>>
>>> It looks like a small script could filter filefrag -v output, detect
>>> split extents and rewrite to make the output match what's expected.
>>> Probably not worth it, though, since this is already a very fragile test.
>
> I went ahead and did it, after all.
> Here's the script, filefrag-extent-compare.
> With it, this test should pass when run on any of those four
> file system types.

Not quite.
Parsing filefrag -v output is part of what is fragile.

Here are two examples:

==> ff1 <==
Filesystem type is: ef53
File size of j1 is 49152 (12 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       3 10256380               3
   1       9 10740733 10256382      3 eof

==> ff2 <==
Filesystem type is: ef53
File size of j2 is 49152 (12 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       3 10283520               3
   1       9 10283523               3 eof
                       ^^^^^^
Note the missing "expected" number.
That doesn't happen often, but it's enough to cause occasional
false-positive failures, since the awk filter counts fields.
I don't dare try to count columns, because those physical block
numbers are likely to have width greater than 8 some of the time.

Instead, I adjusted the filter to remove the "eof"
and let the existing awk code handle the rest:

# Extract logical block number and length pairs from filefrag -v output.
# The initial sed is to remove the "eof" from the normally-empty "flags" field.
# That is required when that final extent has no number in the "expected" field.
f()
{
  sed 's/ eof$//' "$@" \
    | awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
}
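
To make the effect concrete, here is a small self-contained run of that
filter on the ff2 listing shown above. It is only a sketch: this variant
of f reads standard input instead of taking file names, and the
conditional is parenthesized purely for readability.

```shell
# Variant of the filter above that reads filefrag -v output from stdin.
f()
{
  sed 's/ eof$//' \
    | awk '/^ *[0-9]/ {printf "%d %d ", $2, (NF < 5 ? $NF : $5)} END {print ""}'
}

# The ff2 listing: the last extent has no "expected" number.
# Prints the logical/length pairs "3 3 9 3 ".
f <<'EOF'
Filesystem type is: ef53
File size of j2 is 49152 (12 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       3 10283520               3
   1       9 10283523               3 eof
EOF
```

Without the sed step, the last line has five fields, so awk would take
"eof" as the length and print it as 0.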

I'll post a new patch soon.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 08 Jun 2010 11:35:02 GMT) Full text and rfc822 format available.

Message #101 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 13:33:47 +0200
Jim Meyering wrote:
> Jim Meyering wrote:
> ...
>>>> Do you know of a tool other than filefrag that I could use?
>>> nope.
>>>>
>>>> It looks like a small script could filter filefrag -v output, detect
>>>> split extents and rewrite to make the output match what's expected.
>>>> Probably not worth it, though, since this is already a very fragile test.
>>
>> I went ahead and did it, after all.
>> Here's the script, filefrag-extent-compare.
>> With it, this test should pass when run on any of those four
>> file system types.
>
> Not quite.
> Parsing filefrag -v output is part of what is fragile.
>
> Here are two examples:
>
> ==> ff1 <==
> Filesystem type is: ef53
> File size of j1 is 49152 (12 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0       3 10256380               3
>    1       9 10740733 10256382      3 eof
>
> ==> ff2 <==
> Filesystem type is: ef53
> File size of j2 is 49152 (12 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0       3 10283520               3
>    1       9 10283523               3 eof
>                        ^^^^^^
> Note the missing "expected" number.
> That doesn't happen often, but it's enough to cause occasional
> false-positive failures, since the awk filter counts fields.
> I don't dare try to count columns, because those physical block
> numbers are likely to have width greater than 8 some of the time.
>
> Instead, I adjusted the filter to remove the "eof"
> and let the existing awk code handle the rest:
>
> # Extract logical block number and length pairs from filefrag -v output.
> # The initial sed is to remove the "eof" from the normally-empty "flags" field.
> # That is required when that final extent has no number in the "expected" field.
> f()
> {
>   sed 's/ eof$//' $@ \
>     | awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
> }
>
> I'll post a new patch soon.

Slowly but surely...

I'll squash the two most recent changes (at the end, below)
into yours, Jeff.

From 9f7a9882944455bfc2b6aec9c9f5431ac429b88b Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:09:30 +0800
Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy

* src/fiemap.h: Add fiemap.h for fiemap ioctl(2) support.
Copied from linux's include/linux/fiemap.h, with minor formatting changes.
* src/copy.c (copy_reg): When `cp' is invoked with the --sparse=[WHEN] option,
try a FIEMAP-copy if the underlying file system supports it, and fall back
to a normal copy if it fails.
---
 src/copy.c   |  159 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/fiemap.h |  102 +++++++++++++++++++++++++++++++++++++
 2 files changed, 261 insertions(+), 0 deletions(-)
 create mode 100644 src/fiemap.h

diff --git a/src/copy.c b/src/copy.c
index 171499c..2c15ca0 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,6 +63,10 @@

 #include <sys/ioctl.h>

+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -149,6 +153,141 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Perform a FIEMAP (available in mainline 2.6.27) copy if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map the file's
+   allocated extents, skipping holes.  This saves the overhead of dealing
+   with holes via lseek(2) in a normal copy, and should result in much
+   faster backups for any kind of sparse file.  */
+static bool
+fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
+                off_t src_total_size, char const *src_name,
+                char const *dst_name, bool *normal_copy_required)
+{
+  bool fail = false;
+  bool last = false;
+  char fiemap_buf[4096];
+  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
+                    sizeof (struct fiemap_extent);
+  off_t last_ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+
+  /* This is required at least to initialize fiemap->fm_start,
+     but also serves (in May 2010) to appease valgrind, which
+     appears not to know the semantics of the FIEMAP ioctl. */
+  memset (fiemap_buf, 0, sizeof fiemap_buf);
+
+  do
+    {
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, fall back to the normal copy only if this
+         is the first ioctl call.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+        {
+          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return fail;
+            }
+
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (dst_name));
+              return fail;
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (0 < ext_len)
+            {
+              char buf[buf_size];
+
+              /* Avoid reading into the holes if the remaining extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return fail;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes were read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return fail;
+                }
+
+              ext_len -= n_read;
+            }
+        }
+
+      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
+
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (ftruncate (dest_fd, src_total_size) < 0)
+        {
+          error (0, errno, _("extending %s"), quote (dst_name));
+          return fail;
+        }
+    }
+
+  return ! fail;
+}
+#else
+static bool fiemap_copy_ok (ignored) { errno = ENOTSUP; return false; }
+#endif
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -679,6 +818,25 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

+      if (make_holes)
+        {
+          bool require_normal_copy = false;
+          /* Perform efficient FIEMAP copy for sparse files, fall back to the
+             standard copy only if the ioctl(2) fails.  */
+          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
+                              src_open_sb.st_size, src_name,
+                              dst_name, &require_normal_copy))
+            goto preserve_metadata;
+          else
+            {
+              if (! require_normal_copy)
+                {
+                  return_val = false;
+                  goto close_src_and_dst_desc;
+                }
+            }
+        }
+
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -807,6 +965,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
diff --git a/src/fiemap.h b/src/fiemap.h
new file mode 100644
index 0000000..d33293b
--- /dev/null
+++ b/src/fiemap.h
@@ -0,0 +1,102 @@
+/* FS_IOC_FIEMAP ioctl infrastructure.
+   Some portions copyright (C) 2007 Cluster File Systems, Inc
+   Authors: Mark Fasheh <mfasheh <at> suse.com>
+            Kalpak Shah <kalpak.shah <at> sun.com>
+            Andreas Dilger <adilger <at> sun.com>.  */
+
+/* Copied from the kernel, modified to respect GNU code style by Jie Liu.  */
+
+#ifndef _LINUX_FIEMAP_H
+# define _LINUX_FIEMAP_H
+
+# include <linux/types.h>
+
+struct fiemap_extent
+{
+  /* Logical offset in bytes for the start of the extent
+     from the beginning of the file.  */
+  uint64_t fe_logical;
+
+  /* Physical offset in bytes for the start of the extent
+     from the beginning of the disk.  */
+  uint64_t fe_physical;
+
+  /* Length in bytes for this extent.  */
+  uint64_t fe_length;
+
+  uint64_t fe_reserved64[2];
+
+  /* FIEMAP_EXTENT_* flags for this extent.  */
+  uint32_t fe_flags;
+
+  uint32_t fe_reserved[3];
+};
+
+struct fiemap
+{
+  /* Logical offset(inclusive) at which to start mapping(in).  */
+  uint64_t fm_start;
+
+  /* Logical length of mapping which userspace wants(in).  */
+  uint64_t fm_length;
+
+  /* FIEMAP_FLAG_* flags for request(in/out).  */
+  uint32_t fm_flags;
+
+  /* Number of extents that were mapped(out).  */
+  uint32_t fm_mapped_extents;
+
+  /* Size of fm_extents array(in).  */
+  uint32_t fm_extent_count;
+
+  uint32_t fm_reserved;
+
+  /* Array of mapped extents(out).  */
+  struct fiemap_extent fm_extents[0];
+};
+
+/* The maximum offset that can be mapped for a file.  */
+# define FIEMAP_MAX_OFFSET       (~0ULL)
+
+/* Sync file data before map.  */
+# define FIEMAP_FLAG_SYNC        0x00000001
+
+/* Map extended attribute tree.  */
+# define FIEMAP_FLAG_XATTR       0x00000002
+
+# define FIEMAP_FLAGS_COMPAT     (FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR)
+
+/* Last extent in file.  */
+# define FIEMAP_EXTENT_LAST              0x00000001
+
+/* Data location unknown.  */
+# define FIEMAP_EXTENT_UNKNOWN           0x00000002
+
+/* Location still pending, Sets EXTENT_UNKNOWN.  */
+# define FIEMAP_EXTENT_DELALLOC          0x00000004
+
+/* Data can not be read while fs is unmounted.  */
+# define FIEMAP_EXTENT_ENCODED           0x00000008
+
+/* Data is encrypted by fs.  Sets EXTENT_NO_BYPASS.  */
+# define FIEMAP_EXTENT_DATA_ENCRYPTED    0x00000080
+
+/* Extent offsets may not be block aligned.  */
+# define FIEMAP_EXTENT_NOT_ALIGNED       0x00000100
+
+/* Data mixed with metadata.  Sets EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_INLINE       0x00000200
+
+/* Multiple files in block.  Set EXTENT_NOT_ALIGNED.  */
+# define FIEMAP_EXTENT_DATA_TAIL         0x00000400
+
+/* Space allocated, but not data (i.e. zero).  */
+# define FIEMAP_EXTENT_UNWRITTEN         0x00000800
+
+/* File does not natively support extents.  Result merged for efficiency.  */
+# define FIEMAP_EXTENT_MERGED		0x00001000
+
+/* Space shared with other files.  */
+# define FIEMAP_EXTENT_SHARED            0x00002000
+
+#endif
--
1.7.1.501.g23b46


From dc98ac611dbb141275613fe30f5a8d2486817a3b Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Thu, 13 May 2010 22:17:53 +0800
Subject: [PATCH 02/10] tests: add a new test for FIEMAP-copy

* tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a
loopbacked ext4 partition.
* tests/Makefile.am (sparse-fiemap): Reference the new test.
---
 tests/Makefile.am      |    1 +
 tests/cp/sparse-fiemap |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/sparse-fiemap

diff --git a/tests/Makefile.am b/tests/Makefile.am
index c458574..f7840c8 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -25,6 +25,7 @@ root_tests =					\
   cp/special-bits				\
   cp/cp-mv-enotsup-xattr			\
   cp/capability					\
+  cp/sparse-fiemap                              \
   dd/skip-seek-past-dev				\
   install/install-C-root			\
   ls/capability					\
diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
new file mode 100755
index 0000000..945c94b
--- /dev/null
+++ b/tests/cp/sparse-fiemap
@@ -0,0 +1,56 @@
+#!/bin/sh
+# Test cp --sparse=always through fiemap copy
+
+# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+require_root_
+
+cwd=`pwd`
+cleanup_() { cd /; umount "$cwd/mnt"; }
+
+skip=0
+# Create an ext4 loopback file system
+dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
+mkdir mnt
+mkfs -t ext4 -F blob ||
+  skip_test_ "failed to create ext4 file system"
+mount -oloop blob mnt                          || skip=1
+echo test > mnt/f                              || skip=1
+test -s mnt/f                                  || skip=1
+
+test $skip = 1 &&
+  skip_test_ "insufficient mount/ext4 support"
+
+# Create a 1TiB sparse file
+dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
+
+cd mnt || fail=1
+
+# It takes many minutes to copy this sparse file using the old method.
+# By contrast, it takes far less than 1 second using FIEMAP-copy.
+timeout 10 cp --sparse=always sparse fiemap || fail=1
+
+# Ensure that the sparse file copied through fiemap has the same size
+# in bytes as the original.
+test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
+
+Exit $fail
--
1.7.1.501.g23b46


From af8047bc22c93e72c38cfd9cd7e1fdd455ce81e0 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 28 May 2010 09:24:15 +0200
Subject: [PATCH 03/10] tests: sparse-fiemap: factor out some set-up

* tests/cp/sparse-fiemap: Cd into test directory sooner.
---
 tests/cp/sparse-fiemap |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 945c94b..21b02ac 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -33,9 +33,10 @@ dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
 mkdir mnt
 mkfs -t ext4 -F blob ||
   skip_test_ "failed to create ext4 file system"
-mount -oloop blob mnt                          || skip=1
-echo test > mnt/f                              || skip=1
-test -s mnt/f                                  || skip=1
+mount -oloop blob mnt   || skip=1
+cd mnt                  || skip=1
+echo test > f           || skip=1
+test -s f               || skip=1

 test $skip = 1 &&
   skip_test_ "insufficient mount/ext4 support"
@@ -43,7 +44,6 @@ test $skip = 1 &&
 # Create a 1TiB sparse file
 dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure

-cd mnt || fail=1

 # It takes many minutes to copy this sparse file using the old method.
 # By contrast, it takes far less than 1 second using FIEMAP-copy.
--
1.7.1.501.g23b46


From 0a6df6c17b2043515f7719491b48eb15d2812456 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 21 May 2010 18:28:42 +0200
Subject: [PATCH 04/10] tests: exercise more of the new FIEMAP copying code

* tests/cp/sparse-fiemap: Ensure that a file with many extents (more
than fit in copy.c's internal 4KiB buffer) is copied properly.
---
 tests/cp/sparse-fiemap |   38 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 21b02ac..3608db3 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -53,4 +53,42 @@ timeout 10 cp --sparse=always sparse fiemap || fail=1
 # in bytes as the original.
 test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

+# =================================================
+# Ensure that we exercise the FIEMAP-copying code enough
+# to provoke at least two iterations of the do...while loop
+# in which it calls ioctl (fd, FS_IOC_FIEMAP,...
+# This also verifies that non-trivial extents are preserved.
+
+$PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'
+
+$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
+      -e 'for (1..100) { sysseek (*F, $n, 1)' \
+      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
+
+cp --sparse=always j1 j2 || fail=1
+cmp j1 j2 || fail=1
+
+filefrag j1 | grep extent \
+  || skip_test_ 'skipping part of this test; you lack filefrag'
+
+# Here is sample filefrag output:
+#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
+#          -e 'for (1..5) { sysseek(*F,$n,1)' \
+#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
+#   $ filefrag -v j
+#   File system type is: ef53
+#   File size of j is 163840 (40 blocks, blocksize 4096)
+#    ext logical physical expected length flags
+#      0       4  6258884               4
+#      1      12  6258892  6258887      4
+#      2      20  6258900  6258895      4
+#      3      28  6258908  6258903      4
+#      4      36  6258916  6258911      4 eof
+#   j: 6 extents found
+
+# exclude the physical block numbers; they always differ
+filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
+filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
+compare ff1 ff2 || fail=1
+
 Exit $fail
--
1.7.1.501.g23b46


From b257d65c6d923567af5d4b757810dd62e1be91bb Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 May 2010 10:22:58 +0200
Subject: [PATCH 05/10] tests: require root only if current partition is neither btrfs nor xfs

* tests/cp/sparse-fiemap: Don't require root access if current
partition is btrfs or xfs.
---
 tests/cp/sparse-fiemap |   43 ++++++++++++++++++++++++-------------------
 1 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 3608db3..17ed4f9 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -22,28 +22,33 @@ if test "$VERBOSE" = yes; then
 fi

 . $srcdir/test-lib.sh
-require_root_

-cwd=`pwd`
-cleanup_() { cd /; umount "$cwd/mnt"; }
-
-skip=0
-# Create an ext4 loopback file system
-dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1
-mkdir mnt
-mkfs -t ext4 -F blob ||
-  skip_test_ "failed to create ext4 file system"
-mount -oloop blob mnt   || skip=1
-cd mnt                  || skip=1
-echo test > f           || skip=1
-test -s f               || skip=1
-
-test $skip = 1 &&
-  skip_test_ "insufficient mount/ext4 support"
+if df -T -t btrfs -t xfs . ; then
+  : # Current dir is on a partition with working extents.  Good!
+else
+  # It's not;  we need to create one, hence we need root access.
+  require_root_
+
+  cwd=$PWD
+  cleanup_() { cd /; umount "$cwd/mnt"; }
+
+  skip=0
+  # Create an XFS loopback file system
+  dd if=/dev/zero of=blob bs=32k count=1000 || skip=1
+  mkdir mnt
+  mkfs -t xfs blob ||
+    skip_test_ "failed to create XFS file system"
+  mount -oloop blob mnt   || skip=1
+  cd mnt                  || skip=1
+  echo test > f           || skip=1
+  test -s f               || skip=1
+
+  test $skip = 1 &&
+    skip_test_ "insufficient mount/XFS support"
+fi

 # Create a 1TiB sparse file
-dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=1G || framework_failure
-
+dd if=/dev/zero of=sparse bs=1k count=1 seek=1G || framework_failure

 # It takes many minutes to copy this sparse file using the old method.
 # By contrast, it takes far less than 1 second using FIEMAP-copy.
--
1.7.1.501.g23b46


From 71b4c7df63795345781bc2012f79695fc6933c7d Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 May 2010 10:21:46 +0200
Subject: [PATCH 06/10] tests: test fiemap-enabled cp more thoroughly

* tests/cp/sparse-fiemap: Vary the seek/write size (1..20 KiB) and
the number of extents (1, 2, 31, 100).
---
 tests/cp/sparse-fiemap |   61 +++++++++++++++++++++++++----------------------
 1 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 17ed4f9..6d0bd83 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -66,34 +66,37 @@ test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

 $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'

-$PERL -e 'BEGIN { $n = 16 * 1024; *F = *STDOUT }' \
-      -e 'for (1..100) { sysseek (*F, $n, 1)' \
-      -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
-
-cp --sparse=always j1 j2 || fail=1
-cmp j1 j2 || fail=1
-
-filefrag j1 | grep extent \
-  || skip_test_ 'skipping part of this test; you lack filefrag'
-
-# Here is sample filefrag output:
-#   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
-#          -e 'for (1..5) { sysseek(*F,$n,1)' \
-#          -e '&& syswrite *F,"."x$n or die "$!"}' > j
-#   $ filefrag -v j
-#   File system type is: ef53
-#   File size of j is 163840 (40 blocks, blocksize 4096)
-#    ext logical physical expected length flags
-#      0       4  6258884               4
-#      1      12  6258892  6258887      4
-#      2      20  6258900  6258895      4
-#      3      28  6258908  6258903      4
-#      4      36  6258916  6258911      4 eof
-#   j: 6 extents found
-
-# exclude the physical block numbers; they always differ
-filefrag -v j1 | awk '/^ / {print $1,$2,$NF}' > ff1 || fail=1
-filefrag -v j2 | awk '/^ / {print $1,$2,$NF}' > ff2 || fail=1
-compare ff1 ff2 || fail=1
+for i in $(seq 20); do
+  for j in 1 2 31 100; do
+    $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \
+          -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
+          -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
+
+    cp --sparse=always j1 j2 || fail=1
+    cmp j1 j2 || fail=1
+    filefrag -v j1 | grep extent \
+      || skip_test_ 'skipping part of this test; you lack filefrag'
+
+    # Here is sample filefrag output:
+    #   $ perl -e 'BEGIN{$n=16*1024; *F=*STDOUT}' \
+    #          -e 'for (1..5) { sysseek(*F,$n,1)' \
+    #          -e '&& syswrite *F,"."x$n or die "$!"}' > j
+    #   $ filefrag -v j
+    #   File system type is: ef53
+    #   File size of j is 163840 (40 blocks, blocksize 4096)
+    #    ext logical physical expected length flags
+    #      0       4  6258884               4
+    #      1      12  6258892  6258887      4
+    #      2      20  6258900  6258895      4
+    #      3      28  6258908  6258903      4
+    #      4      36  6258916  6258911      4 eof
+    #   j: 6 extents found
+
+    # exclude the physical block numbers; they always differ
+    filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
+    filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
+    compare ff1 ff2 || fail=1
+  done
+done

 Exit $fail
--
1.7.1.501.g23b46


From f674f49bfd315143ffb4ab376e30546fd021f719 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 29 May 2010 21:22:40 +0200
Subject: [PATCH 07/10] tests: relax the root-tests cross-check

* cfg.mk (sc_root_tests): Allow spaces before "require_root_",
now that tests/cp/sparse-fiemap has a conditional use.
---
 cfg.mk |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cfg.mk b/cfg.mk
index dff5de5..17267cf 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -80,7 +80,7 @@ sc_root_tests:
 	@if test -d tests \
 	      && grep check-root tests/Makefile.am>/dev/null 2>&1; then \
 	t1=sc-root.expected; t2=sc-root.actual;				\
-	grep -nl '^require_root_$$'					\
+	grep -nl '^ *require_root_$$'					\
 	  $$($(VC_LIST) tests) |sed s,tests/,, |sort > $$t1;		\
 	sed -n '/^root_tests =[	 ]*\\$$/,/[^\]$$/p'			\
 	  $(srcdir)/tests/Makefile.am					\
--
1.7.1.501.g23b46


From 034c56d9a9a408ce336d039dba072c4ec97d7d04 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sun, 30 May 2010 21:20:30 +0200
Subject: [PATCH 08/10] tests: improve fiemap test to work with 4 FS types; fall back on ext4

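* tests/cp/sparse-fiemap: Also run in place on ext4 and ocfs2;
otherwise fall back on an ext4 loopback file system.  Compare
extents using the new helper script.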
* tests/filefrag-extent-compare: New file.
---
 tests/cp/sparse-fiemap        |   38 ++++++++++++++++------
 tests/filefrag-extent-compare |   68 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 11 deletions(-)
 create mode 100644 tests/filefrag-extent-compare

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 6d0bd83..3e7c11f 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -23,7 +23,7 @@ fi

 . $srcdir/test-lib.sh

-if df -T -t btrfs -t xfs . ; then
+if df -T -t btrfs -t xfs -t ext4 -t ocfs2 . ; then
   : # Current dir is on a partition with working extents.  Good!
 else
   # It's not;  we need to create one, hence we need root access.
@@ -33,18 +33,18 @@ else
   cleanup_() { cd /; umount "$cwd/mnt"; }

   skip=0
-  # Create an XFS loopback file system
+  # Create an ext4 loopback file system
   dd if=/dev/zero of=blob bs=32k count=1000 || skip=1
   mkdir mnt
-  mkfs -t xfs blob ||
-    skip_test_ "failed to create XFS file system"
+  mkfs -t ext4 -F blob ||
+    skip_test_ "failed to create ext4 file system"
   mount -oloop blob mnt   || skip=1
   cd mnt                  || skip=1
   echo test > f           || skip=1
   test -s f               || skip=1

   test $skip = 1 &&
-    skip_test_ "insufficient mount/XFS support"
+    skip_test_ "insufficient mount/ext4 support"
 fi

 # Create a 1TiB sparse file
@@ -66,13 +66,26 @@ test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

 $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'

-for i in $(seq 20); do
+# Extract logical block number and length pairs from filefrag -v output.
+# The initial sed is to remove the "eof" from the normally-empty "flags" field.
+# That is required when that final extent has no number in the "expected" field.
+f()
+{
+  sed 's/ eof$//' $@ \
+    | awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
+}
+
+for i in $(seq 1 2 21); do
   for j in 1 2 31 100; do
     $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \
           -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
-          -e '&& syswrite (*F, "."x$n) or die "$!"}' > j1 || fail=1
-
+          -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > j1 || fail=1
+    # sync
     cp --sparse=always j1 j2 || fail=1
+    # sync
+    # Technically we may need the 'sync' uses above, but
+    # uncommenting them makes this test take much longer.
+
     cmp j1 j2 || fail=1
     filefrag -v j1 | grep extent \
       || skip_test_ 'skipping part of this test; you lack filefrag'
@@ -93,10 +106,13 @@ for i in $(seq 20); do
     #   j: 6 extents found

     # exclude the physical block numbers; they always differ
-    filefrag -v j1 | awk '/^ / {print $1,$2}' > ff1 || fail=1
-    filefrag -v j2 | awk '/^ / {print $1,$2}' > ff2 || fail=1
-    compare ff1 ff2 || fail=1
+    filefrag -v j1 > ff1 || fail=1
+    filefrag -v j2 > ff2 || fail=1
+    { f ff1; f ff2; } \
+      | $PERL $abs_top_srcdir/tests/filefrag-extent-compare \
+        || { fail=1; break; }
   done
+  test $fail = 1 && break
 done

 Exit $fail
diff --git a/tests/filefrag-extent-compare b/tests/filefrag-extent-compare
new file mode 100644
index 0000000..3c095d5
--- /dev/null
+++ b/tests/filefrag-extent-compare
@@ -0,0 +1,68 @@
+eval '(exit $?0)' && eval 'exec perl -wS "$0" ${1+"$@"}'
+  & eval 'exec perl -wS "$0" $argv:q'
+    if 0;
+# Determine whether two files have the same extents by comparing
+# the logical block numbers and lengths from filefrag -v for each.
+
+# Invoke like this:
+# This helper function, f, extracts logical block number and lengths.
+# f() { awk '/^ *[0-9]/ {printf "%d %d ",$2,NF<5?$NF:$5} END {print ""}'; }
+# { filefrag -v j1 | f; filefrag -v j2 | f; } | ./filefrag-extent-compare
+
+use warnings;
+use strict;
+(my $ME = $0) =~ s|.*/||;
+
+my @line = <>;
+my $n_lines = @line;
+$n_lines == 2
+  or die "$ME: expected exactly two input lines; got $n_lines\n";
+
+my @A = split ' ', $line[0];
+my @B = split ' ', $line[1];
+@A % 2 || @B % 2
+  and die "$ME: unexpected input: odd number of numbers; expected even\n";
+
+my @a;
+my @b;
+foreach my $i (0..@A/2-1) { $a[$i] = { L_BLK => $A[2*$i], LEN => $A[2*$i+1] } };
+foreach my $i (0..@B/2-1) { $b[$i] = { L_BLK => $B[2*$i], LEN => $B[2*$i+1] } };
+
+my $i = 0;
+my $j = 0;
+while (1)
+  {
+    !defined $a[$i] && !defined $b[$j]
+      and exit 0;
+    defined $a[$i] && defined $b[$j]
+      or die "\@a and \@b have different lengths, even after adjustment\n";
+    ($a[$i]->{L_BLK} == $b[$j]->{L_BLK}
+     && $a[$i]->{LEN} == $b[$j]->{LEN})
+      and next;
+    ($a[$i]->{LEN} < $b[$j]->{LEN}
+     && exists $a[$i+1] && $a[$i]->{LEN} + $a[$i+1]->{LEN} == $b[$j]->{LEN})
+      and ++$i, next;
+    exists $b[$j+1] && $a[$i]->{LEN} == $b[$j]->{LEN} + $b[$j+1]->{LEN}
+      and ++$j, next;
+    die "differing extent:\n"
+      . "  [$i]=$a[$i]->{L_BLK} $a[$i]->{LEN}\n"
+      . "  [$j]=$b[$j]->{L_BLK} $b[$j]->{LEN}\n"
+  }
+continue
+  {
+    ++$i;
+    ++$j;
+  }
+
+### Setup "GNU" style for perl-mode and cperl-mode.
+## Local Variables:
+## mode: perl
+## perl-indent-level: 2
+## perl-continued-statement-offset: 2
+## perl-continued-brace-offset: 0
+## perl-brace-offset: 0
+## perl-brace-imaginary-offset: 0
+## perl-label-offset: -2
+## perl-extra-newline-before-brace: t
+## perl-merge-trailing-else: nil
+## End:
--
1.7.1.501.g23b46


From a7a188a4fd4f58404b825f463f3670193232a037 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 5 Jun 2010 10:17:48 +0200
Subject: [PATCH 09/10] cleanup

* src/copy.c (fiemap_copy): Rename from fiemap_copy_ok.
Add/improve comments.
Remove local, "fail".
---
 src/copy.c |   42 +++++++++++++++++++++---------------------
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 2c15ca0..6a4631a 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -157,17 +157,17 @@ clone_file (int dest_fd, int src_fd)
 # ifndef FS_IOC_FIEMAP
 #  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
 # endif
-/* Perform FIEMAP(available in mainline 2.6.27) copy if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP to efficiently map file allocation
-   excepts holes.  So the overhead to deal with holes with lseek(2) in
-   normal copy could be saved.  This would result in much faster backups
-   for any kind of sparse file.  */
+/* Perform a FIEMAP copy, if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
+   obtain a map of file extents excluding holes.  This avoids the
+   overhead of detecting holes in a hole-introducing/preserving copy,
+   and thus makes copying sparse files much more efficient.
+   Upon a successful copy, return true.  Otherwise, return false.  */
 static bool
-fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
-                off_t src_total_size, char const *src_name,
-                char const *dst_name, bool *normal_copy_required)
+fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
+             off_t src_total_size, char const *src_name,
+             char const *dst_name, bool *normal_copy_required)
 {
-  bool fail = false;
   bool last = false;
   char fiemap_buf[4096];
   struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
@@ -180,7 +180,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
   unsigned int i = 0;

   /* This is required at least to initialize fiemap->fm_start,
-     but also serves (in May 2010) to appease valgrind, which
+     but also serves (in mid 2010) to appease valgrind, which
      appears not to know the semantics of the FIEMAP ioctl. */
   memset (fiemap_buf, 0, sizeof fiemap_buf);

@@ -213,13 +213,13 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
           if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
             {
               error (0, errno, _("cannot lseek %s"), quote (src_name));
-              return fail;
+              return false;
             }

           if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
             {
               error (0, errno, _("cannot lseek %s"), quote (dst_name));
-              return fail;
+              return false;
             }

           if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
@@ -246,7 +246,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
                     continue;
 #endif
                   error (0, errno, _("reading %s"), quote (src_name));
-                  return fail;
+                  return false;
                 }

               if (n_read == 0)
@@ -259,7 +259,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
               if (full_write (dest_fd, buf, n_read) != n_read)
                 {
                   error (0, errno, _("writing %s"), quote (dst_name));
-                  return fail;
+                  return false;
                 }

               ext_len -= n_read;
@@ -277,15 +277,15 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size,
     {
       if (ftruncate (dest_fd, src_total_size) < 0)
         {
-          error (0, errno, _("extending %s"), quote (dst_name));
-          return fail;
+          error (0, errno, _("failed to extend %s"), quote (dst_name));
+          return false;
         }
     }

-  return ! fail;
+  return true;
 }
 #else
-static bool fiemap_copy_ok (ignored) { errno == ENOTSUP; return false; }
+static bool fiemap_copy (ignored) { errno = ENOTSUP; return false; }
 #endif

 /* FIXME: describe */
@@ -823,9 +823,9 @@ copy_reg (char const *src_name, char const *dst_name,
           bool require_normal_copy = false;
           /* Perform efficient FIEMAP copy for sparse files, fall back to the
              standard copy only if the ioctl(2) fails.  */
-          if (fiemap_copy_ok (source_desc, dest_desc, buf_size,
-                              src_open_sb.st_size, src_name,
-                              dst_name, &require_normal_copy))
+          if (fiemap_copy (source_desc, dest_desc, buf_size,
+                           src_open_sb.st_size, src_name,
+                           dst_name, &require_normal_copy))
             goto preserve_metadata;
           else
             {
--
1.7.1.501.g23b46


From 7d10de7a2ba4161bc2b910f30ff4841f8c91512d Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 5 Jun 2010 10:56:36 +0200
Subject: [PATCH 10/10] adjust normal_copy_required semantics

* src/copy.c (fiemap_copy): Do not require caller to set
"normal_copy_required" before calling fiemap_copy.
---
 src/copy.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 6a4631a..eb67700 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -161,8 +161,10 @@ clone_file (int dest_fd, int src_fd)
    Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
    obtain a map of file extents excluding holes.  This avoids the
    overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.
-   Upon a successful copy, return true.  Otherwise, return false.  */
+   and thus makes copying sparse files much more efficient.  Upon a
+   successful copy, return true.  If the initial ioctl fails, set
+   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
+   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
 static bool
 fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
              off_t src_total_size, char const *src_name,
@@ -170,14 +172,15 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
 {
   bool last = false;
   char fiemap_buf[4096];
-  struct fiemap *fiemap = (struct fiemap *)fiemap_buf;
+  struct fiemap *fiemap = (struct fiemap *) fiemap_buf;
   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  uint32_t count = (sizeof (fiemap_buf) - sizeof (*fiemap)) /
-                    sizeof (struct fiemap_extent);
+  uint32_t count = ((sizeof fiemap_buf - sizeof (*fiemap))
+                    / sizeof (struct fiemap_extent));
   off_t last_ext_logical = 0;
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;
   unsigned int i = 0;
+  *normal_copy_required = false;

   /* This is required at least to initialize fiemap->fm_start,
      but also serves (in mid 2010) to appease valgrind, which
@@ -193,7 +196,8 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
          is the first time we met.  */
       if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
         {
-          /* If `i > 0', then at least one ioctl(2) has been performed before.  */
+          /* If the first ioctl fails, tell the caller that it is
+             ok to proceed with a normal copy.  */
           if (i == 0)
             *normal_copy_required = true;
           return false;
@@ -820,7 +824,7 @@ copy_reg (char const *src_name, char const *dst_name,

       if (make_holes)
         {
-          bool require_normal_copy = false;
+          bool require_normal_copy;
           /* Perform efficient FIEMAP copy for sparse files, fall back to the
              standard copy only if the ioctl(2) fails.  */
           if (fiemap_copy (source_desc, dest_desc, buf_size,
--
1.7.1.501.g23b46




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 08 Jun 2010 12:12:02 GMT) Full text and rfc822 format available.

Message #104 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 06:10:36 -0600
On 06/08/2010 05:33 AM, Jim Meyering wrote:
> new file mode 100644
> index 0000000..d33293b
> --- /dev/null
> +++ b/src/fiemap.h
> @@ -0,0 +1,102 @@
> +/* FS_IOC_FIEMAP ioctl infrastructure.
> +   Some portions copyright (C) 2007 Cluster File Systems, Inc
> +   Authors: Mark Fasheh <mfasheh <at> suse.com>
> +            Kalpak Shah <kalpak.shah <at> sun.com>
> +            Andreas Dilger <adilger <at> sun.com>.  */

Shouldn't we also add an FSF Copyright 2010 to this new file, to cover
our changes?

> +++ b/tests/cp/sparse-fiemap
> @@ -0,0 +1,56 @@
> +#!/bin/sh
> +# Test cp --sparse=always through fiemap copy
> +
> +# Copyright (C) 2006-2010 Free Software Foundation, Inc.

How much of this content comes from other files from 2006, vs. new
content needing only 2010?

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 08 Jun 2010 16:14:01 GMT) Full text and rfc822 format available.

Message #107 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sunil Mushran <sunil.mushran <at> oracle.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 09:11:46 -0700
On 06/08/2010 05:10 AM, Eric Blake wrote:
>> new file mode 100644
>> index 0000000..d33293b
>> --- /dev/null
>> +++ b/src/fiemap.h
>> @@ -0,0 +1,102 @@
>> +/* FS_IOC_FIEMAP ioctl infrastructure.
>> +   Some portions copyright (C) 2007 Cluster File Systems, Inc
>> +   Authors: Mark Fasheh<mfasheh <at> suse.com>
>> +            Kalpak Shah<kalpak.shah <at> sun.com>
>> +            Andreas Dilger<adilger <at> sun.com>.  */
>>      
> Shouldn't we also add an FSF Copyright 2010 to this new file, to cover
> our changes?
>    

+/* Copy from kernel, modified to respect GNU code style by Jie Liu.  */


The difference is only in style.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 08 Jun 2010 20:34:01 GMT) Full text and rfc822 format available.

Message #110 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 22:32:18 +0200
Jim Meyering wrote:
> Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy

FYI, using those patches, I ran a test for the first time in a few days:

    make check -C tests TESTS=cp/sparse-fiemap VERBOSE=yes

It failed like this on an ext4 partition on Fedora 13 (F13):

    + timeout 10 cp --sparse=always sparse fiemap
    + fail=1
    ++ stat --printf %s sparse
    ++ stat --printf %s fiemap
    + test 1099511628800 = 0
    + fail=1

That is very odd.  No diagnostic from cp, yet it failed
after creating a zero-length file.

Here's the corresponding piece of the script:

    # It takes many minutes to copy this sparse file using the old method.
    # By contrast, it takes far less than 1 second using FIEMAP-copy.
    timeout 10 cp --sparse=always sparse fiemap || fail=1

    # Ensure that the sparse file copied through fiemap has the same size
    # in bytes as the original.
    test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1

However, so far I've been unable to reproduce the failure,
running hundreds of iterations:

    for i in $(seq 300); do printf .; make check -C tests \
      TESTS=cp/sparse-fiemap VERBOSE=yes >& makerr-$i || break; done

Have any of you heard of a problem whereby a cold cache can cause
such a thing?  "echo 3 > /proc/sys/vm/drop_caches" didn't help.
I suspect that having so many extents is unusual, so maybe
this is a rarely exercised corner case.
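
For a rough sense of when a second FIEMAP ioctl becomes necessary, here
is a small sketch of the buffer arithmetic.  The two structs are my own
stand-ins (hypothetical names) that I believe mirror the field layout of
the kernel's struct fiemap (minus its trailing extent array) and struct
fiemap_extent; treat the resulting numbers as an estimate, not as
something taken from copy.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of the fixed part of the kernel's struct fiemap,
   i.e. everything before the trailing fm_extents[] array.  */
struct fiemap_hdr_model
{
  uint64_t fm_start;
  uint64_t fm_length;
  uint32_t fm_flags;
  uint32_t fm_mapped_extents;
  uint32_t fm_extent_count;
  uint32_t fm_reserved;
};

/* Assumed mirror of the kernel's struct fiemap_extent.  */
struct fiemap_extent_model
{
  uint64_t fe_logical;
  uint64_t fe_physical;
  uint64_t fe_length;
  uint64_t fe_reserved64[2];
  uint32_t fe_flags;
  uint32_t fe_reserved[3];
};

/* Number of extent records that fit in a BUF_SIZE-byte buffer after
   the header, computed the same way as "count" in fiemap_copy.  */
static size_t
extents_per_call (size_t buf_size)
{
  assert (buf_size >= sizeof (struct fiemap_hdr_model));
  return ((buf_size - sizeof (struct fiemap_hdr_model))
          / sizeof (struct fiemap_extent_model));
}

/* Lower bound on the number of ioctl calls needed to map N_EXTENTS
   extents with a BUF_SIZE-byte buffer.  */
static size_t
calls_needed (size_t n_extents, size_t buf_size)
{
  size_t per_call = extents_per_call (buf_size);
  assert (per_call > 0);
  return (n_extents + per_call - 1) / per_call;
}
```

Under those assumptions, extents_per_call (4096) is 72, so a file with
100 extents needs at least two calls while a file with a single extent
needs only one.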

===============================
As I wrote the above, I realized I probably had enough
information to deduce where things were going wrong, even
if so far I've been unable to reproduce it.

And sure enough.  There is a way to provoke exactly
that failure.  If the *second* (or later) FIEMAP ioctl fails:

  do
    {
      fiemap->fm_length = FIEMAP_MAX_OFFSET;
      fiemap->fm_extent_count = count;

      /* When ioctl(2) fails, fall back to the normal copy only if it
         is the first time we met.  */
      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
        {
          /* If the first ioctl fails, tell the caller that it is
             ok to proceed with a normal copy.  */
          if (i == 0)
            *normal_copy_required = true;
          return false;
        }

In that case, fiemap_copy returns false (with no diagnostic)
and cp fails silently.
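
To make the three outcomes explicit, here is a simplified model of that
loop.  It is only a sketch: fake_fiemap_ioctl, classify and the enum are
hypothetical names, the real ioctl is replaced by a stub that fails on a
chosen call, and a plain call counter stands in for the "i == 0" test.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome
{
  COPY_OK,                    /* every ioctl succeeded */
  FALL_BACK_TO_NORMAL_COPY,   /* first ioctl failed */
  SILENT_FAILURE              /* a later ioctl failed, no diagnostic */
};

/* Hypothetical stand-in for ioctl (src_fd, FS_IOC_FIEMAP, fiemap):
   fail on call number FAIL_AT (0-based), succeed otherwise.  */
static int
fake_fiemap_ioctl (unsigned int call, unsigned int fail_at)
{
  return call == fail_at ? -1 : 0;
}

/* Model the loop quoted above, without the new diagnostic: issue
   N_CALLS ioctls and report how the caller would see the result.  */
static enum outcome
classify (unsigned int n_calls, unsigned int fail_at)
{
  bool normal_copy_required = false;
  assert (n_calls > 0);

  for (unsigned int call = 0; call < n_calls; call++)
    {
      if (fake_fiemap_ioctl (call, fail_at) < 0)
        {
          /* Only a failure of the very first call permits a fallback.  */
          if (call == 0)
            normal_copy_required = true;
          return (normal_copy_required
                  ? FALL_BACK_TO_NORMAL_COPY : SILENT_FAILURE);
        }
    }
  return COPY_OK;
}
```

In this model, classify (3, 0) yields FALL_BACK_TO_NORMAL_COPY, while
classify (3, 1) yields SILENT_FAILURE, which is the case that left the
zero-length destination file behind.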

Obviously I will now add code to diagnose the failure,
but do any of you know off hand how to reproduce this
or what the failure might have been?

Here's the patch I plan to merge:

diff --git a/src/copy.c b/src/copy.c
index eb67700..07d605e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -200,6 +200,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
              ok to proceed with a normal copy.  */
           if (i == 0)
             *normal_copy_required = true;
+          else
+            {
+              /* If the second or subsequent ioctl fails, diagnose it,
+                 since it ends up causing the entire copy/cp to fail.  */
+              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
+            }
           return false;
         }
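
For readers following along, the failure handling that this patch aims for can be sketched outside of cp.  This is only a minimal model, not the coreutils code: the function-pointer type `extent_query_fn`, the struct `extent_copy_status`, and the two test doubles are hypothetical names introduced here, and the call index stands in for the `i == 0` test in the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ioctl (src_fd, FS_IOC_FIEMAP, fiemap):
   returns 0 on success, or -1 with errno set on failure.  */
typedef int (*extent_query_fn) (unsigned int call_index);

struct extent_copy_status
{
  bool ok;                    /* true if every query succeeded */
  bool normal_copy_required;  /* true if the caller may fall back */
  bool diagnosed;             /* true if an error message was printed */
};

/* Issue up to N_CALLS queries.  A failure on the first query asks the
   caller to fall back to a normal copy; a later failure is diagnosed,
   since by then part of the destination may already be written.  */
struct extent_copy_status
extent_copy_sketch (extent_query_fn query, unsigned int n_calls)
{
  struct extent_copy_status st = { true, false, false };
  assert (query != NULL);

  for (unsigned int i = 0; i < n_calls; i++)
    {
      if (query (i) < 0)
        {
          st.ok = false;
          if (i == 0)
            st.normal_copy_required = true;
          else
            {
              fprintf (stderr, "FIEMAP ioctl failed: %s\n",
                       strerror (errno));
              st.diagnosed = true;
            }
          break;
        }
    }
  return st;
}

/* Test double: the very first query fails.  */
int
fail_on_first (unsigned int call_index)
{
  (void) call_index;
  errno = ENOTTY;
  return -1;
}

/* Test double: only the second query fails.  */
int
fail_on_second (unsigned int call_index)
{
  if (call_index == 1)
    {
      errno = EIO;
      return -1;
    }
  return 0;
}
```

With `fail_on_first` the sketch reports that a normal copy is acceptable and prints nothing; with `fail_on_second` it prints a diagnostic and reports failure, which is the case that used to be silent.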




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 08 Jun 2010 21:10:02 GMT)

Message #113 received at submit <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 22:57:51 +0200
Eric Blake wrote:
>> +++ b/tests/cp/sparse-fiemap
>> @@ -0,0 +1,56 @@
>> +#!/bin/sh
>> +# Test cp --sparse=always through fiemap copy
>> +
>> +# Copyright (C) 2006-2010 Free Software Foundation, Inc.
>
> How much of this content comes from other files from 2006, vs. new
> content needing only 2010?

Good point.
This started as a largely copied test, but now it's mostly new.
I'm taking this opportunity to make it use init.sh, too.
I'll squash this.  Think of init.sh as the next-generation test-lib.sh.

Thanks.

From c7f9d3d0ff23d72cadd435ceef8d44b7eab7f072 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 8 Jun 2010 22:53:51 +0200
Subject: [PATCH] test-use init.sh

---
 tests/cp/sparse-fiemap |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 3e7c11f..dc0cf60 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -1,7 +1,7 @@
 #!/bin/sh
 # Test cp --sparse=always through fiemap copy

-# Copyright (C) 2006-2010 Free Software Foundation, Inc.
+# Copyright (C) 2010 Free Software Foundation, Inc.

 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -21,7 +21,7 @@ if test "$VERBOSE" = yes; then
   cp --version
 fi

-. $srcdir/test-lib.sh
+. "${srcdir=.}/init.sh"; path_prepend_ ../src

 if df -T -t btrfs -t xfs -t ext4 -t ocfs2 . ; then
   : # Current dir is on a partition with working extents.  Good!
--
1.7.1.501.g23b46




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 00:45:02 GMT)

Message #116 received at submit <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 08 Jun 2010 17:44:25 -0700
Jeff Liu and Jim Meyering wrote:

> diff --git a/src/fiemap.h b/src/fiemap.h
> new file mode 100644
> index 0000000..d33293b
> --- /dev/null
> +++ b/src/fiemap.h

Why is this file necessary?  fiemap.h is included only if it exists,
right?  Shouldn't we just use the kernel's fiemap.h rather than
copying it here and assuming kernel internals?

>          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)

For this sort of thing, please just use "0" rather than "0LL".
"0" is easier to read and it has the same effect here.

>              char buf[buf_size];

This assumes C99, since buf_size is not known at compile time.
Also, isn't there a potential segmentation-violation problem
if buf_size is sufficiently large?

More generally, since the caller is already allocating a buffer of the
appropriate size, shouldn't we just reuse that buffer, rather than
allocating a new one?  That would avoid the problems of assuming
C99 and possible segmentation violations.
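
A minimal sketch of what reusing the caller's buffer could look like, assuming a helper that receives the buffer and its size as parameters instead of declaring `char buf[buf_size]` on the stack.  The names `copy_bytes` and `copy_bytes_demo` are hypothetical and are not taken from copy.c; the demo uses pipes only so that it can run without any particular file system.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Copy LEN bytes from SRC_FD to DST_FD through the caller-supplied
   BUF of BUF_SIZE bytes.  Return false on error or premature EOF.  */
bool
copy_bytes (int src_fd, int dst_fd, char *buf, size_t buf_size,
            uint64_t len)
{
  assert (buf != NULL && buf_size != 0);

  while (len)
    {
      size_t want = len < buf_size ? (size_t) len : buf_size;
      ssize_t n_read = read (src_fd, buf, want);
      if (n_read < 0)
        {
          if (errno == EINTR)
            continue;
          return false;
        }
      if (n_read == 0)
        return false;           /* source ended before LEN bytes */

      /* write may be partial, so loop until the chunk is flushed.  */
      for (ssize_t off = 0; off < n_read; )
        {
          ssize_t n_written = write (dst_fd, buf + off,
                                     (size_t) (n_read - off));
          if (n_written < 0)
            {
              if (errno == EINTR)
                continue;
              return false;
            }
          off += n_written;
        }
      len -= (uint64_t) n_read;
    }
  return true;
}

/* Push a short string through copy_bytes with a deliberately small
   buffer and check that it arrives intact.  */
bool
copy_bytes_demo (void)
{
  static char const msg[] = "sparse file";
  size_t len = sizeof msg - 1;
  char small_buf[4];
  char result[sizeof msg] = { 0 };
  int in_pipe[2];
  int out_pipe[2];

  if (pipe (in_pipe) != 0)
    return false;
  if (pipe (out_pipe) != 0)
    {
      close (in_pipe[0]);
      close (in_pipe[1]);
      return false;
    }

  bool ok = (write (in_pipe[1], msg, len) == (ssize_t) len
             && copy_bytes (in_pipe[0], out_pipe[1],
                            small_buf, sizeof small_buf, len)
             && read (out_pipe[0], result, len) == (ssize_t) len
             && memcmp (result, msg, len) == 0);

  close (in_pipe[0]);
  close (in_pipe[1]);
  close (out_pipe[0]);
  close (out_pipe[1]);
  return ok;
}
```

Because the buffer is owned by the caller, the helper needs no variable-length array, and a large buf_size cannot overflow the stack inside the helper.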


>   char fiemap_buf[4096];
>   struct fiemap *fiemap = (struct fiemap *) fiemap_buf;
>   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>   uint32_t count = ((sizeof fiemap_buf - sizeof (*fiemap))
>                     / sizeof (struct fiemap_extent));

This code isn't portable, since fiemap_buf is only char-aligned, and
struct fiemap may well require stricter alignment.  The code will work
on the x86 despite the alignment problem, but that's just a happy
coincidence.

A lesser point: the code assumes that 'struct fiemap' is sufficiently
small (considerably less than 4096 bytes in size); I expect that this
is universally true but we might as well check this assumption, since
it's easy to do so without any runtime overhead.

So I propose something like this instead:

   union { struct fiemap f; char c[4096]; } fiemap_buf;
   struct fiemap *fiemap = &fiemap_buf.f;
   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
   enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
   verify (count != 0);
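
The same idea as a self-contained sketch that compiles without kernel headers or gnulib.  The assumptions: `struct demo_extent` and `struct demo_map` are hypothetical stand-ins shaped like the kernel's extent and header structs, and C11 `static_assert` plays the role of gnulib's `verify`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one extent record.  */
struct demo_extent
{
  uint64_t logical;
  uint64_t physical;
  uint64_t length;
  uint64_t reserved64[2];
  uint32_t flags;
  uint32_t reserved[3];
};

/* Hypothetical stand-in for the request header, followed by a
   variable number of extent records.  */
struct demo_map
{
  uint64_t start;
  uint64_t length;
  uint32_t flags;
  uint32_t mapped_extents;
  uint32_t extent_count;
  uint32_t reserved;
  struct demo_extent extents[];
};

/* The union takes its alignment from the struct member and its
   size from the char array.  */
union demo_map_buf
{
  struct demo_map f;
  char c[4096];
};

/* Number of extent records that fit after the header, computed at
   compile time.  */
enum
{
  demo_count = ((sizeof (union demo_map_buf) - sizeof (struct demo_map))
                / sizeof (struct demo_extent))
};

static_assert (demo_count != 0, "buffer too small for even one extent");
static_assert (alignof (union demo_map_buf) >= alignof (struct demo_map),
               "buffer is not aligned for the header struct");

/* Zero a buffer, record its capacity in the header, and read the
   value back through the properly aligned struct member.  */
size_t
demo_extent_capacity (void)
{
  union demo_map_buf buf;
  memset (&buf, 0, sizeof buf);
  buf.f.extent_count = demo_count;
  return buf.f.extent_count;
}
```

Both checks happen at compile time, so a too-small or misaligned buffer breaks the build rather than misbehaving at run time.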





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 06:46:02 GMT)

Message #119 received at submit <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 08:27:35 +0200
Paul Eggert wrote:
> Jeff Liu and Jim Meyering wrote:
>> diff --git a/src/fiemap.h b/src/fiemap.h
>> new file mode 100644
>> index 0000000..d33293b
>> --- /dev/null
>> +++ b/src/fiemap.h

Hi Paul,

Thanks for the review.

> Why is this file necessary?  fiemap.h is included only if it exists,
> right?  Shouldn't we just use the kernel's fiemap.h rather than
> copying it here and assuming kernel internals?

The ioctl is available and usable in kernels 2.6.27 and newer, including
some that do not publish this file.  The header is present in F13 (2.6.33)
as /usr/include/linux/fiemap.h, and in debian unstable's 2.6.32, but
probably not in much older kernels.

Hmm..  I see that it's available even in F11's 2.6.30.9-x

It would be better to include <linux/fiemap.h> when present,
else our copy of that header.  Then, eventually, the else
clause will become unused.  Note that even on newer kernels,
the linux/* headers are optional.

Eventually we'll have a hard requirement on kernel headers --
at least when building against a linux kernel.
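
One way the "kernel header when present, else our copy" selection might look, as a sketch only.  Assumptions: `HAVE_LINUX_FIEMAP_H` is the macro that an autoconf header check would define, `__has_include` stands in for that configure-time check here, and the fallback declarations are inlined (following the kernel's layout as I understand it) where coreutils would include its own "fiemap.h".  The two accessor functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the configure-time check: use the compiler's
   __has_include when it exists and nothing was decided already.  */
#if !defined HAVE_LINUX_FIEMAP_H && defined __has_include
# if __has_include(<linux/fiemap.h>)
#  define HAVE_LINUX_FIEMAP_H 1
# endif
#endif

#if HAVE_LINUX_FIEMAP_H
# include <linux/fiemap.h>
# define FIEMAP_HEADER_ORIGIN "kernel"
#else
/* Fallback branch: coreutils would include its bundled copy here.
   The declarations are inlined to keep the sketch self-contained.  */
struct fiemap_extent
{
  uint64_t fe_logical;
  uint64_t fe_physical;
  uint64_t fe_length;
  uint64_t fe_reserved64[2];
  uint32_t fe_flags;
  uint32_t fe_reserved[3];
};

struct fiemap
{
  uint64_t fm_start;
  uint64_t fm_length;
  uint32_t fm_flags;
  uint32_t fm_mapped_extents;
  uint32_t fm_extent_count;
  uint32_t fm_reserved;
  struct fiemap_extent fm_extents[];
};
# define FIEMAP_HEADER_ORIGIN "bundled"
#endif

/* Whichever branch was taken, the header must fit in a 4 KiB buffer.  */
static_assert (sizeof (struct fiemap) <= 4096,
               "struct fiemap larger than expected");

/* Report which definition the build picked up.  */
char const *
fiemap_header_origin (void)
{
  return FIEMAP_HEADER_ORIGIN;
}

/* Size of the fixed part of the request, before any extents.  */
size_t
fiemap_header_bytes (void)
{
  return sizeof (struct fiemap);
}
```

Once every supported build environment ships the kernel header, the fallback branch becomes dead code and can be dropped.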

>>          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
>
> For this sort of thing, please just use "0" rather than "0LL".
> "0" is easier to read and it has the same effect here.

Included in the patch below.

>>              char buf[buf_size];
>
> This assumes C99, since buf_size is not known at compile time.
> Also, isn't there a potential segmentation-violation problem
> if buf_size is sufficiently large?
>
> More generally, since the caller is already allocating a buffer of the
> appropriate size, shouldn't we just reuse that buffer, rather than
> allocating a new one?  That would avoid the problems of assuming
> C99 and possible segmentation violations.

Good point.  Thanks.
We can definitely avoid that allocation.
Do you feel like writing the patch?

I've just pushed this series to a branch, "fiemap-copy",
so others can follow along and contribute more easily.

>>   char fiemap_buf[4096];
>>   struct fiemap *fiemap = (struct fiemap *) fiemap_buf;
>>   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>>   uint32_t count = ((sizeof fiemap_buf - sizeof (*fiemap))
>>                     / sizeof (struct fiemap_extent));
>
> This code isn't portable, since fiemap_buf is only char-aligned, and
> struct fiemap may well require stricter alignment.  The code will work
> on the x86 despite the alignment problem, but that's just a happy
> coincidence.
>
> A lesser point: the code assumes that 'struct fiemap' is sufficiently
> small (considerably less than 4096 bytes in size); I expect that this
> is universally true but we might as well check this assumption, since
> it's easy to do so without any runtime overhead.
>
> So I propose something like this instead:
>
>    union { struct fiemap f; char c[4096]; } fiemap_buf;
>    struct fiemap *fiemap = &fiemap_buf.f;
>    struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>    enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
>    verify (count != 0);

I've done this in your name:

From fffa2e8661a27978927fcc8afb6873631a753292 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Wed, 9 Jun 2010 08:15:07 +0200
Subject: [PATCH] copy.c: ensure proper alignment of fiemap buffer

* src/copy.c (fiemap_copy): Ensure that our fiemap buffer
is large enough and well-aligned.
Replace "0LL" with equivalent "0" in comparisons with lseek's return value.
---
 src/copy.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f629771..27e083a 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -171,11 +171,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
              char const *dst_name, bool *normal_copy_required)
 {
   bool last = false;
-  char fiemap_buf[4096];
-  struct fiemap *fiemap = (struct fiemap *) fiemap_buf;
+  union { struct fiemap f; char c[4096]; } fiemap_buf;
+  struct fiemap *fiemap = &fiemap_buf.f;
   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  uint32_t count = ((sizeof fiemap_buf - sizeof (*fiemap))
-                    / sizeof (struct fiemap_extent));
+  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
+  verify (count != 0);
+
   off_t last_ext_logical = 0;
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;
@@ -185,7 +186,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
   /* This is required at least to initialize fiemap->fm_start,
      but also serves (in mid 2010) to appease valgrind, which
      appears not to know the semantics of the FIEMAP ioctl. */
-  memset (fiemap_buf, 0, sizeof fiemap_buf);
+  memset (&fiemap_buf, 0, sizeof fiemap_buf);

   do
     {
@@ -220,13 +221,13 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
           off_t ext_logical = fm_ext[i].fe_logical;
           uint64_t ext_len = fm_ext[i].fe_length;

-          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
             {
               error (0, errno, _("cannot lseek %s"), quote (src_name));
               return false;
             }

-          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0LL)
+          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
             {
               error (0, errno, _("cannot lseek %s"), quote (dst_name));
               return false;
--
1.7.1.501.g23b46


Also, I've squashed this clean-up patch onto Jeff's original:
since ext_len is unsigned (of type uint64_t), "0 < ext_len" is
equivalent to plain "ext_len".

From bad13e737c683757a2ed05404564d8863c5da30e Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Wed, 9 Jun 2010 08:24:39 +0200
Subject: [PATCH] remove 0 <

---
 src/copy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 27e083a..f149be4 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -240,7 +240,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
               last = true;
             }

-          while (0 < ext_len)
+          while (ext_len)
             {
               char buf[buf_size];

--
1.7.1.501.g23b46




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 07:35:02 GMT)

Message #122 received at submit <at> debbugs.gnu.org:

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>, Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 15:33:18 +0800
Jim Meyering wrote:
> Paul Eggert wrote:
>> Jeff Liu and Jim Meyering wrote:
>>> diff --git a/src/fiemap.h b/src/fiemap.h
>>> new file mode 100644
>>> index 0000000..d33293b
>>> --- /dev/null
>>> +++ b/src/fiemap.h
> 
> Hi Paul,
> 
> Thanks for the review.
> 
>> Why is this file necessary?  fiemap.h is included only if it exists,
>> right?  Shouldn't we just use the kernel's fiemap.h rather than
>> copying it here and assuming kernel internals?
> 
> The ioctl is available and usable in kernels 2.6.27 and newer, including
> some that do not publish this file.  The header is present in F13 (2.6.33)
> as /usr/include/linux/fiemap.h, and in debian unstable's 2.6.32, but
> probably not in much older kernels.
> 
> Hmm..  I see that it's available even in F11's 2.6.30.9-x
> 
> It would be better to include <linux/fiemap.h> when present,
> else our copy of that header.  Then, eventually, the else
> clause will become unused.  Note that even on newer kernels,
> the linux/* headers are optional.
> 
> Eventually we'll have a hard requirement on kernel headers --
> at least when building against a linux kernel.
> 
>>>          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
>> For this sort of thing, please just use "0" rather than "0LL".
>> "0" is easier to read and it has the same effect here.
> 
> Included in the patch below.
> 
>>>              char buf[buf_size];
>> This assumes C99, since buf_size is not known at compile time.
>> Also, isn't there a potential segmentation-violation problem
>> if buf_size is sufficiently large?
>>
>> More generally, since the caller is already allocating a buffer of the
>> appropriate size, shouldn't we just reuse that buffer, rather than
>> allocating a new one?  That would avoid the problems of assuming
>> C99 and possible segmentation violations.
> 
> Good point.  Thanks.
> We can definitely avoid that allocation.
> Do you feel like writing the patch?
> 
> I've just pushed this series to a branch, "fiemap-copy",
> so others can follow along and contribute more easily.
> 
>>>   char fiemap_buf[4096];
>>>   struct fiemap *fiemap = (struct fiemap *) fiemap_buf;
>>>   struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>>>   uint32_t count = ((sizeof fiemap_buf - sizeof (*fiemap))
>>>                     / sizeof (struct fiemap_extent));
>> This code isn't portable, since fiemap_buf is only char-aligned, and
>> struct fiemap may well require stricter alignment.  The code will work
>> on the x86 despite the alignment problem, but that's just a happy
>> coincidence.
>>
>> A lesser point: the code assumes that 'struct fiemap' is sufficiently
>> small (considerably less than 4096 bytes in size); I expect that this
>> is universally true but we might as well check this assumption, since
>> it's easy to do so without any runtime overhead.
>>
>> So I propose something like this instead:
>>
>>    union { struct fiemap f; char c[4096]; } fiemap_buf;
>>    struct fiemap *fiemap = &fiemap_buf.f;
>>    struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>>    enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
>>    verify (count != 0);
Cooool!!

Thanks, Paul and Jim, for sharing this.


Regards,
-Jeff



-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 09:10:02 GMT)

Message #125 received at submit <at> debbugs.gnu.org:

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>, Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 17:07:33 +0800
Jim Meyering wrote:
> Paul Eggert wrote:
>> Jeff Liu and Jim Meyering wrote:
>>> diff --git a/src/fiemap.h b/src/fiemap.h
>>> new file mode 100644
>>> index 0000000..d33293b
>>> --- /dev/null
>>> +++ b/src/fiemap.h
> 
> Hi Paul,
> 
> Thanks for the review.
> 
>> Why is this file necessary?  fiemap.h is included only if it exists,
>> right?  Shouldn't we just use the kernel's fiemap.h rather than
>> copying it here and assuming kernel internals?
> 
> The ioctl is available and usable in kernels 2.6.27 and newer, including
> some that do not publish this file.  The header is present in F13 (2.6.33)
> as /usr/include/linux/fiemap.h, and in debian unstable's 2.6.32, but
> probably not in much older kernels.
> 
> Hmm..  I see that it's available even in F11's 2.6.30.9-x
> 
> It would be better to include <linux/fiemap.h> when present,
> else our copy of that header.  Then, eventually, the else
> clause will become unused.  Note that even on newer kernels,
> the linux/* headers are optional.
> 
> Eventually we'll have a hard requirement on kernel headers --
> at least when building against a linux kernel.
> 
>>>          if (lseek (src_fd, ext_logical, SEEK_SET) < 0LL)
>> For this sort of thing, please just use "0" rather than "0LL".
>> "0" is easier to read and it has the same effect here.
> 
> Included in the patch below.
> 
>>>              char buf[buf_size];
>> This assumes C99, since buf_size is not known at compile time.
>> Also, isn't there a potential segmentation-violation problem
>> if buf_size is sufficiently large?
>>
>> More generally, since the caller is already allocating a buffer of the
>> appropriate size, shouldn't we just reuse that buffer, rather than
>> allocating a new one?  That would avoid the problems of assuming
>> C99 and possible segmentation violations.
> 
> Good point.  Thanks.
> We can definitely avoid that allocation.
> Do you feel like writing the patch?
Thanks for pointing this out!

Hi Paul,

I would appreciate it if you have time to write the patch.  Otherwise, I will follow up sometime in
the next few days, since I have an urgent task to handle at present.


Regards,
-Jeff







Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 14:49:02 GMT)

Message #128 received at submit <at> debbugs.gnu.org:

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 22:46:20 +0800
Jim Meyering wrote:
> Jim Meyering wrote:
>> Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy
> 
> FYI, using those patches, I ran a test for the first time in a few days:
> 
>     make check -C tests TESTS=cp/sparse-fiemap VERBOSE=yes
> 
> It failed like this on an ext4 partition using F13:
> 
>     + timeout 10 cp --sparse=always sparse fiemap
>     + fail=1
>     ++ stat --printf %s sparse
>     ++ stat --printf %s fiemap
>     + test 1099511628800 = 0
>     + fail=1
> 
> That is very odd.  No diagnostic from cp, yet it failed
> after creating a zero-length file.
> 
> Here's the corresponding piece of the script:
> 
>     # It takes many minutes to copy this sparse file using the old method.
>     # By contrast, it takes far less than 1 second using FIEMAP-copy.
>     timeout 10 cp --sparse=always sparse fiemap || fail=1
> 
>     # Ensure that the sparse file copied through fiemap has the same size
>     # in bytes as the original.
>     test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
> 
> However, so far I've been unable to reproduce the failure,
> running hundreds of iterations:
> 
>     for i in $(seq 300); do printf .; make check -C tests \
>       TESTS=cp/sparse-fiemap VERBOSE=yes >& makerr-$i || break; done
> 
> Have any of you heard of a problem whereby a cold cache can cause
> such a thing?  "echo 3 > /proc/sys/vm/drop_caches" didn't help.
Hi Jim,

Have you run `sync' before cleaning the buffers and caches?  Even if `sync' is run first,
the dirty objects sometimes still cannot be freed. :(

I can reproduce this issue on ext4 and btrfs (physically mounted partitions), either by hand or
just by running the sparse-fiemap test script; ocfs2 always works fine in this case.

I guess this issue might be caused by the 'cold cache' you mentioned above.
In my experiments, after cleaning out the caches, cp via fiemap always works in my test
environment; otherwise, it fails from time to time.

My kernel version:
Linux jeff-laptop 2.6.33-rc5-00238-gb04da8b-dirty #11 SMP Sat Dec 19 22:02:01 CST 2009 i686 GNU/Linux

jeff <at> jeff-laptop:/ext4$ dd if=/dev/zero of=sparse bs=1k count=1 seek=1G
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000156654 s, 6.5 MB/s
jeff <at> jeff-laptop:/ext4$ ls -l sparse
-rw-r--r-- 1 jeff jeff 1099511628800 Jun  9 22:21 sparse
jeff <at> jeff-laptop:/ext4$ filefrag sparse
sparse: 0 extents found
jeff <at> jeff-laptop:/ext4$ filefrag -v sparse
Filesystem type is: ef53
File size of sparse is 1099511628800 (268435457 blocks, blocksize 4096)
 ext  logical physical expected length flags
sparse: 1 extent found
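
The size reported by ls follows from the dd arguments: seek=1G with bs=1k skips 2^30 * 1024 = 1099511627776 bytes, and the single 1 KiB block written after the hole brings the total to 1099511628800.  A minimal C sketch of the same layout, scaled down to a 1 MiB hole; the names `make_sparse_tmpfile` and `sparse_demo` are hypothetical, and a writable /tmp is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file made of a hole of HOLE_BYTES
   followed by DATA_BYTES zero bytes written explicitly, as dd does
   with seek= and if=/dev/zero.  Store its metadata in *ST.  */
bool
make_sparse_tmpfile (off_t hole_bytes, size_t data_bytes, struct stat *st)
{
  char name[] = "/tmp/sparse-demo-XXXXXX";
  char data[1024] = { 0 };

  assert (st != NULL && data_bytes <= sizeof data);

  int fd = mkstemp (name);
  if (fd < 0)
    return false;
  unlink (name);                /* file vanishes once fd is closed */

  /* Writing past the current end of file leaves a hole before the
     data on file systems that support sparse files.  */
  bool ok = (pwrite (fd, data, data_bytes, hole_bytes)
             == (ssize_t) data_bytes
             && fstat (fd, st) == 0);
  close (fd);
  return ok;
}

/* The apparent size must be hole + data, whether or not the file
   system actually stored the hole sparsely.  */
bool
sparse_demo (void)
{
  struct stat st;
  off_t hole = (off_t) 1024 * 1024;
  return (make_sparse_tmpfile (hole, 1024, &st)
          && st.st_size == hole + 1024);
}
```

The sketch checks only st_size; how many blocks are really allocated (st_blocks) depends on the file system, which is exactly what filefrag is being used to inspect in this session.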

To free the buffer cache:
=========================
jeff <at> jeff-laptop:/ext4$ free
             total       used       free     shared    buffers     cached
Mem:       1980300     719972    1260328          0       2836      94104
-/+ buffers/cache:     623032    1357268
Swap:            0          0          0
jeff <at> jeff-laptop:/ext4$ sync

In another root console, run 'echo 3 > /proc/sys/vm/drop_caches'

jeff <at> jeff-laptop:/ext4$ free
             total       used       free     shared    buffers     cached
Mem:       1980300     716780    1263520          0       1184      88592   <<<<<-----freed
-/+ buffers/cache:     627004    1353296
Swap:            0          0          0

jeff <at> jeff-laptop:/ext4$ filefrag -v sparse
Filesystem type is: ef53
File size of sparse is 1099511628800 (268435457 blocks, blocksize 4096)
 ext  logical physical expected length flags
   0 268435456    32999               1 eof
sparse: 2 extents found

jeff <at> jeff-laptop:/ext4$ ./cp --sparse=always sparse f1
last_ext_logical 1099511627776 last_read_size 1024 src_total_size 1099511628800
jeff <at> jeff-laptop:/ext4$ filefrag -v f1
Filesystem type is: ef53
File size of f1 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext  logical physical expected length flags
   0 268435456   296960               1 eof
f1: 2 extents found


jeff <at> jeff-laptop:/ext4$ ./cp --sparse=always sparse f2
last_ext_logical 1099511627776 last_read_size 1024 src_total_size 1099511628800

jeff <at> jeff-laptop:/ext4$ filefrag -v f2
Filesystem type is: ef53
File size of f2 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext  logical physical expected length flags
f2: 1 extent found

jeff <at> jeff-laptop:/ext4$ sync and 'clean memory via /proc on another root console'

jeff <at> jeff-laptop:/ext4$ filefrag -v f2
Filesystem type is: ef53
File size of f2 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext  logical physical expected length flags
   0 268435456    33379               1 eof
f2: 2 extents found


I will double-check my original patch to make sure this issue is not caused by a code bug, once I
get through an urgent task at hand.

Thanks,
-Jeff

> I suspect that having so many extents is unusual, so maybe
> this is a rarely exercised corner case.
> 
> ===============================
> As I wrote the above, I realized I probably had enough
> information to deduce where things were going wrong, even
> if so far I've been unable to reproduce it.
> 
> And sure enough.  There is a way to provoke exactly
> that failure.  If the *second* (or later) FIEMAP ioctl fails:
> 
>   do
>     {
>       fiemap->fm_length = FIEMAP_MAX_OFFSET;
>       fiemap->fm_extent_count = count;
> 
>       /* When ioctl(2) fails, fall back to the normal copy only if it
>          is the first time we met.  */
>       if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
>         {
>           /* If the first ioctl fails, tell the caller that it is
>              ok to proceed with a normal copy.  */
>           if (i == 0)
>             *normal_copy_required = true;
>           return false;
>         }
> 
> In that case, fiemap_copy returns false (with no diagnostic)
> and cp fails silently.
> 
> Obviously I will now add code to diagnose the failure,
> but do any of you know off hand how to reproduce this
> or what the failure might have been?
> 
> Here's the patch I plan to merge:
> 
> diff --git a/src/copy.c b/src/copy.c
> index eb67700..07d605e 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -200,6 +200,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>               ok to proceed with a normal copy.  */
>            if (i == 0)
>              *normal_copy_required = true;
> +          else
> +            {
> +              /* If the second or subsequent ioctl fails, diagnose it,
> +                 since it ends up causing the entire copy/cp to fail.  */
> +              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
> +            }
>            return false;
>          }


-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 09 Jun 2010 15:13:02 GMT) Full text and rfc822 format available.

Message #131 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 17:12:07 +0200
jeff.liu wrote:

> Jim Meyering wrote:
>> Jim Meyering wrote:
>>> Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy
>>
>> FYI, using those patches, I ran a test for the first time in a few days:
>>
>>     check -C tests TESTS=cp/sparse-fiemap VERBOSE=yes
>>
>> It failed like this on an ext4 partition using F13:
>>
>>     + timeout 10 cp --sparse=always sparse fiemap
>>     + fail=1
>>     ++ stat --printf %s sparse
>>     ++ stat --printf %s fiemap
>>     + test 1099511628800 = 0
>>     + fail=1
>>
>> That is very odd.  No diagnostic from cp, yet it failed
>> after creating a zero-length file.
>>
>> Here's the corresponding piece of the script:
>>
>>     # It takes many minutes to copy this sparse file using the old method.
>>     # By contrast, it takes far less than 1 second using FIEMAP-copy.
>>     timeout 10 cp --sparse=always sparse fiemap || fail=1
>>
>>     # Ensure that the sparse file copied through fiemap has the same size
>>     # in bytes as the original.
>>     test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
>>
>> However, so far I've been unable to reproduce the failure,
>> running hundreds of iterations:
>>
>>     for i in $(seq 300); do printf .; make check -C tests \
>>       TESTS=cp/sparse-fiemap VERBOSE=yes >& makerr-$i || break; done
>>
>> Have any of you heard of a problem whereby a cold cache can cause
>> such a thing?  "echo 3 > /proc/sys/vm/drop_caches" didn't help.
> Hi Jim,
>
> Have you run `sync' before clean the buffer and caches?  Actually, even run `sync' first, sometimes,
> maybe the dirty objects still can not be freed in some cases. :(

Hi Jeff,

Thanks for the feedback.

The test script I ran above does not invoke sync between
the dd invocation and the cp --sparse.
If running sync before cp is required in order to avoid a failure,
then I consider that a bug in cp.

> I can reproduce this issue on ext4 and btrfs(physical mounted partition) or just run the

If you can reproduce the cp failure, then please use my latest
patch that diagnoses 2nd and subsequent FIEMAP ioctl failure
and tell me what cp prints when it fails.

I would really like to know how the first ioctl can succeed,
yet the second one fails.

Your demonstrations show that *after* the cp, one may have
to sync in order to see precisely if/how the newly-created
destination file is fragmented, but that is nothing new.
Note the commented-out "sync" uses in the test script.
As far as I can see, your examples do not show cp failing
like it did for me.
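
To make the first-versus-later ioctl distinction concrete, here is a minimal standalone sketch. It is not the code in copy.c: the names walk_extents, the extmap_status values and the batch size of 32 are assumptions made for illustration. It walks the extent map in batches, treats a failure of the first FS_IOC_FIEMAP call as "fall back to a normal copy", and prints a diagnostic for a failure of any later call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* */

/* Hypothetical result codes, not taken from copy.c.  */
enum extmap_status { EXTMAP_OK, EXTMAP_FALLBACK, EXTMAP_ERROR };

/* Walk the extent map of FD in batches and count the extents.
   A failure of the first ioctl means "use a normal copy";
   a failure of any later ioctl is diagnosed, never silent.  */
static enum extmap_status
walk_extents (int fd, const char *name, unsigned long *n_extents)
{
  enum { BATCH = 32 };
  size_t size = sizeof (struct fiemap) + BATCH * sizeof (struct fiemap_extent);
  struct fiemap *fm;
  unsigned long long start = 0;
  unsigned int calls = 0;
  enum extmap_status status = EXTMAP_OK;

  assert (name != NULL && n_extents != NULL);
  *n_extents = 0;
  fm = malloc (size);
  if (fm == NULL)
    {
      fprintf (stderr, "%s: out of memory\n", name);
      return EXTMAP_ERROR;
    }

  for (;;)
    {
      memset (fm, 0, size);
      fm->fm_start = start;
      fm->fm_length = FIEMAP_MAX_OFFSET - start;
      fm->fm_extent_count = BATCH;

      if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
        {
          if (calls == 0)
            status = EXTMAP_FALLBACK;
          else
            {
              fprintf (stderr, "%s: FIEMAP ioctl #%u failed: %s\n",
                       name, calls + 1, strerror (errno));
              status = EXTMAP_ERROR;
            }
          break;
        }
      calls++;

      if (fm->fm_mapped_extents == 0)
        break;
      *n_extents += fm->fm_mapped_extents;

      const struct fiemap_extent *last
        = &fm->fm_extents[fm->fm_mapped_extents - 1];
      if (last->fe_flags & FIEMAP_EXTENT_LAST)
        break;
      /* Resume the next batch right after the last extent seen.  */
      start = last->fe_logical + last->fe_length;
    }

  free (fm);
  return status;
}
```

With a batch of 32, a file needs more than 32 extents before a second ioctl is issued at all, so a file with a single extent cannot exercise the later-failure path.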




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 10 Jun 2010 06:58:02 GMT) Full text and rfc822 format available.

Message #134 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 10 Jun 2010 14:56:57 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> 
>> Jim Meyering wrote:
>>> Jim Meyering wrote:
>>>> Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy
>>> FYI, using those patches, I ran a test for the first time in a few days:
>>>
>>>     check -C tests TESTS=cp/sparse-fiemap VERBOSE=yes
>>>
>>> It failed like this on an ext4 partition using F13:
>>>
>>>     + timeout 10 cp --sparse=always sparse fiemap
>>>     + fail=1
>>>     ++ stat --printf %s sparse
>>>     ++ stat --printf %s fiemap
>>>     + test 1099511628800 = 0
>>>     + fail=1
>>>
>>> That is very odd.  No diagnostic from cp, yet it failed
>>> after creating a zero-length file.
>>>
>>> Here's the corresponding piece of the script:
>>>
>>>     # It takes many minutes to copy this sparse file using the old method.
>>>     # By contrast, it takes far less than 1 second using FIEMAP-copy.
>>>     timeout 10 cp --sparse=always sparse fiemap || fail=1
>>>
>>>     # Ensure that the sparse file copied through fiemap has the same size
>>>     # in bytes as the original.
>>>     test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
>>>
>>> However, so far I've been unable to reproduce the failure,
>>> running hundreds of iterations:
>>>
>>>     for i in $(seq 300); do printf .; make check -C tests \
>>>       TESTS=cp/sparse-fiemap VERBOSE=yes >& makerr-$i || break; done
>>>
>>> Have any of you heard of a problem whereby a cold cache can cause
>>> such a thing?  "echo 3 > /proc/sys/vm/drop_caches" didn't help.
>> Hi Jim,
>>
>> Have you run `sync' before clean the buffer and caches?  Actually, even run `sync' first, sometimes,
>> maybe the dirty objects still can not be freed in some cases. :(
> 
> Hi Jeff,
> 
> Thanks for the feedback.
> 
> The test script I ran above does not invoke sync between
> the dd invocation and the cp --sparse.
> If running sync before cp is required in order to avoid a failure,
> then I consider that a bug in cp.
> 
>> I can reproduce this issue on ext4 and btrfs(physical mounted partition) or just run the
> 
> If you can reproduce the cp failure, then please use my latest
> patch that diagnoses 2nd and subsequent FIEMAP ioctl failure
> and tell me what cp prints when it fails.
Strange!  Today I cannot reproduce this issue on my laptop, even after repeating the test script up to 300 times.

Yesterday I also ran the test against your fiemap-copy git tree with the latest diagnostic patch, but it
did not show the 2nd ioctl failure message, because that 1TB sparse file has only one extent
allocated on ext4, so the ioctl(2) is issued only once.

> 
> I would really like to know how the first ioctl can succeed,
> yet the second one fails.
> 
> Your demonstrations show that *after* the cp, one may have
> to sync in order to see precisely if/how the newly-created
> destination file is fragmented, but that is nothing new.
> Note the commented-out "sync" uses in the test script.
> As far as I can see, your examples do not show cp failing
> like it did for me.
Yeah, I just realized that the behaviour I observed is caused by the delayed allocation mechanism of
that particular FS.


Thanks,
-Jeff
> 
> 
> 






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 10 Jun 2010 23:48:02 GMT) Full text and rfc822 format available.

Message #137 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 10 Jun 2010 16:47:23 -0700
On 06/09/2010 11:56 PM, jeff.liu wrote:

> Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
> the particular FS.

If the file system is using delayed allocation, then can
the fiemap ioctl tell us that a file contains a hole (because nothing has been
allocated there), but read() would tell us that the file contains nonzero data at the same location
(because it's sitting in a buffer somewhere)?  If so, we'd need to do something like invoke
fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
there's another ioctl that will do the allocation without having to actually do a sync.

There's also the issue of copying from a file at the same time that some other process
is writing to it, but that is allowed to produce ill-defined behavior.  I'm more worried
about the case where some other process writes to the source file just before 'cp' starts.

(Sorry, I haven't had time yet to dive into the proposed change; I'm still trying to understand
the environment.)

One other thing: Solaris 10 supports lseek with the SEEK_HOLE and SEEK_DATA options, which
are easier to use and which (as far as I can tell from the manual) shouldn't require anything
fdatasync-ish.  Any objection if I propose support for that too?  It is supposed to work
with ZFS, something I can test here.
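
For readers unfamiliar with the interface, here is a minimal sketch of how SEEK_DATA and SEEK_HOLE can enumerate the data regions of a file. The helper name count_data_bytes is an assumption for illustration, and the fallback numeric values 3 and 4 are the Linux ones, used only if the headers hide the macros. Whether a given system and file system support these whence values is not something this sketch can guarantee.

```c
#define _GNU_SOURCE        /* exposes SEEK_DATA and SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA          /* assumption: Linux values, if headers hide them */
# define SEEK_DATA 3
# define SEEK_HOLE 4
#endif

/* Hypothetical helper: store in *TOTAL the number of bytes of FD that
   lseek reports as data.  Return 0 on success, -1 on error.  */
static int
count_data_bytes (int fd, long long *total)
{
  off_t pos = 0;
  off_t end;

  assert (total != NULL);
  *total = 0;
  end = lseek (fd, 0, SEEK_END);
  if (end < 0)
    {
      perror ("lseek");
      return -1;
    }

  while (pos < end)
    {
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno == ENXIO)   /* only a hole remains before EOF */
            break;
          perror ("lseek (SEEK_DATA)");
          return -1;
        }
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole <= data)         /* error, or no forward progress */
        return -1;
      *total += hole - data;    /* [data, hole) is one data region */
      pos = hole;
    }
  return 0;
}
```

A copy loop would read and write each [data, hole) range instead of just summing the lengths. A file system without hole tracking may report the whole file as one data region, which is still correct, only slower.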




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 00:31:01 GMT) Full text and rfc822 format available.

Message #140 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 10 Jun 2010 17:28:35 -0700
On Thu, Jun 10, 2010 at 04:47:23PM -0700, Paul Eggert wrote:
> On 06/09/2010 11:56 PM, jeff.liu wrote:
> 
> > Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
> > the particular FS.
> 
> If the file system is using delayed allocation, then can
> the fiemap ioctl tell us that a file contains a hole (because nothing has been
> allocated there), but read() would tell us that the file contains nonzero data at the same location
> (because it's sitting in a buffer somewhere)?  If so, we'd need to do something like invoke
> fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
> there's another ioctl that will do the allocation without having to actually do a sync.

	I can think of a couple things, like returning an extent record
with a zero block number: "I have an extent here, but I don't have
physical storage."  Or having a filesystem's fiemap call force the
allocation before returning.
	What say we copy linux-ext4 here?

Joel

-- 

"If you took all of the grains of sand in the world, and lined
 them up end to end in a row, you'd be working for the government!"
	- Mr. Interesting

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 00:33:01 GMT) Full text and rfc822 format available.

Message #143 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sunil Mushran <sunil.mushran <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 10 Jun 2010 17:31:57 -0700
On 06/10/2010 04:47 PM, Paul Eggert wrote:
> On 06/09/2010 11:56 PM, jeff.liu wrote:
>    
>> Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
>> the particular FS.
>>      
> If the file system is using delayed allocation, then can
> the fiemap ioctl tell us that a file contains a hole (because nothing has been
> allocated there), but read() would tell us that the file contains nonzero data at the same location
> (because it's sitting in a buffer somewhere)?  If so, we'd need to do something like invoke
> fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
> there's another ioctl that will do the allocation without having to actually do a sync.
>    

I guess we'll have to use FIEMAP_FLAG_SYNC.
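
As a rough illustration of that flag, the sketch below requests the first extent of a file and optionally sets FIEMAP_FLAG_SYNC in fm_flags, which asks the kernel to sync the file before mapping it. The helper name first_extent and its sync_first parameter are assumptions for illustration, not part of any patch in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/* Hypothetical helper: fetch the first extent of FD.
   Return 1 and fill *EXT when an extent exists, 0 when the map
   is empty, and -1 when the ioctl (or the allocation) fails.  */
static int
first_extent (int fd, int sync_first, struct fiemap_extent *ext)
{
  size_t size = sizeof (struct fiemap) + sizeof (struct fiemap_extent);
  struct fiemap *fm;
  int result;

  assert (ext != NULL);
  fm = calloc (1, size);
  if (fm == NULL)
    return -1;

  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;
  fm->fm_flags = sync_first ? FIEMAP_FLAG_SYNC : 0;
  fm->fm_extent_count = 1;

  if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
    result = -1;
  else if (fm->fm_mapped_extents == 0)
    result = 0;
  else
    {
      *ext = fm->fm_extents[0];
      result = 1;
    }
  free (fm);
  return result;
}
```

Calling it twice on a freshly written file, first with sync_first = 0 and then with sync_first = 1, and comparing the fe_flags and fe_physical values it returns is one way to see what a particular file system reports before and after the sync.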

> There's also the issue of copying from a file at the same time that some other process
> is writing to it, but that is allowed to produce ill-defined behavior.  I'm more worried
> about the case where some other process writes to the source file just before 'cp' starts.
>    

cp's behavior with active files is undefined. But we know it reads from
offset 0 to MAX. With fiemap it will continue to do the same with the
exception that it will skip reads (and thus writes) depending on the extent
map it gets at the very beginning.
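
A minimal sketch of that "skip reads and writes outside the extent map" behavior follows. It assumes the byte ranges have already been obtained from somewhere (the extent map, in the case discussed here); struct byte_range and copy_ranges are hypothetical names, and this is not the loop used in copy.c.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one data range of the source file.  */
struct byte_range { off_t start; off_t length; };

/* Copy only the listed ranges from SRC_FD to DST_FD at the same
   offsets, then set the destination size to SRC_SIZE so that any
   trailing hole is preserved.  Return 0 on success, -1 on error.  */
static int
copy_ranges (int src_fd, int dst_fd, const struct byte_range *ranges,
             size_t n_ranges, off_t src_size)
{
  char buf[65536];

  assert (ranges != NULL || n_ranges == 0);
  for (size_t i = 0; i < n_ranges; i++)
    {
      off_t pos = ranges[i].start;
      off_t end = ranges[i].start + ranges[i].length;

      while (pos < end)
        {
          size_t want = (end - pos < (off_t) sizeof buf
                         ? (size_t) (end - pos) : sizeof buf);
          ssize_t got = pread (src_fd, buf, want, pos);
          if (got < 0)
            {
              perror ("pread");
              return -1;
            }
          if (got == 0)         /* source is shorter than the map said */
            break;

          for (ssize_t done = 0; done < got; )
            {
              ssize_t put = pwrite (dst_fd, buf + done, got - done,
                                    pos + done);
              if (put <= 0)
                {
                  perror ("pwrite");
                  return -1;
                }
              done += put;
            }
          pos += got;
        }
    }
  /* Everything between the ranges was never written: it stays a hole.  */
  return ftruncate (dst_fd, src_size);
}
```

Because the ranges are fixed before copying starts, data written to the source outside those ranges after the map was taken is not copied, which is the behavior described above for active files.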

> (Sorry, I haven't had time yet to dive into the proposed change; I'm still trying to understand
> the environment.)
>
> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and SEEK_DATA options, which
> are easier to use and which (as far as I can tell from the manual) shouldn't require anything
> fdatasync-ish.  Any objection if I propose support for that too?  It is supposed to work
> with ZFS, something I can test here.
>    

There is no plan to implement SEEK_HOLE/SEEK_DATA in the kernel.
At most glibc will use fiemap to extend lseek(). BTW, SEEK_HOLE/DATA
also have the same problem with active files.

ccing linux-ext4.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 00:51:02 GMT) Full text and rfc822 format available.

Message #146 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tao Ma <tao.ma <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 11 Jun 2010 08:35:00 +0800
Paul Eggert wrote:
> On 06/09/2010 11:56 PM, jeff.liu wrote:
>
>   
>> Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
>> the particular FS.
>>     
>
> If the file system is using delayed allocation, then can
> the fiemap ioctl tell us that a file contains a hole (because nothing has been
> allocated there), but read() would tell us that the file contains nonzero data at the same location
> (because it's sitting in a buffer somewhere)?  If so, we'd need to do something like invoke
> fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
> there's another ioctl that will do the allocation without having to actually do a sync.
>   
Actually, fiemap has a flag, FIEMAP_EXTENT_DELALLOC, and in this case the
file system should return a fiemap extent with this flag set.  I just did a
simple test with ext4, and it looks like it has a problem with sparse files.
For a file created like this:
dd if=/dev/zero of=/mnt/ext4/a bs=1M count=1
we get a fiemap extent with DELALLOC successfully,
while with
dd if=/dev/zero of=/mnt/ext4/a bs=1M count=1 seek=1
ext4 does not return a valid fiemap extent.
I don't have time to dive into it yet.
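
For reference, a small sketch of how a caller could test for that flag on an extent returned by the ioctl, and print the flags it recognizes. The helper names and the printed labels are this sketch's own (they are not claimed to match filefrag's output), and only three of the kernel's extent flags are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <linux/fiemap.h>  /* struct fiemap_extent, FIEMAP_EXTENT_* */

/* Hypothetical helper: write a comma-separated description of the
   extent FLAGS this sketch knows about into BUF (SIZE bytes).  */
static const char *
describe_extent_flags (unsigned int flags, char *buf, size_t size)
{
  static const struct { unsigned int bit; const char *name; } known[] = {
    { FIEMAP_EXTENT_UNKNOWN,  "unknown"  },
    { FIEMAP_EXTENT_DELALLOC, "delalloc" },
    { FIEMAP_EXTENT_LAST,     "last"     },
  };

  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
    if (flags & known[i].bit)
      {
        size_t used = strlen (buf);
        /* snprintf truncates safely if BUF is too small.  */
        snprintf (buf + used, size - used, "%s%s",
                  used ? "," : "", known[i].name);
      }
  return buf;
}

/* True when the extent's data has no on-disk location yet.  */
static int
extent_is_delalloc (const struct fiemap_extent *ext)
{
  return (ext->fe_flags & FIEMAP_EXTENT_DELALLOC) != 0;
}
```

A copy routine that sees extent_is_delalloc return true still has to read that range through the page cache, since fe_physical is not meaningful for such an extent.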
> There's also the issue of copying from a file at the same time that some other process
> is writing to it, but that is allowed to produce ill-defined behavior.  I'm more worried
> about the case where some other process writes to the source file just before 'cp' starts.
>   
If the file system returns the right DELALLOC fiemap extent, I guess there
should be no problem in that case.
> (Sorry, I haven't had time yet to dive into the proposed change; I'm still trying to understand
> the environment.)
>
> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and SEEK_DATA options, which
> are easier to use and which (as far as I can tell from the manual) shouldn't require anything
> fdatasync-ish.  Any objection if I propose support for that too?  It is supposed to work
> with ZFS, something I can test here.
>   
I am afraid not. This was discussed a long time ago.
See this article:
http://lwn.net/Articles/260795/
So it looks like SEEK_HOLE and SEEK_DATA are not welcome in the kernel.

Regards,
Tao





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 08:34:02 GMT) Full text and rfc822 format available.

Message #149 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	Jim Meyering <jim <at> meyering.net>
Cc: Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 11 Jun 2010 16:31:13 +0800
Sunil Mushran wrote:
> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>   
>>> Yeah, I just realized that the behaviour I observed is caused by the
>>> delay allocation mechanism of
>>> the particular FS.
>>>      
>> If the file system is using delayed allocation, then can
>> the fiemap ioctl tell us that a file contains a hole (because nothing
>> has been
>> allocated there), but read() would tell us that the file contains
>> nonzero data at the same location
>> (because it's sitting in a buffer somewhere)?  If so, we'd need to do
>> something like invoke
>> fdatasync() on the file before issuing the fiemap ioctl, to force
>> allocation; or perhaps
>> there's another ioctl that will do the allocation without having to
>> actually do a sync.
>>    
> 
> I guess we'll have to use FIEMAP_FLAG_SYNC.
Hi Sunil,

Thanks for the comments.
So this way we can ensure that the source file is synced before mapping.

Hi Jim and Paul,

How about the tiny patch below?

From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 11 Jun 2010 16:29:02 +0800
Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl

* src/copy.c (fiemap_copy): Force kernel to sync the source
file before mapping.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f149be4..f48c74d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
   do
     {
       fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
       fiemap->fm_extent_count = count;

       /* When ioctl(2) fails, fall back to the normal copy only if it
-- 
1.5.4.3



Thanks,
-Jeff

> 
>> There's also the issue of copying from a file at the same time that
>> some other process
>> is writing to it, but that is allowed to produce ill-defined
>> behavior.  I'm more worried
>> about the case where some other process writes to the source file just
>> before 'cp' starts.
>>    
> 
> cp's behavior with active files is undefined. But we know it reads from
> offset 0 to MAX. With fiemap it will continue to do the same with the
> exception that it will skip reads (and thus writes) depending on the extent
> map it gets at the very beginning.
> 
>> (Sorry, I haven't had time yet to dive into the proposed change; I'm
>> still trying to understand
>> the environment.)
>>
>> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and
>> SEEK_DATA options, which
>> are easier to use and which (as far as I can tell from the manual)
>> shouldn't require anything
>> fdatasync-ish.  Any objection if I propose support for that too?  It
>> is supposed to work
>> with ZFS, something I can test here.
>>    
> 
> There is no plan to implement SEEK_HOLE/SEEK_DATA in the kernel.
> At most glibc will use fiemap to extend lseek(). BTW, SEEK_HOLE/DATA
> also have the same problem with active files.
> 
> ccing linux-ext4.
> 
> 
> 






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 12:44:01 GMT) Full text and rfc822 format available.

Message #152 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 11 Jun 2010 14:38:08 +0200
jeff.liu wrote:
> Sunil Mushran wrote:
...
>> I guess we'll have to use FIEMAP_FLAG_SYNC.
> Hi Sunil,
>
> Thanks for the comments.
> So we can ensure the source file synced before mapping in this way.
>
> Hi Jim and Paul,
>
> How about the tiny patch below?
...
> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
>
> * src/copy.c (fiemap_copy): Force kernel to sync the source
> file before mapping.
>
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  src/copy.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/src/copy.c b/src/copy.c
> index f149be4..f48c74d 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>    do
>      {
>        fiemap->fm_length = FIEMAP_MAX_OFFSET;
> +      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
>        fiemap->fm_extent_count = count;

Thanks to both of you.
That patch looks fine, Jeff.
However, at least with ext4 and fedora 13, I require
this change to pass "make check".  Note the new entry
in the "flags" column:

    ==> ff2 <==
    Filesystem type is: ef53
    File size of j2 is 2048 (1 block, blocksize 4096)
     ext logical physical expected length flags
       0       0        0               1 unknown,delalloc,eof
    j2: 1 extent found

So I will apply the test-fixing patch first, then your change,
Jeff, and then rebase to the latest on master.

From 484903dc41246cb3c774f178f695725561b105a0 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 11 Jun 2010 14:34:03 +0200
Subject: [PATCH 1/2] tests: accommodate varying filefrag -v "flags" output

* tests/cp/sparse-fiemap: Accommodate values other than "eof"
in the "flags" column of filefrag -v output
---
 tests/cp/sparse-fiemap |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index dc0cf60..b6b1103 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -68,10 +68,11 @@ $PERL -e 1 || skip_test_ 'skipping part of this test; you lack perl'

 # Extract logical block number and length pairs from filefrag -v output.
 # The initial sed is to remove the "eof" from the normally-empty "flags" field.
+# Similarly, remove flags values like "unknown,delalloc,eof".
 # That is required when that final extent has no number in the "expected" field.
 f()
 {
-  sed 's/ eof$//' $@ \
+  sed 's/ [a-z,][a-z,]*$//' $@ \
     | awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
 }

--
1.7.1.501.g23b46




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 14:53:03 GMT) Full text and rfc822 format available.

Message #155 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 11 Jun 2010 16:52:40 +0200
Paul Eggert wrote:
> On 06/09/2010 11:56 PM, jeff.liu wrote:
>
>> Yeah, I just realized that the behaviour I observed is caused by the delay allocation mechanism of
>> the particular FS.
>
> If the file system is using delayed allocation, then can
> the fiemap ioctl tell us that a file contains a hole (because nothing has been
> allocated there), but read() would tell us that the file contains nonzero data at the same location
> (because it's sitting in a buffer somewhere)?  If so, we'd need to do something like invoke
> fdatasync() on the file before issuing the fiemap ioctl, to force allocation; or perhaps
> there's another ioctl that will do the allocation without having to actually do a sync.
>
> There's also the issue of copying from a file at the same time that some other process
> is writing to it, but that is allowed to produce ill-defined behavior.  I'm more worried
> about the case where some other process writes to the source file just before 'cp' starts.
>
> (Sorry, I haven't had time yet to dive into the proposed change; I'm still trying to understand
> the environment.)
>
> One other thing: Solaris 10 supports lseek with the SEEK_HOLE and SEEK_DATA options, which
> are easier to use and which (as far as I can tell from the manual) shouldn't require anything
> fdatasync-ish.  Any objection if I propose support for that too?  It is supposed to work
> with ZFS, something I can test here.

Sure, having an implementation that also works on ZFS would be welcome.
I updated (non-fast-forward) the fiemap-copy branch an hour or so ago.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 11 Jun 2010 15:56:02 GMT) Full text and rfc822 format available.

Message #158 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Sandeen <sandeen <at> redhat.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 11 Jun 2010 09:03:29 -0500
jeff.liu wrote:
> Sunil Mushran wrote:
>> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>>   
>>>> Yeah, I just realized that the behaviour I observed is caused by the
>>>> delay allocation mechanism of
>>>> the particular FS.
>>>>      
>>> If the file system is using delayed allocation, then can
>>> the fiemap ioctl tell us that a file contains a hole (because nothing
>>> has been
>>> allocated there), but read() would tell us that the file contains
>>> nonzero data at the same location
>>> (because it's sitting in a buffer somewhere)?  If so, we'd need to do
>>> something like invoke
>>> fdatasync() on the file before issuing the fiemap ioctl, to force
>>> allocation; or perhaps
>>> there's another ioctl that will do the allocation without having to
>>> actually do a sync.
>>>    
>> I guess we'll have to use FIEMAP_FLAG_SYNC.
> Hi Sunil,
> 
> Thanks for the comments.
> So we can ensure the source file synced before mapping in this way.
> 
> Hi Jim and Paul,
> 
> How about the tiny patch below?

I agree that this is needed, thanks.

-Eric

> From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Fri, 11 Jun 2010 16:29:02 +0800
> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
> 
> * src/copy.c (fiemap_copy): Force kernel to sync the source
> file before mapping.
> 
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  src/copy.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/src/copy.c b/src/copy.c
> index f149be4..f48c74d 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>    do
>      {
>        fiemap->fm_length = FIEMAP_MAX_OFFSET;
> +      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
>        fiemap->fm_extent_count = count;
> 
>        /* When ioctl(2) fails, fall back to the normal copy only if it





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 13 Jun 2010 03:39:02 GMT) Full text and rfc822 format available.

Message #161 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Eric Sandeen <sandeen <at> redhat.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 13 Jun 2010 11:37:18 +0800
Eric Sandeen wrote:
> jeff.liu wrote:
>> Sunil Mushran wrote:
>>> On 06/10/2010 04:47 PM, Paul Eggert wrote:
>>>> On 06/09/2010 11:56 PM, jeff.liu wrote:
>>>>   
>>>>> Yeah, I just realized that the behaviour I observed is caused by the
>>>>> delayed allocation mechanism of the particular FS.
>>>>>      
>>>> If the file system is using delayed allocation, then can the fiemap
>>>> ioctl tell us that a file contains a hole (because nothing has been
>>>> allocated there), but read() would tell us that the file contains
>>>> nonzero data at the same location (because it's sitting in a buffer
>>>> somewhere)?  If so, we'd need to do something like invoke fdatasync()
>>>> on the file before issuing the fiemap ioctl, to force allocation; or
>>>> perhaps there's another ioctl that will do the allocation without
>>>> having to actually do a sync.
>>>>    
>>> I guess we'll have to use FIEMAP_FLAG_SYNC.
>> Hi Sunil,
>>
>> Thanks for the comments.
>> So this way we can ensure that the source file is synced before mapping.
>>
>> Hi Jim and Paul,
>>
>> How about the tiny patch below?
> 
> I agree that this is needed, thanks.
Thanks for confirming.

Regards,
-Jeff
> 
> -Eric
> 
>> From d6d619a169ff68a9a310a69d8089b9fbf83b5f91 Mon Sep 17 00:00:00 2001
>> From: Jie Liu <jeff.liu <at> oracle.com>
>> Date: Fri, 11 Jun 2010 16:29:02 +0800
>> Subject: [PATCH 1/1] copy.c: add FIEMAP_FLAG_SYNC to fiemap ioctl
>>
>> * src/copy.c (fiemap_copy): Force kernel to sync the source
>> file before mapping.
>>
>> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
>> ---
>>  src/copy.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index f149be4..f48c74d 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -191,6 +191,7 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>>    do
>>      {
>>        fiemap->fm_length = FIEMAP_MAX_OFFSET;
>> +      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
>>        fiemap->fm_extent_count = count;
>>
>>        /* When ioctl(2) fails, fall back to the normal copy only if it
> 
> 
> 
> 







Message #164 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 13 Jun 2010 16:53:56 +0200
Jim Meyering wrote:
...
> So I will apply the test-fixing patch first, then your change,
> Jeff, and then rebase to the latest on master.

FYI, I did that.  In addition, I needed two more
changes in order to avoid "make distcheck" failure:

From be5548445e90a36ab5018cac0fb19f2498d0521c Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sun, 13 Jun 2010 16:19:29 +0200
Subject: [PATCH 1/2] build: distribute new file, fiemap.h

* src/Makefile.am (noinst_HEADERS): Add fiemap.h.
---
 src/Makefile.am |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/src/Makefile.am b/src/Makefile.am
index 0630a06..7d56312 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -145,6 +145,7 @@ noinst_HEADERS =	\
   copy.h		\
   cp-hash.h		\
   dircolors.h		\
+  fiemap.h		\
   fs.h			\
   group-list.h		\
   ls.h			\
--
1.7.1.511.g2f531


From f25181d32c40f82ee26dea6de6b7f4b385352a14 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sun, 13 Jun 2010 16:34:42 +0200
Subject: [PATCH 2/2] build: distribute new test script, filefrag-extent-compare

* tests/Makefile.am (EXTRA_DIST): Add filefrag-extent-compare.
---
 tests/Makefile.am |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tests/Makefile.am b/tests/Makefile.am
index f7840c8..61ccf01 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -10,6 +10,7 @@ EXTRA_DIST =		\
   CuTmpdir.pm		\
   check.mk		\
   envvar-check		\
+  filefrag-extent-compare \
   init.sh		\
   lang-default		\
   other-fs-tmpdir	\
--
1.7.1.511.g2f531





Message #167 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 14:09:22 -0700
On 06/11/2010 01:31 AM, jeff.liu wrote:

> +      fiemap->fm_flags = FIEMAP_FLAG_SYNC;

If I'm reading the Linux source code correctly,
this forces all the dirty blocks in the input file to disk, and
forces the kernel to wait until all those blocks actually hit the disk.
We don't need that; there shouldn't be a need to force any
blocks to hit the disk.  All that 'cp' needs to know is: please tell me
the next block which might contain nonzero data.  Is there some way
that we can get that information without forcing the input file's
blocks to disk?
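
For reference, a minimal sketch of how a fiemap request might be set up
with or without FIEMAP_FLAG_SYNC.  The helper names fiemap_request_new
and fiemap_query are hypothetical and are not taken from the patch; only
the struct fiemap fields, FIEMAP_MAX_OFFSET, FIEMAP_FLAG_SYNC and
FS_IOC_FIEMAP come from the Linux headers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/* Hypothetical helper: allocate a request that covers the whole file
   and has room for EXTENT_COUNT extents.  Returns NULL on failure.  */
static struct fiemap *
fiemap_request_new (uint32_t extent_count, int want_sync)
{
  size_t size = sizeof (struct fiemap)
                + (size_t) extent_count * sizeof (struct fiemap_extent);
  struct fiemap *fm = calloc (1, size);
  if (!fm)
    return NULL;
  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;
  /* With FIEMAP_FLAG_SYNC the kernel syncs the file before mapping.  */
  fm->fm_flags = want_sync ? FIEMAP_FLAG_SYNC : 0;
  fm->fm_extent_count = extent_count;
  return fm;
}

/* Hypothetical helper: issue the ioctl.  Returns the number of extents
   the kernel filled in, or -1 if the ioctl failed (errno is set).  */
static int
fiemap_query (int fd, struct fiemap *fm)
{
  if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
    return -1;
  return (int) fm->fm_mapped_extents;
}
```

Passing want_sync = 0 leaves fm_flags clear, so the kernel is not asked
to sync first; whether delayed-allocation data is then reported reliably
depends on the file system, which is the open question here.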





Message #170 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Sunil Mushran <sunil.mushran <at> oracle.com>
Cc: bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	"linux-ext4 <at> vger.kernel.org" <linux-ext4 <at> vger.kernel.org>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 14:11:01 -0700
Sunil Mushran wrote:

> SEEK_HOLE/DATA also have the same problem with active files.

Yes, that's true if 'cp' is copying a file while someone else is writing to it.
But the case we're worried about is when 'cp' starts copying a file immediately
after someone else has finished writing to it but data has not been sent to
disk; in that case SEEK_HOLE/DATA should work just as well as the fiemap ioctl.
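
A minimal sketch of the SEEK_HOLE/SEEK_DATA approach, assuming a system
whose lseek() supports those whence values.  The function name
next_data_region is hypothetical, and the fallback numeric values are
the Linux ones, used only if the headers do not expose the macros.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* glibc exposes SEEK_DATA/SEEK_HOLE under this */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
# define SEEK_DATA 3   /* assumption: Linux value */
#endif
#ifndef SEEK_HOLE
# define SEEK_HOLE 4   /* assumption: Linux value */
#endif

/* Find the next data region at or after START.
   Returns 1 and sets *DATA (start of data) and *HOLE (end of data),
   0 if there is no data at or after START, or -1 on any other error.  */
static int
next_data_region (int fd, off_t start, off_t *data, off_t *hole)
{
  off_t d = lseek (fd, start, SEEK_DATA);
  if (d < 0)
    return errno == ENXIO ? 0 : -1;

  /* Every file has an implicit hole at EOF, so this finds the end of
     the data region even when the file has no real holes.  */
  off_t h = lseek (fd, d, SEEK_HOLE);
  if (h < 0)
    return -1;

  *data = d;
  *hole = h;
  return 1;
}
```

A copy loop would call this repeatedly, reading only [*data, *hole) and
continuing from *hole.  Note that both lseek() calls move the file
offset, so the caller has to reposition before reading.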






Message #173 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 14:11:53 -0700
On 06/10/2010 05:35 PM, Tao Ma wrote:

> there is a flag FIEMAP_EXTENT_DELALLOC in fiemap ...
> with dd if=/dev/zero of=/mnt/ext4/a bs=1M count=1 seek=1
> the ext4 can't return a valid fiemap extent. 

Hmm, this sounds like a fairly serious bug, in that it would prevent this
part of cp from working.  What does the fiemap ioctl return in this buggy case?
Is there some way that cp can detect the bug, and report it to the user,
or work around it?  Or should we just assume that the bug will get fixed
and that cp shouldn't worry about it?





Message #176 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Tao Ma <tao.ma <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 14:28:43 -0700
I looked at the proposed fiemap patch for cp and see another issue
with it, which needs to be thought through more carefully.

The proposed fiemap code tries to copy the _physical_ holes in the
source file.  But currently cp copies the _logical_ holes: that is, it
skips over writing (and thus attempts to create physical holes)
whenever the source file contains a block of zeros, regardless of
whether that block of zeros is physically stored in the file system.
The current cp behavior is often the desired behavior (particularly
for cp --sparse=always) and the fiemap code shouldn't alter that.

For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
set, cp should treat that as a hole, because the extent is all zeros.
(This will greatly help performance in some cases.)  Also, if an input
extent is read and a block of it is found to be zeros, cp should skip
over that block when writing.

If the proposed fiemap code was intended to copy the physical holes,
then the code wasn't successful in implementing the intent, as there's
no guarantee that the extents of the destination are the same as the
extents of the source.  If it is useful to have 'cp' copy the physical
holes, this could be supported via a new cp option or options, but
surely the default should continue to be to copy logical holes.
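
To make "copying logical holes" concrete, here is a minimal sketch of
the technique: scan each block that was read, and seek over it in the
destination instead of writing it when it is all zeros.  This is not the
code in copy.c; is_zero_block and sparse_write are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if all N bytes of BUF are zero.  */
static bool
is_zero_block (const char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Write N bytes from BUF to FD in chunks of BLK bytes, seeking over
   chunks that are entirely zero so the file system can leave a hole.
   Returns true on success.  If the final chunk was skipped, the caller
   still has to ftruncate() the file to its full length.  */
static bool
sparse_write (int fd, const char *buf, size_t n, size_t blk)
{
  if (blk == 0)
    return false;
  while (n > 0)
    {
      size_t chunk = n < blk ? n : blk;
      if (is_zero_block (buf, chunk))
        {
          if (lseek (fd, (off_t) chunk, SEEK_CUR) < 0)
            return false;
        }
      else
        {
          size_t done = 0;
          while (done < chunk)
            {
              ssize_t w = write (fd, buf + done, chunk - done);
              if (w <= 0)
                return false;
              done += (size_t) w;
            }
        }
      buf += chunk;
      n -= chunk;
    }
  return true;
}
```

The scan is what costs CPU time on large non-sparse files; an extent map
would let cp skip the read and the scan for ranges already known to be
holes, while still applying this check to the data that is read.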






Message #179 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 23:43:25 +0200
Paul Eggert wrote:
> I looked at the proposed fiemap patch for cp and see another issue
> with it, which needs to be thought through more carefully.
>
> The proposed fiemap code tries to copy the _physical_ holes in the
> source file.  But currently cp copies the _logical_ holes: that is, it
> skips over writing (and thus attempts to create physical holes)
> whenever the source file contains a block of zeros, regardless of
> whether that block of zeros is physically stored in the file system.
> The current cp behavior is often the desired behavior (particularly
> for cp --sparse=always) and the fiemap code shouldn't alter that.
>
> For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
> set, cp should treat that as a hole, because the extent is all zeros.
> (This will greatly help performance in some cases.)  Also, if an input
> extent is read and a block of it is found to be zeros, cp should skip
> over that block when writing.
>
> If the proposed fiemap code was intended to copy the physical holes,
> then the code wasn't successful in implementing the intent, as there's
> no guarantee that the extents of the destination are the same as the
> extents of the source.  If it is useful to have 'cp' copy the physical
> holes, this could be supported via a new cp option or options, but
> surely the default should continue to be to copy logical holes.

Good point!

I think that copying physical holes via FIEMAP should be the default, when
possible.  One problem is that the current code on the fiemap-copy branch
does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
would seem to be to change the regular-file-copying loop in the fiemap_copy
function to use the same hole-preserving code that is used in copy_reg.





Message #182 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 15:10:19 -0700
On Tue, Jun 15, 2010 at 02:28:43PM -0700, Paul Eggert wrote:
> I looked at the proposed fiemap patch for cp and see another issue
> with it, which needs to be thought through more carefully.
> 
> The proposed fiemap code tries to copy the _physical_ holes in the
> source file.  But currently cp copies the _logical_ holes: that is, it
> skips over writing (and thus attempts to create physical holes)
> whenever the source file contains a block of zeros, regardless of
> whether that block of zeros is physically stored in the file system.
> The current cp behavior is often the desired behavior (particularly
> for cp --sparse=always) and the fiemap code shouldn't alter that.
> 
> For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
> set, cp should treat that as a hole, because the extent is all zeros.
> (This will greatly help performance in some cases.)  Also, if an input
> extent is read and a block of it is found to be zeros, cp should skip
> over that block when writing.

	I pretty much agree with everything you wrote above.  Fiemap is
introduced into cp(1) to optimize the existing --sparse behavior.  So we
should be holding to that behavior.
	In fact, I'll go one further.  While I haven't looked at the
actual cp code in a little bit, I do not think that fiemap is used for
the non-sparse copy.  I contend that fiemap should be used for all
copies if the capability exists.  Even if cp is doing a non-sparse copy,
there's no need for it to call read(2) on a 1GB hole in the file.  It
can write zeros just fine on its own.
	Basically, these capabilities (fiemap, SEEK_HOLE) are ways to
avoid reading zeros we can determine without reading and scanning.  We
should always take advantage of that.

> If the proposed fiemap code was intended to copy the physical holes,
> then the code wasn't successful in implementing the intent, as there's
> no guarantee that the extents of the destination are the same as the
> extents of the source.  If it is useful to have 'cp' copy the physical
> holes, this could be supported via a new cp option or options, but
> surely the default should continue to be to copy logical holes.

	The goal was not physical holes, it was logical ones as you
surmise.  I can't see any way to ensure the file has identical extents.
I don't know any filesystem that guarantees such allocations on
write(2).
	I do think there might be an interesting performance
optimization to be had, but it is independent of fiemap.  Above, you
describe the usual "cp --sparse" behavior of reading allocated data and
checking for zeroes.  If you're optimizing for space usage -- what "cp
--sparse" is defined to do -- you definitely want to skip writing blocks
that are all zero.  But it actually creates a crappier allocation and
I/O pattern for some files.
	Imagine you have a data extent that is alternating blocks of
zeroes and ones:

hexdump -C /tmp/blk
00000000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01 |................|
*
00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
*
00002000  01 01 01 01 01 01 01 01  01 01 01 01 01 01 01 01 |................|
*
00003000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
*
00004000
... (continued for 4MB in this example)
00400000

Classic "cp --sparse" will spend CPU time scanning this extent, which
can be megabytes in size.  It will then write out every other block,
which is terribly slow, and it leaves a horribly fragmented destination
file.  The result is even more disastrous if those holes ever get filled
in.
	I'm, of course, exaggerating a worst-case here.  But folks
looking at I/O performance would love detection like "here's a linear
MB, sure it has some zero blocks, but it's faster to write the entire MB
than to chop it up."
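
One way to picture that kind of detection, as a sketch only: count just
the runs of zero blocks that are long enough to be worth turning into a
hole, and treat shorter runs as ordinary data.  The function name and
the min_run threshold are hypothetical; nothing like this exists in the
patch under discussion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if all N bytes at P are zero.  */
static bool
block_is_zero (const char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] != 0)
      return false;
  return true;
}

/* Count the runs of consecutive all-zero BLK-sized blocks in BUF that
   are at least MIN_RUN blocks long.  Shorter runs are not counted: the
   idea is that they would be written out as data rather than skipped.
   A trailing partial block is ignored.  */
static size_t
count_worthwhile_holes (const char *buf, size_t len, size_t blk,
                        size_t min_run)
{
  if (blk == 0)
    return 0;

  size_t holes = 0;
  size_t run = 0;
  for (size_t off = 0; off + blk <= len; off += blk)
    {
      if (block_is_zero (buf + off, blk))
        run++;
      else
        {
          if (run > 0 && run >= min_run)
            holes++;
          run = 0;
        }
    }
  if (run > 0 && run >= min_run)
    holes++;
  return holes;
}
```

With min_run = 1 this degenerates to the classic per-block behaviour; a
larger threshold trades some disk space for fewer, larger writes in
cases like the alternating pattern above.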

Joel






Message #185 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 15:19:13 -0700
On 06/15/2010 02:43 PM, Jim Meyering wrote:

> I think that copying physical holes via FIEMAP should be the default, when
> possible.  One problem is that the current code on the fiemap-copy branch
> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
> would seem to be to change the regular-file-copying loop in the fiemap_copy
> function to use the same hole-preserving code that is used in copy_reg.

I assume that the solution would be used only with cp --sparse=always?
(Otherwise, it would amount to copying logical holes by default.)

If so, under this proposal, --sparse=always would copy logical holes,
--sparse=never would never copy holes, and --sparse=auto (the default)
would copy physical holes if supported, else it would copy logical
holes if the file seems to have enough physical holes, and otherwise
it would copy no holes.  (Whew!)
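
Spelled out as a decision function, purely as a sketch of the proposal
as summarized above: the enum and function names are hypothetical, and
src_looks_sparse stands in for whatever heuristic decides that the file
"seems to have enough physical holes".

```c
#include <stdbool.h>

enum sparse_mode { MODE_NEVER, MODE_AUTO, MODE_ALWAYS };
enum hole_policy { HOLES_NONE, HOLES_LOGICAL, HOLES_PHYSICAL };

/* Map a --sparse=WHEN mode to the kind of holes that would be copied
   under the proposal.  */
static enum hole_policy
choose_hole_policy (enum sparse_mode mode, bool have_fiemap,
                    bool src_looks_sparse)
{
  switch (mode)
    {
    case MODE_ALWAYS:
      return HOLES_LOGICAL;
    case MODE_NEVER:
      return HOLES_NONE;
    case MODE_AUTO:
      if (have_fiemap)
        return HOLES_PHYSICAL;
      return src_looks_sparse ? HOLES_LOGICAL : HOLES_NONE;
    }
  return HOLES_NONE;
}
```
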

But this raises another issue.  Under this proposal, the default behavior
copies physical _holes_, but it doesn't copy physical _extents_.
For example, suppose the input file has a single 128 MiB extent.  The
proposal could create an output file with two 64-MiB extents
that are logically contiguous.  This is because the code does not advise the
output file about the extents that the input file had, and the operating
system might assign a smaller output extent at first, discovering only too late
that more space was needed.

If the intent is to copy the physical extents, then the proposed code
needs to be fixed so that it uses fallocate() to attempt to create the
same extents in the destination that existed in the source.  Obviously
this could fail for any number of reasons, for example, the destination
file system might be different from the source; but the goal would be
to preserve extents if the underlying OS is "nice" about implementing
fallocate().

Which raises another question: should cp attempt to create _better_
extents in the destination than in the source?  For example, if
the source contains two logically-adjacent 64-MiB extents, should the
destination contain just one 128-MiB extent?  I can see where that
might be worthwhile.

This all is getting a bit complicated, I'm afraid.  Perhaps it'd be better
to try this stuff out with a new option, "cp --sparse=extents" say, and
keep the default as-is for a while?





Message #188 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Sunil Mushran <sunil.mushran <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 15:30:15 -0700
On 06/15/2010 03:19 PM, Paul Eggert wrote:
> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>
>    
>> I think that copying physical holes via FIEMAP should be the default, when
>> possible.  One problem is that the current code on the fiemap-copy branch
>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>> function to use the same hole-preserving code that is used in copy_reg.
>>      
> I assume that the solution would be used only with cp --sparse=always?
> (Otherwise, it would amount to copying logical holes by default.)
>
> If so, under this proposal, --sparse=always would copy logical holes,
> --sparse=never would never copy holes, and --sparse=auto (the default)
> would copy physical holes if supported, else it would copy logical
> holes if the file seems to have enough physical holes, and otherwise
> it would copy no holes.  (Whew!)
>
> But this raises another issue.  Under this proposal, the default behavior
> copies physical _holes_, but it doesn't copy physical _extents_.
> For example, suppose the input file has a single 128 MiB extent.  The
> proposal could create an output file with two 64-MiB extents
> that are logically contiguous.  This is because the code does not advise the
> output file about the extents that the input file had, and the operating
> system might assign a smaller output extent at first, discovering only too late
> that more space was needed.
>
> If the intent is to copy the physical extents, then the proposed code
> needs to be fixed so that it uses fallocate() to attempt to create the
> same extents in the destination that existed in the source.  Obviously
> this could fail for any number of reasons, for example, the destination
> file system might be different from the source; but the goal would be
> to preserve extents if the underlying OS is "nice" about implementing
> fallocate().
>
> Which raises another question: should cp attempt to create _better_
> extents in the destination than in the source?  For example, if
> the source contains two logically-adjacent 64-MiB extents, should the
> destination contain just one 128-MiB extent?  I can see where that
> might be worthwhile.
>
> This all is getting a bit complicated, I'm afraid.  Perhaps it'd be better
> to try this stuff out with a new option, "cp --sparse=extents" say, and
> keep the default as-is for a while?
>    

Expecting to control the number of extents in a file in any file system
is expecting too much of the fs. No file system guarantees that.
fallocate() only allocates the space. It does not give any guarantees
for the number of extents.





Message #191 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Sunil Mushran <sunil.mushran <at> oracle.com>
Cc: Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 15 Jun 2010 15:41:10 -0700
On Tue, Jun 15, 2010 at 03:30:15PM -0700, Sunil Mushran wrote:
> On 06/15/2010 03:19 PM, Paul Eggert wrote:
> >This all is getting a bit complicated, I'm afraid.  Perhaps it'd be better
> >to try this stuff out with a new option, "cp --sparse=extents" say, and
> >keep the default as-is for a while?
> 
> Expecting to control the number of extents in a file in any file system
> is expecting too much of the fs. No file system guarantees that.
> fallocate() only allocates the space. It does not give any guarantees
> for the number of extents.

	I don't think it's ever interesting to try and mimic the source
file's extent pattern.  Really, what's the point?  I have done a lot of
administration in my time, and I've worked with a lot of people.  I've
never needed nor heard anyone ask for this capability.
	What people want is "--sparse=always, but not slow".  Either
they know their input file has 100MB holes, and they want their
destination to have 100MB holes or they have a 100MB file that has few
zeros, and they wouldn't mind those zeros being unallocated.  What they
really don't want is to wait while the CPU churns figuring this out.
When people choose "--sparse=auto", they don't mean they want
non-sparse files; they just want to avoid lots of work for a file that
isn't very sparse.  With fiemap, cp can give them the best of both
worlds.

Joel






Message #194 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 00:22:09 +0100
On 15/06/10 23:19, Paul Eggert wrote:
> On 06/15/2010 02:43 PM, Jim Meyering wrote:
> 
>> I think that copying physical holes via FIEMAP should be the default, when
>> possible.  One problem is that the current code on the fiemap-copy branch
>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>> function to use the same hole-preserving code that is used in copy_reg.
> 
> I assume that the solution would be used only with cp --sparse=always?
> (Otherwise, it would amount to copying logical holes by default.)
> 
> If so, under this proposal, --sparse=always would copy logical holes,
> --sparse=never would never copy holes, and --sparse=auto (the default)
> would copy physical holes if supported, else it would copy logical
> holes if the file seems to have enough physical holes, and otherwise
> it would copy no holes.  (Whew!)
> 
> But this raises another issue.  Under this proposal, the default behavior
> copies physical _holes_, but it doesn't copy physical _extents_.
> For example, suppose the input file has a single 128 MiB extent.  The
> proposal could create an output file with two 64-MiB extents
> that are logically contiguous.  This is because the code does not advise the
> output file about the extents that the input file had, and the operating
> system might assign a smaller output extent at first, discovering only too late
> that more space was needed.
> 
> If the intent is to copy the physical extents, then the proposed code
> needs to be fixed so that it uses fallocate() to attempt to create the
> same extents in the destination that existed in the source.  Obviously
> this could fail for any number of reasons, for example, the destination
> file system might be different from the source; but the goal would be
> to preserve extents if the underlying OS is "nice" about implementing
> fallocate().
> 
> Which raises another question: should cp attempt to create _better_
> extents in the destination than in the source?  For example, if
> the source contains two logically-adjacent 64-MiB extents, should the
> destination contain just one 128-MiB extent?  I can see where that
> might be worthwhile.
> 
> This all is getting a bit complicated, I'm afraid.  Perhaps it'd be better
> to try this stuff out with a new option, "cp --sparse=extents" say, and
> keep the default as-is for a while?

Well I previously had a patch to call fallocate() to alloc the largest
extents possible, which I think is sensible to do by default.
I didn't apply it as I was waiting for a change or even a comment
on the weird fallocate() interface (which we can forget about at this stage).
http://lists.gnu.org/archive/html/bug-coreutils/2009-05/msg00260.html

cheers,
Pádraig.





Message #197 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tao Ma <tao.ma <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 13:45:41 +0800
Hi Paul,

On 06/16/2010 05:11 AM, Paul Eggert wrote:
> On 06/10/2010 05:35 PM, Tao Ma wrote:
>
>> there is a flag FIEMAP_EXTENT_DELALLOC in fiemap ...
>> with dd if=/dev/zero of=/mnt/ext4/a bs=1M count=1 seek=1
>> the ext4 can't return a valid fiemap extent.
>
> Hmm, this sounds like a fairly serious bug, in that it would prevent this
> part of cp from working.  What does the fiemap ioctl return in this buggy case?
> Is there some way that cp can detect the bug, and report it to the user,
> or work around it?  Or should we just assume that the bug will get fixed
> and that cp shouldn't worry about it?
I have raised this with the Linux ext4 community
(http://lkml.org/lkml/2010/6/10/412); I guess they are working on it
now.

Regards,
Tao





Message #200 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 08:45:08 +0200
Paul Eggert wrote:
> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>> I think that copying physical holes via FIEMAP should be the default, when
>> possible.  One problem is that the current code on the fiemap-copy branch
>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>> function to use the same hole-preserving code that is used in copy_reg.
>
> I assume that the solution would be used only with cp --sparse=always?
> (Otherwise, it would amount to copying logical holes by default.)

Right.
I.e., use the same code that honors (or not) the "make_holes" variable.

> If so, under this proposal, --sparse=always would copy logical holes,
> --sparse=never would never copy holes, and --sparse=auto (the default)
> would copy physical holes if supported, else it would copy logical
> holes if the file seems to have enough physical holes, and otherwise
> it would copy no holes.  (Whew!)

Right.  With --sparse=always and fiemap support, it would take advantage
of existing extents to minimize copying time, and for the nontrivial
extents, it would detect/induce new holes when possible.

Perhaps we need a new logical option that would make a difference
only when there are nontrivial fiemap extents:
  - read nontrivial extents, as is done now
  - read them, *and* search-for/induce-holes as we do in legacy
      copy for --sparse=always.

Or, putting it another way, perhaps we need a new command line
option to control whether we even attempt a fiemap copy.
IMHO the default should be to enable it, once all of the
underlying bits are deemed to be stable enough.

    --fiemap
    --no-fiemap

Then, --fiemap --sparse=never would do what the existing fiemap_copy
function does, and --fiemap --sparse=always would work once the
internal-to-fiemap_copy copying code is adjusted to use the
hole-preserving code in copy_reg.

Then, to guarantee the legacy --sparse=never behavior,
one would have to specify --no-fiemap --sparse=never

> But this raises another issue.  Under this proposal, the default behavior
> copies physical _holes_, but it doesn't copy physical _extents_.
> For example, suppose the input file has a single 128 MiB extent.

Yes, I saw that already, in testing, even with relatively
simply structured inputs.  That's why the test script had to
be adjusted to accommodate differences in extent layout,
effectively merging split extents when comparing src and dest
filefrag -v output.
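
The extent-merging step mentioned above can be sketched roughly as follows. This is
only an illustration, not the code from the actual test script (which is not shown
in this thread): the function name and the (logical_start, length) tuple layout are
assumptions made for the example.

```python
def merge_adjacent_extents(extents):
    """Merge extents whose logical ranges touch.

    extents: iterable of (logical_start, length) pairs (hypothetical layout).
    Returns a sorted list in which logically contiguous extents are combined,
    so that a file stored as two 64 MiB extents compares equal to one stored
    as a single 128 MiB extent.
    """
    merged = []
    for start, length in sorted(extents):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This extent begins exactly where the previous one ends.
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged
```

Comparing the merged lists for source and destination checks that the holes match
while ignoring how the file system chose to split the allocated ranges.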

> The
> proposal could create an output file with two 64-MiB extents
> that are logically contiguous.  This is because the code does not advise the
> output file about the extents that the input file had, and the operating
> system might assign a smaller output extent at first, discovering only too late
> that more space was needed.
>
> If the intent is to copy the physical extents, then the proposed code

That appears to be infeasible.

> needs to be fixed so that it uses fallocate() to attempt to create the
> same extents in the destination that existed in the source.  Obviously
> this could fail for any number of reasons, for example, the destination
> file system might be different from the source; but the goal would be
> to preserve extents if the underlying OS is "nice" about implementing
> fallocate().
>
> Which raises another question: should cp attempt to create _better_
> extents in the destination than in the source?  For example, if

It's already great that most of this functionality
appears to be usable on four(!) file system types.
I don't want to push our luck.

> This all is getting a bit complicated, I'm afraid.  Perhaps it'd be better
> to try this stuff out with a new option, "cp --sparse=extents" say, and
> keep the default as-is for a while?

I think we can keep it conceptually simple enough
that it will be safe to make fiemap copying the default.





Message #203 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 14:57:01 +0800
Paul Eggert wrote:
> I looked at the proposed fiemap patch for cp and see another issue
> with it, which needs to be thought through more carefully.
> 
> The proposed fiemap code tries to copy the _physical_ holes in the
> source file.  But currently cp copies the _logical_ holes: that is, it
> skips over writing (and thus attempts to create physical holes)
> whenever the source file contains a block of zeros, regardless of
> whether that block of zeros is physically stored in the file system.
> The current cp behavior is often the desired behavior (particularly
> for cp --sparse=always) and the fiemap code shouldn't alter that.
> 
> For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
> set, cp should treat that as a hole, because the extent is all zeros.
> (This will greatly help performance in some cases.)  Also, if an input
> extent is read and a block of it is found to be zeros, cp should skip
> over that block when writing.
If the FIEMAP_EXTENT_UNWRITTEN flag is set, we can call fallocate(2) on the destination file directly
for a performance boost.

Thanks,
-Jeff
> 
> If the proposed fiemap code was intended to copy the physical holes,
> then the code wasn't successful in implementing the intent, as there's
> no guarantee that the extents of the destination are the same as the
> extents of the source.  If it is useful to have 'cp' copy the physical
> holes, this could be supported via a new cp option or options, but
> surely the default should continue to be to copy logical holes.
> 







Message #206 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 15:22:08 +0800
Tao Ma wrote:
> Hi Paul,
> 
> On 06/16/2010 05:11 AM, Paul Eggert wrote:
>> On 06/10/2010 05:35 PM, Tao Ma wrote:
>>
>>> there is a flag FIEMAP_EXTENT_DELALLOC in fiemap ...
>>> with dd if=/dev/zero of=/mnt/ext4/a bs=1M count=1 seek=1
>>> the ext4 can't return a valid fiemap extent.
>>
>> Hmm, this sounds like a fairly serious bug, in that it would prevent this
>> part of cp from working.  What does the fiemap ioctl return in this
>> buggy case?
In this case, the fiemap ioctl returns successfully, but 'fm_mapped_extents' is set to zero,
indicating that no extents are allocated for the file.
>> Is there some way that cp can detect the bug, and report it to the user,
>> or work around it?
The workaround, for the moment, is to add FIEMAP_FLAG_SYNC to the fiemap ioctl.
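
As a rough illustration of that workaround (in Python rather than the C used by cp),
the request header can be built with FIEMAP_FLAG_SYNC set and 'fm_mapped_extents'
read back from the reply. The constants and the field layout follow <linux/fiemap.h>
and <linux/fs.h>; the function names are invented for this sketch, and count_extents()
works only on Linux file systems that implement FIEMAP.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B   # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001    # sync the file before mapping it

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_HEADER = struct.Struct("=QQIIII")


def build_fiemap_request(length, sync=True):
    """Pack a header-only request covering bytes [0, length)."""
    flags = FIEMAP_FLAG_SYNC if sync else 0
    # fm_extent_count == 0 asks the kernel only for the number of extents.
    return FIEMAP_HEADER.pack(0, length, flags, 0, 0, 0)


def mapped_extents(reply):
    """Extract fm_mapped_extents from a packed header."""
    return FIEMAP_HEADER.unpack_from(reply)[3]


def count_extents(fd, length):
    """Ask the kernel how many extents back the first `length` bytes of fd."""
    buf = bytearray(build_fiemap_request(length))
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # raises OSError if FIEMAP is unsupported
    return mapped_extents(buf)
```

With sync=False on an affected ext4 kernel, count_extents() on a freshly written
file would be expected to report zero extents, which is the symptom described above.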

Thanks,
-Jeff
>> Or should we just assume that the bug will get fixed
>> and that cp shouldn't worry about it?
> I have reported this to the Linux ext4 community
> (http://lkml.org/lkml/2010/6/10/412); I guess they are working on it now.
> 
> Regards,
> Tao







Message #209 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 09:31:31 +0100
On 16/06/10 07:45, Jim Meyering wrote:
> Paul Eggert wrote:
>> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>>> I think that copying physical holes via FIEMAP should be the default, when
>>> possible.  One problem is that the current code on the fiemap-copy branch
>>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>>> function to use the same hole-preserving code that is used in copy_reg.
>>
>> I assume that the solution would be used only with cp --sparse=always?
>> (Otherwise, it would amount to copying logical holes by default.)
> 
> Right.
> I.e., use the same code that honors (or not) the "make_holes" variable.
> 
>> If so, under this proposal, --sparse=always would copy logical holes,
>> --sparse=never would never copy holes, and --sparse=auto (the default)
>> would copy physical holes if supported, else it would copy logical
>> holes if the file seems to have enough physical holes, and otherwise
>> it would copy no holes.  (Whew!)
> 
> Right.  With --sparse=always and fiemap support, it would take advantage
> of existing extents to minimize copying time, and for the nontrivial
> extents, it would detect/induce new holes when possible.
> 
> Perhaps we need a new logical option that would make a difference
> only when there are nontrivial fiemap extents:
>   - read nontrivial extents, as is done now
>   - read them, *and* search-for/induce-holes as we do in legacy
>       copy for --sparse=always.
> 
> Or, putting it another way, perhaps we need a new command line
> option to control whether we even attempt a fiemap copy.
> IMHO the default should be to enable it, once all of the
> underlying bits are deemed to be stable enough.
> 
>     --fiemap
>     --no-fiemap
> 
> Then, --fiemap --sparse=never would do what the existing fiemap_copy
> function does, and --fiemap --sparse=always would work once the
> internal-to-fiemap_copy copying code is adjusted to use the
> hole-preserving code in copy_reg.
> 
> Then, to guarantee the legacy --sparse=never behavior,
> one would have to specify --no-fiemap --sparse=never

I would suggest not adding options like this,
and doing only what's possible without new options,
as they would push too many implementation details
to the docs/users IMHO.

cheers,
Pádraig.





Message #212 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 10:49:08 +0200
Pádraig Brady wrote:

> On 16/06/10 07:45, Jim Meyering wrote:
>> Paul Eggert wrote:
>>> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>>>> I think that copying physical holes via FIEMAP should be the default, when
>>>> possible.  One problem is that the current code on the fiemap-copy branch
>>>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>>>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>>>> function to use the same hole-preserving code that is used in copy_reg.
>>>
>>> I assume that the solution would be used only with cp --sparse=always?
>>> (Otherwise, it would amount to copying logical holes by default.)
>>
>> Right.
>> I.e., use the same code that honors (or not) the "make_holes" variable.
>>
>>> If so, under this proposal, --sparse=always would copy logical holes,
>>> --sparse=never would never copy holes, and --sparse=auto (the default)
>>> would copy physical holes if supported, else it would copy logical
>>> holes if the file seems to have enough physical holes, and otherwise
>>> it would copy no holes.  (Whew!)
>>
>> Right.  With --sparse=always and fiemap support, it would take advantage
>> of existing extents to minimize copying time, and for the nontrivial
>> extents, it would detect/induce new holes when possible.
>>
>> Perhaps we need a new logical option that would make a difference
>> only when there are nontrivial fiemap extents:
>>   - read nontrivial extents, as is done now
>>   - read them, *and* search-for/induce-holes as we do in legacy
>>       copy for --sparse=always.
>>
>> Or, putting it another way, perhaps we need a new command line
>> option to control whether we even attempt a fiemap copy.
>> IMHO the default should be to enable it, once all of the
>> underlying bits are deemed to be stable enough.
>>
>>     --fiemap
>>     --no-fiemap
>>
>> Then, --fiemap --sparse=never would do what the existing fiemap_copy
>> function does, and --fiemap --sparse=always would work once the
>> internal-to-fiemap_copy copying code is adjusted to use the
>> hole-preserving code in copy_reg.
>>
>> Then, to guarantee the legacy --sparse=never behavior,
>> one would have to specify --no-fiemap --sparse=never
>
> I would suggest not to add options like this,
> and only do what's possible without new options
> as it would push too many implementation details
> to the docs/users IMHO.

Good point.  Smaller/simpler would be better.
It'd be great if we can eliminate as never-useful one of the
new combinations.

Currently on the fiemap-copy branch, once we get a single
successful ioctl, the code will not bother with the usual
hole-detecting/introducing technique implied by --sparse=always.

Should it do that ever?  Always?
Does this need an option?





Message #215 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Tao Ma <tao.ma <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 02:03:42 -0700
On Wed, Jun 16, 2010 at 02:57:01PM +0800, jeff.liu wrote:
> Paul Eggert wrote:
> > For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
> > set, cp should treat that as a hole, because the extent is all zeros.
> > (This will greatly help performance in some cases.)  Also, if an input
> > extent is read and a block of it is found to be zeros, cp should skip
> > over that block when writing.
> If the FIEMAP_EXTENT_UNWRITTEN flag is set, we can call fallocate(2) on the destination file directly
> for a performance boost.

	Nope.  An UNWRITTEN extent is a hole, and the user asked for
holes.  If cp sees an UNWRITTEN extent, it skips it.

Joel

-- 

"Friends may come and go, but enemies accumulate." 
        - Thomas Jones

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127





Message #218 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 02:02:26 -0700
On Wed, Jun 16, 2010 at 10:49:08AM +0200, Jim Meyering wrote:
> > I would suggest not to add options like this,
> > and only do what's possible without new options
> > as it would push too many implementation details
> > to the docs/users IMHO.
> 
> Currently on the fiemap-copy branch, once we get a single
> successful ioctl, the code will not bother with the usual
> hole-detecting/introducing technique implied by --sparse=always.
> 
> Should it do that ever?  Always?
> Does this need an option?

	There should be no new or modified options.
	I think the code should, at all times and in all modes, attempt
to get an fiemap of the source file.  If cp is in --sparse=never mode,
it can use the mapping to plan its source file reads.  If it is in
--sparse=auto, it can always skip holes and unwritten extents, while
using the original --sparse=auto code to determine whether to scan
allocated extents for zeros.
	With this map, cp will never read holes or unwritten extents;
--sparse=never mode can write out the zeros without reading them.
	Calling the ioctl will never hurt the cp operation.  If the
ioctl fails, cp can use all of its original heuristics and copy code.
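
A minimal sketch of the planning step described above, written in Python for brevity
(cp itself is C). The function name, the extent tuple layout, and the action labels
are made up for illustration; only FIEMAP_EXTENT_UNWRITTEN is a real constant, taken
from <linux/fiemap.h>. The ioctl-failure fallback and the zero-scanning of allocated
extents are not modelled here.

```python
FIEMAP_EXTENT_UNWRITTEN = 0x0800  # extent is allocated but reads back as zeros


def plan_copy(extents, file_size, sparse_mode):
    """Turn an extent map into a list of (action, offset, length) steps.

    extents: sorted, non-overlapping (logical_offset, length, flags) tuples,
             assumed to lie within [0, file_size).
    Actions: "copy" reads the source and writes the destination;
             "zero" writes zeros without reading the source;
             "skip" leaves a hole in the destination.
    """
    # Holes and unwritten extents are never read. With --sparse=never they
    # are written out as zeros; otherwise they stay holes.
    gap_action = "zero" if sparse_mode == "never" else "skip"
    steps = []
    pos = 0
    for offset, length, flags in extents:
        if offset > pos:
            steps.append((gap_action, pos, offset - pos))
        action = gap_action if flags & FIEMAP_EXTENT_UNWRITTEN else "copy"
        steps.append((action, offset, length))
        pos = offset + length
    if pos < file_size:
        steps.append((gap_action, pos, file_size - pos))
    return steps
```

In every mode only the "copy" steps touch the source file, which is where the
saving comes from.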

Joel

-- 

"Copy from one, it's plagiarism; copy from two, it's research."
        - Wilson Mizner

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127





Message #221 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 10:26:40 +0100
On 16/06/10 09:49, Jim Meyering wrote:
> Pádraig Brady wrote:
> 
>> On 16/06/10 07:45, Jim Meyering wrote:
>>> Paul Eggert wrote:
>>>> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>>>>> I think that copying physical holes via FIEMAP should be the default, when
>>>>> possible.  One problem is that the current code on the fiemap-copy branch
>>>>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>>>>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>>>>> function to use the same hole-preserving code that is used in copy_reg.
>>>>
>>>> I assume that the solution would be used only with cp --sparse=always?
>>>> (Otherwise, it would amount to copying logical holes by default.)
>>>
>>> Right.
>>> I.e., use the same code that honors (or not) the "make_holes" variable.
>>>
>>>> If so, under this proposal, --sparse=always would copy logical holes,
>>>> --sparse=never would never copy holes, and --sparse=auto (the default)
>>>> would copy physical holes if supported, else it would copy logical
>>>> holes if the file seems to have enough physical holes, and otherwise
>>>> it would copy no holes.  (Whew!)
>>>
>>> Right.  With --sparse=always and fiemap support, it would take advantage
>>> of existing extents to minimize copying time, and for the nontrivial
>>> extents, it would detect/induce new holes when possible.
>>>
>>> Perhaps we need a new logical option that would make a difference
>>> only when there are nontrivial fiemap extents:
>>>   - read nontrivial extents, as is done now
>>>   - read them, *and* search-for/induce-holes as we do in legacy
>>>       copy for --sparse=always.
>>>
>>> Or, putting it another way, perhaps we need a new command line
>>> option to control whether we even attempt a fiemap copy.
>>> IMHO the default should be to enable it, once all of the
>>> underlying bits are deemed to be stable enough.
>>>
>>>     --fiemap
>>>     --no-fiemap
>>>
>>> Then, --fiemap --sparse=never would do what the existing fiemap_copy
>>> function does, and --fiemap --sparse=always would work once the
>>> internal-to-fiemap_copy copying code is adjusted to use the
>>> hole-preserving code in copy_reg.
>>>
>>> Then, to guarantee the legacy --sparse=never behavior,
>>> one would have to specify --no-fiemap --sparse=never
>>
>> I would suggest not to add options like this,
>> and only do what's possible without new options
>> as it would push too many implementation details
>> to the docs/users IMHO.
> 
> Good point.  Smaller/simpler would be better.
> It'd be great if we can eliminate as never-useful one of the
> new combinations.
> 
> Currently on the fiemap-copy branch, once we get a single
> successful ioctl, the code will not bother with the usual
> hole-detecting/introducing technique implied by --sparse=always.
> 
> Should it do that ever?  Always?
> Does this need an option?

I don't see the need for new options, TBH.
I see fiemap just as a way to efficiently detect/read holes;
it should have no bearing on the destination.

cp --sparse=auto (this is currently what cp does by default)
  recreate the original fiemap holes or resort to existing
  heuristic if fiemap not available
cp --sparse=never
  write all data, but use fiemap if available to efficiently read
cp --sparse=always
  recreate the original holes and perhaps extend or add to them for
  other runs of zero bytes. Without having looked at the code,
  I see this as a little tricky to mix with fiemap.
  Since fiemap is only an optimization, we can skip it
  completely for this uncommon case if it is too tricky (just add a FIXME for now).
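
For the --sparse=always case, the "other runs of zero bytes" would have to be found
by scanning the data read from each allocated extent. A rough Python sketch of that
scan follows; the function name and the 4096-byte default block size are assumptions
for the example and are not taken from copy.c.

```python
def split_zero_runs(data, block_size=4096):
    """Split data into runs of all-zero and non-zero blocks.

    Returns a list of (is_zero, offset, length) tuples, where adjacent
    blocks of the same kind are merged into one run. Zero runs are the
    candidates for holes in the destination.
    """
    runs = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        is_zero = not any(block)  # True when every byte in the block is 0
        if runs and runs[-1][0] == is_zero:
            kind, start, length = runs[-1]
            runs[-1] = (kind, start, length + len(block))
        else:
            runs.append((is_zero, off, len(block)))
    return runs
```

Applied only to the extents that fiemap reports as allocated, this would add new
holes on top of the ones already present in the source.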

cheers,
Pádraig.





Message #224 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tao Ma <tao.ma <at> oracle.com>
To: Joel Becker <Joel.Becker <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	"jeff.liu" <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 16 Jun 2010 21:58:02 +0800

On 06/16/2010 05:03 PM, Joel Becker wrote:
> On Wed, Jun 16, 2010 at 02:57:01PM +0800, jeff.liu wrote:
>> Paul Eggert wrote:
>>> For example, if a fiemap_extent has the FIEMAP_EXTENT_UNWRITTEN flag
>>> set, cp should treat that as a hole, because the extent is all zeros.
>>> (This will greatly help performance in some cases.)  Also, if an input
>>> extent is read and a block of it is found to be zeros, cp should skip
>>> over that block when writing.
>> If the FIEMAP_EXTENT_UNWRITTEN flag is set, we can call fallocate(2) on the destination file directly
>> for a performance boost.
>
> 	Nope.  An UNWRITTEN extent is a hole, and the user asked for
> holes.  If cp sees an UNWRITTEN extent it is skipped.
Yeah, I agree with this.
The unwritten extent was allocated by the user for the source file. Since
the user has not told us that the target also needs extra allocation,
we'd better just leave a hole there. ;)

Regards,
Tao





Message #227 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Pádraig Brady <P <at> draigBrady.com>,
	Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 14 Jul 2010 14:58:16 +0800
Pádraig Brady wrote:
> On 16/06/10 09:49, Jim Meyering wrote:
>> Pádraig Brady wrote:
>>
>>> On 16/06/10 07:45, Jim Meyering wrote:
>>>> Paul Eggert wrote:
>>>>> On 06/15/2010 02:43 PM, Jim Meyering wrote:
>>>>>> I think that copying physical holes via FIEMAP should be the default, when
>>>>>> possible.  One problem is that the current code on the fiemap-copy branch
>>>>>> does not honor --sparse=WHEN when in fiemap-copying mode.  The solution
>>>>>> would seem to be to change the regular-file-copying loop in the fiemap_copy
>>>>>> function to use the same hole-preserving code that is used in copy_reg.
>>>>> I assume that the solution would be used only with cp --sparse=always?
>>>>> (Otherwise, it would amount to copying logical holes by default.)
>>>> Right.
>>>> I.e., use the same code that honors (or not) the "make_holes" variable.
>>>>
>>>>> If so, under this proposal, --sparse=always would copy logical holes,
>>>>> --sparse=never would never copy holes, and --sparse=auto (the default)
>>>>> would copy physical holes if supported, else it would copy logical
>>>>> holes if the file seems to have enough physical holes, and otherwise
>>>>> it would copy no holes.  (Whew!)
>>>> Right.  With --sparse=always and fiemap support, it would take advantage
>>>> of existing extents to minimize copying time, and for the nontrivial
>>>> extents, it would detect/induce new holes when possible.
>>>>
>>>> Perhaps we need a new logical option that would make a difference
>>>> only when there are nontrivial fiemap extents:
>>>>   - read nontrivial extents, as is done now
>>>>   - read them, *and* search-for/induce-holes as we do in legacy
>>>>       copy for --sparse=always.
>>>>
>>>> Or, putting it another way, perhaps we need a new command line
>>>> option to control whether we even attempt a fiemap copy.
>>>> IMHO the default should be to enable it, once all of the
>>>> underlying bits are deemed to be stable enough.
>>>>
>>>>     --fiemap
>>>>     --no-fiemap
>>>>
>>>> Then, --fiemap --sparse=never would do what the existing fiemap_copy
>>>> function does, and --fiemap --sparse=always would work once the
>>>> internal-to-fiemap_copy copying code is adjusted to use the
>>>> hole-preserving code in copy_reg.
>>>>
>>>> Then, to guarantee the legacy --sparse=never behavior,
>>>> one would have to specify --no-fiemap --sparse=never
>>> I would suggest not to add options like this,
>>> and only do what's possible without new options
>>> as it would push too many implementation details
>>> to the docs/users IMHO.
>> Good point.  Smaller/simpler would be better.
>> It'd be great if we can eliminate as never-useful one of the
>> new combinations.
>>
>> Currently on the fiemap-copy branch, once we get a single
>> successful ioctl, the code will not bother with the usual
>> hole-detecting/introducing technique implied by --sparse=always.
>>
>> Should it do that ever?  Always?
>> Does this need an option?


Hello All,

I am very sorry for the late response! I have had an urgent task to deliver over the past few weeks.

Thanks for all your suggestions. I would like to improve fiemap-copy accordingly, so is the
following the final decision?

> I don't see the need for new options TBH
> I see fiemap just as a way to efficiently detect/read holes,
> and should have no bearing on the destination.
> 
> cp --sparse=auto (this is currently what cp does by default)
>   recreate the original fiemap holes or resort to existing
>   heuristic if fiemap not available
> cp --sparse=never
>   write all data, but use fiemap if available to efficiently read
> cp --sparse=always
>   recreate original holes and perhaps extend or add to them for
>   other runs of zero bytes. Without having looked at the code
>   I see this as a little tricky to mix with fiemap.
>   Now since fiemap is only an optimization we can skip it
>   completely for this uncommon case if too tricky (just add a FIXME for now).

The current code handles 'cp --sparse=auto' and 'cp --sparse=always' in the same way, since both options
set 'make_holes' to true, though '--sparse=auto' uses a heuristic to determine whether SRC_NAME
contains any sparse blocks.

For 'cp --sparse=never', when holes are detected in the SRC file, we do not lseek(2) in the DST file
but instead write zeros to it. Am I right?

If so, IMHO, we can pass 'sparse_mode' to fiemap_copy() and then decide how to operate on the DST
file according to that mode.
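
To make the question concrete, here is a small Python sketch of the destination-side
choice being described: seek over a hole for sparse output, or write zeros for
--sparse=never. The function names and the string-valued 'sparse_mode' are invented
for this illustration; the real fiemap_copy() is C and its eventual signature is not
shown in this thread.

```python
import os


def emit_hole(dst_fd, length, sparse_mode, chunk=65536):
    """Advance dst_fd by `length` bytes that correspond to a hole in SRC."""
    if sparse_mode == "never":
        # Materialise the hole: write real zeros, no lseek.
        zeros = bytes(chunk)
        remaining = length
        while remaining > 0:
            written = os.write(dst_fd, zeros[:min(chunk, remaining)])
            remaining -= written
    else:
        # Leave a hole: just move the file offset forward.
        os.lseek(dst_fd, length, os.SEEK_CUR)


def finish(dst_fd):
    """Set the final size, in case the copy ended with a seek rather than a write."""
    os.ftruncate(dst_fd, os.lseek(dst_fd, 0, os.SEEK_CUR))
```

Either way the destination reads back identically; only its on-disk allocation
differs between the two modes.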

In addition to this, I'd like to address further issues in the current code, as Paul suggested:
1. Do not allocate a buffer in fiemap_copy() for passing data between the SRC and DST files; instead,
use the buffer allocated outside.

2. As a performance optimization, invoke fallocate(2) if an extent's flag is UNWRITTEN.
For this case, maybe we have to wait until the fallocate interface becomes stable, just as Pádraig
mentioned before:
http://lists.gnu.org/archive/html/bug-coreutils/2009-05/msg00260.html

> cheers,
> Pádraig.

Thanks,
-Jeff







Message #230 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 14 Jul 2010 09:38:36 +0100
On 14/07/10 07:58, jeff.liu wrote:
> Pádraig Brady wrote:
> 
> Hello All,
> 
> I am very sorry for the late response! I have had an urgent task to deliver over the past few weeks.
> 
> Thanks for all your suggestions. I would like to improve fiemap-copy accordingly, so is the
> following the final decision?
> 
>> I don't see the need for new options TBH
>> I see fiemap just as a way to efficiently detect/read holes,
>> and should have no bearing on the destination.
>>
>> cp --sparse=auto (this is currently what cp does by default)
>>   recreate the original fiemap holes or resort to existing
>>   heuristic if fiemap not available
>> cp --sparse=never
>>   write all data, but use fiemap if available to efficiently read
>> cp --sparse=always
>>   recreate original holes and perhaps extend or add to them for
>>   other runs of zero bytes. Without having looked at the code
>>   I see this as a little tricky to mix with fiemap.
>>   Now since fiemap is only an optimization we can skip it
>>   completely for this uncommon case if too tricky (just add a FIXME for now).

Joel's reply overlapped with mine, and he concurred.

> The current code handles 'cp --sparse=auto' and 'cp --sparse=always' in the same way, since both
> options set 'make_holes' to true, though '--sparse=auto' uses a heuristic to determine whether
> SRC_NAME contains any sparse blocks.
> 
> For 'cp --sparse=never', when holes are detected in the SRC file, we do not lseek(2) on the DST
> file; instead, we write zeros to it.  Am I right?

right

> 
> If so, IMHO, we can pass 'sparse_mode' to fiemap_copy() and then decide how to operate on the DST
> file according to that mode.
> 
> In addition, I'd like to fix other issues in the current code, as Paul suggested:
> 1. Do not allocate a buffer in fiemap_copy() for passing data between SRC and DST; instead, use
> the buffer allocated outside.

good

> 2. Performance optimization: invoke fallocate(2) if an extent's flag is UNWRITTEN.
> For this case, maybe we have to wait until the fallocate interface becomes stable, as Pádraig
> mentioned before:
> http://lists.gnu.org/archive/html/bug-coreutils/2009-05/msg00260.html

If you decide to do that, then please do it as a separate patch.

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 14 Jul 2010 17:46:01 GMT)

Message #233 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 14 Jul 2010 10:45:23 -0700
>> I see fiemap just as a way to efficiently detect/read holes,
>> and should have no bearing on the destination.

Hmm, but the proposal quoted below would mean that fiemap does have a
bearing on the destination, in the --sparse=auto case.
I guess this is OK, but it should be documented.

>> cp --sparse=auto (this is currently what cp does by default)
>>   recreate the original fiemap holes or resort to existing
>>   heuristic if fiemap not available

It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
and SEEK_DATA.  The change should involve a module that isolates these
low-level details from copy.c.  copy.c should ask the new module for the
locations of the holes (or the non-holes: that could be more convenient).
On traditional hosts without fiemap or SEEK_DATA, the module should report
that it doesn't know where the holes are; this can let copy.c resort to
the existing heuristic of looking at the size and the disk usage and
using the --sparse=always approach if the file "smells" like it's sparse.
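
To illustrate the kind of module described above, here is a minimal sketch
of a hole-detection helper built on lseek with SEEK_DATA and SEEK_HOLE.
The names next_data_extent, print_data_extents and enum extent_status are
hypothetical and do not come from coreutils; a FIEMAP-based backend could
presumably sit behind the same interface.  The sketch assumes a host whose
<unistd.h> exposes SEEK_DATA (glibc needs _GNU_SOURCE for that); elsewhere
it reports "unknown" so that the caller can resort to the existing heuristic.

```c
#define _GNU_SOURCE             /* for SEEK_DATA and SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; none of these names exist in coreutils.  */
enum extent_status { EXTENT_FOUND, EXTENT_END, EXTENT_UNKNOWN };

/* Find the first run of data at or after START in FD.  On EXTENT_FOUND,
   store its bounds in *DATA_START and *DATA_END (exclusive).  Return
   EXTENT_END when no data remains at or after START, and EXTENT_UNKNOWN
   when the host cannot say where the holes are.  This moves the file
   offset of FD as a side effect.  */
static enum extent_status
next_data_extent (int fd, off_t start, off_t *data_start, off_t *data_end)
{
  assert (0 <= start);
#if defined SEEK_DATA && defined SEEK_HOLE
  off_t d = lseek (fd, start, SEEK_DATA);
  if (d < 0)
    return errno == ENXIO ? EXTENT_END : EXTENT_UNKNOWN;
  off_t h = lseek (fd, d, SEEK_HOLE);
  if (h < 0)
    return EXTENT_UNKNOWN;
  *data_start = d;
  *data_end = h;
  return EXTENT_FOUND;
#else
  (void) fd; (void) data_start; (void) data_end;
  return EXTENT_UNKNOWN;
#endif
}

/* Print every data extent of FD to OUT and return how many were found.  */
static int
print_data_extents (int fd, FILE *out)
{
  off_t pos = 0, s, e;
  int n = 0;
  while (next_data_extent (fd, pos, &s, &e) == EXTENT_FOUND)
    {
      fprintf (out, "data: [%lld, %lld)\n", (long long) s, (long long) e);
      pos = e;
      n++;
    }
  return n;
}
```

Reporting the non-holes rather than the holes keeps the caller's loop
simple: it copies each reported range and seeks over everything else.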

>> cp --sparse=never
>>   write all data, but use fiemap if available to efficiently read

Surely there's no need to write all the data if fallocate works.

>> cp --sparse=always
>>   recreate original holes and perhaps add to them for
>>   other runs of zero bytes. Without having looked at the code
>>   I see this as a little tricky to mix with fiemap.
>>   Now since fiemap is only an optimization we can skip it
>>   completely for this uncommon case if too tricky (just add a FIXME for now).

Yes, that makes sense.  --sparse=always should never invoke fallocate.

> For 'cp --sparse=never', when holes are detected in the SRC file, do not lseek(2) on the DST file;
> instead, write zeros to the DST file.  Am I right?

Only if fallocate doesn't work.  If fallocate works, there's no need
to write zeros to the destination.

> 2. Performance optimization, invoke fallocate(2) if an extent flag is UNWRITTEN

This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
so it should act as if it were a hole.  The goal is not to copy the exact
fiemap structure of the source (that's impossible): the goal is to use as
little time and space as possible.

> If you decide to do that, then please do it as a separate patch.

It's not clear to me that the fiemap stuff can be cleanly separated
from the fallocate stuff.  To some extent they're the same issue.
If they can easily be separated, that's better of course.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 14 Jul 2010 23:53:02 GMT)

Message #236 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 00:51:36 +0100
On 14/07/10 18:45, Paul Eggert wrote:
>>> I see fiemap just as a way to efficiently detect/read holes,
>>> and should have no bearing on the destination.
> 
> Hmm, but the proposal quoted below would mean that fiemap does have a
> bearing on the destination, in the --sparse=auto case.
> I guess this is OK, but it should be documented.
> 
>>> cp --sparse=auto (this is currently what cp does by default)
>>>   recreate the original fiemap holes or resort to existing
>>>   heuristic if fiemap not available
> 
> It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
> and SEEK_DATA.  The change should involve a module that isolates these
> low-level details from copy.c.  copy.c should ask the new module for the
> locations of the holes (or the non-holes: that could be more convenient).
> On traditional hosts without fiemap or SEEK_DATA, the module should report
> that it doesn't know where the holes are; this can let copy.c resort to
> the existing heuristic of looking at the size and the disk usage and
> using the --sparse=always approach if the file "smells" like it's sparse.
> 
>>> cp --sparse=never
>>>   write all data, but use fiemap if available to efficiently read
> 
> Surely there's no need to write all the data if fallocate works.
> 
>>> cp --sparse=always
>>>   recreate original holes and perhaps add to them for
>>>   other runs of zero bytes. Without having looked at the code
>>>   I see this as a little tricky to mix with fiemap.
>>>   Now since fiemap is only an optimization we can skip it
>>>   completely for this uncommon case if too tricky (just add a FIXME for now).
> 
> Yes, that makes sense.  --sparse=always should never invoke fallocate.
> 
>> For 'cp --sparse=never', when holes are detected in the SRC file, do not lseek(2) on the DST file;
>> instead, write zeros to the DST file.  Am I right?
> 
> Only if fallocate doesn't work.  If fallocate works, there's no need
> to write zeros to the destination.

What you're describing here is posix_fallocate()
which uses fallocate() if available or falls back
to an implementation that writes a single 0 byte
to each block.
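
As a small illustration of that call, here is a hedged sketch of a helper
that reserves a range in the destination instead of writing zeros to it.
The name reserve_range is hypothetical, not part of coreutils or gnulib.
It relies only on the documented behaviour of posix_fallocate: the range
is allocated, the file grows if the range extends past its end, and
existing data is left untouched.  It therefore assumes a freshly created
destination, where the reserved range reads back as zeros.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: make sure that the LEN bytes of FD starting at
   OFFSET are backed by storage.  Return true on success.  Note that
   posix_fallocate returns an error number directly rather than
   setting errno.  */
static bool
reserve_range (int fd, off_t offset, off_t len)
{
  assert (0 <= offset && 0 < len);
  int err = posix_fallocate (fd, offset, len);
  if (err != 0)
    {
      fprintf (stderr, "posix_fallocate: %s\n", strerror (err));
      return false;
    }
  return true;
}
```

A caller handling --sparse=never could try this for each hole in the
source and fall back to writing zeros when it returns false.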

> 
>> 2. Performance optimization, invoke fallocate(2) if an extent flag is UNWRITTEN
> 
> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
> so it should act as if it were a hole.  The goal is not to copy the exact
> fiemap structure of the source (that's impossible): the goal is to use as
> little time and space as possible.
> 
>> If you decide to do that, then please do it as a separate patch.
> 
> It's not clear to me that the fiemap stuff can be cleanly separated
> from the fallocate stuff.  To some extent they're the same issue.
> If they can easily be separated, that's better of course.

I see fiemap as optimizing reads,
posix_fallocate() as optimizing writing zeros
and fallocate() as optimizing allocation.

So not having thought much about implementation details,
it seems like they could be logically separated.
I.E. we could optimize the writing zeros and allocation
later when we have the fallocate and posix_fallocate
gnulib modules in place.

In saying that, doing both now is better
when these details are in everyone's minds.
I'll not get to resubmitting my fallocate gnulib patch,
or doing a posix_fallocate module, this week at least I think.

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 15 Jul 2010 01:51:02 GMT)

Message #239 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Pádraig Brady <P <at> draigBrady.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 09:48:28 +0800
Hi Pádraig and Paul,

Thanks for your quick response.

Pádraig Brady wrote:
> On 14/07/10 18:45, Paul Eggert wrote:
>>>> I see fiemap just as a way to efficiently detect/read holes,
>>>> and should have no bearing on the destination.
>> Hmm, but the proposal quoted below would mean that fiemap does have a
>> bearing on the destination, in the --sparse=auto case.
>> I guess this is OK, but it should be documented.
>>
>>>> cp --sparse=auto (this is currently what cp does by default)
>>>>   recreate the original fiemap holes or resort to existing
>>>>   heuristic if fiemap not available
>> It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
>> and SEEK_DATA.  The change should involve a module that isolates these
>> low-level details from copy.c.  copy.c should ask the new module for the
>> locations of the holes (or the non-holes: that could be more convenient).
For extensibility, it is better to add a new file that covers both fiemap and the Solaris interface
(I'll implement only the fiemap part for now), just as 'copy.c' shares functions between cp(1) and
mv(1).  Other utilities could then use it to add related features.

>> On traditional hosts without fiemap or SEEK_DATA, the module should report
>> that it doesn't know where the holes are; this can let copy.c resort to
>> the existing heuristic of looking at the size and the disk usage and
>> using the --sparse=always approach if the file "smells" like it's sparse.
>>
>>>> cp --sparse=never
>>>>   write all data, but use fiemap if available to efficiently read
>> Surely there's no need to write all the data if fallocate works.
>>
>>>> cp --sparse=always
>>>>   recreate original holes and perhaps add to them for
>>>>   other runs of zero bytes. Without having looked at the code
>>>>   I see this as a little tricky to mix with fiemap.
>>>>   Now since fiemap is only an optimization we can skip it
>>>>   completely for this uncommon case if too tricky (just add a FIXME for now).
>> Yes, that makes sense.  --sparse=always should never invoke fallocate.
>>
>>> For 'cp --sparse=never', when holes are detected in the SRC file, do not lseek(2) on the DST file;
>>> instead, write zeros to the DST file.  Am I right?
>> Only if fallocate doesn't work.  If fallocate works, there's no need
>> to write zeros to the destination.
> 
> What you're describing here is posix_fallocate()
> which uses fallocate() if available or falls back
> to an implementation that writes a single 0 byte
> to each block.
> 
>>> 2. Performance optimization, invoke fallocate(2) if an extent flag is UNWRITTEN
>> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
>> so it should act as if it were a hole.  The goal is not to copy the exact
>> fiemap structure of the source (that's impossible): the goal is to use as
>> little time and space as possible.
A FIEMAP_EXTENT_UNWRITTEN extent is marked as allocated, although reading it returns zeros through
the filesystem.  So why not use fallocate(2) to deal with it?  IMHO, that meets the goal of using as
little time and space as possible.  Am I missing something?

>> 
>>> If you decide to do that, then please do it as a separate patch.
>> It's not clear to me that the fiemap stuff can be cleanly separated
>> from the fallocate stuff.  To some extent they're the same issue.
>> If they can easily be separated, that's better of course.
> 
> I see fiemap as optimizing reads,
> posix_fallocate() as optimizing writing zeros
> and fallocate() as optimizing allocation.
> 
> So not having thought much about implementation details,
> it seems like they could be logically separated.
> I.E. we could optimize the writing zeros and allocation
> later when we have the fallocate and posix_fallocate
> gnulib modules in place.
I think so; it's better to wait until the changes are done in gnulib.  For now, we can add a FIXME
for both cases.

> 
> In saying that, doing both now is better
> when these details are in everyone's minds.
> I'll not get to resubmitting my fallocate gnulib patch,
> or doing a posix_fallocate module, this week at least I think.
> 
> cheers,
> Pádraig.

Thanks,
-Jeff

-- 
With Windows 7, Microsoft is asserting legal control over your computer and is using this power to
abuse computer users.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 15 Jul 2010 22:14:02 GMT)

Message #242 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 15:12:58 -0700
On Thu, Jul 15, 2010 at 12:51:36AM +0100, Pádraig Brady wrote:
> On 14/07/10 18:45, Paul Eggert wrote:

	First and foremost, I re-concur with the broad strokes of the
--sparse={always,never,auto} conversation.  I think you all knew that,
though ;-)

> > It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
> > and SEEK_DATA.  The change should involve a module that isolates these
> > low-level details from copy.c.  copy.c should ask the new module for the
> > locations of the holes (or the non-holes: that could be more convenient).
> > On traditional hosts without fiemap or SEEK_DATA, the module should report
> > that it doesn't know where the holes are; this can let copy.c resort to
> > the existing heuristic of looking at the size and the disk usage and
> > using the --sparse=always approach if the file "smells" like it's sparse.

	While I think the final result wants to support both fiemap and
SEEK_HOLE, I think baby steps are in order.  If we just implement fiemap
right now, we can later turn that into init_extent_detection() and 
get_next_extent().

> >> 2. Performance optimization, invoke fallocate(2) if an extent flag is UNWRITTEN
> > 
> > This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
> > so it should act as if it were a hole.  The goal is not to copy the exact
> > fiemap structure of the source (that's impossible): the goal is to use as
> > little time and space as possible.

	What he said.  If you find an FIEMAP_EXTENT_UNWRITTEN extent,
you just skip it.  It is a hole for the purposes of copying.  If someone
really wants to clone the extent layout, they can use reflink(8).
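
A minimal sketch of that rule, assuming the Linux <linux/fiemap.h>
definitions of struct fiemap_extent and FIEMAP_EXTENT_UNWRITTEN.  The
helper names extent_needs_copy and bytes_to_copy are hypothetical and are
not taken from the patch; they only show the decision a copy loop would
make for each extent returned by the FIEMAP ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/fiemap.h>

/* An unwritten (preallocated) extent reads back as zeros, so for the
   purposes of copying it is treated exactly like a hole.  */
static bool
extent_needs_copy (struct fiemap_extent const *ext)
{
  return ! (ext->fe_flags & FIEMAP_EXTENT_UNWRITTEN);
}

/* Sum the lengths of the N extents in EXT that actually hold data.  */
static uint64_t
bytes_to_copy (struct fiemap_extent const *ext, size_t n)
{
  assert (ext != NULL || n == 0);
  uint64_t total = 0;
  for (size_t i = 0; i < n; i++)
    if (extent_needs_copy (&ext[i]))
      total += ext[i].fe_length;
  return total;
}
```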

> > It's not clear to me that the fiemap stuff can be cleanly separated
> > from the fallocate stuff.  To some extent they're the same issue.
> > If they can easily be separated, that's better of course.
> 
> I see fiemap as optimizing reads,
> posix_fallocate() as optimizing writing zeros
> and fallocate() as optimizing allocation.
> 
> So not having thought much about implementation details,
> it seems like they could be logically separated.

	I think they should absolutely be separated.  The fiemap patch
doesn't have to do anything with fallocate()/posix_fallocate() on the
write side.
	Let's get a happy fiemap patch.  Then a happy
[posix]_fallocate() patch.  Then a happy SEEK_HOLE patch.

Joel

-- 

"For every complex problem there exists a solution that is brief,
     concise, and totally wrong."
                                        -Unknown

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 15 Jul 2010 22:32:02 GMT)

Message #245 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Joel Becker <Joel.Becker <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 15:31:52 -0700
>>> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
>>> so it should act as if it were a hole.  The goal is not to copy the exact
>>> fiemap structure of the source (that's impossible): the goal is to use as
>>> little time and space as possible.

> A FIEMAP_EXTENT_UNWRITTEN extent is marked as allocated, although
> reading it returns zeros through the filesystem.  So why not use
> fallocate(2) to deal with it?  IMHO, that meets the goal of using as
> little time and space as possible.  Am I missing something?

It's faster to simply skip around that extent while reading it, and to
skip around it when writing it, than to call fallocate when writing it.
Logically, a FIEMAP_EXTENT_UNWRITTEN extent is a hole, and should be
optimized when reading, just like any hole.

>> I see fiemap as optimizing reads,
>> posix_fallocate() as optimizing writing zeros
>> and fallocate() as optimizing allocation.

It may not be quite that simple.  Some platforms won't have fallocate
and so posix_fallocate will have to do double duty as optimizing
allocation too.  Also, lseek is part of the process of optimizing
reads, and of optimizing writing zeros.  And fallocate also optimizes
writing zeros, I expect.

I'm not objecting to breaking these improvements into two or three
pieces, if someone wants to do that.  However, it shouldn't be
required to break them up; it's OK if someone wants to do it all at
once.  (This stuff is not that hard, after all.)  I was planning to
give it a shot at some point but obviously have not done so yet.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Thu, 15 Jul 2010 22:52:01 GMT)

Message #248 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 15:51:40 -0700
>>> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
>>> so it should act as if it were a hole.  The goal is not to copy the exact
>>> fiemap structure of the source (that's impossible): the goal is to use as
>>> little time and space as possible.

> A FIEMAP_EXTENT_UNWRITTEN extent is marked as allocated, although
> reading it returns zeros through the filesystem.  So why not use
> fallocate(2) to deal with it?  IMHO, that meets the goal of using as
> little time and space as possible.  Am I missing something?

It's faster to simply skip around that extent while reading it, and to
skip around it when writing it, than to allocate it with fallocate
when writing it.  Logically, a FIEMAP_EXTENT_UNWRITTEN extent is a
hole, and should be optimized when reading and writing, just like any
hole.
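
A rough sketch of "skipping around" on both sides, under stated
assumptions: the list of data ranges has already been obtained (from
fiemap or otherwise), the destination is newly created and empty, and
EINTR handling is left out for brevity.  struct data_range and
copy_ranges are hypothetical names, not code from the patch.  Only the
listed ranges are read and written; everything between them is never
touched in the destination, and a final ftruncate gives the destination
its full length so that a trailing hole survives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one run of real data in the source.  */
struct data_range { off_t start; off_t len; };

/* Copy the N ranges in R from SRC_FD to the same offsets in DST_FD,
   then set the length of DST_FD to TOTAL_SIZE.  Return 0 on success,
   -1 on failure.  */
static int
copy_ranges (int src_fd, int dst_fd, struct data_range const *r, size_t n,
             off_t total_size)
{
  char buf[65536];
  assert (0 <= total_size);
  for (size_t i = 0; i < n; i++)
    {
      off_t pos = r[i].start;
      off_t left = r[i].len;
      while (0 < left)
        {
          size_t want = left < (off_t) sizeof buf ? (size_t) left : sizeof buf;
          ssize_t got = pread (src_fd, buf, want, pos);
          if (got < 0)
            {
              perror ("pread");
              return -1;
            }
          if (got == 0)
            break;              /* source is shorter than the range says */
          for (ssize_t done = 0; done < got; )
            {
              ssize_t w = pwrite (dst_fd, buf + done, got - done, pos + done);
              if (w <= 0)
                {
                  perror ("pwrite");
                  return -1;
                }
              done += w;
            }
          pos += got;
          left -= got;
        }
    }
  return ftruncate (dst_fd, total_size);
}
```

Whether the untouched regions end up as real holes depends on the
destination file system; in any case they read back as zeros.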

>> I see fiemap as optimizing reads,
>> posix_fallocate() as optimizing writing zeros
>> and fallocate() as optimizing allocation.

It may not be quite that simple.  Some platforms won't have fallocate
and so posix_fallocate will have to do double duty as optimizing
allocation too.  Also, lseek is part of the process of optimizing
reads, and of optimizing writing zeros.  Most important, the
heuristics for optimizing the writes should use info derived from
optimizing the reads.

I'm not objecting to breaking these improvements into two or three
pieces, if someone wants to do that.  However, it shouldn't be
required to break them up; it's OK if someone wants to do it all at
once.  (This stuff is not that hard, after all.)  I was planning to
give it a shot at some point but obviously have not done so yet.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 16 Jul 2010 14:51:02 GMT)

Message #251 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 16 Jul 2010 22:49:59 +0800
Paul Eggert wrote:
>>>> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
>>>> so it should act as if it were a hole.  The goal is not to copy the exact
>>>> fiemap structure of the source (that's impossible): the goal is to use as
>>>> little time and space as possible.
> 
>> A FIEMAP_EXTENT_UNWRITTEN extent is marked as allocated, although
>> reading it returns zeros through the filesystem.  So why not use
>> fallocate(2) to deal with it?  IMHO, that meets the goal of using as
>> little time and space as possible.  Am I missing something?
> 
> It's faster to simply skip around that extent while reading it, and to
> skip around it when writing it, than to allocate it with fallocate
> when writing it.  Logically, a FIEMAP_EXTENT_UNWRITTEN extent is a
> hole, and should be optimized when reading and writing, just like any
> hole.
> 
>>> I see fiemap as optimizing reads,
>>> posix_fallocate() as optimizing writing zeros
>>> and fallocate() as optimizing allocation.
> 
> It may not be quite that simple.  Some platforms won't have fallocate
> and so posix_fallocate will have to do double duty as optimizing
> allocation too.  Also, lseek is part of the process of optimizing
> reads, and of optimizing writing zeros.  Most important, the
> heuristics for optimizing the writes should use info derived from
> optimizing the reads.
> 
> I'm not objecting to breaking these improvements into two or three
> pieces, if someone wants to do that.  However, it shouldn't be
> required to break them up; it's OK if someone wants to do it all at
> once.  (This stuff is not that hard, after all.)  I was planning to
> give it a shot at some point but obviously have not done so yet.

Hi Paul, Pádraig and all,

Thanks for all your comments!

For now, I am inclined to handle only the efficient read through fiemap,
and to improve the write and allocation side via fallocate() or other means later.

In this version, I created a new file, src/sparse-core.c (I have not found a better name yet :( ),
to hold the original fiemap_copy().

Below is the revised fiemap_copy().  It honors make_holes: if the --sparse=never option is
specified, it writes all data (a new function, fill_with_holes_ok(), is implemented in
sparse-core.c for this purpose).  For --sparse=always and --sparse=auto it still does the same
thing as the original, except that it uses the external buf to avoid allocating another one for
reading and writing.

From 53e1975cb23a4d3c9d9f982d1e48437e95b071d8 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Fri, 16 Jul 2010 22:25:21 +0800
Subject: [PATCH 1/1] cp: Isolate fiemap_copy() to sparse-core.c

* src/sparse-core.c: New file containing fiemap_copy().
* src/sparse-core.h: Header file for sparse-core.c.
* src/copy.c (copy_reg): Now, `cp' attempts to get a fiemap of the source file by default, and
  falls back to a normal copy if the underlying file system does not support it.
  We honor --sparse=never by writing all data, but use fiemap if available to read efficiently.
* po/POTFILES.in: Add sparse-core.c so that po_check passes for translatable diagnostics.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 po/POTFILES.in    |    1 +
 src/Makefile.am   |    3 +-
 src/copy.c        |  187 ++++-------------------------------------
 src/sparse-core.c |  240 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/sparse-core.h |   24 ++++++
 5 files changed, 284 insertions(+), 171 deletions(-)
 create mode 100644 src/sparse-core.c
 create mode 100644 src/sparse-core.h

diff --git a/po/POTFILES.in b/po/POTFILES.in
index c862877..025c9b0 100644
--- a/po/POTFILES.in
+++ b/po/POTFILES.in
@@ -110,6 +110,7 @@ src/shred.c
 src/shuf.c
 src/sleep.c
 src/sort.c
+src/sparse-core.c
 src/split.c
 src/stat.c
 src/stdbuf.c
diff --git a/src/Makefile.am b/src/Makefile.am
index 7d56312..a838920 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -144,6 +144,7 @@ noinst_HEADERS =	\
   chown-core.h		\
   copy.h		\
   cp-hash.h		\
+  sparse-core.h         \
   dircolors.h		\
   fiemap.h		\
   fs.h			\
@@ -459,7 +460,7 @@ uninstall-local:
 	  fi; \
 	fi

-copy_sources = copy.c cp-hash.c
+copy_sources = copy.c cp-hash.c sparse-core.c

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
diff --git a/src/copy.c b/src/copy.c
index f48c74d..f6d9b5e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -30,6 +30,7 @@
 #endif

 #include "system.h"
+#include "sparse-core.h"
 #include "acl.h"
 #include "backupfile.h"
 #include "buffer-lcm.h"
@@ -63,10 +64,6 @@

 #include <sys/ioctl.h>

-#ifndef HAVE_FIEMAP
-# include "fiemap.h"
-#endif
-
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -153,153 +150,6 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

-#ifdef __linux__
-# ifndef FS_IOC_FIEMAP
-#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
-# endif
-/* Perform a FIEMAP copy, if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
-   obtain a map of file extents excluding holes.  This avoids the
-   overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.  Upon a
-   successful copy, return true.  If the initial ioctl fails, set
-   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
-   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
-static bool
-fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
-             off_t src_total_size, char const *src_name,
-             char const *dst_name, bool *normal_copy_required)
-{
-  bool last = false;
-  union { struct fiemap f; char c[4096]; } fiemap_buf;
-  struct fiemap *fiemap = &fiemap_buf.f;
-  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
-  verify (count != 0);
-
-  off_t last_ext_logical = 0;
-  uint64_t last_ext_len = 0;
-  uint64_t last_read_size = 0;
-  unsigned int i = 0;
-  *normal_copy_required = false;
-
-  /* This is required at least to initialize fiemap->fm_start,
-     but also serves (in mid 2010) to appease valgrind, which
-     appears not to know the semantics of the FIEMAP ioctl. */
-  memset (&fiemap_buf, 0, sizeof fiemap_buf);
-
-  do
-    {
-      fiemap->fm_length = FIEMAP_MAX_OFFSET;
-      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
-      fiemap->fm_extent_count = count;
-
-      /* When ioctl(2) fails, fall back to the normal copy only if it
-         is the first time we met.  */
-      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
-        {
-          /* If the first ioctl fails, tell the caller that it is
-             ok to proceed with a normal copy.  */
-          if (i == 0)
-            *normal_copy_required = true;
-          else
-            {
-              /* If the second or subsequent ioctl fails, diagnose it,
-                 since it ends up causing the entire copy/cp to fail.  */
-              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
-            }
-          return false;
-        }
-
-      /* If 0 extents are returned, then more ioctls are not needed.  */
-      if (fiemap->fm_mapped_extents == 0)
-        break;
-
-      for (i = 0; i < fiemap->fm_mapped_extents; i++)
-        {
-          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
-
-          off_t ext_logical = fm_ext[i].fe_logical;
-          uint64_t ext_len = fm_ext[i].fe_length;
-
-          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
-            {
-              error (0, errno, _("cannot lseek %s"), quote (src_name));
-              return false;
-            }
-
-          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
-            {
-              error (0, errno, _("cannot lseek %s"), quote (dst_name));
-              return false;
-            }
-
-          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
-            {
-              last_ext_logical = ext_logical;
-              last_ext_len = ext_len;
-              last = true;
-            }
-
-          while (ext_len)
-            {
-              char buf[buf_size];
-
-              /* Avoid reading into the holes if the left extent
-                 length is shorter than the buffer size.  */
-              if (ext_len < buf_size)
-                buf_size = ext_len;
-
-              ssize_t n_read = read (src_fd, buf, buf_size);
-              if (n_read < 0)
-                {
-#ifdef EINTR
-                  if (errno == EINTR)
-                    continue;
-#endif
-                  error (0, errno, _("reading %s"), quote (src_name));
-                  return false;
-                }
-
-              if (n_read == 0)
-                {
-                  /* Figure out how many bytes read from the last extent.  */
-                  last_read_size = last_ext_len - ext_len;
-                  break;
-                }
-
-              if (full_write (dest_fd, buf, n_read) != n_read)
-                {
-                  error (0, errno, _("writing %s"), quote (dst_name));
-                  return false;
-                }
-
-              ext_len -= n_read;
-            }
-        }
-
-      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
-
-    } while (! last);
-
-  /* If a file ends up with holes, the sum of the last extent logical offset
-     and the read-returned size will be shorter than the actual size of the
-     file.  Use ftruncate to extend the length of the destination file.  */
-  if (last_ext_logical + last_read_size < src_total_size)
-    {
-      if (ftruncate (dest_fd, src_total_size) < 0)
-        {
-          error (0, errno, _("failed to extend %s"), quote (dst_name));
-          return false;
-        }
-    }
-
-  return true;
-}
-#else
-static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; }
-#endif
-
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -830,25 +680,6 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      if (make_holes)
-        {
-          bool require_normal_copy;
-          /* Perform efficient FIEMAP copy for sparse files, fall back to the
-             standard copy only if the ioctl(2) fails.  */
-          if (fiemap_copy (source_desc, dest_desc, buf_size,
-                           src_open_sb.st_size, src_name,
-                           dst_name, &require_normal_copy))
-            goto preserve_metadata;
-          else
-            {
-              if (! require_normal_copy)
-                {
-                  return_val = false;
-                  goto close_src_and_dst_desc;
-                }
-            }
-        }
-
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -877,6 +708,22 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

+      bool require_normal_copy;
+      /* Perform efficient FIEMAP copy for sparse files, fall back to the
+         standard copy only if the ioctl(2) fails.  */
+      if (fiemap_copy (source_desc, dest_desc, buf, buf_size,
+                       make_holes, src_name, dst_name,
+                       src_open_sb.st_size, &require_normal_copy))
+        goto preserve_metadata;
+      else
+        {
+          if (! require_normal_copy)
+            {
+              return_val = false;
+              goto close_src_and_dst_desc;
+            }
+        }
+
       while (true)
         {
           word *wp = NULL;
diff --git a/src/sparse-core.c b/src/sparse-core.c
new file mode 100644
index 0000000..e122dc8
--- /dev/null
+++ b/src/sparse-core.c
@@ -0,0 +1,240 @@
+/* sparse-core.c -- core functions for efficient reading sparse files
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <config.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+
+#include "system.h"
+#include "sparse-core.h"
+#include "error.h"
+#include "quote.h"
+#include "full-write.h"
+
+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
+static bool
+fill_with_holes_ok(int dest_fd, const char *dst_name,
+                   char *buf, size_t buf_size,
+                   uint64_t holes_len)
+{
+  while (buf_size < holes_len)
+    {
+      if (full_write (dest_fd, buf, buf_size) != buf_size)
+        {
+          error (0, errno, _("writing %s"), quote (dst_name));
+          return false;
+        }
+        holes_len -= buf_size;
+    }
+
+  if (0 < holes_len)
+    {
+      if (full_write (dest_fd, buf, holes_len) != holes_len)
+        {
+          error (0, errno, _("writing %s"), quote (dst_name));
+          return false;
+        }
+    }
+
+  return true;
+}
+
+
+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+
+/* Perform a FIEMAP copy, if possible.
+   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
+   obtain a map of file extents excluding holes.  This avoids the
+   overhead of detecting holes in a hole-introducing/preserving copy,
+   and thus makes copying sparse files much more efficient.  Upon a
+   successful copy, return true.  If the initial ioctl fails, set
+   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
+   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
+
+extern bool
+fiemap_copy (int src_fd, int dest_fd, char *buf,
+             size_t buf_size, bool make_holes,
+             char const *src_name, char const *dst_name,
+             off_t src_total_size, bool *normal_copy_required)
+{
+  bool last = false;
+  union { struct fiemap f; char c[4096]; } fiemap_buf;
+  struct fiemap *fiemap = &fiemap_buf.f;
+  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
+  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
+  verify (count != 0);
+
+  off_t prev_ext_logical = 0;
+  off_t last_ext_logical = 0;
+  uint64_t prev_ext_len = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t last_read_size = 0;
+  unsigned int i = 0;
+  *normal_copy_required = false;
+
+  if (! make_holes)
+    memset (buf, 0, buf_size);
+
+  /* This is required at least to initialize fiemap->fm_start,
+     but also serves (in mid 2010) to appease valgrind, which
+     appears not to know the semantics of the FIEMAP ioctl. */
+  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+
+  do
+    {
+      fiemap->fm_length = FIEMAP_MAX_OFFSET;
+      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+      fiemap->fm_extent_count = count;
+
+      /* When ioctl(2) fails, fall back to the normal copy only if it
+         is the first time we met.  */
+      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+        {
+          /* If the first ioctl fails, tell the caller that it is
+             ok to proceed with a normal copy.  */
+          if (i == 0)
+            *normal_copy_required = true;
+          else
+            {
+              /* If the second or subsequent ioctl fails, diagnose it,
+                 since it ends up causing the entire copy/cp to fail.  */
+              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
+            }
+          return false;
+        }
+
+      /* If 0 extents are returned, then more ioctls are not needed.  */
+      if (fiemap->fm_mapped_extents == 0)
+        break;
+
+      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+        {
+          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
+
+          off_t ext_logical = fm_ext[i].fe_logical;
+          uint64_t ext_len = fm_ext[i].fe_length;
+
+          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return false;
+            }
+
+          if (0 < i)
+            {
+              prev_ext_logical = fm_ext[i - 1].fe_logical,
+              prev_ext_len = fm_ext[i - 1].fe_length;
+            }
+
+          if (! make_holes)
+            {
+              if (prev_ext_logical + prev_ext_len < ext_logical)
+                {
+                  uint64_t holes_len = ext_logical - prev_ext_logical - prev_ext_len;
+                  if (! fill_with_holes_ok (dest_fd, dst_name, buf, buf_size, holes_len))
+                    return false;
+                }
+            }
+          else
+            {
+              if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+                {
+                  error (0, errno, _("cannot lseek %s"), quote (dst_name));
+                  return false;
+                }
+            }
+
+          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+            {
+              last_ext_logical = ext_logical;
+              last_ext_len = ext_len;
+              last = true;
+            }
+
+          while (ext_len)
+            {
+              /* Avoid reading into the holes if the left extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (src_fd, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return false;
+                }
+
+              if (n_read == 0)
+                {
+                  /* Figure out how many bytes read from the last extent.  */
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_fd, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return false;
+                }
+
+              ext_len -= n_read;
+            }
+        }
+
+      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
+
+    } while (! last);
+
+  /* If a file ends up with holes, the sum of the last extent logical offset
+     and the read-returned size will be shorter than the actual size of the
+     file.  Use ftruncate to extend the length of the destination file.  */
+  if (last_ext_logical + last_read_size < src_total_size)
+    {
+      if (! make_holes)
+        {
+          uint64_t holes_len = src_total_size - last_ext_logical - last_ext_len;
+          if (! fill_with_holes_ok (dest_fd, dst_name, buf, buf_size, holes_len))
+            return false;
+        }
+      else
+        {
+          if (ftruncate (dest_fd, src_total_size) < 0)
+            {
+              error (0, errno, _("failed to extend %s"), quote (dst_name));
+              return false;
+            }
+        }
+    }
+
+  return true;
+}
+#else
+extern bool fiemap_copy (ignored) { errno = ENOTSUP; return false; }
+#endif
diff --git a/src/sparse-core.h b/src/sparse-core.h
new file mode 100644
index 0000000..89c9ebf
--- /dev/null
+++ b/src/sparse-core.h
@@ -0,0 +1,24 @@
+/* core functions for efficient reading sparse files
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef SPARSE_CORE_H
+# define SPARSE_CORE_H
+
+bool fiemap_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
+                  bool make_holes, char const *src_name, char const *dst_name,
+                  off_t src_total_size, bool *normal_copy_required);
+
+#endif /* SPARSE_CORE_H */
-- 
1.5.4.3


Thanks,
-Jeff

-- 
The knowledge you get, no matter how much it is, must be possessed yourself and nourished with your
own painstaking efforts and be your achievement through hard work.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 16 Jul 2010 15:54:02 GMT) Full text and rfc822 format available.

Message #254 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 16 Jul 2010 08:53:27 -0700
On 07/16/10 07:49, jeff.liu wrote:

> For now, I am inclined to separate efficient read through fiemap
> and improve the write and allocation stuff via fallocate() or other ways later.

I haven't had time to look at it carefully, but here's a very brief
review.  The code you sent, like what's in the fiemap branch, has
a separate version of a chunk of copy.c that does both reading
and writing and optimizes both reading and writing by invoking the fiemap ioctls
at strategic locations.  Instead, it would be better to have
a module that separates out the efficient-read stuff by telling
copy.c where the next significant input extent is, and then modify copy.c
to use that module.  On hosts that do not support fiemap, the module
would simply report the entire input file as that file's only extent.

Surely such an approach would be more modular, and would result in
less duplication of code.  I can write something along those lines
if there's interest and if nobody else wants to take a crack at it.
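
A minimal sketch of the fallback described above, for hosts without fiemap: the whole input file is reported as its only extent.  The names `struct extent` and `whole_file_extent` are hypothetical and appear in none of the patches in this thread; this only illustrates the idea.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical extent description (an assumption for illustration).  */
struct extent
{
  off_t start;   /* logical offset of the first data byte */
  off_t length;  /* number of data bytes */
};

/* Fallback for hosts without FIEMAP: describe the entire file
   open on FD as a single extent.  Return false if fstat fails.  */
static bool
whole_file_extent (int fd, struct extent *ext)
{
  struct stat st;
  if (fstat (fd, &st) != 0)
    return false;
  ext->start = 0;
  ext->length = st.st_size;
  return true;
}
```

With a fallback of this shape, copy.c could run the same per-extent loop on every host and simply see one large extent where no extent map is available.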




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 16 Jul 2010 21:35:01 GMT) Full text and rfc822 format available.

Message #257 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, "jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 16 Jul 2010 14:33:17 -0700
On Fri, Jul 16, 2010 at 08:53:27AM -0700, Paul Eggert wrote:
> I haven't had time to look at it carefully, but here's a very brief
> review.  The code you sent, like what's in the fiemap branch, has
> a separate version of a chunk of copy.c that does both reading
> and writing and optimizes both reading and writing by invoking the fiemap ioctls
> at strategic locations.  Instead, it would be better to have
> a module that separates out the efficient-read stuff by telling
> copy.c where the next significant input extent is, and then modify copy.c
> to use that module.  On hosts that do not support fiemap, the module
> would simply report the entire input file as that file's only extent.

	Precisely.  The sparse-core.c or whatever it is called shouldn't
be doing the copy, it should just provide:

handle = init_extent_scan(fd);
while (get_next_extent(handle, &extent_start, &extent_len)) {
    ...
}
close_extent_scan(handle);

	Then copy.c just implements this loop and the '...' part.

Joel

-- 

"A narcissist is someone better looking than you are."  
         - Gore Vidal

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 17 Jul 2010 02:12:01 GMT) Full text and rfc822 format available.

Message #260 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Joel Becker <Joel.Becker <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 17 Jul 2010 10:11:30 +0800
Joel Becker wrote:
> On Fri, Jul 16, 2010 at 08:53:27AM -0700, Paul Eggert wrote:
>> I haven't had time to look at it carefully, but here's a very brief
>> review.  The code you sent, like what's in the fiemap branch, has
>> a separate version of a chunk of copy.c that does both reading
>> and writing and optimizes both reading and writing by invoking the fiemap ioctls
>> at strategic locations.  Instead, it would be better to have
>> a module that separates out the efficient-read stuff by telling
>> copy.c where the next significant input extent is, and then modify copy.c
>> to use that module.  On hosts that do not support fiemap, the module
>> would simply report the entire input file as that file's only extent.
> 
> 	Precisely.  The sparse-core.c or whatever it is called shouldn't
> be doing the copy, it should just provide:
> 
> handle = init_extent_scan(fd);
> while (get_next_extent(handle, &extent_start, &extent_len)) {
>     ...
> }
> close_extent_scan(handle);
> 
> 	Then copy.c just implements this loop and the '...' part.
> 
> Joel
> 

Yes, it's better to separate the copy from the extent scan, and it's not difficult to implement.
But I was wondering whether to return an array of extent info, or just one extent, for each scan.

I would like to work out a single interface that works for both Linux and Solaris.  For the
Solaris SEEK_DATA/SEEK_HOLE stuff, it looks convenient to just return the next extent each time.

But for fiemap, maybe it's better to return an extents_info_array, as in the current design, to
reduce the number of ioctl(2) calls.

 union { struct fiemap f; char c[4096]; } fiemap_buf;
 enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };

Maybe we can encapsulate the Solaris SEEK_DATA/SEEK_HOLE calls to return an extents_info_array too
(if that is possible; I have not played with those APIs yet), if you agree with the above.

Then Joel's code example would look like:
handle = init_extent_scan(fd);
while (get_next_extents(handle, &extents_info)) {
    num = extents_info->extents_count;
    for (i = 0; i < num; i++)
      {
        extent_info = extents_info[i];
        same_as_usual;
      }
}
close_extent_scan(handle);



Thanks
-Jeff



-- 
The knowledge you get, no matter how much it is, must be possessed yourself and nourished with your
own painstaking efforts and be your achievement through hard work.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 17 Jul 2010 06:16:01 GMT) Full text and rfc822 format available.

Message #263 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Joel Becker <Joel.Becker <at> oracle.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 16 Jul 2010 23:14:27 -0700
On Sat, Jul 17, 2010 at 10:11:30AM +0800, jeff.liu wrote:
> Joel Becker wrote:
> > On Fri, Jul 16, 2010 at 08:53:27AM -0700, Paul Eggert wrote:
> >> I haven't had time to look at it carefully, but here's a very brief
> >> review.  The code you sent, like what's in the fiemap branch, has
> >> a separate version of a chunk of copy.c that does both reading
> >> and writing and optimizes both reading and writing by invoking the fiemap ioctls
> >> at strategic locations.  Instead, it would be better to have
> >> a module that separates out the efficient-read stuff by telling
> >> copy.c where the next significant input extent is, and then modify copy.c
> >> to use that module.  On hosts that do not support fiemap, the module
> >> would simply report the entire input file as that file's only extent.
> > 
> > 	Precisely.  The sparse-core.c or whatever it is called shouldn't
> > be doing the copy, it should just provide:
> > 
> > handle = init_extent_scan(fd);
> > while (get_next_extent(handle, &extent_start, &extent_len)) {
> >     ...
> > }
> > close_extent_scan(handle);
> > 
> > 	Then copy.c just implements this loop and the '...' part.
> > 
> > Joel
> > 
> 
> Yes, it's better to separate the copy from the extent scan, and it's not difficult to implement.
> But I was wondering whether to return an array of extent info, or just one extent, for each scan.

	get_next_extent() just returns one extent, but the caller has no
idea what is hanging off of handle.  In fiemap, it could be a large
array of extents you've cached during init_extent_scan().  For Solaris
it might be some placeholder.
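
A minimal sketch of that point, assuming a hypothetical handle type: the handle holds a batch of cached extents plus a cursor, and the caller only ever sees one extent per call.  The names `struct extent`, `struct extent_scan`, and the `bool` return type are assumptions for illustration, not an existing interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical extent description.  */
struct extent
{
  off_t start;
  off_t length;
};

/* Hypothetical scan handle: a batch of extents cached by the
   platform-specific code, plus a cursor into that batch.  */
struct extent_scan
{
  struct extent const *cache;  /* extents fetched in one batch */
  size_t count;                /* number of entries in CACHE */
  size_t next;                 /* index of the next entry to hand out */
};

/* Store the next cached extent in *START and *LEN and return true,
   or return false once the cache is exhausted.  */
static bool
get_next_extent (struct extent_scan *scan, off_t *start, off_t *len)
{
  if (scan->next == scan->count)
    return false;
  *start = scan->cache[scan->next].start;
  *len = scan->cache[scan->next].length;
  scan->next++;
  return true;
}
```

How the cache gets filled (one FIEMAP ioctl, several, or a SEEK_DATA/SEEK_HOLE walk) stays hidden behind the handle, so the caller's loop does not change.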

> I would like to work out a single interface that works for both Linux and Solaris.  For the
> Solaris SEEK_DATA/SEEK_HOLE stuff, it looks convenient to just return the next extent each time.
> 
> But for fiemap, maybe it's better to return an extents_info_array, as in the current design, to
> reduce the number of ioctl(2) calls.

	You don't need multiple ioctl(2) calls.  Here's a trivial
example:

void *init_extent_scan(int fd)
{
    struct fiemap *handle;

    handle = malloc(sizeof(struct fiemap) +
                    (EXTENTS_PER_IOCTL * sizeof(struct fiemap_extent)));
    handle->fm_extent_count = EXTENTS_PER_IOCTL;
    if (!ioctl(fd, FS_IOC_FIEMAP, handle))
        return handle;

    if (lseek(fd, 0, SEEK_HOLE) >= 0)
        return (void *)-1;

    return NULL;
}

loff_t get_next_extent(void *handle, loff_t *start, loff_t *len)
{
    if (handle == (void *)-1)
        /* Do SEEK_HOLE thing */
    else if (handle)
        return handle->fm_extents[next_one++];
}
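
A slightly fuller sketch of the FIEMAP half of the example above, assuming Linux with <linux/fiemap.h> available.  It fills in details the trivial example leaves out: the request is zeroed so that fm_start and the reserved fields are defined, fm_length is set so the whole file is mapped, and the allocation is checked.  The function name `fiemap_first_batch` and the batch size of 64 are assumptions for illustration.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* constants */

enum { EXTENTS_PER_IOCTL = 64 };  /* arbitrary batch size (assumption) */

/* Fetch the first batch of extents for the file open on FD.
   Return a malloc'd struct fiemap on success (the caller frees it),
   or NULL if allocation or the ioctl fails.  */
static struct fiemap *
fiemap_first_batch (int fd)
{
  size_t size = (sizeof (struct fiemap)
                 + EXTENTS_PER_IOCTL * sizeof (struct fiemap_extent));

  /* calloc zeroes fm_start and the reserved fields.  */
  struct fiemap *fm = calloc (1, size);
  if (!fm)
    return NULL;

  fm->fm_length = FIEMAP_MAX_OFFSET;      /* map through end of file */
  fm->fm_flags = FIEMAP_FLAG_SYNC;        /* flush dirty data first */
  fm->fm_extent_count = EXTENTS_PER_IOCTL;

  if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
    {
      free (fm);
      return NULL;
    }
  return fm;
}
```

On success, fm->fm_mapped_extents says how many entries of fm->fm_extents are valid.  A file with more extents than one batch holds would still need a follow-up call with fm_start advanced past the last extent returned.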

Joel

-- 

Life's Little Instruction Book #497

	"Go down swinging."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker <at> oracle.com
Phone: (650) 506-8127




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 17 Jul 2010 06:24:01 GMT) Full text and rfc822 format available.

Message #266 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Joel Becker <Joel.Becker <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Jim Meyering <jim <at> meyering.net>,
	Chris Mason <chris.mason <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 17 Jul 2010 14:23:26 +0800
Joel Becker wrote:
> On Sat, Jul 17, 2010 at 10:11:30AM +0800, jeff.liu wrote:
>> Joel Becker wrote:
>>> On Fri, Jul 16, 2010 at 08:53:27AM -0700, Paul Eggert wrote:
>>>> I haven't had time to look at it carefully, but here's a very brief
>>>> review.  The code you sent, like what's in the fiemap branch, has
>>>> a separate version of a chunk of copy.c that does both reading
>>>> and writing and optimizes both reading and writing by invoking the fiemap ioctls
>>>> at strategic locations.  Instead, it would be better to have
>>>> a module that separates out the efficient-read stuff by telling
>>>> copy.c where the next significant input extent is, and then modify copy.c
>>>> to use that module.  On hosts that do not support fiemap, the module
>>>> would simply report the entire input file as that file's only extent.
>>> 	Precisely.  The sparse-core.c or whatever it is called shouldn't
>>> be doing the copy, it should just provide:
>>>
>>> handle = init_extent_scan(fd);
>>> while (get_next_extent(handle, &extent_start, &extent_len)) {
>>>     ...
>>> }
>>> close_extent_scan(handle);
>>>
>>> 	Then copy.c just implements this loop and the '...' part.
>>>
>>> Joel
>>>
>> Yes, it's better to separate the copy from the extent scan, and it's not difficult to implement.
>> But I was wondering whether to return an array of extent info, or just one extent, for each scan.
> 
> 	get_next_extent() just returns one extent, but the caller has no
> idea what is hanging off of handle.  In fiemap, it could be a large
> array of extents you've cached during init_extent_scan().  For Solaris
> it might be some placeholder.
> 
>> I would like to work out a single interface that works for both Linux and Solaris.  For the
>> Solaris SEEK_DATA/SEEK_HOLE stuff, it looks convenient to just return the next extent each time.
>>
>> But for fiemap, maybe it's better to return an extents_info_array, as in the current design, to
>> reduce the number of ioctl(2) calls.
> 
> 	You don't need multiple ioctl(2) calls.  Here's a trivial
> example:
> 
> void *init_extent_scan(int fd)
> {
>     struct fiemap *handle;
> 
>     handle = malloc(sizeof(struct fiemap) +
>                     (EXTENTS_PER_IOCTL * sizeof(struct fiemap_extent)));
>     handle->fm_extent_count = EXTENTS_PER_IOCTL;
>     if (!ioctl(fd, FS_IOC_FIEMAP, handle))
>         return handle;
> 
>     if (lseek(fd, 0, SEEK_HOLE) >= 0)
>         return (void *)-1;
> 
>     return NULL;
> }
> 
> loff_t get_next_extent(void *handle, loff_t *start, loff_t *len)
> {
>     if (handle == (void *)-1)
>         /* Do SEEK_HOLE thing */
>     else if (handle)
>         return handle->fm_extents[next_one++];
> }
Thanks Joel, I understand your idea now!

> 
> Joel
> 
Regards,
-Jeff

-- 
The knowledge you get, no matter how much it is, must be possessed yourself and nourished with your
own painstaking efforts and be your achievement through hard work.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 20 Jul 2010 15:25:02 GMT) Full text and rfc822 format available.

Message #269 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 20 Jul 2010 23:24:21 +0800
Hello All,

Below are my patches to isolate the extent scan and fetch functions in a new module, which makes
them easier to extend.

They introduce a new file, 'src/extent-scan.c', to hold those functions, which are then called
from extents_copy() in copy_reg() to copy regular files.

The other major change is that all data is copied when the '--sparse=never' option is specified.
All data is written to the destination file, but extents_copy() is still used, when available,
to read the source file efficiently.  It detects any hole between the previous and the current
extent and calls fill_with_holes_ok() to write zeros to the destination file in its place.
If there is a hole after the last extent of the source file, it calls fill_with_holes_ok() to
write zeros up to the source file size.
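
A minimal sketch of the gap handling described above, under stated assumptions: `gap_before` and `write_zeros` are hypothetical helpers, not the functions in the patch.  The sketch uses a dedicated zero buffer rather than the copy buffer, so that bytes left over from an earlier read cannot end up in the gap.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Return the length of the hole between the end of the previous
   extent and the start of the current one, or 0 if they touch.  */
static uint64_t
gap_before (uint64_t prev_start, uint64_t prev_len, uint64_t cur_start)
{
  uint64_t prev_end = prev_start + prev_len;
  return prev_end < cur_start ? cur_start - prev_end : 0;
}

/* Write LEN zero bytes to FD.  Return true on success.  */
static bool
write_zeros (int fd, uint64_t len)
{
  char zeros[4096];
  memset (zeros, 0, sizeof zeros);

  while (len > 0)
    {
      size_t chunk = len < sizeof zeros ? (size_t) len : sizeof zeros;
      ssize_t n = write (fd, zeros, chunk);
      if (n < 0 && errno == EINTR)
        continue;               /* interrupted before writing; retry */
      if (n <= 0)
        return false;
      len -= (uint64_t) n;      /* a short write just loops again */
    }
  return true;
}
```

For the trailing hole, the same helper would be called with the source file size in place of the next extent's start offset.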

I have not implemented the Solaris lseek(2) support yet, for lack of a Solaris environment; that
will have to wait for a while.

In my testing, it works on those 4 filesystems in common use that you all know.
As usual, any comments are welcome!


From 70773fdf1d85ba070e054b0467a7a0e1e2b00ea8 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Tue, 20 Jul 2010 20:35:25 +0800
Subject: [PATCH 1/3] cp: delete fiemap_copy() related stuff from copy.c

* delete fiemap_copy(), now it is implemented as a module.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/copy.c |  171 ------------------------------------------------------------
 1 files changed, 0 insertions(+), 171 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f48c74d..171499c 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -63,10 +63,6 @@

 #include <sys/ioctl.h>

-#ifndef HAVE_FIEMAP
-# include "fiemap.h"
-#endif
-
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -153,153 +149,6 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

-#ifdef __linux__
-# ifndef FS_IOC_FIEMAP
-#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
-# endif
-/* Perform a FIEMAP copy, if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
-   obtain a map of file extents excluding holes.  This avoids the
-   overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.  Upon a
-   successful copy, return true.  If the initial ioctl fails, set
-   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
-   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
-static bool
-fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
-             off_t src_total_size, char const *src_name,
-             char const *dst_name, bool *normal_copy_required)
-{
-  bool last = false;
-  union { struct fiemap f; char c[4096]; } fiemap_buf;
-  struct fiemap *fiemap = &fiemap_buf.f;
-  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
-  verify (count != 0);
-
-  off_t last_ext_logical = 0;
-  uint64_t last_ext_len = 0;
-  uint64_t last_read_size = 0;
-  unsigned int i = 0;
-  *normal_copy_required = false;
-
-  /* This is required at least to initialize fiemap->fm_start,
-     but also serves (in mid 2010) to appease valgrind, which
-     appears not to know the semantics of the FIEMAP ioctl. */
-  memset (&fiemap_buf, 0, sizeof fiemap_buf);
-
-  do
-    {
-      fiemap->fm_length = FIEMAP_MAX_OFFSET;
-      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
-      fiemap->fm_extent_count = count;
-
-      /* When ioctl(2) fails, fall back to the normal copy only if it
-         is the first time we met.  */
-      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
-        {
-          /* If the first ioctl fails, tell the caller that it is
-             ok to proceed with a normal copy.  */
-          if (i == 0)
-            *normal_copy_required = true;
-          else
-            {
-              /* If the second or subsequent ioctl fails, diagnose it,
-                 since it ends up causing the entire copy/cp to fail.  */
-              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
-            }
-          return false;
-        }
-
-      /* If 0 extents are returned, then more ioctls are not needed.  */
-      if (fiemap->fm_mapped_extents == 0)
-        break;
-
-      for (i = 0; i < fiemap->fm_mapped_extents; i++)
-        {
-          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
-
-          off_t ext_logical = fm_ext[i].fe_logical;
-          uint64_t ext_len = fm_ext[i].fe_length;
-
-          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
-            {
-              error (0, errno, _("cannot lseek %s"), quote (src_name));
-              return false;
-            }
-
-          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
-            {
-              error (0, errno, _("cannot lseek %s"), quote (dst_name));
-              return false;
-            }
-
-          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
-            {
-              last_ext_logical = ext_logical;
-              last_ext_len = ext_len;
-              last = true;
-            }
-
-          while (ext_len)
-            {
-              char buf[buf_size];
-
-              /* Avoid reading into the holes if the left extent
-                 length is shorter than the buffer size.  */
-              if (ext_len < buf_size)
-                buf_size = ext_len;
-
-              ssize_t n_read = read (src_fd, buf, buf_size);
-              if (n_read < 0)
-                {
-#ifdef EINTR
-                  if (errno == EINTR)
-                    continue;
-#endif
-                  error (0, errno, _("reading %s"), quote (src_name));
-                  return false;
-                }
-
-              if (n_read == 0)
-                {
-                  /* Figure out how many bytes read from the last extent.  */
-                  last_read_size = last_ext_len - ext_len;
-                  break;
-                }
-
-              if (full_write (dest_fd, buf, n_read) != n_read)
-                {
-                  error (0, errno, _("writing %s"), quote (dst_name));
-                  return false;
-                }
-
-              ext_len -= n_read;
-            }
-        }
-
-      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
-
-    } while (! last);
-
-  /* If a file ends up with holes, the sum of the last extent logical offset
-     and the read-returned size will be shorter than the actual size of the
-     file.  Use ftruncate to extend the length of the destination file.  */
-  if (last_ext_logical + last_read_size < src_total_size)
-    {
-      if (ftruncate (dest_fd, src_total_size) < 0)
-        {
-          error (0, errno, _("failed to extend %s"), quote (dst_name));
-          return false;
-        }
-    }
-
-  return true;
-}
-#else
-static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; }
-#endif
-
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -830,25 +679,6 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      if (make_holes)
-        {
-          bool require_normal_copy;
-          /* Perform efficient FIEMAP copy for sparse files, fall back to the
-             standard copy only if the ioctl(2) fails.  */
-          if (fiemap_copy (source_desc, dest_desc, buf_size,
-                           src_open_sb.st_size, src_name,
-                           dst_name, &require_normal_copy))
-            goto preserve_metadata;
-          else
-            {
-              if (! require_normal_copy)
-                {
-                  return_val = false;
-                  goto close_src_and_dst_desc;
-                }
-            }
-        }
-
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -977,7 +807,6 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

-preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
-- 
1.5.4.3

From f083f1a52ec5baba90aa228c1053f4a32127b3b2 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Tue, 20 Jul 2010 23:10:29 +0800
Subject: [PATCH 2/3] cp: add a new module for scanning extents

* src/extent-scan.c: Source code for scanning extents.
  Call init_extent_scan() to return an extents map.
  Call get_next_extent() to get the next extent for each iteration.
  Call close_extent_scan() to tear down the scan; for now it does nothing.
* src/extent-scan.h: Header file of extent-scan.c.
* src/Makefile.am: Reference it.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/Makefile.am   |    3 +-
 src/extent-scan.c |  137 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/extent-scan.h |   25 ++++++++++
 3 files changed, 164 insertions(+), 1 deletions(-)
 create mode 100644 src/extent-scan.c
 create mode 100644 src/extent-scan.h

diff --git a/src/Makefile.am b/src/Makefile.am
index 7d56312..fb8186c 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -145,6 +145,7 @@ noinst_HEADERS =	\
   copy.h		\
   cp-hash.h		\
   dircolors.h		\
+  extent-scan.h         \
   fiemap.h		\
   fs.h			\
   group-list.h		\
@@ -459,7 +460,7 @@ uninstall-local:
 	  fi; \
 	fi

-copy_sources = copy.c cp-hash.c
+copy_sources = copy.c cp-hash.c extent-scan.c

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
diff --git a/src/extent-scan.c b/src/extent-scan.c
new file mode 100644
index 0000000..c4085e0
--- /dev/null
+++ b/src/extent-scan.c
@@ -0,0 +1,137 @@
+/* extent-scan.c -- core functions for scanning extents
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Written by Jie Liu (jeff.liu <at> oracle.com).  */
+
+#include <config.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+
+#include "system.h"
+#include "extent-scan.h"
+#include "error.h"
+#include "quote.h"
+
+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
+/* The number of extents returned by the most recent scan.  */
+static size_t current_scanned_extents_count = 0;
+
+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
+   obtain a map of file extents excluding holes.  */
+extern void *
+init_extent_scan (int src_fd, const char *src_name,
+                  bool *normal_copy_required,
+                  bool *hit_last_extent)
+{
+  union { struct fiemap f; char c[4096]; } fiemap_buf;
+  struct fiemap *fiemap = &fiemap_buf.f;
+  struct fiemap_extent *fm_extents = &fiemap->fm_extents[0];
+  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_extents };
+  verify (count != 0);
+  static uint64_t next_map_start = 0;
+
+  /* This is required at least to initialize fiemap->fm_start,
+   * but also serves (in mid 2010) to appease valgrind, which
+   * appears not to know the semantics of the FIEMAP ioctl. */
+  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+
+  fiemap->fm_start = next_map_start;
+  fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+  fiemap->fm_extent_count = count;
+  fiemap->fm_length = FIEMAP_MAX_OFFSET - next_map_start;
+
+  /* When ioctl(2) fails, fall back to the normal copy only if this
+     is the first ioctl call for the file.  */
+  if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+    {
+      error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
+
+      if (next_map_start == 0)
+        *normal_copy_required = true;
+
+      return NULL;
+    }
+
+  /* If 0 extents are returned, no further init_extent_scan() calls are needed.  */
+  if (fiemap->fm_mapped_extents == 0)
+    {
+      *hit_last_extent = true;
+      return NULL;
+    }
+
+  current_scanned_extents_count = fiemap->fm_mapped_extents;
+  unsigned int last_extent_index = current_scanned_extents_count - 1;
+
+  if (fm_extents[last_extent_index].fe_flags & FIEMAP_EXTENT_LAST)
+    {
+      *hit_last_extent = true;
+      return fm_extents;
+    }
+
+  next_map_start = fm_extents[last_extent_index].fe_logical +
+                   fm_extents[last_extent_index].fe_length;
+
+  return fm_extents;
+}
+
+/* Return an extent's logical offset and length for each iteration.  */
+extern bool
+get_next_extent (void *scanned_extents, off_t *extent_logical,
+                 uint64_t *extent_length)
+{
+  static size_t i = 0;
+  struct fiemap_extent *fm_extents = (struct fiemap_extent *) scanned_extents;
+
+  if (i < current_scanned_extents_count)
+    {
+      *extent_logical = fm_extents[i].fe_logical;
+      *extent_length = fm_extents[i].fe_length;
+      i++;
+      return true;
+    }
+
+  return false;
+}
+
+extern void
+close_extent_scan (void)
+{
+  return ;
+}
+#else
+extern void *
+init_extent_scan (int src_fd, const char *src_name,
+                  bool *normal_copy_required, bool *hit_last_extent)
+{
+  *normal_copy_required = true;
+  errno = ENOTSUP;
+  (void) src_fd;
+  (void) src_name;
+  (void) hit_last_extent;
+  return NULL;
+}
+extern bool get_next_extent (void *e, off_t *o, uint64_t *n)
+{ (void) e; (void) o; (void) n; errno = ENOTSUP; return false; }
+extern void close_extent_scan (void) { }
+#endif
diff --git a/src/extent-scan.h b/src/extent-scan.h
new file mode 100644
index 0000000..e7e373f
--- /dev/null
+++ b/src/extent-scan.h
@@ -0,0 +1,25 @@
+/* core functions for efficiently reading sparse files
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef SPARSE_CORE_H
+# define SPARSE_CORE_H
+
+void *init_extent_scan (int src_fd, const char *src_name,
+                        bool *normal_copy_required,
+                        bool *hit_last_extent);
+bool get_next_extent (void *scanned_extents, off_t *extent_logical, uint64_t *extent_length);
+void close_extent_scan (void);
+#endif /* SPARSE_CORE_H */
-- 
1.5.4.3


From f23169c2c1721b8888dccb77000f79ddf9804df0 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Tue, 20 Jul 2010 23:11:08 +0800
Subject: [PATCH 3/3] cp: add extents_copy() for efficient sparse file copy

* src/copy.c (copy_reg): Now, `cp' attempts to use the new extent scan
  module for efficient sparse file copy through extents_copy(), falling
  back to a normal copy if the underlying file system does not support it.
  We honor --sparse=never by writing all data, but still use the extent
  scan, if available, to read the source file efficiently.
* src/copy.c: Add fill_with_holes_ok() to write zeros to the destination
  file where the source has holes.
* po/POTFILES.in: Add extent-scan.c.

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 po/POTFILES.in |    1 +
 src/copy.c     |  190 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 191 insertions(+), 0 deletions(-)

diff --git a/po/POTFILES.in b/po/POTFILES.in
index c862877..2ac1993 100644
--- a/po/POTFILES.in
+++ b/po/POTFILES.in
@@ -60,6 +60,7 @@ src/echo.c
 src/env.c
 src/expand.c
 src/expr.c
+src/extent-scan.c
 src/factor.c
 src/false.c
 src/fmt.c
diff --git a/src/copy.c b/src/copy.c
index 171499c..6d89bbe 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -36,6 +36,7 @@
 #include "copy.h"
 #include "cp-hash.h"
 #include "error.h"
+#include "extent-scan.h"
 #include "fcntl--.h"
 #include "file-set.h"
 #include "filemode.h"
@@ -149,6 +150,176 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

+/* Write zeros as holes to the destination file.  */
+static bool
+fill_with_holes_ok (int dest_fd, const char *dst_name,
+                    char *buf, size_t buf_size,
+                    uint64_t holes_len)
+{
+  while (buf_size < holes_len)
+    {
+      if (full_write (dest_fd, buf, buf_size) != buf_size)
+        {
+          error (0, errno, _("writing %s"), quote (dst_name));
+          return false;
+        }
+      holes_len -= buf_size;
+    }
+
+  if (0 < holes_len)
+    {
+      if (full_write (dest_fd, buf, holes_len) != holes_len)
+        {
+          error (0, errno, _("writing %s"), quote (dst_name));
+          return false;
+        }
+    }
+
+  return true;
+}
+
+/* Perform an efficient extents copy, if possible.  This avoids
+   the overhead of detecting holes in hole-introducing/preserving
+   copy, and thus makes copying sparse files much more efficient.
+   Upon a successful copy, return true.  If the initial extent
+   scan fails, set *REQUIRE_NORMAL_COPY to true and return false.
+   Upon any other failure, set *REQUIRE_NORMAL_COPY to false and
+   return false.  */
+static bool
+extents_copy (int source_desc, int dest_desc,
+              char *buf, size_t buf_size,
+              const char *src_name, const char *dst_name,
+              bool make_holes, size_t src_total_size,
+              bool *require_normal_copy)
+{
+  bool init_extent_scan_failed = false;
+  bool hit_last_extent = false;
+  void *scanned_extents;
+  off_t last_ext_logical = 0;
+  off_t ext_logical = 0;
+  uint64_t last_ext_len = 0;
+  uint64_t ext_len = 0;
+  uint64_t holes_len = 0;
+  uint64_t last_read_size = 0;
+
+  if (! make_holes)
+    memset (buf, 0, buf_size);
+
+  do
+    {
+      scanned_extents = init_extent_scan (source_desc, src_name,
+                                          &init_extent_scan_failed,
+                                          &hit_last_extent);
+      if (init_extent_scan_failed)
+        {
+          *require_normal_copy = true;
+          return false;
+        }
+
+      while (get_next_extent (scanned_extents, &ext_logical, &ext_len))
+        {
+          assert (ext_logical <= OFF_T_MAX);
+
+          if (lseek (source_desc, ext_logical, SEEK_SET) < 0)
+            {
+              error (0, errno, _("cannot lseek %s"), quote (src_name));
+              return false;
+            }
+
+          if (make_holes)
+            {
+              if (lseek (dest_desc, ext_logical, SEEK_SET) < 0)
+                {
+                  error (0, errno, _("cannot lseek %s"), quote (dst_name));
+                  return false;
+                }
+            }
+          else
+            {
+              /* If not making a sparse file, write zeros to the destination
+                 file if there is a hole between the last and current extent.  */
+              if (last_ext_logical + last_ext_len < ext_logical)
+                {
+                  holes_len = ext_logical - last_ext_logical - last_ext_len;
+                  if (! fill_with_holes_ok (dest_desc, dst_name, buf, buf_size, holes_len))
+                    return false;
+                }
+            }
+
+          last_ext_logical = ext_logical;
+          last_ext_len = ext_len;
+
+          last_read_size = 0;
+          while (ext_len)
+            {
+              /* Avoid reading into the holes if the remaining extent
+                 length is shorter than the buffer size.  */
+              if (ext_len < buf_size)
+                buf_size = ext_len;
+
+              ssize_t n_read = read (source_desc, buf, buf_size);
+              if (n_read < 0)
+                {
+#ifdef EINTR
+                  if (errno == EINTR)
+                    continue;
+#endif
+                  error (0, errno, _("reading %s"), quote (src_name));
+                  return false;
+                }
+
+              if (n_read == 0)
+                {
+                  last_read_size = last_ext_len - ext_len;
+                  break;
+                }
+
+              if (full_write (dest_desc, buf, n_read) != n_read)
+                {
+                  error (0, errno, _("writing %s"), quote (dst_name));
+                  return false;
+                }
+
+              ext_len -= n_read;
+              last_read_size += n_read;
+            }
+        }
+    } while (! hit_last_extent);
+
+  close_extent_scan ();
+
+  /* If a file ends with a hole, the sum of the last extent logical offset
+     and the read-returned size or the last extent length will be shorter than
+     the actual size of the file.  Use ftruncate to extend the length of the
+     destination file if make_holes, or write zeros up to the actual size of the
+     file.  */
+  if (make_holes)
+    {
+      if (last_ext_logical + last_read_size < src_total_size)
+        {
+          if (ftruncate (dest_desc, src_total_size) < 0)
+            {
+              error (0, errno, _("failed to extend %s"), quote (dst_name));
+              return false;
+            }
+        }
+    }
+  else
+    {
+      if (last_ext_logical + last_ext_len < src_total_size)
+        {
+          holes_len = src_total_size - last_ext_logical - last_ext_len;
+          if (0 < holes_len)
+            {
+              if (! fill_with_holes_ok (dest_desc, dst_name, buf, buf_size, holes_len))
+                return false;
+            }
+        }
+    }
+
+  return true;
+}
+
 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
    performance hit that's probably noticeable only on trees deeper
@@ -707,6 +878,24 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

+      bool require_normal_copy = false;
+      /* Perform efficient extents copy for sparse files, fall back to the
+         standard copy only if the initial extent scan fails.  If the
+         `--sparse=never' option was specified, write all data, but still
+         use extents copy, if available, to read the source efficiently.  */
+      if (extents_copy (source_desc, dest_desc, buf, buf_size,
+                        src_name, dst_name, make_holes,
+                        src_open_sb.st_size, &require_normal_copy))
+        goto preserve_metadata;
+      else
+        {
+          if (! require_normal_copy)
+            {
+              return_val = false;
+              goto close_src_and_dst_desc;
+            }
+        }
+
       while (true)
         {
           word *wp = NULL;
@@ -807,6 +996,7 @@ copy_reg (char const *src_name, char const *dst_name,
         }
     }

+preserve_metadata:
   if (x->preserve_timestamps)
     {
       struct timespec timespec[2];
-- 
1.5.4.3



Thanks,
-Jeff

-- 
The knowledge you get, no matter how much it is, must be possessed yourself and nourished with your
own painstaking efforts and be your achievement through hard work.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 20 Sep 2010 07:17:01 GMT) Full text and rfc822 format available.

Message #272 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 20 Sep 2010 15:18:01 +0800
Hi Jim and All,

Do you have any comments for the current implementation?

Sorry for the delayed follow-up.


Thanks,
-Jeff

jeff.liu wrote:
> Hello All,
> 
> Below is my patches to isolate the extents scan and fetch functions in a new module to improve its
> extendibility.
> 
> It introduce a new file 'src/extent-scan.c' to place those functions, and then call those functions
> from extents_copy() at copy_reg() to process the regular file copy.
> 
> In addition to this, another major change is to copy all data if '--sparse=never' option is
> specified.  It write all data to destination file but using extents_copy() if available for
> efficient read source file, and it try to figure out the holes between the previous and current
> extents, and call fill_with_holes_ok() to write zeros as holes to the destination file if it is.
> Call file_with_holes_ok() to write zeros up to the source file size if hit the last extent of the
> source file and there is a hole behind it.
> 
> I have not implement the solaris lseek(2) at the moment for lack of solaris environment, it need to
> delay a period of time.
> 
> According to my tryout, it works for those 4 filesystems in common use, you all know.
> As usual, any comments are welcome!
> 
> 
> From 70773fdf1d85ba070e054b0467a7a0e1e2b00ea8 Mon Sep 17 00:00:00 2001
> From: Jie Liu <jeff.liu <at> oracle.com>
> Date: Tue, 20 Jul 2010 20:35:25 +0800
> Subject: [PATCH 1/3] cp: delete fiemap_copy() related stuff from copy.c
> 
> * delete fiemap_copy(), now it is implemented as a module.
> 
> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
> ---
>  src/copy.c |  171 ------------------------------------------------------------
>  1 files changed, 0 insertions(+), 171 deletions(-)
> 
> diff --git a/src/copy.c b/src/copy.c
> index f48c74d..171499c 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -63,10 +63,6 @@
> 
>  #include <sys/ioctl.h>
> 
> -#ifndef HAVE_FIEMAP
> -# include "fiemap.h"
> -#endif
> -
>  #ifndef HAVE_FCHOWN
>  # define HAVE_FCHOWN false
>  # define fchown(fd, uid, gid) (-1)
> @@ -153,153 +149,6 @@ clone_file (int dest_fd, int src_fd)
>  #endif
>  }
> 
> -#ifdef __linux__
> -# ifndef FS_IOC_FIEMAP
> -#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
> -# endif
> -/* Perform a FIEMAP copy, if possible.
> -   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
> -   obtain a map of file extents excluding holes.  This avoids the
> -   overhead of detecting holes in a hole-introducing/preserving copy,
> -   and thus makes copying sparse files much more efficient.  Upon a
> -   successful copy, return true.  If the initial ioctl fails, set
> -   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
> -   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
> -static bool
> -fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
> -             off_t src_total_size, char const *src_name,
> -             char const *dst_name, bool *normal_copy_required)
> -{
> -  bool last = false;
> -  union { struct fiemap f; char c[4096]; } fiemap_buf;
> -  struct fiemap *fiemap = &fiemap_buf.f;
> -  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
> -  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
> -  verify (count != 0);
> -
> -  off_t last_ext_logical = 0;
> -  uint64_t last_ext_len = 0;
> -  uint64_t last_read_size = 0;
> -  unsigned int i = 0;
> -  *normal_copy_required = false;
> -
> -  /* This is required at least to initialize fiemap->fm_start,
> -     but also serves (in mid 2010) to appease valgrind, which
> -     appears not to know the semantics of the FIEMAP ioctl. */
> -  memset (&fiemap_buf, 0, sizeof fiemap_buf);
> -
> -  do
> -    {
> -      fiemap->fm_length = FIEMAP_MAX_OFFSET;
> -      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
> -      fiemap->fm_extent_count = count;
> -
> -      /* When ioctl(2) fails, fall back to the normal copy only if it
> -         is the first time we met.  */
> -      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
> -        {
> -          /* If the first ioctl fails, tell the caller that it is
> -             ok to proceed with a normal copy.  */
> -          if (i == 0)
> -            *normal_copy_required = true;
> -          else
> -            {
> -              /* If the second or subsequent ioctl fails, diagnose it,
> -                 since it ends up causing the entire copy/cp to fail.  */
> -              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
> -            }
> -          return false;
> -        }
> -
> -      /* If 0 extents are returned, then more ioctls are not needed.  */
> -      if (fiemap->fm_mapped_extents == 0)
> -        break;
> -
> -      for (i = 0; i < fiemap->fm_mapped_extents; i++)
> -        {
> -          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
> -
> -          off_t ext_logical = fm_ext[i].fe_logical;
> -          uint64_t ext_len = fm_ext[i].fe_length;
> -
> -          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
> -            {
> -              error (0, errno, _("cannot lseek %s"), quote (src_name));
> -              return false;
> -            }
> -
> -          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
> -            {
> -              error (0, errno, _("cannot lseek %s"), quote (dst_name));
> -              return false;
> -            }
> -
> -          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
> -            {
> -              last_ext_logical = ext_logical;
> -              last_ext_len = ext_len;
> -              last = true;
> -            }
> -
> -          while (ext_len)
> -            {
> -              char buf[buf_size];
> -
> -              /* Avoid reading into the holes if the left extent
> -                 length is shorter than the buffer size.  */
> -              if (ext_len < buf_size)
> -                buf_size = ext_len;
> -
> -              ssize_t n_read = read (src_fd, buf, buf_size);
> -              if (n_read < 0)
> -                {
> -#ifdef EINTR
> -                  if (errno == EINTR)
> -                    continue;
> -#endif
> -                  error (0, errno, _("reading %s"), quote (src_name));
> -                  return false;
> -                }
> -
> -              if (n_read == 0)
> -                {
> -                  /* Figure out how many bytes read from the last extent.  */
> -                  last_read_size = last_ext_len - ext_len;
> -                  break;
> -                }
> -
> -              if (full_write (dest_fd, buf, n_read) != n_read)
> -                {
> -                  error (0, errno, _("writing %s"), quote (dst_name));
> -                  return false;
> -                }
> -
> -              ext_len -= n_read;
> -            }
> -        }
> -
> -      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
> -
> -    } while (! last);
> -
> -  /* If a file ends up with holes, the sum of the last extent logical offset
> -     and the read-returned size will be shorter than the actual size of the
> -     file.  Use ftruncate to extend the length of the destination file.  */
> -  if (last_ext_logical + last_read_size < src_total_size)
> -    {
> -      if (ftruncate (dest_fd, src_total_size) < 0)
> -        {
> -          error (0, errno, _("failed to extend %s"), quote (dst_name));
> -          return false;
> -        }
> -    }
> -
> -  return true;
> -}
> -#else
> -static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; }
> -#endif
> -
>  /* FIXME: describe */
>  /* FIXME: rewrite this to use a hash table so we avoid the quadratic
>     performance hit that's probably noticeable only on trees deeper
> @@ -830,25 +679,6 @@ copy_reg (char const *src_name, char const *dst_name,
>  #endif
>          }
> 
> -      if (make_holes)
> -        {
> -          bool require_normal_copy;
> -          /* Perform efficient FIEMAP copy for sparse files, fall back to the
> -             standard copy only if the ioctl(2) fails.  */
> -          if (fiemap_copy (source_desc, dest_desc, buf_size,
> -                           src_open_sb.st_size, src_name,
> -                           dst_name, &require_normal_copy))
> -            goto preserve_metadata;
> -          else
> -            {
> -              if (! require_normal_copy)
> -                {
> -                  return_val = false;
> -                  goto close_src_and_dst_desc;
> -                }
> -            }
> -        }
> -
>        /* If not making a sparse file, try to use a more-efficient
>           buffer size.  */
>        if (! make_holes)
> @@ -977,7 +807,6 @@ copy_reg (char const *src_name, char const *dst_name,
>          }
>      }
> 
> -preserve_metadata:
>    if (x->preserve_timestamps)
>      {
>        struct timespec timespec[2];





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 20 Sep 2010 11:33:02 GMT) Full text and rfc822 format available.

Message #275 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 20 Sep 2010 13:35:05 +0200
jeff.liu wrote:
> Hi Jim and All,
>
> Do you have any comments for the current implementation?

Hi Jeff,

Thanks for the reminder.
I've just gone back and looked at those patches:

    http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/20534/focus=21008

There are some superficial problems.

First, whenever you move code from one place to another,
the commit that performs the move should be careful to
induce no other change.  In this case, the change should
remove code from copy.c and create the new .c file with
code that is essentially identical.  You'll have to remove
a "static" attribute on the primary function(s), and add
#include directives in the new file, but those are inevitable.
Also, in copy.c, you will remove the function body
and associated #include directives, adding an #include
of the new .h file.  Of course, this change must also update
src/Makefile.am, and the result should pass "make distcheck".

But perhaps you require new functions like init_extent_scan
in order to use the abstracted function properly.
In that case, your first commit would add the new functions
in copy.c and make use of them there.  Then you would move
all of the functions to their new file in the next commit,
making no semantic change.

Note however, that this copying code is intended to be usable in a
multi-threaded application, and thus must avoid using internal static
state.  Your patch adds a few new static variables, each of which
would cause trouble in such an application:

  +static size_t current_scanned_extents_count = 0;
  +  static uint64_t next_map_start = 0;
  +  static size_t i = 0;

While you're rearranging your patch along the lines above,
please eliminate those static variables, too.
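
A minimal sketch of one way to do that, assuming the per-scan state moves
into a caller-owned struct.  The struct and function names below are
hypothetical, not taken from the patch, and the FIEMAP ioctl itself is left
out: extent_scan_set_batch stands in for the point where a real scan would
record the extents it just obtained.  As posted, the statics also appear to
persist between batches and between files, which a per-scan struct avoids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: offset and length in bytes.  */
struct extent_info
{
  uint64_t logical;
  uint64_t length;
};

/* Hypothetical per-scan state.  Each field replaces one static.  */
struct extent_scan
{
  int fd;                         /* descriptor being scanned */
  uint64_t next_map_start;        /* was: static next_map_start */
  size_t n_extents;               /* was: current_scanned_extents_count */
  size_t next_index;              /* was: static i in get_next_extent */
  struct extent_info const *ext;  /* extents from the latest batch */
};

static void
extent_scan_init (struct extent_scan *scan, int fd)
{
  scan->fd = fd;
  scan->next_map_start = 0;
  scan->n_extents = 0;
  scan->next_index = 0;
  scan->ext = NULL;
}

/* Record a new batch of N extents and restart iteration over it.
   A real implementation would call this after a successful ioctl.  */
static void
extent_scan_set_batch (struct extent_scan *scan,
                       struct extent_info const *ext, size_t n)
{
  scan->ext = ext;
  scan->n_extents = n;
  scan->next_index = 0;
  if (n)
    scan->next_map_start = ext[n - 1].logical + ext[n - 1].length;
}

/* Fetch the next extent of the current batch, or return false if the
   batch is exhausted.  Only SCAN's own state is read or modified.  */
static bool
extent_scan_next (struct extent_scan *scan,
                  uint64_t *logical, uint64_t *length)
{
  if (scan->next_index >= scan->n_extents)
    return false;
  *logical = scan->ext[scan->next_index].logical;
  *length = scan->ext[scan->next_index].length;
  scan->next_index++;
  return true;
}
```

With this shape, two scans can be interleaved (or run in different threads,
each with its own struct) without disturbing one another.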

Also, this new function should be adjusted:

  +/* Write zeros as holes to the destination file.  */
  +static bool
  +fill_with_holes_ok (int dest_fd, const char *dst_name,
  +                    char *buf, size_t buf_size,
  +                    uint64_t holes_len)

Its signature is unnecessarily complicated for a function
that does nothing more than write N zero bytes to a file descriptor.
Also, the function name is misleading (as is its holes_len parameter),
since it certainly does not create holes.

Hmm... now I'm suspicious: could the second use of your fill_with_holes_ok
write from an uninitialized "buf"?  It appears that is possible,
but I confess not to have applied the patch.
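
To make the concern concrete: reading the patch as posted, buf is zeroed
once (and only when not making holes) and is then reused as the read(2)
buffer, so a later fill_with_holes_ok call may find extent data in it rather
than zeros.  That is an interpretation of the diff, not a tested result.  The
sketch below only mimics the buffer life cycle; the helper names are
hypothetical and memcpy stands in for read(2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the first N bytes of BUF are all zero.  */
static bool
all_zero (char const *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i] != 0)
      return false;
  return true;
}

/* Zero BUF once, then reuse it as the destination of a "read".
   The memcpy stands in for read(2) filling BUF with file data.
   Return true if BUF could still be written out as a run of zeros.  */
static bool
buffer_still_zero_after_reuse (char *buf, size_t buf_size,
                               char const *data, size_t data_len)
{
  memset (buf, 0, buf_size);
  size_t n = data_len < buf_size ? data_len : buf_size;
  memcpy (buf, data, n);
  return all_zero (buf, buf_size);
}
```

Once any non-zero data has passed through the buffer, the check fails, which
is why a dedicated zero buffer is the safer choice.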

What do you think of this signature,

  static bool
  write_zeros (int fd, uint64_t n_bytes)

That would require a buffer full of zeros, preferably of optimal size.
It could use a body like this,

{
  static char zero[32 * 1024];
  while (n_bytes)
    {
      uint64_t n = MIN (sizeof zero, n_bytes);
      if (full_write (fd, zero, n) != n)
        return false;
      n_bytes -= n;
    }
  return true;
}

or even calloc an IO_BUFSIZE-byte buffer on the first call
and use that.  Yes, using calloc appears better, since this code
will end up being used relatively infrequently.  Or perhaps better
still, do use calloc, but if the allocation fails, fall back on
using a smaller static buffer, of size say 1024 bytes.

Of course, simplifying the function means each caller
will have to diagnose failure, but imho, that's preferable
in this case.
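
A sketch of the calloc-with-fallback variant, under a few assumptions:
write_all is a hypothetical stand-in for gnulib's full_write so the example
is self-contained, the buffer sizes are placeholders for IO_BUFSIZE and the
1024-byte fallback, and the buffer is allocated per call rather than cached
on the first call, so that the only static data is a read-only block of
zeros.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Placeholder sizes; the real code would use IO_BUFSIZE.  */
enum { ZERO_BUF_SIZE = 32 * 1024, FALLBACK_SIZE = 1024 };

/* Write all N bytes of BUF to FD, retrying on EINTR and short writes.
   Hypothetical stand-in for gnulib's full_write.  */
static bool
write_all (int fd, char const *buf, size_t n)
{
  while (n)
    {
      ssize_t w = write (fd, buf, n);
      if (w < 0)
        {
          if (errno == EINTR)
            continue;
          return false;
        }
      if (w == 0)
        {
          errno = ENOSPC;
          return false;
        }
      buf += w;
      n -= (size_t) w;
    }
  return true;
}

/* Write N_BYTES zero bytes to FD.  Return true on success.  On failure,
   return false with errno set; the caller issues the diagnostic.  */
static bool
write_zeros (int fd, uint64_t n_bytes)
{
  static char const fallback[FALLBACK_SIZE];  /* read-only zeros */
  char *heap = calloc (1, ZERO_BUF_SIZE);
  char const *zero = heap ? heap : fallback;
  size_t zero_size = heap ? ZERO_BUF_SIZE : FALLBACK_SIZE;
  bool ok = true;

  while (n_bytes)
    {
      size_t n = n_bytes < zero_size ? (size_t) n_bytes : zero_size;
      if (! write_all (fd, zero, n))
        {
          ok = false;
          break;
        }
      n_bytes -= n;
    }

  /* Preserve errno across free so the caller can report it.  */
  int saved_errno = errno;
  free (heap);
  errno = saved_errno;
  return ok;
}
```

A caller would then diagnose the failure itself, along the lines of
`if (! write_zeros (dest_desc, n)) { error (0, errno, _("writing %s"),
quote (dst_name)); return false; }`, matching the diagnostics already used
elsewhere in the patch.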

Jim




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 21 Sep 2010 06:54:02 GMT) Full text and rfc822 format available.

Message #278 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 21 Sep 2010 14:47:10 +0800
Hi Jim,

Thanks for your prompt response and kind suggestions!
I totally agree with your review comments, and I will post the next round of patches accordingly soon.


Regards,
-Jeff

Jim Meyering wrote:
> jeff.liu wrote:
>> Hi Jim and All,
>>
>> Do you have any comments for the current implementation?
> 
> Hi Jeff,
> 
> Thanks for the reminder.
> I've just gone back and looked at those patches:
> 
>     http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/20534/focus=21008
> 
> There are some superficial problems.
> 
> First, whenever you move code from one place to another,
> the commit that performs the move should be careful to
> induce no other change.  In this case, the change should
> remove code from copy.c and create the new .c file with
> code that is essentially identical.  You'll have to remove
> a "static" attribute on the primary function(s), and add
> #include directives in the new file, but those are inevitable.
> Also, in copy.c, you will remove the function body
> and associated #include directives, adding an #include
> of the new .h file.  Of course, this change must also update
> src/Makefile.am, and the result should pass "make distcheck".
> 
> But perhaps you require new functions like init_extent_scan
> in order to use the abstracted function properly.
> In that case, your first commit would add the new functions
> in copy.c and make use of them there.  Then you would move
> all of the functions to their new file in the next commit,
> making no semantic change.
> 
> Note however, that this copying code is intended to be usable in a
> multi-threaded application, and thus must avoid using internal static
> state.  Your patch adds a few new static variables, each of which
> would cause trouble in such an application:
> 
>   +static size_t current_scanned_extents_count = 0;
>   +  static uint64_t next_map_start = 0;
>   +  static size_t i = 0;
> 
> While you're rearranging your patch along the lines above,
> please eliminate those static variables, too.
> 
> Also, this new function should be adjusted:
> 
>   +/* Write zeros as holes to the destination file.  */
>   +static bool
>   +fill_with_holes_ok (int dest_fd, const char *dst_name,
>   +                    char *buf, size_t buf_size,
>   +                    uint64_t holes_len)
> 
> Its signature is unnecessarily complicated for a function
> that does nothing more than write N zero bytes to a file descriptor.
> Also, the function name is misleading (as is its holes_len parameter),
> since it certainly does not create holes.
> 
> Hmm... now I'm suspicious: could the second use of your fill_with_holes_ok
> write from an uninitialized "buf"?  It appears that is possible,
> but I confess not to have applied the patch.
> 
> What do you think of this signature,
> 
>   static bool
>   write_zeros (int fd, uint64_t n_bytes)
> 
> That would require a buffer full of zeros, preferably of optimal size.
> It could use a body like this,
> 
> {
>   static char zero[32 * 1024];
>   while (n_bytes)
>     {
>       uint64_t n = MIN (sizeof zero, n_bytes);
>       if (full_write (fd, zero, n) != n)
>         return false;
>       n_bytes -= n;
>     }
>   return true;
> }
> 
> or even calloc an IO_BUFSIZE-byte buffer on the first call
> and use that.  Yes, using calloc appears better, since this code
> will end up being used relatively infrequently.  Or perhaps better
> still, do use calloc, but if the allocation fails, fall back on
> using a smaller static buffer, of size say 1024 bytes.
> 
> Of course, simplifying the function means each caller
> will have to diagnose failure, but imho, that's preferable
> in this case.
> 
> Jim
> 
> 
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 26 Sep 2010 08:21:01 GMT) Full text and rfc822 format available.

Message #281 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 26 Sep 2010 16:20:42 +0800
Hi Jim,

Sorry for the delay.

This is the new patch, which isolates the extent-reading code in a new module and teaches
cp(1) to make use of it.

Changes:
========
extent-scan.h/extent-scan.c:
1. Removed all the *static* variables from this module.
I'd like to introduce two new data structures, "struct extent_scan" and "struct extent_info".
The former holds the state of a scan as it proceeds through a file; the latter holds the
information (logical offset/length/flags) for each extent returned by a scan.  They are intended
to work as a wrapper for either fiemap or SEEK_DATA/SEEK_HOLE info.

/* Structure used to hold extent information.  */
struct extent_info
{
  /* Logical offset of an extent.  */
  off_t ext_logical;

  /* Extent length.  */
  uint64_t ext_length;

  /* Extent flags, used for FIEMAP only.  */
  uint32_t ext_flags;
};

/* Structure used to hold extent scan information.  */
struct extent_scan
{
  /* File descriptor the extent scan is run against.  */
  int fd;

  /* File name, for debugging purposes.  */
  char *fname;

  /* Next extent scan start offset.  */
  off_t scan_start;

  /* How many extent_info entries a scan returned.  */
  uint32_t ei_count;

  /* If true, fall back to a normal copy, either
     set by the failure of ioctl(2) for FIEMAP or
     lseek(2) with SEEK_DATA.  */
  bool initial_scan_failed;

  /* If true, the extent scan of the whole file has finished.  */
  bool hit_last_extent;

  /* Extent information.  */
  struct extent_info *ext_info;
};
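
To make the SEEK_DATA/SEEK_HOLE side of that wrapper concrete, here is a rough sketch of how
a lseek-based backend might fill extent_info.  It is not part of this patch, which implements
only the FIEMAP path.  The helper names scan_with_lseek and count_data_extents are made up
for illustration, and it assumes a platform where lseek(2) supports SEEK_DATA and SEEK_HOLE.

```c
#define _GNU_SOURCE             /* for SEEK_DATA and SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the structure above.  */
struct extent_info
{
  off_t ext_logical;
  uint64_t ext_length;
  uint32_t ext_flags;
};

/* Hypothetical helper: starting at START, store up to MAX data extents
   of FD in EXT.  Return the number stored, or -1 with errno set.
   Note that this moves FD's file offset.  */
static int
scan_with_lseek (int fd, off_t start, struct extent_info *ext, int max)
{
  int n = 0;
  assert (ext != NULL && 0 <= max);

  while (n < max)
    {
      off_t data = lseek (fd, start, SEEK_DATA);
      if (data < 0)
        {
          /* ENXIO: no data at or after START, so the scan is done.  */
          if (errno == ENXIO)
            break;
          return -1;
        }

      /* Every file has an implicit hole at EOF, so this finds one.  */
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole < 0)
        return -1;

      ext[n].ext_logical = data;
      ext[n].ext_length = hole - data;
      ext[n].ext_flags = 0;     /* flags are meaningful for FIEMAP only */
      n++;
      start = hole;
    }
  return n;
}

/* Hypothetical driver: count the data extents of PATH in batches,
   resuming each scan where the previous one ended.  Return -1 on error.  */
static long
count_data_extents (char const *path)
{
  struct extent_info ext[16];
  off_t scan_start = 0;
  long total = 0;
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  for (;;)
    {
      int n = scan_with_lseek (fd, scan_start, ext, 16);
      if (n < 0)
        {
          close (fd);
          return -1;
        }
      total += n;
      if (n < 16)
        break;
      scan_start = ext[n - 1].ext_logical + ext[n - 1].ext_length;
    }
  close (fd);
  return total;
}
```

The batch loop plays the same role as scan_start in struct extent_scan: each call resumes where
the previous one stopped, so no static state is needed.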


2. Functions in the new module:
void open_extent_scan(int fd, char const *fname, struct extent_scan **scan);

Call this function first.  It allocates the space for "extent_scan" and initializes the entries
in this structure as necessary.  The returned variable is then used as the input argument of
get_extents_info(), shown below.

bool get_extents_info(struct extent_scan *scan);  (maybe we need to figure out a better name?)

This function accepts an initialized extent_scan and allocates the space for extent_info based on
the number of extents, obtained through either the fiemap ioctl(2) or lseek(2) with SEEK_DATA/SEEK_HOLE.
It fills each extent_info with the logical offset and length of the extent, and sets
extent_info->ext_flags for fiemap, or just sets it to zero for lseek.

void close_extent_scan(struct extent_scan *scan);

Call this function after each get_extents_info() call to release the memory of extent_info while
keeping the extent_scan space.  After a file copy has completed successfully, call it again to
release the extent_scan.

copy.c:
======
1. Replace fill_with_holes_ok() with write_zeros (int fd, uint64_t len).  Would it be better to add a
char const *src_name parameter to this function for debugging purposes?

2. Replace fiemap_copy() with extent_copy().

In my testing it works fine, and both make syntax-check and make distcheck pass.
I hope you guys like this implementation. :)

Any comments are appreciated!


From 8ca00082e68b3bef4ee0ed042cb7a70dcaf154e8 Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Sun, 26 Sep 2010 16:03:48 +0800
Subject: [PATCH 1/1] cp: add a new module for scanning extents

* src/extent-scan.c: Source code for scanning extents.
  Call open_extent_scan() to allocate and return an initialized extent scan.
  Call get_extents_info() to get a number of extents for each iteration.
  Call close_extent_scan() to release the space allocated to extent_info after each
  extent scan, and to extent_scan after each file scan.
* src/extent-scan.h: Header file of extent-scan.c.
* src/Makefile.am: Reference it and link it to copy_source.
* src/copy.c: Make use of the new module, replace fiemap_copy() with extent_copy().

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/Makefile.am   |    2 +-
 src/copy.c        |  186 +++++++++++++++++++++++++++++------------------------
 src/extent-scan.c |  137 +++++++++++++++++++++++++++++++++++++++
 src/extent-scan.h |   71 ++++++++++++++++++++
 4 files changed, 312 insertions(+), 84 deletions(-)
 create mode 100644 src/extent-scan.c
 create mode 100644 src/extent-scan.h

diff --git a/src/Makefile.am b/src/Makefile.am
index 7d56312..2412b16 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -459,7 +459,7 @@ uninstall-local:
 	  fi; \
 	fi

-copy_sources = copy.c cp-hash.c
+copy_sources = copy.c cp-hash.c extent-scan.c

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
diff --git a/src/copy.c b/src/copy.c
index f48c74d..0c54365 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -35,6 +35,7 @@
 #include "buffer-lcm.h"
 #include "copy.h"
 #include "cp-hash.h"
+#include "extent-scan.h"
 #include "error.h"
 #include "fcntl--.h"
 #include "file-set.h"
@@ -63,10 +64,6 @@

 #include <sys/ioctl.h>

-#ifndef HAVE_FIEMAP
-# include "fiemap.h"
-#endif
-
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -153,74 +150,67 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

-#ifdef __linux__
-# ifndef FS_IOC_FIEMAP
-#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
-# endif
-/* Perform a FIEMAP copy, if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
-   obtain a map of file extents excluding holes.  This avoids the
-   overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.  Upon a
-   successful copy, return true.  If the initial ioctl fails, set
-   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
-   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
 static bool
-fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
-             off_t src_total_size, char const *src_name,
-             char const *dst_name, bool *normal_copy_required)
+write_zeros (int fd, uint64_t n_bytes)
 {
-  bool last = false;
-  union { struct fiemap f; char c[4096]; } fiemap_buf;
-  struct fiemap *fiemap = &fiemap_buf.f;
-  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
-  verify (count != 0);
+  char *zeros = calloc (IO_BUFSIZE, sizeof (char));
+  if (! zeros)
+    {
+      /* Try a small buffer.  */
+      static char zeros[1024];
+    }

+  while (n_bytes)
+    {
+      uint64_t n = MIN (sizeof zeros, n_bytes);
+      if ((full_write (fd, zeros, n)) != n)
+        return false;
+      n_bytes -= n;
+    }
+
+  return true;
+}
+
+/* Perform an efficient extent copy, if possible.  This avoids
+   the overhead of detecting holes in hole-introducing/preserving
+   copy, and thus makes copying sparse files much more efficient.
+   Upon a successful copy, return true.  If the initial extent scan
+   fails, set *NORMAL_COPY_REQUIRED to true and return false.
+   Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
+   return false.  */
+static bool
+extent_copy (int src_fd, int dest_fd, size_t buf_size,
+             off_t src_total_size, bool make_holes,
+             char const *src_name, char const *dst_name,
+             bool *require_normal_copy)
+{
+  struct extent_scan *scan;
   off_t last_ext_logical = 0;
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;
-  unsigned int i = 0;
-  *normal_copy_required = false;
+  unsigned int i;
+  bool ok = true;

-  /* This is required at least to initialize fiemap->fm_start,
-     but also serves (in mid 2010) to appease valgrind, which
-     appears not to know the semantics of the FIEMAP ioctl. */
-  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+  open_extent_scan (src_fd, src_name, &scan);

   do
     {
-      fiemap->fm_length = FIEMAP_MAX_OFFSET;
-      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
-      fiemap->fm_extent_count = count;
-
-      /* When ioctl(2) fails, fall back to the normal copy only if it
-         is the first time we met.  */
-      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+      ok = get_extents_info (scan);
+      if (! ok)
         {
-          /* If the first ioctl fails, tell the caller that it is
-             ok to proceed with a normal copy.  */
-          if (i == 0)
-            *normal_copy_required = true;
-          else
-            {
-              /* If the second or subsequent ioctl fails, diagnose it,
-                 since it ends up causing the entire copy/cp to fail.  */
-              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
-            }
+          if (scan->hit_last_extent)
+            break;
+
+          if (scan->initial_scan_failed)
+            *require_normal_copy = true;
+
           return false;
         }

-      /* If 0 extents are returned, then more ioctls are not needed.  */
-      if (fiemap->fm_mapped_extents == 0)
-        break;
-
-      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+      for (i = 0; i < scan->ei_count; i++)
         {
-          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
-
-          off_t ext_logical = fm_ext[i].fe_logical;
-          uint64_t ext_len = fm_ext[i].fe_length;
+          off_t ext_logical = scan->ext_info[i].ext_logical;
+          uint64_t ext_len = scan->ext_info[i].ext_length;

           if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
             {
@@ -228,19 +218,30 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
               return false;
             }

-          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+          if (make_holes)
             {
-              error (0, errno, _("cannot lseek %s"), quote (dst_name));
-              return false;
+              if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+                {
+                  error (0, errno, _("cannot lseek %s"), quote (dst_name));
+                  return false;
+                }
             }
-
-          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+          else
             {
-              last_ext_logical = ext_logical;
-              last_ext_len = ext_len;
-              last = true;
+              /* If not making a sparse file, write zeros to the destination
+                 file if there is a hole between the last and current extent.  */
+              if (last_ext_logical + last_ext_len < ext_logical)
+                {
+                  uint64_t holes_len = ext_logical - last_ext_logical - last_ext_len;
+                  if (! write_zeros (dest_fd, holes_len))
+                    return false;
+                }
             }

+          last_ext_logical = ext_logical;
+          last_ext_len = ext_len;
+          last_read_size = 0;
+
           while (ext_len)
             {
               char buf[buf_size];
@@ -258,12 +259,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
                     continue;
 #endif
                   error (0, errno, _("reading %s"), quote (src_name));
-                  return false;
+                    return false;
                 }

               if (n_read == 0)
                 {
-                  /* Figure out how many bytes read from the last extent.  */
+                  /* Figure out how many bytes read from the previous extent.  */
                   last_read_size = last_ext_len - ext_len;
                   break;
                 }
@@ -278,27 +279,44 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
             }
         }

-      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
+      /* Release the space allocated to scan->ext_info.  */
+      close_extent_scan (scan);
+    } while (! scan->hit_last_extent);

-    } while (! last);
+  /* Release the space allocated to scan->fname and scan.  */
+  close_extent_scan (scan);

   /* If a file ends up with holes, the sum of the last extent logical offset
-     and the read-returned size will be shorter than the actual size of the
-     file.  Use ftruncate to extend the length of the destination file.  */
-  if (last_ext_logical + last_read_size < src_total_size)
+     and the read-returned size or the last extent length will be shorter than
+     the actual size of the file.  Use ftruncate to extend the length of the
+     destination file if make_holes, or write zeros up to the actual size of the
+     file.  */
+  if (make_holes)
     {
-      if (ftruncate (dest_fd, src_total_size) < 0)
+      if (last_ext_logical + last_read_size < src_total_size)
         {
-          error (0, errno, _("failed to extend %s"), quote (dst_name));
-          return false;
+          if (ftruncate (dest_fd, src_total_size) < 0)
+            {
+              error (0, errno, _("failed to extend %s"), quote (dst_name));
+              return false;
+            }
+        }
+    }
+  else
+    {
+      if (last_ext_logical + last_ext_len < src_total_size)
+        {
+          uint64_t holes_len = src_total_size - last_ext_logical - last_ext_len;
+          if (0 < holes_len)
+            {
+              if (! write_zeros (dest_fd, holes_len))
+                return false;
+            }
         }
     }

   return true;
 }
-#else
-static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; }
-#endif

 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
@@ -833,11 +851,13 @@ copy_reg (char const *src_name, char const *dst_name,
       if (make_holes)
         {
           bool require_normal_copy;
-          /* Perform efficient FIEMAP copy for sparse files, fall back to the
-             standard copy only if the ioctl(2) fails.  */
-          if (fiemap_copy (source_desc, dest_desc, buf_size,
-                           src_open_sb.st_size, src_name,
-                           dst_name, &require_normal_copy))
+          /* Perform efficient extent copy for sparse file, fall back to the
+             standard copy only if the initial extent scan fails.  If the
+             '--sparse=never' option was specified, we write all data but
+             use extent copy if available to efficiently read.  */
+          if (extent_copy (source_desc, dest_desc, buf_size,
+                           src_open_sb.st_size, make_holes,
+                           src_name, dst_name, &require_normal_copy))
             goto preserve_metadata;
           else
             {
diff --git a/src/extent-scan.c b/src/extent-scan.c
new file mode 100644
index 0000000..ff7d622
--- /dev/null
+++ b/src/extent-scan.c
@@ -0,0 +1,137 @@
+/* extent-scan.c -- core functions for scanning extents
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Written by Jie Liu (jeff.liu <at> oracle.com).  */
+
+#include <config.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <assert.h>
+
+#include "system.h"
+#include "extent-scan.h"
+#include "error.h"
+#include "quote.h"
+
+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
+/* Allocate space for struct extent_scan, initialize the entries if
+   necessary and return it as the input argument of get_extents_info().  */
+extern void
+open_extent_scan (int src_fd, char const *src_name,
+                  struct extent_scan **scan)
+{
+  struct extent_scan *ext_scan = xmalloc (sizeof *ext_scan);
+
+  ext_scan->fd = src_fd;
+  ext_scan->fname = xstrdup (src_name);
+  ext_scan->ei_count = 0;
+  ext_scan->scan_start = 0;
+  ext_scan->initial_scan_failed = false;
+  ext_scan->hit_last_extent = false;
+
+  *scan = ext_scan;
+}
+
+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
+   obtain a map of file extents excluding holes.  */
+extern bool
+get_extents_info (struct extent_scan *scan)
+{
+  union { struct fiemap f; char c[4096]; } fiemap_buf;
+  struct fiemap *fiemap = &fiemap_buf.f;
+  struct fiemap_extent *fm_extents = &fiemap->fm_extents[0];
+  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_extents };
+  verify (count != 0);
+  unsigned int i;
+
+  /* This is required at least to initialize fiemap->fm_start,
+     but also serves (in mid 2010) to appease valgrind, which
+     appears not to know the semantics of the FIEMAP ioctl. */
+  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+
+  fiemap->fm_start = scan->scan_start;
+  fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+  fiemap->fm_extent_count = count;
+  fiemap->fm_length = FIEMAP_MAX_OFFSET - scan->scan_start;
+
+  /* Fall back to the standard copy if the first ioctl(2) call
+     fails.  */
+  if (ioctl (scan->fd, FS_IOC_FIEMAP, fiemap) < 0)
+    {
+      error (0, errno, _("%s: FIEMAP ioctl failed"), quote (scan->fname));
+
+      if (scan->scan_start == 0)
+        scan->initial_scan_failed = true;
+      return false;
+    }
+
+  /* If 0 extents are returned, then more get_extent_table() are not needed.  */
+  if (fiemap->fm_mapped_extents == 0)
+    {
+      scan->hit_last_extent = true;
+      return false;
+    }
+
+  scan->ei_count = fiemap->fm_mapped_extents;
+  scan->ext_info = xcalloc (scan->ei_count, sizeof scan->ext_info[0]);
+
+  for (i = 0; i < scan->ei_count; i++)
+    {
+      assert (fm_extents[i].fe_logical <= OFF_T_MAX);
+
+      scan->ext_info[i].ext_logical = fm_extents[i].fe_logical;
+      scan->ext_info[i].ext_length = fm_extents[i].fe_length;
+      scan->ext_info[i].ext_flags = fm_extents[i].fe_flags;
+    }
+
+  i--;
+  if (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_LAST)
+    {
+      scan->hit_last_extent = true;
+      return true;
+    }
+
+  scan->scan_start = fm_extents[i].fe_logical + fm_extents[i].fe_length;
+
+  return true;
+}
+#else
+extern bool get_extents_info (ignored) { errno = ENOTSUP; return false; }
+#endif
+
+extern void
+close_extent_scan (struct extent_scan *scan)
+{
+  if (scan)
+    {
+      /* If not hit the last extent, only release extent_info.  */
+      if (scan->ext_info && ! scan->hit_last_extent)
+        {
+          free (scan->ext_info);
+          return;
+        }
+      free (scan->fname);
+      free (scan);
+    }
+}
diff --git a/src/extent-scan.h b/src/extent-scan.h
new file mode 100644
index 0000000..3caae88
--- /dev/null
+++ b/src/extent-scan.h
@@ -0,0 +1,71 @@
+/* core functions for efficient reading sparse files
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Written by Jie Liu (jeff.liu <at> oracle.com).  */
+
+#ifndef EXTENT_SCAN_H
+# define EXTENT_SCAN_H
+
+/* Structure used to reserve information of each extent.  */
+struct extent_info
+{
+  /* Logical offset of an extent.  */
+  off_t ext_logical;
+
+  /* Extent length.  */
+  uint64_t ext_length;
+
+  /* Extent flags, use it for FIEMAP only, or set it to zero.  */
+  uint32_t ext_flags;
+};
+
+/* Structure used to reserve extent scan information per file.  */
+struct extent_scan
+{
+  /* File descriptor of extent scan run against.  */
+  int fd;
+
+  /* File name for debugging purpose.  */
+  char *fname;
+
+  /* Next scan start offset.  */
+  off_t scan_start;
+
+  /* How many extent info returned for a scan.  */
+  uint32_t ei_count;
+
+  /* If true, fall back to a normal copy, either
+     set by the failure of ioctl(2) for FIEMAP or
+     lseek(2) with SEEK_DATA.  */
+  bool initial_scan_failed;
+
+  /* If true, the extent scan of the whole file has finished.  */
+  bool hit_last_extent;
+
+  /* Extent information.  */
+  struct extent_info *ext_info;
+};
+
+void
+open_extent_scan (int src_fd, char const *src_name,
+                  struct extent_scan **scan);
+
+bool
+get_extents_info (struct extent_scan *scan);
+
+void
+close_extent_scan (struct extent_scan *scan);
+#endif /* EXTENT_SCAN_H */
-- 
1.5.4.3


Regards,
-Jeff






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 26 Sep 2010 08:45:02 GMT) Full text and rfc822 format available.

Message #284 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 26 Sep 2010 10:46:56 +0200
Hi Jeff,

This function has problems:
  - the inner "zeros" declaration shadows the outer one
      and ends up being useless.
  - the "sizeof zeros" resolves to 4 or 8.  obviously not what you intended.
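
A small standalone sketch (not part of the patch; the function names are
made up for illustration) shows both effects:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { FALLBACK_SIZE = 1024 };

/* sizeof applied to a pointer yields the size of the pointer itself,
   not the size of the buffer it points to.  */
static size_t
size_via_pointer (void)
{
  char *zeros = calloc (4096, 1);
  size_t n = sizeof zeros;      /* sizeof (char *): typically 4 or 8 */
  assert (n == sizeof (char *));
  free (zeros);
  return n;
}

/* sizeof applied to an array yields the size of the whole array.  */
static size_t
size_via_array (void)
{
  static char zeros[FALLBACK_SIZE];
  return sizeof zeros;
}

/* An inner declaration goes out of scope at its closing brace, so it
   cannot serve as a fallback for the outer variable.  */
static char const *
visible_after_block (char const *outer)
{
  char const *zeros = outer;
  if (zeros == NULL)
    {
      static char zeros[FALLBACK_SIZE];   /* shadows the outer "zeros" */
      (void) zeros;
    }
  return zeros;                 /* still the outer pointer, possibly NULL */
}
```

With the third function, a NULL argument comes back as NULL: the
inner array never replaces the outer pointer.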

...
>  static bool
> +write_zeros (int fd, uint64_t n_bytes)
>  {
> -  bool last = false;
> -  union { struct fiemap f; char c[4096]; } fiemap_buf;
> -  struct fiemap *fiemap = &fiemap_buf.f;
> -  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
> -  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
> -  verify (count != 0);
> +  char *zeros = calloc (IO_BUFSIZE, sizeof (char));
> +  if (! zeros)
> +    {
> +      /* Try a small buffer.  */
> +      static char zeros[1024];
> +    }
>
> +  while (n_bytes)
> +    {
> +      uint64_t n = MIN (sizeof zeros, n_bytes);
> +      if ((full_write (fd, zeros, n)) != n)
> +        return false;
> +      n_bytes -= n;
> +    }
> +
> +  return true;
> +}

Please use the following instead.
I'll review the rest tomorrow or Tuesday.

static bool
write_zeros (int fd, uint64_t n_bytes)
{
  static char *zeros;
  static size_t nz = IO_BUFSIZE;

  if (zeros == NULL)
    {
      static char fallback[1024];
      zeros = calloc (nz, 1);
      if (zeros == NULL)
        {
          zeros = fallback;
          nz = sizeof fallback;
        }
    }

  while (n_bytes)
    {
      uint64_t n = MIN (nz, n_bytes);
      if ((full_write (fd, zeros, n)) != n)
        return false;
      n_bytes -= n;
    }

  return true;
}




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 26 Sep 2010 08:52:02 GMT) Full text and rfc822 format available.

Message #287 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 26 Sep 2010 16:53:02 +0800
Hi Jim,

Thanks for your prompt response.  I will fix this issue once the whole review is done.


Regards,
-Jeff

Jim Meyering wrote:
> Hi Jeff,
> 
> This function has problems:
>   - the inner "zeros" declaration shadows the outer one
>       and ends up being useless.
>   - the "sizeof zeros" resolves to 4 or 8.  obviously not what you intended.
> 
> ...
>>  static bool
>> +write_zeros (int fd, uint64_t n_bytes)
>>  {
>> -  bool last = false;
>> -  union { struct fiemap f; char c[4096]; } fiemap_buf;
>> -  struct fiemap *fiemap = &fiemap_buf.f;
>> -  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>> -  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
>> -  verify (count != 0);
>> +  char *zeros = calloc (IO_BUFSIZE, sizeof (char));
>> +  if (! zeros)
>> +    {
>> +      /* Try a small buffer.  */
>> +      static char zeros[1024];
>> +    }
>>
>> +  while (n_bytes)
>> +    {
>> +      uint64_t n = MIN (sizeof zeros, n_bytes);
>> +      if ((full_write (fd, zeros, n)) != n)
>> +        return false;
>> +      n_bytes -= n;
>> +    }
>> +
>> +  return true;
>> +}
> 
> Please use the following instead.
> I'll review the rest tomorrow or Tuesday.
> 
> static bool
> write_zeros (int fd, uint64_t n_bytes)
> {
>   static char *zeros;
>   static size_t nz = IO_BUFSIZE;
> 
>   if (zeros == NULL)
>     {
>       static char fallback[1024];
>       zeros = calloc (nz, 1);
>       if (zeros == NULL)
>         {
>           zeros = fallback;
>           nz = sizeof fallback;
>         }
>     }
> 
>   while (n_bytes)
>     {
>       uint64_t n = MIN (nz, n_bytes);
>       if ((full_write (fd, zeros, n)) != n)
>         return false;
>       n_bytes -= n;
>     }
> 
>   return true;
> }





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 28 Sep 2010 03:07:01 GMT) Full text and rfc822 format available.

Message #290 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 28 Sep 2010 11:08:24 +0800
jeff.liu wrote:
> Hi Jim,
> 
> Thanks for your prompt response, I will fix this issue when all review done.
Hi Jim,

For my current implementation, I have another thought: remove the "char *fname" member from struct
extent_scan, and add a new member "int errno" to save the errno set by ioctl(2) or lseek(2) if either
call fails.
At first I intended to use "char *fname" for debugging inside the module. However, it may be
better to do such things outside the module, based on the return value and errno. This change
not only avoids the memory allocation for 'fname' but also gives open_extent_scan() a neater
interface.

/* Structure used to reserve extent scan information per file.  */
struct extent_scan
{
 ....
  int errno;
  ....
};

void
open_extent_scan (int src_fd, struct extent_scan **scan);


Is it better?

Thanks,
-Jeff

> 
> 
> Regards,
> -Jeff
> 
> Jim Meyering wrote:
>> Hi Jeff,
>>
>> This function has problems:
>>   - the inner "zeros" declaration shadows the outer one
>>       and ends up being useless.
>>   - the "sizeof zeros" resolves to 4 or 8.  obviously not what you intended.
>>
>> ...
>>>  static bool
>>> +write_zeros (int fd, uint64_t n_bytes)
>>>  {
>>> -  bool last = false;
>>> -  union { struct fiemap f; char c[4096]; } fiemap_buf;
>>> -  struct fiemap *fiemap = &fiemap_buf.f;
>>> -  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
>>> -  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
>>> -  verify (count != 0);
>>> +  char *zeros = calloc (IO_BUFSIZE, sizeof (char));
>>> +  if (! zeros)
>>> +    {
>>> +      /* Try a small buffer.  */
>>> +      static char zeros[1024];
>>> +    }
>>>
>>> +  while (n_bytes)
>>> +    {
>>> +      uint64_t n = MIN (sizeof zeros, n_bytes);
>>> +      if ((full_write (fd, zeros, n)) != n)
>>> +        return false;
>>> +      n_bytes -= n;
>>> +    }
>>> +
>>> +  return true;
>>> +}
>> Please use the following instead.
>> I'll review the rest tomorrow or Tuesday.
>>
>> static bool
>> write_zeros (int fd, uint64_t n_bytes)
>> {
>>   static char *zeros;
>>   static size_t nz = IO_BUFSIZE;
>>
>>   if (zeros == NULL)
>>     {
>>       static char fallback[1024];
>>       zeros = calloc (nz, 1);
>>       if (zeros == NULL)
>>         {
>>           zeros = fallback;
>>           nz = sizeof fallback;
>>         }
>>     }
>>
>>   while (n_bytes)
>>     {
>>       uint64_t n = MIN (nz, n_bytes);
>>       if ((full_write (fd, zeros, n)) != n)
>>         return false;
>>       n_bytes -= n;
>>     }
>>
>>   return true;
>> }
> 
> 
> 
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 28 Sep 2010 08:21:01 GMT) Full text and rfc822 format available.

Message #293 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 28 Sep 2010 10:23:16 +0200
jeff.liu wrote:
> Sorry for the delay.
>
> This is the new patch to isolate the stuff regarding to extents reading to a new module. and teach
> cp(1) to make use of it.

Jeff,

I applied your patch to my rebased fiemap-copy branch.
My first step was to run the usual

  ./bootstrap && ./configure && make && make check

"make check" failed due to a double free in your new code:
(x86_64, Fedora 13, ext4 working directory)

To get details, I made this temporary modification:

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index b6b1103..a8643bc 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -82,7 +82,7 @@ for i in $(seq 1 2 21); do
           -e 'for (1..'$j') { sysseek (*F, $n, 1)' \
           -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > j1 || fail=1
     # sync
-    cp --sparse=always j1 j2 || fail=1
+    valgrind cp --sparse=always j1 j2 || fail=1
     # sync
     # Technically we may need the 'sync' uses above, but
     # uncommenting them makes this test take much longer.

Then I reran make check.
That showed that 4 invocations of cp failed, each with errors
like the ones below.  I'll look at the actual code once you've
fixed these bugs:

==13203== Command: cp --sparse=always j1 j2
==13203==
==13203== Invalid read of size 1
==13203==    at 0x404960: extent_copy (copy.c:284)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35b8d is 29 bytes inside a block of size 40 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==
==13203== Invalid read of size 8
==13203==    at 0x409664: close_extent_scan (extent-scan.c:129)
==13203==    by 0x40497D: extent_copy (copy.c:287)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35b90 is 32 bytes inside a block of size 40 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==
==13203== Invalid read of size 1
==13203==    at 0x409671: close_extent_scan (extent-scan.c:129)
==13203==    by 0x40497D: extent_copy (copy.c:287)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35b8d is 29 bytes inside a block of size 40 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==
==13203== Invalid read of size 8
==13203==    at 0x409692: close_extent_scan (extent-scan.c:134)
==13203==    by 0x40497D: extent_copy (copy.c:287)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35b78 is 8 bytes inside a block of size 40 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==
==13203== Invalid free() / delete / delete[]
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x40969D: close_extent_scan (extent-scan.c:134)
==13203==    by 0x40497D: extent_copy (copy.c:287)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35be0 is 0 bytes inside a block of size 3 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x40969D: close_extent_scan (extent-scan.c:134)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==
==13203== Invalid free() / delete / delete[]
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40497D: extent_copy (copy.c:287)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)
==13203==  Address 0x4c35b70 is 0 bytes inside a block of size 40 free'd
==13203==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==13203==    by 0x4096A9: close_extent_scan (extent-scan.c:135)
==13203==    by 0x40495B: extent_copy (copy.c:283)
==13203==    by 0x405A92: copy_reg (copy.c:848)
==13203==    by 0x4084F2: copy_internal (copy.c:2189)
==13203==    by 0x40901E: copy (copy.c:2475)
==13203==    by 0x403AC9: do_copy (cp.c:757)
==13203==    by 0x4041F4: main (cp.c:1162)




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 28 Sep 2010 08:25:02 GMT) Full text and rfc822 format available.

Message #296 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 28 Sep 2010 10:27:16 +0200
jeff.liu wrote:
> jeff.liu wrote:
>> Hi Jim,
>>
>> Thanks for your prompt response, I will fix this issue when all review done.
> Hi Jim,
>
> For my current implementation, I just have another thought to remove the "char *fname" from struct
> extent_scan, and add a new item "int errno" to save the errno set by ioctl(2) or lseek(2) if either
> call failed.
> at first, I am intended to use "char *fname" for the debugging purpose inside, however, maybe its
> better to do such things outside of the module according to the return value and errno. this change
> can not only reduce the memory allocation for 'fname' but also make a neatly open_extent_scan()
> interface.
>
> /* Structure used to reserve extent scan information per file.  */
> struct extent_scan
> {
>  ....
>   int errno;
>   ....
> };
>
> void
> open_extent_scan (int src_fd, struct extent_scan **scan);
>
>
> Is it better?

That would be better.
Since there is only one diagnostic using that file name
you can do the same from the caller.

In fact, why store errno in that struct at all?
Can't you just ensure that errno is set to the proper value when it fails?
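
The convention suggested here, reporting failure through the return value and leaving errno for the caller to inspect, might look like the sketch below. It assumes a hypothetical helper, probe_offset, that is not part of the patch. The save/restore around free is a precaution, since cleanup calls have not always been guaranteed to preserve errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: query the current offset of FD.  On failure,
   release SCRATCH, return false, and leave errno as lseek set it.
   On success, release SCRATCH and store the offset in *OFFSET.  */
static bool
probe_offset (int fd, void *scratch, off_t *offset)
{
  assert (offset != NULL);
  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0)
    {
      int saved_errno = errno;   /* protect errno across cleanup */
      free (scratch);
      errno = saved_errno;
      return false;
    }
  free (scratch);
  *offset = pos;
  return true;
}
```

A caller can then report the failure itself, for example with error (0, errno, ...) as copy.c already does for lseek.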




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 29 Sep 2010 07:34:01 GMT) Full text and rfc822 format available.

Message #299 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 29 Sep 2010 15:35:17 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> jeff.liu wrote:
>>> Hi Jim,
>>>
>>> Thanks for your prompt response, I will fix this issue when all review done.
>> Hi Jim,
>>
>> For my current implementation, I just have another thought to remove the "char *fname" from struct
>> extent_scan, and add a new item "int errno" to save the errno set by ioctl(2) or lseek(2) if either
>> call failed.
>> at first, I am intended to use "char *fname" for the debugging purpose inside, however, maybe its
>> better to do such things outside of the module according to the return value and errno. this change
>> can not only reduce the memory allocation for 'fname' but also make a neatly open_extent_scan()
>> interface.
>>
>> /* Structure used to reserve extent scan information per file.  */
>> struct extent_scan
>> {
>>  ....
>>   int errno;
>>   ....
>> };
>>
>> void
>> open_extent_scan (int src_fd, struct extent_scan **scan);
>>
>>
>> Is it better?
> 
> That would be better.
> Since there is only one diagnostic using that file name
> you can do the same from the caller.
> 
> In fact, why store errno in that struct at all?
> Can't you just ensure that errno is set to the proper value when it fails?
Indeed, there is no need to store 'errno'.

Thanks,
-Jeff
> 
> 
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 29 Sep 2010 08:12:02 GMT) Full text and rfc822 format available.

Message #302 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 29 Sep 2010 16:11:41 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> Sorry for the delay.
>>
>> This is the new patch to isolate the stuff regarding to extents reading to a new module. and teach
>> cp(1) to make use of it.
> 
> Jeff,
> 
> I applied your patch to my rebased fiemap-copy branch.
> My first step was to run the usual
> 
>   ./bootstrap && ./configure && make && make check
> 
> "make check" failed due to a double free in your new code:
> (x86_64, Fedora 13, ext4 working directory)
> 
> To get details, I made this temporary modification:
Hi Jim,

I am sorry for the mistake; it is fixed in the patch below.
Would you please review it at your convenience?

Changes:
========
1. Fix write_zeros() per Jim's comments; thanks for pointing this out.
2. Remove char const *fname from struct extent_scan.
3. Change the signature of open_extent_scan() from "void open_extent_scan(int src_fd, struct
extent_scan **scan)" to "void open_extent_scan(int src_fd, struct extent_scan *scan)". The reason is to
avoid one memory allocation for the extent_scan variable by keeping it on the stack instead.
4. Move close_extent_scan() from a function defined in extent-scan.c to a macro definition in
extent-scan.h. It does nothing for now, since the extent scan is initially defined on the stack.
5. Add a macro "free_extents_info()", defined in extent-scan.h, to release the memory allocated for
the extent info. It should be called in combination with get_extents_info(). It is just one line, so
IMHO defining it as a macro should be OK.
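
The stack-allocation change in item 3 can be pictured with the following minimal sketch. The struct is a trimmed, hypothetical stand-in (only a few fields, with names chosen to echo the patch), and demo_caller is an illustrative caller, not code from copy.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct extent_scan.  */
struct demo_scan
{
  int fd;
  size_t ei_count;
  bool hit_last_extent;
};

/* Initialize caller-owned storage; nothing is allocated here.  */
static void
demo_open_scan (int src_fd, struct demo_scan *scan)
{
  assert (scan != NULL);
  scan->fd = src_fd;
  scan->ei_count = 0;
  scan->hit_last_extent = false;
}

/* The state lives in automatic storage, so there is no struct to
   free and therefore no way to free it twice.  */
static bool
demo_caller (int src_fd)
{
  struct demo_scan scan;
  demo_open_scan (src_fd, &scan);
  return scan.fd == src_fd && scan.ei_count == 0 && ! scan.hit_last_extent;
}
```

With this shape, only the per-iteration extent array still needs an explicit release.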

I have run the memory check with `valgrind`; no issues were found.
The cp/sparse-fiemap test fails at the extent comparison stage, but the contents of the two files
"j1" and "j2" are identical when compared manually.
Would it make sense to verify them with diff(1), since the test files are small?
Otherwise we have to merge the contiguous extents in the output of `filefrag'. I admit I have not dug
into filefrag-extent-compare yet; I need to recall Perl syntax. :-P


From 50a3338db06442fa2d789fd65175172d140cc96e Mon Sep 17 00:00:00 2001
From: Jie Liu <jeff.liu <at> oracle.com>
Date: Wed, 29 Sep 2010 15:35:43 +0800
Subject: [PATCH 1/1] cp: add a new module for scanning extents

* src/extent-scan.c: Source code for scanning extents.
  Call open_extent_scan() to initialize extent scan.
  Call get_extents_info() to get a number of extents for each iteration.
* src/extent-scan.h: Header file of extent-scan.c.
  Define free_extents_info() as a macro to release the space allocated for extent_info per extent scan.
  Define close_extent_scan() as a macro that does nothing at the moment.
* src/Makefile.am: Reference it and link it to copy_source.
* src/copy.c: Make use of the new module, replace fiemap_copy() with extent_copy().

Signed-off-by: Jie Liu <jeff.liu <at> oracle.com>
---
 src/Makefile.am   |    2 +-
 src/copy.c        |  197 ++++++++++++++++++++++++++++++----------------------
 src/extent-scan.c |  113 ++++++++++++++++++++++++++++++
 src/extent-scan.h |   68 ++++++++++++++++++
 4 files changed, 296 insertions(+), 84 deletions(-)
 create mode 100644 src/extent-scan.c
 create mode 100644 src/extent-scan.h

diff --git a/src/Makefile.am b/src/Makefile.am
index 7d56312..2412b16 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -459,7 +459,7 @@ uninstall-local:
 	  fi; \
 	fi

-copy_sources = copy.c cp-hash.c
+copy_sources = copy.c cp-hash.c extent-scan.c

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
diff --git a/src/copy.c b/src/copy.c
index f48c74d..42ec300 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -35,6 +35,7 @@
 #include "buffer-lcm.h"
 #include "copy.h"
 #include "cp-hash.h"
+#include "extent-scan.h"
 #include "error.h"
 #include "fcntl--.h"
 #include "file-set.h"
@@ -63,10 +64,6 @@

 #include <sys/ioctl.h>

-#ifndef HAVE_FIEMAP
-# include "fiemap.h"
-#endif
-
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -153,74 +150,79 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

-#ifdef __linux__
-# ifndef FS_IOC_FIEMAP
-#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
-# endif
-/* Perform a FIEMAP copy, if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
-   obtain a map of file extents excluding holes.  This avoids the
-   overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.  Upon a
-   successful copy, return true.  If the initial ioctl fails, set
-   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
-   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
 static bool
-fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
-             off_t src_total_size, char const *src_name,
-             char const *dst_name, bool *normal_copy_required)
+write_zeros (int fd, uint64_t n_bytes)
 {
-  bool last = false;
-  union { struct fiemap f; char c[4096]; } fiemap_buf;
-  struct fiemap *fiemap = &fiemap_buf.f;
-  struct fiemap_extent *fm_ext = &fiemap->fm_extents[0];
-  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_ext };
-  verify (count != 0);
+  static char *zeros;
+  static size_t nz = IO_BUFSIZE;
+
+  if (zeros == NULL)
+    {
+      static char fallback[1024];
+      zeros = calloc (nz, 1);
+      if (zeros == NULL)
+        {
+          zeros = fallback;
+          nz = sizeof fallback;
+        }
+    }
+
+  while (n_bytes)
+    {
+      uint64_t n = MIN (nz, n_bytes);
+      if ((full_write (fd, zeros, n)) != n)
+        return false;
+      n_bytes -= n;
+    }
+
+  return true;
+}

+/* Perform an efficient extent copy, if possible.  This avoids
+   the overhead of detecting holes in hole-introducing/preserving
+   copy, and thus makes copying sparse files much more efficient.
+   Upon a successful copy, return true.  If the initial extent scan
+   fails, set *NORMAL_COPY_REQUIRED to true and return false.
+   Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
+   return false.  */
+static bool
+extent_copy (int src_fd, int dest_fd, size_t buf_size,
+             off_t src_total_size, bool make_holes,
+             char const *src_name, char const *dst_name,
+             bool *require_normal_copy)
+{
+  struct extent_scan scan;
   off_t last_ext_logical = 0;
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;
-  unsigned int i = 0;
-  *normal_copy_required = false;
+  unsigned int i;
+  bool ok = true;

-  /* This is required at least to initialize fiemap->fm_start,
-     but also serves (in mid 2010) to appease valgrind, which
-     appears not to know the semantics of the FIEMAP ioctl. */
-  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+  open_extent_scan (src_fd, &scan);

   do
     {
-      fiemap->fm_length = FIEMAP_MAX_OFFSET;
-      fiemap->fm_flags = FIEMAP_FLAG_SYNC;
-      fiemap->fm_extent_count = count;
-
-      /* When ioctl(2) fails, fall back to the normal copy only if it
-         is the first time we met.  */
-      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
+      ok = get_extents_info (&scan);
+      if (! ok)
         {
-          /* If the first ioctl fails, tell the caller that it is
-             ok to proceed with a normal copy.  */
-          if (i == 0)
-            *normal_copy_required = true;
-          else
+          if (scan.hit_last_extent)
+            break;
+
+          if (scan.initial_scan_failed)
             {
-              /* If the second or subsequent ioctl fails, diagnose it,
-                 since it ends up causing the entire copy/cp to fail.  */
-              error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
+              close_extent_scan (&scan);
+              *require_normal_copy = true;
+              return false;
             }
+
+          error (0, errno, _("failed to get extents info %s"), quote (src_name));
           return false;
         }

-      /* If 0 extents are returned, then more ioctls are not needed.  */
-      if (fiemap->fm_mapped_extents == 0)
-        break;
-
-      for (i = 0; i < fiemap->fm_mapped_extents; i++)
+      for (i = 0; i < scan.ei_count; i++)
         {
-          assert (fm_ext[i].fe_logical <= OFF_T_MAX);
-
-          off_t ext_logical = fm_ext[i].fe_logical;
-          uint64_t ext_len = fm_ext[i].fe_length;
+          off_t ext_logical = scan.ext_info[i].ext_logical;
+          uint64_t ext_len = scan.ext_info[i].ext_length;

           if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
             {
@@ -228,27 +230,37 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
               return false;
             }

-          if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+          if (make_holes)
             {
-              error (0, errno, _("cannot lseek %s"), quote (dst_name));
-              return false;
+              if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+                {
+                  error (0, errno, _("cannot lseek %s"), quote (dst_name));
+                  return false;
+                }
             }
-
-          if (fm_ext[i].fe_flags & FIEMAP_EXTENT_LAST)
+          else
             {
-              last_ext_logical = ext_logical;
-              last_ext_len = ext_len;
-              last = true;
+              /* If not making a sparse file, write zeros to the destination
+                 file if there is a hole between the last and current extent.  */
+              if (last_ext_logical + last_ext_len < ext_logical)
+                {
+                  uint64_t holes_len = ext_logical - last_ext_logical - last_ext_len;
+                  if (! write_zeros (dest_fd, holes_len))
+                    return false;
+                }
             }

+          last_ext_logical = ext_logical;
+          last_ext_len = ext_len;
+          last_read_size = 0;
+
           while (ext_len)
             {
               char buf[buf_size];

               /* Avoid reading into the holes if the left extent
                  length is shorter than the buffer size.  */
-              if (ext_len < buf_size)
-                buf_size = ext_len;
+              buf_size = MIN (ext_len, buf_size);

               ssize_t n_read = read (src_fd, buf, buf_size);
               if (n_read < 0)
@@ -258,12 +270,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
                     continue;
 #endif
                   error (0, errno, _("reading %s"), quote (src_name));
-                  return false;
+                    return false;
                 }

               if (n_read == 0)
                 {
-                  /* Figure out how many bytes read from the last extent.  */
+                  /* Figure out how many bytes read from the previous extent.  */
                   last_read_size = last_ext_len - ext_len;
                   break;
                 }
@@ -278,27 +290,44 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
             }
         }

-      fiemap->fm_start = fm_ext[i - 1].fe_logical + fm_ext[i - 1].fe_length;
+      /* Release the space allocated to scan->ext_info.  */
+      free_extents_info (&scan);
+    } while (! scan.hit_last_extent);

-    } while (! last);
+  /* Do nothing now.  */
+  close_extent_scan (&scan);

   /* If a file ends up with holes, the sum of the last extent logical offset
-     and the read-returned size will be shorter than the actual size of the
-     file.  Use ftruncate to extend the length of the destination file.  */
-  if (last_ext_logical + last_read_size < src_total_size)
+     and the read-returned size or the last extent length will be shorter than
+     the actual size of the file.  Use ftruncate to extend the length of the
+     destination file if make_holes, or write zeros up to the actual size of the
+     file.  */
+  if (make_holes)
     {
-      if (ftruncate (dest_fd, src_total_size) < 0)
+      if (last_ext_logical + last_read_size < src_total_size)
         {
-          error (0, errno, _("failed to extend %s"), quote (dst_name));
-          return false;
+          if (ftruncate (dest_fd, src_total_size) < 0)
+            {
+              error (0, errno, _("failed to extend %s"), quote (dst_name));
+              return false;
+            }
+        }
+    }
+  else
+    {
+      if (last_ext_logical + last_ext_len < src_total_size)
+        {
+          uint64_t holes_len = src_total_size - last_ext_logical - last_ext_len;
+          if (0 < holes_len)
+            {
+              if (! write_zeros (dest_fd, holes_len))
+                return false;
+            }
         }
     }

   return true;
 }
-#else
-static bool fiemap_copy (ignored) { errno == ENOTSUP; return false; }
-#endif

 /* FIXME: describe */
 /* FIXME: rewrite this to use a hash table so we avoid the quadratic
@@ -833,11 +862,13 @@ copy_reg (char const *src_name, char const *dst_name,
       if (make_holes)
         {
           bool require_normal_copy;
-          /* Perform efficient FIEMAP copy for sparse files, fall back to the
-             standard copy only if the ioctl(2) fails.  */
-          if (fiemap_copy (source_desc, dest_desc, buf_size,
-                           src_open_sb.st_size, src_name,
-                           dst_name, &require_normal_copy))
+          /* Perform efficient extent copy for sparse file, fall back to the
+             standard copy only if the initial extent scan fails.  If the
+             '--sparse=never' option was specified, we write all data but
+             use extent copy, if available, to read efficiently.  */
+          if (extent_copy (source_desc, dest_desc, buf_size,
+                           src_open_sb.st_size, make_holes,
+                           src_name, dst_name, &require_normal_copy))
             goto preserve_metadata;
           else
             {
diff --git a/src/extent-scan.c b/src/extent-scan.c
new file mode 100644
index 0000000..f371b87
--- /dev/null
+++ b/src/extent-scan.c
@@ -0,0 +1,113 @@
+/* extent-scan.c -- core functions for scanning extents
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Written by Jie Liu (jeff.liu <at> oracle.com).  */
+
+#include <config.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <assert.h>
+
+#include "system.h"
+#include "extent-scan.h"
+#include "error.h"
+#include "quote.h"
+
+#ifndef HAVE_FIEMAP
+# include "fiemap.h"
+#endif
+
+/* Initialize the caller-supplied struct extent_scan so that it can be
+   passed as the input argument of get_extents_info().  */
+extern void
+open_extent_scan (int src_fd, struct extent_scan *scan)
+{
+  scan->fd = src_fd;
+  scan->ei_count = 0;
+  scan->scan_start = 0;
+  scan->initial_scan_failed = false;
+  scan->hit_last_extent = false;
+}
+
+#ifdef __linux__
+# ifndef FS_IOC_FIEMAP
+#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
+# endif
+/* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
+   obtain a map of file extents excluding holes.  */
+extern bool
+get_extents_info (struct extent_scan *scan)
+{
+  union { struct fiemap f; char c[4096]; } fiemap_buf;
+  struct fiemap *fiemap = &fiemap_buf.f;
+  struct fiemap_extent *fm_extents = &fiemap->fm_extents[0];
+  enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_extents };
+  verify (count != 0);
+  unsigned int i;
+
+  /* This is required at least to initialize fiemap->fm_start,
+     but also serves (in mid 2010) to appease valgrind, which
+     appears not to know the semantics of the FIEMAP ioctl. */
+  memset (&fiemap_buf, 0, sizeof fiemap_buf);
+
+  fiemap->fm_start = scan->scan_start;
+  fiemap->fm_flags = FIEMAP_FLAG_SYNC;
+  fiemap->fm_extent_count = count;
+  fiemap->fm_length = FIEMAP_MAX_OFFSET - scan->scan_start;
+
+  /* Fall back to the standard copy if the first ioctl(2) call
+     fails.  */
+  if (ioctl (scan->fd, FS_IOC_FIEMAP, fiemap) < 0)
+    {
+      if (scan->scan_start == 0)
+        scan->initial_scan_failed = true;
+      return false;
+    }
+
+  /* If 0 extents are returned, then more get_extent_table() are not needed.  */
+  if (fiemap->fm_mapped_extents == 0)
+    {
+      scan->hit_last_extent = true;
+      return false;
+    }
+
+  scan->ei_count = fiemap->fm_mapped_extents;
+  scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info));
+
+  for (i = 0; i < scan->ei_count; i++)
+    {
+      assert (fm_extents[i].fe_logical <= OFF_T_MAX);
+
+      scan->ext_info[i].ext_logical = fm_extents[i].fe_logical;
+      scan->ext_info[i].ext_length = fm_extents[i].fe_length;
+      scan->ext_info[i].ext_flags = fm_extents[i].fe_flags;
+    }
+
+  i--;
+  if (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_LAST)
+    {
+      scan->hit_last_extent = true;
+      return true;
+    }
+
+  scan->scan_start = fm_extents[i].fe_logical + fm_extents[i].fe_length;
+
+  return true;
+}
+#else
+extern bool get_extents_info (ignored) { errno = ENOTSUP; return false; }
+#endif
diff --git a/src/extent-scan.h b/src/extent-scan.h
new file mode 100644
index 0000000..07c2e5b
--- /dev/null
+++ b/src/extent-scan.h
@@ -0,0 +1,68 @@
+/* core functions for efficiently reading sparse files
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+   Written by Jie Liu (jeff.liu <at> oracle.com).  */
+
+#ifndef EXTENT_SCAN_H
+# define EXTENT_SCAN_H
+
+/* Structure used to reserve information of each extent.  */
+struct extent_info
+{
+  /* Logical offset of an extent.  */
+  off_t ext_logical;
+
+  /* Extent length.  */
+  uint64_t ext_length;
+
+  /* Extent flags, use it for FIEMAP only, or set it to zero.  */
+  uint32_t ext_flags;
+};
+
+/* Structure used to reserve extent scan information per file.  */
+struct extent_scan
+{
+  /* File descriptor of extent scan run against.  */
+  int fd;
+
+  /* Next scan start offset.  */
+  off_t scan_start;
+
+  /* How many extent info returned for a scan.  */
+  uint32_t ei_count;
+
+  /* If true, fall back to a normal copy, either
+     set by the failure of ioctl(2) for FIEMAP or
+     lseek(2) with SEEK_DATA.  */
+  bool initial_scan_failed;
+
+  /* If ture, the total extent scan per file has been finished.  */
+  bool hit_last_extent;
+
+  /* Extent information.  */
+  struct extent_info *ext_info;
+};
+
+void
+open_extent_scan (int src_fd, struct extent_scan *scan);
+
+bool
+get_extents_info (struct extent_scan *scan);
+
+#define free_extents_info(ext_scan) free ((ext_scan)->ext_info)
+#define close_extent_scan(ext_scan) /* empty */
+
+#endif /* EXTENT_SCAN_H */
-- 
1.5.4.3


Regards,
-Jeff
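
For readers following the patch above: the `count` enum in get_extents_info() is the number of extent records that fit in the 4096-byte union after the FIEMAP request header. The following is a minimal sketch of that arithmetic, not code from the patch. The two `*_sketch` structs are hypothetical local mirrors that assume the layout of the Linux `struct fiemap` header and `struct fiemap_extent`, and `sketch_extents_per_request` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fixed part of the kernel's struct fiemap
   (assumed layout; the real struct ends with a flexible extent array).  */
struct fiemap_hdr_sketch
{
  uint64_t fm_start;
  uint64_t fm_length;
  uint32_t fm_flags;
  uint32_t fm_mapped_extents;
  uint32_t fm_extent_count;
  uint32_t fm_reserved;
};

/* Hypothetical mirror of the kernel's struct fiemap_extent (assumed layout).  */
struct fiemap_extent_sketch
{
  uint64_t fe_logical;
  uint64_t fe_physical;
  uint64_t fe_length;
  uint64_t fe_reserved64[2];
  uint32_t fe_flags;
  uint32_t fe_reserved[3];
};

/* Return how many extent records fit in a request buffer of BUF_SIZE
   bytes once the header has been accounted for, or 0 if the buffer
   cannot even hold the header.  */
size_t
sketch_extents_per_request (size_t buf_size)
{
  if (buf_size < sizeof (struct fiemap_hdr_sketch))
    return 0;
  return ((buf_size - sizeof (struct fiemap_hdr_sketch))
          / sizeof (struct fiemap_extent_sketch));
}
```

Under the assumed layout the header is 32 bytes and each extent record is 56 bytes, so a 4096-byte buffer holds 72 records per ioctl call. A file with more extents than that needs repeated calls, which is why the patch advances scan_start past the last extent returned.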






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Mon, 11 Oct 2010 20:56:02 GMT) Full text and rfc822 format available.

Message #305 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Mon, 11 Oct 2010 22:59:14 +0200
jeff.liu wrote:

> Jim Meyering wrote:
>> jeff.liu wrote:
>>> Sorry for the delay.
>>>
>>> This is the new patch to isolate the extent-reading code in a new module and teach
>>> cp(1) to make use of it.
>>
>> Jeff,
>>
>> I applied your patch to my rebased fiemap-copy branch.
>> My first step was to run the usual
>>
>>   ./bootstrap && ./configure && make && make check
>>
>> "make check" failed due to a double free in your new code:
>> (x86_64, Fedora 13, ext4 working directory)
>>
>> To get details, I made this temporary modification:
> Hi Jim,
>
> I am sorry for the mistake; it is fixed in the patch below.
> Would you please review it at your convenience?

Thanks,

Here are 5 changes on top of yours.
I'll definitely adjust logs and maybe merge one or two before
pushing anything.  Just to be sure people understand, this series
will not be in the upcoming release.

Quick summary:
  - don't let write failure go unreported
  - make "distcheck" pass once again
  - rename functions to start with "extent_scan_"
  - remove unnecessary #include directives (part of "make syntax-check")

None of this is pushed yet, but at least now it passes "make distcheck".

From ef3c2fe3760b11bc143b36246ee458ec86c484c9 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Mon, 11 Oct 2010 11:19:02 +0200
Subject: [PATCH 1/5] rename extent_scan member

* extent-scan.h [struct extent_scan]: Rename member:
s/hit_last_extent/hit_final_extent/.  "final" is clearer,
since "last" can be interpreted as "preceding".
---
 src/copy.c        |    4 ++--
 src/extent-scan.c |    6 +++---
 src/extent-scan.h |    2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 43eeb74..1e1360e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -208,7 +208,7 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,
       bool ok = get_extents_info (&scan);
       if (! ok)
         {
-          if (scan.hit_last_extent)
+          if (scan.hit_final_extent)
             break;

           if (scan.initial_scan_failed)
@@ -302,7 +302,7 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,

       /* Release the space allocated to scan->ext_info.  */
       free_extents_info (&scan);
-    } while (! scan.hit_last_extent);
+    } while (! scan.hit_final_extent);

   /* Do nothing now.  */
   close_extent_scan (&scan);
diff --git a/src/extent-scan.c b/src/extent-scan.c
index f371b87..b0345f5 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -40,7 +40,7 @@ open_extent_scan (int src_fd, struct extent_scan *scan)
   scan->ei_count = 0;
   scan->scan_start = 0;
   scan->initial_scan_failed = false;
-  scan->hit_last_extent = false;
+  scan->hit_final_extent = false;
 }

 #ifdef __linux__
@@ -81,7 +81,7 @@ get_extents_info (struct extent_scan *scan)
   /* If 0 extents are returned, then more get_extent_table() are not needed.  */
   if (fiemap->fm_mapped_extents == 0)
     {
-      scan->hit_last_extent = true;
+      scan->hit_final_extent = true;
       return false;
     }

@@ -100,7 +100,7 @@ get_extents_info (struct extent_scan *scan)
   i--;
   if (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_LAST)
     {
-      scan->hit_last_extent = true;
+      scan->hit_final_extent = true;
       return true;
     }

diff --git a/src/extent-scan.h b/src/extent-scan.h
index 07c2e5b..0c9c199 100644
--- a/src/extent-scan.h
+++ b/src/extent-scan.h
@@ -50,7 +50,7 @@ struct extent_scan
   bool initial_scan_failed;

   /* If ture, the total extent scan per file has been finished.  */
-  bool hit_last_extent;
+  bool hit_final_extent;

   /* Extent information.  */
   struct extent_info *ext_info;
--
1.7.3.1.104.gc752e


From 7f38fe3ab8d1ee08f8ca4a96457df39da5bd1f70 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Mon, 11 Oct 2010 11:44:12 +0200
Subject: [PATCH 2/5] rename extent-scan functions to start with extent_scan_

---
 src/copy.c        |   12 +++++-------
 src/extent-scan.c |   10 +++++-----
 src/extent-scan.h |   22 ++++++++++++----------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 1e1360e..a7d10b8 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -201,11 +201,11 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;

-  open_extent_scan (src_fd, &scan);
+  extent_scan_init (src_fd, &scan);

   do
     {
-      bool ok = get_extents_info (&scan);
+      bool ok = extent_scan_read (&scan);
       if (! ok)
         {
           if (scan.hit_final_extent)
@@ -213,7 +213,6 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,

           if (scan.initial_scan_failed)
             {
-              close_extent_scan (&scan);
               *require_normal_copy = true;
               return false;
             }
@@ -301,11 +300,10 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,
         }

       /* Release the space allocated to scan->ext_info.  */
-      free_extents_info (&scan);
-    } while (! scan.hit_final_extent);
+      extent_scan_free (&scan);

-  /* Do nothing now.  */
-  close_extent_scan (&scan);
+    }
+  while (! scan.hit_final_extent);

   /* If a file ends up with holes, the sum of the last extent logical offset
      and the read-returned size or the last extent length will be shorter than
diff --git a/src/extent-scan.c b/src/extent-scan.c
index b0345f5..97bb792 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -32,9 +32,9 @@
 #endif

 /* Allocate space for struct extent_scan, initialize the entries if
-   necessary and return it as the input argument of get_extents_info().  */
+   necessary and return it as the input argument of extent_scan_read().  */
 extern void
-open_extent_scan (int src_fd, struct extent_scan *scan)
+extent_scan_init (int src_fd, struct extent_scan *scan)
 {
   scan->fd = src_fd;
   scan->ei_count = 0;
@@ -50,14 +50,13 @@ open_extent_scan (int src_fd, struct extent_scan *scan)
 /* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
    obtain a map of file extents excluding holes.  */
 extern bool
-get_extents_info (struct extent_scan *scan)
+extent_scan_read (struct extent_scan *scan)
 {
   union { struct fiemap f; char c[4096]; } fiemap_buf;
   struct fiemap *fiemap = &fiemap_buf.f;
   struct fiemap_extent *fm_extents = &fiemap->fm_extents[0];
   enum { count = (sizeof fiemap_buf - sizeof *fiemap) / sizeof *fm_extents };
   verify (count != 0);
-  unsigned int i;

   /* This is required at least to initialize fiemap->fm_start,
      but also serves (in mid 2010) to appease valgrind, which
@@ -88,6 +87,7 @@ get_extents_info (struct extent_scan *scan)
   scan->ei_count = fiemap->fm_mapped_extents;
   scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info));

+  unsigned int i;
   for (i = 0; i < scan->ei_count; i++)
     {
       assert (fm_extents[i].fe_logical <= OFF_T_MAX);
@@ -109,5 +109,5 @@ get_extents_info (struct extent_scan *scan)
   return true;
 }
 #else
-extern bool get_extents_info (ignored) { errno = ENOTSUP; return false; }
+extern bool extent_scan_read (ignored) { errno = ENOTSUP; return false; }
 #endif
diff --git a/src/extent-scan.h b/src/extent-scan.h
index 0c9c199..3119c8d 100644
--- a/src/extent-scan.h
+++ b/src/extent-scan.h
@@ -19,7 +19,7 @@
 #ifndef EXTENT_SCAN_H
 # define EXTENT_SCAN_H

-/* Structure used to reserve information of each extent.  */
+/* Structure used to store information of each extent.  */
 struct extent_info
 {
   /* Logical offset of an extent.  */
@@ -44,25 +44,27 @@ struct extent_scan
   /* How many extent info returned for a scan.  */
   uint32_t ei_count;

-  /* If true, fall back to a normal copy, either
-     set by the failure of ioctl(2) for FIEMAP or
-     lseek(2) with SEEK_DATA.  */
+  /* If true, fall back to a normal copy, either set by the
+     failure of ioctl(2) for FIEMAP or lseek(2) with SEEK_DATA.  */
   bool initial_scan_failed;

-  /* If ture, the total extent scan per file has been finished.  */
+  /* If true, the total extent scan per file has been finished.  */
   bool hit_final_extent;

-  /* Extent information.  */
+  /* Extent information: a malloc'd array of ei_count structs.  */
   struct extent_info *ext_info;
 };

 void
-open_extent_scan (int src_fd, struct extent_scan *scan);
+extent_scan_init (int src_fd, struct extent_scan *scan);

 bool
-get_extents_info (struct extent_scan *scan);
+extent_scan_read (struct extent_scan *scan);

-#define free_extents_info(ext_scan) free ((ext_scan)->ext_info)
-#define close_extent_scan(ext_scan) /* empty */
+static inline void
+extent_scan_free (struct extent_scan *scan)
+{
+  free (scan->ext_info);
+}

 #endif /* EXTENT_SCAN_H */
--
1.7.3.1.104.gc752e


From e33ec433eb36b1a777f9591a63bcaee1b9e6c1bf Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Mon, 11 Oct 2010 11:55:46 +0200
Subject: [PATCH 3/5] distribute extent-scan.h, too

* src/Makefile.am (copy_sources): Also distribute extent-scan.h.
---
 src/Makefile.am |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/Makefile.am b/src/Makefile.am
index 7187596..de4c828 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -450,7 +450,7 @@ uninstall-local:
 	  fi; \
 	fi

-copy_sources = copy.c cp-hash.c extent-scan.c
+copy_sources = copy.c cp-hash.c extent-scan.c extent-scan.h

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
--
1.7.3.1.104.gc752e


From b0a1374189800a6e8edc2cfb5154199fe970ccd7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Mon, 11 Oct 2010 11:55:58 +0200
Subject: [PATCH 4/5] formatting

---
 src/extent-scan.c |    7 ++++++-
 src/extent-scan.h |    6 ++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index 97bb792..5160975 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -109,5 +109,10 @@ extent_scan_read (struct extent_scan *scan)
   return true;
 }
 #else
-extern bool extent_scan_read (ignored) { errno = ENOTSUP; return false; }
+extern bool
+extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED)
+{
+  errno = ENOTSUP;
+  return false;
+}
 #endif
diff --git a/src/extent-scan.h b/src/extent-scan.h
index 3119c8d..ac9e500 100644
--- a/src/extent-scan.h
+++ b/src/extent-scan.h
@@ -55,11 +55,9 @@ struct extent_scan
   struct extent_info *ext_info;
 };

-void
-extent_scan_init (int src_fd, struct extent_scan *scan);
+void extent_scan_init (int src_fd, struct extent_scan *scan);

-bool
-extent_scan_read (struct extent_scan *scan);
+bool extent_scan_read (struct extent_scan *scan);

 static inline void
 extent_scan_free (struct extent_scan *scan)
--
1.7.3.1.104.gc752e


From f4513c41e656af44859587060ce9658241988cb1 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Mon, 11 Oct 2010 12:00:07 +0200
Subject: [PATCH 5/5] extent-scan.c: don't include error.h or quote.h

* src/extent-scan.c: Don't include error.h or quote.h.  Neither is used.
---
 src/extent-scan.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index 5160975..3bb0d53 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -24,8 +24,6 @@

 #include "system.h"
 #include "extent-scan.h"
-#include "error.h"
-#include "quote.h"

 #ifndef HAVE_FIEMAP
 # include "fiemap.h"
--
1.7.3.1.104.gc752e
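
The caller-side pattern that results from this series (init, then read a batch, consume it and free it until hit_final_extent is set) can be sketched without touching a real file system. This is a hedged illustration, not the coreutils code. The structs are simplified stand-ins for those in extent-scan.h (int64_t replaces off_t), and `demo_map`, `mock_scan_*` and `sketch_sum_extents` are hypothetical names. The mock reader serves extents from an in-memory table in place of the FIEMAP ioctl. Resetting ext_info to NULL in the free helper is an addition in this sketch, chosen because it makes a repeated free harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the structs in extent-scan.h.  */
struct extent_info
{
  int64_t ext_logical;
  uint64_t ext_length;
  uint32_t ext_flags;
};

struct extent_scan
{
  int64_t scan_start;
  uint32_t ei_count;
  bool initial_scan_failed;
  bool hit_final_extent;
  struct extent_info *ext_info;
};

/* Hypothetical extent map: three data extents separated by holes.  */
static const struct extent_info demo_map[] = {
  { 0, 4096, 0 }, { 8192, 4096, 0 }, { 65536, 8192, 0 }
};
#define DEMO_N (sizeof demo_map / sizeof demo_map[0])
#define DEMO_BATCH ((size_t) 2)   /* extents returned per mock "ioctl" */

static void
mock_scan_init (struct extent_scan *scan)
{
  scan->scan_start = 0;
  scan->ei_count = 0;
  scan->initial_scan_failed = false;
  scan->hit_final_extent = false;
  scan->ext_info = NULL;
}

/* Fill SCAN with the next batch of extents at or after scan_start.  */
static bool
mock_scan_read (struct extent_scan *scan)
{
  size_t first = 0;
  while (first < DEMO_N && demo_map[first].ext_logical < scan->scan_start)
    first++;

  if (first == DEMO_N)          /* nothing left to report */
    {
      scan->hit_final_extent = true;
      return false;
    }

  size_t n = DEMO_N - first;
  if (n > DEMO_BATCH)
    n = DEMO_BATCH;

  scan->ext_info = malloc (n * sizeof *scan->ext_info);
  if (scan->ext_info == NULL)
    {
      if (scan->scan_start == 0)
        scan->initial_scan_failed = true;
      return false;
    }
  memcpy (scan->ext_info, &demo_map[first], n * sizeof *scan->ext_info);
  scan->ei_count = (uint32_t) n;

  if (first + n == DEMO_N)
    scan->hit_final_extent = true;
  else
    scan->scan_start = scan->ext_info[n - 1].ext_logical
                       + (int64_t) scan->ext_info[n - 1].ext_length;
  return true;
}

/* Free the batch and clear the pointer so a second call is a no-op.  */
static void
mock_scan_free (struct extent_scan *scan)
{
  free (scan->ext_info);
  scan->ext_info = NULL;
  scan->ei_count = 0;
}

/* Walk all extents; return the total data bytes, or -1 on scan failure.
   *N_READS receives the number of successful batch reads.  */
int64_t
sketch_sum_extents (unsigned int *n_reads)
{
  struct extent_scan scan;
  int64_t total = 0;

  *n_reads = 0;
  mock_scan_init (&scan);
  do
    {
      if (! mock_scan_read (&scan))
        {
          if (scan.hit_final_extent)
            break;
          return -1;
        }
      ++*n_reads;
      for (uint32_t i = 0; i < scan.ei_count; i++)
        total += (int64_t) scan.ext_info[i].ext_length;
      mock_scan_free (&scan);
    }
  while (! scan.hit_final_extent);

  return total;
}
```

With the three-extent table and a batch size of two, the loop performs two reads and accounts for 16384 data bytes. The holes between extents are never visited.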




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 12 Oct 2010 05:25:02 GMT) Full text and rfc822 format available.

Message #308 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 12 Oct 2010 13:15:11 +0800
Jim Meyering wrote:
> jeff.liu wrote:
> 
>> Jim Meyering wrote:
>>> jeff.liu wrote:
>>>> Sorry for the delay.
>>>>
>>>> This is the new patch to isolate the extent-reading code in a new module and teach
>>>> cp(1) to make use of it.
>>> Jeff,
>>>
>>> I applied your patch to my rebased fiemap-copy branch.
>>> My first step was to run the usual
>>>
>>>   ./bootstrap && ./configure && make && make check
>>>
>>> "make check" failed due to a double free in your new code:
>>> (x86_64, Fedora 13, ext4 working directory)
>>>
>>> To get details, I made this temporary modification:
>> Hi Jim,
>>
>> I am sorry for the mistake; it is fixed in the patch below.
>> Would you please review it at your convenience?
> 
> Thanks,
> 
> Here are 5 changes on top of yours.
> I'll definitely adjust logs and maybe merge one or two before
> pushing anything.  Just to be sure people understand, this series
> will not be in the upcoming release.
> 
> Quick summary:
>   - don't let write failure go unreported
>   - make "distcheck" pass once again
>   - rename functions to start with "extent_scan_"
>   - remove unnecessary #include directives (part of "make syntax-check")
> 
> None of this is pushed yet, but at least now it passes "make distcheck".
Thanks for the update; the new function names look better than mine.


Regards,
-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 22 Jan 2011 13:00:03 GMT) Full text and rfc822 format available.

Message #311 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>,
	Tao Ma <tao.ma <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 22 Jan 2011 14:07:16 +0100
jeff.liu wrote:
> Hi Jim and All,
>
> Do you have any comments for the current implementation?

There have been several releases since we last talked about this,
but now is a good time to revive it.

I've rebased the fiemap-copy branch and made a few changes:
(somewhat sloppy 2nd log entry with the "*")

      copy.c: shorten a comment to fit in 80 columns
      * src/copy.c (copy_reg): Remove useless else-after-goto.
      copy: call extent_copy also when make_holes is false, ...
      copy: tweak variable name; improve a comment
      copy: don't allocate a separate buffer just for extent-based copy

I pushed the result as the new fiemap-copy-2 branch:

    http://git.savannah.gnu.org/cgit/coreutils.git/log/?h=fiemap-copy-2

Here are the five most recent commits.
The last one is the most interesting.
Here's its full log entry:

    copy: don't allocate a separate buffer just for extent-based copy
    * src/copy.c (copy_reg): Move use of extent_scan to just *after*
    we allocate the main copying buffer, so we can...
    (extent_scan): Take a new parameter, BUF, and use that rather
    than allocating a private buffer.  Update caller.
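
The idea in this log entry, that the copy routine borrows the caller's already-allocated buffer instead of allocating its own, can be illustrated with a small in-memory sketch. This is not the coreutils implementation: `sketch_copy_extent` is a hypothetical name, and the two memcpy calls stand in for the read(2) and write(2) calls that a real copy would make through BUF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN bytes from SRC to DST in chunks of at most BUF_SIZE bytes,
   staging each chunk through the caller-supplied BUF.  No allocation
   happens here; the caller owns BUF and can reuse it afterwards.
   Return the number of bytes copied (0 if BUF_SIZE is 0).  */
size_t
sketch_copy_extent (const char *src, char *dst, size_t len,
                    char *buf, size_t buf_size)
{
  size_t done = 0;

  if (buf_size == 0)
    return 0;

  while (done < len)
    {
      size_t chunk = len - done < buf_size ? len - done : buf_size;
      memcpy (buf, src + done, chunk);   /* stands in for read(2) */
      memcpy (dst + done, buf, chunk);   /* stands in for write(2) */
      done += chunk;
    }
  return done;
}
```

The design point is ownership: one buffer is allocated once by the caller, sized for the normal copy, and both the extent-based path and the fallback path can use it.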


From ffd02ad91ac22b18c0a07c433e7e9983aed81542 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Jan 2011 22:49:34 +0100
Subject: [PATCH 1/5] copy.c: shorten a comment to fit in 80 columns

---
 src/copy.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 30c1b56..270009b 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -287,7 +287,7 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,

               if (n_read == 0)
                 {
-                  /* Figure out how many bytes read from the previous extent.  */
+                  /* Record number of bytes read from the previous extent.  */
                   last_read_size = last_ext_len - ext_len;
                   break;
                 }
--
1.7.3.5.38.gb312b


From f880d4e43c47fa0b08757d911e00c69de07296ab Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 Jan 2011 12:30:21 +0100
Subject: [PATCH 2/5] * src/copy.c (copy_reg): Remove useless else-after-goto.

---
 src/copy.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 270009b..71da00d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -879,13 +879,11 @@ copy_reg (char const *src_name, char const *dst_name,
                            src_open_sb.st_size, make_holes,
                            src_name, dst_name, &require_normal_copy))
             goto preserve_metadata;
-          else
+
+          if (! require_normal_copy)
             {
-              if (! require_normal_copy)
-                {
-                  return_val = false;
-                  goto close_src_and_dst_desc;
-                }
+              return_val = false;
+              goto close_src_and_dst_desc;
             }
         }

--
1.7.3.5.38.gb312b


From 237c2325b3d11e1b1a576978b884df3423a075b1 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 Jan 2011 12:36:03 +0100
Subject: [PATCH 3/5] copy: call extent_copy also when make_holes is false, ...

so that we benefit from using extents also when reading a sparse
input file with --sparse=never.
* src/copy.c (copy_reg): Remove erroneous test of "make_holes"
so that we call extent_copy also when make_holes is false.
Otherwise, what's the point of that parameter?
---
 src/copy.c |   29 +++++++++++++----------------
 1 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 71da00d..be7fdba 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -868,23 +868,20 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      if (make_holes)
+      bool require_normal_copy;
+      /* Perform efficient extent copy for sparse file, fall back to the
+         standard copy only if the initial extent scan fails.  If the
+         '--sparse=never' option was specified, we writing all data but
+         use extent copy if available to efficiently read.  */
+      if (extent_copy (source_desc, dest_desc, buf_size,
+                       src_open_sb.st_size, make_holes,
+                       src_name, dst_name, &require_normal_copy))
+        goto preserve_metadata;
+
+      if (! require_normal_copy)
         {
-          bool require_normal_copy;
-          /* Perform efficient extent copy for sparse file, fall back to the
-             standard copy only if the initial extent scan fails.  If the
-             '--sparse=never' option was specified, we writing all data but
-             use extent copy if available to efficiently read.  */
-          if (extent_copy (source_desc, dest_desc, buf_size,
-                           src_open_sb.st_size, make_holes,
-                           src_name, dst_name, &require_normal_copy))
-            goto preserve_metadata;
-
-          if (! require_normal_copy)
-            {
-              return_val = false;
-              goto close_src_and_dst_desc;
-            }
+          return_val = false;
+          goto close_src_and_dst_desc;
         }

       /* If not making a sparse file, try to use a more-efficient
--
1.7.3.5.38.gb312b


From b3dfab326ad8d917ac1eaba10e0852bf695f93ae Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 Jan 2011 12:55:58 +0100
Subject: [PATCH 4/5] copy: tweak variable name; improve a comment

* src/copy.c (copy_reg): Rename a variable to make more sense from
caller's perspective: s/require_normal_copy/normal_copy_required/.
This is an output-only variable, and the original name could make
it look like an input (or i&o) variable.
---
 src/copy.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index be7fdba..fae8dbe 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -868,17 +868,17 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      bool require_normal_copy;
-      /* Perform efficient extent copy for sparse file, fall back to the
+      bool normal_copy_required;
+      /* Perform an efficient extent-based copy, falling back to the
          standard copy only if the initial extent scan fails.  If the
-         '--sparse=never' option was specified, we writing all data but
-         use extent copy if available to efficiently read.  */
+         '--sparse=never' option is specified, write all data but use
+         any extents to read more efficiently.  */
       if (extent_copy (source_desc, dest_desc, buf_size,
                        src_open_sb.st_size, make_holes,
-                       src_name, dst_name, &require_normal_copy))
+                       src_name, dst_name, &normal_copy_required))
         goto preserve_metadata;

-      if (! require_normal_copy)
+      if (! normal_copy_required)
         {
           return_val = false;
           goto close_src_and_dst_desc;
--
1.7.3.5.38.gb312b


From bdf7c351a37ed6eeaa6bce98cb82902073bcc6c3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sat, 22 Jan 2011 13:09:08 +0100
Subject: [PATCH 5/5] copy: don't allocate a separate buffer just for extent-based copy

* src/copy.c (copy_reg): Move use of extent_copy to just *after*
we allocate the main copying buffer, so we can...
(extent_copy): Take a new parameter, BUF, and use that rather
than allocating a private buffer.  Update caller.
---
 src/copy.c |   36 +++++++++++++++++-------------------
 1 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index fae8dbe..c9cc2f7 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -194,7 +194,7 @@ write_zeros (int fd, uint64_t n_bytes)
    Upon any other failure, set *NORMAL_COPY_REQUIRED to false and
    return false.  */
 static bool
-extent_copy (int src_fd, int dest_fd, size_t buf_size,
+extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
              off_t src_total_size, bool make_holes,
              char const *src_name, char const *dst_name,
              bool *require_normal_copy)
@@ -268,8 +268,6 @@ extent_copy (int src_fd, int dest_fd, size_t buf_size,

           while (ext_len)
             {
-              char buf[buf_size];
-
               /* Avoid reading into the holes if the left extent
                  length is shorter than the buffer size.  */
               buf_size = MIN (ext_len, buf_size);
@@ -868,22 +866,6 @@ copy_reg (char const *src_name, char const *dst_name,
 #endif
         }

-      bool normal_copy_required;
-      /* Perform an efficient extent-based copy, falling back to the
-         standard copy only if the initial extent scan fails.  If the
-         '--sparse=never' option is specified, write all data but use
-         any extents to read more efficiently.  */
-      if (extent_copy (source_desc, dest_desc, buf_size,
-                       src_open_sb.st_size, make_holes,
-                       src_name, dst_name, &normal_copy_required))
-        goto preserve_metadata;
-
-      if (! normal_copy_required)
-        {
-          return_val = false;
-          goto close_src_and_dst_desc;
-        }
-
       /* If not making a sparse file, try to use a more-efficient
          buffer size.  */
       if (! make_holes)
@@ -912,6 +894,22 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

+      bool normal_copy_required;
+      /* Perform an efficient extent-based copy, falling back to the
+         standard copy only if the initial extent scan fails.  If the
+         '--sparse=never' option is specified, write all data but use
+         any extents to read more efficiently.  */
+      if (extent_copy (source_desc, dest_desc, buf, buf_size,
+                       src_open_sb.st_size, make_holes,
+                       src_name, dst_name, &normal_copy_required))
+        goto preserve_metadata;
+
+      if (! normal_copy_required)
+        {
+          return_val = false;
+          goto close_src_and_dst_desc;
+        }
+
       while (true)
         {
           word *wp = NULL;
--
1.7.3.5.38.gb312b




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 Jan 2011 07:25:01 GMT) Full text and rfc822 format available.

Message #314 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 Jan 2011 15:31:54 +0800
Hi Jim,

Thanks for taking the time to help consolidate the code!

Is this patchset acceptable to merge into the next official release?
AFAICS, the tests passed on all filesystems except ext4, but the result is OK when comparing the file
contents. Can we take this risk?

Another thing is to add Solaris SEEK_DATA support to extent_scan.c, as we discussed before. I am
not sure whether anyone is working on this now. If not, I will take some time to follow up, but it
will be delayed by about 2 weeks, since I will be on vacation for the Chinese New Year starting next week.
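
For readers unfamiliar with the interface, here is a minimal sketch of how
lseek with SEEK_DATA and SEEK_HOLE can enumerate the data regions of a file.
It is not the extent_scan code: struct data_region and scan_data_regions are
hypothetical names, and the sketch assumes a platform whose lseek accepts
both modes (Solaris, or a Linux kernel and libc that implement them).

```c
#define _GNU_SOURCE             /* for SEEK_DATA and SEEK_HOLE with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical type: one run of data, covering [start, end).  */
struct data_region
{
  off_t start;
  off_t end;
};

/* Hypothetical helper: store up to MAX data regions of the file PATH in
   REGIONS.  Return the number of regions stored, or -1 on error.  */
static int
scan_data_regions (const char *path, struct data_region *regions, int max)
{
  assert (regions != NULL && max > 0);

  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  int n = 0;
  off_t pos = 0;
  while (n < max)
    {
      /* Find the next offset at or after POS that contains data.  */
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno != ENXIO)   /* ENXIO: only a hole (or EOF) remains.  */
            n = -1;
          break;
        }

      /* Find where that run of data ends.  There is always an implicit
         hole at end of file, so this succeeds for a valid DATA offset.  */
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole < 0)
        {
          n = -1;
          break;
        }

      regions[n].start = data;
      regions[n].end = hole;
      n++;
      pos = hole;
    }

  close (fd);
  return n;
}
```

A copy routine could read only the [start, end) ranges reported this way
and skip everything in between.  On a filesystem without real support, the
whole file is typically reported as a single data region.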

Btw, do you plan to post the extent_scan module to gnulib upstream, so that other file archive
projects (like tar(1)) can benefit from it?

If there is anything I can do for this patchset, please just let me know. :)


Regards,
-Jeff






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 Jan 2011 07:52:02 GMT) Full text and rfc822 format available.

Message #317 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 Jan 2011 08:59:44 +0100
jeff.liu wrote:
> AFAICS, the tests passed on all filesystems except ext4,

Really?
The vast majority of my testing is with ext4 on Fedora 14, and I have seen
no failure -- otherwise I would have mentioned that as a known problem.

What type of system/kernel are you using?
Was your ext4 partition created long ago?  With what options?
Did "make check" fail?  If so, please provide details.
If something else failed, please give me enough information
to reproduce it.

> but the result is OK when comparing the file
> contents. Can we take this risk?

> Is this patchset acceptable to merge into the next official release?

An ext4 failure sounds ominous.

> Another thing is to add Solaris SEEK_DATA support to extent_scan.c, as
> we discussed before. I am not sure whether anyone is working on this
> now. If not, I will take some time to follow up, but it will be delayed
> by about 2 weeks, since I will be on vacation for the Chinese New Year
> starting next week.
>
> Btw, do you plan to post the extent_scan module to gnulib upstream,
> so that other file archive projects (like tar(1)) can benefit from it?

I do not plan to do that right away.

> If there is anything I can do for this patchset, please just let me know. :)




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 Jan 2011 11:41:01 GMT) Full text and rfc822 format available.

Message #320 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Pádraig Brady <P <at> draigBrady.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 Jan 2011 19:46:50 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> AFAICS, the tests passed on all filesystems except ext4,
> 
> Really?
> The vast majority of my testing is with ext4 on Fedora 14, and I have seen
> no failure -- otherwise I would have mentioned that as a known problem.

I mentioned this issue at:
http://osdir.com/ml/bug-coreutils-gnu/2010-09/msg00092.html
"make test against cp/sparse-fiemap failed at the extent compare stage, but the contents of the
two files "j1/j2" are identical when compared manually.
Would it make sense to verify them through diff(1), since the test file is small?"

> 
> What type of system/kernel are you using?
2.6.33-RC3 && 2.6.36
> Was your ext4 partition created long ago?  With what options?
fiemap copy works well when `cp' is run against a physical ext4 partition.
> Did "make check" fail?  If so, please provide details.
Yeah, I will show the details of 'make check' below.

btw, I just checked out the new branch and tried to compile it but ran into an error:
date.c:30:28: error: parse-datetime.h: No such file or directory
date.c: In function 'batch_convert':
date.c:284: warning: implicit declaration of function 'parse_datetime'

I guess 'parse-datetime.h' is shipped with gnulib? For now, I cannot pull the latest gnulib code,
since the remote host does not respond.

For a quick reply, I ran 'make check' against the previous code base before your latest commit.

sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes

tests/cp/sparse-fiemap.log:
===========================
....
...
+ mount -oloop blob mnt
+ cd mnt
+ echo test
+ test -s f
+ test 0 = 1
+ dd if=/dev/zero of=sparse bs=1k count=1 seek=1G
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 4.6234e-05 s, 22.1 MB/s
+ timeout 10 cp --sparse=always sparse fiemap
++ stat --printf %s sparse
++ stat --printf %s fiemap
+ test 1099511628800 = 1099511628800
+ perl -e 1
++ seq 1 2 21
+ for i in '$(seq 1 2 21)'
+ for j in 1 2 31 100
+ perl -e 'BEGIN { $n = 1 * 1024; *F = *STDOUT }' -e 'for (1..1) { sysseek (*F, $n, 1)' -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'
+ cp --sparse=always j1 j2
+ cmp j1 j2
+ filefrag -v j1
+ grep extent
j1: 2 extents found
+ filefrag -v j1
+ filefrag -v j2
+ f ff1
+ perl /home/jeff/opensource_dev/fiemap_copy/tests/filefrag-extent-compare
+ awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
+ sed 's/ [a-z,][a-z,]*$//' ff1
+ f ff2
+ sed 's/ [a-z,][a-z,]*$//' ff2
+ awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
@a and @b have different lengths, even after adjustment
+ fail=1
+ break
+ test 1 = 1
+ break
+ Exit 1
+ set +e
+ exit 1
+ exit 1
+ remove_tmp_
+ __st=1
+ cleanup_
+ cd /
+ umount /home/jeff/opensource_dev/fiemap_copy/tests/gt-sparse-fiemap.cXgC/mnt
+ cd /home/jeff/opensource_dev/fiemap_copy/tests
+ chmod -R u+rwx /home/jeff/opensource_dev/fiemap_copy/tests/gt-sparse-fiemap.cXgC
+ rm -rf /home/jeff/opensource_dev/fiemap_copy/tests/gt-sparse-fiemap.cXgC
+ exit 1


Thanks,
-Jeff





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Tue, 25 Jan 2011 17:10:03 GMT) Full text and rfc822 format available.

Message #323 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	bug-coreutils <at> gnu.org,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Tue, 25 Jan 2011 18:17:50 +0100
jeff.liu wrote:
> Jim Meyering wrote:
>> jeff.liu wrote:
>>> AFAICS, the tests passed on all filesystems except ext4,
>>
>> Really?
>> The vast majority of my testing is with ext4 on Fedora 14, and I have seen
>> no failure -- otherwise I would have mentioned that as a known problem.
>
> I mentioned this issue at:
> http://osdir.com/ml/bug-coreutils-gnu/2010-09/msg00092.html
>
> "make test against cp/sparse-fiemap failed at the extent compare
> stage, but the contents of the two files "j1/j2" are identical when
> compared manually.
> Would it make sense to verify them through diff(1), since the test
> file is small?"

No.  The whole point of the test is to verify that the extents have
been preserved in the copy.  Diff doesn't know about extents.
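
As a rough illustration of what an extent-aware check looks at (the test
itself parses filefrag output, as the log shows; this is not its method),
the following sketch asks the kernel how many extents a file has.  It
assumes Linux with the FS_IOC_FIEMAP ioctl; count_extents is a
hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>           /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>       /* struct fiemap, FIEMAP_* constants */

/* Hypothetical helper: return the number of extents the filesystem
   reports for PATH, or -1 if the file cannot be opened or the
   filesystem does not support FIEMAP.  */
static long
count_extents (const char *path)
{
  assert (path != NULL);

  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;

  struct fiemap fm;
  memset (&fm, 0, sizeof fm);
  fm.fm_start = 0;
  fm.fm_length = FIEMAP_MAX_OFFSET;     /* map the whole file */
  fm.fm_flags = FIEMAP_FLAG_SYNC;       /* flush dirty data first */
  fm.fm_extent_count = 0;               /* only ask for the count */

  long n = -1;
  if (ioctl (fd, FS_IOC_FIEMAP, &fm) == 0)
    n = (long) fm.fm_mapped_extents;

  close (fd);
  return n;
}
```

Two files can compare equal byte for byte, which is all diff or cmp can
see, while a call like this returns different counts for them.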

>> What type of system/kernel are you using?
> 2.6.33-RC3 && 2.6.36
>> Was your ext4 partition created long ago?  With what options?
> fiemap copy works well when `cp' is run against a physical ext4 partition.
>> Did "make check" fail?  If so, please provide details.
> Yeah, I will show the details of 'make check' below.

What version of filefrag are you using?
Mine comes from e2fsprogs-1.41.12-6.fc14.x86_64

> btw, I just checked out the new branch and tried to compile it but ran
> into an error:
> date.c:30:28: error: parse-datetime.h: No such file or directory
> date.c: In function 'batch_convert':
> date.c:284: warning: implicit declaration of function 'parse_datetime'
>
> I guess 'parse-datetime.h' is shipped with gnulib? For now, I cannot
> pull the latest gnulib code, since the remote host does not respond.

Did you run ./bootstrap ?
That is a requirement whenever coreutils
pulls in a change to the gnulib submodule.

> For a quick reply, I ran 'make check' against the previous code base
> before your latest commit.
>
> sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes
...
> + filefrag -v j2
...
> + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
> @a and @b have different lengths, even after adjustment
> + fail=1




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 26 Jan 2011 03:53:02 GMT) Full text and rfc822 format available.

Message #326 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 26 Jan 2011 11:58:38 +0800
Jim Meyering wrote:
> jeff.liu wrote:
>> Jim Meyering wrote:
>>> jeff.liu wrote:
>>>> AFAICS, the tests passed on all filesystems except ext4,
>>> Really?
>>> The vast majority of my testing is with ext4 on Fedora 14, and I have seen
>>> no failure -- otherwise I would have mentioned that as a known problem.
>> I mentioned this issue at:
>> http://osdir.com/ml/bug-coreutils-gnu/2010-09/msg00092.html
>>
>> "make test against cp/sparse-fiemap failed at the extent compare
>> stage, but the contents of the two files "j1/j2" are identical when
>> compared manually.
>> Would it make sense to verify them through diff(1), since the test
>> file is small?"
> 
> No.  The whole point of the test is to verify that the extents have
> been preserved in the copy.  Diff doesn't know about extents.
> 
>>> What type of system/kernel are you using?
>> 2.6.33-RC3 && 2.6.36
>>> Was your ext4 partition created long ago?  With what options?
>> fiemap copy works well if run `cp' against physical ext4 partition.
>>> Did "make check" fail?  If so, please provide details.
>> Yeah, I will show the detail of 'make check' at below.
> 
> What version of filefrag are you using?
Mine comes from E2fsprogs version 1.41.12 shipped with ubuntu8.0.4.
I updated filefrag(8) to the upstream version, but still no luck. :(
The kernels I have tried are 2.6.28/2.6.33-RC3/2.6.36.

> Mine comes from e2fsprogs-1.41.12-6.fc14.x86_64
> 
>> btw, I just checked out the new branch and tried to compile it but ran
>> into an error:
>> date.c:30:28: error: parse-datetime.h: No such file or directory
>> date.c: In function 'batch_convert':
>> date.c:284: warning: implicit declaration of function 'parse_datetime'
>>
>> I guess 'parse-datetime.h' is shipped with gnulib? For now, I can not
>> pull the latest gnulib code
>> since the remote host does not response.
> 
> Did you run ./bootstrap ?
Sure. Yesterday it failed because of my proxy issue.


Thanks,
-Jeff
> That is a requirement whenever the coreutils
> pulls in a change to the gnulib submodule.
> 
>> For a quick reply, I ran 'make check' against the previous code base
>> before your latest commit.
>>
>> sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes
> ...
>> + filefrag -v j2
> ...
>> + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
>> @a and @b have different lengths, even after adjustment
>> + fail=1
> 
> 
> 





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Wed, 26 Jan 2011 14:11:02 GMT) Full text and rfc822 format available.

Message #329 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jeff liu <jeff.liu <at> oracle.com>
To: "jeff.liu" <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 26 Jan 2011 22:16:36 +0800
On 2011-1-26, at 11:58 AM, jeff.liu wrote:

> Jim Meyering wrote:
>> jeff.liu wrote:
>>> Jim Meyering wrote:
>>>> jeff.liu wrote:
>>>>> AFAICS, the tests passed on all filesystems except ext4,
>>>> Really?
>>>> The vast majority of my testing is with ext4 on Fedora 14, and I have seen
>>>> no failure -- otherwise I would have mentioned that as a known problem.
>>> I have mentioned this issue at:
>>> http://osdir.com/ml/bug-coreutils-gnu/2010-09/msg00092.html
>>> 
>>> "make test against cp/sparse-fiemap failed at the extent compare
>>> stage, but the file content is
>>> identical to each other by comparing those two files "j1/j2" manually.
>>> Is it make sense if we verify them through diff(1) since the testing
>>> file is in small size?"
>> 
>> No.  The whole point of the test is to verify that the extents have
>> been preserved in the copy.  Diff doesn't know about extents.
>> 
>>>> What type of system/kernel are you using?
>>> 2.6.33-RC3 && 2.6.36
>>>> Was your ext4 partition created long ago?  With what options?
>>> fiemap copy works well if run `cp' against physical ext4 partition.
>>>> Did "make check" fail?  If so, please provide details.
>>> Yeah, I will show the detail of 'make check' at below.
>> 
>> What version of filefrag are you using?
> Mine comes from E2fsprogs version 1.41.12 shipped with ubuntu8.0.4.
> I updated the filefrag(8) to the upstream one but still no luck. :(
> the kernel I have tried are 2.6.28/2.6.33-RC3/2.6.36.
> 
Hi Jim,

The issue I observed before is weird.

Now `make check' passes with both of the following combinations:
1. A freshly installed Ubuntu10.0.4 host, with filefrag from
E2fsprogs 1.41.11 and kernel 2.6.32-16.
2. filefrag from e2fsprogs-1.4.12 and kernel-2.6.36.

$ sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes
[sudo] password for jeff: 
make  check-TESTS
make[1]: Entering directory `/home/jeff/opensoure_dev/coreutils/tests'
make[2]: Entering directory `/home/jeff/opensoure_dev/coreutils/tests'
PASS: cp/sparse-fiemap
=============
1 test passed
=============
make[2]: Leaving directory `/home/jeff/opensoure_dev/coreutils/tests'
make[1]: Leaving directory `/home/jeff/opensoure_dev/coreutils/tests'


Thanks,
-Jeff


>> Mine comes from e2fsprogs-1.41.12-6.fc14.x86_64
>> 
>>> btw, I just checked out the new branch and tried to compile it but ran
>>> into an error:
>>> date.c:30:28: error: parse-datetime.h: No such file or directory
>>> date.c: In function 'batch_convert':
>>> date.c:284: warning: implicit declaration of function 'parse_datetime'
>>> 
>>> I guess 'parse-datetime.h' is shipped with gnulib? For now, I can not
>>> pull the latest gnulib code
>>> since the remote host does not response.
>> 
>> Did you run ./bootstrap ?
> sure, yesterday it failed due to my proxy issue.
> 
> 
> Thanks,
> -Jeff
>> That is a requirement whenever the coreutils
>> pulls in a change to the gnulib submodule.
>> 
>>> For a quick reply, I ran 'make check' against the previous code base
>>> before your latest commit.
>>> 
>>> sudo make check TESTS=cp/sparse-fiemap VERBOSE=yes
>> ...
>>> + filefrag -v j2
>> ...
>>> + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
>>> @a and @b have different lengths, even after adjustment
>>> + fail=1
>> 
>> 
>> 
> 
> 
> 
> 



Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Fri, 28 Jan 2011 18:10:03 GMT) Full text and rfc822 format available.

Message #332 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Jeff liu <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Fri, 28 Jan 2011 19:05:52 +0100
Jeff liu wrote:
> Now  make check passed against the following combination:
> 1. Refresh installed host in Ubuntu10.0.4,
> filefrag comes from E2fsprogs 1.41.11 && Kernel: 2.6.32-16
> 2. filefrag in e2fsprogs-1.4.12 && kernel-2.6.36.
[passes]

Glad to hear it passes for you, now.
FYI, I have spent quite a bit of time on cp over the last
couple of days, factoring out the hole-inducing code and
making extent_copy use it.  Part of the motivation was
to fix cp --sparse=always, which was broken on the branch.
It would not induce holes when going through extent_copy.
I've added a couple more tests and will post the series as
soon as I've cleaned things up a little more.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 29 Jan 2011 09:40:02 GMT) Full text and rfc822 format available.

Message #335 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Jeff liu <jeff.liu <at> oracle.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Pádraig Brady <P <at> draigBrady.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 29 Jan 2011 10:47:53 +0100
Jim Meyering wrote:
> Jeff liu wrote:
>> Now  make check passed against the following combination:
>> 1. Refresh installed host in Ubuntu10.0.4,
>> filefrag comes from E2fsprogs 1.41.11 && Kernel: 2.6.32-16
>> 2. filefrag in e2fsprogs-1.4.12 && kernel-2.6.36.
> [passes]
>
> Glad to hear it passes for you, now.
> FYI, I have spent quite a bit of time on cp over the last
> couple of days, factoring out the hole-inducing code and
> making extent_copy use it.  Part of the motivation was
> to fix cp --sparse=always, which was broken on the branch.
> It would not induce holes when going through extent_copy.
> I've added a couple more tests and will post the series as
> soon as I've cleaned things up a little more.

Here are 9 more patches, just pushed to the fiemap-copy-2 branch:

  http://git.savannah.gnu.org/cgit/coreutils.git/log/?h=fiemap-copy-2

The first and last add tests, and the others consolidate,
clean up, and fix a few bugs.

  1/9 tests: ensure that FIEMAP-enabled cp copies a sparse file efficiently
    Ensure that copying a sparse 1TiB file completes in less than 3 seconds.
    That can only succeed with FIEMAP (or --reflink=, which is off by default)

  2/9 fiemap copy: rename some locals
    The _logical suffix was not useful.  Change it to _start

  3/9 fiemap copy: simplify post-loop logic; improve comments

  4/9 fiemap copy: avoid a performance hit due to very small buffer
    I didn't measure this, but once you see it, it's an obvious bug.
    Using an arbitrarily small buffer size is bound to cause trouble.

  5/9 fiemap copy: avoid leak-on-error
    Failing from within the loop, we have to free the extent buffer.

  6/9 copy: factor sparse-copying code into its own function, because
    we're going to have to use it from within extent_copy, too.
    I realized that cp --sparse=always could no longer create holes
    in the destination.  Factoring this out is the first step.

  7/9 copy: remove obsolete comment
    unrelated to the rest, but hard to pull out since it's in moved code

  8/9 copy: make extent_copy use sparse_copy, rather than its own code
    Now that sparse_copy is separate, and used by copy_reg, adapt it
    so that it can also be used by extent_copy.

  9/9 tests: cp/fiemap: exercise previously-failing parts
    This is a hole-inducing test that would have failed with previous
    fiemap-based copying code.

I may change or remove the sparse_copy_finalize function, which just calls
ftruncate, especially now that it's used from only one place.  (Initially
I was using it from each sparse_copy caller, but that didn't work out.)
I also don't particularly like the added lseek call that is performed for
each file copied.  However, keeping track of total written/offset byte
counts, and inflicting the need to do that on both callers, seems like too
much added code/complexity to justify avoiding that single lseek call.
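
To make that trade-off concrete, here is a minimal sketch of a finalize
step of the kind discussed: one lseek call asks the kernel for the
destination's current offset, and ftruncate then sets the file length to
it, so no caller has to keep its own byte counts.  The names
extend_to_current_offset and demo_trailing_hole are hypothetical and are
not taken from the patch series.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the length of the file open on FD to FD's
   current offset.  This costs one extra lseek per file, but callers
   need not track how many bytes they wrote or skipped.  */
static bool
extend_to_current_offset (int fd)
{
  assert (0 <= fd);
  off_t len = lseek (fd, 0, SEEK_CUR);
  if (len < 0)
    return false;
  return ftruncate (fd, len) == 0;
}

/* Illustration only: write 3 bytes, seek 4096 bytes forward to leave a
   trailing hole, finalize, and return the resulting size (-1 on error).  */
static off_t
demo_trailing_hole (void)
{
  char name[] = "/tmp/sparse-demo-XXXXXX";
  int fd = mkstemp (name);
  if (fd < 0)
    return -1;
  unlink (name);

  off_t size = -1;
  struct stat st;
  if (write (fd, "abc", 3) == 3
      && 0 <= lseek (fd, 4096, SEEK_CUR)
      && extend_to_current_offset (fd)
      && fstat (fd, &st) == 0)
    size = st.st_size;

  close (fd);
  return size;
}
```

Without the finalize call, the file would end at the last byte actually
written, because seeking past end-of-file does not by itself change the
file's length.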

From 8e4f0efd3ad17f1dd7a561369da22dfaf43ab3e8 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 28 Jan 2011 22:31:23 +0100
Subject: [PATCH 1/9] tests: ensure that FIEMAP-enabled cp copies a sparse file efficiently

* tests/cp/fiemap-perf: New file.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am    |    1 +
 tests/cp/fiemap-perf |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/fiemap-perf

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 847f181..7855ac5 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -320,6 +320,7 @@ TESTS =						\
   cp/dir-vs-file				\
   cp/existing-perm-race				\
   cp/fail-perm					\
+  cp/fiemap-perf                                \
   cp/file-perm-race				\
   cp/into-self					\
   cp/link					\
diff --git a/tests/cp/fiemap-perf b/tests/cp/fiemap-perf
new file mode 100755
index 0000000..429e59b
--- /dev/null
+++ b/tests/cp/fiemap-perf
@@ -0,0 +1,32 @@
+#!/bin/sh
+# ensure that a sparse file is copied efficiently, by default
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ cp
+
+# Require a fiemap-enabled FS.
+df -T -t btrfs -t xfs -t ext4 -t ocfs2 . \
+  || skip_ "this file system lacks FIEMAP support"
+
+# Create a large-but-sparse file.
+timeout 1 dd bs=1 seek=1T of=f < /dev/null || framework_failure_
+
+# Nothing can read (much less write) that many bytes in so little time.
+timeout 3 cp f f2 || framework_failure_
+
+Exit $fail
--
1.7.3.5.44.g960a
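
For reference, the "large-but-sparse file" that the test creates with dd
can be sketched in C as below.  This is only an illustration under stated
assumptions: make_hole_file and file_sizes are hypothetical names, and the
conversion of st_blocks to bytes assumes 512-byte units, which holds on
Linux but is not guaranteed elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) PATH so that it consists of
   a single SIZE-byte hole.  Return 0 on success, -1 on error.  */
static int
make_hole_file (char const *path, off_t size)
{
  assert (0 <= size);
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd < 0)
    return -1;
  int r = ftruncate (fd, size);
  if (close (fd) != 0)
    r = -1;
  return r;
}

/* Hypothetical helper: report the apparent size of PATH and the number
   of bytes allocated for it (assuming 512-byte st_blocks units).  */
static int
file_sizes (char const *path, off_t *apparent, off_t *allocated)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return -1;
  *apparent = st.st_size;
  *allocated = (off_t) st.st_blocks * 512;
  return 0;
}
```

On a file system that supports holes, the allocated figure for such a file
is expected to be far smaller than the apparent size, which is what lets an
extent-aware copy finish quickly.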


From dd380c3d672f78adb4cb907e8658db6b3962a281 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 18:28:25 +0100
Subject: [PATCH 2/9] fiemap copy: rename some locals

(extent_copy): Rename locals: s/*ext_logical/*ext_start/
---
 src/copy.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index c9cc2f7..e164ab7 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -200,7 +200,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
              bool *require_normal_copy)
 {
   struct extent_scan scan;
-  off_t last_ext_logical = 0;
+  off_t last_ext_start = 0;
   uint64_t last_ext_len = 0;
   uint64_t last_read_size = 0;

@@ -228,10 +228,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
       unsigned int i;
       for (i = 0; i < scan.ei_count; i++)
         {
-          off_t ext_logical = scan.ext_info[i].ext_logical;
+          off_t ext_start = scan.ext_info[i].ext_logical;
           uint64_t ext_len = scan.ext_info[i].ext_length;

-          if (lseek (src_fd, ext_logical, SEEK_SET) < 0)
+          if (lseek (src_fd, ext_start, SEEK_SET) < 0)
             {
               error (0, errno, _("cannot lseek %s"), quote (src_name));
               return false;
@@ -239,7 +239,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,

           if (make_holes)
             {
-              if (lseek (dest_fd, ext_logical, SEEK_SET) < 0)
+              if (lseek (dest_fd, ext_start, SEEK_SET) < 0)
                 {
                   error (0, errno, _("cannot lseek %s"), quote (dst_name));
                   return false;
@@ -249,10 +249,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
             {
               /* We're not inducing holes; write zeros to the destination file
                  if there is a hole between the last and current extent.  */
-              if (last_ext_logical + last_ext_len < ext_logical)
+              if (last_ext_start + last_ext_len < ext_start)
                 {
-                  uint64_t hole_size = (ext_logical
-                                        - last_ext_logical
+                  uint64_t hole_size = (ext_start
+                                        - last_ext_start
                                         - last_ext_len);
                   if (! write_zeros (dest_fd, hole_size))
                     {
@@ -262,7 +262,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
                 }
             }

-          last_ext_logical = ext_logical;
+          last_ext_start = ext_start;
           last_ext_len = ext_len;
           last_read_size = 0;

@@ -313,7 +313,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
      file.  */
   if (make_holes)
     {
-      if (last_ext_logical + last_read_size < src_total_size)
+      if (last_ext_start + last_read_size < src_total_size)
         {
           if (ftruncate (dest_fd, src_total_size) < 0)
             {
@@ -324,9 +324,9 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
     }
   else
     {
-      if (last_ext_logical + last_ext_len < src_total_size)
+      if (last_ext_start + last_ext_len < src_total_size)
         {
-          uint64_t holes_len = src_total_size - last_ext_logical - last_ext_len;
+          uint64_t holes_len = src_total_size - last_ext_start - last_ext_len;
           if (0 < holes_len)
             {
               if (! write_zeros (dest_fd, holes_len))
--
1.7.3.5.44.g960a


From d1067e37b0e4b945ab901e98d6eedb249fa2a42c Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 19:00:48 +0100
Subject: [PATCH 3/9] fiemap copy: simplify post-loop logic; improve comments

* src/copy.c (extent_copy): Avoid duplication in post-loop
extend-to-desired-length code.
---
 src/copy.c |   44 +++++++++++++++-----------------------------
 1 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index e164ab7..ab18a76 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -268,8 +268,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,

           while (ext_len)
             {
-              /* Avoid reading into the holes if the left extent
-                 length is shorter than the buffer size.  */
+              /* Don't read from a following hole if EXT_LEN
+                 is smaller than the buffer size.  */
               buf_size = MIN (ext_len, buf_size);

               ssize_t n_read = read (src_fd, buf, buf_size);
@@ -285,7 +285,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,

               if (n_read == 0)
                 {
-                  /* Record number of bytes read from the previous extent.  */
+                  /* Record number of bytes read from this extent-at-EOF.  */
                   last_read_size = last_ext_len - ext_len;
                   break;
                 }
@@ -306,33 +306,19 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
     }
   while (! scan.hit_final_extent);

-  /* If a file ends up with holes, the sum of the last extent logical offset
-     and the read-returned size or the last extent length will be shorter than
-     the actual size of the file.  Use ftruncate to extend the length of the
-     destination file if make_holes, or write zeros up to the actual size of the
-     file.  */
-  if (make_holes)
-    {
-      if (last_ext_start + last_read_size < src_total_size)
-        {
-          if (ftruncate (dest_fd, src_total_size) < 0)
-            {
-              error (0, errno, _("failed to extend %s"), quote (dst_name));
-              return false;
-            }
-        }
-    }
-  else
-    {
-      if (last_ext_start + last_ext_len < src_total_size)
-        {
-          uint64_t holes_len = src_total_size - last_ext_start - last_ext_len;
-          if (0 < holes_len)
-            {
-              if (! write_zeros (dest_fd, holes_len))
-                return false;
-            }
-        }
+  /* When the source file ends with a hole, the sum of the last extent start
+     offset and (the read-returned size or the last extent length) is smaller
+     than the actual size of the file.  In that case, extend the destination
+     file to the required length.  When MAKE_HOLES is set, use ftruncate;
+     otherwise, use write_zeros.  */
+  uint64_t eof_hole_len = (src_total_size - last_ext_start
+                           - (last_read_size ? last_read_size : last_ext_len));
+  if (eof_hole_len && (make_holes
+                       ? ftruncate (dest_fd, src_total_size)
+                       : ! write_zeros (dest_fd, eof_hole_len)))
+    {
+      error (0, errno, _("failed to extend %s"), quote (dst_name));
+      return false;
     }

   return true;
--
1.7.3.5.44.g960a
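
As a worked illustration of the consolidated post-loop arithmetic in this
patch, the stand-alone function below computes the length of a trailing
hole from the same four quantities.  It is a sketch, not the code in
copy.c: the function name is hypothetical, and all arguments are taken as
uint64_t here for simplicity, whereas the patch keeps last_ext_start as
an off_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone version of the post-loop computation:
   return the number of bytes between the end of the copied data and
   the end of the source file.  When LAST_READ_SIZE is nonzero (the
   read loop hit EOF inside the last extent), it is used in place of
   LAST_EXT_LEN.  */
static uint64_t
trailing_hole_len (uint64_t src_total_size, uint64_t last_ext_start,
                   uint64_t last_ext_len, uint64_t last_read_size)
{
  uint64_t copied_end
    = last_ext_start + (last_read_size ? last_read_size : last_ext_len);
  assert (copied_end <= src_total_size);
  return src_total_size - copied_end;
}
```

For example, with a 10000-byte source whose last extent starts at offset
4096 and is 4096 bytes long, the trailing hole is 10000 - 8192 = 1808
bytes.  A result of zero means the destination needs no extending.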


From 33f4a4a549afb3de94e546091c91586a1ece67ba Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 17:30:08 +0100
Subject: [PATCH 4/9] fiemap copy: avoid a performance hit due to very small buffer

* src/copy.c (extent_copy): Don't let what should have been a
temporary reduction of buf_size (to handle a short ext_len) become
permanent and thus impact the performance of all further iterations.
---
 src/copy.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index ab18a76..9a3a8f7 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -270,9 +270,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
             {
               /* Don't read from a following hole if EXT_LEN
                  is smaller than the buffer size.  */
-              buf_size = MIN (ext_len, buf_size);
-
-              ssize_t n_read = read (src_fd, buf, buf_size);
+              size_t b_size = MIN (ext_len, buf_size);
+              ssize_t n_read = read (src_fd, buf, b_size);
               if (n_read < 0)
                 {
 #ifdef EINTR
--
1.7.3.5.44.g960a
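
The point of this fix can be shown with a simplified, self-contained read
loop.  The sketch below is not extent_copy itself: copy_extent is a
hypothetical helper, and it uses a plain write loop where copy.c uses
full_write.  What it illustrates is clamping each read request with a
local variable, so that the caller's buffer size is never reduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: copy up to EXT_LEN bytes from SRC_FD to DEST_FD
   through the BUF_SIZE-byte buffer BUF, stopping early at EOF.
   Return the number of bytes copied, or -1 on error.  */
static int64_t
copy_extent (int src_fd, int dest_fd, char *buf, size_t buf_size,
             uint64_t ext_len)
{
  assert (0 < buf_size);
  int64_t total = 0;

  while (ext_len)
    {
      /* Clamp this request only.  BUF_SIZE itself stays unchanged, so a
         short extent cannot shrink the buffer for later reads.  */
      size_t b_size = MIN (ext_len, buf_size);
      ssize_t n_read = read (src_fd, buf, b_size);
      if (n_read < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      if (n_read == 0)
        break;

      /* Write everything that was read, retrying on partial writes.  */
      for (ssize_t off = 0; off < n_read; )
        {
          ssize_t n = write (dest_fd, buf + off, n_read - off);
          if (n < 0)
            {
              if (errno == EINTR)
                continue;
              return -1;
            }
          off += n;
        }

      ext_len -= n_read;
      total += n_read;
    }

  return total;
}
```

Had the loop assigned the clamped value back to buf_size, a single
10-byte extent would have left every later extent being copied 10 bytes
at a time.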


From 47c8476ec9629239c82caf50b1c68b7bc58ba2d6 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 17:49:04 +0100
Subject: [PATCH 5/9] fiemap copy: avoid leak-on-error

* src/copy.c (extent_copy): Don't leak an extent_scan buffer on
failed lseek, read, or write.
---
 src/copy.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 9a3a8f7..208e463 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -234,6 +234,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
           if (lseek (src_fd, ext_start, SEEK_SET) < 0)
             {
               error (0, errno, _("cannot lseek %s"), quote (src_name));
+            fail:
+              extent_scan_free (&scan);
               return false;
             }

@@ -242,7 +244,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
               if (lseek (dest_fd, ext_start, SEEK_SET) < 0)
                 {
                   error (0, errno, _("cannot lseek %s"), quote (dst_name));
-                  return false;
+                  goto fail;
                 }
             }
           else
@@ -257,7 +259,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
                   if (! write_zeros (dest_fd, hole_size))
                     {
                       error (0, errno, _("%s: write failed"), quote (dst_name));
-                      return false;
+                      goto fail;
                     }
                 }
             }
@@ -279,7 +281,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
                     continue;
 #endif
                   error (0, errno, _("reading %s"), quote (src_name));
-                    return false;
+                  goto fail;
                 }

               if (n_read == 0)
@@ -292,7 +294,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
               if (full_write (dest_fd, buf, n_read) != n_read)
                 {
                   error (0, errno, _("writing %s"), quote (dst_name));
-                  return false;
+                  goto fail;
                 }

               ext_len -= n_read;
--
1.7.3.5.44.g960a


From c0b7bc3864c06ea12c2740056e28623449fb63a7 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 20:57:17 +0100
Subject: [PATCH 6/9] copy: factor sparse-copying code into its own function, because

we're going to have to use it from within extent_copy, too.
* src/copy.c (sparse_copy): New function, factored out of...
(copy_reg): ...here.
Remove now-unused locals.
---
 src/copy.c |  212 ++++++++++++++++++++++++++++++++----------------------------
 1 files changed, 114 insertions(+), 98 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 208e463..cc8f68f 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -134,6 +134,116 @@ utimens_symlink (char const *file, struct timespec const *timespec)
   return err;
 }

+/* Copy the regular file open on SRC_FD/SRC_NAME to DST_FD/DST_NAME,
+   honoring the MAKE_HOLES setting and using the BUF_SIZE-byte buffer
+   BUF for temporary storage.  Return true upon successful completion;
+   print a diagnostic and return false upon error.  */
+static bool
+sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
+             bool make_holes,
+             char const *src_name, char const *dst_name)
+{
+  typedef uintptr_t word;
+  off_t n_read_total = 0;
+  bool last_write_made_hole = false;
+
+  while (true)
+    {
+      word *wp = NULL;
+
+      ssize_t n_read = read (src_fd, buf, buf_size);
+      if (n_read < 0)
+        {
+#ifdef EINTR
+          if (errno == EINTR)
+            continue;
+#endif
+          error (0, errno, _("reading %s"), quote (src_name));
+          return false;
+        }
+      if (n_read == 0)
+        break;
+
+      n_read_total += n_read;
+
+      if (make_holes)
+        {
+          char *cp;
+
+          /* Sentinel to stop loop.  */
+          buf[n_read] = '\1';
+#ifdef lint
+          /* Usually, buf[n_read] is not the byte just before a "word"
+             (aka uintptr_t) boundary.  In that case, the word-oriented
+             test below (*wp++ == 0) would read some uninitialized bytes
+             after the sentinel.  To avoid false-positive reports about
+             this condition (e.g., from a tool like valgrind), set the
+             remaining bytes -- to any value.  */
+          memset (buf + n_read + 1, 0, sizeof (word) - 1);
+#endif
+
+          /* Find first nonzero *word*, or the word with the sentinel.  */
+
+          wp = (word *) buf;
+          while (*wp++ == 0)
+            continue;
+
+          /* Find the first nonzero *byte*, or the sentinel.  */
+
+          cp = (char *) (wp - 1);
+          while (*cp++ == 0)
+            continue;
+
+          if (cp <= buf + n_read)
+            /* Clear to indicate that a normal write is needed. */
+            wp = NULL;
+          else
+            {
+              /* We found the sentinel, so the whole input block was zero.
+                 Make a hole.  */
+              if (lseek (dest_fd, n_read, SEEK_CUR) < 0)
+                {
+                  error (0, errno, _("cannot lseek %s"), quote (dst_name));
+                  return false;
+                }
+              last_write_made_hole = true;
+            }
+        }
+
+      if (!wp)
+        {
+          size_t n = n_read;
+          if (full_write (dest_fd, buf, n) != n)
+            {
+              error (0, errno, _("writing %s"), quote (dst_name));
+              return false;
+            }
+          last_write_made_hole = false;
+
+          /* It is tempting to return early here upon a short read from a
+             regular file.  That would save the final read syscall for each
+             file.  Unfortunately that doesn't work for certain files in
+             /proc with linux kernels from at least 2.6.9 .. 2.6.29.  */
+        }
+    }
+
+  /* If the file ends with a `hole', we need to do something to record
+     the length of the file.  On modern systems, calling ftruncate does
+     the job.  On systems without native ftruncate support, we have to
+     write a byte at the ending position.  Otherwise the kernel would
+     truncate the file at the end of the last write operation.  */
+  if (last_write_made_hole)
+    {
+      if (ftruncate (dest_fd, n_read_total) < 0)
+        {
+          error (0, errno, _("truncating %s"), quote (dst_name));
+          return false;
+        }
+    }
+
+  return true;
+}
+
 /* Perform the O(1) btrfs clone operation, if possible.
    Upon success, return 0.  Otherwise, return -1 and set errno.  */
 static inline int
@@ -824,7 +934,6 @@ copy_reg (char const *src_name, char const *dst_name,
   if (data_copy_required)
     {
       typedef uintptr_t word;
-      off_t n_read_total = 0;

       /* Choose a suitable buffer size; it may be adjusted later.  */
       size_t buf_alignment = lcm (getpagesize (), sizeof (word));
@@ -832,7 +941,6 @@ copy_reg (char const *src_name, char const *dst_name,
       size_t buf_size = io_blksize (sb);

       /* Deal with sparse files.  */
-      bool last_write_made_hole = false;
       bool make_holes = false;

       if (S_ISREG (sb.st_mode))
@@ -897,103 +1005,11 @@ copy_reg (char const *src_name, char const *dst_name,
           goto close_src_and_dst_desc;
         }

-      while (true)
+      if ( ! sparse_copy (source_desc, dest_desc, buf, buf_size,
+                          make_holes, src_name, dst_name))
         {
-          word *wp = NULL;
-
-          ssize_t n_read = read (source_desc, buf, buf_size);
-          if (n_read < 0)
-            {
-#ifdef EINTR
-              if (errno == EINTR)
-                continue;
-#endif
-              error (0, errno, _("reading %s"), quote (src_name));
-              return_val = false;
-              goto close_src_and_dst_desc;
-            }
-          if (n_read == 0)
-            break;
-
-          n_read_total += n_read;
-
-          if (make_holes)
-            {
-              char *cp;
-
-              /* Sentinel to stop loop.  */
-              buf[n_read] = '\1';
-#ifdef lint
-              /* Usually, buf[n_read] is not the byte just before a "word"
-                 (aka uintptr_t) boundary.  In that case, the word-oriented
-                 test below (*wp++ == 0) would read some uninitialized bytes
-                 after the sentinel.  To avoid false-positive reports about
-                 this condition (e.g., from a tool like valgrind), set the
-                 remaining bytes -- to any value.  */
-              memset (buf + n_read + 1, 0, sizeof (word) - 1);
-#endif
-
-              /* Find first nonzero *word*, or the word with the sentinel.  */
-
-              wp = (word *) buf;
-              while (*wp++ == 0)
-                continue;
-
-              /* Find the first nonzero *byte*, or the sentinel.  */
-
-              cp = (char *) (wp - 1);
-              while (*cp++ == 0)
-                continue;
-
-              if (cp <= buf + n_read)
-                /* Clear to indicate that a normal write is needed. */
-                wp = NULL;
-              else
-                {
-                  /* We found the sentinel, so the whole input block was zero.
-                     Make a hole.  */
-                  if (lseek (dest_desc, n_read, SEEK_CUR) < 0)
-                    {
-                      error (0, errno, _("cannot lseek %s"), quote (dst_name));
-                      return_val = false;
-                      goto close_src_and_dst_desc;
-                    }
-                  last_write_made_hole = true;
-                }
-            }
-
-          if (!wp)
-            {
-              size_t n = n_read;
-              if (full_write (dest_desc, buf, n) != n)
-                {
-                  error (0, errno, _("writing %s"), quote (dst_name));
-                  return_val = false;
-                  goto close_src_and_dst_desc;
-                }
-              last_write_made_hole = false;
-
-              /* It is tempting to return early here upon a short read from a
-                 regular file.  That would save the final read syscall for each
-                 file.  Unfortunately that doesn't work for certain files in
-                 /proc with linux kernels from at least 2.6.9 .. 2.6.29.  */
-            }
-        }
-
-      /* If the file ends with a `hole', we need to do something to record
-         the length of the file.  On modern systems, calling ftruncate does
-         the job.  On systems without native ftruncate support, we have to
-         write a byte at the ending position.  Otherwise the kernel would
-         truncate the file at the end of the last write operation.  */
-
-      if (last_write_made_hole)
-        {
-          if (ftruncate (dest_desc, n_read_total) < 0)
-            {
-              error (0, errno, _("truncating %s"), quote (dst_name));
-              return_val = false;
-              goto close_src_and_dst_desc;
-            }
+          return_val = false;
+          goto close_src_and_dst_desc;
         }
     }

--
1.7.3.5.44.g960a


From 80038c3cba2dee9c6c41ab6a28a1233a538ee2ee Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 21:01:07 +0100
Subject: [PATCH 7/9] copy: remove obsolete comment

* src/copy.c (sparse_copy): Remove now-obsolete comment about
how we used to work around lack of ftruncate.  Combine nested
if conditions into one.
---
 src/copy.c |   21 +++++++++------------
 1 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index cc8f68f..4bfdce6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -137,7 +137,10 @@ utimens_symlink (char const *file, struct timespec const *timespec)
 /* Copy the regular file open on SRC_FD/SRC_NAME to DST_FD/DST_NAME,
    honoring the MAKE_HOLES setting and using the BUF_SIZE-byte buffer
    BUF for temporary storage.  Return true upon successful completion;
-   print a diagnostic and return false upon error.  */
+   print a diagnostic and return false upon error.
+   Note that for best results, BUF should be "well"-aligned.
+   BUF must have sizeof(uintptr_t)-1 bytes of additional space
+   beyond BUF[BUF_SIZE-1].  */
 static bool
 sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
              bool make_holes,
@@ -227,18 +230,12 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
         }
     }

-  /* If the file ends with a `hole', we need to do something to record
-     the length of the file.  On modern systems, calling ftruncate does
-     the job.  On systems without native ftruncate support, we have to
-     write a byte at the ending position.  Otherwise the kernel would
-     truncate the file at the end of the last write operation.  */
-  if (last_write_made_hole)
+  /* If the file ends with a `hole', we need to do something to record the
+     length of the file.  On modern systems, calling ftruncate does the job.  */
+  if (last_write_made_hole && ftruncate (dest_fd, n_read_total) < 0)
     {
-      if (ftruncate (dest_fd, n_read_total) < 0)
-        {
-          error (0, errno, _("truncating %s"), quote (dst_name));
-          return false;
-        }
+      error (0, errno, _("truncating %s"), quote (dst_name));
+      return false;
     }

   return true;
--
1.7.3.5.44.g960a
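
The comment kept by this patch depends on how lseek and ftruncate interact,
which is easy to check in isolation.  What follows is only a minimal sketch,
assuming a POSIX system.  The helper names file_size and end_with_hole are
hypothetical and do not appear in copy.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size of the file open on FD, or -1.  */
off_t
file_size (int fd)
{
  struct stat st;
  return fstat (fd, &st) < 0 ? -1 : st.st_size;
}

/* Hypothetical helper, not from copy.c: skip HOLE_LEN bytes at the current
   offset of FD instead of writing zeros, then record the new length.
   Return the resulting file length, or -1 on error.  */
off_t
end_with_hole (int fd, off_t hole_len)
{
  assert (0 < hole_len);

  /* Seeking past EOF moves the offset but does not change the size.  */
  off_t end = lseek (fd, hole_len, SEEK_CUR);
  if (end < 0)
    {
      perror ("lseek");
      return -1;
    }

  /* Without this call the file would still end at the last byte written.  */
  if (ftruncate (fd, end) < 0)
    {
      perror ("ftruncate");
      return -1;
    }

  return end;
}
```

If the ftruncate call were skipped, a destination whose final operation was
a seek would come out shorter than the source.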


From f161ba3fcd5832d1344224ec41627cace5d73544 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Fri, 28 Jan 2011 21:19:50 +0100
Subject: [PATCH 8/9] copy: make extent_copy use sparse_copy, rather than its own code

* src/copy.c (extent_copy): Before this change, extent_copy would fail
to create holes, thus breaking --sparse=auto and --sparse=always.
I.e., copying a large enough file of all zeros, cp --sparse=always
should introduce a hole, but with extent_copy, it would not.
---
 src/copy.c |  109 +++++++++++++++++++++++++++---------------------------------
 1 files changed, 49 insertions(+), 60 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 4bfdce6..96bb35b 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -136,25 +136,28 @@ utimens_symlink (char const *file, struct timespec const *timespec)

 /* Copy the regular file open on SRC_FD/SRC_NAME to DST_FD/DST_NAME,
    honoring the MAKE_HOLES setting and using the BUF_SIZE-byte buffer
-   BUF for temporary storage.  Return true upon successful completion;
+   BUF for temporary storage.  Copy no more than MAX_N_READ bytes.
+   Return true upon successful completion;
    print a diagnostic and return false upon error.
    Note that for best results, BUF should be "well"-aligned.
    BUF must have sizeof(uintptr_t)-1 bytes of additional space
-   beyond BUF[BUF_SIZE-1].  */
+   beyond BUF[BUF_SIZE-1].
+   Set *LAST_WRITE_MADE_HOLE to true if the final operation on
+   DEST_FD introduced a hole.  */
 static bool
 sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
              bool make_holes,
-             char const *src_name, char const *dst_name)
+             char const *src_name, char const *dst_name,
+             uintmax_t max_n_read, bool *last_write_made_hole)
 {
   typedef uintptr_t word;
-  off_t n_read_total = 0;
-  bool last_write_made_hole = false;
+  *last_write_made_hole = false;

-  while (true)
+  while (max_n_read)
     {
       word *wp = NULL;

-      ssize_t n_read = read (src_fd, buf, buf_size);
+      ssize_t n_read = read (src_fd, buf, MIN (max_n_read, buf_size));
       if (n_read < 0)
         {
 #ifdef EINTR
@@ -166,8 +169,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
         }
       if (n_read == 0)
         break;
-
-      n_read_total += n_read;
+      max_n_read -= n_read;

       if (make_holes)
         {
@@ -209,7 +211,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
                   error (0, errno, _("cannot lseek %s"), quote (dst_name));
                   return false;
                 }
-              last_write_made_hole = true;
+              *last_write_made_hole = true;
             }
         }

@@ -221,7 +223,7 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
               error (0, errno, _("writing %s"), quote (dst_name));
               return false;
             }
-          last_write_made_hole = false;
+          *last_write_made_hole = false;

           /* It is tempting to return early here upon a short read from a
              regular file.  That would save the final read syscall for each
@@ -230,9 +232,16 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
         }
     }

-  /* If the file ends with a `hole', we need to do something to record the
-     length of the file.  On modern systems, calling ftruncate does the job.  */
-  if (last_write_made_hole && ftruncate (dest_fd, n_read_total) < 0)
+  return true;
+}
+
+/* If the file ends with a `hole' (i.e., if sparse_copy set wrote_hole_at_eof),
+   call this function to record the length of the output file.  */
+static bool
+sparse_copy_finalize (int dest_fd, char const *dst_name)
+{
+  off_t len = lseek (dest_fd, 0, SEEK_CUR);
+  if (0 <= len && ftruncate (dest_fd, len) < 0)
     {
       error (0, errno, _("truncating %s"), quote (dst_name));
       return false;
@@ -309,10 +318,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
   struct extent_scan scan;
   off_t last_ext_start = 0;
   uint64_t last_ext_len = 0;
-  uint64_t last_read_size = 0;

   extent_scan_init (src_fd, &scan);

+  bool wrote_hole_at_eof = true;
   do
     {
       bool ok = extent_scan_read (&scan);
@@ -356,8 +365,9 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
             }
           else
             {
-              /* We're not inducing holes; write zeros to the destination file
-                 if there is a hole between the last and current extent.  */
+              /* When not inducing holes and when there is a hole between
+                 the end of the previous extent and the beginning of the
+                 current one, write zeros to the destination file.  */
               if (last_ext_start + last_ext_len < ext_start)
                 {
                   uint64_t hole_size = (ext_start
@@ -373,39 +383,11 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,

           last_ext_start = ext_start;
           last_ext_len = ext_len;
-          last_read_size = 0;
-
-          while (ext_len)
-            {
-              /* Don't read from a following hole if EXT_LEN
-                 is smaller than the buffer size.  */
-              size_t b_size = MIN (ext_len, buf_size);
-              ssize_t n_read = read (src_fd, buf, b_size);
-              if (n_read < 0)
-                {
-#ifdef EINTR
-                  if (errno == EINTR)
-                    continue;
-#endif
-                  error (0, errno, _("reading %s"), quote (src_name));
-                  goto fail;
-                }
-
-              if (n_read == 0)
-                {
-                  /* Record number of bytes read from this extent-at-EOF.  */
-                  last_read_size = last_ext_len - ext_len;
-                  break;
-                }
-
-              if (full_write (dest_fd, buf, n_read) != n_read)
-                {
-                  error (0, errno, _("writing %s"), quote (dst_name));
-                  goto fail;
-                }

-              ext_len -= n_read;
-            }
+          if ( ! sparse_copy (src_fd, dest_fd, buf, buf_size,
+                              make_holes, src_name, dst_name, ext_len,
+                              &wrote_hole_at_eof))
+            return false;
         }

       /* Release the space allocated to scan->ext_info.  */
@@ -414,16 +396,19 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
     }
   while (! scan.hit_final_extent);

-  /* When the source file ends with a hole, the sum of the last extent start
-     offset and (the read-returned size or the last extent length) is smaller
-     than the actual size of the file.  In that case, extend the destination
-     file to the required length.  When MAKE_HOLES is set, use ftruncate;
-     otherwise, use write_zeros.  */
-  uint64_t eof_hole_len = (src_total_size - last_ext_start
-                           - (last_read_size ? last_read_size : last_ext_len));
-  if (eof_hole_len && (make_holes
-                       ? ftruncate (dest_fd, src_total_size)
-                       : ! write_zeros (dest_fd, eof_hole_len)))
+  /* When the source file ends with a hole, we have to do a little more work,
+     since the above copied only up to and including the final extent.
+     In order to complete the copy, we may have to insert a hole or write
+     zeros in the destination corresponding to the source file's hole-at-EOF.
+
+     In addition, if the final extent was a block of zeros at EOF and we've
+     just converted them to a hole in the destination, we must call ftruncate
+     here in order to record the proper length in the destination.  */
+  off_t dest_len = lseek (dest_fd, 0, SEEK_CUR);
+  if ((dest_len < src_total_size || wrote_hole_at_eof)
+      && (make_holes
+          ? ftruncate (dest_fd, src_total_size)
+          : ! write_zeros (dest_fd, src_total_size - dest_len)))
     {
       error (0, errno, _("failed to extend %s"), quote (dst_name));
       return false;
@@ -1002,8 +987,12 @@ copy_reg (char const *src_name, char const *dst_name,
           goto close_src_and_dst_desc;
         }

+      bool wrote_hole_at_eof;
       if ( ! sparse_copy (source_desc, dest_desc, buf, buf_size,
-                          make_holes, src_name, dst_name))
+                          make_holes, src_name, dst_name, UINTMAX_MAX,
+                          &wrote_hole_at_eof)
+           || (wrote_hole_at_eof &&
+               ! sparse_copy_finalize (dest_desc, dst_name)))
         {
           return_val = false;
           goto close_src_and_dst_desc;
--
1.7.3.5.44.g960a
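
To make the new sparse_copy contract easier to follow, here is a much
simplified sketch of a bounded, hole-inducing copy loop.  It is not the code
from this patch.  The names BLOCK, is_all_zero and bounded_sparse_copy are
hypothetical, the buffer size is an arbitrary assumption, and a plain byte
loop stands in for the word-at-a-time scan with a sentinel that copy.c uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

enum { BLOCK = 4096 };   /* illustrative buffer size, an assumption */

/* Return true if the N bytes at BUF are all zero.  */
bool
is_all_zero (char const *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (buf[i])
      return false;
  return true;
}

/* Copy at most MAX_BYTES from SRC_FD to DST_FD, seeking over all-zero
   blocks instead of writing them.  Set *LAST_MADE_HOLE to true if the
   final operation on DST_FD was such a seek.  Return false on error.  */
bool
bounded_sparse_copy (int src_fd, int dst_fd, uintmax_t max_bytes,
                     bool *last_made_hole)
{
  char buf[BLOCK];
  *last_made_hole = false;

  while (max_bytes)
    {
      /* Never read beyond the caller's limit, e.g. the extent length.  */
      size_t want = max_bytes < BLOCK ? max_bytes : BLOCK;
      ssize_t n = read (src_fd, buf, want);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return false;
        }
      if (n == 0)
        break;
      max_bytes -= n;

      if (is_all_zero (buf, n))
        {
          if (lseek (dst_fd, n, SEEK_CUR) < 0)
            return false;
          *last_made_hole = true;
        }
      else
        {
          /* write may be short, so loop until the block is written.  */
          for (ssize_t done = 0; done < n; )
            {
              ssize_t w = write (dst_fd, buf + done, n - done);
              if (w < 0)
                {
                  if (errno == EINTR)
                    continue;
                  return false;
                }
              done += w;
            }
          *last_made_hole = false;
        }
    }

  return true;
}
```

A caller copying a whole file would pass UINTMAX_MAX as the limit and, if
the flag comes back true, call ftruncate to record the final length.  A
caller copying one extent would pass the extent length instead.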


From 7f154dcfc5641c9616921d4c5ac5005bcb2507eb Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 27 Jan 2011 15:17:42 +0100
Subject: [PATCH 9/9] tests: cp/fiemap: exercise previously-failing parts

* tests/cp/fiemap-2: New test.
* tests/Makefile.am (TESTS): Add it.
---
 tests/Makefile.am |    1 +
 tests/cp/fiemap-2 |   54 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/fiemap-2

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7855ac5..40d35ac 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -321,6 +321,7 @@ TESTS =						\
   cp/existing-perm-race				\
   cp/fail-perm					\
   cp/fiemap-perf                                \
+  cp/fiemap-2                                   \
   cp/file-perm-race				\
   cp/into-self					\
   cp/link					\
diff --git a/tests/cp/fiemap-2 b/tests/cp/fiemap-2
new file mode 100755
index 0000000..d40505b
--- /dev/null
+++ b/tests/cp/fiemap-2
@@ -0,0 +1,54 @@
+#!/bin/sh
+# Exercise a few more corners of the fiemap-copying code.
+
+# Copyright (C) 2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ cp
+
+# Require a fiemap-enabled FS.
+df -T -t btrfs -t xfs -t ext4 -t ocfs2 . \
+  || skip_ "this file system lacks FIEMAP support"
+
+# Exercise the code that handles a file ending in a hole.
+printf x > k || framework_failure_
+dd bs=1k seek=128 of=k < /dev/null || framework_failure_
+
+# The first time through the outer loop, the input file, K, ends with a hole.
+# The second time through, we append a byte so that it does not.
+for append in no yes; do
+  test $append = yes && printf y >> k
+  for i in always never; do
+    cp --sparse=$i k k2 || fail=1
+    cmp k k2 || fail=1
+  done
+done
+
+# Ensure that --sparse=always can restore holes.
+rm -f k
+# Create a file starting with an "x", followed by 257K-1 0 bytes.
+printf x > k || framework_failure_
+dd bs=1k seek=1 of=k count=255 < /dev/zero || framework_failure_
+
+# cp should detect the all-zero blocks and convert some of them to holes.
+# How many it detects/converts currently depends on io_blksize.
+# Currently, on my F14/ext4 desktop, this K starts off with size 256KiB,
+# (note that the K in the preceding test starts off with size 4KiB).
+# cp from coreutils-8.9 with --sparse=always reduces the size to 32KiB.
+cp --sparse=always k k2 || fail=1
+test $(stat -c %b k2) -lt $(stat -c %b k) || fail=1
+
+Exit $fail
--
1.7.3.5.44.g960a




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 29 Jan 2011 12:02:02 GMT) Full text and rfc822 format available.

Message #338 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-coreutils <at> gnu.org,
	Jeff liu <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 29 Jan 2011 11:57:11 +0000
On 29/01/11 09:47, Jim Meyering wrote:
> Jim Meyering wrote:
>> Jeff liu wrote:
>>> Now make check passes against the following combinations:
>>> 1. Refresh installed host in Ubuntu10.0.4,
>>> filefrag comes from E2fsprogs 1.41.11 && Kernel: 2.6.32-16
>>> 2. filefrag in e2fsprogs-1.4.12 && kernel-2.6.36.
>> [passes]
>>
>> Glad to hear it passes for you, now.
>> FYI, I have spent quite a bit of time on cp over the last
>> couple of days, factoring out the hole-inducing code and
>> making extent_copy use it.  Part of the motivation was
>> to fix cp --sparse=always, which was broken on the branch.
>> It would not induce holes when going through extent_copy.
>> I've added a couple more tests and will post the series as
>> soon as I've cleaned things up a little more.
> 
> Here are 9 more patches, just pushed to the fiemap-copy-2 branch:
> 
>   http://git.savannah.gnu.org/cgit/coreutils.git/log/?h=fiemap-copy-2
> 
> The first and last add tests, and the others consolidate,
> clean up, and fix a few bugs.
> 
>   1/9 tests: ensure that FIEMAP-enabled cp copies a sparse file efficiently
>     Ensure that copying a sparse 1TiB file completes in less than 3 seconds
>     That can only succeed with FIEMAP (or --reflink=, which is off by default)
> 
>   2/9 fiemap copy: rename some locals
>     The _logical suffix was not useful.  Change it to _start
> 
>   3/9 fiemap copy: simplify post-loop logic; improve comments
> 
>   4/9 fiemap copy: avoid a performance hit due to very small buffer
>     I didn't measure this, but once you see it, it's an obvious bug.
>     Using an arbitrarily small buffer size is bound to cause trouble.
> 
>   5/9 fiemap copy: avoid leak-on-error
>     Failing from within the loop, we have to free the extent buffer.
> 
>   6/9 copy: factor sparse-copying code into its own function, because
>     we're going to have to use it from within extent_copy, too.
>     I realized that cp --sparse=always could no longer create holes
>     in the destination.  Factoring this out is the first step.
> 
>   7/9 copy: remove obsolete comment
>     unrelated to the rest, but hard to pull out since it's in moved code
> 
>   8/9 copy: make extent_copy use sparse_copy, rather than its own code
>     Now that sparse_copy is separate, and used by copy_reg, adapt it
>     so that it can also be used by extent_copy.
> 
>   9/9 tests: cp/fiemap: exercise previously-failing parts
>     This is a hole-inducing test that would have failed with previous
>     fiemap-based copying code.
> 
> I may change or remove the sparse_copy_finalize function, which just calls
> ftruncate, especially now that it's used from only one place (initially
> I was using it from each sparse_copy caller, but that didn't work out).
> I also don't particularly like the added lseek call that is performed for
> each file copied, but keeping track of total written/offset byte counts
> and inflicting the need to do that on both callers seems like too much
> added code/complexity to justify avoiding that single lseek call.
> 
> From 8e4f0efd3ad17f1dd7a561369da22dfaf43ab3e8 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Fri, 28 Jan 2011 22:31:23 +0100
> Subject: [PATCH 1/9] tests: ensure that FIEMAP-enabled cp copies a sparse file efficiently
> 
> * tests/cp/fiemap-perf: New file.
> * tests/Makefile.am (TESTS): Add it.
> ---
>  tests/Makefile.am    |    1 +
>  tests/cp/fiemap-perf |   32 ++++++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 0 deletions(-)
>  create mode 100755 tests/cp/fiemap-perf
> 
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index 847f181..7855ac5 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -320,6 +320,7 @@ TESTS =						\
>    cp/dir-vs-file				\
>    cp/existing-perm-race				\
>    cp/fail-perm					\
> +  cp/fiemap-perf                                \
>    cp/file-perm-race				\
>    cp/into-self					\
>    cp/link					\
> diff --git a/tests/cp/fiemap-perf b/tests/cp/fiemap-perf
> new file mode 100755
> index 0000000..429e59b
> --- /dev/null
> +++ b/tests/cp/fiemap-perf
> @@ -0,0 +1,32 @@
> +#!/bin/sh
> +# ensure that a sparse file is copied efficiently, by default
> +
> +# Copyright (C) 2011 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +print_ver_ cp
> +
> +# Require a fiemap-enabled FS.
> +df -T -t btrfs -t xfs -t ext4 -t ocfs2 . \
> +  || skip_ "this file system lacks FIEMAP support"
> +
> +# Create a large-but-sparse file.
> +timeout 1 dd bs=1 seek=1T of=f < /dev/null || framework_failure_
> +
> +# Nothing can read (much less write) that many bytes in so little time.
> +timeout 3 cp f f2 || framework_failure_

I'm a bit worried with a 1s timeout.
The following will only give false negatives over 100GB/s

timeout 10 truncate -s1T f || framework_failure_
timeout 10 cp f f2 || framework_failure_

I wouldn't worry about filling file systems either,
as we're already limiting to ext4 etc.
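
For reference, creating such a file amounts to setting its length with no
data written at all.  A rough C sketch of that idea, assuming a POSIX system
(the name make_sparse_file is hypothetical, and a 1TiB length additionally
assumes a 64-bit off_t):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or resize) PATH so that its length is SIZE
   bytes without writing any data.  Return 0 on success, -1 on error.  */
int
make_sparse_file (char const *path, off_t size)
{
  assert (0 <= size);

  int fd = open (path, O_WRONLY | O_CREAT, 0600);
  if (fd < 0)
    return -1;

  /* Only the recorded length changes; on file systems that support
     sparse files no data blocks need to be allocated.  */
  int rc = ftruncate (fd, size);
  if (close (fd) < 0)
    rc = -1;
  return rc;
}
```

How many blocks the result occupies depends on the file system, so the
sketch makes no claim about st_blocks.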

> From 7f154dcfc5641c9616921d4c5ac5005bcb2507eb Mon Sep 17 00:00:00 2001
> From: Jim Meyering <meyering <at> redhat.com>
> Date: Thu, 27 Jan 2011 15:17:42 +0100
> Subject: [PATCH 9/9] tests: cp/fiemap: exercise previously-failing parts
> +# Ensure that --sparse=always can restore holes.
> +rm -f k
> +# Create a file starting with an "x", followed by 257K-1 0 bytes.
> +printf x > k || framework_failure_
> +dd bs=1k seek=1 of=k count=255 < /dev/zero || framework_failure_

S/257/256/ ?





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sat, 29 Jan 2011 16:02:02 GMT) Full text and rfc822 format available.

Message #341 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Jeff liu <jeff.liu <at> oracle.com>,
	Paul Eggert <eggert <at> cs.ucla.edu>, bug-coreutils <at> gnu.org,
	Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sat, 29 Jan 2011 17:10:07 +0100
Pádraig Brady wrote:
...
>> +# Require a fiemap-enabled FS.
>> +df -T -t btrfs -t xfs -t ext4 -t ocfs2 . \
>> +  || skip_ "this file system lacks FIEMAP support"
>> +
>> +# Create a large-but-sparse file.
>> +timeout 1 dd bs=1 seek=1T of=f < /dev/null || framework_failure_
>> +
>> +# Nothing can read (much less write) that many bytes in so little time.
>> +timeout 3 cp f f2 || framework_failure_
>
> I'm a bit worried with a 1s timeout.
> The following will only give false negatives over 100GB/s
>
> timeout 10 truncate -s1T f || framework_failure_
> timeout 10 cp f f2 || framework_failure_

Thanks.  Using truncate there is better, and 10 seconds is
more consistent with many other timeout-using tests.
While fixing that I noticed another problem:
the latter command should be setting fail=1.

diff --git a/tests/cp/fiemap-perf b/tests/cp/fiemap-perf
index 429e59b..6c588cb 100755
--- a/tests/cp/fiemap-perf
+++ b/tests/cp/fiemap-perf
@@ -24,9 +24,9 @@ df -T -t btrfs -t xfs -t ext4 -t ocfs2 . \
   || skip_ "this file system lacks FIEMAP support"

 # Create a large-but-sparse file.
-timeout 1 dd bs=1 seek=1T of=f < /dev/null || framework_failure_
+timeout 10 truncate -s1T f || framework_failure_

 # Nothing can read (much less write) that many bytes in so little time.
-timeout 3 cp f f2 || framework_failure_
+timeout 10 cp f f2 || fail=1

 Exit $fail


> I wouldn't worry about filling file systems either,
> as we're already limiting to ext4 etc.

Nor would I.

>> Subject: [PATCH 9/9] tests: cp/fiemap: exercise previously-failing parts
>> +# Ensure that --sparse=always can restore holes.
>> +rm -f k
>> +# Create a file starting with an "x", followed by 257K-1 0 bytes.
>> +printf x > k || framework_failure_
>> +dd bs=1k seek=1 of=k count=255 < /dev/zero || framework_failure_
>
> S/257/256/ ?

Good catch.
Initially I had count=256 and the comment was correct,
but I changed to 255 and didn't adjust.
I've corrected the comment as you suggest.

I've amended those two commits, rebased, and pushed to
a new branch: fiemap-copy-3.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6131; Package coreutils. (Sun, 30 Jan 2011 20:25:02 GMT) Full text and rfc822 format available.

Message #344 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Paul Eggert <eggert <at> cs.ucla.edu>,
	6131-done <at> debbugs.gnu.org, bug-coreutils <at> gnu.org,
	Jeff liu <jeff.liu <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Sun, 30 Jan 2011 21:32:16 +0100
Jim Meyering wrote:
...
> I've amended those two commits, rebased, and pushed to
> a new branch: fiemap-copy-3.

As I mentioned, everything is now on "master".
Odd how that works.  Minutes after pushing everything,
I wondered if there was a NEWS entry for this feature...
No.  So I wrote this:

From 5b11cd01790473b1964a0f91eca5205dfdcae773 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sun, 30 Jan 2011 21:27:12 +0100
Subject: [PATCH] doc: NEWS: mention cp's improvement

* NEWS (New Features): cp now copies sparse files efficiently.
---
 NEWS |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/NEWS b/NEWS
index 6e7efe1..042bfed 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,15 @@ GNU coreutils NEWS                                    -*- outline -*-

 ** New features

+  cp now copies sparse files efficiently on file systems with FIEMAP
+  support (ext4, btrfs, xfs, ocfs2).  Before, it had to read 2^20 bytes
+  when copying a 1MiB sparse file.  Now, it copies bytes only for the
+  non-sparse sections of a file.  Similarly, to induce a hole in the
+  output file, it had to detect a long sequence of zero bytes.  Now,
+  it knows precisely where each hole in an input file is, and can
+  reproduce them efficiently in the output file.  mv also benefits
+  when it resorts to copying, e.g., between file systems.
+
   join now supports -o 'auto' which will automatically infer the
   output format from the first line in each file, to ensure
   the same number of fields are output for each line.
--
1.7.3.5.44.g960a
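
To illustrate what "knows precisely where each hole is" means in practice:
once the extents of a file are known, the holes are simply the gaps between
them, plus any gap before EOF.  The sketch below is an illustration only,
not code from copy.c.  struct extent and find_holes are hypothetical names,
and the struct is a stand-in for the kernel's extent record, not its real
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for one mapped extent; the field names are
   assumptions, not those of the kernel's FIEMAP structures.  */
struct extent
{
  uint64_t start;   /* logical byte offset */
  uint64_t len;     /* length in bytes */
};

/* EXT holds N extents sorted by start offset and not overlapping.
   TOTAL_SIZE is the file length.  Store up to MAX_HOLES gaps in HOLES
   and return the number of gaps found, which may exceed MAX_HOLES.  */
size_t
find_holes (struct extent const *ext, size_t n, uint64_t total_size,
            struct extent *holes, size_t max_holes)
{
  assert (holes != NULL || max_holes == 0);

  size_t n_holes = 0;
  uint64_t pos = 0;   /* end of the previous extent */

  for (size_t i = 0; i <= n; i++)
    {
      /* After the last extent, the "next start" is EOF.  */
      uint64_t next = i < n ? ext[i].start : total_size;

      /* A final extent may extend past EOF; then there is no gap.  */
      if (pos < next)
        {
          if (n_holes < max_holes)
            holes[n_holes] = (struct extent) { pos, next - pos };
          n_holes++;
        }

      if (i < n)
        pos = ext[i].start + ext[i].len;
    }

  return n_holes;
}
```

A copier can then read only the extents and either seek over or zero-fill
each gap in the destination, depending on the --sparse mode.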




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Sun, 30 Jan 2011 20:25:02 GMT) Full text and rfc822 format available.

Notification sent to "jeff.liu" <jeff.liu <at> oracle.com>:
bug acknowledged by developer. (Sun, 30 Jan 2011 20:25:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 28 Feb 2011 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 119 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.