From unknown Sat Jul 12 02:58:11 2025 X-Loop: help-debbugs@gnu.org Subject: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Resent-From: Jeff liu Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 17 Feb 2011 13:50:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 8061 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 8061@debbugs.gnu.org Cc: "jeff.liu" , Jim Meyering X-Debbugs-Original-To: bug-coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129795056721401 (code B ref -1); Thu, 17 Feb 2011 13:50:03 +0000 Received: (at submit) by debbugs.gnu.org; 17 Feb 2011 13:49:27 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EY-0005Z7-Aw for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:27 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EV-0005Yv-T5 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4NH-00089I-L7 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:29 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:43637) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4NH-00089B-Ae for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:27 -0500 Received: from [140.186.70.92] (port=39195 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pq4Mw-0004mB-2Q for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:58:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zp-Lx for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:40 -0500 Received: from rcsinet10.oracle.com ([148.87.113.121]:18456) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zR-8M for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:38 -0500 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p1HDvT4c001402 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 17 Feb 2011 13:57:34 GMT Received: from acsmt355.oracle.com (acsmt355.oracle.com [141.146.40.155]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p1HD39m2027007; Thu, 17 Feb 2011 13:57:29 GMT Received: from abhmt014.oracle.com by acsmt354.oracle.com with ESMTP id 1063839691297951043; Thu, 17 Feb 2011 05:57:23 -0800 Received: from [192.168.1.101] (/221.223.109.12) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 17 Feb 2011 05:57:21 -0800 From: Jeff liu Content-Type: multipart/alternative; boundary=Apple-Mail-3--331228896 Date: Thu, 17 Feb 2011 21:57:14 +0800 Message-Id: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> Mime-Version: 1.0 (Apple Message framework v1081) X-Mailer: Apple Mail (2.1081) X-Source-IP: acsmt355.oracle.com [141.146.40.155] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090203.4D5D2949.0106:SCFMA4539814,ss=1,fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 199.232.76.165 X-Spam-Score: -6.4 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hello All, This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! Below is the code change lists: src/extent_scan.h: add a new structure item 'src_total_size' to "struct = extent_info", since I have to make use of this value to determine a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file is definitely not a sparse file or maybe it is a sparse file but it does = not make sense for proceeding scan read. another change in this file is the signature of extent_scan_init(), just = as I mentioned above, it need to accept the src_total_size variable. src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . src/copy.c: pass src_total_size to extent_scan_init(). On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I have = tried a few simple cases, they are works to me. For now, I am using diff(1) to verify the copy result, does anyone know = some utilities can be used to write the test script? I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: http://bugs.python.org/file19566/z.patch but it require very latest python build I think, so could anyone give = some other advices in this point? The patch is shown as following, any help testing and comments are = appreciated!! Thanks, -Jeff From: Jie Liu Date: Thu, 17 Feb 2011 21:14:23 +0800 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module * src/extent_scan.h: add src_total_size to struct extent_info, we need to check the SEEK_HOLE result against it for initial extent scan. modify the extent_scan_init() signature, to add size_t src_total_size. * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA and SEEK_HOLE. * src/copy.c: pass src_total_size to extent_scan_init(). Signed-off-by: Jie Liu --- src/copy.c | 2 +- src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- src/extent-scan.h | 9 +++- 3 files changed, 120 insertions(+), 4 deletions(-) diff --git a/src/copy.c b/src/copy.c index 104652d..22b9911 100644 --- a/src/copy.c +++ b/src/copy.c @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos =3D 0; =20 - extent_scan_init (src_fd, &scan); + extent_scan_init (src_fd, src_total_size, &scan); =20 *require_normal_copy =3D false; bool wrote_hole_at_eof =3D true; diff --git a/src/extent-scan.c b/src/extent-scan.c index 1ba59db..ffeab7a 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -32,13 +32,17 @@ /* Allocate space for struct extent_scan, initialize the entries if necessary and return it as the input argument of extent_scan_read(). = */ extern void -extent_scan_init (int src_fd, struct extent_scan *scan) +extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan) { scan->fd =3D src_fd; scan->ei_count =3D 0; scan->scan_start =3D 0; scan->initial_scan_failed =3D false; scan->hit_final_extent =3D false; +#if defined(SEEK_HOLE) && defined(SEEK_DATA) + scan->src_total_size =3D src_total_size; +#endif } =20 #ifdef __linux__ @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) =20 return true; } +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) +extern bool +extent_scan_read (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info =3D &extent_buf.ei; + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; + verify (count !=3D 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + if (scan->scan_start =3D=3D 0) + { +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE, if not, fall back to the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + { + scan->initial_scan_failed =3D true; + return false; + } +# endif + + /* If we have been compiled on an OS that supports SEEK_HOLE + but run on an OS that does not support SEEK_HOLE, we get + EINVAL. If the underlying filesystem does not support the + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy + in either case. */ + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) + scan->initial_scan_failed =3D true; + return false; + } + + /* Seek back to position 0 first if we detected a real hole. */ + if (hole_pos > 0) + { + off_t tmp_pos; + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); + if (tmp_pos !=3D (off_t) 0) + return false; + + /* The source file is definitely not a sparse file, or it + maybe a sparse file but SEEK_HOLE returns the source = file's + total size, fall back to the standard copy too. */ + if (hole_pos >=3D scan->src_total_size) + { + scan->initial_scan_failed =3D true; + return false; + } + } + } + + unsigned int i =3D 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the=20 + supplied offset. Set scan->hit_final_extent to true for + either case. */ + do { + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + ext_info[i].ext_logical =3D data_pos; + ext_info[i].ext_length =3D hole_pos - data_pos; + scan->scan_start =3D hole_pos; + ++i; + } while (scan->scan_start < scan->src_total_size && i < count); + + i--; + scan->ei_count =3D i; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + + for (i =3D 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); + + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; + } + + return true;=20 +} #else extern bool extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) diff --git a/src/extent-scan.h b/src/extent-scan.h index 4724b25..a271b95 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -18,7 +18,6 @@ =20 #ifndef EXTENT_SCAN_H # define EXTENT_SCAN_H - /* Structure used to store information of each extent. */ struct extent_info { @@ -38,6 +37,11 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; =20 +#if defined(SEEK_DATA) && defined(SEEK_HOLE) + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; +#endif + /* Next scan start offset. */ off_t scan_start; =20 @@ -55,7 +59,8 @@ struct extent_scan struct extent_info *ext_info; }; =20 -void extent_scan_init (int src_fd, struct extent_scan *scan); +void extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan); =20 bool extent_scan_read (struct extent_scan *scan); =20 --=20 1.7.4= --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4
= --Apple-Mail-3--331228896-- From unknown Sat Jul 12 02:58:11 2025 X-Loop: help-debbugs@gnu.org Subject: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Resent-From: Jeff liu Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Mon, 18 Apr 2011 14:17:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 8061 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jeff liu Cc: bug-coreutils@gnu.org, Jim Meyering Received: via spool by submit@debbugs.gnu.org id=B.13031361746552 (code B ref -1); Mon, 18 Apr 2011 14:17:02 +0000 Received: (at submit) by debbugs.gnu.org; 18 Apr 2011 14:16:14 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QBpFK-0001ha-Jh for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:14 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QBpFH-0001h8-0q for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QBpF7-0000h8-P2 for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:01 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:44165) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpF7-0000h4-NG for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:15:57 -0400 Received: from eggs.gnu.org ([140.186.70.92]:42879) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpF3-0006wU-I4 for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QBpEx-0000g4-8J for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:53 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:56885) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpEw-0000fo-UI for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:47 -0400 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p3IEFga9001504 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 18 Apr 2011 14:15:44 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p3IEFe4G019882 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Apr 2011 14:15:40 GMT Received: from abhmt016.oracle.com (abhmt016.oracle.com [141.146.116.25]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p3IEFdvj013419; Mon, 18 Apr 2011 09:15:40 -0500 Received: from [192.168.1.100] (/123.119.97.134) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 18 Apr 2011 07:15:34 -0700 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: multipart/alternative; boundary=Apple-Mail-3-558882838 From: Jeff liu In-Reply-To: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> Date: Mon, 18 Apr 2011 22:15:13 +0800 Message-Id: References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> X-Mailer: Apple Mail (2.1082) X-Source-IP: acsmt358.oracle.com [141.146.40.158] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090201.4DAC478D.005F:SCFSTAT5015188, ss=1, pt=DBB_65838, fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-3-558882838 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=GB2312 Hi All, Please ignore the current patch, I will submit another patch with a few = fixes soon. Thanks, -Jeff =D4=DA 2011-2-17=A3=AC=CF=C2=CE=E79:57=A3=AC Jeff liu =D0=B4=B5=C0=A3=BA > Hello All, >=20 > This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! >=20 > Below is the code change lists: > src/extent_scan.h: add a new structure item 'src_total_size' to = "struct extent_info", since I have to make use of this value to = determine > a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file > is definitely not a sparse file or maybe it is a sparse file but it = does not make sense for proceeding scan read. > another change in this file is the signature of extent_scan_init(), = just as I mentioned above, it need to accept the src_total_size = variable. > src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . > src/copy.c: pass src_total_size to extent_scan_init(). >=20 > On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to me. >=20 > For now, I am using diff(1) to verify the copy result, does anyone = know some utilities can be used to write the test script? > I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm > still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: > http://bugs.python.org/file19566/z.patch > but it require very latest python build I think, so could anyone give = some other advices in this point? >=20 > The patch is shown as following, any help testing and comments are = appreciated!! >=20 >=20 > Thanks, > -Jeff >=20 >=20 > From: Jie Liu > Date: Thu, 17 Feb 2011 21:14:23 +0800 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module >=20 > * src/extent_scan.h: add src_total_size to struct extent_info, we need > to check the SEEK_HOLE result against it for initial extent scan. > modify the extent_scan_init() signature, to add size_t = src_total_size. > * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA > and SEEK_HOLE. > * src/copy.c: pass src_total_size to extent_scan_init(). >=20 > Signed-off-by: Jie Liu > --- > src/copy.c | 2 +- > src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- > src/extent-scan.h | 9 +++- > 3 files changed, 120 insertions(+), 4 deletions(-) >=20 > diff --git a/src/copy.c b/src/copy.c > index 104652d..22b9911 100644 > --- a/src/copy.c > +++ b/src/copy.c > @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, > We may need this at the end, for a final ftruncate. */ > off_t dest_pos =3D 0; > =20 > - extent_scan_init (src_fd, &scan); > + extent_scan_init (src_fd, src_total_size, &scan); > =20 > *require_normal_copy =3D false; > bool wrote_hole_at_eof =3D true; > diff --git a/src/extent-scan.c b/src/extent-scan.c > index 1ba59db..ffeab7a 100644 > --- a/src/extent-scan.c > +++ b/src/extent-scan.c > @@ -32,13 +32,17 @@ > /* Allocate space for struct extent_scan, initialize the entries if > necessary and return it as the input argument of = extent_scan_read(). */ > extern void > -extent_scan_init (int src_fd, struct extent_scan *scan) > +extent_scan_init (int src_fd, size_t src_total_size, > + struct extent_scan *scan) > { > scan->fd =3D src_fd; > scan->ei_count =3D 0; > scan->scan_start =3D 0; > scan->initial_scan_failed =3D false; > scan->hit_final_extent =3D false; > +#if defined(SEEK_HOLE) && defined(SEEK_DATA) > + scan->src_total_size =3D src_total_size; > +#endif > } > =20 > #ifdef __linux__ > @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) > =20 > return true; > } > +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) > +extern bool > +extent_scan_read (struct extent_scan *scan) > +{ > + off_t data_pos, hole_pos; > + union { struct extent_info ei; char c[4096]; } extent_buf; > + struct extent_info *ext_info =3D &extent_buf.ei; > + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; > + verify (count !=3D 0); > + > + memset (&extent_buf, 0, sizeof extent_buf); > + > + if (scan->scan_start =3D=3D 0) > + { > +# ifdef _PC_MIN_HOLE_SIZE > + /* To determine if the underlaying file system support > + SEEK_HOLE, if not, fall back to the standard copy. */ > + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) > + { > + scan->initial_scan_failed =3D true; > + return false; > + } > +# endif > + > + /* If we have been compiled on an OS that supports SEEK_HOLE > + but run on an OS that does not support SEEK_HOLE, we get > + EINVAL. If the underlying filesystem does not support the > + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy > + in either case. */ > + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); > + if (hole_pos < 0) > + { > + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) > + scan->initial_scan_failed =3D true; > + return false; > + } > + > + /* Seek back to position 0 first if we detected a real hole. = */ > + if (hole_pos > 0) > + { > + off_t tmp_pos; > + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); > + if (tmp_pos !=3D (off_t) 0) > + return false; > + > + /* The source file is definitely not a sparse file, or it > + maybe a sparse file but SEEK_HOLE returns the source = file's > + total size, fall back to the standard copy too. */ > + if (hole_pos >=3D scan->src_total_size) > + { > + scan->initial_scan_failed =3D true; > + return false; > + } > + } > + } > + > + unsigned int i =3D 0; > + /* If lseek(2) failed and the errno is set to ENXIO, for > + SEEK_DATA there are no more data regions past the supplied > + offset. For SEEK_HOLE, there are no more holes past the=20 > + supplied offset. Set scan->hit_final_extent to true for > + either case. */ > + do { > + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); > + if (data_pos < 0) > + { > + if (errno !=3D ENXIO) > + return false; > + else > + { > + scan->hit_final_extent =3D true; > + return true; > + } > + } > + > + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); > + if (hole_pos < 0) > + { > + if (errno !=3D ENXIO) > + return false; > + else > + { > + scan->hit_final_extent =3D true; > + return true; > + } > + } > + > + ext_info[i].ext_logical =3D data_pos; > + ext_info[i].ext_length =3D hole_pos - data_pos; > + scan->scan_start =3D hole_pos; > + ++i; > + } while (scan->scan_start < scan->src_total_size && i < count); > + > + i--; > + scan->ei_count =3D i; > + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); > + > + for (i =3D 0; i < scan->ei_count; i++) > + { > + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); > + > + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; > + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; > + } > + > + return true;=20 > +} > #else > extern bool > extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) > diff --git a/src/extent-scan.h b/src/extent-scan.h > index 4724b25..a271b95 100644 > --- a/src/extent-scan.h > +++ b/src/extent-scan.h > @@ -18,7 +18,6 @@ > =20 > #ifndef EXTENT_SCAN_H > # define EXTENT_SCAN_H > - > /* Structure used to store information of each extent. */ > struct extent_info > { > @@ -38,6 +37,11 @@ struct extent_scan > /* File descriptor of extent scan run against. */ > int fd; > =20 > +#if defined(SEEK_DATA) && defined(SEEK_HOLE) > + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ > + size_t src_total_size; > +#endif > + > /* Next scan start offset. */ > off_t scan_start; > =20 > @@ -55,7 +59,8 @@ struct extent_scan > struct extent_info *ext_info; > }; > =20 > -void extent_scan_init (int src_fd, struct extent_scan *scan); > +void extent_scan_init (int src_fd, size_t src_total_size, > + struct extent_scan *scan); > =20 > bool extent_scan_read (struct extent_scan *scan); > =20 > --=20 > 1.7.4 --Apple-Mail-3-558882838 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=GB2312 Hi = All,

Please ignore the current patch, I will submit = another patch with a few fixes = soon.


Thanks,
-Jeff

=D4=DA 2011-2-17=A3=AC=CF=C2=CE=E79:57=A3=AC Jeff = liu =D0=B4=B5=C0=A3=BA

Hello = All,

This is the first try to introduce the = SEEK_DATA/SEEK_HOLE support to extent_scan module for efficient sparse = file copy on ZFS,  I have delayed it for a long time, sorry for = that!

Below is the code change = lists:
src/extent_scan.h:  add a new structure item = 'src_total_size' to "struct extent_info",  since I have to make use = of this value to determine
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4

= --Apple-Mail-3-558882838-- From unknown Sat Jul 12 02:58:11 2025 X-Loop: help-debbugs@gnu.org Subject: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Resent-From: Jeff liu Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Tue, 19 Apr 2011 08:54:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 8061 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jeff liu Cc: =?UTF-8?Q?P=C3=A1draig?= Brady , bug-coreutils@gnu.org, Jim Meyering Received: via spool by submit@debbugs.gnu.org id=B.130320320116976 (code B ref -1); Tue, 19 Apr 2011 08:54:01 +0000 Received: (at submit) by debbugs.gnu.org; 19 Apr 2011 08:53:21 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6gS-0004Pk-67 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:21 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6gN-0004PW-Sl for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6gF-0004e3-F9 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:10 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:42601) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6gF-0004dz-9g for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:07 -0400 Received: from eggs.gnu.org ([140.186.70.92]:60467) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6gC-0007KG-25 for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6g8-0004cn-CG for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:04 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:49013) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6g7-0004ce-VD for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:00 -0400 Received: from rcsinet15.oracle.com (rcsinet15.oracle.com [148.87.113.117]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p3J8qixt002378 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 19 Apr 2011 08:52:46 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by rcsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p3J8qhGA022441 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 19 Apr 2011 08:52:43 GMT Received: from abhmt013.oracle.com (abhmt013.oracle.com [141.146.116.22]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p3J8pgPb002153; Tue, 19 Apr 2011 03:51:42 -0500 Received: from [10.189.10.146] (/203.190.176.53) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 19 Apr 2011 01:51:41 -0700 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: multipart/alternative; boundary=Apple-Mail-1-625867617 From: Jeff liu In-Reply-To: Date: Tue, 19 Apr 2011 16:51:38 +0800 Message-Id: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> X-Mailer: Apple Mail (2.1082) X-Source-IP: acsmt358.oracle.com [141.146.40.158] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090201.4DAD4D5C.00E3:SCFSTAT5015188, ss=1, pt=DBB_65838, fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-1-625867617 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii > Hi All, >=20 > Please ignore the current patch, I will submit another patch with a = few fixes soon. Now the new patch set coming, =20 In previous post, I have tried to change the extent_scan_init() = interface by adding a new argument to indicate the source file size, this will reduce the overhead of call fstat(2) in extent_scan_read(), = since the file size is definitely needed for SEEK* stuff, however, the = file size is redundant for FIEMAP. so I changed my idea to keep extent_scan_init() as before, instead, to = retrieve the file size in extent_scan_read() when launching the first = scan, one benefit is, there is nothing need to be modified in extent_copy() for this patch. Tests: =3D=3D=3D=3D A new test sparse-lseek was introduced in this post, it make use of the = sparse file generation function in Perl, and do `cmp` against the target = copied file. I have also took a look at the `sdb` utility shipped with ZFS, but did = not found any interesting stuff can be used for this test. Test run passed on my environment as below, bash-3.00# make check TESTS=3Dcp/sparse-lseek VERBOSE=3Dyes make check-TESTS make[1]: Entering directory `/coreutils/tests' make[2]: Entering directory `/coreutils/tests' PASS: cp/sparse-lseek =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 1 test passed =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D make[2]: Leaving directory `/coreutils/tests' make[1]: Leaving directory `/coreutils/tests' GEN vc_exe_in_TESTS No differences encountered Manual tests: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 1. Ensure trailing blanks, test 0 size sparse file, non-sparse file, = sparse file with hole start and hole end. 2. make syntax-check failed, I have no idea of this issue at the moment, = I also tried to run make distcheck, looks the package building, install = and uninstall procedures all passed, but it also failed at the final stage, am I missing something here?=20 The logs which were shown as following, bash-3.00# make syntax-check GFDL_version awk: syntax error near line 1 awk: bailing out near line 1 make: *** [sc_GFDL_version.z] Error 2 make distcheck: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D ...... make[1]: Entering directory `/coreutils' GEN check-ls-dircolors make my-distcheck make[2]: Entering directory `/coreutils' make syntax-check make[3]: Entering directory `/coreutils' GFDL_version awk: syntax error near line 1 awk: bailing out near line 1 make[3]: *** [sc_GFDL_version.z] Error 2 make[3]: Leaving directory `/coreutils' make[2]: *** [my-distcheck] Error 2 make[2]: Leaving directory `/coreutils' make[1]: *** [distcheck-hook] Error 2 make[1]: Leaving directory `/coreutils' make: *** [distcheck] Error 1 Below is the revised patch, =46rom 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Tue, 19 Apr 2011 15:24:50 -0700 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module * src/extent_scan.h: introduce src_total_size to struct extent_info, we need it for lseek(2) iteration. * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA and SEEK_HOLE if those stuff are supported. * tests/cp/sparse-lseek: add a new test for lseek(2) extent copy. Signed-off-by: Jie Liu --- src/extent-scan.c | 119 = +++++++++++++++++++++++++++++++++++++++++++++++++ src/extent-scan.h | 5 ++ tests/Makefile.am | 1 + tests/cp/sparse-lseek | 56 +++++++++++++++++++++++ 4 files changed, 181 insertions(+), 0 deletions(-) create mode 100755 tests/cp/sparse-lseek diff --git a/src/extent-scan.c b/src/extent-scan.c index da7eb9d..a54eca0 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -17,7 +17,9 @@ Written by Jie Liu (jeff.liu@oracle.com). */ =20 #include +#include #include +#include #include #include #include @@ -71,6 +73,9 @@ extent_scan_init (int src_fd, struct extent_scan = *scan) scan->initial_scan_failed =3D false; scan->hit_final_extent =3D false; scan->fm_flags =3D extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; +#if defined (SEEK_DATA) && defined (SEEK_HOLE) + scan->src_total_size =3D 0; +#endif } =20 #ifdef __linux__ @@ -204,6 +209,120 @@ extent_scan_read (struct extent_scan *scan) =20 return true; } +#elif defined (SEEK_HOLE) && defined (SEEK_DATA) +extern bool +extent_scan_read (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info =3D &extent_buf.ei; + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; + verify (count !=3D 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + if (scan->scan_start =3D=3D 0) + { +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE. If not, fall back to the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + { + scan->initial_scan_failed =3D true; + return false; + } +# endif + + /* If we have been compiled on an OS that supports SEEK_HOLE + but run on an OS that does not support SEEK_HOLE, we get + EINVAL. If the underlying file system does not support the + SEEK_HOLE call, we get ENOTSUP, setting initial_scan_failed + to true to fall back to the standard copy in either case. */ + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) + scan->initial_scan_failed =3D true; + return false; + } + + /* Seek back to position 0 first. */ + if (hole_pos > 0) + { + if (lseek (scan->fd, (off_t) 0, SEEK_SET) < 0) + return false; + } + + struct stat sb; + if (fstat (scan->fd, &sb) < 0) + return false; + + /* This is definitely not a sparse file, we treat it as a big = extent. */ + if (hole_pos >=3D sb.st_size) + { + scan->ei_count =3D 1; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + scan->ext_info[0].ext_logical =3D 0; + scan->ext_info[0].ext_length =3D sb.st_size; + scan->hit_final_extent =3D true; + return true; + } + scan->src_total_size =3D sb.st_size; + } + + unsigned int i =3D 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the + supplied offset. Set scan->hit_final_extent to true in + either case. */ + while (scan->scan_start < scan->src_total_size && i < count) + { + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno =3D=3D ENXIO) + { + scan->hit_final_extent =3D true; + break; + } + return false; + } + + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D ENXIO) + { + scan->hit_final_extent =3D true; + hole_pos =3D scan->src_total_size; + if (data_pos < hole_pos) + goto preserve_ext_info; + break; + } + return false; + } + +preserve_ext_info: + ext_info[i].ext_logical =3D data_pos; + ext_info[i].ext_length =3D hole_pos - data_pos; + scan->scan_start =3D hole_pos; + ++i; + } + + scan->ei_count =3D i; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + + for (i =3D 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); + + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; + } + + return (lseek (scan->fd, (off_t) 0, SEEK_SET) < 0) ? false : true; +} #else extern bool extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) diff --git a/src/extent-scan.h b/src/extent-scan.h index 5b4ded5..4fc05c6 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -38,6 +38,11 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; =20 +# if defined (SEEK_DATA) && defined (SEEK_HOLE) + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; +#endif + /* Next scan start offset. */ off_t scan_start; =20 diff --git a/tests/Makefile.am b/tests/Makefile.am index 685eb52..6c596b9 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -28,6 +28,7 @@ root_tests =3D = \ cp/cp-mv-enotsup-xattr \ cp/capability \ cp/sparse-fiemap \ + cp/sparse-lseek \ dd/skip-seek-past-dev \ install/install-C-root \ ls/capability \ diff --git a/tests/cp/sparse-lseek b/tests/cp/sparse-lseek new file mode 100755 index 0000000..5b8f2c1 --- /dev/null +++ b/tests/cp/sparse-lseek @@ -0,0 +1,56 @@ +#!/bin/sh +# Test cp --sparse=3Dalways through lseek(SEEK_DATA/SEEK_HOLE) copy + +# Copyright (C) 2010-2011 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=3D.}/init.sh"; path_prepend_ ../src +print_ver_ cp +$PERL -e 1 || skip_test_ 'you lack perl' + +zfsdisk=3DdiskX +zfspool=3Dseektest + +require_root_ + +cwd=3D$PWD +cleanup_() { zpool destroy $zfspool; } + +skip=3D0 +mkfile 128m "$cwd/$zfsdisk" || skip=3D1 + +# Check if the seektest pool is already exists +zpool list $zfspool 2>/dev/null && + skip_test_ "$zfspool already exists" + +# Create pool and verify if it is mounted automatically +zpool create $zfspool "$cwd/$zfsdisk" || skip=3D1 +zpool list $zfspool >/dev/null || skip=3D1 + +test $skip =3D 1 && skip_test_ "insufficient ZFS support" + +for i in $(seq 1 2 21); do + for j in 1 2 31 100; do + $PERL -e 'BEGIN { $n =3D '$i' * 1024; *F =3D *STDOUT }' \ + -e 'for (1..'$j') { sysseek (*F, $n, 1)' \ + -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > /$zfspool/j1 = || fail=3D1 + + cp --sparse=3Dalways /$zfspool/j1 /$zfspool/j2 || fail=3D1 + cmp /$zfspool/j1 /$zfspool/j2 || fail=3D1 + test $fail =3D 1 && break 2 + done +done + +Exit $fail --=20 1.7.4 Any comments are appreciated! Thanks, -Jeff >=20 >=20 > Thanks, > -Jeff >=20 >=20 >> Hello All, >>=20 >> This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! >>=20 >> Below is the code change lists: >> src/extent_scan.h: add a new structure item 'src_total_size' to = "struct extent_info", since I have to make use of this value to = determine >> a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file >> is definitely not a sparse file or maybe it is a sparse file but it = does not make sense for proceeding scan read. >> another change in this file is the signature of extent_scan_init(), = just as I mentioned above, it need to accept the src_total_size = variable. >> src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . >> src/copy.c: pass src_total_size to extent_scan_init(). >>=20 >> On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to me. >>=20 >> For now, I am using diff(1) to verify the copy result, does anyone = know some utilities can be used to write the test script? >> I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm >> still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: >> http://bugs.python.org/file19566/z.patch >> but it require very latest python build I think, so could anyone = give some other advices in this point? >>=20 >> The patch is shown as following, any help testing and comments are = appreciated!! >>=20 >>=20 >> Thanks, >> -Jeff >>=20 >>=20 >> From: Jie Liu >> Date: Thu, 17 Feb 2011 21:14:23 +0800 >> Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module >>=20 >> * src/extent_scan.h: add src_total_size to struct extent_info, we = need >> to check the SEEK_HOLE result against it for initial extent scan. >> modify the extent_scan_init() signature, to add size_t = src_total_size. >> * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA >> and SEEK_HOLE. >> * src/copy.c: pass src_total_size to extent_scan_init(). >>=20 >> Signed-off-by: Jie Liu >> --- >> src/copy.c | 2 +- >> src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- >> src/extent-scan.h | 9 +++- >> 3 files changed, 120 insertions(+), 4 deletions(-) >>=20 >> diff --git a/src/copy.c b/src/copy.c >> index 104652d..22b9911 100644 >> --- a/src/copy.c >> +++ b/src/copy.c >> @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, >> We may need this at the end, for a final ftruncate. */ >> off_t dest_pos =3D 0; >> =20 >> - extent_scan_init (src_fd, &scan); >> + extent_scan_init (src_fd, src_total_size, &scan); >> =20 >> *require_normal_copy =3D false; >> bool wrote_hole_at_eof =3D true; >> diff --git a/src/extent-scan.c b/src/extent-scan.c >> index 1ba59db..ffeab7a 100644 >> --- a/src/extent-scan.c >> +++ b/src/extent-scan.c >> @@ -32,13 +32,17 @@ >> /* Allocate space for struct extent_scan, initialize the entries if >> necessary and return it as the input argument of = extent_scan_read(). */ >> extern void >> -extent_scan_init (int src_fd, struct extent_scan *scan) >> +extent_scan_init (int src_fd, size_t src_total_size, >> + struct extent_scan *scan) >> { >> scan->fd =3D src_fd; >> scan->ei_count =3D 0; >> scan->scan_start =3D 0; >> scan->initial_scan_failed =3D false; >> scan->hit_final_extent =3D false; >> +#if defined(SEEK_HOLE) && defined(SEEK_DATA) >> + scan->src_total_size =3D src_total_size; >> +#endif >> } >> =20 >> #ifdef __linux__ >> @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) >> =20 >> return true; >> } >> +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) >> +extern bool >> +extent_scan_read (struct extent_scan *scan) >> +{ >> + off_t data_pos, hole_pos; >> + union { struct extent_info ei; char c[4096]; } extent_buf; >> + struct extent_info *ext_info =3D &extent_buf.ei; >> + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; >> + verify (count !=3D 0); >> + >> + memset (&extent_buf, 0, sizeof extent_buf); >> + >> + if (scan->scan_start =3D=3D 0) >> + { >> +# ifdef _PC_MIN_HOLE_SIZE >> + /* To determine if the underlaying file system support >> + SEEK_HOLE, if not, fall back to the standard copy. */ >> + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) >> + { >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> +# endif >> + >> + /* If we have been compiled on an OS that supports SEEK_HOLE >> + but run on an OS that does not support SEEK_HOLE, we get >> + EINVAL. If the underlying filesystem does not support the >> + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy >> + in either case. */ >> + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); >> + if (hole_pos < 0) >> + { >> + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> + >> + /* Seek back to position 0 first if we detected a real hole. = */ >> + if (hole_pos > 0) >> + { >> + off_t tmp_pos; >> + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); >> + if (tmp_pos !=3D (off_t) 0) >> + return false; >> + >> + /* The source file is definitely not a sparse file, or it >> + maybe a sparse file but SEEK_HOLE returns the source = file's >> + total size, fall back to the standard copy too. */ >> + if (hole_pos >=3D scan->src_total_size) >> + { >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> + } >> + } >> + >> + unsigned int i =3D 0; >> + /* If lseek(2) failed and the errno is set to ENXIO, for >> + SEEK_DATA there are no more data regions past the supplied >> + offset. For SEEK_HOLE, there are no more holes past the=20 >> + supplied offset. Set scan->hit_final_extent to true for >> + either case. */ >> + do { >> + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); >> + if (data_pos < 0) >> + { >> + if (errno !=3D ENXIO) >> + return false; >> + else >> + { >> + scan->hit_final_extent =3D true; >> + return true; >> + } >> + } >> + >> + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); >> + if (hole_pos < 0) >> + { >> + if (errno !=3D ENXIO) >> + return false; >> + else >> + { >> + scan->hit_final_extent =3D true; >> + return true; >> + } >> + } >> + >> + ext_info[i].ext_logical =3D data_pos; >> + ext_info[i].ext_length =3D hole_pos - data_pos; >> + scan->scan_start =3D hole_pos; >> + ++i; >> + } while (scan->scan_start < scan->src_total_size && i < count); >> + >> + i--; >> + scan->ei_count =3D i; >> + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); >> + >> + for (i =3D 0; i < scan->ei_count; i++) >> + { >> + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); >> + >> + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; >> + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; >> + } >> + >> + return true;=20 >> +} >> #else >> extern bool >> extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) >> diff --git a/src/extent-scan.h b/src/extent-scan.h >> index 4724b25..a271b95 100644 >> --- a/src/extent-scan.h >> +++ b/src/extent-scan.h >> @@ -18,7 +18,6 @@ >> =20 >> #ifndef EXTENT_SCAN_H >> # define EXTENT_SCAN_H >> - >> /* Structure used to store information of each extent. */ >> struct extent_info >> { >> @@ -38,6 +37,11 @@ struct extent_scan >> /* File descriptor of extent scan run against. */ >> int fd; >> =20 >> +#if defined(SEEK_DATA) && defined(SEEK_HOLE) >> + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ >> + size_t src_total_size; >> +#endif >> + >> /* Next scan start offset. */ >> off_t scan_start; >> =20 >> @@ -55,7 +59,8 @@ struct extent_scan >> struct extent_info *ext_info; >> }; >> =20 >> -void extent_scan_init (int src_fd, struct extent_scan *scan); >> +void extent_scan_init (int src_fd, size_t src_total_size, >> + struct extent_scan *scan); >> =20 >> bool extent_scan_read (struct extent_scan *scan); >> =20 >> --=20 >> 1.7.4 >=20 --Apple-Mail-1-625867617 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
Hi All,

Please ignore the = current patch, I will submit another patch with a few fixes = soon.

Now the new patch set coming, =  

In previous post, I have tried to change = the extent_scan_init() interface by adding a new argument to indicate = the source file size,
this will reduce the overhead of call = fstat(2)  in extent_scan_read(), since the file size is = definitely needed for SEEK* stuff, however, the file size is redundant = for FIEMAP.
so I changed my idea to keep extent_scan_init() as = before,  instead, to retrieve the file size in = extent_scan_read() when launching the first scan, one benefit is, there = is nothing need to
be modified in extent_copy() for this = patch.

Tests:
=3D=3D=3D=3D
A = new test sparse-lseek was introduced in this post, it make use of the = sparse file generation function in Perl, and do `cmp` against the target = copied file.
I have also took a look at the `sdb` utility = shipped with ZFS, but did not found any interesting stuff can be used = for this test.

Test run passed on my = environment as below,

bash-3.00# make = check TESTS=3Dcp/sparse-lseek VERBOSE=3Dyes
make =  check-TESTS
make[1]: Entering directory = `/coreutils/tests'
make[2]: Entering directory = `/coreutils/tests'
PASS: = cp/sparse-lseek
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
1 test passed
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
make[2]: Leaving directory `/coreutils/tests'
make[1]: = Leaving directory `/coreutils/tests'
  GEN   =  vc_exe_in_TESTS
No differences = encountered

Manual = tests:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
1. Ensure = trailing blanks, test 0 size sparse file, non-sparse file,  sparse = file with hole start and hole end.
2. make syntax-check = failed, I have no idea of this issue at the moment,  I also tried = to run make distcheck, looks the package building, install and uninstall = procedures all passed,
but it also failed at the final = stage, am I missing something here? 

The = logs which were shown as following,
bash-3.00# make = syntax-check
GFDL_version
awk: syntax error near = line 1
awk: bailing out near line 1
make: *** = [sc_GFDL_version.z] Error 2

make = distcheck:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
= ......
make[1]: Entering directory = `/coreutils'
  GEN   =  check-ls-dircolors
make my-distcheck
make[2]: = Entering directory `/coreutils'
make = syntax-check
make[3]: Entering directory = `/coreutils'
GFDL_version
awk: syntax error near = line 1
awk: bailing out near line 1
make[3]: *** = [sc_GFDL_version.z] Error 2
make[3]: Leaving directory = `/coreutils'
make[2]: *** [my-distcheck] Error = 2
make[2]: Leaving directory `/coreutils'
make[1]: = *** [distcheck-hook] Error 2
make[1]: Leaving directory = `/coreutils'
make: *** [distcheck] Error = 1



Below is the = revised patch,

=46rom = 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 = 2001
From: Jie Liu <jeff.liu@oracle.com>
= Date: Tue, 19 Apr 2011 15:24:50 -0700
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: introduce = src_total_size to struct extent_info, we
  need it = for lseek(2) iteration.
* src/extent_scan.c: implement a new = extent_scan_read() through SEEK_DATA
  and SEEK_HOLE = if those stuff are supported.
* tests/cp/sparse-lseek: add a = new test for lseek(2) extent = copy.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
= ---
 src/extent-scan.c     |  119 = +++++++++++++++++++++++++++++++++++++++++++++++++
 src/exte= nt-scan.h     |    5 = ++
 tests/Makefile.am     |    1 = +
 tests/cp/sparse-lseek |   56 = +++++++++++++++++++++++
 4 files changed, 181 = insertions(+), 0 deletions(-)
 create mode 100755 = tests/cp/sparse-lseek

diff --git = a/src/extent-scan.c b/src/extent-scan.c
index da7eb9d..a54eca0 = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -17,7 +17,9 @@
   =  Written by Jie Liu (jeff.liu@oracle.com). =  */
 
 #include = <config.h>
+#include = <fcntl.h>
 #include = <sys/types.h>
+#include = <sys/stat.h>
 #include = <sys/ioctl.h>
 #include = <sys/utsname.h>
 #include = <assert.h>
@@ -71,6 +73,9 @@ extent_scan_init (int = src_fd, struct extent_scan *scan)
   = scan->initial_scan_failed =3D false;
   = scan->hit_final_extent =3D false;
   = scan->fm_flags =3D extent_need_sync () ? FIEMAP_FLAG_SYNC : = 0;
+#if defined (SEEK_DATA) && defined = (SEEK_HOLE)
+  scan->src_total_size =3D = 0;
+#endif
 }
 
 #i= fdef __linux__
@@ -204,6 +209,120 @@ extent_scan_read (struct = extent_scan *scan)
 
   return = true;
 }
+#elif defined (SEEK_HOLE) && = defined (SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE.  If not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying file system does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, setting initial_scan_failed
+       =   to true to fall back to the standard copy in either case. =  */
+      hole_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_HOLE);
+      if = (hole_pos < 0)
+        {
+ =          if (errno =3D=3D EINVAL || errno =3D=3D = ENOTSUP)
+           =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+
+      /* Seek back to = position 0 first.  */
+      if (hole_pos = > 0)
+        {
+   =        if (lseek (scan->fd, (off_t) 0, SEEK_SET) = < 0)
+            return = false;
+        }
+
+ =      struct stat sb;
+      if = (fstat (scan->fd, &sb) < 0)
+       =  return false;
+
+      /* This = is definitely not a sparse file, we treat it as a big extent. =  */
+      if (hole_pos >=3D = sb.st_size)
+        {
+   =        scan->ei_count =3D 1;
+   =        scan->ext_info =3D xnmalloc = (scan->ei_count, sizeof (struct extent_info));
+   =        scan->ext_info[0].ext_logical =3D = 0;
+         =  scan->ext_info[0].ext_length =3D sb.st_size;
+   =        scan->hit_final_extent =3D = true;
+          return = true;
+        }
+     =  scan->src_total_size =3D sb.st_size;
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the
+     supplied = offset.  Set scan->hit_final_extent to true in
+ =     either case.  */
+  while = (scan->scan_start < scan->src_total_size && i < = count)
+    {
+     =  data_pos =3D lseek (scan->fd, scan->scan_start, = SEEK_DATA);
+      if (data_pos < = 0)
+        {
+     =      if (errno =3D=3D ENXIO)
+     =        {
+         =      scan->hit_final_extent =3D true;
+ =              break;
+ =            }
+     =      return false;
+       =  }
+
+      hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+      if = (hole_pos < 0)
+        {
+ =          if (errno =3D=3D ENXIO)
+ =            {
+     =          scan->hit_final_extent =3D = true;
+             =  hole_pos =3D scan->src_total_size;
+     =          if (data_pos < = hole_pos)
+               =  goto preserve_ext_info;
+         =      break;
+         =    }
+          return = false;
+       =  }
+
+preserve_ext_info:
+   =    ext_info[i].ext_logical =3D data_pos;
+   =    ext_info[i].ext_length =3D hole_pos - data_pos;
+ =      scan->scan_start =3D hole_pos;
+   =    ++i;
+    }
+
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return (lseek (scan->fd, = (off_t) 0, SEEK_SET) < 0) ? false : = true;
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 5b4ded5..4fc05c6 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -38,6 +38,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+# if defined (SEEK_DATA) && = defined (SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
diff --git a/tests/Makefile.am = b/tests/Makefile.am
index 685eb52..6c596b9 = 100644
--- a/tests/Makefile.am
+++ = b/tests/Makefile.am
@@ -28,6 +28,7 @@ root_tests =3D = \
   cp/cp-mv-enotsup-xattr = \
   cp/capability = \
   cp/sparse-fiemap = \
+  cp/sparse-lseek         =                     =   \
   dd/skip-seek-past-dev = \
   install/install-C-root = \
   ls/capability = \
diff --git a/tests/cp/sparse-lseek = b/tests/cp/sparse-lseek
new file mode 100755
index = 0000000..5b8f2c1
--- /dev/null
+++ = b/tests/cp/sparse-lseek
@@ -0,0 +1,56 = @@
+#!/bin/sh
+# Test cp --sparse=3Dalways through = lseek(SEEK_DATA/SEEK_HOLE) copy
+
+# Copyright (C) = 2010-2011 Free Software Foundation, Inc.
+
+# This = program is free software: you can redistribute it and/or = modify
+# it under the terms of the GNU General Public License = as published by
+# the Free Software Foundation, either = version 3 of the License, or
+# (at your option) any later = version.
+
+# This program is distributed in the = hope that it will be useful,
+# but WITHOUT ANY WARRANTY; = without even the implied warranty of
+# MERCHANTABILITY or = FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General = Public License for more details.
+
+# You should = have received a copy of the GNU General Public License
+# = along with this program.  If not, see <http://www.gnu.org/licenses/>= .
+
+. "${srcdir=3D.}/init.sh"; path_prepend_ = ../src
+print_ver_ cp
+$PERL -e 1 || skip_test_ 'you = lack = perl'
+
+zfsdisk=3DdiskX
+zfspool=3Dseektest=
+
+require_root_
+
+cwd=3D$PWD
+cleanup_() { zpool destroy $zfspool; = }
+
+skip=3D0
+mkfile 128m "$cwd/$zfsdisk" = || skip=3D1
+
+# Check if the seektest pool is = already exists
+zpool list $zfspool 2>/dev/null = &&
+  skip_test_ "$zfspool already = exists"
+
+# Create pool and verify if it is mounted = automatically
+zpool create $zfspool "$cwd/$zfsdisk" || = skip=3D1
+zpool list $zfspool >/dev/null || = skip=3D1
+
+test $skip =3D 1 && skip_test_ = "insufficient ZFS support"
+
+for i in $(seq 1 2 = 21); do
+  for j in 1 2 31 100; do
+   =  $PERL -e 'BEGIN { $n =3D '$i' * 1024; *F =3D *STDOUT }' = \
+          -e 'for (1..'$j') { = sysseek (*F, $n, 1)' \
+          -e = '&& syswrite (*F, chr($_)x$n) or die "$!"}' > /$zfspool/j1 || = fail=3D1
+
+    cp --sparse=3Dalways = /$zfspool/j1 /$zfspool/j2 || fail=3D1
+    cmp = /$zfspool/j1 /$zfspool/j2 || fail=3D1
+    test = $fail =3D 1 && break 2
+ =  done
+done
+
+Exit = $fail
-- 
1.7.4


Any comments are = appreciated!

Thanks,
-Jeff
<= div>

Hello = All,

This is the first try to introduce the = SEEK_DATA/SEEK_HOLE support to extent_scan module for efficient sparse = file copy on ZFS,  I have delayed it for a long time, sorry for = that!

Below is the code change = lists:
src/extent_scan.h:  add a new structure item = 'src_total_size' to "struct extent_info",  since I have to make use = of this value to determine
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4




= --Apple-Mail-1-625867617-- From unknown Sat Jul 12 02:58:11 2025 X-Loop: help-debbugs@gnu.org Subject: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Tue, 19 Apr 2011 09:05:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 8061 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jeff liu Cc: =?UTF-8?Q?P=C3=A1draig?= Brady , bug-coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.130320384317881 (code B ref -1); Tue, 19 Apr 2011 09:05:02 +0000 Received: (at submit) by debbugs.gnu.org; 19 Apr 2011 09:04:03 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6qo-0004eM-RU for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:04:02 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6qn-0004du-0l for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:04:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6qg-0006t0-Qk for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:03:55 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:41876) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qg-0006sw-P5 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:03:54 -0400 Received: from eggs.gnu.org ([140.186.70.92]:35769) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qc-0000cA-CL for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6qX-0006rM-Mt for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:50 -0400 Received: from mx.meyering.net ([82.230.74.64]:40748) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qX-0006rA-Cm for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:45 -0400 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id B3F9C60113; Tue, 19 Apr 2011 11:03:44 +0200 (CEST) From: Jim Meyering In-Reply-To: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> (Jeff liu's message of "Tue, 19 Apr 2011 16:51:38 +0800") References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> Date: Tue, 19 Apr 2011 11:03:44 +0200 Message-ID: <877haq1ukv.fsf@rho.meyering.net> Lines: 11 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -5.9 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.9 (-----) Jeff liu wrote: ... > Below is the revised patch, > > From 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 > From: Jie Liu > Date: Tue, 19 Apr 2011 15:24:50 -0700 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module Thank you for the update. I will look at it in a week or so if no one else does. From unknown Sat Jul 12 02:58:11 2025 X-Loop: help-debbugs@gnu.org Subject: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Resent-From: Jeff Liu Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 26 Aug 2011 09:47:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 8061 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 8061@debbugs.gnu.org Cc: zfs-discuss@opensolaris.org, chris.mason@oracle.com, linux-btrfs@vger.kernel.org X-Debbugs-Original-To: bug-coreutils@gnu.org Reply-To: jeff.liu@oracle.com Received: via spool by submit@debbugs.gnu.org id=B.13143519994658 (code B ref -1); Fri, 26 Aug 2011 09:47:01 +0000 Received: (at submit) by debbugs.gnu.org; 26 Aug 2011 09:46:39 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qwszm-0001D4-0v for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:46:39 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qwszh-0001Cv-Lx for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:46:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Qwsx1-0005mZ-I9 for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:43:49 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:47007) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwsx1-0005mV-Fo for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:43:47 -0400 Received: from eggs.gnu.org ([140.186.70.92]:37994) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwswz-0005Cg-0o for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Qwsww-0005le-GX for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:44 -0400 Received: from acsinet15.oracle.com ([141.146.126.227]:53320) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwsww-0005lG-3i for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:42 -0400 Received: from rtcsinet21.oracle.com (rtcsinet21.oracle.com [66.248.204.29]) by acsinet15.oracle.com (Switch-3.4.4/Switch-3.4.4) with ESMTP id p7Q9haUr007904 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Aug 2011 09:43:38 GMT Received: from acsmt357.oracle.com (acsmt357.oracle.com [141.146.40.157]) by rtcsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id p7Q9hZvP012634 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 26 Aug 2011 09:43:35 GMT Received: from abhmt108.oracle.com (abhmt108.oracle.com [141.146.116.60]) by acsmt357.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p7Q9hTgu010063; Fri, 26 Aug 2011 04:43:29 -0500 Received: from [192.168.1.102] (/123.119.98.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 26 Aug 2011 02:43:28 -0700 Message-ID: <4E576ABC.4080705@oracle.com> Date: Fri, 26 Aug 2011 17:43:24 +0800 From: Jeff Liu Organization: Oracle User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11 MIME-Version: 1.0 References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> In-Reply-To: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: rtcsinet21.oracle.com [66.248.204.29] X-CT-RefId: str=0001.0A090207.4E576ACA.020F,ss=1,re=-2.300,fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 1) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) Dear All, As the SEEK_HOLE/SEEK_DATA has been implemented on Btrfs in 3.1.0+ and Glibc, I have worked out a new version for your guys review. Changes: ====== extent_scan.[c|h]: 1. add a function pointer to "struct extent_scan": /* Scan method. */ bool (*extent_scan) (struct extent_scan *scan); 2. add a structure item to indicate seek back issue maybe occurred: /* Failed to seek back to position 0 or not. */ bool seek_back_failed; If the file system support SEEK_HOLE, the file offset will pointed to somewhere > 0, so need to seek back to the beginning after support_seek_hole() checking for the proceeding extent scan. 3. rename extent_scan to fiemap_extent_scan. 4. add a new seek_extent_scan method. 5. add a new method to check SEEK stuff is supported or not. if the underlaying file system support SEEK_HOLE, assign seek_extent_scan to scan->extent_scan, or else, fiemap_extent_scan() will be assigned to it. copy.c: 1. pass src_total_size to extent_scan_init (). 2. for the first round extent scan, we need to seek back to position 0 too, if the data extent is started at the beginning of source file. Tested: ====== 1. make syntax-check. 2. verify a copied sparse file with 4697 extents on btrfs jeff@pibroch:~/gnu/coreutils$ python -c "f=open('/btrfs/sparse_test', 'w'); [(f.seek(x) or f.write(str(x))) for x in range(1, 1000000000, 99999)]; f.close()" jeff@pibroch:~/gnu/coreutils$ ./src/cp --sparse=always /btrfs/sparse_test /btrfs/sp.seek jeff@pibroch:~/gnu/coreutils$ cmp /btrfs/sparse_test /btrfs/sp.seek jeff@pibroch:~/gnu/coreutils$ echo $? 0 Also, the previous patch was developed on Solaris ZFS, but my test env was lost now. :( so anyone can help testing it on ZFS would be appreciated!! From 5892744f977a06b5557042682c39fd007eec8030 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Fri, 26 Aug 2011 17:11:33 +0800 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module * src/extent_scan.h: introduce src_total_size to struct extent_info, we need it for lseek(2) iteration, add seek_back_failed to indicate that the seek back to position 0 failed in seek captical check or not, and it can be used for further debugging IMHO. add bool (*extent_scan) (struct extent_scan *scan) to switch the scan method. * src/extent_scan.c: implement a new seek_scan_read() through SEEK_DATA and SEEK_HOLE. * src/copy.c: a few code changes according to the new extent call interface. Signed-off-by: Jie Liu --- src/copy.c | 26 +++++++++- src/extent-scan.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++-- src/extent-scan.h | 16 +++++- 3 files changed, 183 insertions(+), 8 deletions(-) diff --git a/src/copy.c b/src/copy.c index bc4d7bd..c5e8714 100644 --- a/src/copy.c +++ b/src/copy.c @@ -309,7 +309,18 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos = 0; - extent_scan_init (src_fd, &scan); + bool init_ok = extent_scan_init (src_fd, src_total_size, &scan); + /* If the underlaying file system support SEEK_HOLE, but failed + to seek back to position 0 after the initial seek checking, + let extent copy failure in this case. */ + if (! init_ok) + { + if (scan.seek_back_failed) + error (0, errno, + _("%s: extent_scan_init () failed, cannot seek back to position 0"), + quote (src_name)); + return false; + } *require_normal_copy = false; bool wrote_hole_at_eof = true; @@ -356,6 +367,19 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, wrote_hole_at_eof = false; + /* For the first round scan, if the data extent start at the + beginning, and the current file pointer is not at position + 0, set it back first, otherwise, we'll read from undesired + file offset. */ + if (ext_start == 0 && lseek (src_fd, 0, SEEK_CUR) != 0) + { + if (lseek (src_fd, 0, SEEK_SET) < 0) + { + error (0, errno, _("cannot lseek %s"), quote (src_name)); + return false; + } + } + if (hole_size) { if (lseek (src_fd, ext_start, SEEK_SET) < 0) diff --git a/src/extent-scan.c b/src/extent-scan.c index 37445b8..c835b63 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -27,6 +27,12 @@ #include "fiemap.h" #include "xstrtol.h" +#ifndef SEEK_DATA +# define SEEK_DATA 3 /* Seek to next data. */ +#endif +#ifndef SEEK_HOLE +# define SEEK_HOLE 4 /* Seek to next hole. */ +#endif /* Work around Linux kernel issues on BTRFS and EXT4 before 2.6.39. FIXME: remove in 2013, or whenever we're pretty confident @@ -65,10 +71,48 @@ extent_need_sync (void) #endif } +static bool +support_seek_hole (struct extent_scan *scan) +{ + off_t hole_pos; + +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE, if not, fall back to fiemap extent scan or + the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + return false; +# endif + + /* Inspired by STAR, If we have been compiled on an OS that + supports SEEK_HOLE but run on an OS that does not support + SEEK_HOLE, we get EINVAL. If the underlying file system + does not support the SEEK_HOLE call, we get ENOTSUP, fall + back to the fiemap scan or standard copy in either case. */ + hole_pos = lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno == EINVAL || errno == ENOTSUP) + return false; + } + + /* Seek back to position 0 first if we detected a real hole. */ + if (hole_pos > 0) + { + if (lseek (scan->fd, (off_t) 0, SEEK_SET) != 0) + { + scan->seek_back_failed = true; + return false; + } + } + + return true; +} + /* Allocate space for struct extent_scan, initialize the entries if necessary and return it as the input argument of extent_scan_read(). */ -extern void -extent_scan_init (int src_fd, struct extent_scan *scan) +extern bool +extent_scan_init (int src_fd, size_t src_total_size, struct extent_scan *scan) { scan->fd = src_fd; scan->ei_count = 0; @@ -76,17 +120,110 @@ extent_scan_init (int src_fd, struct extent_scan *scan) scan->scan_start = 0; scan->initial_scan_failed = false; scan->hit_final_extent = false; - scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; + scan->seek_back_failed = false; + + if (support_seek_hole (scan)) + { + scan->src_total_size = src_total_size; + scan->extent_scan = seek_extent_scan; + } + else + { + /* The underlying file system support SEEK_HOLE, but failed + to seek back to position 0 after seek checking, Oops! */ + if (scan->seek_back_failed) + return false; + + scan->extent_scan = fiemap_extent_scan; + scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; + } + + return true; +} + +extern inline bool +extent_scan_read (struct extent_scan *scan) +{ + return scan->extent_scan (scan); +} + +extern bool +seek_extent_scan (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info = &extent_buf.ei; + enum { count = (sizeof extent_buf / sizeof *ext_info) }; + verify (count != 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + unsigned int i = 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the + supplied offset. Set scan->hit_final_extent to true for + either case. */ + do { + data_pos = lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno == ENXIO) + { + scan->hit_final_extent = true; + return true; + } + return false; + } + + /* We hit the final extent if the data offset is equal to + the source file size. */ + if (data_pos == scan->src_total_size) + { + scan->hit_final_extent = true; + break; + } + + hole_pos = lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno != ENXIO) + return false; + else + { + scan->hit_final_extent = true; + return true; + } + } + + ext_info[i].ext_logical = data_pos; + ext_info[i].ext_length = hole_pos - data_pos; + scan->scan_start = hole_pos; + ++i; + } while (scan->scan_start < scan->src_total_size && i < count); + + scan->ei_count = i; + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); + + for (i = 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <= OFF_T_MAX); + + scan->ext_info[i].ext_logical = ext_info[i].ext_logical; + scan->ext_info[i].ext_length = ext_info[i].ext_length; + } + + return true; } -#ifdef __linux__ +#if defined __linux__ # ifndef FS_IOC_FIEMAP # define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap) # endif /* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to obtain a map of file extents excluding holes. */ extern bool -extent_scan_read (struct extent_scan *scan) +fiemap_extent_scan (struct extent_scan *scan) { unsigned int si = 0; struct extent_info *last_ei IF_LINT ( = scan->ext_info); @@ -212,7 +349,7 @@ extent_scan_read (struct extent_scan *scan) } #else extern bool -extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) +fiemap_extent_scan (struct extent_scan *scan ATTRIBUTE_UNUSED) { scan->initial_scan_failed = true; errno = ENOTSUP; diff --git a/src/extent-scan.h b/src/extent-scan.h index 5b4ded5..e751810 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -38,6 +38,9 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; + /* Next scan start offset. */ off_t scan_start; @@ -47,6 +50,9 @@ struct extent_scan /* How many extent info returned for a scan. */ uint32_t ei_count; + /* Failed to seek back to position 0 or not. */ + bool seek_back_failed; + /* If true, fall back to a normal copy, either set by the failure of ioctl(2) for FIEMAP or lseek(2) with SEEK_DATA. */ bool initial_scan_failed; @@ -54,14 +60,22 @@ struct extent_scan /* If true, the total extent scan per file has been finished. */ bool hit_final_extent; + /* Scan method. */ + bool (*extent_scan) (struct extent_scan *scan); + /* Extent information: a malloc'd array of ei_count structs. */ struct extent_info *ext_info; }; -void extent_scan_init (int src_fd, struct extent_scan *scan); +bool extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan); bool extent_scan_read (struct extent_scan *scan); +bool fiemap_extent_scan (struct extent_scan *scan); + +bool seek_extent_scan (struct extent_scan *scan); + static inline void extent_scan_free (struct extent_scan *scan) { -- 1.7.4.1 Thanks, -Jeff On 04/19/2011 04:51 PM, Jeff liu wrote: >> Hi All, >> >> Please ignore the current patch, I will submit another patch with a few fixes soon. > Now the new patch set coming, > > In previous post, I have tried to change the extent_scan_init() interface by adding a new argument to indicate the source file size, > this will reduce the overhead of call fstat(2) in extent_scan_read(), since the file size is definitely needed for SEEK* stuff, however, the file size is redundant for FIEMAP. > so I changed my idea to keep extent_scan_init() as before, instead, to retrieve the file size in extent_scan_read() when launching the first scan, one benefit is, there is nothing need to > be modified in extent_copy() for this patch. > > Tests: > ==== > A new test sparse-lseek was introduced in this post, it make use of the sparse file generation function in Perl, and do `cmp` against the target copied file. > I have also took a look at the `sdb` utility shipped with ZFS, but did not found any interesting stuff can be used for this test. > > Test run passed on my environment as below, > > bash-3.00# make check TESTS=cp/sparse-lseek VERBOSE=yes > make check-TESTS > make[1]: Entering directory `/coreutils/tests' > make[2]: Entering directory `/coreutils/tests' > PASS: cp/sparse-lseek > ============= > 1 test passed > ============= > make[2]: Leaving directory `/coreutils/tests' > make[1]: Leaving directory `/coreutils/tests' > GEN vc_exe_in_TESTS > No differences encountered > > Manual tests: > =========== > 1. Ensure trailing blanks, test 0 size sparse file, non-sparse file, sparse file with hole start and hole end. > 2. make syntax-check failed, I have no idea of this issue at the moment, I also tried to run make distcheck, looks the package building, install and uninstall procedures all passed, > but it also failed at the final stage, am I missing something here? > > The logs which were shown as following, > bash-3.00# make syntax-check > GFDL_version > awk: syntax error near line 1 > awk: bailing out near line 1 > make: *** [sc_GFDL_version.z] Error 2 > > make distcheck: > ============== > ...... > make[1]: Entering directory `/coreutils' > GEN check-ls-dircolors > make my-distcheck > make[2]: Entering directory `/coreutils' > make syntax-check > make[3]: Entering directory `/coreutils' > GFDL_version > awk: syntax error near line 1 > awk: bailing out near line 1 > make[3]: *** [sc_GFDL_version.z] Error 2 > make[3]: Leaving directory `/coreutils' > make[2]: *** [my-distcheck] Error 2 > make[2]: Leaving directory `/coreutils' > make[1]: *** [distcheck-hook] Error 2 > make[1]: Leaving directory `/coreutils' > make: *** [distcheck] Error 1 > > > > Below is the revised patch, > > From 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 > From: Jie Liu > Date: Tue, 19 Apr 2011 15:24:50 -0700 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module > > * src/extent_scan.h: introduce src_total_size to struct extent_info, we > need it for lseek(2) iteration. > * src/extent_scan.c: implement a new extent_scan_read() through SEEK_DATA > and SEEK_HOLE if those stuff are supported. > * tests/cp/sparse-lseek: add a new test for lseek(2) extent copy. > > Signed-off-by: Jie Liu > --- > src/extent-scan.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++ > src/extent-scan.h | 5 ++ > tests/Makefile.am | 1 + > tests/cp/sparse-lseek | 56 +++++++++++++++++++++++ > 4 files changed, 181 insertions(+), 0 deletions(-) > create mode 100755 tests/cp/sparse-lseek > > diff --git a/src/extent-scan.c b/src/extent-scan.c > index da7eb9d..a54eca0 100644 > --- a/src/extent-scan.c > +++ b/src/extent-scan.c > @@ -17,7 +17,9 @@ > Written by Jie Liu (jeff.liu@oracle.com). */ > > #include > +#include > #include > +#include > #include > #include > #include > @@ -71,6 +73,9 @@ extent_scan_init (int src_fd, struct extent_scan *scan) > scan->initial_scan_failed = false; > scan->hit_final_extent = false; > scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; > +#if defined (SEEK_DATA)&& defined (SEEK_HOLE) > + scan->src_total_size = 0; > +#endif > } > > #ifdef __linux__ > @@ -204,6 +209,120 @@ extent_scan_read (struct extent_scan *scan) > > return true; > } > +#elif defined (SEEK_HOLE)&& defined (SEEK_DATA) > +extern bool > +extent_scan_read (struct extent_scan *scan) > +{ > + off_t data_pos, hole_pos; > + union { struct extent_info ei; char c[4096]; } extent_buf; > + struct extent_info *ext_info =&extent_buf.ei; > + enum { count = (sizeof extent_buf / sizeof *ext_info) }; > + verify (count != 0); > + > + memset (&extent_buf, 0, sizeof extent_buf); > + > + if (scan->scan_start == 0) > + { > +# ifdef _PC_MIN_HOLE_SIZE > + /* To determine if the underlaying file system support > + SEEK_HOLE. If not, fall back to the standard copy. */ > + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE)< 0) > + { > + scan->initial_scan_failed = true; > + return false; > + } > +# endif > + > + /* If we have been compiled on an OS that supports SEEK_HOLE > + but run on an OS that does not support SEEK_HOLE, we get > + EINVAL. If the underlying file system does not support the > + SEEK_HOLE call, we get ENOTSUP, setting initial_scan_failed > + to true to fall back to the standard copy in either case. */ > + hole_pos = lseek (scan->fd, (off_t) 0, SEEK_HOLE); > + if (hole_pos< 0) > + { > + if (errno == EINVAL || errno == ENOTSUP) > + scan->initial_scan_failed = true; > + return false; > + } > + > + /* Seek back to position 0 first. */ > + if (hole_pos> 0) > + { > + if (lseek (scan->fd, (off_t) 0, SEEK_SET)< 0) > + return false; > + } > + > + struct stat sb; > + if (fstat (scan->fd,&sb)< 0) > + return false; > + > + /* This is definitely not a sparse file, we treat it as a big extent. */ > + if (hole_pos>= sb.st_size) > + { > + scan->ei_count = 1; > + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); > + scan->ext_info[0].ext_logical = 0; > + scan->ext_info[0].ext_length = sb.st_size; > + scan->hit_final_extent = true; > + return true; > + } > + scan->src_total_size = sb.st_size; > + } > + > + unsigned int i = 0; > + /* If lseek(2) failed and the errno is set to ENXIO, for > + SEEK_DATA there are no more data regions past the supplied > + offset. For SEEK_HOLE, there are no more holes past the > + supplied offset. Set scan->hit_final_extent to true in > + either case. */ > + while (scan->scan_start< scan->src_total_size&& i< count) > + { > + data_pos = lseek (scan->fd, scan->scan_start, SEEK_DATA); > + if (data_pos< 0) > + { > + if (errno == ENXIO) > + { > + scan->hit_final_extent = true; > + break; > + } > + return false; > + } > + > + hole_pos = lseek (scan->fd, data_pos, SEEK_HOLE); > + if (hole_pos< 0) > + { > + if (errno == ENXIO) > + { > + scan->hit_final_extent = true; > + hole_pos = scan->src_total_size; > + if (data_pos< hole_pos) > + goto preserve_ext_info; > + break; > + } > + return false; > + } > + > +preserve_ext_info: > + ext_info[i].ext_logical = data_pos; > + ext_info[i].ext_length = hole_pos - data_pos; > + scan->scan_start = hole_pos; > + ++i; > + } > + > + scan->ei_count = i; > + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); > + > + for (i = 0; i< scan->ei_count; i++) > + { > + assert (ext_info[i].ext_logical<= OFF_T_MAX); > + > + scan->ext_info[i].ext_logical = ext_info[i].ext_logical; > + scan->ext_info[i].ext_length = ext_info[i].ext_length; > + } > + > + return (lseek (scan->fd, (off_t) 0, SEEK_SET)< 0) ? false : true; > +} > #else > extern bool > extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) > diff --git a/src/extent-scan.h b/src/extent-scan.h > index 5b4ded5..4fc05c6 100644 > --- a/src/extent-scan.h > +++ b/src/extent-scan.h > @@ -38,6 +38,11 @@ struct extent_scan > /* File descriptor of extent scan run against. */ > int fd; > > +# if defined (SEEK_DATA)&& defined (SEEK_HOLE) > + /* Source file size, i.e, (struct stat)&statbuf.st_size. */ > + size_t src_total_size; > +#endif > + > /* Next scan start offset. */ > off_t scan_start; > > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 685eb52..6c596b9 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -28,6 +28,7 @@ root_tests = \ > cp/cp-mv-enotsup-xattr \ > cp/capability \ > cp/sparse-fiemap \ > + cp/sparse-lseek \ > dd/skip-seek-past-dev \ > install/install-C-root \ > ls/capability \ > diff --git a/tests/cp/sparse-lseek b/tests/cp/sparse-lseek > new file mode 100755 > index 0000000..5b8f2c1 > --- /dev/null > +++ b/tests/cp/sparse-lseek > @@ -0,0 +1,56 @@ > +#!/bin/sh > +# Test cp --sparse=always through lseek(SEEK_DATA/SEEK_HOLE) copy > + > +# Copyright (C) 2010-2011 Free Software Foundation, Inc. > + > +# This program is free software: you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation, either version 3 of the License, or > +# (at your option) any later version. > + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see. > + > +. "${srcdir=.}/init.sh"; path_prepend_ ../src > +print_ver_ cp > +$PERL -e 1 || skip_test_ 'you lack perl' > + > +zfsdisk=diskX > +zfspool=seektest > + > +require_root_ > + > +cwd=$PWD > +cleanup_() { zpool destroy $zfspool; } > + > +skip=0 > +mkfile 128m "$cwd/$zfsdisk" || skip=1 > + > +# Check if the seektest pool is already exists > +zpool list $zfspool 2>/dev/null&& > + skip_test_ "$zfspool already exists" > + > +# Create pool and verify if it is mounted automatically > +zpool create $zfspool "$cwd/$zfsdisk" || skip=1 > +zpool list $zfspool>/dev/null || skip=1 > + > +test $skip = 1&& skip_test_ "insufficient ZFS support" > + > +for i in $(seq 1 2 21); do > + for j in 1 2 31 100; do > + $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \ > + -e 'for (1..'$j') { sysseek (*F, $n, 1)' \ > + -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'> /$zfspool/j1 || fail=1 > + > + cp --sparse=always /$zfspool/j1 /$zfspool/j2 || fail=1 > + cmp /$zfspool/j1 /$zfspool/j2 || fail=1 > + test $fail = 1&& break 2 > + done > +done > + > +Exit $fail From unknown Sat Jul 12 02:58:11 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Jeff liu Subject: bug#8061: closed (Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module) Message-ID: References: <3bc2236c-4e41-7f55-5a63-f4221d38041b@cs.ucla.edu> <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> X-Gnu-PR-Message: they-closed 8061 X-Gnu-PR-Package: coreutils Reply-To: 8061@debbugs.gnu.org Date: Fri, 26 Jun 2020 02:16:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1593137762-14034-1" This is a multi-part message in MIME format... ------------=_1593137762-14034-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module which was filed against the coreutils package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 8061@debbugs.gnu.org. --=20 8061: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D8061 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1593137762-14034-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 8061-done) by debbugs.gnu.org; 26 Jun 2020 02:15:35 +0000 Received: from localhost ([127.0.0.1]:41631 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jodtu-0003dg-77 for submit@debbugs.gnu.org; Thu, 25 Jun 2020 22:15:35 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:57692) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jodtq-0003dQ-P8 for 8061-done@debbugs.gnu.org; Thu, 25 Jun 2020 22:15:32 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6D0711600D1; Thu, 25 Jun 2020 19:15:24 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Hfq_u2YJ3r_2; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C3CB61600E0; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 0K2r3MmKJcyD; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 75C441600D1; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) To: Jeff liu From: Paul Eggert Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDUGdRVEFRSUFL QVVDVElCeQpaQUliQXdVSkVzd0RBQVlMQ1FnSEF3SUdGUWdDQ1FvTEJCWUNBd0VDSGdFQ0Y0 QUFDZ2tRN1pmcERtS3FmalJSCkd3LytJajAzZGhZZllsL2dYVlJpdXpWMWdHcmJIayt0bmZy SS9DN2ZBZW9GelE1dFZnVmluU2hhUGtabzBIVFAKZjE4eDZJREVkQWlPOE1xbzF5cDBDdEht ekdNQ0o1MG80R3JnZmpscjZnLyt2dEVPS2JobGVzek4yWHBKdnB3TQoyUWdHdm4vbGFUTFV1 OFBIOWFSV1RzN3FKSlpLS0tBYjRzeFljOTJGZWhQdTZGT0QwZERpeWhsREFxNGxPVjJtCmRC cHpRYmlvam9aelFMTVF3anBnQ1RLMjU3MmVLOUVPRVF5U1VUaFhyU0l6NkFTZW5wNE5ZVEZI czl0dUpRdlgKazlnWkRkUFNsM2JwKzQ3ZEd4bHhFV0xwQklNN3pJT053NGtzNGF6Z1Q4bnZE WnhBNUlaSHR2cUJsSkxCT2JZWQowTGU2MVdwMHkzVGxCRGgycWRLOGVZTDQyNlc0c2NFTVN1 aWc1Z2I4T0F0UWlCVzZrMnNHVXh4ZWl2OG92V3U4CllBWmdLSmZ1b1dJK3VSbk1FZGRydVk4 SnNvTTU0S2FLdlppa2tLczJiZzFuZHRMVnpIcEo2cUZaQzdRVmplSFUKaDYvQm1ndmRqV1Ba WUZUdE4rS0E5Q1dYM0dRS0tnTjN1dTk4OHl6bkQ3TG5COThUNEVVSDFIQS9HbmZCcU1WMQpn cHpUdlBjNHFWUWluQ21Ja0VGcDgzemwrRzVmQ2pKSjNXN2l2ekNuWW80S2hLTHBGVW05N29r VEtSMkxXM3haCnpFVzRjTFNXTzM4N01USzNDekRPeDVxZTZzNGE5MVp1Wk0vai9UUWRUTERh cU5uODNrQTRIcTQ4VUhYWXhjSWgKK05kOGsvM3c2bEZ1b0swd3JPRml5d2pMeCswdXI1am1t YmVjQkdIYzF4ZGhBRkc1QWcwRVRJQnlaQUVRQUthRgo2NzhUOXd5SDR3alRyVjFQejNjREVv U25WLzBaVXJPVDM3cDFkY0d5ai9JWHExeDY3MEhSVmFoQW1rMHNacFljCjI1UEY5RDVHUFlI RldsTmp1UFU5NnJEbmRYQjNoZWRtQlJoTGRDNGJBWGpJNERWK2JtZFZlK3EvSU1ubFpSYVYK bG05RWlNQ1ZBUjZ3MTNzUmV1N3FYa1c5cjNSd1kyQXpYc2twL3RBZTRCUktyMVptYnZpMm5i blE2ZXBFQzQycgpSYngwQjFFaGpiSVFaNUpIR2syNGlQVDdMZEJnbk5tb3M1d1lqendObGtN UUQ1VDBZZHpoazdKK1V4d0E1bTQ2Cm1PaFJEQzJyRlYvQTBnbTVUTHk4RFhqdi9Fc2M0Z1lu WWFpNlNRcW5VRVZoNUx1VjhZQ0pCbmlqcytUaXc3MXgKMWljbW42eEdJNDVFdWdKT2dlYyty THlwWWdwVnA0eDBISTVUODhxQlJZQ2t4SDNLZzhRbytFV05BOUE0TFJROQpEWDhuam9uYTBn ZjBzMDN0b2NLOGtCTjY2VW9xcVB0SEJuYzRlTWdCeW1DZmxLMTJlS2ZkMllZeG55ZzljWmF6 CldBNVZzbHZUeHBtNzZoYmc1b2lBRUgvVmcvOE14SHlBblBoZnJnd3lQcm1KRWNWQmFmZHNw Sm5ZUXhCWU5jbzIKTEZQSWhsT3ZXaDhyNGF0K3MrTTNMYjI2b1VUY3psZ2RXMVNmM1NEQTc3 Qk1SbkYwRlF5RSs3QXpWNzlNQk40eQpraXFhZXpReHRhRjFGeS90dmtoZmZTbzh1K2R3RzBF Z0poK3RlMzhnVGNJU1ZyMEdJUHBsTHo2WWhqcmJIclBSCkYxQ041VXVMOURCR2p4dU4zNVJM TlZFZnRhNlJVRmxSNk5jdFRqdnJBQkVCQUFHSkFpVUVHQUVDQUE4RkFreUEKY21RQ0d3d0ZD UkxNQXdBQUNna1E3WmZwRG1LcWZqU3JIQS8rS3pBS3ZUeFJoQTlNV05MeEl5SjdTNXVKMTZn cwpUM29DalpyQktHRWhLTU9HWDRPMEdBNlZPRXJ5TzdRUkNDWWFoM294U0czOElBbk5laXdK WGdVOUJ6a2s4NVVHCmJQRWQ3SEdGL1ZTZUhDUXdXb3U2anFVRFRTRHZuOVloTlRkRzBLWFBN NzRhQyt4cjJab3cxTzJtaFhpaGdXS0QKMER3KzBMWVBuVU9zUTBLT0Z4SFhYWUhtUnJTMU9a UFU1OUJMdmMrVFJoSWhhZlNIS0x3YlhLKzZja2t4Qng2aAo4ejVjY3BHMFFzNGJGaGRGWW5G ckVpZURMb0dtbkUyWUxoZFY2c3dKOVZOQ1M2cExpRW9oVDNmbTdhWG0xNXRaCk9JeXpNWmhI UlNBUGJsWHhRMFpTV2pxOG9ScmNZTkZ4YzRXMVVScEFrQkNPWUpvWHZRZkQ1TDNscUFsOFRD cUQKVXpZeGhIL3RKaGJEZEhycUhINzY3amFEYVRCMStUYWxwLzJBTUt3Y1hOT2Rpa2xHeGJt SFZHNllHbDZnOExyYgpzdTlOWkVJNHlMbEh6dWlrdGhKV2d6KzN2WmhWR3lObHQrSE5Jb0Y2 Q2pETDJvbXU1Y0VxNFJESE00NFFxUGs2Cmw3TzBwVXZOMW1UNEIrUzFiMDhSS3BxbS9mZjAx NUUzN0hOVi9waUl2Smx4R0FZejhQU2Z1R0NCMXRoTVlxbG0KZ2RoZDkvQmFiR0ZiR0dZSEE2 VTQvVDV6cVUrZjZ4SHkxU3NBUVoxTVNLbEx3ZWtCSVQrNC9jTFJHcUNIam5WMApxNUgvVDZh N3Q1bVBrYnpTck9MU280cHVqK0lUb05qWXlZSURCV3pobEExOWF2T2ErcnZVam1IdEQzc0ZO N2NYCld0a0dvaThidU5jYnk0VT0KPUFMNm8KLS0tLS1FTkQgUEdQIFBVQkxJQyBLRVkgQkxP Q0stLS0tLQo= Organization: UCLA Computer Science Department Message-ID: <3bc2236c-4e41-7f55-5a63-f4221d38041b@cs.ucla.edu> Date: Thu, 25 Jun 2020 19:15:21 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------C013573179FBFEFA6D62E7D4" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 8061-done Cc: =?UTF-8?Q?P=c3=a1draig_Brady?= , Jim Meyering , 8061-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------C013573179FBFEFA6D62E7D4 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit This email is follow up to dated 2011-05-01. Jeff, thanks for reporting the problem. (There's a good chance this email will bounce but I'll send it to your 2011 email address anyway.) I recently ran into the same issue and derived the attached patches independently. I then found your bug report, made sure the attached patches fixed every problem that your proposal did, and installed the attached patches into Savannah. The attached patches 1-3 merely fix typos and refactor. Patch 4 corresponds to your proposal; however, it differs in that its basic idea is to use the FIEMAP code only as a fallback if SEEK_DATA doesn't work, rather than try to add to the already-too-complicated code that fiddles with FIEMAPs. (I don't observe any significant performance advantage to the FIEMAP stuff, but maybe that's just me.) Patch 5 adds opportunistic use of the copy_file_range syscall introduced in Linux kernel 4.5 (2016) and reworked in 5.3 (2019). This should improve 'cp' performance on kernels and file systems that support copy_file_range. --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0001-maint-typo-fix.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-maint-typo-fix.patch" >From 4fe5259ab6c9e459a6db5938d143a9c65be113d9 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 18:10:49 -0700 Subject: [PATCH 1/5] maint: typo fix * NEWS: Fix typo. --- NEWS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS b/NEWS index d36259641..d713fa724 100644 --- a/NEWS +++ b/NEWS @@ -17,7 +17,7 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. - On GNU/Linux systems, ls no longer issues an error message on + On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change that was made in release 8.32. -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0002-cp-refactor-extent_copy.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-cp-refactor-extent_copy.patch" >From 51981008f9892d44231c432535deac4f9b3cbe5e Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 23 Jun 2020 19:18:04 -0700 Subject: [PATCH 2/5] cp: refactor extent_copy * src/copy.c (extent_copy): New arg SCAN, replacing REQUIRE_NORMAL_COPY. All callers changed. (enum scantype): New type. (infer_scantype): Rename from is_probably_sparse and return the new type. Add args FD and SCAN. All callers changed. --- src/copy.c | 119 +++++++++++++++++++++++++---------------------------- 1 file changed, 55 insertions(+), 64 deletions(-) diff --git a/src/copy.c b/src/copy.c index 54601ce07..f694f913f 100644 --- a/src/copy.c +++ b/src/copy.c @@ -422,9 +422,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, size_t hole_size, off_t src_total_size, enum Sparse_type sparse_mode, char const *src_name, char const *dst_name, - bool *require_normal_copy) + struct extent_scan *scan) { - struct extent_scan scan; off_t last_ext_start = 0; off_t last_ext_len = 0; @@ -432,45 +431,25 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos = 0; - extent_scan_init (src_fd, &scan); - - *require_normal_copy = false; bool wrote_hole_at_eof = true; - do + while (true) { - bool ok = extent_scan_read (&scan); - if (! ok) - { - if (scan.hit_final_extent) - break; - - if (scan.initial_scan_failed) - { - *require_normal_copy = true; - return false; - } - - error (0, errno, _("%s: failed to get extents info"), - quotef (src_name)); - return false; - } - bool empty_extent = false; - for (unsigned int i = 0; i < scan.ei_count || empty_extent; i++) + for (unsigned int i = 0; i < scan->ei_count || empty_extent; i++) { off_t ext_start; off_t ext_len; off_t ext_hole_size; - if (i < scan.ei_count) + if (i < scan->ei_count) { - ext_start = scan.ext_info[i].ext_logical; - ext_len = scan.ext_info[i].ext_length; + ext_start = scan->ext_info[i].ext_logical; + ext_len = scan->ext_info[i].ext_length; } else /* empty extent at EOF. */ { i--; - ext_start = last_ext_start + scan.ext_info[i].ext_length; + ext_start = last_ext_start + scan->ext_info[i].ext_length; ext_len = 0; } @@ -498,7 +477,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); fail: - extent_scan_free (&scan); + extent_scan_free (scan); return false; } @@ -539,7 +518,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, /* For now, do not treat FIEMAP_EXTENT_UNWRITTEN specially, because that (in combination with no sync) would lead to data loss at least on XFS and ext4 when using 2.6.39-rc3 kernels. */ - if (0 && (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)) + if (0 && (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)) { empty_extent = true; last_ext_len = 0; @@ -571,16 +550,23 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, extents beyond the apparent size. */ if (dest_pos == src_total_size) { - scan.hit_final_extent = true; + scan->hit_final_extent = true; break; } } /* Release the space allocated to scan->ext_info. */ - extent_scan_free (&scan); + extent_scan_free (scan); + if (scan->hit_final_extent) + break; + if (! extent_scan_read (scan) && ! scan->hit_final_extent) + { + error (0, errno, _("%s: failed to get extents info"), + quotef (src_name)); + return false; + } } - while (! scan.hit_final_extent); /* When the source file ends with a hole, we have to do a little more work, since the above copied only up to and including the final extent. @@ -1021,16 +1007,35 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode) # define HAVE_STRUCT_STAT_ST_BLOCKS 0 #endif +/* Type of scan being done on the input when looking for sparseness. */ +enum scantype + { + /* No fancy scanning; just read and write. */ + PLAIN_SCANTYPE, + + /* Read and examine data looking for zero blocks; useful when + attempting to create sparse output. */ + ZERO_SCANTYPE, + + /* Extent information is available. */ + EXTENT_SCANTYPE + }; + /* Use a heuristic to determine whether stat buffer SB comes from a file with sparse blocks. If the file has fewer blocks than would normally be needed for a file of its size, then at least one of the blocks in the file is a hole. In that case, return true. */ -static bool -is_probably_sparse (struct stat const *sb) +static enum scantype +infer_scantype (int fd, struct stat const *sb, struct extent_scan *scan) { - return (HAVE_STRUCT_STAT_ST_BLOCKS - && S_ISREG (sb->st_mode) - && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE); + if (! (HAVE_STRUCT_STAT_ST_BLOCKS + && S_ISREG (sb->st_mode) + && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE)) + return PLAIN_SCANTYPE; + + extent_scan_init (fd, scan); + extent_scan_read (scan); + return scan->initial_scan_failed ? ZERO_SCANTYPE : EXTENT_SCANTYPE; } @@ -1061,6 +1066,7 @@ copy_reg (char const *src_name, char const *dst_name, mode_t src_mode = src_sb->st_mode; struct stat sb; struct stat src_open_sb; + struct extent_scan scan; bool return_val = true; bool data_copy_required = x->data_copy_required; @@ -1260,23 +1266,13 @@ copy_reg (char const *src_name, char const *dst_name, fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); /* Deal with sparse files. */ - bool make_holes = false; - bool sparse_src = is_probably_sparse (&src_open_sb); - - if (S_ISREG (sb.st_mode)) - { - /* Even with --sparse=always, try to create holes only - if the destination is a regular file. */ - if (x->sparse_mode == SPARSE_ALWAYS) - make_holes = true; - - /* Use a heuristic to determine whether SRC_NAME contains any sparse - blocks. If the file has fewer blocks than would normally be - needed for a file of its size, then at least one of the blocks in - the file is a hole. */ - if (x->sparse_mode == SPARSE_AUTO && sparse_src) - make_holes = true; - } + enum scantype scantype = infer_scantype (source_desc, &src_open_sb, + &scan); + bool make_holes + = (S_ISREG (sb.st_mode) + && (x->sparse_mode == SPARSE_ALWAYS + || (x->sparse_mode == SPARSE_AUTO + && scantype != PLAIN_SCANTYPE))); /* If not making a sparse file, try to use a more-efficient buffer size. */ @@ -1305,10 +1301,8 @@ copy_reg (char const *src_name, char const *dst_name, buf_alloc = xmalloc (buf_size + buf_alignment); buf = ptr_align (buf_alloc, buf_alignment); - if (sparse_src) + if (scantype == EXTENT_SCANTYPE) { - bool normal_copy_required; - /* Perform an efficient extent-based copy, falling back to the standard copy only if the initial extent scan fails. If the '--sparse=never' option is specified, write all data but use @@ -1316,14 +1310,11 @@ copy_reg (char const *src_name, char const *dst_name, if (extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, src_open_sb.st_size, make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &normal_copy_required)) + src_name, dst_name, &scan)) goto preserve_metadata; - if (! normal_copy_required) - { - return_val = false; - goto close_src_and_dst_desc; - } + return_val = false; + goto close_src_and_dst_desc; } off_t n_read; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0003-cp-avoid-copy_reg-goto.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0003-cp-avoid-copy_reg-goto.patch" >From 35a55effa81c5f4f8fe7414e8aede0acccee470f Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 24 Jun 2020 17:05:20 -0700 Subject: [PATCH 3/5] cp: avoid copy_reg goto * src/copy.c (copy_reg): Redo to avoid label and goto. --- src/copy.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) diff --git a/src/copy.c b/src/copy.c index f694f913f..b382cfa4d 100644 --- a/src/copy.c +++ b/src/copy.c @@ -1301,29 +1301,18 @@ copy_reg (char const *src_name, char const *dst_name, buf_alloc = xmalloc (buf_size + buf_alignment); buf = ptr_align (buf_alloc, buf_alignment); - if (scantype == EXTENT_SCANTYPE) - { - /* Perform an efficient extent-based copy, falling back to the - standard copy only if the initial extent scan fails. If the - '--sparse=never' option is specified, write all data but use - any extents to read more efficiently. */ - if (extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, - src_open_sb.st_size, - make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &scan)) - goto preserve_metadata; - - return_val = false; - goto close_src_and_dst_desc; - } - off_t n_read; - bool wrote_hole_at_eof; - if (! sparse_copy (source_desc, dest_desc, buf, buf_size, - make_holes ? hole_size : 0, - x->sparse_mode == SPARSE_ALWAYS, src_name, dst_name, - UINTMAX_MAX, &n_read, - &wrote_hole_at_eof)) + bool wrote_hole_at_eof = false; + if (! (scantype == EXTENT_SCANTYPE + ? extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, + src_open_sb.st_size, + make_holes ? x->sparse_mode : SPARSE_NEVER, + src_name, dst_name, &scan) + : sparse_copy (source_desc, dest_desc, buf, buf_size, + make_holes ? hole_size : 0, + x->sparse_mode == SPARSE_ALWAYS, + src_name, dst_name, UINTMAX_MAX, &n_read, + &wrote_hole_at_eof))) { return_val = false; goto close_src_and_dst_desc; @@ -1336,7 +1325,6 @@ copy_reg (char const *src_name, char const *dst_name, } } -preserve_metadata: if (x->preserve_timestamps) { struct timespec timespec[2]; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0004-cp-use-SEEK_DATA-SEEK_HOLE-if-available.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0004-cp-use-SEEK_DATA-SEEK_HOLE-if-available.patch" >From 6fc41029fff6e955a87312d8f5f967f0d01e390c Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 16:31:44 -0700 Subject: [PATCH 4/5] cp: use SEEK_DATA/SEEK_HOLE if available If it works, prefer lseek with SEEK_DATA and SEEK_HOLE to FIEMAP, as lseek is simpler and more portable (will be in next POSIX). Problem reported in 2011 by Jeff Liu (Bug#8061). * NEWS: Mention this. * src/copy.c (lseek_copy) [SEEK_HOLE]: New function. (enum scantype): New constants ERROR_SCANTYPE, LSEEK_SCANTYPE. (union scan_inference): New type. (infer_scantype): Last arg is now union scan_inference *, not struct extent_scan *. All callers changed. Prefer SEEK_HOLE to FIEMAP if both work, since SEEK_HOLE is simpler and more portable. (copy_reg): Do the fdadvise after initial scan, in case the scan fails. Report an error if the initial scan fails. (copy_reg) [SEEK_HOLE]: Use lseek_copy if scantype says so. --- NEWS | 3 + src/copy.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 201 insertions(+), 11 deletions(-) diff --git a/NEWS b/NEWS index d713fa724..63cb47d10 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. + cp, install and mv now prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP + on sparse files, as lseek is simpler and more portable. + On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change that was made in release 8.32. diff --git a/src/copy.c b/src/copy.c index b382cfa4d..d88f8cf93 100644 --- a/src/copy.c +++ b/src/copy.c @@ -416,7 +416,12 @@ write_zeros (int fd, off_t n_bytes) Upon a successful copy, return true. If the initial extent scan fails, set *NORMAL_COPY_REQUIRED to true and return false. Upon any other failure, set *NORMAL_COPY_REQUIRED to false and - return false. */ + return false. + + FIXME: Once we no longer need to support Linux kernel versions + before 3.1 (2011), this function can be retired as it is superseded + by lseek_copy. That is, we no longer need extent-scan.h and can + remove any of the code that uses it. */ static bool extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, size_t hole_size, off_t src_total_size, @@ -595,6 +600,150 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, return true; } +#ifdef SEEK_HOLE +/* Perform an efficient extent copy, if possible. This avoids + the overhead of detecting holes in hole-introducing/preserving + copy, and thus makes copying sparse files much more efficient. + Copy from SRC_FD to DEST_FD, using BUF (of size BUF_SIZE) for a buffer. + Look for holes of size HOLE_SIZE in the input. + The input file is of size SRC_TOTAL_SIZE. + Use SPARSE_MODE to determine whether to create holes in the output. + SRC_NAME and DST_NAME are the input and output file names. + Return true if successful, false (with a diagnostic) otherwise. */ + +static bool +lseek_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, + size_t hole_size, off_t ext_start, off_t src_total_size, + enum Sparse_type sparse_mode, + char const *src_name, char const *dst_name) +{ + off_t last_ext_start = 0; + off_t last_ext_len = 0; + off_t dest_pos = 0; + bool wrote_hole_at_eof = true; + + while (0 <= ext_start) + { + off_t ext_end = lseek (src_fd, ext_start, SEEK_HOLE); + if (ext_end < 0) + { + if (errno != ENXIO) + goto cannot_lseek; + ext_end = src_total_size; + if (ext_end <= ext_start) + { + /* The input file grew; get its current size. */ + src_total_size = lseek (src_fd, 0, SEEK_END); + if (src_total_size < 0) + goto cannot_lseek; + + /* If the input file shrank after growing, stop copying. */ + if (src_total_size <= ext_start) + break; + + ext_end = src_total_size; + } + } + /* If the input file must have grown, increase its measured size. */ + if (src_total_size < ext_end) + src_total_size = ext_end; + + if (lseek (src_fd, ext_start, SEEK_SET) < 0) + goto cannot_lseek; + + wrote_hole_at_eof = false; + off_t ext_hole_size = ext_start - last_ext_start - last_ext_len; + + if (ext_hole_size) + { + if (sparse_mode != SPARSE_NEVER) + { + if (! create_hole (dest_fd, dst_name, + sparse_mode == SPARSE_ALWAYS, + ext_hole_size)) + return false; + wrote_hole_at_eof = true; + } + else + { + /* When not inducing holes and when there is a hole between + the end of the previous extent and the beginning of the + current one, write zeros to the destination file. */ + if (! write_zeros (dest_fd, ext_hole_size)) + { + error (0, errno, _("%s: write failed"), + quotef (dst_name)); + return false; + } + } + } + + off_t ext_len = ext_end - ext_start; + last_ext_start = ext_start; + last_ext_len = ext_len; + + /* Copy this extent, looking for further opportunities to not + bother to write zeros unless --sparse=never, since SEEK_HOLE + is conservative and may miss some holes. */ + off_t n_read; + bool read_hole; + if ( ! sparse_copy (src_fd, dest_fd, buf, buf_size, + sparse_mode == SPARSE_NEVER ? 0 : hole_size, + true, src_name, dst_name, ext_len, &n_read, + &read_hole)) + return false; + + dest_pos = ext_start + n_read; + if (n_read) + wrote_hole_at_eof = read_hole; + if (n_read < ext_len) + { + /* The input file shrank. */ + src_total_size = dest_pos; + break; + } + + ext_start = lseek (src_fd, dest_pos, SEEK_DATA); + if (ext_start < 0) + { + if (errno != ENXIO) + goto cannot_lseek; + break; + } + } + + /* When the source file ends with a hole, we have to do a little more work, + since the above copied only up to and including the final extent. + In order to complete the copy, we may have to insert a hole or write + zeros in the destination corresponding to the source file's hole-at-EOF. + + In addition, if the final extent was a block of zeros at EOF and we've + just converted them to a hole in the destination, we must call ftruncate + here in order to record the proper length in the destination. */ + if ((dest_pos < src_total_size || wrote_hole_at_eof) + && ! (sparse_mode == SPARSE_NEVER + ? write_zeros (dest_fd, src_total_size - dest_pos) + : ftruncate (dest_fd, src_total_size) == 0)) + { + error (0, errno, _("failed to extend %s"), quoteaf (dst_name)); + return false; + } + + if (sparse_mode == SPARSE_ALWAYS && dest_pos < src_total_size + && punch_hole (dest_fd, dest_pos, src_total_size - dest_pos) < 0) + { + error (0, errno, _("error deallocating %s"), quoteaf (dst_name)); + return false; + } + + return true; + + cannot_lseek: + error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); + return false; +} +#endif + /* FIXME: describe */ /* FIXME: rewrite this to use a hash table so we avoid the quadratic performance hit that's probably noticeable only on trees deeper @@ -1010,6 +1159,9 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode) /* Type of scan being done on the input when looking for sparseness. */ enum scantype { + /* An error was found when determining scantype. */ + ERROR_SCANTYPE, + /* No fancy scanning; just read and write. */ PLAIN_SCANTYPE, @@ -1017,22 +1169,44 @@ enum scantype attempting to create sparse output. */ ZERO_SCANTYPE, + /* lseek information is available. */ + LSEEK_SCANTYPE, + /* Extent information is available. */ EXTENT_SCANTYPE }; -/* Use a heuristic to determine whether stat buffer SB comes from a file - with sparse blocks. If the file has fewer blocks than would normally - be needed for a file of its size, then at least one of the blocks in - the file is a hole. In that case, return true. */ +/* Result of infer_scantype. */ +union scan_inference +{ + /* Used if infer_scantype returns LSEEK_SCANTYPE. This is the + offset of the first data block, or -1 if the file has no data. */ + off_t ext_start; + + /* Used if infer_scantype returns EXTENT_SCANTYPE. */ + struct extent_scan extent_scan; +}; + +/* Return how to scan a file with descriptor FD and stat buffer SB. + Store any information gathered into *SCAN. */ static enum scantype -infer_scantype (int fd, struct stat const *sb, struct extent_scan *scan) +infer_scantype (int fd, struct stat const *sb, + union scan_inference *scan_inference) { if (! (HAVE_STRUCT_STAT_ST_BLOCKS && S_ISREG (sb->st_mode) && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE)) return PLAIN_SCANTYPE; +#ifdef SEEK_HOLE + scan_inference->ext_start = lseek (fd, 0, SEEK_DATA); + if (0 <= scan_inference->ext_start) + return LSEEK_SCANTYPE; + else if (errno != EINVAL && errno != ENOTSUP) + return errno == ENXIO ? LSEEK_SCANTYPE : ERROR_SCANTYPE; +#endif + + struct extent_scan *scan = &scan_inference->extent_scan; extent_scan_init (fd, scan); extent_scan_read (scan); return scan->initial_scan_failed ? ZERO_SCANTYPE : EXTENT_SCANTYPE; @@ -1066,7 +1240,7 @@ copy_reg (char const *src_name, char const *dst_name, mode_t src_mode = src_sb->st_mode; struct stat sb; struct stat src_open_sb; - struct extent_scan scan; + union scan_inference scan_inference; bool return_val = true; bool data_copy_required = x->data_copy_required; @@ -1263,17 +1437,23 @@ copy_reg (char const *src_name, char const *dst_name, size_t buf_size = io_blksize (sb); size_t hole_size = ST_BLKSIZE (sb); - fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); - /* Deal with sparse files. */ enum scantype scantype = infer_scantype (source_desc, &src_open_sb, - &scan); + &scan_inference); + if (scantype == ERROR_SCANTYPE) + { + error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); + return_val = false; + goto close_src_and_dst_desc; + } bool make_holes = (S_ISREG (sb.st_mode) && (x->sparse_mode == SPARSE_ALWAYS || (x->sparse_mode == SPARSE_AUTO && scantype != PLAIN_SCANTYPE))); + fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); + /* If not making a sparse file, try to use a more-efficient buffer size. */ if (! make_holes) @@ -1307,7 +1487,14 @@ copy_reg (char const *src_name, char const *dst_name, ? extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, src_open_sb.st_size, make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &scan) + src_name, dst_name, &scan_inference.extent_scan) +#ifdef SEEK_HOLE + : scantype == LSEEK_SCANTYPE + ? lseek_copy (source_desc, dest_desc, buf, buf_size, hole_size, + scan_inference.ext_start, src_open_sb.st_size, + make_holes ? x->sparse_mode : SPARSE_NEVER, + src_name, dst_name) +#endif : sparse_copy (source_desc, dest_desc, buf, buf_size, make_holes ? hole_size : 0, x->sparse_mode == SPARSE_ALWAYS, -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0005-cp-use-copy_file_range-if-available.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0005-cp-use-copy_file_range-if-available.patch" >From 81849f05b94068f873299bd9fc01b4eebdb31f64 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 17:34:23 -0700 Subject: [PATCH 5/5] cp: use copy_file_range if available * NEWS: Mention this. * bootstrap.conf (gnulib_modules): Add copy-file-range. * src/copy.c (sparse_copy): Try copy_file_range if not looking for holes. --- NEWS | 5 +++-- bootstrap.conf | 1 + src/copy.c | 40 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 63cb47d10..1c3f6378d 100644 --- a/NEWS +++ b/NEWS @@ -17,8 +17,9 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. - cp, install and mv now prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP - on sparse files, as lseek is simpler and more portable. + cp, install and mv now use the copy_file_range syscall if available. + Also, they prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP on sparse + files, as lseek is simpler and more portable. On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change diff --git a/bootstrap.conf b/bootstrap.conf index 12e2d831a..2506f0db4 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -54,6 +54,7 @@ gnulib_modules=" closeout config-h configmake + copy-file-range crypto/md5 crypto/sha1 crypto/sha256 diff --git a/src/copy.c b/src/copy.c index d88f8cf93..4050f6953 100644 --- a/src/copy.c +++ b/src/copy.c @@ -265,6 +265,46 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { *last_write_made_hole = false; *total_n_read = 0; + + /* If not looking for holes, use copy_file_range if available. */ + if (!hole_size) + while (max_n_read) + { + /* Copy at most COPY_MAX bytes at a time; this is min + (PTRDIFF_MAX, SIZE_MAX) truncated to a value that is + surely aligned well. */ + ssize_t ssize_max = TYPE_MAXIMUM (ssize_t); + ptrdiff_t copy_max = MIN (ssize_max, SIZE_MAX) >> 30 << 30; + ssize_t n_copied = copy_file_range (src_fd, NULL, dest_fd, NULL, + MIN (max_n_read, copy_max), 0); + if (n_copied == 0) + { + /* copy_file_range incorrectly returns 0 when reading from + the proc file system on the Linux kernel through at + least 5.6.19 (2020), so fall back on 'read' if the + input file seems empty. */ + if (*total_n_read == 0) + break; + return true; + } + if (n_copied < 0) + { + if (errno == ENOSYS || errno == EINVAL + || errno == EBADF || errno == EXDEV) + break; + if (errno == EINTR) + n_copied = 0; + else + { + error (0, errno, _("error copying %s to %s"), + quoteaf_n (0, src_name), quoteaf_n (1, dst_name)); + return false; + } + } + max_n_read -= n_copied; + *total_n_read += n_copied; + } + bool make_hole = false; off_t psize = 0; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4-- ------------=_1593137762-14034-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 17 Feb 2011 13:49:27 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EY-0005Z7-Aw for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:27 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EV-0005Yv-T5 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4NH-00089I-L7 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:29 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:43637) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4NH-00089B-Ae for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:27 -0500 Received: from [140.186.70.92] (port=39195 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pq4Mw-0004mB-2Q for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:58:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zp-Lx for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:40 -0500 Received: from rcsinet10.oracle.com ([148.87.113.121]:18456) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zR-8M for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:38 -0500 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p1HDvT4c001402 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 17 Feb 2011 13:57:34 GMT Received: from acsmt355.oracle.com (acsmt355.oracle.com [141.146.40.155]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p1HD39m2027007; Thu, 17 Feb 2011 13:57:29 GMT Received: from abhmt014.oracle.com by acsmt354.oracle.com with ESMTP id 1063839691297951043; Thu, 17 Feb 2011 05:57:23 -0800 Received: from [192.168.1.101] (/221.223.109.12) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 17 Feb 2011 05:57:21 -0800 From: Jeff liu Content-Type: multipart/alternative; boundary=Apple-Mail-3--331228896 Subject: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Date: Thu, 17 Feb 2011 21:57:14 +0800 Message-Id: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> To: bug-coreutils@gnu.org Mime-Version: 1.0 (Apple Message framework v1081) X-Mailer: Apple Mail (2.1081) X-Source-IP: acsmt355.oracle.com [141.146.40.155] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090203.4D5D2949.0106:SCFMA4539814,ss=1,fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 199.232.76.165 X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: submit Cc: "jeff.liu" , Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hello All, This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! Below is the code change lists: src/extent_scan.h: add a new structure item 'src_total_size' to "struct = extent_info", since I have to make use of this value to determine a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file is definitely not a sparse file or maybe it is a sparse file but it does = not make sense for proceeding scan read. another change in this file is the signature of extent_scan_init(), just = as I mentioned above, it need to accept the src_total_size variable. src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . src/copy.c: pass src_total_size to extent_scan_init(). On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I have = tried a few simple cases, they are works to me. For now, I am using diff(1) to verify the copy result, does anyone know = some utilities can be used to write the test script? I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: http://bugs.python.org/file19566/z.patch but it require very latest python build I think, so could anyone give = some other advices in this point? The patch is shown as following, any help testing and comments are = appreciated!! Thanks, -Jeff From: Jie Liu Date: Thu, 17 Feb 2011 21:14:23 +0800 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module * src/extent_scan.h: add src_total_size to struct extent_info, we need to check the SEEK_HOLE result against it for initial extent scan. modify the extent_scan_init() signature, to add size_t src_total_size. * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA and SEEK_HOLE. * src/copy.c: pass src_total_size to extent_scan_init(). Signed-off-by: Jie Liu --- src/copy.c | 2 +- src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- src/extent-scan.h | 9 +++- 3 files changed, 120 insertions(+), 4 deletions(-) diff --git a/src/copy.c b/src/copy.c index 104652d..22b9911 100644 --- a/src/copy.c +++ b/src/copy.c @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos =3D 0; =20 - extent_scan_init (src_fd, &scan); + extent_scan_init (src_fd, src_total_size, &scan); =20 *require_normal_copy =3D false; bool wrote_hole_at_eof =3D true; diff --git a/src/extent-scan.c b/src/extent-scan.c index 1ba59db..ffeab7a 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -32,13 +32,17 @@ /* Allocate space for struct extent_scan, initialize the entries if necessary and return it as the input argument of extent_scan_read(). = */ extern void -extent_scan_init (int src_fd, struct extent_scan *scan) +extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan) { scan->fd =3D src_fd; scan->ei_count =3D 0; scan->scan_start =3D 0; scan->initial_scan_failed =3D false; scan->hit_final_extent =3D false; +#if defined(SEEK_HOLE) && defined(SEEK_DATA) + scan->src_total_size =3D src_total_size; +#endif } =20 #ifdef __linux__ @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) =20 return true; } +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) +extern bool +extent_scan_read (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info =3D &extent_buf.ei; + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; + verify (count !=3D 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + if (scan->scan_start =3D=3D 0) + { +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE, if not, fall back to the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + { + scan->initial_scan_failed =3D true; + return false; + } +# endif + + /* If we have been compiled on an OS that supports SEEK_HOLE + but run on an OS that does not support SEEK_HOLE, we get + EINVAL. If the underlying filesystem does not support the + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy + in either case. */ + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) + scan->initial_scan_failed =3D true; + return false; + } + + /* Seek back to position 0 first if we detected a real hole. */ + if (hole_pos > 0) + { + off_t tmp_pos; + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); + if (tmp_pos !=3D (off_t) 0) + return false; + + /* The source file is definitely not a sparse file, or it + maybe a sparse file but SEEK_HOLE returns the source = file's + total size, fall back to the standard copy too. */ + if (hole_pos >=3D scan->src_total_size) + { + scan->initial_scan_failed =3D true; + return false; + } + } + } + + unsigned int i =3D 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the=20 + supplied offset. Set scan->hit_final_extent to true for + either case. */ + do { + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + ext_info[i].ext_logical =3D data_pos; + ext_info[i].ext_length =3D hole_pos - data_pos; + scan->scan_start =3D hole_pos; + ++i; + } while (scan->scan_start < scan->src_total_size && i < count); + + i--; + scan->ei_count =3D i; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + + for (i =3D 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); + + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; + } + + return true;=20 +} #else extern bool extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) diff --git a/src/extent-scan.h b/src/extent-scan.h index 4724b25..a271b95 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -18,7 +18,6 @@ =20 #ifndef EXTENT_SCAN_H # define EXTENT_SCAN_H - /* Structure used to store information of each extent. */ struct extent_info { @@ -38,6 +37,11 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; =20 +#if defined(SEEK_DATA) && defined(SEEK_HOLE) + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; +#endif + /* Next scan start offset. */ off_t scan_start; =20 @@ -55,7 +59,8 @@ struct extent_scan struct extent_info *ext_info; }; =20 -void extent_scan_init (int src_fd, struct extent_scan *scan); +void extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan); =20 bool extent_scan_read (struct extent_scan *scan); =20 --=20 1.7.4= --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4
= --Apple-Mail-3--331228896-- ------------=_1593137762-14034-1--