From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 17 08:49:27 2011 Received: (at submit) by debbugs.gnu.org; 17 Feb 2011 13:49:27 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EY-0005Z7-Aw for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:27 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Pq4EV-0005Yv-T5 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:49:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4NH-00089I-L7 for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:29 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:43637) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4NH-00089B-Ae for submit@debbugs.gnu.org; Thu, 17 Feb 2011 08:58:27 -0500 Received: from [140.186.70.92] (port=39195 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pq4Mw-0004mB-2Q for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:58:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zp-Lx for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:40 -0500 Received: from rcsinet10.oracle.com ([148.87.113.121]:18456) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pq4MU-0007zR-8M for bug-coreutils@gnu.org; Thu, 17 Feb 2011 08:57:38 -0500 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p1HDvT4c001402 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 17 Feb 2011 13:57:34 GMT Received: from acsmt355.oracle.com (acsmt355.oracle.com [141.146.40.155]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p1HD39m2027007; Thu, 17 Feb 2011 13:57:29 GMT Received: from abhmt014.oracle.com by acsmt354.oracle.com with ESMTP id 1063839691297951043; Thu, 17 Feb 2011 05:57:23 -0800 Received: from [192.168.1.101] (/221.223.109.12) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 17 Feb 2011 05:57:21 -0800 From: Jeff liu Content-Type: multipart/alternative; boundary=Apple-Mail-3--331228896 Subject: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Date: Thu, 17 Feb 2011 21:57:14 +0800 Message-Id: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> To: bug-coreutils@gnu.org Mime-Version: 1.0 (Apple Message framework v1081) X-Mailer: Apple Mail (2.1081) X-Source-IP: acsmt355.oracle.com [141.146.40.155] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090203.4D5D2949.0106:SCFMA4539814,ss=1,fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 199.232.76.165 X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: submit Cc: "jeff.liu" , Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hello All, This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! Below is the code change lists: src/extent_scan.h: add a new structure item 'src_total_size' to "struct = extent_info", since I have to make use of this value to determine a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file is definitely not a sparse file or maybe it is a sparse file but it does = not make sense for proceeding scan read. another change in this file is the signature of extent_scan_init(), just = as I mentioned above, it need to accept the src_total_size variable. src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . src/copy.c: pass src_total_size to extent_scan_init(). On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I have = tried a few simple cases, they are works to me. For now, I am using diff(1) to verify the copy result, does anyone know = some utilities can be used to write the test script? I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: http://bugs.python.org/file19566/z.patch but it require very latest python build I think, so could anyone give = some other advices in this point? The patch is shown as following, any help testing and comments are = appreciated!! Thanks, -Jeff From: Jie Liu Date: Thu, 17 Feb 2011 21:14:23 +0800 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module * src/extent_scan.h: add src_total_size to struct extent_info, we need to check the SEEK_HOLE result against it for initial extent scan. modify the extent_scan_init() signature, to add size_t src_total_size. * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA and SEEK_HOLE. * src/copy.c: pass src_total_size to extent_scan_init(). Signed-off-by: Jie Liu --- src/copy.c | 2 +- src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- src/extent-scan.h | 9 +++- 3 files changed, 120 insertions(+), 4 deletions(-) diff --git a/src/copy.c b/src/copy.c index 104652d..22b9911 100644 --- a/src/copy.c +++ b/src/copy.c @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos =3D 0; =20 - extent_scan_init (src_fd, &scan); + extent_scan_init (src_fd, src_total_size, &scan); =20 *require_normal_copy =3D false; bool wrote_hole_at_eof =3D true; diff --git a/src/extent-scan.c b/src/extent-scan.c index 1ba59db..ffeab7a 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -32,13 +32,17 @@ /* Allocate space for struct extent_scan, initialize the entries if necessary and return it as the input argument of extent_scan_read(). = */ extern void -extent_scan_init (int src_fd, struct extent_scan *scan) +extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan) { scan->fd =3D src_fd; scan->ei_count =3D 0; scan->scan_start =3D 0; scan->initial_scan_failed =3D false; scan->hit_final_extent =3D false; +#if defined(SEEK_HOLE) && defined(SEEK_DATA) + scan->src_total_size =3D src_total_size; +#endif } =20 #ifdef __linux__ @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) =20 return true; } +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) +extern bool +extent_scan_read (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info =3D &extent_buf.ei; + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; + verify (count !=3D 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + if (scan->scan_start =3D=3D 0) + { +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE, if not, fall back to the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + { + scan->initial_scan_failed =3D true; + return false; + } +# endif + + /* If we have been compiled on an OS that supports SEEK_HOLE + but run on an OS that does not support SEEK_HOLE, we get + EINVAL. If the underlying filesystem does not support the + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy + in either case. */ + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) + scan->initial_scan_failed =3D true; + return false; + } + + /* Seek back to position 0 first if we detected a real hole. */ + if (hole_pos > 0) + { + off_t tmp_pos; + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); + if (tmp_pos !=3D (off_t) 0) + return false; + + /* The source file is definitely not a sparse file, or it + maybe a sparse file but SEEK_HOLE returns the source = file's + total size, fall back to the standard copy too. */ + if (hole_pos >=3D scan->src_total_size) + { + scan->initial_scan_failed =3D true; + return false; + } + } + } + + unsigned int i =3D 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the=20 + supplied offset. Set scan->hit_final_extent to true for + either case. */ + do { + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno !=3D ENXIO) + return false; + else + { + scan->hit_final_extent =3D true; + return true; + } + } + + ext_info[i].ext_logical =3D data_pos; + ext_info[i].ext_length =3D hole_pos - data_pos; + scan->scan_start =3D hole_pos; + ++i; + } while (scan->scan_start < scan->src_total_size && i < count); + + i--; + scan->ei_count =3D i; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + + for (i =3D 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); + + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; + } + + return true;=20 +} #else extern bool extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) diff --git a/src/extent-scan.h b/src/extent-scan.h index 4724b25..a271b95 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -18,7 +18,6 @@ =20 #ifndef EXTENT_SCAN_H # define EXTENT_SCAN_H - /* Structure used to store information of each extent. */ struct extent_info { @@ -38,6 +37,11 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; =20 +#if defined(SEEK_DATA) && defined(SEEK_HOLE) + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; +#endif + /* Next scan start offset. */ off_t scan_start; =20 @@ -55,7 +59,8 @@ struct extent_scan struct extent_info *ext_info; }; =20 -void extent_scan_init (int src_fd, struct extent_scan *scan); +void extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan); =20 bool extent_scan_read (struct extent_scan *scan); =20 --=20 1.7.4= --Apple-Mail-3--331228896 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4
= --Apple-Mail-3--331228896-- From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 18 10:16:14 2011 Received: (at submit) by debbugs.gnu.org; 18 Apr 2011 14:16:14 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QBpFK-0001ha-Jh for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:14 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QBpFH-0001h8-0q for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QBpF7-0000h8-P2 for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:16:01 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:44165) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpF7-0000h4-NG for submit@debbugs.gnu.org; Mon, 18 Apr 2011 10:15:57 -0400 Received: from eggs.gnu.org ([140.186.70.92]:42879) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpF3-0006wU-I4 for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QBpEx-0000g4-8J for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:53 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:56885) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QBpEw-0000fo-UI for bug-coreutils@gnu.org; Mon, 18 Apr 2011 10:15:47 -0400 Received: from acsinet15.oracle.com (acsinet15.oracle.com [141.146.126.227]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p3IEFga9001504 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 18 Apr 2011 14:15:44 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by acsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p3IEFe4G019882 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 18 Apr 2011 14:15:40 GMT Received: from abhmt016.oracle.com (abhmt016.oracle.com [141.146.116.25]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p3IEFdvj013419; Mon, 18 Apr 2011 09:15:40 -0500 Received: from [192.168.1.100] (/123.119.97.134) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 18 Apr 2011 07:15:34 -0700 Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: multipart/alternative; boundary=Apple-Mail-3-558882838 From: Jeff liu In-Reply-To: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> Date: Mon, 18 Apr 2011 22:15:13 +0800 Message-Id: References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> To: Jeff liu X-Mailer: Apple Mail (2.1082) X-Source-IP: acsmt358.oracle.com [141.146.40.158] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090201.4DAC478D.005F:SCFSTAT5015188, ss=1, pt=DBB_65838, fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: submit Cc: bug-coreutils@gnu.org, Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-3-558882838 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=GB2312 Hi All, Please ignore the current patch, I will submit another patch with a few = fixes soon. Thanks, -Jeff =D4=DA 2011-2-17=A3=AC=CF=C2=CE=E79:57=A3=AC Jeff liu =D0=B4=B5=C0=A3=BA > Hello All, >=20 > This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! >=20 > Below is the code change lists: > src/extent_scan.h: add a new structure item 'src_total_size' to = "struct extent_info", since I have to make use of this value to = determine > a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file > is definitely not a sparse file or maybe it is a sparse file but it = does not make sense for proceeding scan read. > another change in this file is the signature of extent_scan_init(), = just as I mentioned above, it need to accept the src_total_size = variable. > src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . > src/copy.c: pass src_total_size to extent_scan_init(). >=20 > On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to me. >=20 > For now, I am using diff(1) to verify the copy result, does anyone = know some utilities can be used to write the test script? > I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm > still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: > http://bugs.python.org/file19566/z.patch > but it require very latest python build I think, so could anyone give = some other advices in this point? >=20 > The patch is shown as following, any help testing and comments are = appreciated!! >=20 >=20 > Thanks, > -Jeff >=20 >=20 > From: Jie Liu > Date: Thu, 17 Feb 2011 21:14:23 +0800 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module >=20 > * src/extent_scan.h: add src_total_size to struct extent_info, we need > to check the SEEK_HOLE result against it for initial extent scan. > modify the extent_scan_init() signature, to add size_t = src_total_size. > * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA > and SEEK_HOLE. > * src/copy.c: pass src_total_size to extent_scan_init(). >=20 > Signed-off-by: Jie Liu > --- > src/copy.c | 2 +- > src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- > src/extent-scan.h | 9 +++- > 3 files changed, 120 insertions(+), 4 deletions(-) >=20 > diff --git a/src/copy.c b/src/copy.c > index 104652d..22b9911 100644 > --- a/src/copy.c > +++ b/src/copy.c > @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, > We may need this at the end, for a final ftruncate. */ > off_t dest_pos =3D 0; > =20 > - extent_scan_init (src_fd, &scan); > + extent_scan_init (src_fd, src_total_size, &scan); > =20 > *require_normal_copy =3D false; > bool wrote_hole_at_eof =3D true; > diff --git a/src/extent-scan.c b/src/extent-scan.c > index 1ba59db..ffeab7a 100644 > --- a/src/extent-scan.c > +++ b/src/extent-scan.c > @@ -32,13 +32,17 @@ > /* Allocate space for struct extent_scan, initialize the entries if > necessary and return it as the input argument of = extent_scan_read(). */ > extern void > -extent_scan_init (int src_fd, struct extent_scan *scan) > +extent_scan_init (int src_fd, size_t src_total_size, > + struct extent_scan *scan) > { > scan->fd =3D src_fd; > scan->ei_count =3D 0; > scan->scan_start =3D 0; > scan->initial_scan_failed =3D false; > scan->hit_final_extent =3D false; > +#if defined(SEEK_HOLE) && defined(SEEK_DATA) > + scan->src_total_size =3D src_total_size; > +#endif > } > =20 > #ifdef __linux__ > @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) > =20 > return true; > } > +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) > +extern bool > +extent_scan_read (struct extent_scan *scan) > +{ > + off_t data_pos, hole_pos; > + union { struct extent_info ei; char c[4096]; } extent_buf; > + struct extent_info *ext_info =3D &extent_buf.ei; > + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; > + verify (count !=3D 0); > + > + memset (&extent_buf, 0, sizeof extent_buf); > + > + if (scan->scan_start =3D=3D 0) > + { > +# ifdef _PC_MIN_HOLE_SIZE > + /* To determine if the underlaying file system support > + SEEK_HOLE, if not, fall back to the standard copy. */ > + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) > + { > + scan->initial_scan_failed =3D true; > + return false; > + } > +# endif > + > + /* If we have been compiled on an OS that supports SEEK_HOLE > + but run on an OS that does not support SEEK_HOLE, we get > + EINVAL. If the underlying filesystem does not support the > + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy > + in either case. */ > + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); > + if (hole_pos < 0) > + { > + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) > + scan->initial_scan_failed =3D true; > + return false; > + } > + > + /* Seek back to position 0 first if we detected a real hole. = */ > + if (hole_pos > 0) > + { > + off_t tmp_pos; > + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); > + if (tmp_pos !=3D (off_t) 0) > + return false; > + > + /* The source file is definitely not a sparse file, or it > + maybe a sparse file but SEEK_HOLE returns the source = file's > + total size, fall back to the standard copy too. */ > + if (hole_pos >=3D scan->src_total_size) > + { > + scan->initial_scan_failed =3D true; > + return false; > + } > + } > + } > + > + unsigned int i =3D 0; > + /* If lseek(2) failed and the errno is set to ENXIO, for > + SEEK_DATA there are no more data regions past the supplied > + offset. For SEEK_HOLE, there are no more holes past the=20 > + supplied offset. Set scan->hit_final_extent to true for > + either case. */ > + do { > + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); > + if (data_pos < 0) > + { > + if (errno !=3D ENXIO) > + return false; > + else > + { > + scan->hit_final_extent =3D true; > + return true; > + } > + } > + > + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); > + if (hole_pos < 0) > + { > + if (errno !=3D ENXIO) > + return false; > + else > + { > + scan->hit_final_extent =3D true; > + return true; > + } > + } > + > + ext_info[i].ext_logical =3D data_pos; > + ext_info[i].ext_length =3D hole_pos - data_pos; > + scan->scan_start =3D hole_pos; > + ++i; > + } while (scan->scan_start < scan->src_total_size && i < count); > + > + i--; > + scan->ei_count =3D i; > + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); > + > + for (i =3D 0; i < scan->ei_count; i++) > + { > + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); > + > + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; > + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; > + } > + > + return true;=20 > +} > #else > extern bool > extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) > diff --git a/src/extent-scan.h b/src/extent-scan.h > index 4724b25..a271b95 100644 > --- a/src/extent-scan.h > +++ b/src/extent-scan.h > @@ -18,7 +18,6 @@ > =20 > #ifndef EXTENT_SCAN_H > # define EXTENT_SCAN_H > - > /* Structure used to store information of each extent. */ > struct extent_info > { > @@ -38,6 +37,11 @@ struct extent_scan > /* File descriptor of extent scan run against. */ > int fd; > =20 > +#if defined(SEEK_DATA) && defined(SEEK_HOLE) > + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ > + size_t src_total_size; > +#endif > + > /* Next scan start offset. */ > off_t scan_start; > =20 > @@ -55,7 +59,8 @@ struct extent_scan > struct extent_info *ext_info; > }; > =20 > -void extent_scan_init (int src_fd, struct extent_scan *scan); > +void extent_scan_init (int src_fd, size_t src_total_size, > + struct extent_scan *scan); > =20 > bool extent_scan_read (struct extent_scan *scan); > =20 > --=20 > 1.7.4 --Apple-Mail-3-558882838 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=GB2312 Hi = All,

Please ignore the current patch, I will submit = another patch with a few fixes = soon.


Thanks,
-Jeff

=D4=DA 2011-2-17=A3=AC=CF=C2=CE=E79:57=A3=AC Jeff = liu =D0=B4=B5=C0=A3=BA

Hello = All,

This is the first try to introduce the = SEEK_DATA/SEEK_HOLE support to extent_scan module for efficient sparse = file copy on ZFS,  I have delayed it for a long time, sorry for = that!

Below is the code change = lists:
src/extent_scan.h:  add a new structure item = 'src_total_size' to "struct extent_info",  since I have to make use = of this value to determine
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4

= --Apple-Mail-3-558882838-- From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 19 04:53:21 2011 Received: (at submit) by debbugs.gnu.org; 19 Apr 2011 08:53:21 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6gS-0004Pk-67 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:21 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6gN-0004PW-Sl for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6gF-0004e3-F9 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:10 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,HTML_MESSAGE, RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:42601) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6gF-0004dz-9g for submit@debbugs.gnu.org; Tue, 19 Apr 2011 04:53:07 -0400 Received: from eggs.gnu.org ([140.186.70.92]:60467) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6gC-0007KG-25 for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6g8-0004cn-CG for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:04 -0400 Received: from rcsinet10.oracle.com ([148.87.113.121]:49013) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6g7-0004ce-VD for bug-coreutils@gnu.org; Tue, 19 Apr 2011 04:53:00 -0400 Received: from rcsinet15.oracle.com (rcsinet15.oracle.com [148.87.113.117]) by rcsinet10.oracle.com (Switch-3.4.2/Switch-3.4.2) with ESMTP id p3J8qixt002378 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 19 Apr 2011 08:52:46 GMT Received: from acsmt358.oracle.com (acsmt358.oracle.com [141.146.40.158]) by rcsinet15.oracle.com (Switch-3.4.2/Switch-3.4.1) with ESMTP id p3J8qhGA022441 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 19 Apr 2011 08:52:43 GMT Received: from abhmt013.oracle.com (abhmt013.oracle.com [141.146.116.22]) by acsmt358.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p3J8pgPb002153; Tue, 19 Apr 2011 03:51:42 -0500 Received: from [10.189.10.146] (/203.190.176.53) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Tue, 19 Apr 2011 01:51:41 -0700 Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: multipart/alternative; boundary=Apple-Mail-1-625867617 From: Jeff liu In-Reply-To: Date: Tue, 19 Apr 2011 16:51:38 +0800 Message-Id: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> To: Jeff liu X-Mailer: Apple Mail (2.1082) X-Source-IP: acsmt358.oracle.com [141.146.40.158] X-Auth-Type: Internal IP X-CT-RefId: str=0001.0A090201.4DAD4D5C.00E3:SCFSTAT5015188, ss=1, pt=DBB_65838, fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: submit Cc: =?GB2312?Q?P=A8=A2draig_Brady?= , bug-coreutils@gnu.org, Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) --Apple-Mail-1-625867617 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii > Hi All, >=20 > Please ignore the current patch, I will submit another patch with a = few fixes soon. Now the new patch set coming, =20 In previous post, I have tried to change the extent_scan_init() = interface by adding a new argument to indicate the source file size, this will reduce the overhead of call fstat(2) in extent_scan_read(), = since the file size is definitely needed for SEEK* stuff, however, the = file size is redundant for FIEMAP. so I changed my idea to keep extent_scan_init() as before, instead, to = retrieve the file size in extent_scan_read() when launching the first = scan, one benefit is, there is nothing need to be modified in extent_copy() for this patch. Tests: =3D=3D=3D=3D A new test sparse-lseek was introduced in this post, it make use of the = sparse file generation function in Perl, and do `cmp` against the target = copied file. I have also took a look at the `sdb` utility shipped with ZFS, but did = not found any interesting stuff can be used for this test. Test run passed on my environment as below, bash-3.00# make check TESTS=3Dcp/sparse-lseek VERBOSE=3Dyes make check-TESTS make[1]: Entering directory `/coreutils/tests' make[2]: Entering directory `/coreutils/tests' PASS: cp/sparse-lseek =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 1 test passed =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D make[2]: Leaving directory `/coreutils/tests' make[1]: Leaving directory `/coreutils/tests' GEN vc_exe_in_TESTS No differences encountered Manual tests: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D 1. Ensure trailing blanks, test 0 size sparse file, non-sparse file, = sparse file with hole start and hole end. 2. make syntax-check failed, I have no idea of this issue at the moment, = I also tried to run make distcheck, looks the package building, install = and uninstall procedures all passed, but it also failed at the final stage, am I missing something here?=20 The logs which were shown as following, bash-3.00# make syntax-check GFDL_version awk: syntax error near line 1 awk: bailing out near line 1 make: *** [sc_GFDL_version.z] Error 2 make distcheck: =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D ...... make[1]: Entering directory `/coreutils' GEN check-ls-dircolors make my-distcheck make[2]: Entering directory `/coreutils' make syntax-check make[3]: Entering directory `/coreutils' GFDL_version awk: syntax error near line 1 awk: bailing out near line 1 make[3]: *** [sc_GFDL_version.z] Error 2 make[3]: Leaving directory `/coreutils' make[2]: *** [my-distcheck] Error 2 make[2]: Leaving directory `/coreutils' make[1]: *** [distcheck-hook] Error 2 make[1]: Leaving directory `/coreutils' make: *** [distcheck] Error 1 Below is the revised patch, =46rom 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Tue, 19 Apr 2011 15:24:50 -0700 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module * src/extent_scan.h: introduce src_total_size to struct extent_info, we need it for lseek(2) iteration. * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA and SEEK_HOLE if those stuff are supported. * tests/cp/sparse-lseek: add a new test for lseek(2) extent copy. Signed-off-by: Jie Liu --- src/extent-scan.c | 119 = +++++++++++++++++++++++++++++++++++++++++++++++++ src/extent-scan.h | 5 ++ tests/Makefile.am | 1 + tests/cp/sparse-lseek | 56 +++++++++++++++++++++++ 4 files changed, 181 insertions(+), 0 deletions(-) create mode 100755 tests/cp/sparse-lseek diff --git a/src/extent-scan.c b/src/extent-scan.c index da7eb9d..a54eca0 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -17,7 +17,9 @@ Written by Jie Liu (jeff.liu@oracle.com). */ =20 #include +#include #include +#include #include #include #include @@ -71,6 +73,9 @@ extent_scan_init (int src_fd, struct extent_scan = *scan) scan->initial_scan_failed =3D false; scan->hit_final_extent =3D false; scan->fm_flags =3D extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; +#if defined (SEEK_DATA) && defined (SEEK_HOLE) + scan->src_total_size =3D 0; +#endif } =20 #ifdef __linux__ @@ -204,6 +209,120 @@ extent_scan_read (struct extent_scan *scan) =20 return true; } +#elif defined (SEEK_HOLE) && defined (SEEK_DATA) +extern bool +extent_scan_read (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info =3D &extent_buf.ei; + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; + verify (count !=3D 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + if (scan->scan_start =3D=3D 0) + { +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE. If not, fall back to the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + { + scan->initial_scan_failed =3D true; + return false; + } +# endif + + /* If we have been compiled on an OS that supports SEEK_HOLE + but run on an OS that does not support SEEK_HOLE, we get + EINVAL. If the underlying file system does not support the + SEEK_HOLE call, we get ENOTSUP, setting initial_scan_failed + to true to fall back to the standard copy in either case. */ + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) + scan->initial_scan_failed =3D true; + return false; + } + + /* Seek back to position 0 first. */ + if (hole_pos > 0) + { + if (lseek (scan->fd, (off_t) 0, SEEK_SET) < 0) + return false; + } + + struct stat sb; + if (fstat (scan->fd, &sb) < 0) + return false; + + /* This is definitely not a sparse file, we treat it as a big = extent. */ + if (hole_pos >=3D sb.st_size) + { + scan->ei_count =3D 1; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + scan->ext_info[0].ext_logical =3D 0; + scan->ext_info[0].ext_length =3D sb.st_size; + scan->hit_final_extent =3D true; + return true; + } + scan->src_total_size =3D sb.st_size; + } + + unsigned int i =3D 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the + supplied offset. Set scan->hit_final_extent to true in + either case. */ + while (scan->scan_start < scan->src_total_size && i < count) + { + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno =3D=3D ENXIO) + { + scan->hit_final_extent =3D true; + break; + } + return false; + } + + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno =3D=3D ENXIO) + { + scan->hit_final_extent =3D true; + hole_pos =3D scan->src_total_size; + if (data_pos < hole_pos) + goto preserve_ext_info; + break; + } + return false; + } + +preserve_ext_info: + ext_info[i].ext_logical =3D data_pos; + ext_info[i].ext_length =3D hole_pos - data_pos; + scan->scan_start =3D hole_pos; + ++i; + } + + scan->ei_count =3D i; + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); + + for (i =3D 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); + + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; + } + + return (lseek (scan->fd, (off_t) 0, SEEK_SET) < 0) ? false : true; +} #else extern bool extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) diff --git a/src/extent-scan.h b/src/extent-scan.h index 5b4ded5..4fc05c6 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -38,6 +38,11 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; =20 +# if defined (SEEK_DATA) && defined (SEEK_HOLE) + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; +#endif + /* Next scan start offset. */ off_t scan_start; =20 diff --git a/tests/Makefile.am b/tests/Makefile.am index 685eb52..6c596b9 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -28,6 +28,7 @@ root_tests =3D = \ cp/cp-mv-enotsup-xattr \ cp/capability \ cp/sparse-fiemap \ + cp/sparse-lseek \ dd/skip-seek-past-dev \ install/install-C-root \ ls/capability \ diff --git a/tests/cp/sparse-lseek b/tests/cp/sparse-lseek new file mode 100755 index 0000000..5b8f2c1 --- /dev/null +++ b/tests/cp/sparse-lseek @@ -0,0 +1,56 @@ +#!/bin/sh +# Test cp --sparse=3Dalways through lseek(SEEK_DATA/SEEK_HOLE) copy + +# Copyright (C) 2010-2011 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=3D.}/init.sh"; path_prepend_ ../src +print_ver_ cp +$PERL -e 1 || skip_test_ 'you lack perl' + +zfsdisk=3DdiskX +zfspool=3Dseektest + +require_root_ + +cwd=3D$PWD +cleanup_() { zpool destroy $zfspool; } + +skip=3D0 +mkfile 128m "$cwd/$zfsdisk" || skip=3D1 + +# Check if the seektest pool is already exists +zpool list $zfspool 2>/dev/null && + skip_test_ "$zfspool already exists" + +# Create pool and verify if it is mounted automatically +zpool create $zfspool "$cwd/$zfsdisk" || skip=3D1 +zpool list $zfspool >/dev/null || skip=3D1 + +test $skip =3D 1 && skip_test_ "insufficient ZFS support" + +for i in $(seq 1 2 21); do + for j in 1 2 31 100; do + $PERL -e 'BEGIN { $n =3D '$i' * 1024; *F =3D *STDOUT }' \ + -e 'for (1..'$j') { sysseek (*F, $n, 1)' \ + -e '&& syswrite (*F, chr($_)x$n) or die "$!"}' > /$zfspool/j1 = || fail=3D1 + + cp --sparse=3Dalways /$zfspool/j1 /$zfspool/j2 || fail=3D1 + cmp /$zfspool/j1 /$zfspool/j2 || fail=3D1 + test $fail =3D 1 && break 2 + done +done + +Exit $fail --=20 1.7.4 Any comments are appreciated! Thanks, -Jeff >=20 >=20 > Thanks, > -Jeff >=20 >=20 >> Hello All, >>=20 >> This is the first try to introduce the SEEK_DATA/SEEK_HOLE support to = extent_scan module for efficient sparse file copy on ZFS, I have = delayed it for a long time, sorry for that! >>=20 >> Below is the code change lists: >> src/extent_scan.h: add a new structure item 'src_total_size' to = "struct extent_info", since I have to make use of this value to = determine >> a file is sparse of not for the initial scan. If the returns of = lseek(fd, 0, SEEK_HOLE) is equal to the src_total_size or large than it, = it means the source file >> is definitely not a sparse file or maybe it is a sparse file but it = does not make sense for proceeding scan read. >> another change in this file is the signature of extent_scan_init(), = just as I mentioned above, it need to accept the src_total_size = variable. >> src/extent_scan.c: implement the new exent_scan_read() through = SEEK_DATA/SEEK_HOLE, it will be called if those two values are defined = at . >> src/copy.c: pass src_total_size to extent_scan_init(). >>=20 >> On my test environment, Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to me. >>=20 >> For now, I am using diff(1) to verify the copy result, does anyone = know some utilities can be used to write the test script? >> I have sent an email to ZFS DEV mail-list to ask this question = yesterday, a nice guy suggest me to use = ZDB(http://cuddletech.com/blog/?p=3D407) for that, I'm >> still study this utility now, I also noticed there is patch to add = SEEK_HOLE/SEEK_DATA support to os module in Python community, please = refer to: >> http://bugs.python.org/file19566/z.patch >> but it require very latest python build I think, so could anyone = give some other advices in this point? >>=20 >> The patch is shown as following, any help testing and comments are = appreciated!! >>=20 >>=20 >> Thanks, >> -Jeff >>=20 >>=20 >> From: Jie Liu >> Date: Thu, 17 Feb 2011 21:14:23 +0800 >> Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to = extent_scan module >>=20 >> * src/extent_scan.h: add src_total_size to struct extent_info, we = need >> to check the SEEK_HOLE result against it for initial extent scan. >> modify the extent_scan_init() signature, to add size_t = src_total_size. >> * src/extent_scan.c: implement a new extent_scan_read() through = SEEK_DATA >> and SEEK_HOLE. >> * src/copy.c: pass src_total_size to extent_scan_init(). >>=20 >> Signed-off-by: Jie Liu >> --- >> src/copy.c | 2 +- >> src/extent-scan.c | 113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++- >> src/extent-scan.h | 9 +++- >> 3 files changed, 120 insertions(+), 4 deletions(-) >>=20 >> diff --git a/src/copy.c b/src/copy.c >> index 104652d..22b9911 100644 >> --- a/src/copy.c >> +++ b/src/copy.c >> @@ -306,7 +306,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, = size_t buf_size, >> We may need this at the end, for a final ftruncate. */ >> off_t dest_pos =3D 0; >> =20 >> - extent_scan_init (src_fd, &scan); >> + extent_scan_init (src_fd, src_total_size, &scan); >> =20 >> *require_normal_copy =3D false; >> bool wrote_hole_at_eof =3D true; >> diff --git a/src/extent-scan.c b/src/extent-scan.c >> index 1ba59db..ffeab7a 100644 >> --- a/src/extent-scan.c >> +++ b/src/extent-scan.c >> @@ -32,13 +32,17 @@ >> /* Allocate space for struct extent_scan, initialize the entries if >> necessary and return it as the input argument of = extent_scan_read(). */ >> extern void >> -extent_scan_init (int src_fd, struct extent_scan *scan) >> +extent_scan_init (int src_fd, size_t src_total_size, >> + struct extent_scan *scan) >> { >> scan->fd =3D src_fd; >> scan->ei_count =3D 0; >> scan->scan_start =3D 0; >> scan->initial_scan_failed =3D false; >> scan->hit_final_extent =3D false; >> +#if defined(SEEK_HOLE) && defined(SEEK_DATA) >> + scan->src_total_size =3D src_total_size; >> +#endif >> } >> =20 >> #ifdef __linux__ >> @@ -106,6 +110,113 @@ extent_scan_read (struct extent_scan *scan) >> =20 >> return true; >> } >> +#elif defined(SEEK_HOLE) && defined(SEEK_DATA) >> +extern bool >> +extent_scan_read (struct extent_scan *scan) >> +{ >> + off_t data_pos, hole_pos; >> + union { struct extent_info ei; char c[4096]; } extent_buf; >> + struct extent_info *ext_info =3D &extent_buf.ei; >> + enum { count =3D (sizeof extent_buf / sizeof *ext_info) }; >> + verify (count !=3D 0); >> + >> + memset (&extent_buf, 0, sizeof extent_buf); >> + >> + if (scan->scan_start =3D=3D 0) >> + { >> +# ifdef _PC_MIN_HOLE_SIZE >> + /* To determine if the underlaying file system support >> + SEEK_HOLE, if not, fall back to the standard copy. */ >> + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) >> + { >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> +# endif >> + >> + /* If we have been compiled on an OS that supports SEEK_HOLE >> + but run on an OS that does not support SEEK_HOLE, we get >> + EINVAL. If the underlying filesystem does not support the >> + SEEK_HOLE call, we get ENOTSUP, fall back to standard copy >> + in either case. */ >> + hole_pos =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE); >> + if (hole_pos < 0) >> + { >> + if (errno =3D=3D EINVAL || errno =3D=3D ENOTSUP) >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> + >> + /* Seek back to position 0 first if we detected a real hole. = */ >> + if (hole_pos > 0) >> + { >> + off_t tmp_pos; >> + tmp_pos =3D lseek (scan->fd, (off_t) 0, SEEK_SET); >> + if (tmp_pos !=3D (off_t) 0) >> + return false; >> + >> + /* The source file is definitely not a sparse file, or it >> + maybe a sparse file but SEEK_HOLE returns the source = file's >> + total size, fall back to the standard copy too. */ >> + if (hole_pos >=3D scan->src_total_size) >> + { >> + scan->initial_scan_failed =3D true; >> + return false; >> + } >> + } >> + } >> + >> + unsigned int i =3D 0; >> + /* If lseek(2) failed and the errno is set to ENXIO, for >> + SEEK_DATA there are no more data regions past the supplied >> + offset. For SEEK_HOLE, there are no more holes past the=20 >> + supplied offset. Set scan->hit_final_extent to true for >> + either case. */ >> + do { >> + data_pos =3D lseek (scan->fd, scan->scan_start, SEEK_DATA); >> + if (data_pos < 0) >> + { >> + if (errno !=3D ENXIO) >> + return false; >> + else >> + { >> + scan->hit_final_extent =3D true; >> + return true; >> + } >> + } >> + >> + hole_pos =3D lseek (scan->fd, data_pos, SEEK_HOLE); >> + if (hole_pos < 0) >> + { >> + if (errno !=3D ENXIO) >> + return false; >> + else >> + { >> + scan->hit_final_extent =3D true; >> + return true; >> + } >> + } >> + >> + ext_info[i].ext_logical =3D data_pos; >> + ext_info[i].ext_length =3D hole_pos - data_pos; >> + scan->scan_start =3D hole_pos; >> + ++i; >> + } while (scan->scan_start < scan->src_total_size && i < count); >> + >> + i--; >> + scan->ei_count =3D i; >> + scan->ext_info =3D xnmalloc (scan->ei_count, sizeof (struct = extent_info)); >> + >> + for (i =3D 0; i < scan->ei_count; i++) >> + { >> + assert (ext_info[i].ext_logical <=3D OFF_T_MAX); >> + >> + scan->ext_info[i].ext_logical =3D ext_info[i].ext_logical; >> + scan->ext_info[i].ext_length =3D ext_info[i].ext_length; >> + } >> + >> + return true;=20 >> +} >> #else >> extern bool >> extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) >> diff --git a/src/extent-scan.h b/src/extent-scan.h >> index 4724b25..a271b95 100644 >> --- a/src/extent-scan.h >> +++ b/src/extent-scan.h >> @@ -18,7 +18,6 @@ >> =20 >> #ifndef EXTENT_SCAN_H >> # define EXTENT_SCAN_H >> - >> /* Structure used to store information of each extent. */ >> struct extent_info >> { >> @@ -38,6 +37,11 @@ struct extent_scan >> /* File descriptor of extent scan run against. */ >> int fd; >> =20 >> +#if defined(SEEK_DATA) && defined(SEEK_HOLE) >> + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ >> + size_t src_total_size; >> +#endif >> + >> /* Next scan start offset. */ >> off_t scan_start; >> =20 >> @@ -55,7 +59,8 @@ struct extent_scan >> struct extent_info *ext_info; >> }; >> =20 >> -void extent_scan_init (int src_fd, struct extent_scan *scan); >> +void extent_scan_init (int src_fd, size_t src_total_size, >> + struct extent_scan *scan); >> =20 >> bool extent_scan_read (struct extent_scan *scan); >> =20 >> --=20 >> 1.7.4 >=20 --Apple-Mail-1-625867617 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
Hi All,

Please ignore the = current patch, I will submit another patch with a few fixes = soon.

Now the new patch set coming, =  

In previous post, I have tried to change = the extent_scan_init() interface by adding a new argument to indicate = the source file size,
this will reduce the overhead of call = fstat(2)  in extent_scan_read(), since the file size is = definitely needed for SEEK* stuff, however, the file size is redundant = for FIEMAP.
so I changed my idea to keep extent_scan_init() as = before,  instead, to retrieve the file size in = extent_scan_read() when launching the first scan, one benefit is, there = is nothing need to
be modified in extent_copy() for this = patch.

Tests:
=3D=3D=3D=3D
A = new test sparse-lseek was introduced in this post, it make use of the = sparse file generation function in Perl, and do `cmp` against the target = copied file.
I have also took a look at the `sdb` utility = shipped with ZFS, but did not found any interesting stuff can be used = for this test.

Test run passed on my = environment as below,

bash-3.00# make = check TESTS=3Dcp/sparse-lseek VERBOSE=3Dyes
make =  check-TESTS
make[1]: Entering directory = `/coreutils/tests'
make[2]: Entering directory = `/coreutils/tests'
PASS: = cp/sparse-lseek
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
1 test passed
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
make[2]: Leaving directory `/coreutils/tests'
make[1]: = Leaving directory `/coreutils/tests'
  GEN   =  vc_exe_in_TESTS
No differences = encountered

Manual = tests:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
1. Ensure = trailing blanks, test 0 size sparse file, non-sparse file,  sparse = file with hole start and hole end.
2. make syntax-check = failed, I have no idea of this issue at the moment,  I also tried = to run make distcheck, looks the package building, install and uninstall = procedures all passed,
but it also failed at the final = stage, am I missing something here? 

The = logs which were shown as following,
bash-3.00# make = syntax-check
GFDL_version
awk: syntax error near = line 1
awk: bailing out near line 1
make: *** = [sc_GFDL_version.z] Error 2

make = distcheck:
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
= ......
make[1]: Entering directory = `/coreutils'
  GEN   =  check-ls-dircolors
make my-distcheck
make[2]: = Entering directory `/coreutils'
make = syntax-check
make[3]: Entering directory = `/coreutils'
GFDL_version
awk: syntax error near = line 1
awk: bailing out near line 1
make[3]: *** = [sc_GFDL_version.z] Error 2
make[3]: Leaving directory = `/coreutils'
make[2]: *** [my-distcheck] Error = 2
make[2]: Leaving directory `/coreutils'
make[1]: = *** [distcheck-hook] Error 2
make[1]: Leaving directory = `/coreutils'
make: *** [distcheck] Error = 1



Below is the = revised patch,

=46rom = 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 = 2001
From: Jie Liu <jeff.liu@oracle.com>
= Date: Tue, 19 Apr 2011 15:24:50 -0700
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: introduce = src_total_size to struct extent_info, we
  need it = for lseek(2) iteration.
* src/extent_scan.c: implement a new = extent_scan_read() through SEEK_DATA
  and SEEK_HOLE = if those stuff are supported.
* tests/cp/sparse-lseek: add a = new test for lseek(2) extent = copy.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
= ---
 src/extent-scan.c     |  119 = +++++++++++++++++++++++++++++++++++++++++++++++++
 src/exte= nt-scan.h     |    5 = ++
 tests/Makefile.am     |    1 = +
 tests/cp/sparse-lseek |   56 = +++++++++++++++++++++++
 4 files changed, 181 = insertions(+), 0 deletions(-)
 create mode 100755 = tests/cp/sparse-lseek

diff --git = a/src/extent-scan.c b/src/extent-scan.c
index da7eb9d..a54eca0 = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -17,7 +17,9 @@
   =  Written by Jie Liu (jeff.liu@oracle.com). =  */
 
 #include = <config.h>
+#include = <fcntl.h>
 #include = <sys/types.h>
+#include = <sys/stat.h>
 #include = <sys/ioctl.h>
 #include = <sys/utsname.h>
 #include = <assert.h>
@@ -71,6 +73,9 @@ extent_scan_init (int = src_fd, struct extent_scan *scan)
   = scan->initial_scan_failed =3D false;
   = scan->hit_final_extent =3D false;
   = scan->fm_flags =3D extent_need_sync () ? FIEMAP_FLAG_SYNC : = 0;
+#if defined (SEEK_DATA) && defined = (SEEK_HOLE)
+  scan->src_total_size =3D = 0;
+#endif
 }
 
 #i= fdef __linux__
@@ -204,6 +209,120 @@ extent_scan_read (struct = extent_scan *scan)
 
   return = true;
 }
+#elif defined (SEEK_HOLE) && = defined (SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE.  If not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying file system does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, setting initial_scan_failed
+       =   to true to fall back to the standard copy in either case. =  */
+      hole_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_HOLE);
+      if = (hole_pos < 0)
+        {
+ =          if (errno =3D=3D EINVAL || errno =3D=3D = ENOTSUP)
+           =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+
+      /* Seek back to = position 0 first.  */
+      if (hole_pos = > 0)
+        {
+   =        if (lseek (scan->fd, (off_t) 0, SEEK_SET) = < 0)
+            return = false;
+        }
+
+ =      struct stat sb;
+      if = (fstat (scan->fd, &sb) < 0)
+       =  return false;
+
+      /* This = is definitely not a sparse file, we treat it as a big extent. =  */
+      if (hole_pos >=3D = sb.st_size)
+        {
+   =        scan->ei_count =3D 1;
+   =        scan->ext_info =3D xnmalloc = (scan->ei_count, sizeof (struct extent_info));
+   =        scan->ext_info[0].ext_logical =3D = 0;
+         =  scan->ext_info[0].ext_length =3D sb.st_size;
+   =        scan->hit_final_extent =3D = true;
+          return = true;
+        }
+     =  scan->src_total_size =3D sb.st_size;
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the
+     supplied = offset.  Set scan->hit_final_extent to true in
+ =     either case.  */
+  while = (scan->scan_start < scan->src_total_size && i < = count)
+    {
+     =  data_pos =3D lseek (scan->fd, scan->scan_start, = SEEK_DATA);
+      if (data_pos < = 0)
+        {
+     =      if (errno =3D=3D ENXIO)
+     =        {
+         =      scan->hit_final_extent =3D true;
+ =              break;
+ =            }
+     =      return false;
+       =  }
+
+      hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+      if = (hole_pos < 0)
+        {
+ =          if (errno =3D=3D ENXIO)
+ =            {
+     =          scan->hit_final_extent =3D = true;
+             =  hole_pos =3D scan->src_total_size;
+     =          if (data_pos < = hole_pos)
+               =  goto preserve_ext_info;
+         =      break;
+         =    }
+          return = false;
+       =  }
+
+preserve_ext_info:
+   =    ext_info[i].ext_logical =3D data_pos;
+   =    ext_info[i].ext_length =3D hole_pos - data_pos;
+ =      scan->scan_start =3D hole_pos;
+   =    ++i;
+    }
+
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return (lseek (scan->fd, = (off_t) 0, SEEK_SET) < 0) ? false : = true;
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 5b4ded5..4fc05c6 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -38,6 +38,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+# if defined (SEEK_DATA) && = defined (SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
diff --git a/tests/Makefile.am = b/tests/Makefile.am
index 685eb52..6c596b9 = 100644
--- a/tests/Makefile.am
+++ = b/tests/Makefile.am
@@ -28,6 +28,7 @@ root_tests =3D = \
   cp/cp-mv-enotsup-xattr = \
   cp/capability = \
   cp/sparse-fiemap = \
+  cp/sparse-lseek         =                     =   \
   dd/skip-seek-past-dev = \
   install/install-C-root = \
   ls/capability = \
diff --git a/tests/cp/sparse-lseek = b/tests/cp/sparse-lseek
new file mode 100755
index = 0000000..5b8f2c1
--- /dev/null
+++ = b/tests/cp/sparse-lseek
@@ -0,0 +1,56 = @@
+#!/bin/sh
+# Test cp --sparse=3Dalways through = lseek(SEEK_DATA/SEEK_HOLE) copy
+
+# Copyright (C) = 2010-2011 Free Software Foundation, Inc.
+
+# This = program is free software: you can redistribute it and/or = modify
+# it under the terms of the GNU General Public License = as published by
+# the Free Software Foundation, either = version 3 of the License, or
+# (at your option) any later = version.
+
+# This program is distributed in the = hope that it will be useful,
+# but WITHOUT ANY WARRANTY; = without even the implied warranty of
+# MERCHANTABILITY or = FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General = Public License for more details.
+
+# You should = have received a copy of the GNU General Public License
+# = along with this program.  If not, see <http://www.gnu.org/licenses/>= .
+
+. "${srcdir=3D.}/init.sh"; path_prepend_ = ../src
+print_ver_ cp
+$PERL -e 1 || skip_test_ 'you = lack = perl'
+
+zfsdisk=3DdiskX
+zfspool=3Dseektest=
+
+require_root_
+
+cwd=3D$PWD
+cleanup_() { zpool destroy $zfspool; = }
+
+skip=3D0
+mkfile 128m "$cwd/$zfsdisk" = || skip=3D1
+
+# Check if the seektest pool is = already exists
+zpool list $zfspool 2>/dev/null = &&
+  skip_test_ "$zfspool already = exists"
+
+# Create pool and verify if it is mounted = automatically
+zpool create $zfspool "$cwd/$zfsdisk" || = skip=3D1
+zpool list $zfspool >/dev/null || = skip=3D1
+
+test $skip =3D 1 && skip_test_ = "insufficient ZFS support"
+
+for i in $(seq 1 2 = 21); do
+  for j in 1 2 31 100; do
+   =  $PERL -e 'BEGIN { $n =3D '$i' * 1024; *F =3D *STDOUT }' = \
+          -e 'for (1..'$j') { = sysseek (*F, $n, 1)' \
+          -e = '&& syswrite (*F, chr($_)x$n) or die "$!"}' > /$zfspool/j1 || = fail=3D1
+
+    cp --sparse=3Dalways = /$zfspool/j1 /$zfspool/j2 || fail=3D1
+    cmp = /$zfspool/j1 /$zfspool/j2 || fail=3D1
+    test = $fail =3D 1 && break 2
+ =  done
+done
+
+Exit = $fail
-- 
1.7.4


Any comments are = appreciated!

Thanks,
-Jeff
<= div>

Hello = All,

This is the first try to introduce the = SEEK_DATA/SEEK_HOLE support to extent_scan module for efficient sparse = file copy on ZFS,  I have delayed it for a long time, sorry for = that!

Below is the code change = lists:
src/extent_scan.h:  add a new structure item = 'src_total_size' to "struct extent_info",  since I have to make use = of this value to determine
a file is sparse of not for the = initial scan.  If the returns of lseek(fd, 0, SEEK_HOLE) is equal = to the src_total_size or large than it, it means the source = file
is definitely not a sparse file or maybe it is a sparse = file but it does not make sense for proceeding scan = read.
another change in this file is the signature of = extent_scan_init(), just as I mentioned above, it need to accept the = src_total_size variable.
src/extent_scan.c: implement the new = exent_scan_read() through SEEK_DATA/SEEK_HOLE, it will be called if = those two values are defined at <unistd.h>.
src/copy.c: = pass src_total_size to extent_scan_init().

On = my test environment,  Solaris10, SunOS 5.10 Generic_142910-17, I = have tried a few simple cases, they are works to = me.

For now, I am using diff(1) to verify the = copy result,  does anyone know some utilities can be used to write = the test script?
I have sent an email to ZFS DEV mail-list to = ask this question yesterday,  a nice guy suggest me to use ZDB(http://cuddletech.com/blog/?p= =3D407) for that, I'm
still study this utility now,   = I also noticed there is patch to add SEEK_HOLE/SEEK_DATA support to os = module in Python community,  please refer to:
but it require very latest python build I = think,  so could anyone give some other advices in this = point?

The patch is shown as following, any = help testing and comments are = appreciated!!


Thanks,
-J= eff


From: Jie Liu <jeff.liu@oracle.com>
= Date: Thu, 17 Feb 2011 21:14:23 +0800
Subject: [PATCH 1/1] = copy: add SEEK_DATA/SEEK_HOLE support to extent_scan = module

* src/extent_scan.h: add src_total_size = to struct extent_info, we need
  to check the = SEEK_HOLE result against it for initial extent = scan.
  modify the extent_scan_init() signature, to = add size_t src_total_size.
* src/extent_scan.c: implement a = new extent_scan_read() through SEEK_DATA
  and = SEEK_HOLE.
* src/copy.c: pass src_total_size to = extent_scan_init().

Signed-off-by: Jie Liu = <jeff.liu@oracle.com>
= ---
 src/copy.c        |   =  2 +-
 src/extent-scan.c |  113 = ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 src/= extent-scan.h |    9 +++-
 3 files changed, 120 = insertions(+), 4 deletions(-)

diff --git = a/src/copy.c b/src/copy.c
index 104652d..22b9911 = 100644
--- a/src/copy.c
+++ = b/src/copy.c
@@ -306,7 +306,7 @@ extent_copy (int src_fd, int = dest_fd, char *buf, size_t buf_size,
     =  We may need this at the end, for a final ftruncate. =  */
   off_t dest_pos =3D = 0;
 
-  extent_scan_init (src_fd, = &scan);
+  extent_scan_init (src_fd, src_total_size, = &scan);
 
   *require_normal_copy = =3D false;
   bool wrote_hole_at_eof =3D = true;
diff --git a/src/extent-scan.c = b/src/extent-scan.c
index 1ba59db..ffeab7a = 100644
--- a/src/extent-scan.c
+++ = b/src/extent-scan.c
@@ -32,13 +32,17 @@
 /* = Allocate space for struct extent_scan, initialize the entries = if
    necessary and return it as the input = argument of extent_scan_read().  */
 extern = void
-extent_scan_init (int src_fd, struct extent_scan = *scan)
+extent_scan_init (int src_fd, size_t = src_total_size,
+             =      struct extent_scan = *scan)
 {
   scan->fd =3D = src_fd;
   scan->ei_count =3D = 0;
   scan->scan_start =3D = 0;
   scan->initial_scan_failed =3D = false;
   scan->hit_final_extent =3D = false;
+#if defined(SEEK_HOLE) && = defined(SEEK_DATA)
+  scan->src_total_size =3D = src_total_size;
+#endif
 }
 
=
 #ifdef __linux__
@@ -106,6 +110,113 @@ = extent_scan_read (struct extent_scan = *scan)
 
   return = true;
 }
+#elif defined(SEEK_HOLE) && = defined(SEEK_DATA)
+extern bool
+extent_scan_read = (struct extent_scan *scan)
+{
+  off_t = data_pos, hole_pos;
+  union { struct extent_info ei; = char c[4096]; } extent_buf;
+  struct extent_info = *ext_info =3D &extent_buf.ei;
+  enum { count =3D = (sizeof extent_buf / sizeof *ext_info) };
+  verify = (count !=3D 0);
+
+  memset (&extent_buf, = 0, sizeof extent_buf);
+
+  if = (scan->scan_start =3D=3D 0)
+    {
+# = ifdef _PC_MIN_HOLE_SIZE
+      /* To determine = if the underlaying file system support
+       =   SEEK_HOLE, if not, fall back to the standard copy. =  */
+      if (fpathconf (scan->fd, = _PC_MIN_HOLE_SIZE) < 0)
+       =  {
+         =  scan->initial_scan_failed =3D true;
+     =      return false;
+       =  }
+# endif
+
+     =  /* If we have been compiled on an OS that supports = SEEK_HOLE
+         but run on an OS that = does not support SEEK_HOLE, we get
+       =   EINVAL.  If the underlying filesystem does not support = the
+         SEEK_HOLE call, we get = ENOTSUP, fall back to standard copy
+       =   in either case.  */
+      hole_pos = =3D lseek (scan->fd, (off_t) 0, SEEK_HOLE);
+     =  if (hole_pos < 0)
+       =  {
+          if (errno =3D=3D = EINVAL || errno =3D=3D ENOTSUP)
+         =    scan->initial_scan_failed =3D true;
+   =        return false;
+     =    }
+
+      /* Seek back = to position 0 first if we detected a real hole.  */
+ =      if (hole_pos > 0)
+     =    {
+          off_t = tmp_pos;
+          tmp_pos =3D lseek = (scan->fd, (off_t) 0, SEEK_SET);
+       =    if (tmp_pos !=3D (off_t) 0)
+     =          return false;
+
+ =          /* The source file is definitely not a = sparse file, or it
+             = maybe a sparse file but SEEK_HOLE returns the source file's
+ =             total size, fall back to the = standard copy too.  */
+         =  if (hole_pos >=3D scan->src_total_size)
+   =          {
+       =        scan->initial_scan_failed =3D = true;
+              return = false;
+           =  }
+        }
+   =  }
+
+  unsigned int i =3D 0;
+ =  /* If lseek(2) failed and the errno is set to ENXIO, = for
+     SEEK_DATA there are no more data regions = past the supplied
+     offset.  For SEEK_HOLE, = there are no more holes past the 
+     = supplied offset.  Set scan->hit_final_extent to true = for
+     either case.  */
+  do = {
+    data_pos =3D lseek (scan->fd, = scan->scan_start, SEEK_DATA);
+    if (data_pos = < 0)
+      {
+     =    if (errno !=3D ENXIO)
+       =    return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+    hole_pos =3D lseek = (scan->fd, data_pos, SEEK_HOLE);
+    if = (hole_pos < 0)
+      {
+   =      if (errno !=3D ENXIO)
+     =      return false;
+       =  else
+          {
+ =            scan->hit_final_extent =3D = true;
+            return = true;
+          }
+   =    }
+
+   =  ext_info[i].ext_logical =3D data_pos;
+   =  ext_info[i].ext_length =3D hole_pos - data_pos;
+   =  scan->scan_start =3D hole_pos;
+   =  ++i;
+  } while (scan->scan_start < = scan->src_total_size && i < = count);
+
+  i--;
+ =  scan->ei_count =3D i;
+  scan->ext_info =3D = xnmalloc (scan->ei_count, sizeof (struct = extent_info));
+
+  for (i =3D 0; i < = scan->ei_count; i++)
+    {
+   =    assert (ext_info[i].ext_logical <=3D = OFF_T_MAX);
+
+     =  scan->ext_info[i].ext_logical =3D = ext_info[i].ext_logical;
+     =  scan->ext_info[i].ext_length =3D = ext_info[i].ext_length;
+   =  }
+
+  return = true; 
+}
 #else
 extern = bool
 extent_scan_read (struct extent_scan *scan = ATTRIBUTE_UNUSED)
diff --git a/src/extent-scan.h = b/src/extent-scan.h
index 4724b25..a271b95 = 100644
--- a/src/extent-scan.h
+++ = b/src/extent-scan.h
@@ -18,7 +18,6 = @@
 
 #ifndef = EXTENT_SCAN_H
 # define = EXTENT_SCAN_H
-
 /* Structure used to store = information of each extent.  */
 struct = extent_info
 {
@@ -38,6 +37,11 @@ struct = extent_scan
   /* File descriptor of extent scan run = against.  */
   int = fd;
 
+#if defined(SEEK_DATA) && = defined(SEEK_HOLE)
+  /* Source file size, i.e, (struct = stat) &statbuf.st_size.  */
+  size_t = src_total_size;
+#endif
+
   /* = Next scan start offset.  */
   off_t = scan_start;
 
@@ -55,7 +59,8 @@ struct = extent_scan
   struct extent_info = *ext_info;
 };
 
-void = extent_scan_init (int src_fd, struct extent_scan *scan);
+void = extent_scan_init (int src_fd, size_t src_total_size,
+   =                     = struct extent_scan *scan);
 
 bool = extent_scan_read (struct extent_scan = *scan);
 
-- 
1.7.4




= --Apple-Mail-1-625867617-- From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 19 05:04:03 2011 Received: (at submit) by debbugs.gnu.org; 19 Apr 2011 09:04:03 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6qo-0004eM-RU for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:04:02 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QC6qn-0004du-0l for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:04:01 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6qg-0006t0-Qk for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:03:55 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:41876) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qg-0006sw-P5 for submit@debbugs.gnu.org; Tue, 19 Apr 2011 05:03:54 -0400 Received: from eggs.gnu.org ([140.186.70.92]:35769) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qc-0000cA-CL for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QC6qX-0006rM-Mt for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:50 -0400 Received: from mx.meyering.net ([82.230.74.64]:40748) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QC6qX-0006rA-Cm for bug-coreutils@gnu.org; Tue, 19 Apr 2011 05:03:45 -0400 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id B3F9C60113; Tue, 19 Apr 2011 11:03:44 +0200 (CEST) From: Jim Meyering To: Jeff liu Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module In-Reply-To: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> (Jeff liu's message of "Tue, 19 Apr 2011 16:51:38 +0800") References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> Date: Tue, 19 Apr 2011 11:03:44 +0200 Message-ID: <877haq1ukv.fsf@rho.meyering.net> Lines: 11 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -5.9 (-----) X-Debbugs-Envelope-To: submit Cc: =?iso-8859-1?Q?P=E1draig?= Brady , bug-coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.9 (-----) Jeff liu wrote: ... > Below is the revised patch, > > From 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 > From: Jie Liu > Date: Tue, 19 Apr 2011 15:24:50 -0700 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module Thank you for the update. I will look at it in a week or so if no one else does. From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 26 05:46:39 2011 Received: (at submit) by debbugs.gnu.org; 26 Aug 2011 09:46:39 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qwszm-0001D4-0v for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:46:39 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qwszh-0001Cv-Lx for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:46:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Qwsx1-0005mZ-I9 for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:43:49 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:47007) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwsx1-0005mV-Fo for submit@debbugs.gnu.org; Fri, 26 Aug 2011 05:43:47 -0400 Received: from eggs.gnu.org ([140.186.70.92]:37994) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwswz-0005Cg-0o for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Qwsww-0005le-GX for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:44 -0400 Received: from acsinet15.oracle.com ([141.146.126.227]:53320) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Qwsww-0005lG-3i for bug-coreutils@gnu.org; Fri, 26 Aug 2011 05:43:42 -0400 Received: from rtcsinet21.oracle.com (rtcsinet21.oracle.com [66.248.204.29]) by acsinet15.oracle.com (Switch-3.4.4/Switch-3.4.4) with ESMTP id p7Q9haUr007904 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Aug 2011 09:43:38 GMT Received: from acsmt357.oracle.com (acsmt357.oracle.com [141.146.40.157]) by rtcsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id p7Q9hZvP012634 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 26 Aug 2011 09:43:35 GMT Received: from abhmt108.oracle.com (abhmt108.oracle.com [141.146.116.60]) by acsmt357.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id p7Q9hTgu010063; Fri, 26 Aug 2011 04:43:29 -0500 Received: from [192.168.1.102] (/123.119.98.207) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 26 Aug 2011 02:43:28 -0700 Message-ID: <4E576ABC.4080705@oracle.com> Date: Fri, 26 Aug 2011 17:43:24 +0800 From: Jeff Liu Organization: Oracle User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11 MIME-Version: 1.0 To: bug-coreutils@gnu.org Subject: Re: bug#8061: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module References: <2DB776C1-EF34-423D-8BE5-71C2F49DFF01@oracle.com> <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> In-Reply-To: <3E3FBE56-4D89-44A4-94ED-13F3D1F693A3@oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Source-IP: rtcsinet21.oracle.com [66.248.204.29] X-CT-RefId: str=0001.0A090207.4E576ACA.020F,ss=1,re=-2.300,fgs=0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 1) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: submit Cc: zfs-discuss@opensolaris.org, chris.mason@oracle.com, linux-btrfs@vger.kernel.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list Reply-To: jeff.liu@oracle.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) Dear All, As the SEEK_HOLE/SEEK_DATA has been implemented on Btrfs in 3.1.0+ and Glibc, I have worked out a new version for your guys review. Changes: ====== extent_scan.[c|h]: 1. add a function pointer to "struct extent_scan": /* Scan method. */ bool (*extent_scan) (struct extent_scan *scan); 2. add a structure item to indicate seek back issue maybe occurred: /* Failed to seek back to position 0 or not. */ bool seek_back_failed; If the file system support SEEK_HOLE, the file offset will pointed to somewhere > 0, so need to seek back to the beginning after support_seek_hole() checking for the proceeding extent scan. 3. rename extent_scan to fiemap_extent_scan. 4. add a new seek_extent_scan method. 5. add a new method to check SEEK stuff is supported or not. if the underlaying file system support SEEK_HOLE, assign seek_extent_scan to scan->extent_scan, or else, fiemap_extent_scan() will be assigned to it. copy.c: 1. pass src_total_size to extent_scan_init (). 2. for the first round extent scan, we need to seek back to position 0 too, if the data extent is started at the beginning of source file. Tested: ====== 1. make syntax-check. 2. verify a copied sparse file with 4697 extents on btrfs jeff@pibroch:~/gnu/coreutils$ python -c "f=open('/btrfs/sparse_test', 'w'); [(f.seek(x) or f.write(str(x))) for x in range(1, 1000000000, 99999)]; f.close()" jeff@pibroch:~/gnu/coreutils$ ./src/cp --sparse=always /btrfs/sparse_test /btrfs/sp.seek jeff@pibroch:~/gnu/coreutils$ cmp /btrfs/sparse_test /btrfs/sp.seek jeff@pibroch:~/gnu/coreutils$ echo $? 0 Also, the previous patch was developed on Solaris ZFS, but my test env was lost now. :( so anyone can help testing it on ZFS would be appreciated!! From 5892744f977a06b5557042682c39fd007eec8030 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Fri, 26 Aug 2011 17:11:33 +0800 Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module * src/extent_scan.h: introduce src_total_size to struct extent_info, we need it for lseek(2) iteration, add seek_back_failed to indicate that the seek back to position 0 failed in seek captical check or not, and it can be used for further debugging IMHO. add bool (*extent_scan) (struct extent_scan *scan) to switch the scan method. * src/extent_scan.c: implement a new seek_scan_read() through SEEK_DATA and SEEK_HOLE. * src/copy.c: a few code changes according to the new extent call interface. Signed-off-by: Jie Liu --- src/copy.c | 26 +++++++++- src/extent-scan.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++-- src/extent-scan.h | 16 +++++- 3 files changed, 183 insertions(+), 8 deletions(-) diff --git a/src/copy.c b/src/copy.c index bc4d7bd..c5e8714 100644 --- a/src/copy.c +++ b/src/copy.c @@ -309,7 +309,18 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos = 0; - extent_scan_init (src_fd, &scan); + bool init_ok = extent_scan_init (src_fd, src_total_size, &scan); + /* If the underlaying file system support SEEK_HOLE, but failed + to seek back to position 0 after the initial seek checking, + let extent copy failure in this case. */ + if (! init_ok) + { + if (scan.seek_back_failed) + error (0, errno, + _("%s: extent_scan_init () failed, cannot seek back to position 0"), + quote (src_name)); + return false; + } *require_normal_copy = false; bool wrote_hole_at_eof = true; @@ -356,6 +367,19 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, wrote_hole_at_eof = false; + /* For the first round scan, if the data extent start at the + beginning, and the current file pointer is not at position + 0, set it back first, otherwise, we'll read from undesired + file offset. */ + if (ext_start == 0 && lseek (src_fd, 0, SEEK_CUR) != 0) + { + if (lseek (src_fd, 0, SEEK_SET) < 0) + { + error (0, errno, _("cannot lseek %s"), quote (src_name)); + return false; + } + } + if (hole_size) { if (lseek (src_fd, ext_start, SEEK_SET) < 0) diff --git a/src/extent-scan.c b/src/extent-scan.c index 37445b8..c835b63 100644 --- a/src/extent-scan.c +++ b/src/extent-scan.c @@ -27,6 +27,12 @@ #include "fiemap.h" #include "xstrtol.h" +#ifndef SEEK_DATA +# define SEEK_DATA 3 /* Seek to next data. */ +#endif +#ifndef SEEK_HOLE +# define SEEK_HOLE 4 /* Seek to next hole. */ +#endif /* Work around Linux kernel issues on BTRFS and EXT4 before 2.6.39. FIXME: remove in 2013, or whenever we're pretty confident @@ -65,10 +71,48 @@ extent_need_sync (void) #endif } +static bool +support_seek_hole (struct extent_scan *scan) +{ + off_t hole_pos; + +# ifdef _PC_MIN_HOLE_SIZE + /* To determine if the underlaying file system support + SEEK_HOLE, if not, fall back to fiemap extent scan or + the standard copy. */ + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE) < 0) + return false; +# endif + + /* Inspired by STAR, If we have been compiled on an OS that + supports SEEK_HOLE but run on an OS that does not support + SEEK_HOLE, we get EINVAL. If the underlying file system + does not support the SEEK_HOLE call, we get ENOTSUP, fall + back to the fiemap scan or standard copy in either case. */ + hole_pos = lseek (scan->fd, (off_t) 0, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno == EINVAL || errno == ENOTSUP) + return false; + } + + /* Seek back to position 0 first if we detected a real hole. */ + if (hole_pos > 0) + { + if (lseek (scan->fd, (off_t) 0, SEEK_SET) != 0) + { + scan->seek_back_failed = true; + return false; + } + } + + return true; +} + /* Allocate space for struct extent_scan, initialize the entries if necessary and return it as the input argument of extent_scan_read(). */ -extern void -extent_scan_init (int src_fd, struct extent_scan *scan) +extern bool +extent_scan_init (int src_fd, size_t src_total_size, struct extent_scan *scan) { scan->fd = src_fd; scan->ei_count = 0; @@ -76,17 +120,110 @@ extent_scan_init (int src_fd, struct extent_scan *scan) scan->scan_start = 0; scan->initial_scan_failed = false; scan->hit_final_extent = false; - scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; + scan->seek_back_failed = false; + + if (support_seek_hole (scan)) + { + scan->src_total_size = src_total_size; + scan->extent_scan = seek_extent_scan; + } + else + { + /* The underlying file system support SEEK_HOLE, but failed + to seek back to position 0 after seek checking, Oops! */ + if (scan->seek_back_failed) + return false; + + scan->extent_scan = fiemap_extent_scan; + scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; + } + + return true; +} + +extern inline bool +extent_scan_read (struct extent_scan *scan) +{ + return scan->extent_scan (scan); +} + +extern bool +seek_extent_scan (struct extent_scan *scan) +{ + off_t data_pos, hole_pos; + union { struct extent_info ei; char c[4096]; } extent_buf; + struct extent_info *ext_info = &extent_buf.ei; + enum { count = (sizeof extent_buf / sizeof *ext_info) }; + verify (count != 0); + + memset (&extent_buf, 0, sizeof extent_buf); + + unsigned int i = 0; + /* If lseek(2) failed and the errno is set to ENXIO, for + SEEK_DATA there are no more data regions past the supplied + offset. For SEEK_HOLE, there are no more holes past the + supplied offset. Set scan->hit_final_extent to true for + either case. */ + do { + data_pos = lseek (scan->fd, scan->scan_start, SEEK_DATA); + if (data_pos < 0) + { + if (errno == ENXIO) + { + scan->hit_final_extent = true; + return true; + } + return false; + } + + /* We hit the final extent if the data offset is equal to + the source file size. */ + if (data_pos == scan->src_total_size) + { + scan->hit_final_extent = true; + break; + } + + hole_pos = lseek (scan->fd, data_pos, SEEK_HOLE); + if (hole_pos < 0) + { + if (errno != ENXIO) + return false; + else + { + scan->hit_final_extent = true; + return true; + } + } + + ext_info[i].ext_logical = data_pos; + ext_info[i].ext_length = hole_pos - data_pos; + scan->scan_start = hole_pos; + ++i; + } while (scan->scan_start < scan->src_total_size && i < count); + + scan->ei_count = i; + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); + + for (i = 0; i < scan->ei_count; i++) + { + assert (ext_info[i].ext_logical <= OFF_T_MAX); + + scan->ext_info[i].ext_logical = ext_info[i].ext_logical; + scan->ext_info[i].ext_length = ext_info[i].ext_length; + } + + return true; } -#ifdef __linux__ +#if defined __linux__ # ifndef FS_IOC_FIEMAP # define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap) # endif /* Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to obtain a map of file extents excluding holes. */ extern bool -extent_scan_read (struct extent_scan *scan) +fiemap_extent_scan (struct extent_scan *scan) { unsigned int si = 0; struct extent_info *last_ei IF_LINT ( = scan->ext_info); @@ -212,7 +349,7 @@ extent_scan_read (struct extent_scan *scan) } #else extern bool -extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) +fiemap_extent_scan (struct extent_scan *scan ATTRIBUTE_UNUSED) { scan->initial_scan_failed = true; errno = ENOTSUP; diff --git a/src/extent-scan.h b/src/extent-scan.h index 5b4ded5..e751810 100644 --- a/src/extent-scan.h +++ b/src/extent-scan.h @@ -38,6 +38,9 @@ struct extent_scan /* File descriptor of extent scan run against. */ int fd; + /* Source file size, i.e, (struct stat) &statbuf.st_size. */ + size_t src_total_size; + /* Next scan start offset. */ off_t scan_start; @@ -47,6 +50,9 @@ struct extent_scan /* How many extent info returned for a scan. */ uint32_t ei_count; + /* Failed to seek back to position 0 or not. */ + bool seek_back_failed; + /* If true, fall back to a normal copy, either set by the failure of ioctl(2) for FIEMAP or lseek(2) with SEEK_DATA. */ bool initial_scan_failed; @@ -54,14 +60,22 @@ struct extent_scan /* If true, the total extent scan per file has been finished. */ bool hit_final_extent; + /* Scan method. */ + bool (*extent_scan) (struct extent_scan *scan); + /* Extent information: a malloc'd array of ei_count structs. */ struct extent_info *ext_info; }; -void extent_scan_init (int src_fd, struct extent_scan *scan); +bool extent_scan_init (int src_fd, size_t src_total_size, + struct extent_scan *scan); bool extent_scan_read (struct extent_scan *scan); +bool fiemap_extent_scan (struct extent_scan *scan); + +bool seek_extent_scan (struct extent_scan *scan); + static inline void extent_scan_free (struct extent_scan *scan) { -- 1.7.4.1 Thanks, -Jeff On 04/19/2011 04:51 PM, Jeff liu wrote: >> Hi All, >> >> Please ignore the current patch, I will submit another patch with a few fixes soon. > Now the new patch set coming, > > In previous post, I have tried to change the extent_scan_init() interface by adding a new argument to indicate the source file size, > this will reduce the overhead of call fstat(2) in extent_scan_read(), since the file size is definitely needed for SEEK* stuff, however, the file size is redundant for FIEMAP. > so I changed my idea to keep extent_scan_init() as before, instead, to retrieve the file size in extent_scan_read() when launching the first scan, one benefit is, there is nothing need to > be modified in extent_copy() for this patch. > > Tests: > ==== > A new test sparse-lseek was introduced in this post, it make use of the sparse file generation function in Perl, and do `cmp` against the target copied file. > I have also took a look at the `sdb` utility shipped with ZFS, but did not found any interesting stuff can be used for this test. > > Test run passed on my environment as below, > > bash-3.00# make check TESTS=cp/sparse-lseek VERBOSE=yes > make check-TESTS > make[1]: Entering directory `/coreutils/tests' > make[2]: Entering directory `/coreutils/tests' > PASS: cp/sparse-lseek > ============= > 1 test passed > ============= > make[2]: Leaving directory `/coreutils/tests' > make[1]: Leaving directory `/coreutils/tests' > GEN vc_exe_in_TESTS > No differences encountered > > Manual tests: > =========== > 1. Ensure trailing blanks, test 0 size sparse file, non-sparse file, sparse file with hole start and hole end. > 2. make syntax-check failed, I have no idea of this issue at the moment, I also tried to run make distcheck, looks the package building, install and uninstall procedures all passed, > but it also failed at the final stage, am I missing something here? > > The logs which were shown as following, > bash-3.00# make syntax-check > GFDL_version > awk: syntax error near line 1 > awk: bailing out near line 1 > make: *** [sc_GFDL_version.z] Error 2 > > make distcheck: > ============== > ...... > make[1]: Entering directory `/coreutils' > GEN check-ls-dircolors > make my-distcheck > make[2]: Entering directory `/coreutils' > make syntax-check > make[3]: Entering directory `/coreutils' > GFDL_version > awk: syntax error near line 1 > awk: bailing out near line 1 > make[3]: *** [sc_GFDL_version.z] Error 2 > make[3]: Leaving directory `/coreutils' > make[2]: *** [my-distcheck] Error 2 > make[2]: Leaving directory `/coreutils' > make[1]: *** [distcheck-hook] Error 2 > make[1]: Leaving directory `/coreutils' > make: *** [distcheck] Error 1 > > > > Below is the revised patch, > > From 4f966c1fe6226f3f711faae120cd8bea78e722b8 Mon Sep 17 00:00:00 2001 > From: Jie Liu > Date: Tue, 19 Apr 2011 15:24:50 -0700 > Subject: [PATCH 1/1] copy: add SEEK_DATA/SEEK_HOLE support to extent_scan module > > * src/extent_scan.h: introduce src_total_size to struct extent_info, we > need it for lseek(2) iteration. > * src/extent_scan.c: implement a new extent_scan_read() through SEEK_DATA > and SEEK_HOLE if those stuff are supported. > * tests/cp/sparse-lseek: add a new test for lseek(2) extent copy. > > Signed-off-by: Jie Liu > --- > src/extent-scan.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++ > src/extent-scan.h | 5 ++ > tests/Makefile.am | 1 + > tests/cp/sparse-lseek | 56 +++++++++++++++++++++++ > 4 files changed, 181 insertions(+), 0 deletions(-) > create mode 100755 tests/cp/sparse-lseek > > diff --git a/src/extent-scan.c b/src/extent-scan.c > index da7eb9d..a54eca0 100644 > --- a/src/extent-scan.c > +++ b/src/extent-scan.c > @@ -17,7 +17,9 @@ > Written by Jie Liu (jeff.liu@oracle.com). */ > > #include > +#include > #include > +#include > #include > #include > #include > @@ -71,6 +73,9 @@ extent_scan_init (int src_fd, struct extent_scan *scan) > scan->initial_scan_failed = false; > scan->hit_final_extent = false; > scan->fm_flags = extent_need_sync () ? FIEMAP_FLAG_SYNC : 0; > +#if defined (SEEK_DATA)&& defined (SEEK_HOLE) > + scan->src_total_size = 0; > +#endif > } > > #ifdef __linux__ > @@ -204,6 +209,120 @@ extent_scan_read (struct extent_scan *scan) > > return true; > } > +#elif defined (SEEK_HOLE)&& defined (SEEK_DATA) > +extern bool > +extent_scan_read (struct extent_scan *scan) > +{ > + off_t data_pos, hole_pos; > + union { struct extent_info ei; char c[4096]; } extent_buf; > + struct extent_info *ext_info =&extent_buf.ei; > + enum { count = (sizeof extent_buf / sizeof *ext_info) }; > + verify (count != 0); > + > + memset (&extent_buf, 0, sizeof extent_buf); > + > + if (scan->scan_start == 0) > + { > +# ifdef _PC_MIN_HOLE_SIZE > + /* To determine if the underlaying file system support > + SEEK_HOLE. If not, fall back to the standard copy. */ > + if (fpathconf (scan->fd, _PC_MIN_HOLE_SIZE)< 0) > + { > + scan->initial_scan_failed = true; > + return false; > + } > +# endif > + > + /* If we have been compiled on an OS that supports SEEK_HOLE > + but run on an OS that does not support SEEK_HOLE, we get > + EINVAL. If the underlying file system does not support the > + SEEK_HOLE call, we get ENOTSUP, setting initial_scan_failed > + to true to fall back to the standard copy in either case. */ > + hole_pos = lseek (scan->fd, (off_t) 0, SEEK_HOLE); > + if (hole_pos< 0) > + { > + if (errno == EINVAL || errno == ENOTSUP) > + scan->initial_scan_failed = true; > + return false; > + } > + > + /* Seek back to position 0 first. */ > + if (hole_pos> 0) > + { > + if (lseek (scan->fd, (off_t) 0, SEEK_SET)< 0) > + return false; > + } > + > + struct stat sb; > + if (fstat (scan->fd,&sb)< 0) > + return false; > + > + /* This is definitely not a sparse file, we treat it as a big extent. */ > + if (hole_pos>= sb.st_size) > + { > + scan->ei_count = 1; > + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); > + scan->ext_info[0].ext_logical = 0; > + scan->ext_info[0].ext_length = sb.st_size; > + scan->hit_final_extent = true; > + return true; > + } > + scan->src_total_size = sb.st_size; > + } > + > + unsigned int i = 0; > + /* If lseek(2) failed and the errno is set to ENXIO, for > + SEEK_DATA there are no more data regions past the supplied > + offset. For SEEK_HOLE, there are no more holes past the > + supplied offset. Set scan->hit_final_extent to true in > + either case. */ > + while (scan->scan_start< scan->src_total_size&& i< count) > + { > + data_pos = lseek (scan->fd, scan->scan_start, SEEK_DATA); > + if (data_pos< 0) > + { > + if (errno == ENXIO) > + { > + scan->hit_final_extent = true; > + break; > + } > + return false; > + } > + > + hole_pos = lseek (scan->fd, data_pos, SEEK_HOLE); > + if (hole_pos< 0) > + { > + if (errno == ENXIO) > + { > + scan->hit_final_extent = true; > + hole_pos = scan->src_total_size; > + if (data_pos< hole_pos) > + goto preserve_ext_info; > + break; > + } > + return false; > + } > + > +preserve_ext_info: > + ext_info[i].ext_logical = data_pos; > + ext_info[i].ext_length = hole_pos - data_pos; > + scan->scan_start = hole_pos; > + ++i; > + } > + > + scan->ei_count = i; > + scan->ext_info = xnmalloc (scan->ei_count, sizeof (struct extent_info)); > + > + for (i = 0; i< scan->ei_count; i++) > + { > + assert (ext_info[i].ext_logical<= OFF_T_MAX); > + > + scan->ext_info[i].ext_logical = ext_info[i].ext_logical; > + scan->ext_info[i].ext_length = ext_info[i].ext_length; > + } > + > + return (lseek (scan->fd, (off_t) 0, SEEK_SET)< 0) ? false : true; > +} > #else > extern bool > extent_scan_read (struct extent_scan *scan ATTRIBUTE_UNUSED) > diff --git a/src/extent-scan.h b/src/extent-scan.h > index 5b4ded5..4fc05c6 100644 > --- a/src/extent-scan.h > +++ b/src/extent-scan.h > @@ -38,6 +38,11 @@ struct extent_scan > /* File descriptor of extent scan run against. */ > int fd; > > +# if defined (SEEK_DATA)&& defined (SEEK_HOLE) > + /* Source file size, i.e, (struct stat)&statbuf.st_size. */ > + size_t src_total_size; > +#endif > + > /* Next scan start offset. */ > off_t scan_start; > > diff --git a/tests/Makefile.am b/tests/Makefile.am > index 685eb52..6c596b9 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -28,6 +28,7 @@ root_tests = \ > cp/cp-mv-enotsup-xattr \ > cp/capability \ > cp/sparse-fiemap \ > + cp/sparse-lseek \ > dd/skip-seek-past-dev \ > install/install-C-root \ > ls/capability \ > diff --git a/tests/cp/sparse-lseek b/tests/cp/sparse-lseek > new file mode 100755 > index 0000000..5b8f2c1 > --- /dev/null > +++ b/tests/cp/sparse-lseek > @@ -0,0 +1,56 @@ > +#!/bin/sh > +# Test cp --sparse=always through lseek(SEEK_DATA/SEEK_HOLE) copy > + > +# Copyright (C) 2010-2011 Free Software Foundation, Inc. > + > +# This program is free software: you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation, either version 3 of the License, or > +# (at your option) any later version. > + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see. > + > +. "${srcdir=.}/init.sh"; path_prepend_ ../src > +print_ver_ cp > +$PERL -e 1 || skip_test_ 'you lack perl' > + > +zfsdisk=diskX > +zfspool=seektest > + > +require_root_ > + > +cwd=$PWD > +cleanup_() { zpool destroy $zfspool; } > + > +skip=0 > +mkfile 128m "$cwd/$zfsdisk" || skip=1 > + > +# Check if the seektest pool is already exists > +zpool list $zfspool 2>/dev/null&& > + skip_test_ "$zfspool already exists" > + > +# Create pool and verify if it is mounted automatically > +zpool create $zfspool "$cwd/$zfsdisk" || skip=1 > +zpool list $zfspool>/dev/null || skip=1 > + > +test $skip = 1&& skip_test_ "insufficient ZFS support" > + > +for i in $(seq 1 2 21); do > + for j in 1 2 31 100; do > + $PERL -e 'BEGIN { $n = '$i' * 1024; *F = *STDOUT }' \ > + -e 'for (1..'$j') { sysseek (*F, $n, 1)' \ > + -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'> /$zfspool/j1 || fail=1 > + > + cp --sparse=always /$zfspool/j1 /$zfspool/j2 || fail=1 > + cmp /$zfspool/j1 /$zfspool/j2 || fail=1 > + test $fail = 1&& break 2 > + done > +done > + > +Exit $fail From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 25 22:15:35 2020 Received: (at 8061-done) by debbugs.gnu.org; 26 Jun 2020 02:15:35 +0000 Received: from localhost ([127.0.0.1]:41631 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jodtu-0003dg-77 for submit@debbugs.gnu.org; Thu, 25 Jun 2020 22:15:35 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:57692) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jodtq-0003dQ-P8 for 8061-done@debbugs.gnu.org; Thu, 25 Jun 2020 22:15:32 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6D0711600D1; Thu, 25 Jun 2020 19:15:24 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id Hfq_u2YJ3r_2; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C3CB61600E0; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 0K2r3MmKJcyD; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 75C441600D1; Thu, 25 Jun 2020 19:15:21 -0700 (PDT) To: Jeff liu From: Paul Eggert Subject: Re: Introduce SEEK_DATA/SEEK_HOLE to extent_scan module Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDUGdRVEFRSUFL QVVDVElCeQpaQUliQXdVSkVzd0RBQVlMQ1FnSEF3SUdGUWdDQ1FvTEJCWUNBd0VDSGdFQ0Y0 QUFDZ2tRN1pmcERtS3FmalJSCkd3LytJajAzZGhZZllsL2dYVlJpdXpWMWdHcmJIayt0bmZy SS9DN2ZBZW9GelE1dFZnVmluU2hhUGtabzBIVFAKZjE4eDZJREVkQWlPOE1xbzF5cDBDdEht ekdNQ0o1MG80R3JnZmpscjZnLyt2dEVPS2JobGVzek4yWHBKdnB3TQoyUWdHdm4vbGFUTFV1 OFBIOWFSV1RzN3FKSlpLS0tBYjRzeFljOTJGZWhQdTZGT0QwZERpeWhsREFxNGxPVjJtCmRC cHpRYmlvam9aelFMTVF3anBnQ1RLMjU3MmVLOUVPRVF5U1VUaFhyU0l6NkFTZW5wNE5ZVEZI czl0dUpRdlgKazlnWkRkUFNsM2JwKzQ3ZEd4bHhFV0xwQklNN3pJT053NGtzNGF6Z1Q4bnZE WnhBNUlaSHR2cUJsSkxCT2JZWQowTGU2MVdwMHkzVGxCRGgycWRLOGVZTDQyNlc0c2NFTVN1 aWc1Z2I4T0F0UWlCVzZrMnNHVXh4ZWl2OG92V3U4CllBWmdLSmZ1b1dJK3VSbk1FZGRydVk4 SnNvTTU0S2FLdlppa2tLczJiZzFuZHRMVnpIcEo2cUZaQzdRVmplSFUKaDYvQm1ndmRqV1Ba WUZUdE4rS0E5Q1dYM0dRS0tnTjN1dTk4OHl6bkQ3TG5COThUNEVVSDFIQS9HbmZCcU1WMQpn cHpUdlBjNHFWUWluQ21Ja0VGcDgzemwrRzVmQ2pKSjNXN2l2ekNuWW80S2hLTHBGVW05N29r VEtSMkxXM3haCnpFVzRjTFNXTzM4N01USzNDekRPeDVxZTZzNGE5MVp1Wk0vai9UUWRUTERh cU5uODNrQTRIcTQ4VUhYWXhjSWgKK05kOGsvM3c2bEZ1b0swd3JPRml5d2pMeCswdXI1am1t YmVjQkdIYzF4ZGhBRkc1QWcwRVRJQnlaQUVRQUthRgo2NzhUOXd5SDR3alRyVjFQejNjREVv U25WLzBaVXJPVDM3cDFkY0d5ai9JWHExeDY3MEhSVmFoQW1rMHNacFljCjI1UEY5RDVHUFlI RldsTmp1UFU5NnJEbmRYQjNoZWRtQlJoTGRDNGJBWGpJNERWK2JtZFZlK3EvSU1ubFpSYVYK bG05RWlNQ1ZBUjZ3MTNzUmV1N3FYa1c5cjNSd1kyQXpYc2twL3RBZTRCUktyMVptYnZpMm5i blE2ZXBFQzQycgpSYngwQjFFaGpiSVFaNUpIR2syNGlQVDdMZEJnbk5tb3M1d1lqendObGtN UUQ1VDBZZHpoazdKK1V4d0E1bTQ2Cm1PaFJEQzJyRlYvQTBnbTVUTHk4RFhqdi9Fc2M0Z1lu WWFpNlNRcW5VRVZoNUx1VjhZQ0pCbmlqcytUaXc3MXgKMWljbW42eEdJNDVFdWdKT2dlYyty THlwWWdwVnA0eDBISTVUODhxQlJZQ2t4SDNLZzhRbytFV05BOUE0TFJROQpEWDhuam9uYTBn ZjBzMDN0b2NLOGtCTjY2VW9xcVB0SEJuYzRlTWdCeW1DZmxLMTJlS2ZkMllZeG55ZzljWmF6 CldBNVZzbHZUeHBtNzZoYmc1b2lBRUgvVmcvOE14SHlBblBoZnJnd3lQcm1KRWNWQmFmZHNw Sm5ZUXhCWU5jbzIKTEZQSWhsT3ZXaDhyNGF0K3MrTTNMYjI2b1VUY3psZ2RXMVNmM1NEQTc3 Qk1SbkYwRlF5RSs3QXpWNzlNQk40eQpraXFhZXpReHRhRjFGeS90dmtoZmZTbzh1K2R3RzBF Z0poK3RlMzhnVGNJU1ZyMEdJUHBsTHo2WWhqcmJIclBSCkYxQ041VXVMOURCR2p4dU4zNVJM TlZFZnRhNlJVRmxSNk5jdFRqdnJBQkVCQUFHSkFpVUVHQUVDQUE4RkFreUEKY21RQ0d3d0ZD UkxNQXdBQUNna1E3WmZwRG1LcWZqU3JIQS8rS3pBS3ZUeFJoQTlNV05MeEl5SjdTNXVKMTZn cwpUM29DalpyQktHRWhLTU9HWDRPMEdBNlZPRXJ5TzdRUkNDWWFoM294U0czOElBbk5laXdK WGdVOUJ6a2s4NVVHCmJQRWQ3SEdGL1ZTZUhDUXdXb3U2anFVRFRTRHZuOVloTlRkRzBLWFBN NzRhQyt4cjJab3cxTzJtaFhpaGdXS0QKMER3KzBMWVBuVU9zUTBLT0Z4SFhYWUhtUnJTMU9a UFU1OUJMdmMrVFJoSWhhZlNIS0x3YlhLKzZja2t4Qng2aAo4ejVjY3BHMFFzNGJGaGRGWW5G ckVpZURMb0dtbkUyWUxoZFY2c3dKOVZOQ1M2cExpRW9oVDNmbTdhWG0xNXRaCk9JeXpNWmhI UlNBUGJsWHhRMFpTV2pxOG9ScmNZTkZ4YzRXMVVScEFrQkNPWUpvWHZRZkQ1TDNscUFsOFRD cUQKVXpZeGhIL3RKaGJEZEhycUhINzY3amFEYVRCMStUYWxwLzJBTUt3Y1hOT2Rpa2xHeGJt SFZHNllHbDZnOExyYgpzdTlOWkVJNHlMbEh6dWlrdGhKV2d6KzN2WmhWR3lObHQrSE5Jb0Y2 Q2pETDJvbXU1Y0VxNFJESE00NFFxUGs2Cmw3TzBwVXZOMW1UNEIrUzFiMDhSS3BxbS9mZjAx NUUzN0hOVi9waUl2Smx4R0FZejhQU2Z1R0NCMXRoTVlxbG0KZ2RoZDkvQmFiR0ZiR0dZSEE2 VTQvVDV6cVUrZjZ4SHkxU3NBUVoxTVNLbEx3ZWtCSVQrNC9jTFJHcUNIam5WMApxNUgvVDZh N3Q1bVBrYnpTck9MU280cHVqK0lUb05qWXlZSURCV3pobEExOWF2T2ErcnZVam1IdEQzc0ZO N2NYCld0a0dvaThidU5jYnk0VT0KPUFMNm8KLS0tLS1FTkQgUEdQIFBVQkxJQyBLRVkgQkxP Q0stLS0tLQo= Organization: UCLA Computer Science Department Message-ID: <3bc2236c-4e41-7f55-5a63-f4221d38041b@cs.ucla.edu> Date: Thu, 25 Jun 2020 19:15:21 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------C013573179FBFEFA6D62E7D4" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 8061-done Cc: =?UTF-8?Q?P=c3=a1draig_Brady?= , Jim Meyering , 8061-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------C013573179FBFEFA6D62E7D4 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit This email is follow up to dated 2011-05-01. Jeff, thanks for reporting the problem. (There's a good chance this email will bounce but I'll send it to your 2011 email address anyway.) I recently ran into the same issue and derived the attached patches independently. I then found your bug report, made sure the attached patches fixed every problem that your proposal did, and installed the attached patches into Savannah. The attached patches 1-3 merely fix typos and refactor. Patch 4 corresponds to your proposal; however, it differs in that its basic idea is to use the FIEMAP code only as a fallback if SEEK_DATA doesn't work, rather than try to add to the already-too-complicated code that fiddles with FIEMAPs. (I don't observe any significant performance advantage to the FIEMAP stuff, but maybe that's just me.) Patch 5 adds opportunistic use of the copy_file_range syscall introduced in Linux kernel 4.5 (2016) and reworked in 5.3 (2019). This should improve 'cp' performance on kernels and file systems that support copy_file_range. --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0001-maint-typo-fix.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-maint-typo-fix.patch" >From 4fe5259ab6c9e459a6db5938d143a9c65be113d9 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 18:10:49 -0700 Subject: [PATCH 1/5] maint: typo fix * NEWS: Fix typo. --- NEWS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS b/NEWS index d36259641..d713fa724 100644 --- a/NEWS +++ b/NEWS @@ -17,7 +17,7 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. - On GNU/Linux systems, ls no longer issues an error message on + On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change that was made in release 8.32. -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0002-cp-refactor-extent_copy.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-cp-refactor-extent_copy.patch" >From 51981008f9892d44231c432535deac4f9b3cbe5e Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 23 Jun 2020 19:18:04 -0700 Subject: [PATCH 2/5] cp: refactor extent_copy * src/copy.c (extent_copy): New arg SCAN, replacing REQUIRE_NORMAL_COPY. All callers changed. (enum scantype): New type. (infer_scantype): Rename from is_probably_sparse and return the new type. Add args FD and SCAN. All callers changed. --- src/copy.c | 119 +++++++++++++++++++++++++---------------------------- 1 file changed, 55 insertions(+), 64 deletions(-) diff --git a/src/copy.c b/src/copy.c index 54601ce07..f694f913f 100644 --- a/src/copy.c +++ b/src/copy.c @@ -422,9 +422,8 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, size_t hole_size, off_t src_total_size, enum Sparse_type sparse_mode, char const *src_name, char const *dst_name, - bool *require_normal_copy) + struct extent_scan *scan) { - struct extent_scan scan; off_t last_ext_start = 0; off_t last_ext_len = 0; @@ -432,45 +431,25 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, We may need this at the end, for a final ftruncate. */ off_t dest_pos = 0; - extent_scan_init (src_fd, &scan); - - *require_normal_copy = false; bool wrote_hole_at_eof = true; - do + while (true) { - bool ok = extent_scan_read (&scan); - if (! ok) - { - if (scan.hit_final_extent) - break; - - if (scan.initial_scan_failed) - { - *require_normal_copy = true; - return false; - } - - error (0, errno, _("%s: failed to get extents info"), - quotef (src_name)); - return false; - } - bool empty_extent = false; - for (unsigned int i = 0; i < scan.ei_count || empty_extent; i++) + for (unsigned int i = 0; i < scan->ei_count || empty_extent; i++) { off_t ext_start; off_t ext_len; off_t ext_hole_size; - if (i < scan.ei_count) + if (i < scan->ei_count) { - ext_start = scan.ext_info[i].ext_logical; - ext_len = scan.ext_info[i].ext_length; + ext_start = scan->ext_info[i].ext_logical; + ext_len = scan->ext_info[i].ext_length; } else /* empty extent at EOF. */ { i--; - ext_start = last_ext_start + scan.ext_info[i].ext_length; + ext_start = last_ext_start + scan->ext_info[i].ext_length; ext_len = 0; } @@ -498,7 +477,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); fail: - extent_scan_free (&scan); + extent_scan_free (scan); return false; } @@ -539,7 +518,7 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, /* For now, do not treat FIEMAP_EXTENT_UNWRITTEN specially, because that (in combination with no sync) would lead to data loss at least on XFS and ext4 when using 2.6.39-rc3 kernels. */ - if (0 && (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)) + if (0 && (scan->ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)) { empty_extent = true; last_ext_len = 0; @@ -571,16 +550,23 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, extents beyond the apparent size. */ if (dest_pos == src_total_size) { - scan.hit_final_extent = true; + scan->hit_final_extent = true; break; } } /* Release the space allocated to scan->ext_info. */ - extent_scan_free (&scan); + extent_scan_free (scan); + if (scan->hit_final_extent) + break; + if (! extent_scan_read (scan) && ! scan->hit_final_extent) + { + error (0, errno, _("%s: failed to get extents info"), + quotef (src_name)); + return false; + } } - while (! scan.hit_final_extent); /* When the source file ends with a hole, we have to do a little more work, since the above copied only up to and including the final extent. @@ -1021,16 +1007,35 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode) # define HAVE_STRUCT_STAT_ST_BLOCKS 0 #endif +/* Type of scan being done on the input when looking for sparseness. */ +enum scantype + { + /* No fancy scanning; just read and write. */ + PLAIN_SCANTYPE, + + /* Read and examine data looking for zero blocks; useful when + attempting to create sparse output. */ + ZERO_SCANTYPE, + + /* Extent information is available. */ + EXTENT_SCANTYPE + }; + /* Use a heuristic to determine whether stat buffer SB comes from a file with sparse blocks. If the file has fewer blocks than would normally be needed for a file of its size, then at least one of the blocks in the file is a hole. In that case, return true. */ -static bool -is_probably_sparse (struct stat const *sb) +static enum scantype +infer_scantype (int fd, struct stat const *sb, struct extent_scan *scan) { - return (HAVE_STRUCT_STAT_ST_BLOCKS - && S_ISREG (sb->st_mode) - && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE); + if (! (HAVE_STRUCT_STAT_ST_BLOCKS + && S_ISREG (sb->st_mode) + && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE)) + return PLAIN_SCANTYPE; + + extent_scan_init (fd, scan); + extent_scan_read (scan); + return scan->initial_scan_failed ? ZERO_SCANTYPE : EXTENT_SCANTYPE; } @@ -1061,6 +1066,7 @@ copy_reg (char const *src_name, char const *dst_name, mode_t src_mode = src_sb->st_mode; struct stat sb; struct stat src_open_sb; + struct extent_scan scan; bool return_val = true; bool data_copy_required = x->data_copy_required; @@ -1260,23 +1266,13 @@ copy_reg (char const *src_name, char const *dst_name, fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); /* Deal with sparse files. */ - bool make_holes = false; - bool sparse_src = is_probably_sparse (&src_open_sb); - - if (S_ISREG (sb.st_mode)) - { - /* Even with --sparse=always, try to create holes only - if the destination is a regular file. */ - if (x->sparse_mode == SPARSE_ALWAYS) - make_holes = true; - - /* Use a heuristic to determine whether SRC_NAME contains any sparse - blocks. If the file has fewer blocks than would normally be - needed for a file of its size, then at least one of the blocks in - the file is a hole. */ - if (x->sparse_mode == SPARSE_AUTO && sparse_src) - make_holes = true; - } + enum scantype scantype = infer_scantype (source_desc, &src_open_sb, + &scan); + bool make_holes + = (S_ISREG (sb.st_mode) + && (x->sparse_mode == SPARSE_ALWAYS + || (x->sparse_mode == SPARSE_AUTO + && scantype != PLAIN_SCANTYPE))); /* If not making a sparse file, try to use a more-efficient buffer size. */ @@ -1305,10 +1301,8 @@ copy_reg (char const *src_name, char const *dst_name, buf_alloc = xmalloc (buf_size + buf_alignment); buf = ptr_align (buf_alloc, buf_alignment); - if (sparse_src) + if (scantype == EXTENT_SCANTYPE) { - bool normal_copy_required; - /* Perform an efficient extent-based copy, falling back to the standard copy only if the initial extent scan fails. If the '--sparse=never' option is specified, write all data but use @@ -1316,14 +1310,11 @@ copy_reg (char const *src_name, char const *dst_name, if (extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, src_open_sb.st_size, make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &normal_copy_required)) + src_name, dst_name, &scan)) goto preserve_metadata; - if (! normal_copy_required) - { - return_val = false; - goto close_src_and_dst_desc; - } + return_val = false; + goto close_src_and_dst_desc; } off_t n_read; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0003-cp-avoid-copy_reg-goto.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0003-cp-avoid-copy_reg-goto.patch" >From 35a55effa81c5f4f8fe7414e8aede0acccee470f Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 24 Jun 2020 17:05:20 -0700 Subject: [PATCH 3/5] cp: avoid copy_reg goto * src/copy.c (copy_reg): Redo to avoid label and goto. --- src/copy.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) diff --git a/src/copy.c b/src/copy.c index f694f913f..b382cfa4d 100644 --- a/src/copy.c +++ b/src/copy.c @@ -1301,29 +1301,18 @@ copy_reg (char const *src_name, char const *dst_name, buf_alloc = xmalloc (buf_size + buf_alignment); buf = ptr_align (buf_alloc, buf_alignment); - if (scantype == EXTENT_SCANTYPE) - { - /* Perform an efficient extent-based copy, falling back to the - standard copy only if the initial extent scan fails. If the - '--sparse=never' option is specified, write all data but use - any extents to read more efficiently. */ - if (extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, - src_open_sb.st_size, - make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &scan)) - goto preserve_metadata; - - return_val = false; - goto close_src_and_dst_desc; - } - off_t n_read; - bool wrote_hole_at_eof; - if (! sparse_copy (source_desc, dest_desc, buf, buf_size, - make_holes ? hole_size : 0, - x->sparse_mode == SPARSE_ALWAYS, src_name, dst_name, - UINTMAX_MAX, &n_read, - &wrote_hole_at_eof)) + bool wrote_hole_at_eof = false; + if (! (scantype == EXTENT_SCANTYPE + ? extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, + src_open_sb.st_size, + make_holes ? x->sparse_mode : SPARSE_NEVER, + src_name, dst_name, &scan) + : sparse_copy (source_desc, dest_desc, buf, buf_size, + make_holes ? hole_size : 0, + x->sparse_mode == SPARSE_ALWAYS, + src_name, dst_name, UINTMAX_MAX, &n_read, + &wrote_hole_at_eof))) { return_val = false; goto close_src_and_dst_desc; @@ -1336,7 +1325,6 @@ copy_reg (char const *src_name, char const *dst_name, } } -preserve_metadata: if (x->preserve_timestamps) { struct timespec timespec[2]; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0004-cp-use-SEEK_DATA-SEEK_HOLE-if-available.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0004-cp-use-SEEK_DATA-SEEK_HOLE-if-available.patch" >From 6fc41029fff6e955a87312d8f5f967f0d01e390c Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 16:31:44 -0700 Subject: [PATCH 4/5] cp: use SEEK_DATA/SEEK_HOLE if available If it works, prefer lseek with SEEK_DATA and SEEK_HOLE to FIEMAP, as lseek is simpler and more portable (will be in next POSIX). Problem reported in 2011 by Jeff Liu (Bug#8061). * NEWS: Mention this. * src/copy.c (lseek_copy) [SEEK_HOLE]: New function. (enum scantype): New constants ERROR_SCANTYPE, LSEEK_SCANTYPE. (union scan_inference): New type. (infer_scantype): Last arg is now union scan_inference *, not struct extent_scan *. All callers changed. Prefer SEEK_HOLE to FIEMAP if both work, since SEEK_HOLE is simpler and more portable. (copy_reg): Do the fdadvise after initial scan, in case the scan fails. Report an error if the initial scan fails. (copy_reg) [SEEK_HOLE]: Use lseek_copy if scantype says so. --- NEWS | 3 + src/copy.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++--- 2 files changed, 201 insertions(+), 11 deletions(-) diff --git a/NEWS b/NEWS index d713fa724..63cb47d10 100644 --- a/NEWS +++ b/NEWS @@ -17,6 +17,9 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. + cp, install and mv now prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP + on sparse files, as lseek is simpler and more portable. + On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change that was made in release 8.32. diff --git a/src/copy.c b/src/copy.c index b382cfa4d..d88f8cf93 100644 --- a/src/copy.c +++ b/src/copy.c @@ -416,7 +416,12 @@ write_zeros (int fd, off_t n_bytes) Upon a successful copy, return true. If the initial extent scan fails, set *NORMAL_COPY_REQUIRED to true and return false. Upon any other failure, set *NORMAL_COPY_REQUIRED to false and - return false. */ + return false. + + FIXME: Once we no longer need to support Linux kernel versions + before 3.1 (2011), this function can be retired as it is superseded + by lseek_copy. That is, we no longer need extent-scan.h and can + remove any of the code that uses it. */ static bool extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, size_t hole_size, off_t src_total_size, @@ -595,6 +600,150 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, return true; } +#ifdef SEEK_HOLE +/* Perform an efficient extent copy, if possible. This avoids + the overhead of detecting holes in hole-introducing/preserving + copy, and thus makes copying sparse files much more efficient. + Copy from SRC_FD to DEST_FD, using BUF (of size BUF_SIZE) for a buffer. + Look for holes of size HOLE_SIZE in the input. + The input file is of size SRC_TOTAL_SIZE. + Use SPARSE_MODE to determine whether to create holes in the output. + SRC_NAME and DST_NAME are the input and output file names. + Return true if successful, false (with a diagnostic) otherwise. */ + +static bool +lseek_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, + size_t hole_size, off_t ext_start, off_t src_total_size, + enum Sparse_type sparse_mode, + char const *src_name, char const *dst_name) +{ + off_t last_ext_start = 0; + off_t last_ext_len = 0; + off_t dest_pos = 0; + bool wrote_hole_at_eof = true; + + while (0 <= ext_start) + { + off_t ext_end = lseek (src_fd, ext_start, SEEK_HOLE); + if (ext_end < 0) + { + if (errno != ENXIO) + goto cannot_lseek; + ext_end = src_total_size; + if (ext_end <= ext_start) + { + /* The input file grew; get its current size. */ + src_total_size = lseek (src_fd, 0, SEEK_END); + if (src_total_size < 0) + goto cannot_lseek; + + /* If the input file shrank after growing, stop copying. */ + if (src_total_size <= ext_start) + break; + + ext_end = src_total_size; + } + } + /* If the input file must have grown, increase its measured size. */ + if (src_total_size < ext_end) + src_total_size = ext_end; + + if (lseek (src_fd, ext_start, SEEK_SET) < 0) + goto cannot_lseek; + + wrote_hole_at_eof = false; + off_t ext_hole_size = ext_start - last_ext_start - last_ext_len; + + if (ext_hole_size) + { + if (sparse_mode != SPARSE_NEVER) + { + if (! create_hole (dest_fd, dst_name, + sparse_mode == SPARSE_ALWAYS, + ext_hole_size)) + return false; + wrote_hole_at_eof = true; + } + else + { + /* When not inducing holes and when there is a hole between + the end of the previous extent and the beginning of the + current one, write zeros to the destination file. */ + if (! write_zeros (dest_fd, ext_hole_size)) + { + error (0, errno, _("%s: write failed"), + quotef (dst_name)); + return false; + } + } + } + + off_t ext_len = ext_end - ext_start; + last_ext_start = ext_start; + last_ext_len = ext_len; + + /* Copy this extent, looking for further opportunities to not + bother to write zeros unless --sparse=never, since SEEK_HOLE + is conservative and may miss some holes. */ + off_t n_read; + bool read_hole; + if ( ! sparse_copy (src_fd, dest_fd, buf, buf_size, + sparse_mode == SPARSE_NEVER ? 0 : hole_size, + true, src_name, dst_name, ext_len, &n_read, + &read_hole)) + return false; + + dest_pos = ext_start + n_read; + if (n_read) + wrote_hole_at_eof = read_hole; + if (n_read < ext_len) + { + /* The input file shrank. */ + src_total_size = dest_pos; + break; + } + + ext_start = lseek (src_fd, dest_pos, SEEK_DATA); + if (ext_start < 0) + { + if (errno != ENXIO) + goto cannot_lseek; + break; + } + } + + /* When the source file ends with a hole, we have to do a little more work, + since the above copied only up to and including the final extent. + In order to complete the copy, we may have to insert a hole or write + zeros in the destination corresponding to the source file's hole-at-EOF. + + In addition, if the final extent was a block of zeros at EOF and we've + just converted them to a hole in the destination, we must call ftruncate + here in order to record the proper length in the destination. */ + if ((dest_pos < src_total_size || wrote_hole_at_eof) + && ! (sparse_mode == SPARSE_NEVER + ? write_zeros (dest_fd, src_total_size - dest_pos) + : ftruncate (dest_fd, src_total_size) == 0)) + { + error (0, errno, _("failed to extend %s"), quoteaf (dst_name)); + return false; + } + + if (sparse_mode == SPARSE_ALWAYS && dest_pos < src_total_size + && punch_hole (dest_fd, dest_pos, src_total_size - dest_pos) < 0) + { + error (0, errno, _("error deallocating %s"), quoteaf (dst_name)); + return false; + } + + return true; + + cannot_lseek: + error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); + return false; +} +#endif + /* FIXME: describe */ /* FIXME: rewrite this to use a hash table so we avoid the quadratic performance hit that's probably noticeable only on trees deeper @@ -1010,6 +1159,9 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode) /* Type of scan being done on the input when looking for sparseness. */ enum scantype { + /* An error was found when determining scantype. */ + ERROR_SCANTYPE, + /* No fancy scanning; just read and write. */ PLAIN_SCANTYPE, @@ -1017,22 +1169,44 @@ enum scantype attempting to create sparse output. */ ZERO_SCANTYPE, + /* lseek information is available. */ + LSEEK_SCANTYPE, + /* Extent information is available. */ EXTENT_SCANTYPE }; -/* Use a heuristic to determine whether stat buffer SB comes from a file - with sparse blocks. If the file has fewer blocks than would normally - be needed for a file of its size, then at least one of the blocks in - the file is a hole. In that case, return true. */ +/* Result of infer_scantype. */ +union scan_inference +{ + /* Used if infer_scantype returns LSEEK_SCANTYPE. This is the + offset of the first data block, or -1 if the file has no data. */ + off_t ext_start; + + /* Used if infer_scantype returns EXTENT_SCANTYPE. */ + struct extent_scan extent_scan; +}; + +/* Return how to scan a file with descriptor FD and stat buffer SB. + Store any information gathered into *SCAN. */ static enum scantype -infer_scantype (int fd, struct stat const *sb, struct extent_scan *scan) +infer_scantype (int fd, struct stat const *sb, + union scan_inference *scan_inference) { if (! (HAVE_STRUCT_STAT_ST_BLOCKS && S_ISREG (sb->st_mode) && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE)) return PLAIN_SCANTYPE; +#ifdef SEEK_HOLE + scan_inference->ext_start = lseek (fd, 0, SEEK_DATA); + if (0 <= scan_inference->ext_start) + return LSEEK_SCANTYPE; + else if (errno != EINVAL && errno != ENOTSUP) + return errno == ENXIO ? LSEEK_SCANTYPE : ERROR_SCANTYPE; +#endif + + struct extent_scan *scan = &scan_inference->extent_scan; extent_scan_init (fd, scan); extent_scan_read (scan); return scan->initial_scan_failed ? ZERO_SCANTYPE : EXTENT_SCANTYPE; @@ -1066,7 +1240,7 @@ copy_reg (char const *src_name, char const *dst_name, mode_t src_mode = src_sb->st_mode; struct stat sb; struct stat src_open_sb; - struct extent_scan scan; + union scan_inference scan_inference; bool return_val = true; bool data_copy_required = x->data_copy_required; @@ -1263,17 +1437,23 @@ copy_reg (char const *src_name, char const *dst_name, size_t buf_size = io_blksize (sb); size_t hole_size = ST_BLKSIZE (sb); - fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); - /* Deal with sparse files. */ enum scantype scantype = infer_scantype (source_desc, &src_open_sb, - &scan); + &scan_inference); + if (scantype == ERROR_SCANTYPE) + { + error (0, errno, _("cannot lseek %s"), quoteaf (src_name)); + return_val = false; + goto close_src_and_dst_desc; + } bool make_holes = (S_ISREG (sb.st_mode) && (x->sparse_mode == SPARSE_ALWAYS || (x->sparse_mode == SPARSE_AUTO && scantype != PLAIN_SCANTYPE))); + fdadvise (source_desc, 0, 0, FADVISE_SEQUENTIAL); + /* If not making a sparse file, try to use a more-efficient buffer size. */ if (! make_holes) @@ -1307,7 +1487,14 @@ copy_reg (char const *src_name, char const *dst_name, ? extent_copy (source_desc, dest_desc, buf, buf_size, hole_size, src_open_sb.st_size, make_holes ? x->sparse_mode : SPARSE_NEVER, - src_name, dst_name, &scan) + src_name, dst_name, &scan_inference.extent_scan) +#ifdef SEEK_HOLE + : scantype == LSEEK_SCANTYPE + ? lseek_copy (source_desc, dest_desc, buf, buf_size, hole_size, + scan_inference.ext_start, src_open_sb.st_size, + make_holes ? x->sparse_mode : SPARSE_NEVER, + src_name, dst_name) +#endif : sparse_copy (source_desc, dest_desc, buf, buf_size, make_holes ? hole_size : 0, x->sparse_mode == SPARSE_ALWAYS, -- 2.25.4 --------------C013573179FBFEFA6D62E7D4 Content-Type: text/x-patch; charset=UTF-8; name="0005-cp-use-copy_file_range-if-available.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0005-cp-use-copy_file_range-if-available.patch" >From 81849f05b94068f873299bd9fc01b4eebdb31f64 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 25 Jun 2020 17:34:23 -0700 Subject: [PATCH 5/5] cp: use copy_file_range if available * NEWS: Mention this. * bootstrap.conf (gnulib_modules): Add copy-file-range. * src/copy.c (sparse_copy): Try copy_file_range if not looking for holes. --- NEWS | 5 +++-- bootstrap.conf | 1 + src/copy.c | 40 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 63cb47d10..1c3f6378d 100644 --- a/NEWS +++ b/NEWS @@ -17,8 +17,9 @@ GNU coreutils NEWS -*- outline -*- cp and install now default to copy-on-write (COW) if available. - cp, install and mv now prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP - on sparse files, as lseek is simpler and more portable. + cp, install and mv now use the copy_file_range syscall if available. + Also, they prefer lseek+SEEK_HOLE to ioctl+FS_IOC_FIEMAP on sparse + files, as lseek is simpler and more portable. On GNU/Linux systems, ls no longer issues an error message on a directory merely because it was removed. This reverts a change diff --git a/bootstrap.conf b/bootstrap.conf index 12e2d831a..2506f0db4 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -54,6 +54,7 @@ gnulib_modules=" closeout config-h configmake + copy-file-range crypto/md5 crypto/sha1 crypto/sha256 diff --git a/src/copy.c b/src/copy.c index d88f8cf93..4050f6953 100644 --- a/src/copy.c +++ b/src/copy.c @@ -265,6 +265,46 @@ sparse_copy (int src_fd, int dest_fd, char *buf, size_t buf_size, { *last_write_made_hole = false; *total_n_read = 0; + + /* If not looking for holes, use copy_file_range if available. */ + if (!hole_size) + while (max_n_read) + { + /* Copy at most COPY_MAX bytes at a time; this is min + (PTRDIFF_MAX, SIZE_MAX) truncated to a value that is + surely aligned well. */ + ssize_t ssize_max = TYPE_MAXIMUM (ssize_t); + ptrdiff_t copy_max = MIN (ssize_max, SIZE_MAX) >> 30 << 30; + ssize_t n_copied = copy_file_range (src_fd, NULL, dest_fd, NULL, + MIN (max_n_read, copy_max), 0); + if (n_copied == 0) + { + /* copy_file_range incorrectly returns 0 when reading from + the proc file system on the Linux kernel through at + least 5.6.19 (2020), so fall back on 'read' if the + input file seems empty. */ + if (*total_n_read == 0) + break; + return true; + } + if (n_copied < 0) + { + if (errno == ENOSYS || errno == EINVAL + || errno == EBADF || errno == EXDEV) + break; + if (errno == EINTR) + n_copied = 0; + else + { + error (0, errno, _("error copying %s to %s"), + quoteaf_n (0, src_name), quoteaf_n (1, dst_name)); + return false; + } + } + max_n_read -= n_copied; + *total_n_read += n_copied; + } + bool make_hole = false; off_t psize = 0; -- 2.25.4 --------------C013573179FBFEFA6D62E7D4-- From unknown Sat Jul 12 03:30:10 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Fri, 24 Jul 2020 11:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator