GNU bug report logs - #9157
[PATCH] dd: sparse conv flag

Previous Next

Package: coreutils;

Reported by: Roman Rybalko <devel <at> romanr.info>

Date: Sat, 23 Jul 2011 22:42:04 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9157 in the body.
You can then email your comments to 9157 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, help-debbugs <at> gnu.org:
bug#9157; Package debbugs.gnu.org. (Sat, 23 Jul 2011 22:42:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Roman Rybalko <devel <at> romanr.info>:
New bug report received and forwarded. Copy sent to help-debbugs <at> gnu.org. (Sat, 23 Jul 2011 22:42:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Roman Rybalko <devel <at> romanr.info>
To: undisclosed-recipients:;
Subject: [PATCH] dd: sparse conv flag
Date: Sat, 23 Jul 2011 20:08:01 +0400
---
 doc/coreutils.texi |    4 +++
 src/dd.c           |   22 +++++++++++++++-
 tests/Makefile.am  |    1 +
 tests/dd/sparse    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 1 deletions(-)
 create mode 100755 tests/dd/sparse

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 424446c..761c698 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes.
 When used with @samp{block} or @samp{unblock}, pad with spaces instead of
 zero bytes.
 
+@item sparse
+@opindex sparse
+Make sparse output file.
+
 @end table
 
 The following ``conversions'' are really file flags
diff --git a/src/dd.c b/src/dd.c
index 0824f6c..49847f5 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -126,7 +126,8 @@ enum
     C_NOCREAT = 010000,
     C_EXCL = 020000,
     C_FDATASYNC = 040000,
-    C_FSYNC = 0100000
+    C_FSYNC = 0100000,
+    C_SPARSE = 0200000
   };
 
 /* Status bit masks.  */
@@ -268,6 +269,7 @@ static struct symbol_value const conversions[] =
   {"sync", C_SYNC},		/* Pad input records to ibs with NULs. */
   {"fdatasync", C_FDATASYNC},	/* Synchronize output data before finishing.  */
   {"fsync", C_FSYNC},		/* Also synchronize output metadata.  */
+  {"sparse", C_SPARSE},		/* Make sparse output file. */
   {"", 0}
 };
 
@@ -533,6 +535,9 @@ Each CONV symbol may be:\n\
   fsync     likewise, but also write metadata\n\
 "), stdout);
       fputs (_("\
+  sparse    make sparse output file\n\
+"), stdout);
+      fputs (_("\
 \n\
 Each FLAG symbol may be:\n\
 \n\
@@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size)
     {
       ssize_t nwritten;
       process_signals ();
+      if (conversions_mask & C_SPARSE)
+        {
+          off_t seek_size = 0;
+          while (buf[total_written + seek_size] == 0)
+            ++seek_size;
+          if (seek_size)
+            {
+              off_t cur_off = 0;
+              cur_off = lseek(fd, seek_size, SEEK_CUR);
+              if (cur_off < 0)
+                break;
+              total_written += seek_size;
+              continue;
+            }
+        }
       nwritten = write (fd, buf + total_written, size - total_written);
       if (nwritten < 0)
         {
diff --git a/tests/Makefile.am b/tests/Makefile.am
index ebd1b11..0f1376a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -364,6 +364,7 @@ TESTS =						\
   dd/skip-seek					\
   dd/skip-seek2					\
   dd/skip-seek-past-file			\
+  dd/sparse                                     \
   dd/stderr					\
   dd/unblock					\
   dd/unblock-sync				\
diff --git a/tests/dd/sparse b/tests/dd/sparse
new file mode 100755
index 0000000..f0e0806
--- /dev/null
+++ b/tests/dd/sparse
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Ensure that dd conv=sparse works.
+
+# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ dd
+
+# sometimes we may read less than 1M
+dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c %s sample0`" = "1048576" ] || fail=1
+dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c %s sample1`" = "1048576" ] || fail=1
+
+# test 1
+dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1
+dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1
+
+# test 1-1
+dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1
+
+# test 1-2
+dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1
+
+# test 1-3
+dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1
+
+# test 2
+dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1
+
+# test 2-1
+dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1
+
+# test 2-2
+dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1
+
+# test 2-3
+dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1
+
+Exit $fail
-- 
1.7.0.4






Information forwarded to owner <at> debbugs.gnu.org, help-debbugs <at> gnu.org:
bug#9157; Package debbugs.gnu.org. (Sun, 24 Jul 2011 18:07:02 GMT) Full text and rfc822 format available.

Message #8 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Roman Rybalko <devel <at> romanr.info>
To: undisclosed-recipients:;
Subject: [PATCH] dd: sparse conv flag
Date: Sat, 23 Jul 2011 20:08:01 +0400
---
 doc/coreutils.texi |    4 +++
 src/dd.c           |   22 +++++++++++++++-
 tests/Makefile.am  |    1 +
 tests/dd/sparse    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+), 1 deletions(-)
 create mode 100755 tests/dd/sparse

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 424446c..761c698 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes.
 When used with @samp{block} or @samp{unblock}, pad with spaces instead of
 zero bytes.
 
+@item sparse
+@opindex sparse
+Make sparse output file.
+
 @end table
 
 The following ``conversions'' are really file flags
diff --git a/src/dd.c b/src/dd.c
index 0824f6c..0393740 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -126,7 +126,8 @@ enum
     C_NOCREAT = 010000,
     C_EXCL = 020000,
     C_FDATASYNC = 040000,
-    C_FSYNC = 0100000
+    C_FSYNC = 0100000,
+    C_SPARSE = 0200000
   };
 
 /* Status bit masks.  */
@@ -268,6 +269,7 @@ static struct symbol_value const conversions[] =
   {"sync", C_SYNC},		/* Pad input records to ibs with NULs. */
   {"fdatasync", C_FDATASYNC},	/* Synchronize output data before finishing.  */
   {"fsync", C_FSYNC},		/* Also synchronize output metadata.  */
+  {"sparse", C_SPARSE},		/* Make sparse output file. */
   {"", 0}
 };
 
@@ -533,6 +535,9 @@ Each CONV symbol may be:\n\
   fsync     likewise, but also write metadata\n\
 "), stdout);
       fputs (_("\
+  sparse    make sparse output file\n\
+"), stdout);
+      fputs (_("\
 \n\
 Each FLAG symbol may be:\n\
 \n\
@@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size)
     {
       ssize_t nwritten;
       process_signals ();
+      if (conversions_mask & C_SPARSE)
+        {
+          off_t seek_size = 0;
+          while (total_written + seek_size < size && buf[total_written + seek_size] == 0)
+            ++seek_size;
+          if (seek_size)
+            {
+              off_t cur_off = 0;
+              cur_off = lseek(fd, seek_size, SEEK_CUR);
+              if (cur_off < 0)
+                break;
+              total_written += seek_size;
+              continue;
+            }
+        }
       nwritten = write (fd, buf + total_written, size - total_written);
       if (nwritten < 0)
         {
diff --git a/tests/Makefile.am b/tests/Makefile.am
index ebd1b11..0f1376a 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -364,6 +364,7 @@ TESTS =						\
   dd/skip-seek					\
   dd/skip-seek2					\
   dd/skip-seek-past-file			\
+  dd/sparse                                     \
   dd/stderr					\
   dd/unblock					\
   dd/unblock-sync				\
diff --git a/tests/dd/sparse b/tests/dd/sparse
new file mode 100755
index 0000000..f0e0806
--- /dev/null
+++ b/tests/dd/sparse
@@ -0,0 +1,70 @@
+#!/bin/sh
+# Ensure that dd conv=sparse works.
+
+# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+print_ver_ dd
+
+# sometimes we may read less than 1M
+dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c %s sample0`" = "1048576" ] || fail=1
+dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c %s sample1`" = "1048576" ] || fail=1
+
+# test 1
+dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1
+dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1
+
+# test 1-1
+dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1
+
+# test 1-2
+dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1
+
+# test 1-3
+dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1
+
+# test 2
+dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1
+dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1
+
+# test 2-1
+dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1
+
+# test 2-2
+dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1
+
+# test 2-3
+dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
+dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1
+[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1
+
+Exit $fail
-- 
1.7.0.4






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Sat, 06 Aug 2011 20:54:01 GMT) Full text and rfc822 format available.

Message #11 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: 9157 <at> debbugs.gnu.org
Cc: Roman Rybalko <devel <at> romanr.info>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Sat, 06 Aug 2011 16:52:46 -0400
This bug was reported via bcc. Because of this, the bug tracking system
did not know which package it was associated with and assigned it to
debbugs.gnu.org. I have reassigned it to the coreutils package and am
sending this so that the bug-coreutils mailing list sees the report.

See the complete report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9157

Roman Rybalko wrote:

> ---
>  doc/coreutils.texi |    4 +++
>  src/dd.c           |   22 +++++++++++++++-
>  tests/Makefile.am  |    1 +
>  tests/dd/sparse    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 96 insertions(+), 1 deletions(-)
>  create mode 100755 tests/dd/sparse
>
> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index 424446c..761c698 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes.
>  When used with @samp{block} or @samp{unblock}, pad with spaces instead of
>  zero bytes.
>  
> +@item sparse
> +@opindex sparse
> +Make sparse output file.
> +
>  @end table
>  
>  The following ``conversions'' are really file flags
> diff --git a/src/dd.c b/src/dd.c
> index 0824f6c..0393740 100644
> --- a/src/dd.c
> +++ b/src/dd.c
> @@ -126,7 +126,8 @@ enum
>      C_NOCREAT = 010000,
>      C_EXCL = 020000,
>      C_FDATASYNC = 040000,
> -    C_FSYNC = 0100000
> +    C_FSYNC = 0100000,
> +    C_SPARSE = 0200000
>    };
>  
>  /* Status bit masks.  */
> @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] =
>    {"sync", C_SYNC},		/* Pad input records to ibs with NULs. */
>    {"fdatasync", C_FDATASYNC},	/* Synchronize output data before finishing.  */
>    {"fsync", C_FSYNC},		/* Also synchronize output metadata.  */
> +  {"sparse", C_SPARSE},		/* Make sparse output file. */
>    {"", 0}
>  };
>  
> @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\
>    fsync     likewise, but also write metadata\n\
>  "), stdout);
>        fputs (_("\
> +  sparse    make sparse output file\n\
> +"), stdout);
> +      fputs (_("\
>  \n\
>  Each FLAG symbol may be:\n\
>  \n\
> @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size)
>      {
>        ssize_t nwritten;
>        process_signals ();
> +      if (conversions_mask & C_SPARSE)
> +        {
> +          off_t seek_size = 0;
> +          while (total_written + seek_size < size && buf[total_written + seek_size] == 0)
> +            ++seek_size;
> +          if (seek_size)
> +            {
> +              off_t cur_off = 0;
> +              cur_off = lseek(fd, seek_size, SEEK_CUR);
> +              if (cur_off < 0)
> +                break;
> +              total_written += seek_size;
> +              continue;
> +            }
> +        }
>        nwritten = write (fd, buf + total_written, size - total_written);
>        if (nwritten < 0)
>          {
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index ebd1b11..0f1376a 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -364,6 +364,7 @@ TESTS =						\
>    dd/skip-seek					\
>    dd/skip-seek2					\
>    dd/skip-seek-past-file			\
> +  dd/sparse                                     \
>    dd/stderr					\
>    dd/unblock					\
>    dd/unblock-sync				\
> diff --git a/tests/dd/sparse b/tests/dd/sparse
> new file mode 100755
> index 0000000..f0e0806
> --- /dev/null
> +++ b/tests/dd/sparse
> @@ -0,0 +1,70 @@
> +#!/bin/sh
> +# Ensure that dd conv=sparse works.
> +
> +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc.
> +
> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +print_ver_ dd
> +
> +# sometimes we may read less than 1M
> +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c %s sample0`" = "1048576" ] || fail=1
> +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c %s sample1`" = "1048576" ] || fail=1
> +
> +# test 1
> +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1
> +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1
> +
> +# test 1-1
> +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1
> +
> +# test 1-2
> +dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1
> +
> +# test 1-3
> +dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1
> +
> +# test 2
> +dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1
> +dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1
> +
> +# test 2-1
> +dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1
> +
> +# test 2-2
> +dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1
> +
> +# test 2-3
> +dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1
> +
> +Exit $fail




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Sat, 10 Dec 2011 10:11:03 GMT) Full text and rfc822 format available.

Message #14 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Roman Rybalko <devel <at> romanr.info>
Cc: 9157 <at> debbugs.gnu.org
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Sat, 10 Dec 2011 11:09:18 +0100
Roman Rybalko wrote:
> ---
>  doc/coreutils.texi |    4 +++
>  src/dd.c           |   22 +++++++++++++++-
>  tests/Makefile.am  |    1 +
>  tests/dd/sparse    |   70 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 96 insertions(+), 1 deletions(-)
>  create mode 100755 tests/dd/sparse

Thank you for the patch.  It's nice to see accompanying test suite changes.
I see that copyright paperwork is now on file with the FSF, so...

Did you notice that cp now has the capability (using extents)
to preserve holes efficiently on some file system types?
Recent kernels also have SEEK_HOLE/SEEK_DATA support with which
it is more efficient to detect holes.

Here are some things we'll have to consider before
adding a new hole-punching option to dd:

Your patch may create a hole in the destination for each sequence of
length seek_size or greater of zero bytes in the input.
As you may have seen in the cp-related discussion, one may
want different options:
  - preserve a file's hole/non-hole structure
  - efficiently detect existing holes and fill them with explicit zeros in dest
  - efficiently detect existing holes and seek-in-dest for each sequence of
        zeros (longer than some minimum) in non-hole input

> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index 424446c..761c698 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes.
>  When used with @samp{block} or @samp{unblock}, pad with spaces instead of
>  zero bytes.
>
> +@item sparse
> +@opindex sparse
> +Make sparse output file.

Please say a little more here.
I.e., when might a hole be introduced?
When is this option useful?

>  @end table
>
>  The following ``conversions'' are really file flags
> diff --git a/src/dd.c b/src/dd.c
> index 0824f6c..0393740 100644
> --- a/src/dd.c
> +++ b/src/dd.c
> @@ -126,7 +126,8 @@ enum
>      C_NOCREAT = 010000,
>      C_EXCL = 020000,
>      C_FDATASYNC = 040000,
> -    C_FSYNC = 0100000
> +    C_FSYNC = 0100000,
> +    C_SPARSE = 0200000
>    };
>
>  /* Status bit masks.  */
> @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] =
>    {"sync", C_SYNC},		/* Pad input records to ibs with NULs. */
>    {"fdatasync", C_FDATASYNC},	/* Synchronize output data before finishing.  */
>    {"fsync", C_FSYNC},		/* Also synchronize output metadata.  */
> +  {"sparse", C_SPARSE},		/* Make sparse output file. */
>    {"", 0}
>  };
>
> @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\
>    fsync     likewise, but also write metadata\n\
>  "), stdout);
>        fputs (_("\
> +  sparse    make sparse output file\n\
> +"), stdout);
> +      fputs (_("\
>  \n\
>  Each FLAG symbol may be:\n\
>  \n\
> @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size)
>      {
>        ssize_t nwritten;
>        process_signals ();
> +      if (conversions_mask & C_SPARSE)
> +        {
> +          off_t seek_size = 0;
> +          while (total_written + seek_size < size && buf[total_written + seek_size] == 0)
> +            ++seek_size;
> +          if (seek_size)
> +            {
> +              off_t cur_off = 0;
> +              cur_off = lseek(fd, seek_size, SEEK_CUR);
> +              if (cur_off < 0)
> +                break;

dd must not ignore lseek failure.

> +              total_written += seek_size;
> +              continue;
> +            }
> +        }
>        nwritten = write (fd, buf + total_written, size - total_written);
>        if (nwritten < 0)
>          {
> diff --git a/tests/Makefile.am b/tests/Makefile.am
> index ebd1b11..0f1376a 100644
> --- a/tests/Makefile.am
> +++ b/tests/Makefile.am
> @@ -364,6 +364,7 @@ TESTS =						\
>    dd/skip-seek					\
>    dd/skip-seek2					\
>    dd/skip-seek-past-file			\
> +  dd/sparse                                     \
>    dd/stderr					\
>    dd/unblock					\
>    dd/unblock-sync				\
> diff --git a/tests/dd/sparse b/tests/dd/sparse
> new file mode 100755
> index 0000000..f0e0806
> --- /dev/null
> +++ b/tests/dd/sparse
> @@ -0,0 +1,70 @@
> +#!/bin/sh
> +# Ensure that dd conv=sparse works.
> +
> +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc.

Use only 2011 as the copyright year.

> +# This program is free software: you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation, either version 3 of the License, or
> +# (at your option) any later version.
> +
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +print_ver_ dd
> +
> +# sometimes we may read less than 1M
> +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c %s sample0`" = "1048576" ] || fail=1

We'd write that like this instead:
(note use of test, not "[...]", use of $(...), not `...`)

test "$(stat -c %s sample0)" = 1048576 || fail=1

> +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c %s sample1`" = "1048576" ] || fail=1
> +
> +# test 1
> +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1
> +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1
> +
> +# test 1-1
> +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1
> +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1
> +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 12 Dec 2011 00:00:02 GMT) Full text and rfc822 format available.

Message #17 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: "Roman Rybalko (devel)" <devel <at> romanr.info>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9157 <at> debbugs.gnu.org
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 12 Dec 2011 00:02:51 +0400
On 10.12.2011 14:09, Jim Meyering wrote:
> Here are some things we'll have to consider before
> adding a new hole-punching option to dd:
>
> Your patch may create a hole in the destination for each sequence of
> length seek_size or greater of zero bytes in the input.
> As you may have seen in the cp-related discussion, one may
> want different options:
>   - preserve a file's hole/non-hole structure
>   - efficiently detect existing holes and fill them with explicit zeros in dest
>   - efficiently detect existing holes and seek-in-dest for each sequence of
>         zeros (longer than some minimum) in non-hole input
Okay, I'll think about that.
That's a clear task.
>> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
>> index 424446c..761c698 100644
>> --- a/doc/coreutils.texi
>> +++ b/doc/coreutils.texi
>> @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes.
>>  When used with @samp{block} or @samp{unblock}, pad with spaces instead of
>>  zero bytes.
>>
>> +@item sparse
>> +@opindex sparse
>> +Make sparse output file.
> Please say a little more here.
> I.e., when might a hole be introduced?
> When is this option useful?
Okay.
>> @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size)
>>      {
>>        ssize_t nwritten;
>>        process_signals ();
>> +      if (conversions_mask & C_SPARSE)
>> +        {
>> +          off_t seek_size = 0;
>> +          while (total_written + seek_size < size && buf[total_written + seek_size] == 0)
>> +            ++seek_size;
>> +          if (seek_size)
>> +            {
>> +              off_t cur_off = 0;
>> +              cur_off = lseek(fd, seek_size, SEEK_CUR);
>> +              if (cur_off < 0)
>> +                break;
> dd must not ignore lseek failure.
That's a problem for me.
How would be suitable to handle lseek failure?
Perhaps with new kernel API this code may be obsoleted.
>> @@ -0,0 +1,70 @@
>> +#!/bin/sh
>> +# Ensure that dd conv=sparse works.
>> +
>> +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc.
> Use only 2011 as the copyright year.
Okay.
>> +# sometimes we may read less than 1M
>> +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1
>> +[ "`stat -c %s sample0`" = "1048576" ] || fail=1
> We'd write that like this instead:
> (note use of test, not "[...]", use of $(...), not `...`)
>
> test "$(stat -c %s sample0)" = 1048576 || fail=1
Okay.

-- 
WBR,
Roman Rybalko





Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 27 Feb 2012 15:23:01 GMT) Full text and rfc822 format available.

Message #20 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: "Roman Rybalko (devel)" <devel <at> romanr.info>
Cc: 9157 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 27 Feb 2012 15:20:00 +0000
[Message part 1 (text/plain, inline)]
I recently noticed a use case for this feature:
  https://review.openstack.org/#change,4435
But I managed to forget we had a patch pending for this,
and spent a couple of hours writing my own version :p
Well at least it's good practise for reviewing Roman's...

I notice Roman's seeks() the output for any run of NULs,
while mine only considers blocks of the full output block size.
Checking the full block is a bit more CPU efficient,
and gives a bit more control, so I'm marginally leaning
towards doing that?

Also Roman's doesn't handle the case where a seek
is done at the end of the file. In that case an ftruncate()
or write() is needed to correctly set the size.

Notes on my version attached are:
 I first need to refactor is_nul() for use by cp too.
 My version is advisory also
 We may need to coalesce seeks to larger ones? something like cache_round()?
   I thought it better to keep the code simple in this regard though
   as it's probably not of practical concern.
 I used conv= for bsd compat, rather than oflag=.
 Needs tests and docs yet.

cheers,
Pádraig.
[dd-sparse.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 27 Feb 2012 15:54:02 GMT) Full text and rfc822 format available.

Message #23 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 27 Feb 2012 07:50:16 -0800
Thanks for looking into this!  Some nits:

On 02/27/2012 07:20 AM, Pádraig Brady wrote:
> +is_nul (const char* buf, size_t bufsize)

The usual spacing style is "const char *buf", since the
"*" really belongs to the "buf" in C.

> +  memset (obuf + output_blocksize, 1, sizeof (uintptr_t));

This constraint on OUTPUT_BLOCK_SLOP should be documented, thus:

#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)

> +  /* write sentinel to slop after the buffer,
> +     to allow efficient checking for NUL blocks.  */

Initial cap for sentence.  The next comment has a similar issue.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 27 Feb 2012 16:14:02 GMT) Full text and rfc822 format available.

Message #26 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 27 Feb 2012 16:10:23 +0000
On 02/27/2012 03:50 PM, Paul Eggert wrote:
> Thanks for looking into this!  Some nits:
> 
> On 02/27/2012 07:20 AM, Pádraig Brady wrote:
>> +is_nul (const char* buf, size_t bufsize)
> 
> The usual spacing style is "const char *buf", since the
> "*" really belongs to the "buf" in C.
> 
>> +  memset (obuf + output_blocksize, 1, sizeof (uintptr_t));
> 
> This constraint on OUTPUT_BLOCK_SLOP should be documented, thus:
> 
> #define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)
> 
>> +  /* write sentinel to slop after the buffer,
>> +     to allow efficient checking for NUL blocks.  */
> 
> Initial cap for sentence.  The next comment has a similar issue.
> 

Thanks Paul. Adjustments made.

I'm also thinking I should warn once when
the new lseek fails, but only after the fall back
write() succeeds.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 27 Feb 2012 17:18:01 GMT) Full text and rfc822 format available.

Message #29 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 27 Feb 2012 09:14:14 -0800
On 02/27/2012 08:10 AM, Pádraig Brady wrote:
> I'm also thinking I should warn once when
> the new lseek fails, but only after the fall back
> write() succeeds.

I wouldn't bother.  Either diagnostic should suffice,
and warning about the lseek would slow the app down
a bit and make it more complicated.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Mon, 27 Feb 2012 18:10:02 GMT) Full text and rfc822 format available.

Message #32 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157 <at> debbugs.gnu.org
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Mon, 27 Feb 2012 18:06:12 +0000
On 02/27/2012 05:14 PM, Paul Eggert wrote:
> On 02/27/2012 08:10 AM, Pádraig Brady wrote:
>> I'm also thinking I should warn once when
>> the new lseek fails, but only after the fall back
>> write() succeeds.
> 
> I wouldn't bother.  Either diagnostic should suffice,
> and warning about the lseek would slow the app down
> a bit and make it more complicated.

Yep, I meant defer to the write() diagnostic.

What I was wondering was whether to warn about
the case where the seek fails but the write succeeds like here:

  dd conv=sparse bs=1 count=10 | ...

I'll leave it as advisory for now, and not have
a warning for the above case.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 01:14:01 GMT) Full text and rfc822 format available.

Message #35 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, roman.rybalko <at> romanr.info
Cc: 9157 <at> debbugs.gnu.org
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 01:09:51 +0000
On 02/27/2012 06:06 PM, Pádraig Brady wrote:
> On 02/27/2012 05:14 PM, Paul Eggert wrote:
>> On 02/27/2012 08:10 AM, Pádraig Brady wrote:
>>> I'm also thinking I should warn once when
>>> the new lseek fails, but only after the fall back
>>> write() succeeds.
>>
>> I wouldn't bother.  Either diagnostic should suffice,
>> and warning about the lseek would slow the app down
>> a bit and make it more complicated.
> 
> Yep, I meant defer to the write() diagnostic.
> 
> What I was wondering was whether to warn about
> the case where the seek fails but the write succeeds like here:
> 
>   dd conv=sparse bs=1 count=10 | ...
> 
> I'll leave it as advisory for now, and not have
> a warning for the above case.

Another adjustment I'll need to make is how
to handle existing regular files with conv=trunc.
I.E. seeking over existing possible non NUL data.
It's too dangerous/inconsistent to do this for files I think.
With devices we can support auto allocating virtual devices
as mentioned elsewhere in the thread.

We could do any of these for existing files with conv=sparse,notrunc

1. disallow conv=sparse,notrunc
2. ignore sparse (with warning)
3. allow sparse in the additional part of output file
4. detect holes in output file and maintain.

I'm leaning towards 2 as that will allow us to
change to 3 or 4 if deemed necessary in future.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 08:55:01 GMT) Full text and rfc822 format available.

Message #38 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 00:50:36 -0800
On 02/27/2012 05:09 PM, Pádraig Brady wrote:
> how
> to handle existing regular files with conv=trunc.
> I.E. seeking over existing possible non NUL data.
> It's too dangerous/inconsistent to do this for files I think.

Why?  This is *dd* we're talking about here.
It's *supposed* to be used for tricky stuff like this.

If one interprets conv=sparse to mean "write sparsely",
rather than "create a sparse file that exactly mimics
the input's sparseness", then everything should be clear,
no?




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 11:07:02 GMT) Full text and rfc822 format available.

Message #41 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 11:02:31 +0000
On 02/28/2012 08:50 AM, Paul Eggert wrote:
> On 02/27/2012 05:09 PM, Pádraig Brady wrote:
>> how
>> to handle existing regular files with conv=trunc.
>> I.E. seeking over existing possible non NUL data.
>> It's too dangerous/inconsistent to do this for files I think.
> 
> Why?  This is *dd* we're talking about here.
> It's *supposed* to be used for tricky stuff like this.
> 
> If one interprets conv=sparse to mean "write sparsely",
> rather than "create a sparse file that exactly mimics
> the input's sparseness", then everything should be clear,
> no?

Well I wasn't considering reproducing sparseness.
I was thinking it would be more beneficial to be able to
update a file in place, but do so sparsely if possible.
I.E. support something like:

truncate -r src.img backup.img &&
dd if=src.img conv=notrunc,sparse of=backup.img

To do that, one could not simply skip over the output
for NUL input. Doing that would be "dangerous" as I said above,
or in other words, surprising to users to not update
possibly non NUL data in the output file.

As for the "inconsistent" point I mentioned. The last patch
currently just seeks the output for NUL input,
_except_ for the last byte of the file if part of a NUL write.
So that inconsistency means one couldn't use NUL input as
a write mask so to speak, if one ever did want such an
esoteric feature. I guess we could address the consistency aspect
by using ftruncate() rather than the write(outfd, "\0", 1) technique
(only doing that if we've extended the file).
However I'm not sure such functionality is useful for files,
and that we should try to align with the first operation?
Note ftruncate() might be good to do anyway to avoid
writes completely to a fully sparse file.

Also I'm just thought about oflag=append.
That will cause writes to ignore the seek beyond end of file.
I.E. with the last patch, this will create a file.test with just "ab"

  printf "a\000\000b" |
  src/dd conv=sparse bs=1 count=10 oflag=append > file.test

I guess an ftruncate() would cater for that too, except in the case
where another process is writing to the file, in which case
you would might want to disallow the conv=sparse oflag=append combo.
We can't really ftruncate as we go as that would be racy,
so I guess we should disallow the combo?

cheers,
Pádraig.




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Tue, 28 Feb 2012 21:51:02 GMT) Full text and rfc822 format available.

Notification sent to Roman Rybalko <devel <at> romanr.info>:
bug acknowledged by developer. (Tue, 28 Feb 2012 21:51:03 GMT) Full text and rfc822 format available.

Message #46 received at 9157-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157-done <at> debbugs.gnu.org, roman.rybalko <at> romanr.info
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 21:49:43 +0000
On 02/28/2012 06:24 PM, Paul Eggert wrote:
> On 02/28/2012 09:01 AM, Pádraig Brady wrote:
> 
>> Doing that would be "dangerous" as I said above,
>> or in other words, surprising to users to not update
>> possibly non NUL data in the output file.
> 
> It's not surprising at all.  It's what I want and
> expect.  For example, suppose every nonzero byte of file A
> has an offset that is in a hole of file B.  Then I should
> be able to overlay A and B into another file C, as follows:
> 
> dd if=A of=C conv=sparse
> dd if=B of=C conv=sparse,notrunc
> 
> This is akin to the "or" operation on files, and it'd be a nice
> operation to have.  Even if A's contents don't fit in B's holes,
> I still might want to do the above, to allow B's non-holes to
> override A's contents.  Why disable this useful functionality?
> 
> Any surprise issues can be dealt with by documenting dd's
> behavior appropriately.

Fair enough. So this is the "write mask" functionality
I referred to, and which is now supported consistently
since the write(fd,"\0",1) to ftruncate() change.

>> +Try to seek rather than write @sc{nul} output blocks.
>> +This will create sparse output when extending.
>> +This option is ignored in conjunction with
>> +@samp{conv=notrunc} or @samp{oflag=append}.
> 
> I still dubious about this level of handholding.
> dd is meant for low-level use, and as far as possible
> options should be orthogonal.  For example, with dd,
> oflag=append does not disable seek=N -- both flags operate,
> which means that the seek is ineffective unless it
> is past the end of file.  conv=sparse oflag=append
> should be similar: all it should mean is that the
> writes are sparse when they're past the end of
> the file (this latter functionality should work, but
> doesn't work with the proposed patch).

OK I'll remove the 2 restrictions, and essentially
move the comments for them from code to texinfo.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 22:25:02 GMT) Full text and rfc822 format available.

Message #49 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 17:01:02 +0000
[Message part 1 (text/plain, inline)]
On 02/28/2012 11:02 AM, Pádraig Brady wrote:
> On 02/28/2012 08:50 AM, Paul Eggert wrote:
>> On 02/27/2012 05:09 PM, Pádraig Brady wrote:
>>> how
>>> to handle existing regular files with conv=trunc.
>>> I.E. seeking over existing possible non NUL data.
>>> It's too dangerous/inconsistent to do this for files I think.
>>
>> Why?  This is *dd* we're talking about here.
>> It's *supposed* to be used for tricky stuff like this.
>>
>> If one interprets conv=sparse to mean "write sparsely",
>> rather than "create a sparse file that exactly mimics
>> the input's sparseness", then everything should be clear,
>> no?
> 
> Well I wasn't considering reproducing sparseness.
> I was thinking it would be more beneficial to be able to
> update a file in place, but do so sparsely if possible.
> I.E. support something like:
> 
> truncate -r src.img backup.img &&
> dd if=src.img conv=notrunc,sparse of=backup.img
> 
> To do that, one could not simply skip over the output
> for NUL input. Doing that would be "dangerous" as I said above,
> or in other words, surprising to users to not update
> possibly non NUL data in the output file.
> 
> As for the "inconsistent" point I mentioned. The last patch
> currently just seeks the output for NUL input,
> _except_ for the last byte of the file if part of a NUL write.
> So that inconsistency means one couldn't use NUL input as
> a write mask so to speak, if one ever did want such an
> esoteric feature. I guess we could address the consistency aspect
> by using ftruncate() rather than the write(outfd, "\0", 1) technique
> (only doing that if we've extended the file).
> However I'm not sure such functionality is useful for files,
> and that we should try to align with the first operation?
> Note ftruncate() might be good to do anyway to avoid
> writes completely to a fully sparse file.
> 
> Also I'm just thought about oflag=append.
> That will cause writes to ignore the seek beyond end of file.
> I.E. with the last patch, this will create a file.test with just "ab"
> 
>   printf "a\000\000b" |
>   src/dd conv=sparse bs=1 count=10 oflag=append > file.test
> 
> I guess an ftruncate() would cater for that too, except in the case
> where another process is writing to the file, in which case
> you would might want to disallow the conv=sparse oflag=append combo.
> We can't really ftruncate as we go as that would be racy,
> so I guess we should disallow the combo?

Ok the attached should be pretty much complete.
It disables conv=sparse when used with
conv=notrunc or oflag=append.

Comments appreciated.

Roman are you around?

cheers,
Pádraig.
[dd-sparse.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 22:40:02 GMT) Full text and rfc822 format available.

Message #52 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 10:24:56 -0800
On 02/28/2012 09:01 AM, Pádraig Brady wrote:

> Doing that would be "dangerous" as I said above,
> or in other words, surprising to users to not update
> possibly non NUL data in the output file.

It's not surprising at all.  It's what I want and
expect.  For example, suppose every nonzero byte of file A
has an offset that is in a hole of file B.  Then I should
be able to overlay A and B into another file C, as follows:

dd if=A of=C conv=sparse
dd if=B of=C conv=sparse,notrunc

This is akin to the "or" operation on files, and it'd be a nice
operation to have.  Even if A's contents don't fit in B's holes,
I still might want to do the above, to allow B's non-holes to
override A's contents.  Why disable this useful functionality?

Any surprise issues can be dealt with by documenting dd's
behavior appropriately.


> +Try to seek rather than write @sc{nul} output blocks.
> +This will create sparse output when extending.
> +This option is ignored in conjunction with
> +@samp{conv=notrunc} or @samp{oflag=append}.

I still dubious about this level of handholding.
dd is meant for low-level use, and as far as possible
options should be orthogonal.  For example, with dd,
oflag=append does not disable seek=N -- both flags operate,
which means that the seek is ineffective unless it
is past the end of file.  conv=sparse oflag=append
should be similar: all it should mean is that the
writes are sparse when they're past the end of
the file (this latter functionality should work, but
doesn't work with the proposed patch).




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 23:00:02 GMT) Full text and rfc822 format available.

Message #55 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Roman Rybalko <devel <at> romanr.name>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 22:00:23 +0400
On 28.02.2012 21:01, Pádraig Brady wrote:
> Ok the attached should be pretty much complete.
> It disables conv=sparse when used with
> conv=notrunc or oflag=append.
> Comments appreciated.
About "sparse" stuff everything's ok, imo.
Actually I can live with truncated files due to absence of last write,
but for user it's really convenient to have output file with matched
size :) so it worths having additional static var with some logic around.

I have another idea about INPUT_BLOCK_SLOP/OUTPUT_BLOCK_SLOP: what about
not hardcoding them but getting them out of the filesystem, or out of
ibs/obs specified by user - that last would be the best case. But that's
probably not for this feature request.

-- 
WBR,
Roman Rybalko





Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 23:14:01 GMT) Full text and rfc822 format available.

Message #58 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info,
	Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Wed, 29 Feb 2012 00:13:26 +0100
Pádraig Brady wrote:
...

Thanks for working on that.
All I would adjust are a few nits and
add require_sparse_support_ in the test script:

> Subject: [PATCH] dd: add support for the conv=sparse option
>
> Small seeks are not coalesced to larger ones (like is
> done in cache_round() for example, for the moment at least.
>
> conv= is used rather then oflag= for FreeBSD compatibility.
>
> * src/dd.c (last_seek): A new global boolean to flag

"last" is ambiguous.  Does it mean "final" or "preceding"?
From the context below, (not the comment, since it too uses "last")
I see it means "final".  "final_op_was_seek" is longer but ultra clear.
Maybe worth the length.

> whether the last "write" was converted to a seek.
> (usage): Describe the new conf=sparse option.
> (scanargs): Ignore conv=sparse in some combinations.
> (iwrite): Convert a write of a NUL block to a seek if requested.
> (do_copy): Initialize the output buffer to have a sentinel,
> to allow for efficient testing for NUL output blocks.
> If the last block in the file was converted to a seek,
> then convert back to a write so the size ip updated.

s/ip/is/

> * NEWS: Mention the new feature.
> * tests/dd/sparse: A new test for the feature.
> * tests/Makefile.am: Reference the new test.
> ---
>  NEWS               |    3 +
>  doc/coreutils.texi |    7 +++
>  src/dd.c           |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  tests/Makefile.am  |    1 +
>  tests/dd/sparse    |   57 ++++++++++++++++++++++++++++
>  5 files changed, 168 insertions(+), 4 deletions(-)
>  create mode 100755 tests/dd/sparse
>
> diff --git a/NEWS b/NEWS
> index e2e8fc5..8006669 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,9 @@ GNU coreutils NEWS                                    -*- outline -*-
>    dd now accepts the count_bytes, skip_bytes iflags and the seek_bytes
>    oflag, to more easily allow processing portions of a file.
>
> +  dd now accepts the conv=sparse flag to attempt to create sparse
> +  output, by seeking rather than writing to the output file.
> +
>    split now accepts an optional "from" argument to --numeric-suffixes,
>    which changes the start number from the default of 0.
>
> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index 414626d..f22e7d2 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -8140,6 +8140,13 @@ Change lowercase letters to uppercase.
>
>  The @samp{lcase} and @samp{ucase} conversions are mutually exclusive.
>
> +@item sparse
> +@opindex sparse
> +Try to seek rather than write @sc{nul} output blocks.
> +This will create sparse output when extending.

Maybe add this:
s/\./on a file system that support sparse files./ ?

> +This option is ignored in conjunction with
> +@samp{conv=notrunc} or @samp{oflag=append}.
> +
>  @item swab
>  @opindex swab @r{(byte-swapping)}
>  @cindex byte-swapping
> diff --git a/src/dd.c b/src/dd.c
> index fe44a30..903346f 100644
> --- a/src/dd.c
> +++ b/src/dd.c
> @@ -94,7 +94,7 @@
>     malloc.  See dd_copy for details.  INPUT_BLOCK_SLOP must be no less than
>     OUTPUT_BLOCK_SLOP.  */
>  #define INPUT_BLOCK_SLOP (2 * SWAB_ALIGN_OFFSET + 2 * page_size - 1)
> -#define OUTPUT_BLOCK_SLOP (page_size - 1)
> +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)

I haven't seen justification for why you're making the above change.
Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)?
This seems so unlikely that it'd deserve an assertion in main where
page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP.

If you leave the sizeof, please omit the parentheses.
They are unnecessary here.
Same below.

...
> +  /* Write sentinel to slop after the buffer,
> +     to allow efficient checking for NUL blocks.  */
> +  memset (obuf + output_blocksize, 1, sizeof (uintptr_t));
...
> diff --git a/tests/dd/sparse b/tests/dd/sparse
...
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +print_ver_ dd

require_sparse_support_

> +# Ensure basic sparse generation works
> +truncate -s1M sparse
> +dd bs=32K if=sparse of=sparse.dd conv=sparse
> +test $(stat -c %s sparse) = $(stat -c %s sparse.dd) || fail=1




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 23:35:02 GMT) Full text and rfc822 format available.

Message #61 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info,
	Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 23:33:40 +0000
On 02/28/2012 11:13 PM, Jim Meyering wrote:
> Pádraig Brady wrote:
> ...
> 
> Thanks for working on that.
> All I would adjust are a few nits and
> add require_sparse_support_ in the test script:
> 
>> Subject: [PATCH] dd: add support for the conv=sparse option
>>
>> Small seeks are not coalesced to larger ones (like is
>> done in cache_round() for example, for the moment at least.
>>
>> conv= is used rather then oflag= for FreeBSD compatibility.
>>
>> * src/dd.c (last_seek): A new global boolean to flag
> 
> "last" is ambiguous.  Does it mean "final" or "preceding"?
>>From the context below, (not the comment, since it too uses "last")
> I see it means "final".  "final_op_was_seek" is longer but ultra clear.
> Maybe worth the length.

yes I was too terse

>> -#define OUTPUT_BLOCK_SLOP (page_size - 1)
>> +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)
> 
> I haven't seen justification for why you're making the above change.
> Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)?
> This seems so unlikely that it'd deserve an assertion in main where
> page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP.

Yes it's never going to trigger.
Paul suggested MAX(...) so as to doc/enforce the new constraint.
assert() is not used in dd.c yet
I'll leave as is unless Paul comments otherwise (modulo the
extraneous () of course).

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Tue, 28 Feb 2012 23:41:01 GMT) Full text and rfc822 format available.

Message #64 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info,
	Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Tue, 28 Feb 2012 15:39:38 -0800
On 02/28/2012 03:13 PM, Jim Meyering wrote:
>> -#define OUTPUT_BLOCK_SLOP (page_size - 1)
>> > +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)
> I haven't seen justification for why you're making the above change.
> Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)?
> This seems so unlikely that it'd deserve an assertion in main where
> page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP.

Theoretically getpagesize could return 1, no?
Admittedly it's unlikely.  I guess it'd be OK
to omit this change, and to just put in a comment
saying something like "We use the output slop
to store a uintptr_t value, so we assume that
getpagesize returns a value greater than sizeof (uintptr_t)."
Perhaps the comment should go right next to the place where
the uintptr_t sentinel is inserted.

> If you leave the sizeof, please omit the parentheses.

But "sizeof uintptr_t" wouldn't work, as uintptr_t is a type.




Information forwarded to bug-coreutils <at> gnu.org:
bug#9157; Package coreutils. (Wed, 29 Feb 2012 00:37:02 GMT) Full text and rfc822 format available.

Message #67 received at 9157 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#9157: [PATCH] dd: sparse conv flag
Date: Wed, 29 Feb 2012 00:36:15 +0000
On 02/28/2012 11:39 PM, Paul Eggert wrote:
> On 02/28/2012 03:13 PM, Jim Meyering wrote:
>>> -#define OUTPUT_BLOCK_SLOP (page_size - 1)
>>>> +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1)
>> I haven't seen justification for why you're making the above change.
>> Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)?
>> This seems so unlikely that it'd deserve an assertion in main where
>> page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP.
> 
> Theoretically getpagesize could return 1, no?
> Admittedly it's unlikely.  I guess it'd be OK
> to omit this change, and to just put in a comment
> saying something like "We use the output slop
> to store a uintptr_t value, so we assume that
> getpagesize returns a value greater than sizeof (uintptr_t)."
> Perhaps the comment should go right next to the place where
> the uintptr_t sentinel is inserted.
> 
>> If you leave the sizeof, please omit the parentheses.
> 
> But "sizeof uintptr_t" wouldn't work, as uintptr_t is a type.
> 

Ok I'm going with the version with the assert, since
that's better than a comment.

thanks for all the comments,
Pádraig.

p.s. I can't commit at the moment due to authentication error.
I'll look into it in the morning.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 28 Mar 2012 11:24:02 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 182 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.