Package: coreutils;
Reported by: Roman Rybalko <devel <at> romanr.info>
Date: Sat, 23 Jul 2011 22:42:04 UTC
Severity: normal
Tags: patch
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9157 in the body.
You can then email your comments to 9157 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
owner <at> debbugs.gnu.org, help-debbugs <at> gnu.org
:bug#9157
; Package debbugs.gnu.org
.
(Sat, 23 Jul 2011 22:42:04 GMT) Full text and rfc822 format available.Roman Rybalko <devel <at> romanr.info>
:help-debbugs <at> gnu.org
.
(Sat, 23 Jul 2011 22:42:04 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Roman Rybalko <devel <at> romanr.info> To: undisclosed-recipients:; Subject: [PATCH] dd: sparse conv flag Date: Sat, 23 Jul 2011 20:08:01 +0400
--- doc/coreutils.texi | 4 +++ src/dd.c | 22 +++++++++++++++- tests/Makefile.am | 1 + tests/dd/sparse | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+), 1 deletions(-) create mode 100755 tests/dd/sparse diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 424446c..761c698 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. +@item sparse +@opindex sparse +Make sparse output file. + @end table The following ``conversions'' are really file flags diff --git a/src/dd.c b/src/dd.c index 0824f6c..49847f5 100644 --- a/src/dd.c +++ b/src/dd.c @@ -126,7 +126,8 @@ enum C_NOCREAT = 010000, C_EXCL = 020000, C_FDATASYNC = 040000, - C_FSYNC = 0100000 + C_FSYNC = 0100000, + C_SPARSE = 0200000 }; /* Status bit masks. */ @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] = {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ {"fdatasync", C_FDATASYNC}, /* Synchronize output data before finishing. */ {"fsync", C_FSYNC}, /* Also synchronize output metadata. */ + {"sparse", C_SPARSE}, /* Make sparse output file. */ {"", 0} }; @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\ fsync likewise, but also write metadata\n\ "), stdout); fputs (_("\ + sparse make sparse output file\n\ +"), stdout); + fputs (_("\ \n\ Each FLAG symbol may be:\n\ \n\ @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size) { ssize_t nwritten; process_signals (); + if (conversions_mask & C_SPARSE) + { + off_t seek_size = 0; + while (buf[total_written + seek_size] == 0) + ++seek_size; + if (seek_size) + { + off_t cur_off = 0; + cur_off = lseek(fd, seek_size, SEEK_CUR); + if (cur_off < 0) + break; + total_written += seek_size; + continue; + } + } nwritten = write (fd, buf + total_written, size - total_written); if (nwritten < 0) { diff --git a/tests/Makefile.am b/tests/Makefile.am index ebd1b11..0f1376a 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -364,6 +364,7 @@ TESTS = \ dd/skip-seek \ dd/skip-seek2 \ dd/skip-seek-past-file \ + dd/sparse \ dd/stderr \ dd/unblock \ dd/unblock-sync \ diff --git a/tests/dd/sparse b/tests/dd/sparse new file mode 100755 index 0000000..f0e0806 --- /dev/null +++ b/tests/dd/sparse @@ -0,0 +1,70 @@ +#!/bin/sh +# Ensure that dd conv=sparse works. + +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ dd + +# sometimes we may read less than 1M +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c %s sample0`" = "1048576" ] || fail=1 +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c %s sample1`" = "1048576" ] || fail=1 + +# test 1 +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1 +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1 + +# test 1-1 +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1 + +# test 1-2 +dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1 + +# test 1-3 +dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1 + +# test 2 +dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1 + +# test 2-1 +dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1 + +# test 2-2 +dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1 + +# test 2-3 +dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1 + +Exit $fail -- 1.7.0.4
owner <at> debbugs.gnu.org, help-debbugs <at> gnu.org
:bug#9157
; Package debbugs.gnu.org
.
(Sun, 24 Jul 2011 18:07:02 GMT) Full text and rfc822 format available.Message #8 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Roman Rybalko <devel <at> romanr.info> To: undisclosed-recipients:; Subject: [PATCH] dd: sparse conv flag Date: Sat, 23 Jul 2011 20:08:01 +0400
--- doc/coreutils.texi | 4 +++ src/dd.c | 22 +++++++++++++++- tests/Makefile.am | 1 + tests/dd/sparse | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 96 insertions(+), 1 deletions(-) create mode 100755 tests/dd/sparse diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 424446c..761c698 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. +@item sparse +@opindex sparse +Make sparse output file. + @end table The following ``conversions'' are really file flags diff --git a/src/dd.c b/src/dd.c index 0824f6c..0393740 100644 --- a/src/dd.c +++ b/src/dd.c @@ -126,7 +126,8 @@ enum C_NOCREAT = 010000, C_EXCL = 020000, C_FDATASYNC = 040000, - C_FSYNC = 0100000 + C_FSYNC = 0100000, + C_SPARSE = 0200000 }; /* Status bit masks. */ @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] = {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ {"fdatasync", C_FDATASYNC}, /* Synchronize output data before finishing. */ {"fsync", C_FSYNC}, /* Also synchronize output metadata. */ + {"sparse", C_SPARSE}, /* Make sparse output file. */ {"", 0} }; @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\ fsync likewise, but also write metadata\n\ "), stdout); fputs (_("\ + sparse make sparse output file\n\ +"), stdout); + fputs (_("\ \n\ Each FLAG symbol may be:\n\ \n\ @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size) { ssize_t nwritten; process_signals (); + if (conversions_mask & C_SPARSE) + { + off_t seek_size = 0; + while (total_written + seek_size < size && buf[total_written + seek_size] == 0) + ++seek_size; + if (seek_size) + { + off_t cur_off = 0; + cur_off = lseek(fd, seek_size, SEEK_CUR); + if (cur_off < 0) + break; + total_written += seek_size; + continue; + } + } nwritten = write (fd, buf + total_written, size - total_written); if (nwritten < 0) { diff --git a/tests/Makefile.am b/tests/Makefile.am index ebd1b11..0f1376a 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -364,6 +364,7 @@ TESTS = \ dd/skip-seek \ dd/skip-seek2 \ dd/skip-seek-past-file \ + dd/sparse \ dd/stderr \ dd/unblock \ dd/unblock-sync \ diff --git a/tests/dd/sparse b/tests/dd/sparse new file mode 100755 index 0000000..f0e0806 --- /dev/null +++ b/tests/dd/sparse @@ -0,0 +1,70 @@ +#!/bin/sh +# Ensure that dd conv=sparse works. + +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ dd + +# sometimes we may read less than 1M +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c %s sample0`" = "1048576" ] || fail=1 +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c %s sample1`" = "1048576" ] || fail=1 + +# test 1 +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1 +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1 + +# test 1-1 +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1 + +# test 1-2 +dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1 + +# test 1-3 +dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1 + +# test 2 +dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1 +dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1 + +# test 2-1 +dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1 + +# test 2-2 +dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1 + +# test 2-3 +dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 +dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1 +[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1 + +Exit $fail -- 1.7.0.4
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Sat, 06 Aug 2011 20:54:01 GMT) Full text and rfc822 format available.Message #11 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Glenn Morris <rgm <at> gnu.org> To: 9157 <at> debbugs.gnu.org Cc: Roman Rybalko <devel <at> romanr.info> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Sat, 06 Aug 2011 16:52:46 -0400
This bug was reported via bcc. Because of this, the bug tracking system did not know which package it was associated with and assigned it to debbugs.gnu.org. I have reassigned it to the coreutils package and am sending this so that the bug-coreutils mailing list sees the report. See the complete report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9157 Roman Rybalko wrote: > --- > doc/coreutils.texi | 4 +++ > src/dd.c | 22 +++++++++++++++- > tests/Makefile.am | 1 + > tests/dd/sparse | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 96 insertions(+), 1 deletions(-) > create mode 100755 tests/dd/sparse > > diff --git a/doc/coreutils.texi b/doc/coreutils.texi > index 424446c..761c698 100644 > --- a/doc/coreutils.texi > +++ b/doc/coreutils.texi > @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. > When used with @samp{block} or @samp{unblock}, pad with spaces instead of > zero bytes. > > +@item sparse > +@opindex sparse > +Make sparse output file. > + > @end table > > The following ``conversions'' are really file flags > diff --git a/src/dd.c b/src/dd.c > index 0824f6c..0393740 100644 > --- a/src/dd.c > +++ b/src/dd.c > @@ -126,7 +126,8 @@ enum > C_NOCREAT = 010000, > C_EXCL = 020000, > C_FDATASYNC = 040000, > - C_FSYNC = 0100000 > + C_FSYNC = 0100000, > + C_SPARSE = 0200000 > }; > > /* Status bit masks. */ > @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] = > {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ > {"fdatasync", C_FDATASYNC}, /* Synchronize output data before finishing. */ > {"fsync", C_FSYNC}, /* Also synchronize output metadata. */ > + {"sparse", C_SPARSE}, /* Make sparse output file. */ > {"", 0} > }; > > @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\ > fsync likewise, but also write metadata\n\ > "), stdout); > fputs (_("\ > + sparse make sparse output file\n\ > +"), stdout); > + fputs (_("\ > \n\ > Each FLAG symbol may be:\n\ > \n\ > @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size) > { > ssize_t nwritten; > process_signals (); > + if (conversions_mask & C_SPARSE) > + { > + off_t seek_size = 0; > + while (total_written + seek_size < size && buf[total_written + seek_size] == 0) > + ++seek_size; > + if (seek_size) > + { > + off_t cur_off = 0; > + cur_off = lseek(fd, seek_size, SEEK_CUR); > + if (cur_off < 0) > + break; > + total_written += seek_size; > + continue; > + } > + } > nwritten = write (fd, buf + total_written, size - total_written); > if (nwritten < 0) > { > diff --git a/tests/Makefile.am b/tests/Makefile.am > index ebd1b11..0f1376a 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -364,6 +364,7 @@ TESTS = \ > dd/skip-seek \ > dd/skip-seek2 \ > dd/skip-seek-past-file \ > + dd/sparse \ > dd/stderr \ > dd/unblock \ > dd/unblock-sync \ > diff --git a/tests/dd/sparse b/tests/dd/sparse > new file mode 100755 > index 0000000..f0e0806 > --- /dev/null > +++ b/tests/dd/sparse > @@ -0,0 +1,70 @@ > +#!/bin/sh > +# Ensure that dd conv=sparse works. > + > +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc. > + > +# This program is free software: you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation, either version 3 of the License, or > +# (at your option) any later version. > + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see <http://www.gnu.org/licenses/>. > + > +. "${srcdir=.}/init.sh"; path_prepend_ ../src > +print_ver_ dd > + > +# sometimes we may read less than 1M > +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c %s sample0`" = "1048576" ] || fail=1 > +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c %s sample1`" = "1048576" ] || fail=1 > + > +# test 1 > +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1 > +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1 > + > +# test 1-1 > +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1 > + > +# test 1-2 > +dd if=test1 of=out1-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-2-check seek=1 bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-2-check seek=3 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out1-2`" = "`stat -c '%s %b %B' out1-2-check`" ] || fail=1 > + > +# test 1-3 > +dd if=test1 of=out1-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-3-check seek=2 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out1-3`" = "`stat -c '%s %b %B' out1-3-check`" ] || fail=1 > + > +# test 2 > +dd if=sample0 of=test2 seek=0 bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=test2 seek=1 bs=1M 2> /dev/null || fail=1 > +dd if=sample0 of=test2 seek=2 bs=1M 2> /dev/null || fail=1 > + > +# test 2-1 > +dd if=test2 of=out2-1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out2-1-check seek=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out2-1`" = "`stat -c '%s %b %B' out2-1-check`" ] || fail=1 > + > +# test 2-2 > +dd if=test2 of=out2-2 seek=1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out2-2-check seek=2 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out2-2`" = "`stat -c '%s %b %B' out2-2-check`" ] || fail=1 > + > +# test 2-3 > +dd if=test2 of=out2-3 seek=1 skip=1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out2-3-check seek=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out2-3`" = "`stat -c '%s %b %B' out2-3-check`" ] || fail=1 > + > +Exit $fail
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Sat, 10 Dec 2011 10:11:03 GMT) Full text and rfc822 format available.Message #14 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Jim Meyering <jim <at> meyering.net> To: Roman Rybalko <devel <at> romanr.info> Cc: 9157 <at> debbugs.gnu.org Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Sat, 10 Dec 2011 11:09:18 +0100
Roman Rybalko wrote: > --- > doc/coreutils.texi | 4 +++ > src/dd.c | 22 +++++++++++++++- > tests/Makefile.am | 1 + > tests/dd/sparse | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 96 insertions(+), 1 deletions(-) > create mode 100755 tests/dd/sparse Thank you for the patch. It's nice to see accompanying test suite changes. I see that copyright paperwork is now on file with the FSF, so... Did you notice that cp now has the capability (using extents) to preserve holes efficiently on some file system types? Recent kernels also have SEEK_HOLE/SEEK_DATA support with which it is more efficient to detect holes. Here are some things we'll have to consider before adding a new hole-punching option to dd: Your patch may create a hole in the destination for each sequence of length seek_size or greater of zero bytes in the input. As you may have seen in the cp-related discussion, one may want different options: - preserve a file's hole/non-hole structure - efficiently detect existing holes and fill them with explicit zeros in dest - efficiently detect existing holes and seek-in-dest for each sequence of zeros (longer than some minimum) in non-hole input > diff --git a/doc/coreutils.texi b/doc/coreutils.texi > index 424446c..761c698 100644 > --- a/doc/coreutils.texi > +++ b/doc/coreutils.texi > @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. > When used with @samp{block} or @samp{unblock}, pad with spaces instead of > zero bytes. > > +@item sparse > +@opindex sparse > +Make sparse output file. Please say a little more here. I.e., when might a hole be introduced? When is this option useful? > @end table > > The following ``conversions'' are really file flags > diff --git a/src/dd.c b/src/dd.c > index 0824f6c..0393740 100644 > --- a/src/dd.c > +++ b/src/dd.c > @@ -126,7 +126,8 @@ enum > C_NOCREAT = 010000, > C_EXCL = 020000, > C_FDATASYNC = 040000, > - C_FSYNC = 0100000 > + C_FSYNC = 0100000, > + C_SPARSE = 0200000 > }; > > /* Status bit masks. */ > @@ -268,6 +269,7 @@ static struct symbol_value const conversions[] = > {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ > {"fdatasync", C_FDATASYNC}, /* Synchronize output data before finishing. */ > {"fsync", C_FSYNC}, /* Also synchronize output metadata. */ > + {"sparse", C_SPARSE}, /* Make sparse output file. */ > {"", 0} > }; > > @@ -533,6 +535,9 @@ Each CONV symbol may be:\n\ > fsync likewise, but also write metadata\n\ > "), stdout); > fputs (_("\ > + sparse make sparse output file\n\ > +"), stdout); > + fputs (_("\ > \n\ > Each FLAG symbol may be:\n\ > \n\ > @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size) > { > ssize_t nwritten; > process_signals (); > + if (conversions_mask & C_SPARSE) > + { > + off_t seek_size = 0; > + while (total_written + seek_size < size && buf[total_written + seek_size] == 0) > + ++seek_size; > + if (seek_size) > + { > + off_t cur_off = 0; > + cur_off = lseek(fd, seek_size, SEEK_CUR); > + if (cur_off < 0) > + break; dd must not ignore lseek failure. > + total_written += seek_size; > + continue; > + } > + } > nwritten = write (fd, buf + total_written, size - total_written); > if (nwritten < 0) > { > diff --git a/tests/Makefile.am b/tests/Makefile.am > index ebd1b11..0f1376a 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -364,6 +364,7 @@ TESTS = \ > dd/skip-seek \ > dd/skip-seek2 \ > dd/skip-seek-past-file \ > + dd/sparse \ > dd/stderr \ > dd/unblock \ > dd/unblock-sync \ > diff --git a/tests/dd/sparse b/tests/dd/sparse > new file mode 100755 > index 0000000..f0e0806 > --- /dev/null > +++ b/tests/dd/sparse > @@ -0,0 +1,70 @@ > +#!/bin/sh > +# Ensure that dd conv=sparse works. > + > +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc. Use only 2011 as the copyright year. > +# This program is free software: you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation, either version 3 of the License, or > +# (at your option) any later version. > + > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > + > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see <http://www.gnu.org/licenses/>. > + > +. "${srcdir=.}/init.sh"; path_prepend_ ../src > +print_ver_ dd > + > +# sometimes we may read less than 1M > +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c %s sample0`" = "1048576" ] || fail=1 We'd write that like this instead: (note use of test, not "[...]", use of $(...), not `...`) test "$(stat -c %s sample0)" = 1048576 || fail=1 > +dd if=/dev/urandom of=sample1 count=1 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c %s sample1`" = "1048576" ] || fail=1 > + > +# test 1 > +dd if=sample1 of=test1 seek=0 bs=1M 2> /dev/null || fail=1 > +dd if=sample0 of=test1 seek=1 bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=test1 seek=2 bs=1M 2> /dev/null || fail=1 > + > +# test 1-1 > +dd if=test1 of=out1-1 conv=sparse bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-1-check bs=1M 2> /dev/null || fail=1 > +dd if=sample1 of=out1-1-check seek=2 bs=1M 2> /dev/null || fail=1 > +[ "`stat -c '%s %b %B' out1-1`" = "`stat -c '%s %b %B' out1-1-check`" ] || fail=1
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 12 Dec 2011 00:00:02 GMT) Full text and rfc822 format available.Message #17 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: "Roman Rybalko (devel)" <devel <at> romanr.info> To: Jim Meyering <jim <at> meyering.net> Cc: 9157 <at> debbugs.gnu.org Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 12 Dec 2011 00:02:51 +0400
On 10.12.2011 14:09, Jim Meyering wrote: > Here are some things we'll have to consider before > adding a new hole-punching option to dd: > > Your patch may create a hole in the destination for each sequence of > length seek_size or greater of zero bytes in the input. > As you may have seen in the cp-related discussion, one may > want different options: > - preserve a file's hole/non-hole structure > - efficiently detect existing holes and fill them with explicit zeros in dest > - efficiently detect existing holes and seek-in-dest for each sequence of > zeros (longer than some minimum) in non-hole input Okay, I'll think about that. That's a clear task. >> diff --git a/doc/coreutils.texi b/doc/coreutils.texi >> index 424446c..761c698 100644 >> --- a/doc/coreutils.texi >> +++ b/doc/coreutils.texi >> @@ -8127,6 +8127,10 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. >> When used with @samp{block} or @samp{unblock}, pad with spaces instead of >> zero bytes. >> >> +@item sparse >> +@opindex sparse >> +Make sparse output file. > Please say a little more here. > I.e., when might a hole be introduced? > When is this option useful? Okay. >> @@ -985,6 +990,21 @@ iwrite (int fd, char const *buf, size_t size) >> { >> ssize_t nwritten; >> process_signals (); >> + if (conversions_mask & C_SPARSE) >> + { >> + off_t seek_size = 0; >> + while (total_written + seek_size < size && buf[total_written + seek_size] == 0) >> + ++seek_size; >> + if (seek_size) >> + { >> + off_t cur_off = 0; >> + cur_off = lseek(fd, seek_size, SEEK_CUR); >> + if (cur_off < 0) >> + break; > dd must not ignore lseek failure. That's a problem for me. How would be suitable to handle lseek failure? Perhaps with new kernel API this code may be obsoleted. >> @@ -0,0 +1,70 @@ >> +#!/bin/sh >> +# Ensure that dd conv=sparse works. >> + >> +# Copyright (C) 2003, 2005-2011 Free Software Foundation, Inc. > Use only 2011 as the copyright year. Okay. >> +# sometimes we may read less than 1M >> +dd if=/dev/zero of=sample0 count=1 bs=1M 2> /dev/null || fail=1 >> +[ "`stat -c %s sample0`" = "1048576" ] || fail=1 > We'd write that like this instead: > (note use of test, not "[...]", use of $(...), not `...`) > > test "$(stat -c %s sample0)" = 1048576 || fail=1 Okay. -- WBR, Roman Rybalko
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 27 Feb 2012 15:23:01 GMT) Full text and rfc822 format available.Message #20 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: "Roman Rybalko (devel)" <devel <at> romanr.info> Cc: 9157 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 27 Feb 2012 15:20:00 +0000
[Message part 1 (text/plain, inline)]
I recently noticed a use case for this feature: https://review.openstack.org/#change,4435 But I managed to forget we had a patch pending for this, and spent a couple of hours writing my own version :p Well at least it's good practise for reviewing Roman's... I notice Roman's seeks() the output for any run of NULs, while mine only considers blocks of the full output block size. Checking the full block is a bit more CPU efficient, and gives a bit more control, so I'm marginally leaning towards doing that? Also Roman's doesn't handle the case where a seek is done at the end of the file. In that case an ftruncate() or write() is needed to correctly set the size. Notes on my version attached are: I first need to refactor is_nul() for use by cp too. My version is advisory also We may need to coalesce seeks to larger ones? something like cache_round()? I thought it better to keep the code simple in this regard though as it's probably not of practical concern. I used conv= for bsd compat, rather than oflag=. Needs tests and docs yet. cheers, Pádraig.
[dd-sparse.patch (text/x-patch, attachment)]
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 27 Feb 2012 15:54:02 GMT) Full text and rfc822 format available.Message #23 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 27 Feb 2012 07:50:16 -0800
Thanks for looking into this! Some nits: On 02/27/2012 07:20 AM, Pádraig Brady wrote: > +is_nul (const char* buf, size_t bufsize) The usual spacing style is "const char *buf", since the "*" really belongs to the "buf" in C. > + memset (obuf + output_blocksize, 1, sizeof (uintptr_t)); This constraint on OUTPUT_BLOCK_SLOP should be documented, thus: #define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) > + /* write sentinel to slop after the buffer, > + to allow efficient checking for NUL blocks. */ Initial cap for sentence. The next comment has a similar issue.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 27 Feb 2012 16:14:02 GMT) Full text and rfc822 format available.Message #26 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 27 Feb 2012 16:10:23 +0000
On 02/27/2012 03:50 PM, Paul Eggert wrote: > Thanks for looking into this! Some nits: > > On 02/27/2012 07:20 AM, Pádraig Brady wrote: >> +is_nul (const char* buf, size_t bufsize) > > The usual spacing style is "const char *buf", since the > "*" really belongs to the "buf" in C. > >> + memset (obuf + output_blocksize, 1, sizeof (uintptr_t)); > > This constraint on OUTPUT_BLOCK_SLOP should be documented, thus: > > #define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) > >> + /* write sentinel to slop after the buffer, >> + to allow efficient checking for NUL blocks. */ > > Initial cap for sentence. The next comment has a similar issue. > Thanks Paul. Adjustments made. I'm also thinking I should warn once when the new lseek fails, but only after the fall back write() succeeds. cheers, Pádraig.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 27 Feb 2012 17:18:01 GMT) Full text and rfc822 format available.Message #29 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, "Roman Rybalko \(devel\)" <devel <at> romanr.info> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 27 Feb 2012 09:14:14 -0800
On 02/27/2012 08:10 AM, Pádraig Brady wrote: > I'm also thinking I should warn once when > the new lseek fails, but only after the fall back > write() succeeds. I wouldn't bother. Either diagnostic should suffice, and warning about the lseek would slow the app down a bit and make it more complicated.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Mon, 27 Feb 2012 18:10:02 GMT) Full text and rfc822 format available.Message #32 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157 <at> debbugs.gnu.org Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Mon, 27 Feb 2012 18:06:12 +0000
On 02/27/2012 05:14 PM, Paul Eggert wrote: > On 02/27/2012 08:10 AM, Pádraig Brady wrote: >> I'm also thinking I should warn once when >> the new lseek fails, but only after the fall back >> write() succeeds. > > I wouldn't bother. Either diagnostic should suffice, > and warning about the lseek would slow the app down > a bit and make it more complicated. Yep, I meant defer to the write() diagnostic. What I was wondering was whether to warn about the case where the seek fails but the write succeeds like here: dd conv=sparse bs=1 count=10 | ... I'll leave it as advisory for now, and not have a warning for the above case. cheers, Pádraig.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 01:14:01 GMT) Full text and rfc822 format available.Message #35 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu>, roman.rybalko <at> romanr.info Cc: 9157 <at> debbugs.gnu.org Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 01:09:51 +0000
On 02/27/2012 06:06 PM, Pádraig Brady wrote: > On 02/27/2012 05:14 PM, Paul Eggert wrote: >> On 02/27/2012 08:10 AM, Pádraig Brady wrote: >>> I'm also thinking I should warn once when >>> the new lseek fails, but only after the fall back >>> write() succeeds. >> >> I wouldn't bother. Either diagnostic should suffice, >> and warning about the lseek would slow the app down >> a bit and make it more complicated. > > Yep, I meant defer to the write() diagnostic. > > What I was wondering was whether to warn about > the case where the seek fails but the write succeeds like here: > > dd conv=sparse bs=1 count=10 | ... > > I'll leave it as advisory for now, and not have > a warning for the above case. Another adjustment I'll need to make is how to handle existing regular files with conv=trunc. I.E. seeking over existing possible non NUL data. It's too dangerous/inconsistent to do this for files I think. With devices we can support auto allocating virtual devices as mentioned elsewhere in the thread. We could do any of these for existing files with conv=sparse,notrunc 1. disallow conv=sparse,notrunc 2. ignore sparse (with warning) 3. allow sparse in the additional part of output file 4. detect holes in output file and maintain. I'm leaning towards 2 as that will allow us to change to 3 or 4 if deemed necessary in future. cheers, Pádraig.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 08:55:01 GMT) Full text and rfc822 format available.Message #38 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 00:50:36 -0800
On 02/27/2012 05:09 PM, Pádraig Brady wrote: > how > to handle existing regular files with conv=trunc. > I.E. seeking over existing possible non NUL data. > It's too dangerous/inconsistent to do this for files I think. Why? This is *dd* we're talking about here. It's *supposed* to be used for tricky stuff like this. If one interprets conv=sparse to mean "write sparsely", rather than "create a sparse file that exactly mimics the input's sparseness", then everything should be clear, no?
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 11:07:02 GMT) Full text and rfc822 format available.Message #41 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 11:02:31 +0000
On 02/28/2012 08:50 AM, Paul Eggert wrote: > On 02/27/2012 05:09 PM, Pádraig Brady wrote: >> how >> to handle existing regular files with conv=trunc. >> I.E. seeking over existing possible non NUL data. >> It's too dangerous/inconsistent to do this for files I think. > > Why? This is *dd* we're talking about here. > It's *supposed* to be used for tricky stuff like this. > > If one interprets conv=sparse to mean "write sparsely", > rather than "create a sparse file that exactly mimics > the input's sparseness", then everything should be clear, > no? Well I wasn't considering reproducing sparseness. I was thinking it would be more beneficial to be able to update a file in place, but do so sparsely if possible. I.E. support something like: truncate -r src.img backup.img && dd if=src.img conv=notrunc,sparse of=backup.img To do that, one could not simply skip over the output for NUL input. Doing that would be "dangerous" as I said above, or in other words, surprising to users to not update possibly non NUL data in the output file. As for the "inconsistent" point I mentioned. The last patch currently just seeks the output for NUL input, _except_ for the last byte of the file if part of a NUL write. So that inconsistency means one couldn't use NUL input as a write mask so to speak, if one ever did want such an esoteric feature. I guess we could address the consistency aspect by using ftruncate() rather than the write(outfd, "\0", 1) technique (only doing that if we've extended the file). However I'm not sure such functionality is useful for files, and that we should try to align with the first operation? Note ftruncate() might be good to do anyway to avoid writes completely to a fully sparse file. Also I'm just thought about oflag=append. That will cause writes to ignore the seek beyond end of file. I.E. with the last patch, this will create a file.test with just "ab" printf "a\000\000b" | src/dd conv=sparse bs=1 count=10 oflag=append > file.test I guess an ftruncate() would cater for that too, except in the case where another process is writing to the file, in which case you would might want to disallow the conv=sparse oflag=append combo. We can't really ftruncate as we go as that would be racy, so I guess we should disallow the combo? cheers, Pádraig.
Pádraig Brady <P <at> draigBrady.com>
:Roman Rybalko <devel <at> romanr.info>
:Message #46 received at 9157-done <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157-done <at> debbugs.gnu.org, roman.rybalko <at> romanr.info Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 21:49:43 +0000
On 02/28/2012 06:24 PM, Paul Eggert wrote: > On 02/28/2012 09:01 AM, Pádraig Brady wrote: > >> Doing that would be "dangerous" as I said above, >> or in other words, surprising to users to not update >> possibly non NUL data in the output file. > > It's not surprising at all. It's what I want and > expect. For example, suppose every nonzero byte of file A > has an offset that is in a hole of file B. Then I should > be able to overlay A and B into another file C, as follows: > > dd if=A of=C conv=sparse > dd if=B of=C conv=sparse,notrunc > > This is akin to the "or" operation on files, and it'd be a nice > operation to have. Even if A's contents don't fit in B's holes, > I still might want to do the above, to allow B's non-holes to > override A's contents. Why disable this useful functionality? > > Any surprise issues can be dealt with by documenting dd's > behavior appropriately. Fair enough. So this is the "write mask" functionality I referred to, and which is now supported consistently since the write(fd,"\0",1) to ftruncate() change. >> +Try to seek rather than write @sc{nul} output blocks. >> +This will create sparse output when extending. >> +This option is ignored in conjunction with >> +@samp{conv=notrunc} or @samp{oflag=append}. > > I still dubious about this level of handholding. > dd is meant for low-level use, and as far as possible > options should be orthogonal. For example, with dd, > oflag=append does not disable seek=N -- both flags operate, > which means that the seek is ineffective unless it > is past the end of file. conv=sparse oflag=append > should be similar: all it should mean is that the > writes are sparse when they're past the end of > the file (this latter functionality should work, but > doesn't work with the proposed patch). OK I'll remove the 2 restrictions, and essentially move the comments for them from code to texinfo. cheers, Pádraig.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 22:25:02 GMT) Full text and rfc822 format available.Message #49 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 17:01:02 +0000
[Message part 1 (text/plain, inline)]
On 02/28/2012 11:02 AM, Pádraig Brady wrote: > On 02/28/2012 08:50 AM, Paul Eggert wrote: >> On 02/27/2012 05:09 PM, Pádraig Brady wrote: >>> how >>> to handle existing regular files with conv=trunc. >>> I.E. seeking over existing possible non NUL data. >>> It's too dangerous/inconsistent to do this for files I think. >> >> Why? This is *dd* we're talking about here. >> It's *supposed* to be used for tricky stuff like this. >> >> If one interprets conv=sparse to mean "write sparsely", >> rather than "create a sparse file that exactly mimics >> the input's sparseness", then everything should be clear, >> no? > > Well I wasn't considering reproducing sparseness. > I was thinking it would be more beneficial to be able to > update a file in place, but do so sparsely if possible. > I.E. support something like: > > truncate -r src.img backup.img && > dd if=src.img conv=notrunc,sparse of=backup.img > > To do that, one could not simply skip over the output > for NUL input. Doing that would be "dangerous" as I said above, > or in other words, surprising to users to not update > possibly non NUL data in the output file. > > As for the "inconsistent" point I mentioned. The last patch > currently just seeks the output for NUL input, > _except_ for the last byte of the file if part of a NUL write. > So that inconsistency means one couldn't use NUL input as > a write mask so to speak, if one ever did want such an > esoteric feature. I guess we could address the consistency aspect > by using ftruncate() rather than the write(outfd, "\0", 1) technique > (only doing that if we've extended the file). > However I'm not sure such functionality is useful for files, > and that we should try to align with the first operation? > Note ftruncate() might be good to do anyway to avoid > writes completely to a fully sparse file. > > Also I'm just thought about oflag=append. > That will cause writes to ignore the seek beyond end of file. > I.E. with the last patch, this will create a file.test with just "ab" > > printf "a\000\000b" | > src/dd conv=sparse bs=1 count=10 oflag=append > file.test > > I guess an ftruncate() would cater for that too, except in the case > where another process is writing to the file, in which case > you would might want to disallow the conv=sparse oflag=append combo. > We can't really ftruncate as we go as that would be racy, > so I guess we should disallow the combo? Ok the attached should be pretty much complete. It disables conv=sparse when used with conv=notrunc or oflag=append. Comments appreciated. Roman are you around? cheers, Pádraig.
[dd-sparse.patch (text/x-patch, attachment)]
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 22:40:02 GMT) Full text and rfc822 format available.Message #52 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 10:24:56 -0800
On 02/28/2012 09:01 AM, Pádraig Brady wrote: > Doing that would be "dangerous" as I said above, > or in other words, surprising to users to not update > possibly non NUL data in the output file. It's not surprising at all. It's what I want and expect. For example, suppose every nonzero byte of file A has an offset that is in a hole of file B. Then I should be able to overlay A and B into another file C, as follows: dd if=A of=C conv=sparse dd if=B of=C conv=sparse,notrunc This is akin to the "or" operation on files, and it'd be a nice operation to have. Even if A's contents don't fit in B's holes, I still might want to do the above, to allow B's non-holes to override A's contents. Why disable this useful functionality? Any surprise issues can be dealt with by documenting dd's behavior appropriately. > +Try to seek rather than write @sc{nul} output blocks. > +This will create sparse output when extending. > +This option is ignored in conjunction with > +@samp{conv=notrunc} or @samp{oflag=append}. I still dubious about this level of handholding. dd is meant for low-level use, and as far as possible options should be orthogonal. For example, with dd, oflag=append does not disable seek=N -- both flags operate, which means that the seek is ineffective unless it is past the end of file. conv=sparse oflag=append should be similar: all it should mean is that the writes are sparse when they're past the end of the file (this latter functionality should work, but doesn't work with the proposed patch).
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 23:00:02 GMT) Full text and rfc822 format available.Message #55 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Roman Rybalko <devel <at> romanr.name> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 22:00:23 +0400
On 28.02.2012 21:01, Pádraig Brady wrote: > Ok the attached should be pretty much complete. > It disables conv=sparse when used with > conv=notrunc or oflag=append. > Comments appreciated. About "sparse" stuff everything's ok, imo. Actually I can live with truncated files due to absence of last write, but for user it's really convenient to have output file with matched size :) so it worths having additional static var with some logic around. I have another idea about INPUT_BLOCK_SLOP/OUTPUT_BLOCK_SLOP: what about not hardcoding them but getting them out of the filesystem, or out of ibs/obs specified by user - that last would be the best case. But that's probably not for this feature request. -- WBR, Roman Rybalko
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 23:14:01 GMT) Full text and rfc822 format available.Message #58 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Jim Meyering <jim <at> meyering.net> To: Pádraig Brady <P <at> draigBrady.com> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info, Paul Eggert <eggert <at> cs.ucla.edu> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Wed, 29 Feb 2012 00:13:26 +0100
Pádraig Brady wrote: ... Thanks for working on that. All I would adjust are a few nits and add require_sparse_support_ in the test script: > Subject: [PATCH] dd: add support for the conv=sparse option > > Small seeks are not coalesced to larger ones (like is > done in cache_round() for example, for the moment at least. > > conv= is used rather then oflag= for FreeBSD compatibility. > > * src/dd.c (last_seek): A new global boolean to flag "last" is ambiguous. Does it mean "final" or "preceding"? From the context below, (not the comment, since it too uses "last") I see it means "final". "final_op_was_seek" is longer but ultra clear. Maybe worth the length. > whether the last "write" was converted to a seek. > (usage): Describe the new conf=sparse option. > (scanargs): Ignore conv=sparse in some combinations. > (iwrite): Convert a write of a NUL block to a seek if requested. > (do_copy): Initialize the output buffer to have a sentinel, > to allow for efficient testing for NUL output blocks. > If the last block in the file was converted to a seek, > then convert back to a write so the size ip updated. s/ip/is/ > * NEWS: Mention the new feature. > * tests/dd/sparse: A new test for the feature. > * tests/Makefile.am: Reference the new test. > --- > NEWS | 3 + > doc/coreutils.texi | 7 +++ > src/dd.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++-- > tests/Makefile.am | 1 + > tests/dd/sparse | 57 ++++++++++++++++++++++++++++ > 5 files changed, 168 insertions(+), 4 deletions(-) > create mode 100755 tests/dd/sparse > > diff --git a/NEWS b/NEWS > index e2e8fc5..8006669 100644 > --- a/NEWS > +++ b/NEWS > @@ -7,6 +7,9 @@ GNU coreutils NEWS -*- outline -*- > dd now accepts the count_bytes, skip_bytes iflags and the seek_bytes > oflag, to more easily allow processing portions of a file. > > + dd now accepts the conv=sparse flag to attempt to create sparse > + output, by seeking rather than writing to the output file. > + > split now accepts an optional "from" argument to --numeric-suffixes, > which changes the start number from the default of 0. > > diff --git a/doc/coreutils.texi b/doc/coreutils.texi > index 414626d..f22e7d2 100644 > --- a/doc/coreutils.texi > +++ b/doc/coreutils.texi > @@ -8140,6 +8140,13 @@ Change lowercase letters to uppercase. > > The @samp{lcase} and @samp{ucase} conversions are mutually exclusive. > > +@item sparse > +@opindex sparse > +Try to seek rather than write @sc{nul} output blocks. > +This will create sparse output when extending. Maybe add this: s/\./on a file system that support sparse files./ ? > +This option is ignored in conjunction with > +@samp{conv=notrunc} or @samp{oflag=append}. > + > @item swab > @opindex swab @r{(byte-swapping)} > @cindex byte-swapping > diff --git a/src/dd.c b/src/dd.c > index fe44a30..903346f 100644 > --- a/src/dd.c > +++ b/src/dd.c > @@ -94,7 +94,7 @@ > malloc. See dd_copy for details. INPUT_BLOCK_SLOP must be no less than > OUTPUT_BLOCK_SLOP. */ > #define INPUT_BLOCK_SLOP (2 * SWAB_ALIGN_OFFSET + 2 * page_size - 1) > -#define OUTPUT_BLOCK_SLOP (page_size - 1) > +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) I haven't seen justification for why you're making the above change. Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)? This seems so unlikely that it'd deserve an assertion in main where page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP. If you leave the sizeof, please omit the parentheses. They are unnecessary here. Same below. ... > + /* Write sentinel to slop after the buffer, > + to allow efficient checking for NUL blocks. */ > + memset (obuf + output_blocksize, 1, sizeof (uintptr_t)); ... > diff --git a/tests/dd/sparse b/tests/dd/sparse ... > + > +. "${srcdir=.}/init.sh"; path_prepend_ ../src > +print_ver_ dd require_sparse_support_ > +# Ensure basic sparse generation works > +truncate -s1M sparse > +dd bs=32K if=sparse of=sparse.dd conv=sparse > +test $(stat -c %s sparse) = $(stat -c %s sparse.dd) || fail=1
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 23:35:02 GMT) Full text and rfc822 format available.Message #61 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Jim Meyering <jim <at> meyering.net> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info, Paul Eggert <eggert <at> cs.ucla.edu> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 23:33:40 +0000
On 02/28/2012 11:13 PM, Jim Meyering wrote: > Pádraig Brady wrote: > ... > > Thanks for working on that. > All I would adjust are a few nits and > add require_sparse_support_ in the test script: > >> Subject: [PATCH] dd: add support for the conv=sparse option >> >> Small seeks are not coalesced to larger ones (like is >> done in cache_round() for example, for the moment at least. >> >> conv= is used rather then oflag= for FreeBSD compatibility. >> >> * src/dd.c (last_seek): A new global boolean to flag > > "last" is ambiguous. Does it mean "final" or "preceding"? >>From the context below, (not the comment, since it too uses "last") > I see it means "final". "final_op_was_seek" is longer but ultra clear. > Maybe worth the length. yes I was too terse >> -#define OUTPUT_BLOCK_SLOP (page_size - 1) >> +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) > > I haven't seen justification for why you're making the above change. > Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)? > This seems so unlikely that it'd deserve an assertion in main where > page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP. Yes it's never going to trigger. Paul suggested MAX(...) so as to doc/enforce the new constraint. assert() is not used in dd.c yet I'll leave as is unless Paul comments otherwise (modulo the extraneous () of course). cheers, Pádraig
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Tue, 28 Feb 2012 23:41:01 GMT) Full text and rfc822 format available.Message #64 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Jim Meyering <jim <at> meyering.net> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info, Pádraig Brady <P <at> draigBrady.com> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Tue, 28 Feb 2012 15:39:38 -0800
On 02/28/2012 03:13 PM, Jim Meyering wrote: >> -#define OUTPUT_BLOCK_SLOP (page_size - 1) >> > +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) > I haven't seen justification for why you're making the above change. > Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)? > This seems so unlikely that it'd deserve an assertion in main where > page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP. Theoretically getpagesize could return 1, no? Admittedly it's unlikely. I guess it'd be OK to omit this change, and to just put in a comment saying something like "We use the output slop to store a uintptr_t value, so we assume that getpagesize returns a value greater than sizeof (uintptr_t)." Perhaps the comment should go right next to the place where the uintptr_t sentinel is inserted. > If you leave the sizeof, please omit the parentheses. But "sizeof uintptr_t" wouldn't work, as uintptr_t is a type.
bug-coreutils <at> gnu.org
:bug#9157
; Package coreutils
.
(Wed, 29 Feb 2012 00:37:02 GMT) Full text and rfc822 format available.Message #67 received at 9157 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9157 <at> debbugs.gnu.org, roman.rybalko <at> romanr.info, Jim Meyering <jim <at> meyering.net> Subject: Re: bug#9157: [PATCH] dd: sparse conv flag Date: Wed, 29 Feb 2012 00:36:15 +0000
On 02/28/2012 11:39 PM, Paul Eggert wrote: > On 02/28/2012 03:13 PM, Jim Meyering wrote: >>> -#define OUTPUT_BLOCK_SLOP (page_size - 1) >>>> +#define OUTPUT_BLOCK_SLOP MAX (sizeof (uintptr_t), page_size - 1) >> I haven't seen justification for why you're making the above change. >> Can sizeof uintptr_t really be larger than page_size-1 (getpagesize()-1)? >> This seems so unlikely that it'd deserve an assertion in main where >> page_size is set, even though there are only two uses of OUTPUT_BLOCK_SLOP. > > Theoretically getpagesize could return 1, no? > Admittedly it's unlikely. I guess it'd be OK > to omit this change, and to just put in a comment > saying something like "We use the output slop > to store a uintptr_t value, so we assume that > getpagesize returns a value greater than sizeof (uintptr_t)." > Perhaps the comment should go right next to the place where > the uintptr_t sentinel is inserted. > >> If you leave the sizeof, please omit the parentheses. > > But "sizeof uintptr_t" wouldn't work, as uintptr_t is a type. > Ok I'm going with the version with the assert, since that's better than a comment. thanks for all the comments, Pádraig. p.s. I can't commit at the moment due to authentication error. I'll look into it in the morning.
Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
.
(Wed, 28 Mar 2012 11:24:02 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.