GNU bug report logs - #5918
[dd] conv=sparse option

Package: coreutils;

Reported by: Heinrich Langos <henrik-gnu <at> prak.org>

Date: Sat, 10 Apr 2010 00:33:02 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 5918 in the body.
You can then email your comments to 5918 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Sat, 10 Apr 2010 00:33:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Heinrich Langos <henrik-gnu <at> prak.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 10 Apr 2010 00:33:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Heinrich Langos <henrik-gnu <at> prak.org>
To: Andreas Schwab <schwab <at> suse.de>
Cc: Samuel Thibault <samuel.thibault <at> ens-lyon.org>, bug-coreutils <at> gnu.org
Subject: Re: [dd] conv=sparse option
Date: Sat, 10 Apr 2010 02:28:57 +0200
Hello Andreas, Samuel and list,

sorry to pick up such an old thread, but I stumbled upon it while
looking for an efficient way to "re-sparse" files that contain a 
lot of zero blocks but 
1) had already been expanded 
or 
2) are being expanded due to pipes.

On Sun, Dec 30, 2007 at 10:19:54AM +0100, Andreas Schwab wrote:
> Samuel Thibault <samuel.thibault <at> ens-lyon.org> writes:
> 
> > Some time ago, I wrote a conv=sparse option for dd, attached is the
> > patch.
> 
> How is it different from cp --sparse=always?

I'd say in enough ways to make such an option highly desirable.

a) "dd" will maintain an existing of=target file including the inode 
   number, thus respecting existing hard links. "cp" will, depending 
   on the other options given (e.g. "-a"), maintain or break existing 
   hard links to an existing target file.

b) "dd" could read a stream from a device or stdin and write it directly 
   to a sparse file. No need to "dd" from e.g. a block device to a file and 
   afterwards do a "cp --sparse=always file sparse-file". This will save a 
   lot of disk space, I/O operations and time.


example transcript for a) :

     1  hlangos <at> jukebox:~/sparse$ ls -lis                                  
     2  total 1984                                                         
     3  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:55 non-sparse
     4  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:55 non-sparse2
     5  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:55 non-sparse3
     6  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:55 non-sparse4
     7  114690   0 -rw-r--r-- 1 hlangos hlangos 500000 2010-04-10 01:56 sparse
     8  hlangos <at> jukebox:~/sparse$ cp sparse non-sparse
     9  hlangos <at> jukebox:~/sparse$ ls -lis
    10  total 0
    11  114692 0 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:56 non-sparse
    12  114692 0 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:56 non-sparse2
    13  114692 0 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:56 non-sparse3
    14  114692 0 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:56 non-sparse4
    15  114690 0 -rw-r--r-- 1 hlangos hlangos 500000 2010-04-10 01:56 sparse
    16  hlangos <at> jukebox:~/sparse$ dd if=/dev/zero bs=1 count=500000 of=non-sparse
    17  500000+0 records in
    18  500000+0 records out
    19  500000 bytes (500 kB) copied, 3.96621 s, 126 kB/s
    20  hlangos <at> jukebox:~/sparse$ ls -lis
    21  total 1984
    22  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:57 non-sparse
    23  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:57 non-sparse2
    24  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:57 non-sparse3
    25  114692 496 -rw-r--r-- 4 hlangos hlangos 500000 2010-04-10 01:57 non-sparse4
    26  114690   0 -rw-r--r-- 1 hlangos hlangos 500000 2010-04-10 01:56 sparse
    27  hlangos <at> jukebox:~/sparse$ cp -a sparse non-sparse
    28  hlangos <at> jukebox:~/sparse$ ls -lis
    29  total 1488
    30  114691   0 -rw-r--r-- 1 hlangos hlangos 500000 2010-04-10 01:56 non-sparse
    31  114692 496 -rw-r--r-- 3 hlangos hlangos 500000 2010-04-10 01:57 non-sparse2
    32  114692 496 -rw-r--r-- 3 hlangos hlangos 500000 2010-04-10 01:57 non-sparse3
    33  114692 496 -rw-r--r-- 3 hlangos hlangos 500000 2010-04-10 01:57 non-sparse4
    34  114690   0 -rw-r--r-- 1 hlangos hlangos 500000 2010-04-10 01:56 sparse
    35  hlangos <at> jukebox:~/sparse$

As you see in line 30, a new "non-sparse" file has been created with a different inode
number while the link count of the other "non-sparse*" files has been reduced.
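
The difference seen in the transcript can be reproduced outside of "dd" and "cp". The following is only a minimal sketch: the helper names (overwrite_in_place, replace_by_unlink, demo) and the file names are made up for illustration, and the two helpers merely imitate the two behaviours observed above (rewriting the existing file versus unlinking it and creating a new one). They are not how either tool is implemented.

```python
import os
import tempfile

def overwrite_in_place(path, data):
    # Truncate and rewrite the existing file; the inode and its links survive.
    with open(path, "wb") as f:
        f.write(data)

def replace_by_unlink(path, data):
    # Remove the name first, then create a new file under the same name.
    os.unlink(path)
    with open(path, "wb") as f:
        f.write(data)

def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "non-sparse")
        other = os.path.join(d, "non-sparse2")
        with open(target, "wb") as f:
            f.write(b"x" * 4096)
        os.link(target, other)  # two names, one inode
        ino = os.stat(target).st_ino

        overwrite_in_place(target, bytes(4096))
        kept = os.stat(target)

        replace_by_unlink(target, bytes(4096))
        new = os.stat(target)

        return {
            "in_place_same_inode": kept.st_ino == ino,
            "in_place_nlink": kept.st_nlink,
            "replaced_same_inode": new.st_ino == ino,
            "other_nlink_after_replace": os.stat(other).st_nlink,
        }

print(demo())
```

The st_ino and st_nlink fields correspond to the first and fourth columns of the "ls -lis" listings above.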


I'd very much like to see the patch make it into "dd", though I think it might be
better to integrate that function as "oflag=sparse" instead of "conv=sparse". 
After all, you don't convert data but change the way the output is written.

cheers
-henrik






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Sat, 10 Apr 2010 15:34:02 GMT) Full text and rfc822 format available.

Message #8 received at 5918 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Heinrich Langos <henrik-gnu <at> prak.org>
Cc: Andreas Schwab <schwab <at> suse.de>, 5918 <at> debbugs.gnu.org,
	samuel.thibault <at> ens-lyon.org
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Sat, 10 Apr 2010 16:33:07 +0100
On 10/04/10 01:28, Heinrich Langos wrote:
> Hello Andreas, Samuel and list,
> 
> sorry to pick up such an old thread, but I stumbled upon it while
> looking for an efficient way to "re-sparse" files that contain a 
> lot of zero blocks but 
> 1) had already been expanded 
> or 
> 2) are being expanded due to pipes.
> 
> On Sun, Dec 30, 2007 at 10:19:54AM +0100, Andreas Schwab wrote:
>> Samuel Thibault <samuel.thibault <at> ens-lyon.org> writes:
>>
>>> Some time ago, I wrote a conv=sparse option for dd, attached is the
>>> patch.
>>
>> How is it different from cp --sparse=always?
> 
> I'd say in enough ways to make such an option highly desirable.
> 
> a) "dd" will maintain an existing of=target file including the inode 
>    number, thus respecting existing hard links. "cp" will depending 
>    on the other options given (e.g. "-a") maintain or break existing 
>    hard links to an existing target file.

I don't think that's possible as holes can only be created at the end of a file.
Well I think NTFS supports punching holes in the "middle" but it's not common.

> 
> b) "dd" could read a stream from a device or stdin and write it directly 
>    to a sparse file. no need to "dd" from e.g. a block device to a file and 
>    afterwards do a "cp --sparse=always file sparse-file". this will save a 
>    lot of disk space, io operations and time.

This seems to work:
cp --sparse=always /dev/stdin file

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Sat, 10 Apr 2010 17:18:01 GMT) Full text and rfc822 format available.

Message #11 received at 5918 <at> debbugs.gnu.org (full text, mbox):

From: Samuel Thibault <samuel.thibault <at> gnu.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Andreas Schwab <schwab <at> suse.de>, 5918 <at> debbugs.gnu.org,
	Heinrich Langos <henrik-gnu <at> prak.org>
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Sat, 10 Apr 2010 18:46:13 +0200
Pádraig Brady, le Sat 10 Apr 2010 16:33:07 +0100, a écrit :
> On 10/04/10 01:28, Heinrich Langos wrote:
> > a) "dd" will maintain an existing of=target file including the inode 
> >    number, thus respecting existing hard links. "cp" will depending 
> >    on the other options given (e.g. "-a") maintain or break existing 
> >    hard links to an existing target file.
> 
> I don't think that's possible as holes can only be created at the end of a file.
> Well I think NTFS supports punching holes in the "middle" but it's not common.

I believe there's demand for supporting punching holes in the middle
of files and it will eventually show up in Linux. For instance, the
combination of IDE TRIM support and virtualization can allow virtualized
guests to take as little disk space as possible in file-backed virtual
disks.

Samuel




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Sun, 11 Apr 2010 14:02:01 GMT) Full text and rfc822 format available.

Message #14 received at 5918 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Andreas Schwab <schwab <at> suse.de>, 5918 <at> debbugs.gnu.org,
	samuel.thibault <at> ens-lyon.org, Heinrich Langos <henrik-gnu <at> prak.org>
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Sun, 11 Apr 2010 16:01:35 +0200
Pádraig Brady wrote:
> On 10/04/10 01:28, Heinrich Langos wrote:
>> Hello Andreas, Samuel and list,
>>
>> sorry to pick up such an old thread, but I stumbled upon it while
>> looking for an efficient way to "re-sparse" files that contain a
>> lot of zero blocks but
>> 1) had already been expanded
>> or
>> 2) are being expanded due to pipes.
>>
>> On Sun, Dec 30, 2007 at 10:19:54AM +0100, Andreas Schwab wrote:
>>> Samuel Thibault <samuel.thibault <at> ens-lyon.org> writes:
>>>
>>>> Some time ago, I wrote a conv=sparse option for dd, attached is the
>>>> patch.
>>>
>>> How is it different from cp --sparse=always?
>>
>> I'd say in enough ways to make such an option highly desirable.
>>
>> a) "dd" will maintain an existing of=target file including the inode
>>    number, thus respecting existing hard links. "cp" will depending
>>    on the other options given (e.g. "-a") maintain or break existing
>>    hard links to an existing target file.
>
> I don't think that's possible as holes can only be created at the end of a file.
> Well I think NTFS supports punching holes in the "middle" but it's not common.

I would like at least cp to be able to copy sparse files efficiently,
and considering the FIEMAP patches that Jeff Liu is working on, we
don't have long to wait.

BTW, I'm pretty sure it is possible to punch a hole in the middle of
a file with XFS.  Maybe with other CoW file systems, too?




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Tue, 13 Apr 2010 10:32:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "H. Langos" <henrik-gnu <at> prak.org>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Tue, 13 Apr 2010 12:30:47 +0200
On Sat, Apr 10, 2010 at 06:46:13PM +0200, Samuel Thibault wrote:
> Pádraig Brady, le Sat 10 Apr 2010 16:33:07 +0100, a écrit :
> > On 10/04/10 01:28, Heinrich Langos wrote:
> > > a) "dd" will maintain an existing of=target file including the inode 
> > >    number, thus respecting existing hard links. "cp" will depending 
> > >    on the other options given (e.g. "-a") maintain or break existing 
> > >    hard links to an existing target file.
> > 
> > I don't think that's possible as holes can only be created at the end of a file.
> > Well I think NTFS supports punching holes in the "middle" but it's not common.
 
I was not advocating support for punching holes in existing files (though
this is what I want to do in the end).

I was only interested in an option that would create sparse output files by
seeking forward in the output file whenever zero bytes in the input stream
occur. This is completely independent of the filesystem underneath. It
should even work if the FS doesn't support holes at all, as long as the 
interface is POSIX compliant.
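
The seek-over-zeros idea described above can be sketched roughly as follows. This is an illustration only, not the patch under discussion: the function name sparse_copy, the 4096-byte block size and the file names are assumptions made for the example. Whether the skipped ranges actually end up unallocated depends on the filesystem; the file contents are the same either way.

```python
import os
import tempfile

def sparse_copy(src, dst, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            if block == zero_block[:len(block)]:
                # Skip the zeros: just move the output offset forward.
                fout.seek(len(block), os.SEEK_CUR)
            else:
                fout.write(block)
        # A trailing seek does not extend the file, so set the length explicitly.
        fout.truncate(fout.tell())

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "expanded")
        dst = os.path.join(d, "sparse")
        with open(src, "wb") as f:
            f.write(b"head" + bytes(1 << 20) + b"tail")
        sparse_copy(src, dst)
        # st_blocks counts 512-byte units; whether it shrinks depends on the filesystem.
        print(os.stat(src).st_blocks, os.stat(dst).st_blocks)
```

The final truncate matters when the input ends in zeros: without it the output would be shorter than the input.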


> I believe there's demand for supporting punching holes in the middle
> of files and it will eventually show up in Linux. For instance, the
> combination of IDE TRIM support and virtualization can allow virtualized
> guest to take as less disk space as possible in file-backed virtual
> disks.

True. "Thin Provisioning", that is the usage of disk images that start out
by allocating only the used disk blocks and growing on demand, suffers from
the inability of guest systems to "give back" unused blocks whenever a block
is released. There are ways around it, but none of them work very well with 
a live guest.

The question there is whether this is a task for "dd" or for a specialized
tool.

-henrik







Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Tue, 13 Apr 2010 12:16:02 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "H. Langos" <henrik-gnu <at> prak.org>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Tue, 13 Apr 2010 14:14:48 +0200
On Sat, Apr 10, 2010 at 04:33:07PM +0100, Pádraig Brady wrote:
> On 10/04/10 01:28, Heinrich Langos wrote:
> > Hello Andreas, Samuel and list,
> > 
> > sorry to pick up such an old thread, but I stumbled upon it while
> > looking for an efficient way to "re-sparse" files that contain a 
> > lot of zero blocks but 
> > 1) had already been expanded 
> > or 
> > 2) are being expanded due to pipes.
> > 
> > On Sun, Dec 30, 2007 at 10:19:54AM +0100, Andreas Schwab wrote:
> >> Samuel Thibault <samuel.thibault <at> ens-lyon.org> writes:
> >>
> >>> Some time ago, I wrote a conv=sparse option for dd, attached is the
> >>> patch.
> >>
> >> How is it different from cp --sparse=always?
> > 
> > I'd say in enough ways to make such an option highly desirable.
> > 
> > a) "dd" will maintain an existing of=target file including the inode 
> >    number, thus respecting existing hard links. "cp" will depending 
> >    on the other options given (e.g. "-a") maintain or break existing 
> >    hard links to an existing target file.
> 
> I don't think that's possible as holes can only be created at the end of a file.
> Well I think NTFS supports punching holes in the "middle" but it's not common.
> 
> > 
> > b) "dd" could read a stream from a device or stdin and write it directly 
> >    to a sparse file. no need to "dd" from e.g. a block device to a file and 
> >    afterwards do a "cp --sparse=always file sparse-file". this will save a 
> >    lot of disk space, io operations and time.
> 
> This seems to work:
> cp --sparse=always /dev/stdin file

Yeap. That worked!

>  hlangos <at> pc-hlangos:~/zaurus$ ls -lisa foo
>  958477 4 -rw-r--r-- 1 hlangos hlangos 3072 2010-04-13 12:12 foo
>  hlangos <at> pc-hlangos:~/zaurus$ dd if=/dev/zero bs=1k count=100 | cp --sparse=always /dev/stdin foo
>  100+0 records in
>  100+0 records out
>  102400 bytes (102 kB) copied, 0.0802346 s, 1.3 MB/s
>  hlangos <at> pc-hlangos:~/zaurus$ ls -lisa foo
>  958477 0 -rw-r--r-- 1 hlangos hlangos 102400 2010-04-13 14:06 foo

It doesn't change the target file's inode (and also maintains the existing
hard links).

Cheers
-henrik







Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Mon, 22 Nov 2010 16:16:02 GMT) Full text and rfc822 format available.

Message #23 received at 5918 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Andreas Schwab <schwab <at> suse.de>, 5918 <at> debbugs.gnu.org,
	samuel.thibault <at> ens-lyon.org, Heinrich Langos <henrik-gnu <at> prak.org>
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Mon, 22 Nov 2010 16:19:25 +0000
On 11/04/10 15:01, Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 10/04/10 01:28, Heinrich Langos wrote:
>>> Hello Andreas, Samuel and list,
>>>
>>> sorry to pick up such an old thread, but I stumbled upon it while
>>> looking for an efficient way to "re-sparse" files that contain a
>>> lot of zero blocks but
>>> 1) had already been expanded
>>> or
>>> 2) are being expanded due to pipes.
>>>
>>> On Sun, Dec 30, 2007 at 10:19:54AM +0100, Andreas Schwab wrote:
>>>> Samuel Thibault <samuel.thibault <at> ens-lyon.org> writes:
>>>>
>>>>> Some time ago, I wrote a conv=sparse option for dd, attached is the
>>>>> patch.
>>>>
>>>> How is it different from cp --sparse=always?
>>>
>>> I'd say in enough ways to make such an option highly desirable.
>>>
>>> a) "dd" will maintain an existing of=target file including the inode
>>>    number, thus respecting existing hard links. "cp" will depending
>>>    on the other options given (e.g. "-a") maintain or break existing
>>>    hard links to an existing target file.
>>
>> I don't think that's possible as holes can only be created at the end of a file.
>> Well I think NTFS supports punching holes in the "middle" but it's not common.
> 
> I would like at least cp to be able to copy sparse files efficiently,
> and considering the FIEMAP patches that Jeff Liu is working on, we
> don't have long to wait.
> 
> BTW, I'm pretty sure it is possible to punch a hole in the middle of
> a file with XFS.  Maybe with other CoW file systems, too?

They're just now adding FALLOC_FL_PUNCH_HOLE to fallocate() in the Linux kernel.
It's supported by xfs and ocfs2; other filesystems will, for now, return EOPNOTSUPP.
fallocate(..,FALLOC_FL_PUNCH_HOLE) will also return EOPNOTSUPP on older kernels,
even for xfs and ocfs2.
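
For illustration, a caller could probe for hole punching and fall back when it is not available. The sketch below rests on several assumptions: it targets 64-bit Linux with a C library that exports fallocate(), the flag values are copied from <linux/falloc.h>, the helper name punch_hole is made up for this example, and ENOSYS is treated like EOPNOTSUPP as an extra precaution.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01   # values as defined in <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02  # must be combined with FALLOC_FL_KEEP_SIZE

def punch_hole(fd, offset, length):
    """Try to deallocate [offset, offset+length) of fd.

    Returns True on success and False when the platform, kernel or
    filesystem does not support it; other errors raise OSError.
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return False
    libc = ctypes.CDLL(libc_name, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:
        return False
    # Assumption: off_t is 64 bits wide (true on 64-bit Linux).
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\xff" * 16384)
        f.flush()
        print("hole punched:", punch_hole(f.fileno(), 4096, 8192))
```

When the call succeeds, the file size stays the same and the punched range reads back as zeros.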

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#5918; Package coreutils. (Wed, 10 Oct 2018 16:45:02 GMT) Full text and rfc822 format available.

Message #26 received at 5918 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 5918 <at> debbugs.gnu.org
Subject: Re: bug#5918: [dd] conv=sparse option
Date: Wed, 10 Oct 2018 10:44:14 -0600
tags 5918 fixed
close 5918
stop

Hello,

Coreutils version 8.16 (released 2012) gained the "dd conv=sparse" option,
see 
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=4e776faa8482ae630d2ea9bc767298e664f07ba9

closing this bug.

regards,
 - assaf




Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 10 Oct 2018 16:45:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 5918 <at> debbugs.gnu.org and Heinrich Langos <henrik-gnu <at> prak.org> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 10 Oct 2018 16:45:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 08 Nov 2018 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 228 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.