GNU bug report logs - #6131
[PATCH]: fiemap support for efficient sparse file copy

Package: coreutils

Reported by: "jeff.liu" <jeff.liu <at> oracle.com>

Date: Fri, 7 May 2010 14:16:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Message #236 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, bug-coreutils <at> gnu.org,
	Jim Meyering <jim <at> meyering.net>, Joel Becker <Joel.Becker <at> oracle.com>,
	"jeff.liu" <jeff.liu <at> oracle.com>,
	Chris Mason <chris.mason <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Thu, 15 Jul 2010 00:51:36 +0100

On 14/07/10 18:45, Paul Eggert wrote:
>>> I see fiemap just as a way to efficiently detect/read holes,
>>> and should have no bearing on the destination.
> 
> Hmm, but the proposal quoted below would mean that fiemap does have a
> bearing on the destination, in the --sparse=auto case.
> I guess this is OK, but it should be documented.
> 
>>> cp --sparse=auto (this is currently what cp does by default)
>>>   recreate the original fiemap holes or resort to existing
>>>   heuristic if fiemap not available
> 
> It's not just fiemap.  It's also the Solaris interface with SEEK_HOLE
> and SEEK_DATA.  The change should involve a module that isolates these
> low-level details from copy.c.  copy.c should ask the new module for the
> locations of the holes (or the non-holes: that could be more convenient).
> On traditional hosts without fiemap or SEEK_DATA, the module should report
> that it doesn't know where the holes are; this can let copy.c resort to
> the existing heuristic of looking at the size and the disk usage and
> using the --sparse=always approach if the file "smells" like it's sparse.
> 
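To make the quoted proposal concrete, here is a minimal sketch of what such a hole-locating helper might look like. It is not the coreutils implementation: the names smells_sparse and next_data_extent are hypothetical, the 512-byte unit for st_blocks is an assumption that holds on Linux but is not guaranteed by POSIX, and SEEK_DATA/SEEK_HOLE are assumed to be exposed by the C library (on glibc this needs _GNU_SOURCE).

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Heuristic: a regular file "smells" sparse if it occupies fewer
   blocks than its size implies.  Assumes st_blocks counts 512-byte
   units, which is true on Linux but not required by POSIX.  */
static bool
smells_sparse (int fd)
{
  struct stat st;
  if (fstat (fd, &st) != 0)
    return false;
  return S_ISREG (st.st_mode) && st.st_blocks < st.st_size / 512;
}

/* Find the next data extent at or after START.
   Returns 1 and sets *DATA and *HOLE (the extent is [*DATA, *HOLE)),
   0 if there is no data at or after START,
   -1 if the hole layout cannot be determined on this host.
   Note that this moves the file offset of FD.  */
static int
next_data_extent (int fd, off_t start, off_t *data, off_t *hole)
{
#if defined SEEK_DATA && defined SEEK_HOLE
  off_t d = lseek (fd, start, SEEK_DATA);
  if (d < 0)
    return errno == ENXIO ? 0 : -1;   /* ENXIO: no data past START */
  off_t h = lseek (fd, d, SEEK_HOLE);
  if (h < 0)
    return -1;
  *data = d;
  *hole = h;
  return 1;
#else
  (void) fd; (void) start; (void) data; (void) hole;
  return -1;                          /* traditional host: unknown */
#endif
}
```

A caller in the spirit of the proposal would loop on next_data_extent, copy each [data, hole) range, and seek over the gaps in the destination; when the helper returns -1 it would fall back to smells_sparse and the existing zero-scanning approach.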
>>> cp --sparse=never
>>>   write all data, but use fiemap if available to efficiently read
> 
> Surely there's no need to write all the data if fallocate works.
> 
>>> cp --sparse=always
>>>   recreate original holes and perhaps add to them for
>>>   other runs of zero bytes. Without having looked at the code
>>>   I see this as a little tricky to mix with fiemap.
>>>   Now since fiemap is only an optimization we can skip it
>>>   completely for this uncommon case if too tricky (just add a FIXME for now).
> 
> Yes, that makes sense.  --sparse=always should never invoke fallocate.
> 
>> For 'cp --sparse=never', when holes are detected in the SRC file, do not lseek(2) in the DST file;
>> instead, write zeros to the DST file. Am I right?
> 
> Only if fallocate doesn't work.  If fallocate works, there's no need
> to write zeros to the destination.

What you're describing here is posix_fallocate()
which uses fallocate() if available or falls back
to an implementation that writes a single 0 byte
to each block.
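
As a rough illustration of that behaviour, the sketch below fills a range of a freshly created destination with allocated zeros. The helper names zero_fill_range and write_zeros are hypothetical, and the explicit fallback is an assumption for C libraries whose posix_fallocate() does not emulate the operation itself. It relies on the range holding no earlier non-zero data, since posix_fallocate() only allocates and never overwrites.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write LEN zero bytes to FD starting at OFFSET.
   Returns 0 on success, otherwise an errno value.  */
static int
write_zeros (int fd, off_t offset, off_t len)
{
  static const char zeros[4096];
  while (len > 0)
    {
      size_t n = len < (off_t) sizeof zeros ? (size_t) len : sizeof zeros;
      ssize_t w = pwrite (fd, zeros, n, offset);
      if (w < 0)
        {
          if (errno == EINTR)
            continue;
          return errno;
        }
      offset += w;
      len -= w;
    }
  return 0;
}

/* Ensure bytes [OFFSET, OFFSET + LEN) of a new file are allocated and
   read back as zeros.  Try posix_fallocate() first; if the C library
   or file system rejects it, write the zeros explicitly.
   Returns 0 on success, otherwise an errno value.  */
static int
zero_fill_range (int fd, off_t offset, off_t len)
{
  int err = posix_fallocate (fd, offset, len);  /* returns errno value */
  if (err == EINVAL || err == EOPNOTSUPP || err == ENOSYS)
    err = write_zeros (fd, offset, len);
  return err;
}
```

Calling zero_fill_range (fd, 0, 8192) on an empty file should leave it 8192 bytes long, with every byte reading back as zero.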

> 
>> 2. Performance optimization, invoke fallocate(2) if an extent flag is UNWRITTEN
> 
> This doesn't sound right.  A FIEMAP_EXTENT_UNWRITTEN extent is all zeros, and
> so it should act as if it were a hole.  The goal is not to copy the exact
> fiemap structure of the source (that's impossible): the goal is to use as
> little time and space as possible.
> 
>> If you decide to do that, then please do it as a separate patch.
> 
> It's not clear to me that the fiemap stuff can be cleanly separated
> from the fallocate stuff.  To some extent they're the same issue.
> If they can easily be separated, that's better of course.

I see fiemap as optimizing reads,
posix_fallocate() as optimizing writing zeros
and fallocate() as optimizing allocation.

So not having thought much about implementation details,
it seems like they could be logically separated.
I.e., we could optimize the writing of zeros and allocation
later when we have the fallocate and posix_fallocate
gnulib modules in place.
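
For the allocation side, a minimal sketch of what such a later optimization might call is shown here. It assumes the Linux fallocate() interface with FALLOC_FL_KEEP_SIZE as exposed by glibc under _GNU_SOURCE; the helper name reserve_space is hypothetical, and this is not the gnulib module mentioned above.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE   /* exposes fallocate() on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to reserve SIZE bytes of storage for FD without changing its
   apparent size.  Returns 0 on success, otherwise an errno value.
   EOPNOTSUPP or ENOSYS means the file system or host cannot do this,
   and the caller can simply proceed without the optimization.  */
static int
reserve_space (int fd, off_t size)
{
#ifdef FALLOC_FL_KEEP_SIZE
  if (fallocate (fd, FALLOC_FL_KEEP_SIZE, 0, size) == 0)
    return 0;
  return errno;
#else
  (void) fd; (void) size;
  return ENOSYS;
#endif
}
```

Because the call is only a hint to the file system, a copy routine could ignore any failure other than ENOSPC and carry on with ordinary writes.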

In saying that, doing both now is better
when these details are in everyone's minds.
I don't think I'll get to resubmitting my fallocate gnulib patch,
or writing a posix_fallocate module, this week at least.

cheers,
Pádraig.



