Package: coreutils;
Reported by: Kelly Anderson <kelly <at> silka.with-linux.com>
Date: Wed, 14 Sep 2011 06:47:02 UTC
Severity: wishlist
Tags: patch
From: Pádraig Brady <P <at> draigBrady.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 9500 <at> debbugs.gnu.org, Kelly Anderson <kelly <at> silka.with-linux.com>
Subject: bug#9500: [PATCH]: use posix_fallocate where supported
Date: Wed, 23 Nov 2011 00:49:11 +0000
On 09/14/2011 03:46 PM, Pádraig Brady wrote:
> On 09/14/2011 03:06 PM, Eric Blake wrote:
>> On 09/13/2011 11:55 PM, Kelly Anderson wrote:
>>> Hi,
>>>
>>> I put together a patch 2 or 3 years ago (back when posix_fallocate was
>>> first introduced in glibc).
>>
>> Thanks for the effort. However, this has been discussed in the past, and the consensus was that we should first write a patch to gnulib that provides a posix_fallocate() stub for all platforms, so that coreutils can unconditionally call posix_fallocate, rather than making coreutils have to use #ifdef. Among other things, a gnulib module would make it possible to emulate posix_fallocate() even on older glibc where it is missing or broken.
>>
>
> Also we probably want fallocate() for this use case
> rather than posix_fallocate() in any case,
> as we don't want to fall back to writing zeros.
>
> Also I had a whole lot of fallocate() things to try
> once the fiemap() stuff landed, but unfortunately
> that doesn't work reliably on all file systems
> and is currently restricted to sparse files.
> So I need to dig out my notes on how to apply
> fallocate() to files with holes and "empty portions" again.

I thought a little about this today.

fallocate() is a feature to quickly allocate space in a file system.
It's useful for 3 things as far as I can see:

  1. Improved file layout for subsequent access
  2. Immediate indication of ENOSPC
  3. Efficient writing of NUL portions

Note that 1. is somewhat moot with newer file systems that do "delayed allocation".

So what do we need to consider when using fallocate on the destination file?
Considering just cp for the moment, its inputs impacting this are the options:

  --sparse={auto,always,never}

Note that with no --sparse option we behave as with --sparse=auto,
where we try to detect holes based on st_size vs st_blocks.

The other significant input is the construction of the source file.
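A minimal sketch of the "fallocate() without falling back to writing zeros" idea, assuming Linux with glibc (fallocate() needs _GNU_SOURCE). The names try_preallocate, open_preallocated and enum prealloc_result are hypothetical, not coreutils code. The sketch only illustrates points 2 and 3 above: reserve the space before copying so ENOSPC shows up immediately, and treat EOPNOTSUPP/ENOSYS as "skip preallocation" instead of emulating it by writing zeros as posix_fallocate() may.

```c
#define _GNU_SOURCE /* fallocate() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum prealloc_result
{
  PREALLOC_OK,          /* space reserved (or nothing to reserve) */
  PREALLOC_UNSUPPORTED, /* file system can't allocate quickly; skip it */
  PREALLOC_FAILED       /* real error, e.g. ENOSPC; errno is set */
};

/* Reserve [0, LEN) in FD with fallocate() mode 0, which also extends
   the file size.  Never falls back to writing zeros when the file
   system lacks support.  */
static enum prealloc_result
try_preallocate (int fd, off_t len)
{
  assert (fd >= 0);
  if (len <= 0)
    return PREALLOC_OK;   /* fallocate() rejects len == 0 with EINVAL */
  if (fallocate (fd, 0, 0, len) == 0)
    return PREALLOC_OK;
  if (errno == EOPNOTSUPP || errno == ENOSYS)
    return PREALLOC_UNSUPPORTED;
  return PREALLOC_FAILED;
}

/* Create PATH for writing and reserve LEN bytes before any data is
   copied, so that ENOSPC is reported up front.  Returns the fd, or -1
   (errno set, *RES == PREALLOC_FAILED) if open or fallocate failed.  */
static int
open_preallocated (const char *path, off_t len, enum prealloc_result *res)
{
  *res = PREALLOC_FAILED;
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  *res = try_preallocate (fd, len);
  if (*res == PREALLOC_FAILED)
    {
      int saved = errno;
      close (fd);
      errno = saved;
      return -1;
    }
  return fd;
}
```

Treating lack of support as a soft outcome keeps the copy working on file systems without fallocate(); only genuine failures such as ENOSPC stop the copy before any data is written.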
Now data in a file can generally be classed into 4 types:

  Data:  normal data
  Zero:  normal data containing only NULs
  Hole:  unallocated data containing only NULs
  Empty: allocated data containing only NULs

One can have any of the above types at any point in the file.
Also 'Empty' is special in that it can extend beyond the apparent size.
In fact this tail allocation is common on XFS for performance reasons.

An important factor is how well we can distinguish the above data classes.
There are currently three possible identification options:

Heuristics
  This is used by default to see if holes might be present.
  The test is simply whether st_size is greater than the space
  covered by the allocated st_blocks. Note that this can fail, for
  example, when there is a tail allocation not accounted for in the
  size, like:

    +-----------+---+
    | D | E | H | E |
    +-----------+---+

  Traditionally, when a sparse source is detected, we check input blocks
  for all zeros and create a 'Hole' in the destination instead.
  This is inefficient as it requires reading all the NUL data
  and verifying that it is in fact NUL.

SEEK_HOLE
  Available on Linux since 3.1.
  'Empty' is treated like a 'Hole', which at least allows 'Empty'
  portions to be processed quickly by `cp`.
  We lose the ability to copy the allocation from src to dst.

fiemap
  Available on Linux since around 2.6.39.
  Gives greater control by distinguishing Hole and Empty,
  thus allowing us to both efficiently copy and maintain allocation.
  Requires sync on ext4, xfs.
  Code already done and used (with sync) for sparse files.

Note that not being able to use fiemap with non-sparse files means
that we need to read() the empty extents, which is inefficient,
especially in --sparse=always mode.

So given the above info, what functionality might the use of fallocate()
make available to cp?
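The heuristic and SEEK_HOLE options can be sketched as follows, assuming Linux with glibc (SEEK_DATA and SEEK_HOLE need _GNU_SOURCE). probably_sparse() is a hypothetical helper expressing a check of the form st_blocks < st_size / 512, with st_blocks counted in 512-byte units as on Linux; it is not coreutils' exact code. data_bytes() is likewise hypothetical: it walks the data extents with lseek(SEEK_DATA/SEEK_HOLE), so regions reported as holes are never read.

```c
#define _GNU_SOURCE /* for SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Heuristic: fewer 512-byte blocks are allocated than the apparent
   size would need.  A tail allocation beyond EOF inflates st_blocks
   and can hide holes from this test.  */
static bool
probably_sparse (const struct stat *st)
{
  return S_ISREG (st->st_mode) && st->st_blocks < st->st_size / 512;
}

/* Total bytes in [0, SIZE) that SEEK_DATA/SEEK_HOLE report as data.
   Regions reported as holes ('Hole', and per the message also 'Empty')
   are skipped without being read.  Returns -1 on error.  */
static off_t
data_bytes (int fd, off_t size)
{
  assert (fd >= 0 && size >= 0);
  off_t total = 0;
  off_t pos = 0;
  while (pos < size)
    {
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        return errno == ENXIO ? total : -1; /* ENXIO: no data after POS */
      if (data >= size)
        break;
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole < 0)
        return -1;
      if (hole > size)
        hole = size;
      total += hole - data;
      pos = hole;
    }
  return total;
}
```

With 4 KiB per region, the diagram's layout gives st_size = 12288 and st_blocks = 24 (D, E and the tail E are allocated), so the heuristic reports nothing even though H is a hole. On file systems without native SEEK_DATA support, Linux reports the whole file as data, so data_bytes() then returns the full size.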
Exact copy from source to dest

  Copying the source layout would mean that one could, for example,
  create a backup copy of a large db file, which could then be used
  without worrying about fragmentation or ENOSPC issues.
  There is the argument that this might be better as a higher level
  file operation anyway, and perhaps `cp --reflink` might cover this
  use case on some file systems at least.
  fiemap gives us the most control, allowing us to copy even tail
  allocations from source to destination. But the sync issue makes it
  unusable in general at present, and it is currently restricted to
  sparse files, where it's used to avoid reading 'Empty' and 'Hole' portions.

Copying sparse files

  It's worth noting again the caveat mentioned above, that we might
  not recognise some sparse files due to tail allocation.
  Given that we use fiemap (with sync) for sparse files at present,
  we can augment the fiemap copying code to use fallocate where appropriate.
  So depending on the options, the operations would be:

    --sparse=auto   => 'Empty' -> 'Empty'
    --sparse=always => 'Empty' -> 'Hole' && discard tail allocation
    --sparse=never  => 'Hole'  -> 'Empty'

  Perhaps the first case could be simplified to initially doing:

    fallocate(dest, blocks*blocksize)

Copying normal files

  Note that using SEEK_HOLE for this case would only help to avoid
  reading 'Hole' and, more likely, 'Empty' portions, and should not
  impact the use of fallocate(dest).
  So suppose we initially did:

    if ! --sparse=always
      fallocate(dest, st_size)

  That would throw away any tail allocation in the source, which is
  probably OK as noted above. In fact we might always discard tail
  allocation for consistency, unless we can use fiemap for all cases.

I'll cook something up on this soon.

cheers,
Pádraig.
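The option handling proposed above reduces to two small decisions, sketched here under stated assumptions. The enums and the functions dest_kind() and normal_prealloc_len() are hypothetical. dest_kind() encodes only the three --sparse mappings listed for sparse files and leaves every other class unchanged (it does not model cp's zero-block detection). normal_prealloc_len() is the "if ! --sparse=always fallocate(dest, st_size)" step; because it uses st_size, any tail allocation in the source is discarded.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

enum sparse_mode { SPARSE_AUTO, SPARSE_ALWAYS, SPARSE_NEVER };
enum extent_kind { EXT_DATA, EXT_ZERO, EXT_HOLE, EXT_EMPTY };

/* Class to give a destination extent for a sparse source, following
   the proposed mapping; classes not mentioned there pass through.  */
static enum extent_kind
dest_kind (enum sparse_mode mode, enum extent_kind src)
{
  switch (mode)
    {
    case SPARSE_AUTO:   /* 'Empty' -> 'Empty' */
      return src;
    case SPARSE_ALWAYS: /* 'Empty' -> 'Hole' */
      return src == EXT_EMPTY ? EXT_HOLE : src;
    case SPARSE_NEVER:  /* 'Hole' -> 'Empty' */
      return src == EXT_HOLE ? EXT_EMPTY : src;
    }
  return src;
}

/* Normal (non-sparse) source: how many bytes to fallocate() on the
   destination before copying.  Using st_size deliberately drops any
   tail allocation beyond EOF in the source.  */
static off_t
normal_prealloc_len (enum sparse_mode mode, const struct stat *src)
{
  assert (src);
  if (mode == SPARSE_ALWAYS)
    return 0;   /* holes were requested; reserve nothing up front */
  return src->st_size;
}
```

The length returned by normal_prealloc_len() would then be handed to fallocate() on the destination before the data copy starts.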