Hi,

[Resending message cc'ed to 9500@debbugs.gnu.org as requested.]

On Fri, May 11, 2012 21:36, Pádraig Brady wrote:
> On 05/11/2012 08:45 PM, Mark wrote:
>> ...
>> I'm using kernel 3.0.0-19-generic #32-Ubuntu here. But probably more
>> relevant, the partition I tested on was ~1.7TB ext4 on an external USB
>> 2.0 drive which was almost full and probably *very* fragmented, i.e. free
>> space spread all over the disk in thousands of small chunks. ext4 seems
>> to be pretty slow at allocating space in that case.
> But you asked to fallocate(10GB).
> That should have failed immediately,
> because it's bigger than the file system.
> Could you run the fallocate loop from my previous mail,
> across an appropriate range of sizes, just to confirm,
> and maybe prepare input to a kernel bug report?

Unfortunately I have freed up a lot of space on the partition in question,
but I did some testing with a new small empty test partition, see below.


>> If I were designing a filesystem, I'd have it immediately return failure
>> if fallocate() is specified with additional size larger than the amount of
>> free space. Though for the filesystem, determining how much extra space a
>> fallocate() call would need can be quite involved in some cases and
>> require a significant amount of disk access...
>> Imagine a huge sparse file with many thousands of holes, and the requested
>> region for fallocate() serving to "fill in" many of the holes. But any
>> non-hole parts within the fallocate() region would reduce the amount of
>> additional space required for fallocate() to succeed. So it's not as
>> simple as comparing length of fallocate() region with amount of free
>> space...
>> Unless you're creating a new file, which is what cp does most of the time.
>> So maybe a workaround could be added to cp. If --preallocate is specified,
>> cp could check the amount of free space before writing to the destination
>> file and abort without even needing to call fallocate() if there isn't
>> enough. (In fact, cp could do that anyway in most cases I think?)
>
> Good analysis. Still though for a new file the file system
> should be able to do the simple short calculation
> of fallocate_request - free_space > 0.

Yes.
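For the new-file case, the cp-side check can be done with fstatvfs() before any allocation happens. Below is a minimal C sketch, assuming a hypothetical helper create_preallocated() (not actual coreutils code) that creates the destination with O_EXCL and compares the request against f_bavail * f_frsize, the space available to unprivileged users:

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/statvfs.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not coreutils code: create a new destination
 * file and preallocate len bytes for it, but fail with ENOSPC before
 * calling fallocate() at all when the file system plainly cannot hold
 * the data. Returns an open fd, or -1 with errno set. */
int create_preallocated(const char *path, off_t len)
{
    assert(len >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    int err = 0;
    struct statvfs sv;
    if (fstatvfs(fd, &sv) != 0) {
        err = errno;
    } else {
        /* f_bavail counts blocks available to unprivileged users, in
         * units of f_frsize. Metadata overhead is ignored, so this only
         * rejects requests that cannot possibly fit. */
        unsigned long long avail =
            (unsigned long long)sv.f_bavail * sv.f_frsize;
        if ((unsigned long long)len > avail)
            err = ENOSPC;
        else if (len > 0)
            err = posix_fallocate(fd, 0, len); /* returns an errno value */
    }

    if (err != 0) {
        close(fd);
        unlink(path); /* O_EXCL guarantees we created this file */
        errno = err;
        return -1;
    }
    return fd;
}
```

When running as root, f_bfree would be the more appropriate figure. Because metadata is not counted, a request just under the limit can still fail with ENOSPC inside posix_fallocate(); the pre-check only removes the clear-cut cases.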

From the filesystem's perspective, it could easily fail fallocate()
immediately in some cases.

When fallocate() is called, the approximate space needed for success is
  (length of region passed to fallocate)
    - (amount already allocated to the file within the fallocate region).

The worst case is that the entire overlap between the fallocate region and
the file is a hole. Then
  space needed = fallocate region size.

However, the filesystem could narrow it down a little. Consider the
difference between a file's apparent size and its on-disk size. Roughly
  size of all holes = apparent size - on-disk size

The space needed will not be more than:
  (size of fallocate region past end of file)
    + min(size of fallocate region overlapping file, size of all holes)

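So if there is at least that much free space, the fallocate() call is sure
to find enough room. When creating a new file, the whole region lies past
the end of file and the estimate is exact, so the call could be failed
without doing any work if there is less free space than that.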
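A minimal C sketch of this arithmetic, using a hypothetical function fallocate_space_bounds() and ignoring block rounding and extent metadata. It also computes the matching lower bound (assuming all allocated data lies inside the region), because an early ENOSPC is only safe when free space is below what the call must need at minimum, while the upper bound tells when success is assured:

```c
#include <assert.h>

struct space_bounds {
    long long lower; /* fallocate() needs at least this many bytes */
    long long upper; /* ... and at most this many */
};

/* Bounds on the extra space fallocate(fd, 0, off, len) can need for a
 * file with apparent size file_size whose holes total hole_bytes, so
 * roughly file_size - hole_bytes bytes are on disk. The part of the
 * region past EOF is assumed to be unallocated. */
struct space_bounds fallocate_space_bounds(long long file_size,
                                           long long hole_bytes,
                                           long long off, long long len)
{
    assert(file_size >= 0 && off >= 0 && len >= 0);
    assert(hole_bytes >= 0 && hole_bytes <= file_size);

    long long end = off + len;
    long long inside_end = end < file_size ? end : file_size;
    long long inside = inside_end > off ? inside_end - off : 0;
    long long past_eof = len - inside;
    long long on_disk = file_size - hole_bytes;

    struct space_bounds b;
    /* Worst case: every hole lies inside the region. */
    b.upper = past_eof + (inside < hole_bytes ? inside : hole_bytes);
    /* Best case: all allocated data lies inside the region. */
    b.lower = past_eof + (inside > on_disk ? inside - on_disk : 0);
    return b;
}
```

With free bytes available, a file system could return ENOSPC at once when free < b.lower and knows the request fits when free >= b.upper; only in between would it need to walk the extent map. For a new, empty file both bounds equal len.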


> I can understand inefficiencies in fallocating
> around the free space limit, but otherwise
> this seems like a bug in ext4.
> (maybe a regression since I don't see it on my ext4 system).

I also saw slowness (i.e. a lot of disk I/O) when writing a file to that
ext4 partition without pre-allocation. It seemed like once the file got
above a certain size, there was a *lot* of disk I/O for several seconds.
Or maybe all the disk I/O happened when the kernel writeback kicked in.
Perhaps that's related to the fallocate() slowness, if the filesystem had
to seek all over the disk again and again to find free space.

As I mentioned though, that filesystem was/is *very* fragmented, with a
relatively small proportion of free space. I might not be able to
reproduce the issue now, because I shifted about 250GB of data off that
partition the other day.

Also worth noting, most ext4 partitions have a certain percentage reserved
for the root user. Maybe testing should be done either on a partition with
no reserved space, or as root.


I posted to the ext4 list a while ago about fallocate() behaviour. Not
specifically about this issue, but about the fact that it's not atomic. If
it fails due to lack of space, it has already allocated all remaining space
on the partition, with no easy way to undo that.

For example, suppose the user has a very large sparse file with large
holes, maybe a virtual machine hard disk image or similar. The user wants
to make the file non-sparse, so calls fallocate() to allocate all the
holes. But it turns out there wasn't enough disk space for that to
succeed, or maybe some other program allocated a lot of space in the
meantime. fallocate() fails after allocating all remaining space on the
partition. Other than deleting the file, the user would need to roll
their own hole-punching program to reclaim the space.
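A minimal sketch of such a roll-your-own workaround, assuming a kernel and file system that support lseek() with SEEK_HOLE/SEEK_DATA and fallocate() with FALLOC_FL_PUNCH_HOLE. The hypothetical helper fill_holes_or_rollback() records the holes in the region before calling fallocate(), and if the call fails it punches those ranges out again. To keep it simple it only handles the part of the region inside the current end of file, and it is racy: anything written into the recorded holes between the two steps would be discarded.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct extent { off_t off, len; };

/* Record the holes of fd inside [off, end) into a malloc'd array.
 * Returns the number of holes, or -1 with errno set. */
static ssize_t record_holes(int fd, off_t off, off_t end, struct extent **out)
{
    size_t n = 0, cap = 16;
    struct extent *v = malloc(cap * sizeof *v);
    if (v == NULL)
        return -1;

    off_t pos = off;
    while (pos < end) {
        off_t hole = lseek(fd, pos, SEEK_HOLE); /* EOF counts as a hole */
        if (hole < 0)
            goto fail;
        if (hole >= end)
            break;
        off_t data = lseek(fd, hole, SEEK_DATA);
        if (data < 0) {
            if (errno != ENXIO)
                goto fail;
            data = end; /* no more data: the hole runs to EOF */
        }
        if (data > end)
            data = end;
        if (data <= hole) /* defensive: file changed under us */
            break;
        if (n == cap) {
            struct extent *nv = realloc(v, 2 * cap * sizeof *v);
            if (nv == NULL)
                goto fail;
            v = nv;
            cap *= 2;
        }
        v[n].off = hole;
        v[n].len = data - hole;
        n++;
        pos = data;
    }
    *out = v;
    return (ssize_t)n;

fail:;
    int saved = errno;
    free(v);
    errno = saved;
    return -1;
}

/* Hypothetical helper: fallocate() [off, off + len), clamped to the
 * current EOF. On failure, punch the previously recorded holes again
 * so the partial allocation gives its space back.
 * Returns 0, or -1 with errno set. */
int fill_holes_or_rollback(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len >= 0);
    off_t eof = lseek(fd, 0, SEEK_END);
    if (eof < 0)
        return -1;
    off_t end = off + len < eof ? off + len : eof;
    if (end <= off)
        return 0;

    struct extent *holes;
    ssize_t n = record_holes(fd, off, end, &holes);
    if (n < 0)
        return -1;

    int saved = 0;
    if (fallocate(fd, 0, off, end - off) != 0) {
        saved = errno;
        for (ssize_t i = 0; i < n; i++)
            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      holes[i].off, holes[i].len);
    }
    free(holes);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```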

You can read the thread about that at
  http://comments.gmane.org/gmane.comp.file-systems.ext4/29942


Back to the brief test I mentioned above. See the attached file. That
doesn't demonstrate any slowness (since the test was done on a very small
empty partition), but does demonstrate ext4 fallocate() not returning an
error on attempting to allocate more than the partition size. Instead it
allocates all remaining space then fails, leaving the user to manually fix
things afterwards.


Regards,
-- Mark