Hi,

[Resending message cc'ed to 9500@debbugs.gnu.org as requested.]

On Fri, May 11, 2012 21:36, Pádraig Brady wrote:
> On 05/11/2012 08:45 PM, Mark wrote:
>> ...
>> I'm using kernel 3.0.0-19-generic #32-Ubuntu here. But probably more
>> relevant, the partition I tested on was ~1.7TB ext4 on an external USB 2.0
>> drive which was almost full and probably *very* fragmented, i.e. free
>> space spread all over the disk in thousands of small chunks. ext4 seems to
>> be pretty slow at allocating space in that case.
>
> But you asked to fallocate(10GB).
> That should have failed immediately,
> because it's bigger than the file system.
> Could you run the fallocate loop from my previous mail,
> across an appropriate range of sizes, just to confirm,
> and maybe prepare input to a kernel bug report?

Unfortunately I have freed up a lot of space on the partition in question,
but I did some testing with a new small empty test partition; see below.

>> If I were designing a filesystem, I'd have it immediately return failure
>> if fallocate() is specified with additional size larger than the amount of
>> free space. Though for the filesystem, determining how much extra space a
>> fallocate() call would need can be quite involved in some cases and
>> require a significant amount of disk access...
>> Imagine a huge sparse file with many thousands of holes, and the requested
>> region for fallocate() serving to "fill in" many of the holes. But any
>> non-hole parts within the fallocate() region would reduce the amount of
>> additional space required for fallocate() to succeed. So it's not as
>> simple as comparing the length of the fallocate() region with the amount
>> of free space...
>> Unless you're creating a new file, which is what cp does most of the time.
>> So maybe a workaround could be added to cp. If --preallocate is specified,
>> cp could check the amount of free space before writing to the destination
>> file and abort without even needing to call fallocate() if there isn't
>> enough.
>> (In fact, cp could do that anyway in most cases I think?)
>
> Good analysis. Still though for a new file the file system
> should be able to do the simple short calculation
> of fallocate_request - free_space > 0.

Yes. From the filesystem's perspective it could easily fail fallocate()
immediately in some cases.

When fallocate() is called, the approximate space needed for success is
(length of region passed to fallocate) - (amount already allocated to the
file which overlaps the fallocate region).

The worst case is that the entire overlap between the fallocate region and
the file is a hole. Then space needed = fallocate region size.

However, the filesystem could narrow it down a little. Consider the
difference between a file's apparent size and its on-disk size. Roughly:

  size of all holes = apparent size - on-disk size

The space needed will not be more than:

  (size of fallocate region past end of file)
    + min(size of fallocate region overlapping file, size of all holes)

and will not be less than:

  (size of fallocate region past end of file)
    + max(0, size of fallocate region overlapping file - on-disk size)

So the fallocate() call could be failed without doing any work if there is
less free space than the second (lower) figure. For a new file both figures
equal the size of the request, which is your simple case.

> I can understand inefficiencies in fallocating
> around the free space limit, but otherwise
> this seems like a bug in ext4.
> (maybe a regression since I don't see it on my ext4 system).

I also saw slowness (i.e. a lot of disk I/O) when writing a file to that
ext4 partition without pre-allocation. It seemed like once the file got
above a certain size, there was a *lot* of disk I/O for several seconds.
Or maybe all the disk I/O happened when the kernel writeback kicked in.
Perhaps that's related to the fallocate() slowness, if the filesystem had
to seek all over the disk again and again to find free space.

As I mentioned though, that filesystem was/is *very* fragmented, with a
relatively small proportion of free space. I might not be able to
reproduce the issue now, because I shifted about 250GB of data off that
partition the other day.
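The free-space estimate discussed above (apparent size minus on-disk size as a rough total of holes) can be turned into a pre-check from user space. Here is a minimal C sketch, assuming Linux (where st_blocks counts 512-byte units) and a file with no blocks already preallocated beyond end of file. The names fallocate_bounds() and fallocate_cannot_fit() are hypothetical and not part of cp or any library. Note that the min() figure is an upper bound on the space needed. A safe fail-fast test instead compares free space with a lower bound: the past-EOF size plus max(0, overlap minus on-disk size).

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

/* Rough bounds on the extra space a fallocate(fd, 0, offset, len) call
 * could need, estimated only from the file's apparent and on-disk sizes.
 * Assumes no blocks are already preallocated beyond end of file. */
struct space_bounds {
    uint64_t lower; /* the call needs at least this many bytes */
    uint64_t upper; /* ...and at most this many */
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Hypothetical helper; the caller must ensure offset + len doesn't overflow. */
static struct space_bounds fallocate_bounds(uint64_t apparent, uint64_t on_disk,
                                            uint64_t offset, uint64_t len)
{
    uint64_t end = offset + len;
    /* Part of the region past end of file: all of it must be allocated. */
    uint64_t past_eof = end > apparent ? end - max_u64(offset, apparent) : 0;
    /* Part of the region overlapping the existing file. */
    uint64_t overlap = len - past_eof;
    /* Roughly, size of all holes = apparent size - on-disk size. */
    uint64_t holes = apparent > on_disk ? apparent - on_disk : 0;

    struct space_bounds b;
    /* Holes inside the overlap can't exceed the overlap or all holes. */
    b.upper = past_eof + min_u64(overlap, holes);
    /* Allocated bytes inside the overlap can't exceed the on-disk size. */
    b.lower = past_eof + (overlap > on_disk ? overlap - on_disk : 0);
    return b;
}

/* Hypothetical pre-check: returns 1 if the request certainly cannot fit in
 * the space available to unprivileged users, 0 if it might, -1 if
 * fstat()/fstatvfs() failed. */
static int fallocate_cannot_fit(int fd, uint64_t offset, uint64_t len)
{
    struct stat st;
    struct statvfs vfs;
    if (fstat(fd, &st) != 0 || fstatvfs(fd, &vfs) != 0)
        return -1;

    uint64_t apparent = (uint64_t)st.st_size;
    /* On Linux, st_blocks counts 512-byte units (metadata included). */
    uint64_t on_disk = (uint64_t)st.st_blocks * 512;
    /* f_bavail excludes blocks reserved for root; f_bfree includes them. */
    uint64_t avail = (uint64_t)vfs.f_bavail * vfs.f_frsize;

    return fallocate_bounds(apparent, on_disk, offset, len).lower > avail;
}
```

For a freshly created destination file (apparent and on-disk size both zero), both bounds equal len, so a cp-style pre-check reduces to comparing the request against free space. Using f_bavail rather than f_bfree reflects the root-reserved blocks that ext4 keeps by default.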
Also worth noting: most ext4 partitions have a certain percentage of space
reserved for the root user. Maybe testing should be done either on a
partition with no reserved space, or as root.

I posted to the ext4 list a while ago about fallocate() behaviour. Not
specifically about this issue, but about the fact that it's not atomic: if
it fails due to lack of space, it allocates all space on the partition with
no easy way to undo that.

For example, suppose the user has a very large sparse file with large
holes, maybe a virtual machine hard disk image or similar. The user wants to
make the file non-sparse, so calls fallocate() to allocate all the holes.
But it turns out there wasn't enough disk space for that to succeed. Or
maybe some other program allocated a lot of space in the meantime.
fallocate() fails after allocating all remaining space on the partition.
Other than deleting the file, the user would need to roll their own
hole-punching program to reclaim the space.

You can read the thread about that at
http://comments.gmane.org/gmane.comp.file-systems.ext4/29942

Back to the brief test I mentioned above: see the attached file. It doesn't
demonstrate any slowness (since the test was done on a very small empty
partition), but it does demonstrate ext4 fallocate() not returning an error
when asked to allocate more than the partition size. Instead it allocates
all remaining space and then fails, leaving the user to fix things up
manually afterwards.

Regards,
-- 
Mark
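One way to "roll your own hole-punching program" after a failed fallocate() is sketched below. Blocks allocated by fallocate() read back as zeros, so it scans the file in fixed-size chunks and punches a hole over every chunk that is entirely zero. This is a minimal sketch, assuming Linux with glibc and a filesystem that supports FALLOC_FL_PUNCH_HOLE (otherwise fallocate() fails, typically with EOPNOTSUPP). The function name dig_zero_holes() and the 64 KiB CHUNK size are hypothetical choices. It also turns genuinely zero-filled data into holes, which changes allocation but not what the file reads back as.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536 /* hypothetical chunk size, a multiple of 4 KiB blocks */

/* Punch a hole over every CHUNK-sized, CHUNK-aligned range of fd that reads
 * back as all zeros. Returns the number of bytes punched, or -1 on error
 * (errno is left as set by pread() or fallocate()). */
static long long dig_zero_holes(int fd)
{
    static char buf[CHUNK];
    static const char zeros[CHUNK];
    long long punched = 0;
    off_t off = 0;
    ssize_t n;

    while ((n = pread(fd, buf, CHUNK, off)) > 0) {
        if (memcmp(buf, zeros, (size_t)n) == 0) {
            /* KEEP_SIZE: free the blocks without changing the apparent size. */
            if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          off, n) != 0)
                return -1;
            punched += n;
        }
        off += n;
    }
    return n < 0 ? -1 : punched;
}
```

Checking content rather than extent maps keeps the sketch simple. Whether SEEK_HOLE reports unwritten (fallocated) extents as holes has varied between kernels, whereas zero-filled content is unambiguous.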