On 30/12/2022 15:33, Pádraig Brady wrote: > On 29/12/2022 16:04, Braiam wrote: >> When using a nfs export, cp seems to not try hard enough using >> copy_file_range(). This was >> the conclusion we arrived in this forum thread[1] at Truenas forums. >> It was found a way to force >> cp to use it, but it should not be necessary, since there are supposed >> to be fallbacks. >> >> I'm unsure if we found something that triggered the undesired behavior. >> >> [1]: https://www.truenas.com/community/threads/nfs-does-not-support-server-side-copy-with-zfs.103309/post-712071 >> > > Currently we don't use copy_file_range() with sparse files, > as per the Linux man page "copy_file_range() may expand any > holes existing in the requested range". > I confirmed that copy_file_range() definitely expands holes > on ext4 on Linux 5.7 at least. > Also the FreeBSD man page says "this system call attempts > to maintain holes in the output file for the byte range being copied. > However, this does not always work well." > > Now maybe we should give precedence to server side copy for remote file systems, > though that would be optimizing runtime and network usage > while potentially pessimizing storage usage > if holes were expanded in the destination. > > Now fundamentally copy_file_range() isn't restricted from > maintaining holes from source to destination, > which suggests we could give precedence to copy_file_range() > on remote file systems. > > Also perhaps we can improve the heuristic somehow, > even again just for remote file systems. > Maybe determine a file is sparse on remote systems > only if it seems that more than half of the file is sparse. > > For completeness, one might be wondering why we're using > copy_file_range() at all with --sparse=never, as that syscall > doesn't guarantee sparseness. However in this case we > only use copy_file_range() with the portion of the file > considered non sparse (and manually write the zeros)¹. > > So in summary pseudo code could be: > > sparse_file = blocks < size/blocksize; > very_sparse_file = blocks < (size/2)/blocksize; > > if ( (!possible_hole_in_range || sparse_mode=auto) > && reflinks_allowed > && ( > (remote_file && !very_sparse_file) > || (!remote_file && !sparse_file) > ) > ) > copy_file_range(...); > > > Note `stat ; df -T ` would > give us some concrete heuristics for your case at least. > Note it would be useful to get such stats from files > that were completely copied by copy_file_range() on your systems > (i.e. not by cp), to see if holes were expanded for your case. > > cheers, > Pádraig > > ¹ Actually I now notice that where SEEK_HOLE is _not_ available > and copy_file_range() is, then `cp --sparse=never` would not be honored, > as copy_file_range() would expand the holes (confirmed on ext4 at least). > Now I don't know of any practical systems having copy_file_range() > and not having SEEK_HOLE, but I might add the constraint to the code, > at least for clarification reasons. Looking a bit closer at things, the current code does something like: if file appears sparse use SEEK_DATA to iterate over data parts, try writing data with copy_file_range() if sparse_mode == never (as thinking was there is no need to look for additional holes) else use copy_file_range if sparse_never || sparse_auto Now the use of copy_file_range() with sparse=never may be problematic, as assumes copy_file_range() won't add additional holes. This is problematic in first case as SEEK_HOLE is best effort, and in the second (else) case as file sparseness is a heuristic. Also from above analysis the OP's case seems to be first sparse case, as in second (else) case, with --sparse=auto, copy_file_range() would have been used (as file appears non sparse (so plain scan used)). So while we theoretically should only be allowing copy_file_range() with --sparse=auto, we definitely should not be precluding it in the first case with that sparse mode. I.e. for the first case we should only disallow copy_file_range() with sparse=ALWAYS, because --sparse=auto is best effort, and we've enough signal from SEEK_DATA as to the likelihood of processing holes or not. The attached makes that change. It would be great if you could test that on your systems. To that end you might find the following tarball useful, which has the patch already applied: https://pixelbeat.org/cu/coreutils-9.1.109-bd17b.tar.xz which can be built like: tar -xf coreutils-9.1.109-bd17b.tar.xz cd coreutils-9.1.109-bd17b/ ./configure && make -j $(nproc) then tested with: src/cp ... cheers, Pádraig