On 19/11/2022 08:14, Korn Andras wrote: > Hi, > > on zfs, newfstatat() can return an st_blksize that is approximately equal to the file size in bytes (if the file fit into a single zfs record). > > The block size for filesystems can also be quite large (currently, up to 16M). > > The code at https://github.com/coreutils/coreutils/blob/master/src/copy.c#L1343 will try to compute the least common multiple of the input and output block sizes, then allocate a buffer of that size using mmap(). With such unusual block sizes, the least common multiple can be very large, causing the mmap() to return ENOMEM and cp to abort with "memory exhausted". > > Example: > > openat(AT_FDCWD, "tmp", O_RDONLY|O_PATH|O_DIRECTORY) = 3 > newfstatat(AT_FDCWD, "usr/src/zfs-2.1.6/configure", {st_dev=makedev(0, 0x30), st_ino=29267, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=2647552, st_blocks=5177, st_size=2647046, st_atime=1668786119 /* 2022-11-18T16:41:59.760703544+0100 */, st_atime_nsec=760703544, st_mtime=1667705386 /* 2022-11-06T04:29:46+0100 */, st_mtime_nsec=0, st_ctime=1668785062 /* 2022-11-18T16:24:22.866737598+0100 */, st_ctime_nsec=866737598}, AT_SYMLINK_NOFOLLOW) = 0 > newfstatat(3, "configure", {st_dev=makedev(0, 0x32), st_ino=3838, st_mode=S_IFREG|0700, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4194304, st_blocks=1, st_size=0, st_atime=1668788205 /* 2022-11-18T17:16:45.416328293+0100 */, st_atime_nsec=416328293, st_mtime=1668788255 /* 2022-11-18T17:17:35.809758585+0100 */, st_mtime_nsec=809758585, st_ctime=1668788255 /* 2022-11-18T17:17:35.809758585+0100 */, st_ctime_nsec=809758585}, 0) = 0 > openat(AT_FDCWD, "usr/src/zfs-2.1.6/configure", O_RDONLY|O_NOFOLLOW) = 4 > newfstatat(4, "", {st_dev=makedev(0, 0x30), st_ino=29267, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=2647552, st_blocks=5177, st_size=2647046, st_atime=1668786119 /* 2022-11-18T16:41:59.760703544+0100 */, st_atime_nsec=760703544, st_mtime=1667705386 /* 2022-11-06T04:29:46+0100 */, st_mtime_nsec=0, st_ctime=1668785062 /* 2022-11-18T16:24:22.866737598+0100 */, st_ctime_nsec=866737598}, AT_EMPTY_PATH) = 0 > openat(3, "configure", O_WRONLY|O_TRUNC) = 5 > ioctl(5, BTRFS_IOC_CLONE or FICLONE, 4) = -1 EXDEV (Invalid cross-device link) > newfstatat(5, "", {st_dev=makedev(0, 0x32), st_ino=3838, st_mode=S_IFREG|0700, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4194304, st_blocks=1, st_size=0, st_atime=1668788205 /* 2022-11-18T17:16:45.416328293+0100 */, st_atime_nsec=416328293, st_mtime=1668788266 /* 2022-11-18T17:17:46.326056957+0100 */, st_mtime_nsec=326056957, st_ctime=1668788266 /* 2022-11-18T17:17:46.326056957+0100 */, st_ctime_nsec=326056957}, AT_EMPTY_PATH) = 0 > fadvise64(4, 0, 0, POSIX_FADV_SEQUENTIAL) = 0 > copy_file_range(4, NULL, 5, NULL, 9223372035781033984, 0) = -1 EXDEV (Invalid cross-device link) > mmap(NULL, 21688754176, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) > brk(0x55a7fa4b0000) = 0x55a2ed8ab000 > mmap(NULL, 21689794560, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory) > openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory) > write(2>, "cp: ", 4) = 4 > write(2>, "memory exhausted", 16) = 16 > write(2>, "\n", 1) = 1 > lseek(0>, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) > close(0>) = 0 > close(1>) = 0 > close(2>) = 0 > exit_group(1) = ? > > I think cp(1) shouldn't insist on using the lcm if it's very large; additionally, it should never try to allocate a buffer that is (much) larger than the source file. > > Perhaps you could also try to use successively smaller buffers if allocating a large one fails? Thank you for the detailed report. This buffer_lcm code has been present for a long time, but I agree in the presence of disparate st_blksize values it can result in inappropriate buffer sizes. In your case: src st_blksize = 2647552 (0x286600) (st_size = 2647046) dst st_blksize = 4194304 (0x400000) (st_size = 0) Which implies a lowest common multiple of 21688745984 Once the st_blksize is based on st_size we can get very large values like this. In the zfs case it seems to set st_blksize to st_size rounded to next multiple of 512. The proposed patch attached removes the use of buffer_lcm() and just picks the largest st_blksize, which would be 4MiB in your case. It also limits the max buffer size to 32MiB in the edge case where st_blksize returns a larger value that this. cheers, Pádraig