GNU bug report logs -
#79139
cp --reflink truncates sparse files on ZFS
To reply to this bug, email your comments to 79139 AT debbugs.gnu.org.
There is no need to reopen the bug first.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 15:02:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Leah Neukirchen <leah <at> vuxu.org>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Fri, 01 Aug 2025 15:02:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
I found the following issue with coreutils 9.7, Linux 6.12.40-1-lts,
zfs 2.3.3 on Arch x86_64, glibc 2.42:
Copying a file with sparse holes using "cp --reflink=auto" truncates
the file before the final segment. The relevant strace is:
openat(AT_FDCWD, "celestis.img", O_RDONLY|O_PATH|O_DIRECTORY) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", {st_mode=S_IFREG|0644,>
openat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=137438953472, ...}) = 0
openat(AT_FDCWD, "celestis.img", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4
ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EXDEV (Invalid cross-device link)
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
lseek(3, 0, SEEK_DATA) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
lseek(3, 0, SEEK_HOLE) = 131072
lseek(3, 0, SEEK_SET) = 0
copy_file_range(3, NULL, 4, NULL, 131072, 0) = 131072
lseek(3, 131072, SEEK_DATA) = 1048576
lseek(3, 1048576, SEEK_HOLE) = 1179648
lseek(3, 1048576, SEEK_SET) = 1048576
lseek(4, 917504, SEEK_CUR) = 1048576
copy_file_range(3, NULL, 4, NULL, 131072, 0) = 131072
lseek(3, 1179648, SEEK_DATA) = 4194304
lseek(3, 4194304, SEEK_HOLE) = 16646144
lseek(3, 4194304, SEEK_SET) = 4194304
lseek(4, 3014656, SEEK_CUR) = 4194304
copy_file_range(3, NULL, 4, NULL, 12451840, 0) = 12451840
lseek(3, 16646144, SEEK_DATA) = 134217728
lseek(3, 134217728, SEEK_HOLE) = 137438953472
lseek(3, 134217728, SEEK_SET) = 134217728
lseek(4, 117571584, SEEK_CUR) = 134217728
copy_file_range(3, NULL, 4, NULL, 137304735744, 0) = 137304735744
mmap(NULL, 270336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76c5df0ce000
read(3, "", 262144) = 0
ftruncate(4, 134217728) = 0
close(4) = 0
close(3) = 0
As we can see, there's a hole from 16646144 to 134217728, then data up
to the end at 137438953472 (= the total file size). Both fd are thus
moved to 134217728, and a copy_file_range for the rest of the file is
issued and successful.
However, in the end the file is truncated to the first 128MB... why?
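For readers following the trace, the SEEK_DATA / SEEK_HOLE / copy_file_range pattern visible above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the coreutils implementation: sparse_copy_sketch is a hypothetical name, the destination is assumed to be a freshly created empty file, and error handling is reduced to the bare minimum.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not coreutils code: copy the data extents of
   SRC_FD (SRC_SIZE bytes long) to the empty file DST_FD, skipping holes.
   Returns 0 on success, -1 with errno set on failure.  */
int
sparse_copy_sketch (int src_fd, int dst_fd, off_t src_size)
{
  off_t pos = 0;

  while (pos < src_size)
    {
      /* Find the next data extent at or after POS.  */
      off_t data = lseek (src_fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno == ENXIO)   /* Nothing but a hole up to EOF.  */
            break;
          return -1;
        }
      off_t hole = lseek (src_fd, data, SEEK_HOLE);
      if (hole < 0)
        return -1;

      /* Copy [data, hole); explicit offsets leave the fd positions alone.  */
      off_t in = data, out = data;
      while (in < hole)
        {
          ssize_t n = copy_file_range (src_fd, &in, dst_fd, &out,
                                       hole - in, 0);
          if (n < 0)
            return -1;
          if (n == 0)           /* Source shrank underneath us.  */
            return ftruncate (dst_fd, in);
        }
      pos = hole;
    }

  /* Give the destination its full length, including any trailing hole.  */
  return ftruncate (dst_fd, src_size);
}
```

Note that the sketch only shortens the destination when copy_file_range reports 0 bytes; a bogus negative return would be treated as an error here instead of as a shrunken source.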
For comparison, a plain cat simply does this:
fstat(1, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
openat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=137438953472, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(3, NULL, 1, NULL, 9223372035781033984, 0) = 137438953472
mmap(NULL, 270336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7e7af83a9000
read(3, "", 262144) = 0
munmap(0x7e7af83a9000, 270336) = 0
close(3) = 0
close(1) = 0
close(2) = 0
This works correctly and the source and destination agree in the end.
Likewise for xcp(1), which uses copy_file_range in 1MB blocks by
default and does not care for holes.
Thus I think this is a logic bug in cp and not a ZFS issue.
Do not hesitate to contact me if you require further details.
Thanks,
--
Leah Neukirchen <leah <at> vuxu.org> https://leahneukirchen.org/
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 17:12:01 GMT)
Message #8 received at 79139 <at> debbugs.gnu.org (full text, mbox):
On 01/08/2025 16:00, Leah Neukirchen wrote:
> Hello,
>
> I found the following issue with coreutils 9.7, Linux 6.12.40-1-lts,
> zfs 2.3.3 on Arch x86_64, glibc 2.42:
>
> Copying a file with sparse holes using "cp --reflink=auto" truncates
> the file before the final segment. The relevant strace is:
>
> openat(AT_FDCWD, "celestis.img", O_RDONLY|O_PATH|O_DIRECTORY) = -1 ENOENT (No such file or directory)
> newfstatat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", {st_mode=S_IFREG|0644,>
> openat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=137438953472, ...}) = 0
> openat(AT_FDCWD, "celestis.img", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4
> ioctl(4, BTRFS_IOC_CLONE or FICLONE, 3) = -1 EXDEV (Invalid cross-device link)
> fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> lseek(3, 0, SEEK_DATA) = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> lseek(3, 0, SEEK_HOLE) = 131072
> lseek(3, 0, SEEK_SET) = 0
> copy_file_range(3, NULL, 4, NULL, 131072, 0) = 131072
> lseek(3, 131072, SEEK_DATA) = 1048576
> lseek(3, 1048576, SEEK_HOLE) = 1179648
> lseek(3, 1048576, SEEK_SET) = 1048576
> lseek(4, 917504, SEEK_CUR) = 1048576
> copy_file_range(3, NULL, 4, NULL, 131072, 0) = 131072
> lseek(3, 1179648, SEEK_DATA) = 4194304
> lseek(3, 4194304, SEEK_HOLE) = 16646144
> lseek(3, 4194304, SEEK_SET) = 4194304
> lseek(4, 3014656, SEEK_CUR) = 4194304
> copy_file_range(3, NULL, 4, NULL, 12451840, 0) = 12451840
> lseek(3, 16646144, SEEK_DATA) = 134217728
> lseek(3, 134217728, SEEK_HOLE) = 137438953472
> lseek(3, 134217728, SEEK_SET) = 134217728
> lseek(4, 117571584, SEEK_CUR) = 134217728
> copy_file_range(3, NULL, 4, NULL, 137304735744, 0) = 137304735744
> mmap(NULL, 270336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76c5df0ce000
> read(3, "", 262144) = 0
> ftruncate(4, 134217728) = 0
> close(4) = 0
> close(3) = 0
>
> As we can see, there's a hole from 16646144 to 134217728, then data up
> to the end at 137438953472 (= the total file size). Both fd are thus
> moved to 134217728, and a copy_file_range for the rest of the file is
> issued and successful.
>
> However, in the end the file is truncated to the first 128MB... why?
>
> For comparison, a plain cat simply does this:
>
> fstat(1, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> openat(AT_FDCWD, "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=137438953472, ...}) = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> copy_file_range(3, NULL, 1, NULL, 9223372035781033984, 0) = 137438953472
> mmap(NULL, 270336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7e7af83a9000
> read(3, "", 262144) = 0
> munmap(0x7e7af83a9000, 270336) = 0
> close(3) = 0
> close(1) = 0
> close(2) = 0
>
> This works correctly and the source and destination agree in the end.
> Likewise for xcp(1), which uses copy_file_range in 1MB blocks by
> default and does not care for holes.
>
> Thus I think this is a logic bug in cp and not a ZFS issue.
>
> Do not hesitate to contact me if you require further details.
I haven't tried to repro yet.
The syscalls look OK, so the only thing I can think of
is that the result of that last large copy_file_range() syscall wasn't propagated appropriately back up to cp,
as there is a fallback read() which should not have occurred.
What is the destination file system type?
It would give us a bit more info if you gave the output from cp
when run with the --debug option.
thank you,
Padraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 17:34:02 GMT)
Message #11 received at 79139 <at> debbugs.gnu.org (full text, mbox):
I debugged this further:
The issue boils down to several things that happen rarely:
- source and destination must be on different mountpoints, so FICLONE fails
- the fallback copy_file_range usually copies at most 2GB segments on ZFS,
however it seems to be able to copy more at once when copying from a
snapshot.
The problem now is that the return value is interpreted as a negative
number. It's not clear to me how that happens, as ssize_t should be a
signed 64-bit number and contain the value fine, however, gdb also agrees:
Breakpoint 1, copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744,
flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
27 {
(gdb) fin
Run till exit from #0 copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744,
flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
sparse_copy (src_fd=src_fd@entry=3, dest_fd=dest_fd@entry=4, abuf=abuf@entry=0x7fffffffd9d8, buf_size=buf_size@entry=262144, hole_size=0,
punch_holes=punch_holes@entry=true, allow_reflink=true, src_name=0x7fffffffe3d7 "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img",
dst_name=0x7fffffffe414 "celestis.img", max_n_read=137304735744, total_n_read=0x7fffffffd9e0, last_write_made_hole=0x7fffffffd9d0) at src/copy.c:344
344 if (n_copied == 0)
Value returned is $2 = -134217728
Then the error branch is triggered and the code wrongly reads a stale errno
(18, i.e. EXDEV, left over from the failed FICLONE), so is_CLONENOTSUP is
true, we leave the loop without reporting an error, total_n_read is still 0,
etc... and it ends up truncating the file, thinking the file has
shrunk. Unfortunate.
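The failure mode described here (a negative return value that is not -1, paired with an errno left over from an earlier call) could be made visible with a defensive check along the following lines. This is only a sketch: classify_copy_result and its enum are hypothetical names, not anything in coreutils or Gnulib, and it assumes the caller sets errno to 0 immediately before the call and passes the errno value read right after it.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical classification of a copy_file_range-style result.  */
enum copy_result { COPY_OK, COPY_EOF, COPY_ERROR, COPY_BOGUS };

/* RET is what the call returned, REQUESTED is the length asked for, and
   ERR is errno as read right after the call (assumed zeroed beforehand).  */
enum copy_result
classify_copy_result (ssize_t ret, size_t requested, int err)
{
  if (ret == -1 && err != 0)
    return COPY_ERROR;          /* Genuine failure; ERR is meaningful.  */
  if (ret < 0 || (size_t) ret > requested)
    return COPY_BOGUS;          /* Impossible value; do not trust errno.  */
  return ret == 0 ? COPY_EOF : COPY_OK;
}
```

A check like this only catches values that are impossible on their face. A corrupted result that happens to land between 0 and the requested length would still look valid.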
I think the return value gets corrupted in glibc, see:
https://github.com/bminor/glibc/blob/d9a348d0927c7a1aec5caf3df3fcd36956b3eb23/nptl/cancellation.c#L66
long int
__syscall_cancel (__syscall_arg_t a1, __syscall_arg_t a2,
__syscall_arg_t a3, __syscall_arg_t a4,
__syscall_arg_t a5, __syscall_arg_t a6,
__SYSCALL_CANCEL7_ARG_DEF __syscall_arg_t nr)
{
int r = __internal_syscall_cancel (a1, a2, a3, a4, a5, a6,
__SYSCALL_CANCEL7_ARG nr);
return __glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (r))
? SYSCALL_ERROR_LABEL (INTERNAL_SYSCALL_ERRNO (r))
: r;
}
Here, r should be a long int.
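The arithmetic can be checked by hand. The sketch below keeps only the low 32 bits of a 64-bit value and reinterprets them as a signed number, which is the effect of storing the result in an int on the platform discussed here; truncate_to_int32 is a hypothetical helper written for this illustration only.

```c
#include <stdint.h>

/* Keep only the low 32 bits of V and reinterpret them as a signed
   two's-complement number.  Written with unsigned arithmetic so the
   result does not depend on implementation-defined conversions.  */
int64_t
truncate_to_int32 (int64_t v)
{
  uint32_t low = (uint32_t) v;          /* Reduction modulo 2^32.  */
  return low >= UINT32_C (0x80000000)
         ? (int64_t) low - (INT64_C (1) << 32)
         : (int64_t) low;
}
```

Applied to the lengths in the traces, 137304735744 (0x1FF8000000) maps to -134217728, which matches the value gdb reports, and 137438953472 (2^37, the amount cat's single call copied) maps to 0. The latter might be why cat's comparison against -1 did not trip, though that is an inference and not something confirmed in this report.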
As a workaround, copy_max could be clamped to 2GB.
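A clamp of that kind might look like the sketch below. The names clamp_copy_length, copy_range_clamped and COPY_CHUNK_MAX are hypothetical, and the cap (the largest multiple of 1 MiB that fits in an int) is an assumption chosen for illustration, not a value taken from the coreutils source.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative cap, an assumption of this sketch: the largest multiple
   of 1 MiB that still fits in an int.  */
enum { COPY_CHUNK_MAX = INT_MAX >> 20 << 20 };

/* Limit LENGTH so that a successful return value always fits in an int.  */
size_t
clamp_copy_length (size_t length)
{
  return length > (size_t) COPY_CHUNK_MAX ? (size_t) COPY_CHUNK_MAX : length;
}

/* Copy at most LENGTH bytes between the current offsets of SRC_FD and
   DST_FD.  May copy fewer bytes than requested, as copy_file_range itself
   may; the caller is expected to loop.  Returns -1 with errno set on error.  */
ssize_t
copy_range_clamped (int src_fd, int dst_fd, size_t length)
{
  return copy_file_range (src_fd, NULL, dst_fd, NULL,
                          clamp_copy_length (length), 0);
}
```

Since callers of copy_file_range already have to cope with short copies, capping the request changes only how many iterations a large copy takes.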
P.S.: why does coreutils cat not fail as well? It checks the return
value against -1, which it is not...
--
Leah Neukirchen <leah <at> vuxu.org> https://leahneukirchen.org/
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 21:41:01 GMT)
Message #14 received at 79139 <at> debbugs.gnu.org (full text, mbox):
On 01/08/2025 18:33, Leah Neukirchen wrote:
> I debugged this further:
>
> The issue boils down to several things that happen rarely:
> - source and destination must be on different mountpoints, so FICLONE fails
> - the fallback copy_file_range usually copies at most 2GB segments on ZFS,
> however it seems to be able to copy more at once when copying from a
> snapshot.
>
> The problem now is that the return value is interpreted as a negative
> number. It's not clear to me how that happens, as ssize_t should be a
> signed 64-bit number and contain the value fine, however, gdb also agrees:
>
> Breakpoint 1, copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744,
> flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
> 27 {
> (gdb) fin
> Run till exit from #0 copy_file_range (infd=infd@entry=3, pinoff=pinoff@entry=0x0, outfd=outfd@entry=4, poutoff=poutoff@entry=0x0, length=137304735744,
> flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/copy_file_range.c:27
> sparse_copy (src_fd=src_fd@entry=3, dest_fd=dest_fd@entry=4, abuf=abuf@entry=0x7fffffffd9d8, buf_size=buf_size@entry=262144, hole_size=0,
> punch_holes=punch_holes@entry=true, allow_reflink=true, src_name=0x7fffffffe3d7 "/.zfs/snapshot/pre-fixup/var/lib/libvirt/images/celestis.img",
> dst_name=0x7fffffffe414 "celestis.img", max_n_read=137304735744, total_n_read=0x7fffffffd9e0, last_write_made_hole=0x7fffffffd9d0) at src/copy.c:344
> 344 if (n_copied == 0)
> Value returned is $2 = -134217728
>
> Then the error branch is triggered and the code wrongly reads a stale errno
> (18, i.e. EXDEV, left over from the failed FICLONE), so is_CLONENOTSUP is
> true, we leave the loop without reporting an error, total_n_read is still 0,
> etc... and it ends up truncating the file, thinking the file has
> shrunk. Unfortunate.
>
> I think the return value gets corrupted in glibc, see:
> https://github.com/bminor/glibc/blob/d9a348d0927c7a1aec5caf3df3fcd36956b3eb23/nptl/cancellation.c#L66
>
> long int
> __syscall_cancel (__syscall_arg_t a1, __syscall_arg_t a2,
> __syscall_arg_t a3, __syscall_arg_t a4,
> __syscall_arg_t a5, __syscall_arg_t a6,
> __SYSCALL_CANCEL7_ARG_DEF __syscall_arg_t nr)
> {
> int r = __internal_syscall_cancel (a1, a2, a3, a4, a5, a6,
> __SYSCALL_CANCEL7_ARG nr);
> return __glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (r))
> ? SYSCALL_ERROR_LABEL (INTERNAL_SYSCALL_ERRNO (r))
> : r;
> }
>
> Here, r should be a long int.
>
> As a workaround, copy_max could be clamped to 2GB.
>
> P.S.: why does coreutils cat not fail as well? It checks the return
> value against -1, which it is not...
Ouch. As I suspected, the info doesn't seem to be propagated from the syscall appropriately.
The distinction between -1 and < 0 isn't useful I think since
the value returned could be just truncated to a positive value.
I guess all we can do is limit copy_max to INT_MAX for now.
Could you log this with https://sourceware.org/bugzilla/
and reference the bug number here?
thank you,
Padraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 21:57:02 GMT)
Message #17 received at 79139 <at> debbugs.gnu.org (full text, mbox):
> Could you log this with https://sourceware.org/bugzilla/
> and reference the bug number here?
>
> thank you,
> Padraig
This is https://sourceware.org/bugzilla/show_bug.cgi?id=33245
and a patch is at
https://sourceware.org/pipermail/libc-alpha/2025-August/169096.html
--
Leah Neukirchen <leah <at> vuxu.org> https://leahneukirchen.org/
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 21:58:01 GMT)
Message #20 received at 79139 <at> debbugs.gnu.org (full text, mbox):
On 2025-08-01 14:40, Pádraig Brady wrote:
> Could you log this with https://sourceware.org/bugzilla/
Leah already did that, here:
https://sourceware.org/bugzilla/show_bug.cgi?id=33245
I should have a Gnulib fix shortly.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Fri, 01 Aug 2025 22:06:01 GMT)
Message #23 received at 79139 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2025-08-01 14:40, Pádraig Brady wrote:
>> Could you log this with https://sourceware.org/bugzilla/
>
> Leah already did that, here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=33245
That is an unfortunate bug. Thanks and good catch Leah.
> I should have a Gnulib fix shortly.
Thanks! I was hoping that file could be made a tiny stub, due to the
workarounds for Linux 4.19 being mostly unnecessary now that it is EOL.
But now we have a new problem to deal with. :)
Collin
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Sat, 02 Aug 2025 00:18:02 GMT)
Notification sent
to
Leah Neukirchen <leah <at> vuxu.org>
:
bug acknowledged by developer.
(Sat, 02 Aug 2025 00:18:02 GMT)
Message #28 received at 79139-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2025-08-01 15:05, Collin Funk wrote:
> I was hoping that file could be made a tiny stub, due to the
> workarounds for Linux 4.19 being mostly unnecessary now that it is EOL.
> But now we have a new problem to deal with. :)
That we do. But we can more thoroughly stubify the old Linux kernel bug
workaround while we're in the neighborhood. Probably best not to remove
it entirely as RHEL 8 still uses the no-longer-supported kernel.
To do that, I installed the attached patches into Gnulib and propagated
them into coreutils.
Boldly closing the bug report. Thanks, Leah, for reporting it. That one
was quite a whopper.
[0001-copy-file-range-tune-for-more-modern-kernels.patch (text/x-patch, attachment)]
[0002-copy-file-range-work-around-glibc-bug-33245.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 03:57:01 GMT)
Message #31 received at 79139-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2025-08-01 15:05, Collin Funk wrote:
>> I was hoping that file could be made a tiny stub, due to the
>> workarounds for Linux 4.19 being mostly unnecessary now that it is EOL.
>> But now we have a new problem to deal with. :)
>
> That we do. But we can more thoroughly stubify the old Linux kernel bug
> workaround while we're in the neighborhood. Probably best not to
> remove it entirely as RHEL 8 still uses the no-longer-supported
> kernel.
Good point. I agree.
> +# if defined __GLIBC__ && ! (2 < __GLIBC__ + (43 <= __GLIBC_MINOR__))
> + /* Work around glibc bug 33245
> + <https://sourceware.org/bugzilla/show_bug.cgi?id=33245>.
> + This bug is present in glibc 2.42 (2025) and fixed in 2.43,
> + so this workaround, and the configure-time check for glibc,
> + can be removed once glibc 2.42 and earlier is no longer a
> + consideration. Perhaps in 2040. */
> + if (SYS_BUFSIZE_MAX < length)
> + length = SYS_BUFSIZE_MAX;
> +# endif
Can't we make this condition only occur for glibc 2.41 and glibc 2.42?
The issue shouldn't occur before commit
89b53077d2a58f00e7debdfe58afabe953dac60d in glibc (2024-06-25). Before
that commit SYSCALL_CANCEL was defined as the following:
#define SYSCALL_CANCEL(...) \
({ \
long int sc_ret; \
if (NO_SYSCALL_CANCEL_CHECKING) \
sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); \
else \
{ \
int sc_cancel_oldtype = LIBC_CANCEL_ASYNC (); \
sc_ret = INLINE_SYSCALL_CALL (__VA_ARGS__); \
LIBC_CANCEL_RESET (sc_cancel_oldtype); \
} \
sc_ret; \
})
So it returned 'long int', like the now-fixed version.
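For readers unfamiliar with the idiom in the quoted guard, "2 < __GLIBC__ + (43 <= __GLIBC_MINOR__)" is a compact way of writing "glibc 2.43 or newer", and the workaround is compiled in when that test is false. The function below mirrors the expression with ordinary integers so its truth table can be checked; glibc_at_least_2_43 is a hypothetical name used only for this illustration.

```c
/* Mirror of the preprocessor test `2 < major + (43 <= minor)`.
   The comparison (43 <= minor) contributes 1 or 0, so the sum exceeds 2
   when MAJOR is at least 3, or when MAJOR is 2 and MINOR is at least 43.  */
int
glibc_at_least_2_43 (int major, int minor)
{
  return 2 < major + (43 <= minor);
}
```

With this reading, builds against 2.41 and 2.42 get the clamp, and so do builds against every earlier release, which is the point raised about narrowing the condition.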
Also, I assume this bug will cause problems in any syscall returning
ssize_t (e.g. read, write, send).
Collin
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 04:16:02 GMT)
Message #34 received at 79139-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> +# if defined __GLIBC__ && ! (2 < __GLIBC__ + (43 <= __GLIBC_MINOR__))
This line is mis-indented.
> + /* Work around glibc bug 33245
It would be good to document the workaround in
doc/glibc-functions/copy_file_range.texi.
Collin Funk wrote:
> Can't we make this condition only occur for glibc 2.41 and glibc 2.42?
> The issue shouldn't occur before commit
> 89b53077d2a58f00e7debdfe58afabe953dac60d in glibc (2024-06-25).
Users are supposed to be able to create binaries with an older version of
glibc, then upgrade their glibc. The binaries should continue to work.
Bruno
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 04:40:02 GMT)
Message #37 received at 79139-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Bruno Haible <bruno <at> clisp.org> writes:
>> + /* Work around glibc bug 33245
>
> It would be good to document the workaround in
> doc/glibc-functions/copy_file_range.texi.
Yep, I noticed as well. Just wanted to make sure I wasn't
misunderstanding the versions before doing it myself. Done with the
attached patch now.
> Collin Funk wrote:
>> Can't we make this condition only occur for glibc 2.41 and glibc 2.42?
>> The issue shouldn't occur before commit
>> 89b53077d2a58f00e7debdfe58afabe953dac60d in glibc (2024-06-25).
>
> Users are supposed to be able to create binaries with an older version of
> glibc, then upgrade their glibc. The binaries should continue to work.
Right. I seem to have forgotten that not every program is statically
linked to glibc... Thanks.
Collin
[0001-doc-Mention-the-copy_file_range-bug.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 06:00:02 GMT)
Message #40 received at 79139-done <at> debbugs.gnu.org (full text, mbox):
On 2025-08-01 20:56, Collin Funk wrote:
> Also, I assume this bug will cause problems in any syscall returning
> ssize_t (e.g. read, write, send).
It could well do that, yes. I suspect I haven't run into it because the
programs I help maintain respect SYS_BUFSIZE_MAX in their calls to those
other functions.
For now I suppose we could just document the bug as something Gnulib
doesn't fix for those functions. If we ever run into this being a real
problem, I suppose we could implement Gnulib workarounds though I hope
we don't have to because they'd be a real pain due to EINTR, other errno
values, programs expecting no partial reads on regular files, etc.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 08:33:03 GMT)
Message #43 received at 79139 <at> debbugs.gnu.org (full text, mbox):
On 02/08/2025 05:39, Collin Funk wrote:
> Bruno Haible <bruno <at> clisp.org> writes:
>
>>> + /* Work around glibc bug 33245
>>
>> It would be good to document the workaround in
>> doc/glibc-functions/copy_file_range.texi.
>
> Yep, I noticed as well. Just wanted to make sure I wasn't
> misunderstanding the versions before doing it myself. Done with the
> attached patch now.
>
>> Collin Funk wrote:
>>> Can't we make this condition only occur for glibc 2.41 and glibc 2.42?
>>> The issue shouldn't occur before commit
>>> 89b53077d2a58f00e7debdfe58afabe953dac60d in glibc (2024-06-25).
>>
>> Users are supposed to be able to create binaries with an older version of
>> glibc, then upgrade their glibc. The binaries should continue to work.
>
> Right. I seem to have forgotten that not every program is statically
> linked to glibc... Thanks.
Thanks for the prompt fixes everyone.
I think the current gnulib code is good enough,
but it's worth mentioning run-time vs build-time checks.
For data corruption bugs we should be extra wary.
Consider build hosts with new glibc building binaries
to be run on older glibc (perhaps in containers etc.)
We saw such issues with cp before, with the kernel version check:
https://github.com/coreutils/gnulib/commit/fb034b35eb
Now the kernel binary interface does have more stringent
compat guarantees, than library interfaces like glibc,
so this is less of a concern than for kernel version checks.
For reference I made some notes on various version compat at:
http://pixelbeat/programming/linux_binary_compatibility.html
The thrust of that is that building on older systems
should produce binaries that work on newer ones,
and the current gnulib patch caters for that.
For reference if we did want to be extra defensive
for this silent data corruption bug, I suppose we could
check the glibc version at runtime with something like:
static signed char libc_ok;
if (! libc_ok)
{
#if 2 < __GLIBC__ + (8 <= __GLIBC_MINOR__)
#include <gnu/libc-version.h>
char const * glibc_ver = gnu_get_libc_version();
libc_ok = (strcmp (glibc_ver, "2.41") != 0 && strcmp (glibc_ver, "2.42") != 0)
? 1 : -1;
#else
libc_ok = 1;
#endif
}
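A self-contained variant of that idea, with the include hoisted to file scope, might look like the sketch below. The function names are hypothetical. It compares the version string exactly, as the snippet above does, so it would also flag a 2.41 or 2.42 build that carries a backported fix; treat it as a conservative illustration and not as a proposal for Gnulib.

```c
#include <string.h>
#ifdef __GLIBC__
# include <gnu/libc-version.h>
#endif

/* Return nonzero if VER names a glibc release affected by bug 33245,
   going by the version string alone.  */
int
glibc_version_has_bug_33245 (char const *ver)
{
  return strcmp (ver, "2.41") == 0 || strcmp (ver, "2.42") == 0;
}

/* Return nonzero if the C library in use at run time looks affected.
   The answer is cached: 0 = not yet checked, 1 = affected, -1 = OK.  */
int
running_libc_needs_clamp (void)
{
  static signed char cached;
  if (! cached)
    {
#ifdef __GLIBC__
      cached = glibc_version_has_bug_33245 (gnu_get_libc_version ()) ? 1 : -1;
#else
      cached = -1;              /* Not glibc, so not this bug.  */
#endif
    }
  return cached > 0;
}
```

A caller would consult running_libc_needs_clamp () once and cap the copy_file_range length only when it returns nonzero.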
Other reasons the above might be overkill:
the gnulib workaround isn't too onerous as SYS_BUFSIZE_MAX is large,
and I expect the glibc fix will be backported to glibc 2.41 systems promptly anyway.
cheers,
Padraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 16:04:01 GMT)
Message #46 received at 79139 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2025-08-02 01:32, Pádraig Brady wrote:
> it's worth mentioning run-time vs build-time checks.
Yes, and this could be documented more. I installed the attached.
> For reference I made some notes on various version compat at:
> http://pixelbeat/programming/linux_binary_compatibility.html
I needed to use this URL:
https://www.pixelbeat.org/programming/linux_binary_compatibility.html
> the gnulib workaround isn't too onerous as SYS_BUFSIZE_MAX is large,
> and I expect the glibc fix will be backported to glibc 2.41 systems
> promptly anyway.
Yes, I went through similar thought processes. It didn't seem worth the
hassle to do the extra glibc runtime checks. Gnulib has always used
static checks for glibc versions, even in areas where this is serious
business (e.g., malloc misbehavior). So far, nobody has reported an
issue for this. Maybe people who build for older kernels (which is
dubious if you ask me) aren't building for older glibcs (which is even
more dubious).
[0001-More-copy_file_range-commentary.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 16:48:02 GMT)
Message #49 received at 79139 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2025-08-02 01:32, Pádraig Brady wrote:
>
>> it's worth mentioning run-time vs build-time checks.
>
> Yes, and this could be documented more. I installed the attached.
>
>> For reference I made some notes on various version compat at:
>> http://pixelbeat/programming/linux_binary_compatibility.html
>
> I needed to use this URL:
>
> https://www.pixelbeat.org/programming/linux_binary_compatibility.html
>
>> the gnulib workaround isn't too onerous as SYS_BUFSIZE_MAX is large,
>> and I expect the glibc fix will be backported to glibc 2.41 systems
>> promptly anyway.
>
> Yes, I went through similar thought processes. It didn't seem worth
> the hassle to do the extra glibc runtime checks. Gnulib has always
> used static checks for glibc versions, even in areas where this is
> serious business (e.g., malloc misbehavior). So far, nobody has
> reported an issue for this. Maybe people who build for older kernels
> (which is dubious if you ask me) aren't building for older glibcs
> (which is even more dubious).
>
> [2. text/x-patch; 0001-More-copy_file_range-commentary.patch]...
You can shorten sourceware (and gcc) bug URLs to:
https://sourceware.org/PR123456
https://gcc.gnu.org/PR123456
if that is ever useful. The changes themselves look good. Really, c_f_r
has been an API plagued with problems :(
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 17:32:02 GMT)
Message #52 received at 79139 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2025-08-02 09:47, Sam James wrote:
> You can shorten sourceware (and gcc) bug URLs to:
> https://sourceware.org/PR123456
> https://gcc.gnu.org/PR123456
Thanks, I didn't know that. These URL shorthands are well supported so
let's use them in the code and doc, as that's a readability win. I
installed the attached patches to do that. I didn't alter ChangeLog
partly due to laziness, partly as it's less important there.
[0001-Shorten-glibc-bug-URLs.patch (text/x-patch, attachment)]
[0002-Shorten-GCC-bug-URLs.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#79139
; Package
coreutils
.
(Sat, 02 Aug 2025 19:37:02 GMT)
Message #55 received at 79139 <at> debbugs.gnu.org (full text, mbox):
Sam James <sam <at> gentoo.org> writes:
> if that is ever useful. The changes themselves look good. Really, c_f_r
> has been an API plagued with problems :(
To be fair this is not the fault of copy_file_range itself. Not that it
makes the situation any better. :)
Collin
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.