GNU bug report logs - #79267
cp --sparse=auto heuristic fails on a squashfs mounted drive.
To reply to this bug, email your comments to 79267 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Tue, 19 Aug 2025 02:39:02 GMT)
Acknowledgement sent to Jeremy Allison <jallison <at> ciq.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 19 Aug 2025 02:39:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
It turns out that: lseek(3, 0, SEEK_HOLE) returns end-of-file for a
sparse file copied from a Linux squashfs mounted drive. This breaks
the --sparse=auto heuristic that detects a sparse file.
I have a fix for you to consider.
To reproduce:
First, create a squashfs drive containing a file output_file.bin.
mkdir squashfs-root
cd squashfs-root
Then run the following script mkhole.sh:
--------------------------------------------------------------
#!/bin/bash
OUTPUT="output_file.bin"
# Remove file if it exists
rm -f "$OUTPUT"
# Write 4KB of 'A'
dd if=<(yes A | tr -d '\n' | head -c 4096) of="$OUTPUT" bs=4096 count=1
# Create a 4k*100 hole followed by 4KB of zeros
dd if=/dev/zero of="$OUTPUT" bs=4096 count=1 seek=101
# Write another 4KB of 'A' after the hole (overwriting the 4k of zeros)
dd if=<(yes A | tr -d '\n' | head -c 4096) of="$OUTPUT" bs=4096 count=1 seek=101
--------------------------------------------------------------
Now create the mysquashfs.img file to mount:
cd ..
mksquashfs squashfs-root mysquashfs.img
sudo mount -o loop mysquashfs.img /mnt
Check that /mnt/output_file.bin is sparse:
ls -lh /mnt/output_file.bin
du -sh /mnt/output_file.bin
(the second value should be less).
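The ls/du comparison above can also be done programmatically. The following is a minimal sketch, not part of the original report: it compares a file's apparent size with the space allocated to it, assuming st_blocks is counted in 512-byte units as on Linux. The function names are hypothetical, and on compressing filesystems a smaller allocation does not necessarily mean holes.

```python
import os

def allocated_bytes(st_blocks):
    """Convert st_blocks (assumed to be 512-byte units, as on Linux) to bytes."""
    return st_blocks * 512

def is_sparse(size, st_blocks):
    """True when fewer bytes are allocated than the apparent file size."""
    return allocated_bytes(st_blocks) < size

def looks_sparse(path):
    """Apply the same check that comparing 'ls -l' with 'du' performs by eye."""
    st = os.stat(path)
    return is_sparse(st.st_size, st.st_blocks)
```

For example, looks_sparse("/mnt/output_file.bin") would be expected to return True for the file created above, provided the filesystem reports its allocation the way the du check assumes.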
Now use a newly built cp command from coreutils to copy this file to a
local filesystem.
mkdir ~/tmp
cd ~/tmp
~/src/coreutils/src/cp --reflink=never /mnt/output_file.bin nonsparse
Even though --sparse=auto is the default and the file is sparse, it is
not detected as such. This can be confirmed by running:
strace ~/src/coreutils/src/cp --reflink=never /mnt/output_file.bin nonsparse
and you will see:
lseek(3, 0, SEEK_DATA) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
lseek(3, 0, SEEK_HOLE) = 417792
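The pattern in this trace (data at offset 0, first hole only at end-of-file) is what a hole probe returns when the filesystem reports no holes. A minimal sketch of such a probe follows; it is an illustration, not the coreutils code, and the function names are hypothetical. It assumes a platform where Python exposes os.SEEK_HOLE.

```python
import errno
import os

def probe_first_hole(fd):
    """Return (first_hole_offset, file_size) for an open file descriptor."""
    size = os.fstat(fd).st_size
    try:
        hole = os.lseek(fd, 0, os.SEEK_HOLE)
    except OSError as e:
        if e.errno != errno.ENXIO:
            raise
        hole = size  # offset 0 is already at or past EOF (empty file)
    os.lseek(fd, 0, os.SEEK_SET)  # restore the position for later reads
    return hole, size

def probe_is_inconclusive(first_hole, size):
    """A first hole at EOF tells us nothing: either the file has no holes,
    or the filesystem does not report them."""
    return first_hole >= size
```

With the values from the trace, probe_is_inconclusive(417792, 417792) is True, which is why a heuristic relying on SEEK_HOLE alone treats the file as non-sparse.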
The following diff fixes this for me, and still passes "make check".
diff --git a/src/copy.c b/src/copy.c
index 77f0c561e..91136cd7c 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -592,7 +592,7 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
else if (sparse_mode != SPARSE_NEVER)
{
if (! create_hole (dest_fd, dst_name,
- sparse_mode == SPARSE_ALWAYS,
+ sparse_mode != SPARSE_NEVER,
ext_hole_size))
return false;
}
@@ -621,7 +621,7 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
if ( ! sparse_copy (src_fd, dest_fd, abuf, buf_size,
true, allow_reflink, src_name, dst_name,
ext_len,
- sparse_mode == SPARSE_ALWAYS ? hole_size : nullptr,
+ sparse_mode != SPARSE_NEVER ? hole_size : nullptr,
&n_read))
return false;
@@ -1576,7 +1576,7 @@ copy_reg (char const *src_name, char const *dst_name,
:
#endif
sparse_copy (source_desc, dest_desc, &buf, buf_size,
- x->sparse_mode == SPARSE_ALWAYS,
+ x->sparse_mode != SPARSE_NEVER,
x->reflink_mode != REFLINK_NEVER,
src_name, dst_name, UINTMAX_MAX,
make_holes ? &hole_size : nullptr, &n_read)))
Thanks !
Jeremy Allison,
CIQ Inc.
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Thu, 21 Aug 2025 23:04:02 GMT)
Message #8 received at 79267 <at> debbugs.gnu.org (full text, mbox):
Thanks for the bug report. Although this part of the code is messy and
needs a revamp, in the meantime I installed the attached into the master
branch on Savannah; please give it a try.
[0001-cp-always-punch-holes-that-we-make.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Thu, 21 Aug 2025 23:24:02 GMT)
Message #11 received at 79267 <at> debbugs.gnu.org (full text, mbox):
Yes - that seems to fix the problem ! Thanks Paul.
On Thu, Aug 21, 2025 at 4:03 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> Thanks for the bug report. Although this part of the code is messy and
> needs a revamp, in the meantime I installed the attached into the master
> branch on Savannah; please give it a try.
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Thu, 21 Aug 2025 23:25:02 GMT)
Notification sent to Jeremy Allison <jallison <at> ciq.com>: bug acknowledged by developer. (Thu, 21 Aug 2025 23:25:02 GMT)
Message #16 received at 79267-done <at> debbugs.gnu.org (full text, mbox):
Thanks for checking; closing the bug report.
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Fri, 22 Aug 2025 13:05:01 GMT)
Message #19 received at 79267 <at> debbugs.gnu.org (full text, mbox):
A question about this hunk:
@@ -619,9 +615,9 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
is conservative and may miss some holes. */
off_t n_read;
if ( ! sparse_copy (src_fd, dest_fd, abuf, buf_size,
- true, allow_reflink, src_name, dst_name,
+ allow_reflink, src_name, dst_name,
ext_len,
- sparse_mode == SPARSE_ALWAYS ? hole_size : nullptr,
+ sparse_mode != SPARSE_NEVER ? hole_size : nullptr,
&n_read))
return false;
The comment above that is:
/* Copy this extent, looking for further opportunities to not
bother to write zeros if --sparse=always, since SEEK_HOLE
is conservative and may miss some holes. */
So the comment needs to be tweaked, but a more general issue
is that it disables copy offloading (copy_file_range) for sparse files.
I.e. it undoes https://github.com/coreutils/coreutils/commit/879d2180d
BTW commit 26bf557 also changed this a couple of weeks ago
without updating the comment, so the comment relates to sparse_mode != SPARSE_ALWAYS.
If we do decide to change behavior here it should be documented in NEWS,
but I don't think this is the right compromise.
If we can't handle all cases optimally, I'd be inclined to err on being
as performant as possible by default, and only try harder to look for holes
with --sparse=always. squashfs is giving the wrong info here after all, right?
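To illustrate why scanning for zeros and copy offload are at odds: with copy_file_range the kernel moves the bytes itself, so user space never sees the data and cannot inspect it for zero runs. Below is a minimal sketch of an offloaded copy with a read/write fallback. It is not the coreutils implementation; the function name and chunk size are hypothetical, and os.copy_file_range is assumed to be available (Linux, Python 3.8 or later), otherwise only the fallback path runs.

```python
import os

def copy_with_offload(src_fd, dst_fd, chunk=1 << 20):
    """Copy from src_fd to dst_fd, preferring kernel-side copy offload."""
    total = 0
    use_offload = hasattr(os, "copy_file_range")
    while True:
        if use_offload:
            try:
                # The kernel copies the bytes; they never enter this process.
                n = os.copy_file_range(src_fd, dst_fd, chunk)
            except OSError:
                # Unsupported here: continue with plain read/write from the
                # current file positions.
                use_offload = False
                continue
        else:
            buf = os.read(src_fd, chunk)
            n = len(buf)
            view = memoryview(buf)
            while view:
                view = view[os.write(dst_fd, view):]
        if n == 0:
            break
        total += n
    return total
```

In the fallback branch the data passes through buf and could be checked for zeros; in the offload branch there is nothing to check, which is the trade-off discussed above.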
cheers,
Padraig.
Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Aug 2025 17:36:01 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 00:49:02 GMT)
Message #24 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 8/22/25 07:04, Pádraig Brady wrote:
> /* Copy this extent, looking for further opportunities to not
> bother to write zeros if --sparse=always, since SEEK_HOLE
> is conservative and may miss some holes. */
>
> So the comment needs to be tweaked, but a more general issue
> is that it disables copy offloading (copy_file_range) for sparse files.
Ouch, I didn't see that. That's a real loss. I installed the first
attached patch to revert that part of my recent change.
I assume the part of the change that always punches holes is OK. I
couldn't see why one would not want to punch a hole if one has already
taken the trouble to find and create the hole.
> BTW commit 26bf557 also changed this a couple of weeks ago
> without updating the comment, so the comment relates to sparse_mode !=
> SPARSE_ALWAYS.
Not quite following but I hope the comment is OK now with the first
patch installed.
> squashfs is giving the wrong info here after all,
> right?
Yes, that's the actual space-performance bug here. I installed the
second attached patch to try to work around it.
Jeremy, can you please try these two further patches? Thanks.
[0001-cp-go-back-to-copy_file_range-optimization.patch (text/x-patch, attachment)]
[0002-cp-improve-hole-handling-on-squashfs.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 04:33:02 GMT)
Message #27 received at submit <at> debbugs.gnu.org (full text, mbox):
Jeremy Allison wrote:
> It turns out that: lseek(3, 0, SEEK_HOLE) returns end-of-file for a
> sparse file copied from a Linux squashfs mounted drive. This breaks
> the --sparse=auto heuristic that detects a sparse file.
This happens because Squashfs supports sparse files but has never
implemented SEEK_HOLE/SEEK_DATA, forcing applications to do their own
hole discovery. This was done for the following reason.
Squashfs supports sparse holes at the granularity of the block, but
the block size in Squashfs is by default 128 Kbytes (and can be up to
1 Mbyte). In contrast most Linux filesystems use 4K block sizes.
This means any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
behave like other Linux filesystems, because it won't report sparseness
at the 4K granularity that most people or programs will expect it to.
As a result, a program may miss holes that exist in the file.
I have always considered it better not to support something than to
support it in a way that people won't expect it to behave: the
principle of least surprise.
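The granularity point can be made concrete with the reproducer's file layout (4 KiB of 'A', a 400 KiB hole, 4 KiB of 'A'). The sketch below is a simplified model, not Squashfs code: it treats a block as a hole only if every byte in it is zero, and counts how many hole bytes are visible at a given block size. The function name is hypothetical.

```python
def hole_bytes(data, block_size):
    """Count bytes lying in blocks that are entirely zero at this granularity."""
    total = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if not any(block):  # every byte in the block is zero
            total += len(block)
    return total

# Layout of output_file.bin from the reproducer: 417792 bytes in total.
layout = b"A" * 4096 + bytes(100 * 4096) + b"A" * 4096

at_4k = hole_bytes(layout, 4096)          # 409600: the whole hole is visible
at_128k = hole_bytes(layout, 128 * 1024)  # 262144: only two full blocks qualify
```

In this model a 128 Kbyte block size still exposes a bit more than half of the hole. The first and last blocks each contain 4 KiB of data, so the zeros sharing those blocks are not reported.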
> lseek(3, 0, SEEK_DATA) = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> lseek(3, 0, SEEK_HOLE) = 417792
This is the behaviour of the default llseek() implementation in the
Linux kernel VFS when doing an lseek SEEK_HOLE. It seeks to a virtual
hole at the end of the file.
See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c#n102
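As a rough model of that default behaviour (a sketch for illustration, not a transcription of the kernel source): the whole file is treated as data, with a single virtual hole at end-of-file. The constants below use the Linux values for SEEK_DATA and SEEK_HOLE; the function name is hypothetical.

```python
import errno

SEEK_DATA = 3  # Linux value
SEEK_HOLE = 4  # Linux value

def default_llseek(offset, whence, size):
    """Model a filesystem with no hole support: all data, one hole at EOF."""
    if offset < 0 or offset >= size:
        raise OSError(errno.ENXIO, "offset is at or beyond end of file")
    if whence == SEEK_DATA:
        return offset  # every offset inside the file counts as data
    if whence == SEEK_HOLE:
        return size    # the only hole is the virtual one at EOF
    raise ValueError("this model covers only SEEK_DATA and SEEK_HOLE")
```

With size 417792, default_llseek(0, SEEK_DATA, 417792) returns 0 and default_llseek(0, SEEK_HOLE, 417792) returns 417792, matching the two lseek results in the strace output quoted above.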
I am not subscribed to this email list, and so please CC me on replies.
Thanks
Phillip
---
Squashfs author and maintainer
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 04:33:02 GMT)
Message #30 received at submit <at> debbugs.gnu.org (full text, mbox):
Padraig Brady wrote:
> with --sparse=always. squashfs is giving the wrong info here after
> all, right?
No, Squashfs is not giving the wrong information here.
Support for SEEK_HOLE/SEEK_DATA is not mandated, and no Linux filesystem
is required to support it.
What you are seeing here is the default Linux VFS behaviour.
See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c#n102
Phillip
---
Squashfs author and maintainer
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 16:47:02 GMT)
Message #33 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 2025-08-22 19:11, Phillip Lougher wrote:
> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
> behave like other Linux filesystems, because it won't report sparseness
> at the 4K granularity that most people or programs will expect it to.
Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
and I don't know of any programs that do expect it. I fail to see why
squashfs should penalize the performance of core programs like 'cp'
merely because some (which?) programs are poorly written.
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:02:01 GMT)
Message #36 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 23/08/2025 01:48, Paul Eggert wrote:
> On 8/22/25 07:04, Pádraig Brady wrote:
>> /* Copy this extent, looking for further opportunities to not
>> bother to write zeros if --sparse=always, since SEEK_HOLE
>> is conservative and may miss some holes. */
>>
>> So the comment needs to be tweaked, but a more general issue
>> is that it disables copy offloading (copy_file_range) for sparse files.
>
> Ouch, I didn't see that. That's a real loss. I installed the first
> attached patch to revert that part of my recent change.
Cool. I'll push the attached test to enforce this.
> I assume the part of the change that always punches holes is OK. I
> couldn't see why one would not want to punch a hole if one has already
> taken the trouble to find and create the hole.
Fair enough. It simplifies the code anyway.
>> BTW commit 26bf557 also changed this a couple of weeks ago
>> without updating the comment, so the comment relates to sparse_mode !=
>> SPARSE_ALWAYS.
>
> Not quite following but I hope the comment is OK now with the first
> patch installed.
Right, I missed that commit 26bf557 didn't actually change the logic
for this line, only the syntax.
thanks!
Padraig
[cp-copy-offload-nfs-test.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:28:01 GMT)
Message #39 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 23/08/2025 17:46, Paul Eggert wrote:
> On 2025-08-22 19:11, Phillip Lougher wrote:
>> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
>> behave like other Linux filesystems, because it won't report sparseness
>> at the 4K granularity that most people or programs will expect it to.
>
> Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
> and I don't know of any programs that do expect it. I fail to see why
> squashfs should penalize the performance of core programs like 'cp'
> merely because some (which?) programs are poorly written.
Yeah let's take the attitude everyone writes well written programs, and
if they don't it's their fault when they unexpectedly break in
production. In reality a lot of code in embedded Linux systems is
dreadful, written by inexperienced programmers.
Anyway, my email was pointing out that the SEEK_HOLE behaviour being
complained about is the default Linux VFS behaviour. So this isn't
about Squashfs.
Go into the Linux kernel fs directory and run the following script.
yes=""; no=""
for i in *; do
  if [ -d "$i" ]; then
    # Count the lines mentioning SEEK_HOLE in this directory's C sources
    l=$(find "$i" -name "*.[ch]" | xargs grep SEEK_HOLE | wc -l)
    if [ "$l" -ne 0 ]; then yes+=" $i"; else no+=" $i"; fi
  fi
done
yc=$(echo $yes | wc -w); nc=$(echo $no | wc -w)
echo -e "\nSEEK_HOLE=$yc" $yes; echo -e "\nNo SEEK_HOLE=$nc" $no
SEEK_HOLE=18 bcachefs btrfs cachefiles ceph erofs ext4 f2fs fuse gfs2
hpfs iomap nfs nfsd ocfs2 orangefs overlayfs smb xfs
No SEEK_HOLE=60 9p adfs affs afs autofs befs bfs coda configfs cramfs
crypto debugfs devpts dlm ecryptfs efivarfs efs exfat exportfs ext2 fat
freevxfs hfs hfsplus hostfs hugetlbfs isofs jbd2 jffs2 jfs kernfs lockd
minix netfs nfs_common nilfs2 nls notify ntfs3 omfs openpromfs proc
pstore qnx4 qnx6 quota ramfs resctrl romfs squashfs sysfs tests tracefs
ubifs udf ufs unicode vboxsf verity zonefs
This picks up some false positives, but there are many more filesystems
that don't support SEEK_HOLE than filesystems that do. If you think this
is a problem with the Linux kernel then by all means raise it on the
Linux kernel mailing list.
Cheers
Phillip
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:47:01 GMT)
Message #42 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 23/08/2025 17:46, Paul Eggert wrote:
> On 2025-08-22 19:11, Phillip Lougher wrote:
>> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
>> behave like other Linux filesystems, because it won't report sparseness
>> at the 4K granularity that most people or programs will expect it to.
>
> Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
> and I don't know of any programs that do expect it. I fail to see why
> squashfs should penalize the performance of core programs like 'cp'
> merely because some (which?) programs are poorly written.
As far as Squashfs is concerned SEEK_HOLE/SEEK_DATA is easy to
implement. So I'll think about adding it as a build option.
But this isn't going to fix it for any other case.
Phillip
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 19:02:01 GMT)
Message #45 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 2025-08-23 11:27, Phillip Lougher wrote:
> Yeah let's take the attitude everyone writes well written programs, and if they don't it's their fault when they unexpectedly break in production. In reality a lot of code in embedded Linux systems is dreadful, written by inexperienced programmers.
Inexperienced programmers don't use SEEK_HOLE or SEEK_DATA.
Several commonly-used programs would benefit from proper support for
SEEK_HOLE and SEEK_DATA. What programs would be hurt? If nobody knows of
such programs, we should be skeptical of the argument that support would
be dangerous. To be honest I can't think of why it would hurt in
practical programs.
> Anyway, my email was pointing out that the SEEK_HOLE behaviour being
> complained about is the default Linux VFS behaviour. So this isn't
> about Squashfs.
Yes it is. The default behavior is appropriate for simple file systems
that lack extents. However, Squashfs is not such a file system. If
Squashfs has extents but does not expose them to user code, user code
can be waaaaaayy less efficient.
And this isn't merely an efficiency issue. It can be a security issue,
as sparse files can be used for denial-of-service attacks. I assume
efficiency and security are of concern to Squashfs users, which is why I
press this point.
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Sat, 23 Aug 2025 19:04:01 GMT)
Message #48 received at 79267 <at> debbugs.gnu.org (full text, mbox):
On 2025-08-23 11:46, Phillip Lougher wrote:
> As far as Squashfs is concerned SEEK_HOLE/SEEK_DATA is easy to
> implement. So I'll think about adding it as a build option.
Thanks, that'll be helpful.
> But this isn't going to fix it for any other case.
Right, and bleeding-edge coreutils already has a (slowish) workaround
for Squashfs as-is, as well as for other file systems that don't expose
extents to user code. If I get around to it I will install similar
workarounds in other user code I help maintain.
Information forwarded to bug-coreutils <at> gnu.org: bug#79267; Package coreutils. (Mon, 25 Aug 2025 16:28:02 GMT)
Message #51 received at 79267 <at> debbugs.gnu.org (full text, mbox):
Hi Paul,
I tested with the code currently in the master branch in coreutils;
the top-of-tree commit is 4bfcf62f74b38d762ee06ceef582c326023635a9.
This current code still fixes my test case, thanks!
For full reference, it contains the 3 relevant patches:
commit 6c668dc133af7d374790c1da666a701e21682a35
Author: Pádraig Brady <P <at> draigBrady.com>
Date: Sat Aug 23 18:53:17 2025 +0100
tests: cp: ensure copy offload is not disabled for sparse files
Related to commits v9.1-109-g879d2180d and v9.7-248-g306de6c26
* tests/cp/sparse-perf.sh: This edge case was missed a couple of times,
so add a test to ensure we attempt copy offload.
commit 39f22fe687ea0c226e3fb35e86cd5ea329180b80
Author: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Fri Aug 22 17:34:04 2025 -0700
cp: improve hole handling on squashfs
Better fix for problem reported by Jeremy Allison
<https://bugs.gnu.org/79267>.
* src/copy.c (struct scan_inference): New type, replacing
union scan_inference. All uses changed. This is so
infer_scantype can report the first hole's offset when known.
(lseek_copy): 5th arg is now struct scan_inference const *,
not just off_t. All uses changed.
(infer_scantype): If SEEK_SET+SEEK_HOLE do not find a hole,
fall back on ZERO_SCANTYPE.
commit 306de6c2619e2a9339ade9a88d55c4940942d516
Author: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Fri Aug 22 10:37:50 2025 -0700
cp: go back to copy_file_range optimization
This reverts part of the previous change.
* src/copy.c (lseek_copy): When calling sparse_copy, do not
ask it to scan for zeros unless --sparse=always, so that it
can use copy_file_range which can be far more efficient.
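The fallback named ZERO_SCANTYPE in the commit log above amounts to reading the data and looking for zeros directly when the filesystem reports no holes. The following is only a sketch of that general idea, not the code in src/copy.c. The function name and block size are hypothetical, and the destination is assumed to be a newly created, empty file.

```python
import os

def copy_skipping_zeros(src_fd, dst_fd, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    offset = 0
    while True:
        block = os.read(src_fd, block_size)
        if not block:
            break
        if any(block):  # the block holds some non-zero byte: write it
            os.lseek(dst_fd, offset, os.SEEK_SET)
            view = memoryview(block)
            while view:
                view = view[os.write(dst_fd, view):]
        offset += len(block)
    # Set the final length; this also covers a file that ends in skipped zeros.
    os.ftruncate(dst_fd, offset)
    return offset
```

Every byte of the source is read, which is why this kind of workaround is slower than relying on SEEK_HOLE or on copy offload. Whether the skipped ranges end up as real holes depends on the destination filesystem.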
Cheers,
Jeremy.
On Fri, Aug 22, 2025 at 5:48 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> On 8/22/25 07:04, Pádraig Brady wrote:
> > /* Copy this extent, looking for further opportunities to not
> > bother to write zeros if --sparse=always, since SEEK_HOLE
> > is conservative and may miss some holes. */
> >
> > So the comment needs to be tweaked, but a more general issue
> > is that it disables copy offloading (copy_file_range) for sparse files.
>
> Ouch, I didn't see that. That's a real loss. I installed the first
> attached patch to revert that part of my recent change.
>
> I assume the part of the change that always punches holes is OK. I
> couldn't see why one would not want to punch a hole if one has already
> taken the trouble to find and create the hole.
>
> > BTW commit 26bf557 also changed this a couple of weeks ago
> > without updating the comment, so the comment relates to sparse_mode !=
> > SPARSE_ALWAYS.
>
> Not quite following but I hope the comment is OK now with the first
> patch installed.
>
> > squashfs is giving the wrong info here after all,
> > right?
>
> Yes, that's the actual space-performance bug here. I installed the
> second attached patch to try to work around it.
>
> Jeremy, can you please try these two further patches? Thanks.
This bug report was last modified 17 days ago.