GNU bug report logs - #79267
cp --sparse=auto heuristic fails on a squashfs mounted drive.


Package: coreutils

Reported by: Jeremy Allison <jallison <at> ciq.com>

Date: Tue, 19 Aug 2025 02:39:02 UTC

Severity: normal

To reply to this bug, email your comments to 79267 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Tue, 19 Aug 2025 02:39:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jeremy Allison <jallison <at> ciq.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 19 Aug 2025 02:39:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jeremy Allison <jallison <at> ciq.com>
To: bug-coreutils <at> gnu.org, Jeremy Allison <jallison <at> ciq.com>, 
 Howard Van Der Wal <hvanderwal <at> ciq.com>
Subject: cp --sparse=auto heuristic fails on a squashfs mounted drive.
Date: Mon, 18 Aug 2025 14:25:32 -0700
It turns out that: lseek(3, 0, SEEK_HOLE) returns end-of-file for a
sparse file copied from a Linux squashfs mounted drive. This breaks
the --sparse=auto heuristic that detects a sparse file.

I have a fix for you to consider.

To reproduce:

First, create a squashfs drive containing a file output_file.bin.

mkdir squashfs-root
cd squashfs-root

Then run the following script mkhole.sh:

--------------------------------------------------------------
#!/bin/bash
OUTPUT="output_file.bin"

# Remove file if it exists
rm -f "$OUTPUT"

# Write 4KB of 'A'
dd if=<(yes A | tr -d '\n' | head -c 4096) of="$OUTPUT" bs=4096 count=1

# Create a 4k*100 hole followed by 4KB of zeros
dd if=/dev/zero of="$OUTPUT" bs=4096 count=1 seek=101

# Write another 4KB of 'A' after the hole (overwriting the 4k of zeros)
dd if=<(yes A | tr -d '\n' | head -c 4096) of="$OUTPUT" bs=4096 count=1 seek=101
--------------------------------------------------------------

Now create the mysquashfs.img file to mount:

cd ..
mksquashfs squashfs-root mysquashfs.img
sudo mount -o loop  mysquashfs.img /mnt

Check that /mnt/output_file.bin is sparse:

ls -lh /mnt/output_file.bin
du -sh /mnt/output_file.bin

(the size reported by du should be smaller than the size reported by ls).
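
The same ls-versus-du comparison can be made programmatically. This is a minimal sketch, assuming the Linux convention that st_blocks counts 512-byte units; blocks_indicate_sparse and looks_sparse are hypothetical helper names, not coreutils functions.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: true if BLOCKS 512-byte units cover fewer bytes
   than the apparent SIZE, i.e. the file looks sparse.  */
bool
blocks_indicate_sparse (long long blocks, long long size)
{
  return blocks * 512 < size;
}

/* Hypothetical helper: apply the check above to the file at PATH.
   Returns false if the file cannot be examined.  */
bool
looks_sparse (char const *path)
{
  struct stat st;
  if (stat (path, &st) != 0)
    return false;
  /* Assumption: st_blocks is in 512-byte units, as on Linux.  */
  return blocks_indicate_sparse ((long long) st.st_blocks,
                                 (long long) st.st_size);
}
```

This is only a heuristic: it says that some space is unallocated, not where the holes are.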

Now use a newly built cp command from coreutils to copy this file to a
local filesystem.

mkdir ~/tmp
cd ~/tmp
~/src/coreutils/src/cp --reflink=never /mnt/output_file.bin nonsparse

Even though --sparse=auto is the default and the file is sparse, it is
not detected as such. This can be confirmed by running:

strace ~/src/coreutils/src/cp --reflink=never /mnt/output_file.bin nonsparse

and you will see:

lseek(3, 0, SEEK_DATA)                  = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
lseek(3, 0, SEEK_HOLE)                  = 417792
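
The value 417792 is the file size (102 * 4096), so the only "hole" found is the implicit one at end-of-file. The probe below is a rough sketch of that check, not the code in src/copy.c; reports_hole_before_eof is a hypothetical name, and the fallback SEEK_DATA/SEEK_HOLE values are the Linux ones, used only if the headers do not provide them.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE  /* glibc exposes SEEK_DATA/SEEK_HOLE under this macro */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: Linux values, used only when the headers lack them.  */
#ifndef SEEK_DATA
# define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
# define SEEK_HOLE 4
#endif

/* Hypothetical helper.  Return 1 if SEEK_HOLE finds a hole before EOF,
   0 if the first hole is the implicit one at EOF (or the file is empty),
   and -1 on error.  Note that lseek moves the file offset of FD.  */
int
reports_hole_before_eof (int fd)
{
  struct stat st;
  if (fstat (fd, &st) != 0)
    return -1;
  off_t hole = lseek (fd, 0, SEEK_HOLE);
  if (hole < 0)
    return errno == ENXIO ? 0 : -1;  /* ENXIO: offset is at or past EOF */
  return hole < st.st_size;
}
```

A result of 0 cannot distinguish a file without holes from a filesystem that does not report them, which is the situation in the trace above.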

The following diff fixes this for me, and still passes "make check".

diff --git a/src/copy.c b/src/copy.c
index 77f0c561e..91136cd7c 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -592,7 +592,7 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
           else if (sparse_mode != SPARSE_NEVER)
             {
               if (! create_hole (dest_fd, dst_name,
-                                 sparse_mode == SPARSE_ALWAYS,
+                                 sparse_mode != SPARSE_NEVER,
                                  ext_hole_size))
                 return false;
             }
@@ -621,7 +621,7 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
       if ( ! sparse_copy (src_fd, dest_fd, abuf, buf_size,
                           true, allow_reflink, src_name, dst_name,
                           ext_len,
-                          sparse_mode == SPARSE_ALWAYS ? hole_size : nullptr,
+                          sparse_mode != SPARSE_NEVER ? hole_size : nullptr,
                           &n_read))
         return false;

@@ -1576,7 +1576,7 @@ copy_reg (char const *src_name, char const *dst_name,
              :
 #endif
                sparse_copy (source_desc, dest_desc, &buf, buf_size,
-                            x->sparse_mode == SPARSE_ALWAYS,
+                            x->sparse_mode != SPARSE_NEVER,
                             x->reflink_mode != REFLINK_NEVER,
                             src_name, dst_name, UINTMAX_MAX,
                             make_holes ? &hole_size : nullptr, &n_read)))

Thanks !

Jeremy Allison,
CIQ Inc.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Thu, 21 Aug 2025 23:04:02 GMT) Full text and rfc822 format available.

Message #8 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jeremy Allison <jallison <at> ciq.com>
Cc: 79267 <at> debbugs.gnu.org, hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Thu, 21 Aug 2025 17:03:32 -0600
Thanks for the bug report. Although this part of the code is messy and 
needs a revamp, in the meantime I installed the attached into the master 
branch on Savannah; please give it a try.
[0001-cp-always-punch-holes-that-we-make.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Thu, 21 Aug 2025 23:24:02 GMT) Full text and rfc822 format available.

Message #11 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Jeremy Allison <jallison <at> ciq.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 79267 <at> debbugs.gnu.org, hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Thu, 21 Aug 2025 16:22:51 -0700
Yes - that seems to fix the problem ! Thanks Paul.

On Thu, Aug 21, 2025 at 4:03 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> Thanks for the bug report. Although this part of the code is messy and
> needs a revamp, in the meantime I installed the attached into the master
> branch on Savannah; please give it a try.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 21 Aug 2025 23:25:02 GMT) Full text and rfc822 format available.

Notification sent to Jeremy Allison <jallison <at> ciq.com>:
bug acknowledged by developer. (Thu, 21 Aug 2025 23:25:02 GMT) Full text and rfc822 format available.

Message #16 received at 79267-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jeremy Allison <jallison <at> ciq.com>
Cc: 79267-done <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Thu, 21 Aug 2025 17:24:16 -0600
Thanks for checking; closing the bug report.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Fri, 22 Aug 2025 13:05:01 GMT) Full text and rfc822 format available.

Message #19 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Jeremy Allison <jallison <at> ciq.com>
Cc: 79267 <at> debbugs.gnu.org, hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Fri, 22 Aug 2025 14:04:19 +0100
A question about this hunk:

@@ -619,9 +615,9 @@ lseek_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
          is conservative and may miss some holes.  */
       off_t n_read;
       if ( ! sparse_copy (src_fd, dest_fd, abuf, buf_size,
-                          true, allow_reflink, src_name, dst_name,
+                          allow_reflink, src_name, dst_name,
                           ext_len,
-                          sparse_mode == SPARSE_ALWAYS ? hole_size : nullptr,
+                          sparse_mode != SPARSE_NEVER ? hole_size : nullptr,
                           &n_read))
         return false;


The comment above that is:

      /* Copy this extent, looking for further opportunities to not
         bother to write zeros if --sparse=always, since SEEK_HOLE
         is conservative and may miss some holes.  */

So the comment needs to be tweaked, but a more general issue
is that it disables copy offloading (copy_file_range) for sparse files.
I.e. it undoes https://github.com/coreutils/coreutils/commit/879d2180d
BTW commit 26bf557 also changed this a couple of weeks ago
without updating the comment, so the comment relates to sparse_mode != SPARSE_ALWAYS.

If we do decide to change behavior here it should be documented in NEWS,
but I don't think this is the right compromise.

If we can't handle all cases optimally, I'd be inclined to err on being
as performant as possible by default, and only try harder to look for holes
with --sparse=always. squashfs is giving the wrong info here after all, right?

cheers,
Padraig.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Aug 2025 17:36:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 00:49:02 GMT) Full text and rfc822 format available.

Message #24 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 Jeremy Allison <jallison <at> ciq.com>
Cc: 79267 <at> debbugs.gnu.org, hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Fri, 22 Aug 2025 18:48:31 -0600
On 8/22/25 07:04, Pádraig Brady wrote:
>        /* Copy this extent, looking for further opportunities to not
>           bother to write zeros if --sparse=always, since SEEK_HOLE
>           is conservative and may miss some holes.  */
> 
> So the comment needs to be tweaked, but a more general issue
> is that it disables copy offloading (copy_file_range) for sparse files.

Ouch, I didn't see that. That's a real loss. I installed the first 
attached patch to revert that part of my recent change.

I assume the part of the change that always punches holes is OK. I 
couldn't see why one would not want to punch a hole if one has already 
taken the trouble to find and create the hole.

> BTW commit 26bf557 also changed this a couple of weeks ago
> without updating the comment, so the comment relates to sparse_mode != 
> SPARSE_ALWAYS.

Not quite following but I hope the comment is OK now with the first 
patch installed.

> squashfs is giving the wrong info here after all, 
> right?

Yes, that's the actual space-performance bug here. I installed the 
second attached patch to try to work around it.

Jeremy, can you please try these two further patches? Thanks.
[0001-cp-go-back-to-copy_file_range-optimization.patch (text/x-patch, attachment)]
[0002-cp-improve-hole-handling-on-squashfs.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 04:33:02 GMT) Full text and rfc822 format available.

Message #27 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Phillip Lougher <phillip <at> squashfs.org.uk>
To: jallison <at> ciq.com
Cc: bug-coreutils <at> gnu.org
Subject: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 03:11:51 +0100
Jeremy Allison wrote:

> It turns out that: lseek(3, 0, SEEK_HOLE) returns end-of-file for a
> sparse file copied from a Linux squashfs mounted drive. This breaks
> the --sparse=auto heuristic that detects a sparse file.

The reason is that Squashfs supports sparse files, but has never
implemented SEEK_HOLE/SEEK_DATA, forcing applications to do their own
hole discovery.  This was done for the following reason.

Squashfs supports sparse holes at the granularity of the block, but
the block size in Squashfs is by default 128 Kbytes (and can be up to
1 Mbyte).  In contrast most Linux filesystems use 4K block sizes.

This means any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
behave like other Linux filesystems, because it won't report sparseness
at the 4K granularity that most people or programs will expect it to.
As a result, a program may miss holes that exist in the file.
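
To put numbers on this for the test file from the original report (4 KiB of data, 400 KiB of zeros, 4 KiB of data), the arithmetic can be sketched as below. This is an illustration only, assuming a zero run can be reported as a hole only where it covers whole blocks; block_aligned_hole_bytes is a hypothetical name and is not Squashfs code.

```c
#include <stdint.h>

/* Hypothetical helper.  Given a run of zeros [ZERO_START, ZERO_END),
   return how many of its bytes lie in whole blocks of size BLOCK,
   i.e. how much of it block-granular hole reporting could expose.  */
uint64_t
block_aligned_hole_bytes (uint64_t zero_start, uint64_t zero_end,
                          uint64_t block)
{
  uint64_t first = (zero_start + block - 1) / block * block; /* round up */
  uint64_t last = zero_end / block * block;                  /* round down */
  return last > first ? last - first : 0;
}
```

For the zero run [4096, 413696), a 4 KiB block size exposes all 409600 bytes, a 128 KiB block size exposes 262144 bytes, and a 1 MiB block size exposes none.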

I have always considered it better not to support something than to
support it in a way that people won't expect it to behave, following
the principle of least surprise.

> lseek(3, 0, SEEK_DATA)                  = 0
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> lseek(3, 0, SEEK_HOLE)                  = 417792

This is the behaviour of the default llseek() implementation in the
Linux kernel VFS when doing an lseek SEEK_HOLE.  This is to seek to
a virtual hole at the end of the file.

See 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c#n102
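
The default behaviour can be modelled in user space as follows. This is a simplified sketch of the semantics described above, not the kernel source; default_seek_model is a hypothetical name, and the fallback SEEK_DATA/SEEK_HOLE values are the Linux ones.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: Linux values, used only when the headers lack them.  */
#ifndef SEEK_DATA
# define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
# define SEEK_HOLE 4
#endif

/* Hypothetical model of a filesystem with no hole tracking: the whole
   file is one data extent followed by a virtual hole at EOF.
   Return the resulting offset, or -1 with errno set.  */
off_t
default_seek_model (int whence, off_t offset, off_t size)
{
  if (whence != SEEK_DATA && whence != SEEK_HOLE)
    {
      errno = EINVAL;
      return -1;
    }
  if (offset < 0 || offset >= size)
    {
      errno = ENXIO;   /* nothing at or beyond EOF */
      return -1;
    }
  /* Data starts right here; the only hole is at EOF.  */
  return whence == SEEK_DATA ? offset : size;
}
```

With a size of 417792 this reproduces the trace: SEEK_DATA from 0 yields 0 and SEEK_HOLE from 0 yields 417792.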

I am not subscribed to this email list, and so please CC me on replies.

Thanks

Phillip

---
Squashfs author and maintainer







Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 04:33:02 GMT) Full text and rfc822 format available.

Message #30 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Phillip Lougher <phillip <at> squashfs.org.uk>
To: P <at> draigBrady.com
Cc: bug-coreutils <at> gnu.org
Subject: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 05:26:26 +0100
Padraig Brady wrote:

> with --sparse=always. squashfs is giving the wrong info here after all, right?

No, Squashfs is not giving the wrong information here.

Support for SEEK_HOLE/SEEK_DATA is not mandated, and no Linux filesystem 
is required to support it.

What you are seeing here is the default Linux VFS behaviour.

See 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/read_write.c#n102

Phillip

---
Squashfs author and maintainer







Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 16:47:02 GMT) Full text and rfc822 format available.

Message #33 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Phillip Lougher <phillip <at> squashfs.org.uk>
Cc: jallison <at> ciq.com, 79267 <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 09:46:38 -0700
On 2025-08-22 19:11, Phillip Lougher wrote:
> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
> behave like other Linux filesystems, because it won't report sparseness
> at the 4K granularity that most people or programs will expect it to.

Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
and I don't know of any programs that do expect it. I fail to see why 
squashfs should penalize the performance of core programs like 'cp' 
merely because some (which?) programs are poorly written.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:02:01 GMT) Full text and rfc822 format available.

Message #36 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Jeremy Allison <jallison <at> ciq.com>
Cc: 79267 <at> debbugs.gnu.org, hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 19:01:09 +0100
On 23/08/2025 01:48, Paul Eggert wrote:
> On 8/22/25 07:04, Pádraig Brady wrote:
>>         /* Copy this extent, looking for further opportunities to not
>>            bother to write zeros if --sparse=always, since SEEK_HOLE
>>            is conservative and may miss some holes.  */
>>
>> So the comment needs to be tweaked, but a more general issue
>> is that it disables copy offloading (copy_file_range) for sparse files.
> 
> Ouch, I didn't see that. That's a real loss. I installed the first
> attached patch to revert that part of my recent change.

Cool. I'll push the attached test to enforce this.

> I assume the part of the change that always punches holes is OK. I
> couldn't see why one would not want to punch a hole if one has already
> taken the trouble to find and create the hole.

Fair enough. It simplifies the code anyway.

>> BTW commit 26bf557 also changed this a couple of weeks ago
>> without updating the comment, so the comment relates to sparse_mode !=
>> SPARSE_ALWAYS.
> 
> Not quite following but I hope the comment is OK now with the first
> patch installed.

Right, I missed that commit 26bf557 didn't actually change the logic
for this line, only the syntax.

thanks!
Padraig
[cp-copy-offload-nfs-test.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:28:01 GMT) Full text and rfc822 format available.

Message #39 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Phillip Lougher <phillip <at> squashfs.org.uk>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: jallison <at> ciq.com, 79267 <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 19:27:24 +0100
On 23/08/2025 17:46, Paul Eggert wrote:
> On 2025-08-22 19:11, Phillip Lougher wrote:
>> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
>> behave like other Linux filesystems, because it won't report sparseness
>> at the 4K granularity that most people or programs will expect it to.
>
> Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
> and I don't know of any programs that do expect it. I fail to see why 
> squashfs should penalize the performance of core programs like 'cp' 
> merely because some (which?) programs are poorly written.

Yeah let's take the attitude everyone writes well written programs, and 
if they don't it's their fault when they unexpectedly break in 
production.   In reality a lot of code in embedded Linux systems is 
dreadful, written by inexperienced programmers.

Anyway, my email was pointing out that the SEEK_HOLE behaviour being 
complained about is the default Linux VFS behaviour.  So this isn't 
about Squashfs.

Go into the Linux kernel fs directory and run the following script.

yes=""; no=""
for i in *; do
  if [ -d $i ]; then
    l=$(find $i -name "*.[ch]" | xargs grep SEEK_HOLE | wc -l)
    if [ $l -ne 0 ]; then yes+=" "$i; else no+=" "$i; fi
  fi
done
yc=$(echo $yes | wc -w); nc=$(echo $no | wc -w)
echo -e "\nSEEK_HOLE"=$yc $yes
echo -e "\nNo SEEK_HOLE"=$nc $no

SEEK_HOLE=18 bcachefs btrfs cachefiles ceph erofs ext4 f2fs fuse gfs2 
hpfs iomap nfs nfsd ocfs2 orangefs overlayfs smb xfs

No SEEK_HOLE=60 9p adfs affs afs autofs befs bfs coda configfs cramfs 
crypto debugfs devpts dlm ecryptfs efivarfs efs exfat exportfs ext2 fat 
freevxfs hfs hfsplus hostfs hugetlbfs isofs jbd2 jffs2 jfs kernfs lockd 
minix netfs nfs_common nilfs2 nls notify ntfs3 omfs openpromfs proc 
pstore qnx4 qnx6 quota ramfs resctrl romfs squashfs sysfs tests tracefs 
ubifs udf ufs unicode vboxsf verity zonefs

This picks up some false positives, but there are many more filesystems
that don't support SEEK_HOLE than those that do.  If you think this is
a problem with the Linux kernel then by all means raise it on the linux 
kernel mailing list.

Cheers

Phillip






Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 18:47:01 GMT) Full text and rfc822 format available.

Message #42 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Phillip Lougher <phillip <at> squashfs.org.uk>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: jallison <at> ciq.com, 79267 <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 19:46:05 +0100
On 23/08/2025 17:46, Paul Eggert wrote:
> On 2025-08-22 19:11, Phillip Lougher wrote:
>> any Squashfs SEEK_HOLE/SEEK_DATA implementation will not
>> behave like other Linux filesystems, because it won't report sparseness
>> at the 4K granularity that most people or programs will expect it to.
>
> Coreutils doesn't expect 4 KiB granularity for SEEK_DATA+SEEK_HOLE,
> and I don't know of any programs that do expect it. I fail to see why 
> squashfs should penalize the performance of core programs like 'cp' 
> merely because some (which?) programs are poorly written.

As far as Squashfs is concerned SEEK_HOLE/SEEK_DATA is easy to 
implement.  So I'll think about adding it as a build option.

But this isn't going to fix it for any other case.

Phillip






Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 19:02:01 GMT) Full text and rfc822 format available.

Message #45 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Phillip Lougher <phillip <at> squashfs.org.uk>
Cc: jallison <at> ciq.com, 79267 <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 12:01:01 -0700
On 2025-08-23 11:27, Phillip Lougher wrote:

> Yeah let's take the attitude everyone writes well written programs, and if they don't it's their fault when they unexpectedly break in production.   In reality a lot of code in embedded Linux systems is dreadful, written by inexperienced programmers.

Inexperienced programmers don't use SEEK_HOLE or SEEK_DATA.

Several commonly-used programs would benefit from proper support for 
SEEK_HOLE and SEEK_DATA. What programs would be hurt? If nobody knows of 
such programs, we should be skeptical of the argument that support would 
be dangerous. To be honest I can't think of why it would hurt in 
practical programs.


> Anyway, my email was pointing out that the SEEK_HOLE behaviour being 
> complained about is the default Linux VFS behaviour.  So this isn't 
> about Squashfs.

Yes it is. The default behavior is appropriate for simple file systems 
that lack extents. However, Squashfs is not such a file system. If 
Squashfs has extents but does not expose them to user code, user code 
can be waaaaaayy less efficient.

And this isn't merely an efficiency issue. It can be a security issue, 
as sparse files can be used for denial-of-service attacks. I assume 
efficiency and security are of concern to Squashfs users, which is why I 
press this point.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Sat, 23 Aug 2025 19:04:01 GMT) Full text and rfc822 format available.

Message #48 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Phillip Lougher <phillip <at> squashfs.org.uk>
Cc: jallison <at> ciq.com, 79267 <at> debbugs.gnu.org
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Sat, 23 Aug 2025 12:03:08 -0700
On 2025-08-23 11:46, Phillip Lougher wrote:
> As far as Squashfs is concerned SEEK_HOLE/SEEK_DATA is easy to 
> implement.  So I'll think about adding it as a build option.

Thanks, that'll be helpful.

> But this isn't going to fix it for any other case.

Right, and bleeding-edge coreutils already has a (slowish) workaround 
for Squashfs as-is, as well as for other file systems that don't expose 
extents to user code. If I get around to it I will install similar 
workarounds in other user code I help maintain.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79267; Package coreutils. (Mon, 25 Aug 2025 16:28:02 GMT) Full text and rfc822 format available.

Message #51 received at 79267 <at> debbugs.gnu.org (full text, mbox):

From: Jeremy Allison <jallison <at> ciq.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 79267 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigbrady.com>,
 hvanderwal <at> ciq.com
Subject: Re: bug#79267: cp --sparse=auto heuristic fails on a squashfs mounted
 drive.
Date: Mon, 25 Aug 2025 09:27:28 -0700
Hi Paul,

I tested with the code currently in the master branch in coreutils -
the top-of-tree commit is 4bfcf62f74b38d762ee06ceef582c326023635a9.

The current code still fixes my test case, thanks!

For full reference, it contains the 3 relevant patches:

commit 6c668dc133af7d374790c1da666a701e21682a35
Author: Pádraig Brady <P <at> draigBrady.com>
Date:   Sat Aug 23 18:53:17 2025 +0100

    tests: cp: ensure copy offload is not disabled for sparse files

    Related to commits v9.1-109-g879d2180d and v9.7-248-g306de6c26

    * tests/cp/sparse-perf.sh: This edge case was missed a couple of times,
    so add a test to ensure we attempt copy offload.

commit 39f22fe687ea0c226e3fb35e86cd5ea329180b80
Author: Paul Eggert <eggert <at> cs.ucla.edu>
Date:   Fri Aug 22 17:34:04 2025 -0700

    cp: improve hole handling on squashfs

    Better fix for problem reported by Jeremy Allison
    <https://bugs.gnu.org/79267>.
    * src/copy.c (struct scan_inference): New type, replacing
    union scan_inference.  All uses changed.  This is so
    infer_scantype can report the first hole's offset when known.
    (lseek_copy): 5th arg is now struct scan_inference const *,
    not just off_t.  All uses changed.
    (infer_scantype): If SEEK_SET+SEEK_HOLE do not find a hole,
    fall back on ZERO_SCANTYPE.

commit 306de6c2619e2a9339ade9a88d55c4940942d516
Author: Paul Eggert <eggert <at> cs.ucla.edu>
Date:   Fri Aug 22 10:37:50 2025 -0700

    cp: go back to copy_file_range optimization

    This reverts part of the previous change.
    * src/copy.c (lseek_copy): When calling sparse_copy, do not
    ask it to scan for zeros unless --sparse=always, so that it
    can use copy_file_range which can be far more efficient.
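
The fallback described in the "improve hole handling on squashfs" commit can be pictured roughly as follows. This is an illustrative sketch of the idea only, not the code in src/copy.c; the enum values, choose_scan and is_zero_block are hypothetical names, and the 512-byte unit for st_blocks is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the three copying strategies.  */
enum scan_choice { SCAN_PLAIN, SCAN_LSEEK, SCAN_ZERO };

/* Pick a strategy from the block count, the apparent size, and the
   offset of the first hole reported by SEEK_HOLE.  */
enum scan_choice
choose_scan (long long st_blocks, long long st_size, long long first_hole)
{
  if (st_blocks * 512 >= st_size)
    return SCAN_PLAIN;   /* file does not look sparse */
  if (0 <= first_hole && first_hole < st_size)
    return SCAN_LSEEK;   /* the filesystem reports real holes */
  return SCAN_ZERO;      /* sparse, but no hole reported: scan for zeros */
}

/* True if the N bytes at BUF are all zero.  Comparing the buffer with
   itself shifted by one byte checks that every byte equals the first.  */
bool
is_zero_block (char const *buf, size_t n)
{
  return n == 0 || (buf[0] == 0 && memcmp (buf, buf + 1, n - 1) == 0);
}
```

For the test file on squashfs, the block count says "sparse" while SEEK_HOLE returns the file size, so a sketch like this would select the zero-scanning path.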

Cheers,

Jeremy.





This bug report was last modified 17 days ago.
