GNU bug report logs - #69535
Problem with copying an EXTREMELY large file - cmp finds a mismatch


Package: coreutils

Reported by: Brian <b_lists <at> patandbrian.org>

Date: Mon, 4 Mar 2024 04:27:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.





Report forwarded to bug-coreutils <at> gnu.org:
bug#69535; Package coreutils. (Mon, 04 Mar 2024 04:27:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Brian <b_lists <at> patandbrian.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 04 Mar 2024 04:27:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Brian <b_lists <at> patandbrian.org>
To: bug-coreutils <at> gnu.org
Subject: Problem with copying an EXTREMELY large file - cmp finds a mismatch
Date: Sun, 3 Mar 2024 15:04:51 -0500
I don't know whether the problem I've found is with cp or with cmp, so 
I don't know whether to address this report to coreutils or diffutils. 
If you think I've guessed wrong, please tell me so.

I am trying to make a backup copy of a very large (40 gigabyte) data 
file - yes, I have plenty of disk space! :) It's a binary file, 
200-byte fixed-length records to be precise, not a text file. I have 
downloaded, compiled and used the latest versions of cp and cmp, and 
the problem persists. My system is a 16-core AMD Ryzen desktop running 
Linux Mint 21.3.

The steps to reproduce the problem are simple, provided you have the 
data file!

I have a folder called original in the data directory. From a terminal 
prompt, I run

cp data.dat original

This apparently completes correctly - at least, no error messages are seen.

I then run

cmp -l data.dat original/data.dat

and I get something around 100 bytes of differences. On the basis of 
three attempted copy and comparison pairs, the addresses of these 
differences vary, but they're always a single block of contiguous 
locations, and always towards the end of the file (the last time, they 
were in the 35,000,000,000s).
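[The reproduction above can be sketched as a self-contained script. The file here is a small synthetic stand-in - the original report used a 40 GB file - and the paths are examples, not the reporter's actual layout.]

```shell
# Reproduce the copy-and-compare flow in a scratch directory.
# Scaled down: 5,000 records of 200 bytes each, not the 40 GB original.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir original
# Generate a file of 200-byte fixed-length records, as in the report.
dd if=/dev/urandom of=data.dat bs=200 count=5000 2>/dev/null
cp data.dat original
# cmp -l prints one line per differing byte: offset, old value, new value.
# No output and exit status 0 means the copy is byte-identical.
cmp -l data.dat original/data.dat && echo "copies match"
```

On a healthy system this prints nothing but "copies match"; in the reported failure, cmp -l instead listed a contiguous run of differing offsets near the end of the file.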

I have run fsck on the drive (a 14 TB Seagate connected to one of 
the motherboard SATA ports) and no problems were found.

Any advice, please? I'm close to the limits of my debugging knowledge.

Please note that I have absolutely zero knowledge of the C language or 
its derivatives. I'm a (retired) scientist turned database programmer, 
I know Pascal, FORTRAN and SQL, and that's about it.


Thanks,

Brian.




Information forwarded to bug-coreutils <at> gnu.org:
bug#69535; Package coreutils. (Mon, 04 Mar 2024 08:12:01 GMT) Full text and rfc822 format available.

Message #8 received at 69535 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Brian <b_lists <at> patandbrian.org>
Cc: 69535 <at> debbugs.gnu.org
Subject: Re: bug#69535: Problem with copying an EXTREMELY large file - cmp
 finds a mismatch
Date: Mon, 4 Mar 2024 00:10:48 -0800
Try running 'strace -o tr cp data.dat original' and then look at the 
file 'tr' (which could be quite large). Look for the syscalls near the 
start, and near the end, of the bulk copy.

Quite possibly it's a bug in your Linux drivers or your firmware or 
hardware. For example, if you're using ZFS, see:

https://github.com/openzfs/zfs/issues/15526

The strace output might help figure this out.




Information forwarded to bug-coreutils <at> gnu.org:
bug#69535; Package coreutils. (Mon, 04 Mar 2024 14:12:02 GMT) Full text and rfc822 format available.

Message #11 received at 69535 <at> debbugs.gnu.org (full text, mbox):

From: Brian <b_lists <at> patandbrian.org>
To: 69535 <at> debbugs.gnu.org
Subject: Re: bug#69535: Problem with copying an EXTREMELY large file - cmp
 finds a mismatch
Date: Mon, 4 Mar 2024 03:53:44 -0500
On 3/4/24 03:10, Paul Eggert wrote:
> Try running 'strace -o tr cp data.dat original' and then look at the 
> file 'tr' (which could be quite large). Look for the syscalls near the 
> start, and near the end, of the bulk copy.
> 
> Quite possibly it's a bug in your Linux drivers or your firmware or 
> hardware. For example, if you're using ZFS, see:
> 
> https://github.com/openzfs/zfs/issues/15526
> 
> The strace output might help figure this out.


My drives are formatted using ext4. The command above did indeed 
produce a large output file, almost 40 Megabytes of it, but deleting 
every line that started with

read(3,

or

write(4,

(there were over 300,000 pairs) got the file down to a far more 
manageable 7 KB. At first glance, it doesn't make much sense to me, 
but I will try going through it line-by-line tomorrow (it's silly 
o'clock at the moment) and see whether anything jumps out at me.
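[The pruning described above can be done in one pass with grep. A sketch - 'tr' is the trace file name from Paul's earlier command; the log contents below are a fabricated stand-in so the example is self-contained.]

```shell
# Drop the bulk read/write pairs from the strace log, keeping everything else.
# 'tr' would normally come from: strace -o tr cp data.dat original
# Here we fabricate a tiny stand-in log for illustration.
cat > tr <<'EOF'
openat(AT_FDCWD, "data.dat", O_RDONLY) = 3
read(3, "\0\1\2"..., 131072) = 131072
write(4, "\0\1\2"..., 131072) = 131072
read(3, "", 131072) = 0
close(3) = 0
EOF
# -v inverts the match; -E enables extended regexps for the alternation.
grep -vE '^(read\(3,|write\(4,)' tr > tr-small
cat tr-small
```

This leaves only the setup and teardown syscalls - the ones Paul suggested examining at the start and end of the bulk copy.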

Thanks for the help.





Information forwarded to bug-coreutils <at> gnu.org:
bug#69535; Package coreutils. (Fri, 08 Mar 2024 09:39:01 GMT) Full text and rfc822 format available.

Message #14 received at 69535 <at> debbugs.gnu.org (full text, mbox):

From: brian <brian <at> patandbrian.org>
To: 69535 <at> debbugs.gnu.org
Subject: update
Date: Fri, 8 Mar 2024 03:49:50 -0500
Sorry for the delay in updating this problem - I've been doing some 
testing!

The first thing I did was write a quick and dirty Pascal program to do 
a byte-by-byte comparison of the data files, just in case it was cmp 
causing the problem rather than cp. The results were the same using 
my program. OK, so cmp is innocent.

I then wrote a program which generated 40 GB of more or less random 
data - 200,000,000 fixed length records of 200 bytes each - and sent 
it to a friend with the request that he do the test I did, i.e. 
generate the file, copy it, and compare the copies. He's also using 
Mint 21.3, as I am, but with a different type of drive. The files 
compared correctly on his system.
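[The generate-copy-verify test described above can be sketched as follows. The record count is scaled down from 200,000,000 for illustration; file and directory names are examples.]

```shell
# Generate pseudo-random fixed-length records (200 bytes each), then
# copy and verify - the same test run on both machines in the report.
set -e
tmp=$(mktemp -d)
cd "$tmp"
dd if=/dev/urandom of=gen.dat bs=200 count=1000 2>/dev/null
mkdir copydir
cp gen.dat copydir/
# Checksums give a quick whole-file verdict; cmp -l pinpoints any
# differing bytes if the checksums disagree.
sha256sum gen.dat copydir/gen.dat
cmp -l gen.dat copydir/gen.dat && echo "files identical"
```

Matching checksums on one machine and mismatches on another, with identical software, is what pointed the investigation away from cp and cmp and toward the drive or its driver.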

OK, so the next thing was to try different drives. I have a 2 
TB SSD in this system, so I tried that. No problem, the files compared 
correctly. I got the same result using my RAID 5 backup device, which 
contains five drives of the same type as the one that was producing the 
error when connected directly to the SATA port, though obviously the 
backup device's software has an effect there.

So, the bottom line is that I'm chasing some kind of goofy bug which 
likely involves the drivers for my hardware - a 14 TB Seagate 
connected to the mobo SATA port - and cp is also innocent.

Please consider this bug report to be closed. I'm not sure if/how I 
can do that via e-mail.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 08 Mar 2024 19:23:02 GMT) Full text and rfc822 format available.

Notification sent to Brian <b_lists <at> patandbrian.org>:
bug acknowledged by developer. (Fri, 08 Mar 2024 19:23:02 GMT) Full text and rfc822 format available.

Message #19 received at 69535-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: brian <brian <at> patandbrian.org>
Cc: 69535-done <at> debbugs.gnu.org
Subject: Re: bug#69535: update
Date: Fri, 8 Mar 2024 11:21:45 -0800
On 2024-03-08 00:49, brian wrote:
> Please consider this bug report to be closed. I'm not sure if/how I can 
> do that via e-mail.

Thanks for following up, and good luck with your hardware or drivers. 
Closing the bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Apr 2024 11:24:20 GMT) Full text and rfc822 format available.




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.