GNU bug report logs - #69535
Problem with copying an EXTREMELY large file - cmp finds a mismatch
Reported by: Brian <b_lists <at> patandbrian.org>
Date: Mon, 4 Mar 2024 04:27:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 69535 in the body.
You can then email your comments to 69535 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#69535; Package coreutils. (Mon, 04 Mar 2024 04:27:02 GMT)
Acknowledgement sent to Brian <b_lists <at> patandbrian.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 04 Mar 2024 04:27:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I don't know whether the problem I've found is with cp or with cmp, so
I don't know whether to address this report to coreutils or diffutils.
If you think I've guessed wrong, please tell me so.
I am trying to make a backup copy of a very large (40 Gigabyte) data
file - yes, I have plenty of disk space! :) It's a binary file made up
of 200-byte fixed-length records, not a text file. I have downloaded,
compiled and used the latest versions of cp and cmp, and the problem
persists. My system is a 16-core AMD Ryzen desktop running
Linux Mint 21.3.
The steps to reproduce the problem are simple, provided you have the
data file!
I have a folder called original in the data directory. From a terminal
prompt, I run
    cp data.dat original
This apparently completes correctly - at least, no error messages are seen.
I then run
    cmp -l data.dat original/data.dat
and I get something around 100 bytes of differences. On the basis of
three attempted copy and comparison pairs, the addresses of these
differences vary, but they're always a single block of contiguous
locations, and always towards the end of the file (the last time, they
were in the 35,000,000,000s).
I have run a fsck on the drive (a 14 TB Seagate connected to one of
the motherboard SATA ports) and no problems were found.
Any advice, please? I'm close to the limits of my debugging knowledge.
Please note that I have absolutely zero knowledge of the C language or
its derivatives. I'm a (retired) scientist turned database programmer,
I know Pascal, FORTRAN and SQL, and that's about it.
Thanks,
Brian.
Information forwarded to bug-coreutils <at> gnu.org: bug#69535; Package coreutils. (Mon, 04 Mar 2024 08:12:01 GMT)
Message #8 received at 69535 <at> debbugs.gnu.org (full text, mbox):
Try running 'strace -o tr cp data.dat original' and then look at the
file 'tr' (which could be quite large). Look for the syscalls near the
start, and near the end, of the bulk copy.
Quite possibly it's a bug in your Linux drivers or your firmware or
hardware. For example, if you're using ZFS, see:
https://github.com/openzfs/zfs/issues/15526
The strace output might help figure this out.
Information forwarded to bug-coreutils <at> gnu.org: bug#69535; Package coreutils. (Mon, 04 Mar 2024 14:12:02 GMT)
Message #11 received at 69535 <at> debbugs.gnu.org (full text, mbox):
On 3/4/24 03:10, Paul Eggert wrote:
> Try running 'strace -o tr cp data.dat original' and then look at the
> file 'tr' (which could be quite large). Look for the syscalls near the
> start, and near the end, of the bulk copy.
>
> Quite possibly it's a bug in your Linux drivers or your firmware or
> hardware. For example, if you're using ZFS, see:
>
> https://github.com/openzfs/zfs/issues/15526
>
> The strace output might help figure this out.
My drives are formatted using ext4. The command above did indeed
produce a large output file, almost 40 Megabytes of it, but deleting
every line that started with
    read(3,
or
    write(4,
(there were over 300,000 pairs) got the file down to a far more
manageable 7 KB. At first glance, it doesn't make much sense to me,
but I will try going through it line-by-line tomorrow (it's silly
o'clock at the moment) and see whether anything jumps out at me.
Thanks for the help.
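The filtering step described above can be sketched in Python. This is only an illustration, not the method actually used in the report: the function names and the output file name are hypothetical, and the descriptor numbers 3 and 4 are simply the ones that appeared in this particular trace.

```python
from typing import Iterable, Iterator, Tuple

# Prefixes of the bulk-copy syscalls to drop. The descriptor numbers 3 and 4
# are the ones seen in this report; another run may use different ones.
BULK_PREFIXES: Tuple[str, ...] = ("read(3,", "write(4,")


def drop_bulk_lines(lines: Iterable[str],
                    prefixes: Tuple[str, ...] = BULK_PREFIXES) -> Iterator[str]:
    """Yield only the lines that do not start with one of the given prefixes."""
    for line in lines:
        if not line.startswith(prefixes):
            yield line


def filter_trace(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path without the bulk read/write lines.

    Returns the number of lines kept.
    """
    kept = 0
    with open(src_path, "r", errors="replace") as src, \
            open(dst_path, "w") as dst:
        for line in drop_bulk_lines(src):
            dst.write(line)
            kept += 1
    return kept
```

Calling filter_trace("tr", "tr.filtered") would leave the original strace output untouched and write the shortened version to a second file (the name tr.filtered is made up for this example).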
Information forwarded to bug-coreutils <at> gnu.org: bug#69535; Package coreutils. (Fri, 08 Mar 2024 09:39:01 GMT)
Message #14 received at 69535 <at> debbugs.gnu.org (full text, mbox):
Sorry for the delay in updating this problem - I've been doing some
testing!
The first thing I did was write a quick and dirty Pascal program to do
a byte-by-byte comparison of the data files, just in case it was cmp
that was causing the problem, not cp. The results were the same using
my program. OK, so cmp is innocent.
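An independent byte-by-byte check of this kind might look like the following Python sketch. It is not the Pascal program mentioned above, whose source is not part of this report; the function name and chunk size are assumptions. It reports each contiguous run of differing bytes using 1-based byte numbers, as cmp -l does, and it stops at the end of the shorter input without reporting a length difference.

```python
from typing import BinaryIO, Iterator, Optional, Tuple

CHUNK = 1 << 20  # bytes read per step; an arbitrary choice


def mismatch_ranges(a: BinaryIO, b: BinaryIO,
                    chunk: int = CHUNK) -> Iterator[Tuple[int, int]]:
    """Yield (first, last) 1-based byte numbers of each run of differing bytes.

    Assumes read(chunk) returns a full chunk except at end of file, which
    holds for regular files opened in binary mode and for io.BytesIO.
    """
    offset = 0                   # number of bytes compared so far
    start: Optional[int] = None  # first byte of the run in progress, if any
    while True:
        block_a = a.read(chunk)
        block_b = b.read(chunk)
        n = min(len(block_a), len(block_b))
        if n == 0:
            break
        if block_a[:n] == block_b[:n]:
            # Whole chunk matches, so a run carried over from the previous
            # chunk ended on that chunk's final byte.
            if start is not None:
                yield (start, offset)
                start = None
        else:
            for i in range(n):
                if block_a[i] != block_b[i]:
                    if start is None:
                        start = offset + i + 1
                elif start is not None:
                    yield (start, offset + i)
                    start = None
        offset += n
        if len(block_a) != len(block_b):
            break  # one input ended early; nothing further to compare
    if start is not None:
        yield (start, offset)
```

To run it on the files from this report, open data.dat and original/data.dat with mode "rb" and iterate over mismatch_ranges(f1, f2). A single short range near the end of the file would correspond to the pattern described in the original message.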
I then wrote a program which generated 40 GB of more or less random
data - 200,000,000 fixed length records of 200 bytes each - and sent
it to a friend with the request that he do the test I did, i.e.
generate the file, copy it, and compare the copies. He's also using
Mint 21.3, as I am, but with a different type of drive. The files
compared correctly on his system.
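A test file of that shape could be produced with a sketch like the one below. It is not the program that was actually sent; os.urandom is an assumed stand-in for "more or less random data", and the function name, batch size and output file name are hypothetical. The constants restate the arithmetic from the message: 200,000,000 records of 200 bytes each come to 40,000,000,000 bytes.

```python
import os
from typing import BinaryIO

RECORD_SIZE = 200            # bytes per record, as in the report
RECORD_COUNT = 200_000_000   # records in the full-size test file
TOTAL_BYTES = RECORD_SIZE * RECORD_COUNT  # 40,000,000,000 bytes, i.e. 40 GB


def write_random_records(out: BinaryIO, count: int,
                         record_size: int = RECORD_SIZE,
                         batch: int = 5000) -> int:
    """Write `count` fixed-length records of random bytes to `out`.

    Records are generated `batch` at a time so memory use stays small.
    Returns the total number of bytes written.
    """
    written = 0
    remaining = count
    while remaining > 0:
        n = min(batch, remaining)
        data = os.urandom(n * record_size)  # assumption: any random source will do
        out.write(data)
        written += len(data)
        remaining -= n
    return written
```

For the full-size file, one would open an output file such as test.dat (a made-up name) with mode "wb" and call write_random_records(f, RECORD_COUNT). Testing with a much smaller count first is a cheap way to check the record layout.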
OK, so the next thing I tried was using different drives. I have a 2
TB SSD in this system, so I tried that. No problem, the files compared
correctly. I got the same result using my RAID 5 backup device, which
contains 5 of the same type of drive as was producing the error when
connected direct to the SATA port, but obviously the backup device's
software has an effect there.
So, the bottom line is that I'm chasing some kind of goofy bug which
likely involves the drivers for my hardware - a 14 TB Seagate
connected to the mobo SATA port - and cp is also innocent.
Please consider this bug report to be closed. I'm not sure if/how I
can do that via e-mail.
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Fri, 08 Mar 2024 19:23:02 GMT)
Notification sent to Brian <b_lists <at> patandbrian.org>: bug acknowledged by developer. (Fri, 08 Mar 2024 19:23:02 GMT)
Message #19 received at 69535-done <at> debbugs.gnu.org (full text, mbox):
On 2024-03-08 00:49, brian wrote:
> Please consider this bug report to be closed. I'm not sure if/how I can
> do that via e-mail.
Thanks for following up, and good luck with your hardware or drivers.
Closing the bug report.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Apr 2024 11:24:20 GMT)
This bug report was last modified 1 year and 167 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.