GNU bug report logs - #69748: Does diff not work on big enough files?
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 69748 in the body.
You can then email your comments to 69748 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-diffutils <at> gnu.org: bug#69748; Package diffutils. (Tue, 12 Mar 2024 15:20:02 GMT)
Acknowledgement sent to Robert Boyer <robertstephenboyer <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Tue, 12 Mar 2024 15:20:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I am not sure whether to call this a bug, but it is a difficulty for me.
It is simply incredible to me that diff might not work!
If one cannot count on diff to work, is there anything one can count on? Does
diff just not work on big enough files? Apparently yes.
> diff the-primes-below-10000000000.lisp billion-primes.txt
diff: the-primes-below-10000000000.lisp: Cannot allocate memory
> ls -l the-primes-below-10000000000.lisp billion-primes.txt
-rw-r----- 1 bob chronos-access  501959790 Mar 10 14:08 billion-primes.txt
-rw-r----- 1 bob chronos-access 5403267048 Mar 12 09:55 the-primes-below-10000000000.lisp
>
> free
              total        used        free      shared  buff/cache   available
Mem:        6736088     1458180     5060628       16568      217280     5277908
Swap:             0           0           0
>
I am running on a $300 Lenovo Chromebook using their default Gnu Linux.
Bob
> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
stepping : 0
microcode : 0x1
cpu MHz : 1113.600
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc
arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni
pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave rdrand hypervisor lahf_lm 3dnowprefetch
cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority
ept vpid ept_ad fsgsbase tsc_adjust smep erms rdseed smap clflushopt clwb
sha_ni xsaveopt xsavec xgetbv1 xsaves arat umip gfni rdpid movdiri
movdir64b md_clear arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad
ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid
unrestricted_guest vapic_reg vid shadow_vmcs pml
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs srbds mmio_stale_data
bogomips : 2227.20
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 156
model name : Intel(R) Celeron(R) N4500 @ 1.10GHz
stepping : 0
microcode : 0x1
cpu MHz : 1113.600
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc
arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni
pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave rdrand hypervisor lahf_lm 3dnowprefetch
cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority
ept vpid ept_ad fsgsbase tsc_adjust smep erms rdseed smap clflushopt clwb
sha_ni xsaveopt xsavec xgetbv1 xsaves arat umip gfni rdpid movdiri
movdir64b md_clear arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad
ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid
unrestricted_guest vapic_reg vid shadow_vmcs pml
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs srbds mmio_stale_data
bogomips : 2227.20
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
>
Information forwarded to bug-diffutils <at> gnu.org: bug#69748; Package diffutils. (Tue, 12 Mar 2024 20:00:03 GMT)
Message #8 received at 69748 <at> debbugs.gnu.org (full text, mbox):
On 3/12/24 08:17, Robert Boyer wrote:
> It is simply incredible to me that diff might not work!
Like any other program, 'diff' needs enough resources to run. You're
trying to compare a 5 GiB file on a Chromebook that has (let me guess) 4
GiB of RAM and 32 GB of flash, most of which is occupied by ChromeOS and
other stuff. If so, there isn't enough room for 'diff' to do its job
with its current algorithm and you'll have to either use a bigger
machine or solve a smaller problem.
It's possible to imagine a different 'diff' algorithm that would take
less RAM but a lot more time, presumably because it would do more I/O to
a temporary file. But if the available flash is small enough, even that
wouldn't work. I doubt whether it'd be worth the time to develop the
code for this alternative approach.
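As a rough illustration of the resource argument above, the sizes in the report's `ls -l` and `free` output can be compared directly. This is only a sketch, and it assumes that diff must hold both input files in memory at the same time, which is what the reply implies about its current algorithm. The function name `shortfall_bytes` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes by which two files exceed available memory.
# Assumption: both files must fit in RAM simultaneously.
# $1, $2: file sizes in bytes; $3: available memory in KiB (as printed by free).
shortfall_bytes() {
    need=$(( $1 + $2 ))
    avail=$(( $3 * 1024 ))
    echo $(( need - avail ))
}

# Figures copied from the report: the two ls -l sizes and the
# "available" column of free.
shortfall_bytes 5403267048 501959790 5277908
```

A positive result means the two files together are larger than the memory that `free` reported as available. With the figures from the report, the larger file on its own already comes within about 1.3 MB of that limit.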
Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 12 Mar 2024 20:21:01 GMT)
bug closed, send any further explanations to 69748 <at> debbugs.gnu.org and Robert Boyer <robertstephenboyer <at> gmail.com>. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 12 Mar 2024 20:21:02 GMT)
Information forwarded to bug-diffutils <at> gnu.org: bug#69748; Package diffutils. (Tue, 12 Mar 2024 20:25:02 GMT)
Message #15 received at 69748 <at> debbugs.gnu.org (full text, mbox):
Are you trying to be funny? Or are you simply stupid? You are much too
brilliant and famous to be stupid, so I am assuming you were trying to be
funny, a parody of the overworked bug fixer.
In an almost immediate follow-up message, I already solved the problem, and
it worked perfectly for me trying to compare an old file of the primes below
a billion with a new file of the primes below ten billion. Fortunately, this
little gem of a program helped me believe that I had computed at least the
primes below a billion correctly. What a relief!
> there isn't enough room for 'diff' to do its job with its current algorithm
Probably very sadly true, so you must improve your algorithm, and here is
how. It won't hurt, I promise.
From my previous message:
Here is a better version of diff, better only in the sense that it works on
all files. But what do I know? Nothing.
This is Common Lisp. I was running in SBCL.
(defun my-diff (file1 file2)
  "Compare FILE1 and FILE2 byte by byte, reading one byte at a time."
  ;; WITH-OPEN-FILE closes both streams on exit; the original left them open.
  (with-open-file (s1 file1 :element-type '(integer 0 255))
    (with-open-file (s2 file2 :element-type '(integer 0 255))
      (loop
        ;; 256 lies outside the byte range, so it marks end of file.
        (let ((c1 (read-byte s1 nil 256))
              (c2 (read-byte s2 nil 256)))
          (declare (fixnum c1 c2))
          (cond ((and (eql c1 256) (eql c2 256)) (return "no difference"))
                ((eql c1 256) (return "file1 hit eof first"))
                ((eql c2 256) (return "file2 hit eof first"))
                ((eql c1 c2))
                ;; FILE-POSITION is read after the byte, so it is one past it.
                (t (return (format nil
                                   "difference at position ~s; c1 = ~s, c2 = ~s."
                                   (file-position s1) c1 c2)))))))))
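For reference, diffutils also ships `cmp`, which performs this kind of byte-by-byte comparison and reports whether two files are identical without doing the line-matching work that diff does. A minimal sketch follows. The wrapper name `byte_compare` is hypothetical, and the sketch relies only on the documented exit statuses of `cmp` (0 for identical, 1 for different, 2 for trouble).

```shell
#!/bin/sh
# Hypothetical wrapper around cmp(1); the name byte_compare is illustrative.
byte_compare() {
    # -s suppresses cmp's own output so that only the exit status is used.
    cmp -s -- "$1" "$2"
    case $? in
        0) echo "no difference" ;;
        1) echo "files differ" ;;
        *) echo "could not compare" ;;
    esac
}

# Usage: byte_compare billion-primes.txt the-primes-below-10000000000.lisp
```

Like the Lisp function, this only says whether and where the files first diverge. It does not produce the edit script that diff computes.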
On Tue, Mar 12, 2024 at 2:58 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 3/12/24 08:17, Robert Boyer wrote:
>
> > It is simply incredible to me that diff might not work!
>
> Like any other program, 'diff' needs enough resources to run. You're
> trying to compare a 5 GiB file on a Chromebook that has (let me guess) 4
> GiB of RAM and 32 GB of flash, most of which is occupied by ChromeOS and
> other stuff. If so, there isn't enough room for 'diff' to do its job
> with its current algorithm and you'll have to either use a bigger
> machine or solve a smaller problem.
>
> It's possible to imagine a different 'diff' algorithm that would take
> less RAM but a lot more time, presumably because it would do more I/O to
> a temporary file. But if the available flash is small enough, even that
> wouldn't work. I doubt whether it'd be worth the time to develop the
> code for this alternative approach.
>
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 10 Apr 2024 11:24:09 GMT)
This bug report was last modified 1 year and 149 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.