GNU bug report logs - #44838
diff 3.7 incorrectly reports added lines and can generate huge diffs
I've attached an archive with two files, "file1" and "file2"; "file2"
is "file1" with some lines removed, so a diff should report only
removed lines.
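The claim that "file2" has only removed lines can be checked without relying on diff's own matching: it holds exactly when every line of "file2" occurs in "file1" in the same order (i.e. "file2" is a subsequence of "file1"). The following is a minimal POSIX sh/awk sketch of such a check; the function name `is_subsequence` is a hypothetical helper, not part of diffutils.

```sh
#!/bin/sh
# is_subsequence OLD NEW
# Hypothetical helper (not part of diffutils): exit with status 0 if every
# line of NEW occurs in OLD in the same order, i.e. NEW can be obtained
# from OLD by deleting lines only; exit with status 1 otherwise.
is_subsequence() {
  awk '
    # First file: store each line. Appending "" makes the stored value a
    # plain string, so the comparison below is a string comparison
    # (otherwise awk could treat "1.0" and "1" as equal numbers).
    FILENAME == ARGV[1] { a[++n] = $0 ""; next }
    # Second file: scan forward in OLD from the previous match for the
    # current line. Greedy earliest matching suffices for this test.
    {
      found = 0
      while (i < n) {
        if (a[++i] == $0) { found = 1; break }
      }
      if (!found) { bad = 1; exit }
    }
    END { exit bad + 0 }
  ' "$1" "$2"
}
```

For example, `is_subsequence file1 file2 && echo "only deletions"` should print the message for the attached files, independently of which lines diff decides to match.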
Here are some tests done under Debian/sid (x86_64) with diff 3.7
(Debian package diffutils 1:3.7-3).
First, for reference, the size of the initial diff:
$ diff -u file1 file2 | wc -l
22319
But this diff reports added lines, even though "file2" only has lines
removed compared to "file1".
──────────────────────────────────────────────────────────────────
$ diff -u file1 file2 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
-
-blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
blob
mark :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
──────────────────────────────────────────────────────────────────
In particular, one can see:
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
and
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
while these lines should have been regarded as unmodified.
This problem disappears if I shorten "file2" a bit, which is
surprising since these lines are at the very beginning of "file2":
$ head -n 129410 file2 > file3
$ diff -u file1 file3 | grep '^\+'
+++ file3 2020-11-24 11:58:17.922462693 +0100
Now no added lines are reported (the only match is the "+++" header
line), which is correct.
And here's what diff now gives around these lines:
──────────────────────────────────────────────────────────────────
$ diff -u file1 file3 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
data 55
[tests/trandom_deviate.c] Correction (fprintf format).
from :37946
M 100644 :37947 tests/trandom_deviate.c
blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
-
-blob
mark :37951
data 15
Blob at :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
-mark :38020
-data 15
──────────────────────────────────────────────────────────────────
This is now OK, but stranger things happen when I reduce "file2"
even more:
$ head -n 120200 file2 > file4
$ diff -u file1 file4 | grep -c '^\+'
7
$ diff -u file1 file4 | wc -l
31251
So, with "file2" truncated to 120200 lines, 7 − 1 = 6 added lines
are reported (the − 1 excludes the "+++" header line), even though
this new file also has only removed lines. This is already incorrect,
but removing 100 more lines from the end makes it much worse: 81120
added lines are reported, and the diff becomes huge:
$ head -n 120100 file2 > file5
$ diff -u file1 file5 | grep -c '^\+'
81121
$ diff -u file1 file5 | wc -l
231111
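The counts above use `grep -c '^\+'`, which also matches the "+++" header line. The truncation experiment can be repeated with the header excluded from the count, and with GNU diff's `-d` (`--minimal`) option alongside the default for comparison. This is a minimal sketch under stated assumptions: `count_added` and `sweep` are hypothetical helper names, and `mktemp` is assumed to be available.

```sh
#!/bin/sh
# Hypothetical helpers (not part of diffutils) for repeating the
# truncation experiment with header lines excluded from the count.

# count_added [DIFF-OPTIONS] OLD NEW
# Print how many lines `diff -u` marks as added. The first two output
# lines of a two-file unified diff are the "---" and "+++" headers, so
# they are skipped before counting lines that start with '+'.
count_added() {
  diff -u "$@" | awk 'NR > 2 && /^[+]/ { n++ } END { print n + 0 }'
}

# sweep OLD NEW N...
# For each N, compare OLD with the first N lines of NEW and print the
# added-line count with the default algorithm and with -d (--minimal).
sweep() {
  old=$1 new=$2
  shift 2
  tmp=$(mktemp) || return 1   # assumes mktemp(1) is available
  printf 'lines\tdefault\tminimal\n'
  for n in "$@"; do
    head -n "$n" "$new" > "$tmp"
    printf '%s\t%s\t%s\n' "$n" \
      "$(count_added "$old" "$tmp")" \
      "$(count_added -d "$old" "$tmp")"
  done
  rm -f "$tmp"
}
```

For example, `sweep file1 file2 129410 120200 120100` prints one row per prefix length; since only file contents matter, the `default` column should match the figures reported above (0, 6 and 81120 added lines, header excluded). Since "file2" has only removed lines, any non-zero count in either column reflects the matching diff chose rather than an actual addition.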
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
[files.tar.xz (application/octet-stream, attachment)]