GNU bug report logs - #44838
diff 3.7 incorrectly reports added lines and can generate huge diffs
To reply to this bug, email your comments to 44838 AT debbugs.gnu.org.
Report forwarded to bug-diffutils <at> gnu.org: bug#44838; Package diffutils. (Tue, 24 Nov 2020 11:35:02 GMT)
Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>: New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Tue, 24 Nov 2020 11:35:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I've attached an archive with 2 files "file1" and "file2"; "file2"
is "file1" with some lines removed, so that a diff should report
only removed lines.
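The claim that "file2" is "file1" with some lines removed can be checked
independently of diff. The following is a minimal sketch, not part of the
original report: the function name is_subsequence and the REDUCED_FILE
variable are hypothetical, and the check is a plain greedy subsequence
match done with awk.

```shell
# Hypothetical helper (not from the report): succeeds when every line of
# file $2 appears in file $1 in the same order, i.e. when $2 can be
# obtained from $1 by deleting lines only.
is_subsequence() {
    REDUCED_FILE=$2 awk '
        BEGIN {
            # Load the reduced file into memory, one array slot per line.
            reduced = ENVIRON["REDUCED_FILE"]
            n = 0
            while ((getline line < reduced) > 0)
                want[++n] = line
            close(reduced)
            next_wanted = 1
        }
        # Greedy match: consume the next wanted line whenever the current
        # line equals it. Appending "" forces a string comparison, so that
        # lines such as 1 and 1.0 are not treated as equal numbers.
        next_wanted <= n && ($0 "") == (want[next_wanted] "") { next_wanted++ }
        # Exit status 0 only if all wanted lines were found in order.
        END { exit !(next_wanted > n) }
    ' "$1"
}
```

With the attached files, "is_subsequence file1 file2 && echo removals-only"
would be one way to confirm the premise before looking at the diff output.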
Here are some tests done under Debian/sid (x86_64) with diff 3.7
(Debian package diffutils 1:3.7-3).
First, for reference, here is the size of the initial diff:
$ diff -u file1 file2 | wc -l
22319
But this diff reports added lines, though "file2" has only removed
lines compared to "file1".
──────────────────────────────────────────────────────────────────
$ diff -u file1 file2 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
-
-blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
blob
mark :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
──────────────────────────────────────────────────────────────────
In particular, one can see:
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
and
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
while these lines should have been regarded as unmodified.
This problem disappears if I shorten "file2" a bit (these lines are
at the very beginning in "file2", so that such a change of behavior
is surprising):
$ head -n 129410 file2 > file3
$ diff -u file1 file3 | grep '^\+'
+++ file3 2020-11-24 11:58:17.922462693 +0100
So now no added lines are reported (the only match is the "+++" header
line). This is the expected result.
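Because the grep pattern above also matches the "+++" header line, the
raw count is always one too high. The following is a minimal sketch, not
from the original report, that skips the two header lines of a unified
diff before counting; the function name count_added is hypothetical. The
pattern is written as '^+' because an unescaped + is a literal character
in a basic regular expression.

```shell
# Hypothetical helper (not from the report): reads a unified diff for one
# pair of files on stdin and prints the number of added lines.
count_added() {
    # tail -n +3 drops the "---" and "+++" header lines; grep -c then
    # counts the remaining lines that start with "+". grep exits with
    # status 1 when the count is 0, so "|| true" keeps the function
    # from reporting a failure in that case.
    tail -n +3 | grep -c '^+' || true
}
```

Usage would be along the lines of "diff -u file1 file3 | count_added".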
And here's what diff now gives around these lines:
──────────────────────────────────────────────────────────────────
$ diff -u file1 file3 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
data 55
[tests/trandom_deviate.c] Correction (fprintf format).
from :37946
M 100644 :37947 tests/trandom_deviate.c
blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
-
-blob
mark :37951
data 15
Blob at :37951
@@ -9910,21 +467,6 @@
M 100644 :38018 src/round_raw_generic.c
blob
-mark :38020
-data 15
──────────────────────────────────────────────────────────────────
This is now OK, but stranger things happen when I reduce "file2"
even more:
$ head -n 120200 file2 > file4
$ diff -u file1 file4 | grep -c '^\+'
7
$ diff -u file1 file4 | wc -l
31251
So, with "file2" reduced to 120200 lines, 7 − 1 = 6 added lines
are reported (one of the 7 matches is the "+++" header line), even
though this new file has only removed lines. This is incorrect, but
if I remove 100 more lines at the end, the result is much worse,
with 81120 added lines reported and a huge diff:
$ head -n 120100 file2 > file5
$ diff -u file1 file5 | grep -c '^\+'
81121
$ diff -u file1 file5 | wc -l
231111
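The truncation experiments above can be repeated for any list of lengths
to see where the behavior changes. The following is a minimal sketch, not
from the original report: the function name sweep_truncations and its
sweep_* variables are hypothetical, and it assumes a diff that accepts
"-" for standard input, as GNU diff does.

```shell
# Hypothetical helper (not from the report): for each line count given,
# truncate the reduced file to that many lines, diff it against the
# original, and print "<line count> <number of added lines>".
sweep_truncations() {
    sweep_orig=$1
    sweep_reduced=$2
    shift 2
    for sweep_n in "$@"; do
        # Skip the two unified-diff header lines, then count lines that
        # start with "+". "|| true" covers grep exiting with status 1
        # when the count is 0.
        sweep_added=$(head -n "$sweep_n" "$sweep_reduced" |
            diff -u "$sweep_orig" - | tail -n +3 | grep -c '^+' || true)
        printf '%s %s\n' "$sweep_n" "$sweep_added"
    done
}
```

For example, "sweep_truncations file1 file2 129410 120200 120100" covers
the three lengths tried above, for which this report gives 0, 6 and 81120
added lines with diff 3.7. Since only removals are expected, any nonzero
count in the second column points to the problem described here.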
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
[files.tar.xz (application/octet-stream, attachment)]
This bug report was last modified 4 years and 203 days ago.