GNU bug report logs - #44838
diff 3.7 incorrectly reports added lines and can generate huge diffs

Package: diffutils

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Tue, 24 Nov 2020 11:35:02 UTC

Severity: normal

To reply to this bug, email your comments to 44838 AT debbugs.gnu.org.


Report forwarded to bug-diffutils <at> gnu.org:
bug#44838; Package diffutils. (Tue, 24 Nov 2020 11:35:02 GMT)

Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>:
New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Tue, 24 Nov 2020 11:35:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: bug-diffutils <at> gnu.org
Subject: diff 3.7 incorrectly reports added lines and can generate huge diffs
Date: Tue, 24 Nov 2020 12:33:52 +0100
[Message part 1 (text/plain, inline)]
I've attached an archive with 2 files "file1" and "file2"; "file2"
is "file1" with some lines removed, so that a diff should report
only removed lines.
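
The premise that "file2" is "file1" with lines removed can be checked
independently of diff. The following is a minimal sketch, not part of the
original report: "is_subsequence" is a hypothetical helper name, and it
assumes that the second file fits in awk's memory and that its path
contains no backslashes (awk -v would interpret them).

```shell
# is_subsequence OLD NEW
# Exit 0 if NEW can be obtained from OLD by deleting lines only,
# 1 if not, 2 if either file is unreadable.
is_subsequence() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    awk -v newfile="$2" '
        BEGIN {
            # Load NEW into memory; appending "" forces string values.
            while ((getline line < newfile) > 0) want[++n] = line ""
            i = 1
        }
        # Greedy scan of OLD: advance in NEW whenever the next wanted
        # line shows up. The "" forces a string (not numeric) comparison.
        i <= n && ($0 "") == want[i] { i++ }
        # Every line of NEW was found, in order, iff i ran past n.
        END { exit (i > n ? 0 : 1) }
    ' "$1"
}
```

Usage: is_subsequence file1 file2 && echo "file2 only removes lines"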

Here are some tests done under Debian/sid (x86_64) with diff 3.7
(Debian package diffutils 1:3.7-3).

First, for reference, the size of the initial diff:

$ diff -u file1 file2 | wc -l
22319

But this diff reports added lines, though "file2" has only removed
lines compared to "file1".

──────────────────────────────────────────────────────────────────
$ diff -u file1 file2 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c
-
-blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c
 
 blob
 mark :37951
@@ -9910,21 +467,6 @@
 M 100644 :38018 src/round_raw_generic.c
 
 blob
──────────────────────────────────────────────────────────────────

In particular, one can see:

-data 55
-[tests/trandom_deviate.c] Correction (fprintf format).
-from :37946
-M 100644 :37947 tests/trandom_deviate.c

and

+data 55
+[tests/trandom_deviate.c] Correction (fprintf format).
+from :37946
+M 100644 :37947 tests/trandom_deviate.c

while these lines should have been regarded as unmodified.

This problem disappears if I shorten "file2" a bit, which is
surprising because these lines are near the very beginning of
"file2", far from the truncated end:

$ head -n 129410 file2 > file3
$ diff -u file1 file3 | grep '^\+'
+++ file3       2020-11-24 11:58:17.922462693 +0100

So now no added lines are reported, which is correct.
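
The single match in the output above is the "+++" header of the unified
diff, not an added line. A small sketch of a count that skips the two
header lines follows; "count_added" is a hypothetical helper name, not
something used in the original report.

```shell
# count_added OLD NEW
# Print the number of added lines in "diff -u OLD NEW", ignoring the
# two-line "---" / "+++" header.
count_added() {
    # diff exits with 1 when the files differ; only a status above 1
    # signals trouble, so status 1 is tolerated here.
    { diff -u "$1" "$2" || [ $? -eq 1 ]; } |
        awk 'NR > 2 && /^\+/ { c++ } END { print c + 0 }'
}
```

Usage: count_added file1 file3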

And here's what diff now gives around these lines:

──────────────────────────────────────────────────────────────────
$ diff -u file1 file3 | grep -C16 'mark :37950'
-commit refs/heads/master
-#legacy-id 9122
-mark :37948
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404215412 +0000
 data 55
 [tests/trandom_deviate.c] Correction (fprintf format).
 from :37946
 M 100644 :37947 tests/trandom_deviate.c
 
 blob
-mark :37949
-data 15
-Blob at :37949
-
-commit refs/heads/misc
-#legacy-id 9123
-mark :37950
-committer Vincent Lefèvre <vincent <at> vinc17.net> 1404216001 +0000
-data 23
-[www/pub.html] Update.
-from :37941
-M 100644 :37949 www/pub.html
-
-blob
 mark :37951
 data 15
 Blob at :37951
@@ -9910,21 +467,6 @@
 M 100644 :38018 src/round_raw_generic.c
 
 blob
-mark :38020
-data 15
──────────────────────────────────────────────────────────────────

This is now OK, but stranger things happen when I reduce "file2"
even more:

$ head -n 120200 file2 > file4
$ diff -u file1 file4 | grep -c '^\+'
7
$ diff -u file1 file4 | wc -l
31251

So, with "file2" reduced to 120200 lines, 7 − 1 = 6 added lines
are reported (the count of 7 includes the "+++" header line), though
this new file has only removed lines. This is incorrect, and removing
100 more lines at the end makes it much worse, with 81120 added lines
reported and a huge diff:

$ head -n 120100 file2 > file5
$ diff -u file1 file5 | grep -c '^\+'
81121
$ diff -u file1 file5 | wc -l
231111
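
The truncation experiments above can be repeated for any set of lengths
with a small loop. This is only a sketch: "sweep_truncations" is a
hypothetical name, and the output layout (length, total diff lines as
"wc -l" would count them, added lines excluding the "+++" header) is
chosen here for illustration.

```shell
# sweep_truncations OLD NEW N...
# For each N, diff OLD against the first N lines of NEW and print:
#   N  total-diff-lines  added-lines
sweep_truncations() {
    old=$1 new=$2
    shift 2
    tmp=$(mktemp) || return 1
    for n in "$@"; do
        head -n "$n" "$new" > "$tmp"
        # Exit status 1 from diff just means "files differ".
        { diff -u "$old" "$tmp" || [ $? -eq 1 ]; } |
            awk -v n="$n" '
                { total++ }                  # every line, header included
                NR > 2 && /^\+/ { added++ }  # skip the "---"/"+++" header
                END { printf "%s %d %d\n", n, total, added }
            '
    done
    rm -f "$tmp"
}
```

Usage: sweep_truncations file1 file2 129410 120200 120100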

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
[files.tar.xz (application/octet-stream, attachment)]

This bug report was last modified 4 years and 203 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.