From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 17 09:03:34 2016 Received: (at submit) by debbugs.gnu.org; 17 Oct 2016 13:03:34 +0000 Received: from localhost ([127.0.0.1]:36181 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bw7Zj-0000UD-5N for submit@debbugs.gnu.org; Mon, 17 Oct 2016 09:03:34 -0400 Received: from eggs.gnu.org ([208.118.235.92]:42246) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bw7Ze-0000Tu-9h for submit@debbugs.gnu.org; Mon, 17 Oct 2016 09:03:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bw7ZU-0003sr-92 for submit@debbugs.gnu.org; Mon, 17 Oct 2016 09:03:21 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:47830) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bw7ZU-0003sa-5b for submit@debbugs.gnu.org; Mon, 17 Oct 2016 09:03:16 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:33160) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bw7ZQ-0002xj-1K for bug-diffutils@gnu.org; Mon, 17 Oct 2016 09:03:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bw7ZM-0003p9-8p for bug-diffutils@gnu.org; Mon, 17 Oct 2016 09:03:12 -0400 Received: from mx2.suse.de ([195.135.220.15]:60198) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1bw7ZM-0003oG-2b for bug-diffutils@gnu.org; Mon, 17 Oct 2016 09:03:08 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 03495AAEF for ; Mon, 17 Oct 2016 13:03:05 +0000 (UTC) From: Andreas Schwab To: bug-diffutils@gnu.org Subject: Massive performance regression makes diff unusable X-Yow: Did you move a lot of KOREAN STEAK KNIVES this trip, Dingy? Date: Mon, 17 Oct 2016 15:03:05 +0200 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) $ wc tmp.* 6712297 27215448 135217073 tmp.1 6712247 27215244 135216202 tmp.2 13424544 54430692 270433275 total $ diff --version | head -1 diff (GNU diffutils) 3.3 $ time diff -u tmp.* > xx 17.023user 8.246system 0m25.335selapsed 99.74%CPU $ diff --version | head -1 diff (GNU diffutils) 3.5 $ time diff -u tmp.* > yy 5020.640user 3.176system 83m50.461selapsed 99.86%CPU Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 20 18:25:31 2016 Received: (at 24715) by debbugs.gnu.org; 20 Oct 2016 22:25:31 +0000 Received: from localhost ([127.0.0.1]:42550 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bxLmF-0005KS-6Q for submit@debbugs.gnu.org; Thu, 20 Oct 2016 18:25:31 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:51242) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bxLmE-0005KF-5n for 24715@debbugs.gnu.org; Thu, 20 Oct 2016 18:25:30 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 68E501612B3; Thu, 20 Oct 2016 15:25:24 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id oRlFNYhrxXoA; Thu, 20 Oct 2016 15:25:23 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id BB3551612B0; Thu, 20 Oct 2016 15:25:23 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jZsrQPnBstLb; Thu, 20 Oct 2016 15:25:23 -0700 (PDT) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id A428B161298; Thu, 20 Oct 2016 15:25:23 -0700 (PDT) Subject: Re: [bug-diffutils] bug#24715: Massive performance regression makes diff unusable To: Andreas Schwab , 24715@debbugs.gnu.org References: From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: Date: Thu, 20 Oct 2016 15:25:23 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.3 (/) X-Debbugs-Envelope-To: 24715 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) Thanks, can you make the data files available so we can reproduce the problem? From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 24 04:50:48 2016 Received: (at 24715) by debbugs.gnu.org; 24 Oct 2016 08:50:48 +0000 Received: from localhost ([127.0.0.1]:51968 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1byay0-0006ag-IB for submit@debbugs.gnu.org; Mon, 24 Oct 2016 04:50:48 -0400 Received: from mx2.suse.de ([195.135.220.15]:52040) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1byaxz-0006aY-3y for 24715@debbugs.gnu.org; Mon, 24 Oct 2016 04:50:47 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id E23ACAB9D; Mon, 24 Oct 2016 08:50:45 +0000 (UTC) From: Andreas Schwab To: Paul Eggert Subject: Re: [bug-diffutils] bug#24715: Massive performance regression makes diff unusable References: X-Yow: Did I SELL OUT yet?? Date: Mon, 24 Oct 2016 10:50:45 +0200 In-Reply-To: (Paul Eggert's message of "Thu, 20 Oct 2016 15:25:23 -0700") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.6 (--) X-Debbugs-Envelope-To: 24715 Cc: 24715@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.6 (--) https://bugzilla.opensuse.org/attachment.cgi?id=698854 Andreas. -- Andreas Schwab, SUSE Labs, schwab@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 26 02:13:31 2016 Received: (at 24715-done) by debbugs.gnu.org; 26 Oct 2016 06:13:31 +0000 Received: from localhost ([127.0.0.1]:56946 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bzHSt-00068a-8j for submit@debbugs.gnu.org; Wed, 26 Oct 2016 02:13:31 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:47638) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bzHSr-00068M-Jr for 24715-done@debbugs.gnu.org; Wed, 26 Oct 2016 02:13:30 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C3B8D161297; Tue, 25 Oct 2016 23:13:23 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id rCj8k_O7DFIz; Tue, 25 Oct 2016 23:13:22 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 213FD16129D; Tue, 25 Oct 2016 23:13:22 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 1PKjSOoDPPmQ; Tue, 25 Oct 2016 23:13:22 -0700 (PDT) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id EE01B161297; Tue, 25 Oct 2016 23:13:21 -0700 (PDT) Subject: Re: [bug-diffutils] bug#24715: Massive performance regression makes diff unusable To: Andreas Schwab References: From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <4b24a328-3563-5441-939d-f6e87552cc2d@cs.ucla.edu> Date: Tue, 25 Oct 2016 23:13:21 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------F6A67579D3B67A8A82A6101C" X-Spam-Score: -1.4 (-) X-Debbugs-Envelope-To: 24715-done Cc: 24715-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.4 (-) This is a multi-part message in MIME format. --------------F6A67579D3B67A8A82A6101C Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Thanks for the test data. I installed a Gnulib patch, noted here: http://lists.gnu.org/archive/html/bug-gnulib/2016-10/msg00157.html along with the attached diffutils patches. This fixed the problem for me, so I am closing the bug report. --------------F6A67579D3B67A8A82A6101C Content-Type: text/x-diff; name="0001-build-update-gnulib-submodule-to-latest.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-build-update-gnulib-submodule-to-latest.patch" >From 571f01c069dfc7f860e5500a5d08ebfdaf9c3068 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 25 Oct 2016 21:52:31 -0700 Subject: [PATCH 1/2] build: update gnulib submodule to latest --- gnulib | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gnulib b/gnulib index e63f5eb..d55ed06 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit e63f5eb55570a1ea3e51ce47df33689785e085c1 +Subproject commit d55ed0636eefeb43ad1aa8123416c33839652646 -- 2.7.4 --------------F6A67579D3B67A8A82A6101C Content-Type: text/x-diff; name="0002-diff-fix-big-performance-degradation-in-3.4.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0002-diff-fix-big-performance-degradation-in-3.4.patch" >From 68b82f6f8419a815cfcf962b3061352d414dc606 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 25 Oct 2016 21:57:56 -0700 Subject: [PATCH 2/2] diff: fix big performance degradation in 3.4 * NEWS, doc/diffutils.texi (Overview): Document this. * src/analyze.c (diff_2_files): Restore too_expensive heuristic, but this time with a floor that is 16 times the old floor. This should fix Bug#16848, by generating good-quality output for its test case, while not introducing Bug#24715, by running nearly as fast as diff-3.3 for that test case. --- NEWS | 5 +++++ doc/diffutils.texi | 5 ++++- src/analyze.c | 11 ++++++++++- 3 files changed, 19 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index 473332e..88f5d2b 100644 --- a/NEWS +++ b/NEWS @@ -6,6 +6,11 @@ GNU diffutils NEWS -*- outline -*- diff no longer mishandles line numbers exceeding 2**31 on Mingw-w64. +** Performance changes + + diff's default algorithm has been tweaked to deal better with larger + files, reversing some of the changes made in diffutils-3.4. + * Noteworthy changes in release 3.5 (2016-08-20) [stable] diff --git a/doc/diffutils.texi b/doc/diffutils.texi index b478380..ceaf557 100644 --- a/doc/diffutils.texi +++ b/doc/diffutils.texi @@ -161,7 +161,10 @@ The algorithm was independently discovered as described by Esko Ukkonen in @c Date: Wed, 29 Sep 1993 08:27:55 MST @c Ukkonen should be given credit for also discovering the algorithm used @c in GNU diff. -Related algorithms are surveyed by Alfred V. Aho in +Unless the @option{--minimal} option is used, @command{diff} uses a +heuristic by Paul Eggert that limits the cost to @math{O(N^1.5 log N)} +at the price of producing suboptimal output for large inputs with many +differences. Related algorithms are surveyed by Alfred V. Aho in section 6.3 of ``Algorithms for Finding Patterns in Strings'', @cite{Handbook of Theoretical Computer Science} (Jan Van Leeuwen, ed.), Vol.@: A, @cite{Algorithms and Complexity}, Elsevier/MIT Press, diff --git a/src/analyze.c b/src/analyze.c index 3981872..847b8b0 100644 --- a/src/analyze.c +++ b/src/analyze.c @@ -534,6 +534,7 @@ diff_2_files (struct comparison *cmp) { struct context ctxt; lin diags; + lin too_expensive; /* Allocate vectors for the results of comparison: a flag for each line of each file, saying whether that line @@ -565,11 +566,19 @@ diff_2_files (struct comparison *cmp) ctxt.heuristic = speed_large_files; + /* Set TOO_EXPENSIVE to be the approximate square root of the + input size, bounded below by 4096. 4096 seems to be good for + circa-2016 CPUs; see Bug#16848 and Bug#24715. */ + too_expensive = 1; + for (; diags != 0; diags >>= 2) + too_expensive <<= 1; + ctxt.too_expensive = MAX (4096, too_expensive); + files[0] = cmp->file[0]; files[1] = cmp->file[1]; compareseq (0, cmp->file[0].nondiscarded_lines, - 0, cmp->file[1].nondiscarded_lines, &ctxt); + 0, cmp->file[1].nondiscarded_lines, minimal, &ctxt); free (ctxt.fdiag - (cmp->file[1].nondiscarded_lines + 1)); -- 2.7.4 --------------F6A67579D3B67A8A82A6101C-- From unknown Sun Aug 10 12:26:42 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 23 Nov 2016 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator