From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Frank Heckenbach Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 20 Nov 2020 05:32:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 44754@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Reply-To: bug-grep@gnu.org, f.heckenbach@fh-soft.de Received: via spool by submit@debbugs.gnu.org id=B.160585032123222 (code B ref -1); Fri, 20 Nov 2020 05:32:01 +0000 Received: (at submit) by debbugs.gnu.org; 20 Nov 2020 05:32:01 +0000 Received: from localhost ([127.0.0.1]:41699 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kfz1c-00062T-Kn for submit@debbugs.gnu.org; Fri, 20 Nov 2020 00:32:00 -0500 Received: from lists.gnu.org ([209.51.188.17]:33488) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kfyrA-0005kP-Dk for submit@debbugs.gnu.org; Fri, 20 Nov 2020 00:21:12 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:38516) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kfyr9-0006mw-RP for bug-grep@gnu.org; Fri, 20 Nov 2020 00:21:12 -0500 Received: from mx3.gerwinski.de ([2a01:4f8:221:3802::170:59]:56591) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kfyr1-00044S-Uh for bug-grep@gnu.org; Fri, 20 Nov 2020 00:21:11 -0500 Received: from pd9f48616.dip0.t-ipconnect.de ([217.244.134.22] helo=mars) by m31.gerwinski.de with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kfyqG-0000M1-Db for bug-grep@gnu.org; Fri, 20 Nov 2020 06:20:18 +0100 Received: from frank by mars with local-rmail (Exim 4.92) (envelope-from ) id 1kfyqt-0007kr-PJ for bug-grep@gnu.org; Fri, 20 Nov 2020 06:20:55 +0100 From: Frank Heckenbach MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=ISO-8859-1 Message-Id: Date: Fri, 20 Nov 2020 06:20:55 +0100 Received-SPF: none client-ip=2a01:4f8:221:3802::170:59; envelope-from=f.heckenbach@fh-soft.de; helo=mx3.gerwinski.de X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-Mailman-Approved-At: Fri, 20 Nov 2020 00:32:00 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, I have a use case where I run grep with a large number of search patterns on a large text file. It works well with grep-3.3, but with grep-3.4 it quickly burned through GBs of memory and almost locked up my system due to swapping. To avoid attaching those large files, I could mostly reproduce the effects like this: ulimit -d 5000000 # avoid system lockup due to excessive swapping export LC_ALL=C # make sure no Unicode case conversions are needed % time ./grep-3.3 -Fwif <(seq 300000 | tr 0-9 A-J) << Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 26 Nov 2020 01:13:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 44754@debbugs.gnu.org, f.heckenbach@fh-soft.de X-Debbugs-Original-To: bug-grep@gnu.org, f.heckenbach@fh-soft.de X-Debbugs-Original-Cc: 44754@debbugs.gnu.org Received: via spool by submit@debbugs.gnu.org id=B.16063531449707 (code B ref -1); Thu, 26 Nov 2020 01:13:02 +0000 Received: (at submit) by debbugs.gnu.org; 26 Nov 2020 01:12:24 +0000 Received: from localhost ([127.0.0.1]:37568 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ki5pg-0002WP-5R for submit@debbugs.gnu.org; Wed, 25 Nov 2020 20:12:24 -0500 Received: from lists.gnu.org ([209.51.188.17]:45718) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ki5pe-0002WA-K5 for submit@debbugs.gnu.org; Wed, 25 Nov 2020 20:12:22 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:54354) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1ki5pe-0003Ge-DU for bug-grep@gnu.org; Wed, 25 Nov 2020 20:12:22 -0500 Received: from mail-wr1-f41.google.com ([209.85.221.41]:44930) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1ki5pc-0000to-Fu for bug-grep@gnu.org; Wed, 25 Nov 2020 20:12:22 -0500 Received: by mail-wr1-f41.google.com with SMTP id 64so293498wra.11 for ; Wed, 25 Nov 2020 17:12:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6jEwnfmUQ8Qk1x4HT0KFBjQDurlLP3IolY9HFSkp4OA=; b=B8ZRpAOKylEGR+LnLYxL+S/MHrt2I2yohVlB3skQQ5JebgE2CuMWlKrgjf5wLRRCeg dFOuGSW0NSf/SN9petaRgpud4Qmmbw/8g9fWEn8ArnTswRYeIr0ATEzgE/7He2uOqJKj 0cPPeEOky1w+NFR+LE1KEpYzSpp93p8o8zXXddy5rlJriCgC3lMwHMIXlOL7W7alecK2 RlThpM+7LeU8k5uf7IOaELBdESUugL/Z0XIu8WBoXlXTMD79YV3Ib/QO/QkR3cJmbYw2 HgWMO8vcrTx/2MrMf7fxGQCi9y8qCqmPtNXA9WfzBZPfDeQpe81omoPnju+PRK3rK3NM M0dQ== X-Gm-Message-State: AOAM533aWSu+j5JqtbbSmUlLFSmkcQFmqRK1vPFQmRZnSJMuOzIrbtrf W2aVSfXHArQtXBXT8AZ3qOdGLhUbM/dX2GLpclLbDAHUyAwG3g== X-Google-Smtp-Source: ABdhPJzm+BBcW9rr+TnfYTkZH4cyJRokABjBs4IAtn1MFDmUTiZjLbUtNiwU3zHL4d9cgsoyVxFCj+M8r5MhCyfjmic= X-Received: by 2002:adf:dd8f:: with SMTP id x15mr650672wrl.343.1606353135968; Wed, 25 Nov 2020 17:12:15 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Jim Meyering Date: Wed, 25 Nov 2020 15:12:04 -1000 Message-ID: Content-Type: multipart/mixed; boundary="0000000000000dc33405b4f83cf6" Received-SPF: pass client-ip=209.85.221.41; envelope-from=meyering@gmail.com; helo=mail-wr1-f41.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -0.8 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-) --0000000000000dc33405b4f83cf6 Content-Type: text/plain; charset="UTF-8" On Thu, Nov 19, 2020 at 7:32 PM Frank Heckenbach wrote: > I have a use case where I run grep with a large number of search > patterns on a large text file. It works well with grep-3.3, but with > grep-3.4 it quickly burned through GBs of memory and almost locked > up my system due to swapping. > > To avoid attaching those large files, I could mostly reproduce the > effects like this: > > ulimit -d 5000000 # avoid system lockup due to excessive swapping > export LC_ALL=C # make sure no Unicode case conversions are needed > > % time ./grep-3.3 -Fwif <(seq 300000 | tr 0-9 A-J) << > real 0m0.054s > user 0m0.048s > sys 0m0.012s > > % time ./grep-3.4 -Fwif <(seq 30000 | tr 0-9 A-J) << ./grep-3.4: Memory exhausted > Aborted > > real 0m1.291s > user 0m0.696s > sys 0m0.599s > > % time ./grep-3.6 -Fwif <(seq 300000 | tr 0-9 A-J) << > real 0m13.162s > user 0m12.955s > sys 0m0.211s > > grep-3.3 behaves well, even with much larger number of patterns. > Time seems to grow linearly, and memory usage is constant. > > grep-3.4 behaves the worst of these 3 versions. Even with just 30000 > patterns it exceeds the ulimit of 5 GB. > > grep-3.6 behaves a bit better than 3.4, but still bad. Time seems to > be quadratic in the number of patterns, and though memory usage in > this case seems to be almost constant, in my actual use case it also > runs out of memory where grep-3.3 works well with just a few 100 MB > used. > > Without "-i", grep-3.4 seems to run as fast as grep-3.3, but > grep-3.6 is almost as slow as with "-i". > > So there might actually be two different issues here, one that > affects 3.4 with "-i" and one that affects 3.6 with or without "-i". Thank you for the fine bug report. The grep-3.6 bug you've exposed is due to the fact that your input triggers excessive hash collisions when using the code modeled after gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase take O(N^2) time for N patterns. In the attached, I've switched grep to use the djb2 hash function, and that resolves the problem. I'll also add a NEWS entry and a test before pushing this. --0000000000000dc33405b4f83cf6 Content-Type: application/octet-stream; name="0001-grep-avoid-performance-regression-with-many-patterns.patch" Content-Disposition: attachment; filename="0001-grep-avoid-performance-regression-with-many-patterns.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_khy4xer10 RnJvbSBlNTdmNGFkMTllMzAwMWYxN2JmNzc5YmQ0NDE5YmFmOTQ0ZDYyZDJkIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAyNSBOb3YgMjAyMCAxNjo0OTo1MSAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGdyZXA6IGF2 b2lkIHBlcmZvcm1hbmNlIHJlZ3Jlc3Npb24gd2l0aCBtYW55IHBhdHRlcm5zCgoqIHNyYy9ncmVw LmMgKGhhc2hfcGF0dGVybik6IFN3aXRjaCBmcm9tIFBKVyB0byBESkIyLCB0byBhdm9pZCBhbgpP KE4pIHRvIE8oTl4yKSBwZXJmb3JtYW5jZSByZWdyZXNzaW9uIGR1ZSB0byBoYXNoIGNvbGxpc2lv bnMgd2l0aApwYXR0ZXJucyBmcm9tIGUuZy4sIHNlcSA1MDAwMDB8dHIgMC05IEEtSgpSZXBvcnRl ZCBieSBGcmFuayBIZWNrZW5iYWNoIGluIGh0dHBzOi8vYnVncy5nbnUub3JnLzQ0NzU0CgoqIGNv bmZpZ3VyZS5hYyAoR05VTElCX1RFU1RfV0FSTl9DRkxBR1MpOiBEaXNhYmxlCnRoZSBzYW1lIHRo cmVlIHdhcm5pbmcgb3B0aW9ucyB0aGF0IGNvcmV1dGlscyBkb2VzOgotV3N0cmljdC1wcm90b3R5 cGVzCi1Xc3VnZ2VzdC1hdHRyaWJ1dGU9Y29uc3QKLVdzdWdnZXN0LWF0dHJpYnV0ZT1wdXJlCi0t LQogY29uZmlndXJlLmFjIHwgMyArKysKIHNyYy9ncmVwLmMgICB8IDUgKysrLS0KIDIgZmlsZXMg Y2hhbmdlZCwgNiBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2Nv bmZpZ3VyZS5hYyBiL2NvbmZpZ3VyZS5hYwppbmRleCBiYjIwZTM5Li5lMjc5ODg2IDEwMDY0NAot LS0gYS9jb25maWd1cmUuYWMKKysrIGIvY29uZmlndXJlLmFjCkBAIC0xNTQsOSArMTU0LDEyIEBA IGlmIHRlc3QgIiRnbF9nY2Nfd2FybmluZ3MiID0geWVzOyB0aGVuCiAgICMgSXQncyBub3Qgd29y dGggYmVpbmcgdGhpcyBwaWNreSBhYm91dCB0ZXN0IHByb2dyYW1zLgogICBudz0iJG53IC1Xc3Vn Z2VzdC1hdHRyaWJ1dGU9Y29uc3QiCiAgIG53PSIkbncgLVdzdWdnZXN0LWF0dHJpYnV0ZT1wdXJl IgorICBudz0iJG53IC1Xc3VnZ2VzdC1hdHRyaWJ1dGU9Zm9ybWF0IgogICBudz0iJG53IC1XZm9y bWF0LXRydW5jYXRpb249MiIgICAgIyBGYWxzZSBhbGFybSBpbiBzdHJlcnJvcl9yLmMKKyAgbnc9 IiRudyAtV29sZC1zdHlsZS1kZWZpbml0aW9uIgogICBnbF9NQU5ZV0FSTl9DT01QTEVNRU5UKFtH TlVMSUJfVEVTVF9XQVJOX0NGTEFHU10sCiAgICAgICAgICAgICAgICAgICAgICAgICAgWyRHTlVM SUJfV0FSTl9DRkxBR1NdLCBbJG53XSkKKyAgZ2xfV0FSTl9BREQoWy1Xbm8tcmV0dXJuLXR5cGVd LCBbR05VTElCX1RFU1RfV0FSTl9DRkxBR1NdKQogICBBQ19TVUJTVChbR05VTElCX1RFU1RfV0FS Tl9DRkxBR1NdKQogZmkKCmRpZmYgLS1naXQgYS9zcmMvZ3JlcC5jIGIvc3JjL2dyZXAuYwppbmRl eCBjYzJiOTYyLi43NGJkZGI3IDEwMDY0NAotLS0gYS9zcmMvZ3JlcC5jCisrKyBiL3NyYy9ncmVw LmMKQEAgLTEyOCw4ICsxMjgsOSBAQCBoYXNoX3BhdHRlcm4gKHZvaWQgY29uc3QgKnBhdCwgc2l6 ZV90IG5fYnVja2V0cykKIHsKICAgc2l6ZV90IGggPSAwOwogICBpbnRwdHJfdCBwYXRfb2Zmc2V0 ID0gKGludHB0cl90KSBwYXQgLSAxOwotICBmb3IgKGNoYXIgY29uc3QgKnMgPSBwYXR0ZXJuX2Fy cmF5ICsgcGF0X29mZnNldDsgKnMgIT0gJ1xuJzsgcysrKQotICAgIGggPSAqcyArICgoaCA8PCA5 KSB8IChoID4+IChTSVpFX1dJRFRIIC0gOSkpKTsKKyAgdW5zaWduZWQgY2hhciBjb25zdCAqcyA9 ICh1bnNpZ25lZCBjaGFyIGNvbnN0ICopIHBhdHRlcm5fYXJyYXkgKyBwYXRfb2Zmc2V0OworICBm b3IgKCA7ICpzICE9ICdcbic7IHMrKykKKyAgICBoID0gaCAqIDMzIF4gKnM7CiAgIHJldHVybiBo ICUgbl9idWNrZXRzOwogfQogc3RhdGljIGJvb2wgX0dMX0FUVFJJQlVURV9QVVJFCi0tIAoyLjI5 LjIuMTU0Lmc3ZjdlYmUwNTRhCgo= --0000000000000dc33405b4f83cf6-- From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 26 Nov 2020 19:04:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 44754@debbugs.gnu.org, f.heckenbach@fh-soft.de X-Debbugs-Original-To: bug-grep@gnu.org, f.heckenbach@fh-soft.de X-Debbugs-Original-Cc: 44754@debbugs.gnu.org Received: via spool by submit@debbugs.gnu.org id=B.160641744112263 (code B ref -1); Thu, 26 Nov 2020 19:04:02 +0000 Received: (at submit) by debbugs.gnu.org; 26 Nov 2020 19:04:01 +0000 Received: from localhost ([127.0.0.1]:42898 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiMYi-0003Bi-Me for submit@debbugs.gnu.org; Thu, 26 Nov 2020 14:04:00 -0500 Received: from lists.gnu.org ([209.51.188.17]:52278) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiMYh-0003Bb-1b for submit@debbugs.gnu.org; Thu, 26 Nov 2020 14:03:59 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:59742) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kiMYg-0006j0-So for bug-grep@gnu.org; Thu, 26 Nov 2020 14:03:58 -0500 Received: from mail-wm1-f52.google.com ([209.85.128.52]:39562) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1kiMYf-0005Ub-6p for bug-grep@gnu.org; Thu, 26 Nov 2020 14:03:58 -0500 Received: by mail-wm1-f52.google.com with SMTP id s13so3390001wmh.4 for ; Thu, 26 Nov 2020 11:03:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=HlVTDDXLlc1fXblysH/k8HO2RUMCA6TSRZHah2Oov/E=; b=MAq1A6h4RYIHFddBXhrsWkVfrJThpVziUz3j5kVgTwOzwK5qIU5qdmXZmuGAzI1yGW 8GCKQQ2dhjUXVMtazJNSwCpvnn8QkC1Y0ygG6W4NBPvzaUq8OTKuCEDVTBFWgi4P/CCz ExMC4al3jcRy0Bc4B19DylotizCJWkIOTyKfbjnVMJK4JEQrlIGTmyOwoZtLKUkRTGHL qaJ0ccDxqxHelHjLUlm/CVFx2aQ0U8x3GJZFsuCp7fVoS4N3qESBh8///7PqL2KqlWJ6 GcbgRyRcU8EITYifLm+Ndrxzq5zX8JC0gecqqVKSLHKSu+GiUZRxrYrZUxbqywJpcqzV 0wJQ== X-Gm-Message-State: AOAM531bNPW385WNRglpZ6xkEdcphK8LrQZqb47KYfiT7oJ1BvT04d3C TI0fzy1d6UfA8EeSm4dOAl4/QN5ogYEv/jnoIzdOaTpmAebGEA== X-Google-Smtp-Source: ABdhPJxrSYqJayzjmNy6+dgkljqh8ePL8X4r26hKXloX7PTP2vu2wsRUk7igJvzie3oCPYIlxQDoo3Y3AJFUm9CNynE= X-Received: by 2002:a1c:a1c6:: with SMTP id k189mr4947299wme.137.1606417434547; Thu, 26 Nov 2020 11:03:54 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Jim Meyering Date: Thu, 26 Nov 2020 09:03:42 -1000 Message-ID: Content-Type: multipart/mixed; boundary="0000000000008c38e005b5073468" Received-SPF: pass client-ip=209.85.128.52; envelope-from=meyering@gmail.com; helo=mail-wm1-f52.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -0.8 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-) --0000000000008c38e005b5073468 Content-Type: text/plain; charset="UTF-8" On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > Thank you for the fine bug report. > The grep-3.6 bug you've exposed is due to the fact that your input > triggers excessive hash collisions when using the code modeled after > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > take O(N^2) time for N patterns. In the attached, I've switched grep > to use the djb2 hash function, and that resolves the problem. I'll > also add a NEWS entry and a test before pushing this. Timings suggest that grep-3.6's preprocessing came closer to O(N^3). Here's an example that would take 2-3 days with grep-3.6 and only seconds with this fix: : | grep -Ff <(seq 6400000 | tr 0-9 A-J) Here's a complete patch. I'll push it later today. --0000000000008c38e005b5073468 Content-Type: application/octet-stream; name="0001-grep-avoid-performance-regression-with-many-patterns.patch" Content-Disposition: attachment; filename="0001-grep-avoid-performance-regression-with-many-patterns.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_khz7fqmm0 RnJvbSAxOTJlNTk5MDNjN2QzMTNiYjQ3ZGUzZDVjMTViM2RjNjM0ZTk4YzVmIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAyNSBOb3YgMjAyMCAxNjo0OTo1MSAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIGdyZXA6IGF2 b2lkIHBlcmZvcm1hbmNlIHJlZ3Jlc3Npb24gd2l0aCBtYW55IHBhdHRlcm5zCgoqIHNyYy9ncmVw LmMgKGhhc2hfcGF0dGVybik6IFN3aXRjaCBmcm9tIFBKVyB0byBESkIyLCB0byBhdm9pZCBhbgpP KE4pIHRvIE8oTl4yKSBwZXJmb3JtYW5jZSByZWdyZXNzaW9uIGR1ZSB0byBoYXNoIGNvbGxpc2lv bnMgd2l0aApwYXR0ZXJucyBmcm9tIGUuZy4sIHNlcSA1MDAwMDB8dHIgMC05IEEtSgpSZXBvcnRl ZCBieSBGcmFuayBIZWNrZW5iYWNoIGluIGh0dHBzOi8vYnVncy5nbnUub3JnLzQ0NzU0CiogTkVX UyAoQnVnIGZpeGVzKTogTWVudGlvbiBpdC4KKiB0ZXN0cy9oYXNoLWNvbGxpc2lvbi1wZXJmOiBO ZXcgZmlsZS4KKiB0ZXN0cy9NYWtlZmlsZS5hbSAoVEVTVFMpOiBBZGQgaXQuCi0tLQogTkVXUyAg ICAgICAgICAgICAgICAgICAgICB8ICA3ICsrKysrKwogc3JjL2dyZXAuYyAgICAgICAgICAgICAg ICB8ICA1ICsrLS0KIHRlc3RzL01ha2VmaWxlLmFtICAgICAgICAgfCAgMSArCiB0ZXN0cy9oYXNo LWNvbGxpc2lvbi1wZXJmIHwgNTMgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrCiA0IGZpbGVzIGNoYW5nZWQsIDY0IGluc2VydGlvbnMoKyksIDIgZGVsZXRpb25zKC0pCiBj cmVhdGUgbW9kZSAxMDA3NTUgdGVzdHMvaGFzaC1jb2xsaXNpb24tcGVyZgoKZGlmZiAtLWdpdCBh L05FV1MgYi9ORVdTCmluZGV4IGE0NWNlZDEuLjUwODc1OWEgMTAwNjQ0Ci0tLSBhL05FV1MKKysr IGIvTkVXUwpAQCAtMiw2ICsyLDEzIEBAIEdOVSBncmVwIE5FV1MgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAtKi0gb3V0bGluZSAtKi0KCiAqIE5vdGV3b3J0aHkgY2hhbmdlcyBp biByZWxlYXNlID8uPyAoPz8/Py0/Py0/PykgWz9dCgorKiogQnVnIGZpeGVzCisKKyAgUHJlcHJv Y2Vzc2luZyBOIHBhdHRlcm5zIHdvdWxkIHRha2UgYXQgbGVhc3QgTyhOXjIpIHRpbWUgd2hlbiB0 b28gbWFueQorICBwYXR0ZXJucyBoYXNoZWQgdG8gdG9vIGZldyBidWNrZXRzLiBUaGlzIG5vdyB0 YWtlcyBzZWNvbmRzLCBub3QgZGF5czoKKyAgOiB8IGdyZXAgLUZmIDwoc2VxIDY0MDAwMDAgfCB0 ciAwLTkgQS1KKQorICBbQnVnIzQ0NzU0IGludHJvZHVjZWQgaW4gZ3JlcCAzLjVdCisKCiAqIE5v dGV3b3J0aHkgY2hhbmdlcyBpbiByZWxlYXNlIDMuNiAoMjAyMC0xMS0wOCkgW3N0YWJsZV0KCmRp ZmYgLS1naXQgYS9zcmMvZ3JlcC5jIGIvc3JjL2dyZXAuYwppbmRleCBlYjdjNjBlLi41YTU4YmY1 IDEwMDY0NAotLS0gYS9zcmMvZ3JlcC5jCisrKyBiL3NyYy9ncmVwLmMKQEAgLTEyOCw4ICsxMjgs OSBAQCBoYXNoX3BhdHRlcm4gKHZvaWQgY29uc3QgKnBhdCwgc2l6ZV90IG5fYnVja2V0cykKIHsK ICAgc2l6ZV90IGggPSAwOwogICBpbnRwdHJfdCBwYXRfb2Zmc2V0ID0gKGludHB0cl90KSBwYXQg LSAxOwotICBmb3IgKGNoYXIgY29uc3QgKnMgPSBwYXR0ZXJuX2FycmF5ICsgcGF0X29mZnNldDsg KnMgIT0gJ1xuJzsgcysrKQotICAgIGggPSAqcyArICgoaCA8PCA5KSB8IChoID4+IChTSVpFX1dJ RFRIIC0gOSkpKTsKKyAgdW5zaWduZWQgY2hhciBjb25zdCAqcyA9ICh1bnNpZ25lZCBjaGFyIGNv bnN0ICopIHBhdHRlcm5fYXJyYXkgKyBwYXRfb2Zmc2V0OworICBmb3IgKCA7ICpzICE9ICdcbic7 IHMrKykKKyAgICBoID0gaCAqIDMzIF4gKnM7CiAgIHJldHVybiBoICUgbl9idWNrZXRzOwogfQog c3RhdGljIGJvb2wgX0dMX0FUVFJJQlVURV9QVVJFCmRpZmYgLS1naXQgYS90ZXN0cy9NYWtlZmls ZS5hbSBiL3Rlc3RzL01ha2VmaWxlLmFtCmluZGV4IDQ4MGJmYjQuLmIxYjVkNjEgMTAwNjQ0Ci0t LSBhL3Rlc3RzL01ha2VmaWxlLmFtCisrKyBiL3Rlc3RzL01ha2VmaWxlLmFtCkBAIC0xMDksNiAr MTA5LDcgQEAgVEVTVFMgPQkJCQkJCVwKICAgZ3JlcC1kZXYtbnVsbAkJCQkJXAogICBncmVwLWRl di1udWxsLW91dAkJCQlcCiAgIGdyZXAtZGlyCQkJCQlcCisgIGhhc2gtY29sbGlzaW9uLXBlcmYJ CQkJXAogICBoZWxwLXZlcnNpb24JCQkJCVwKICAgaGlnaC1iaXQtcmFuZ2UJCQkJXAogICBpbi1l cS1vdXQtaW5mbG9vcAkJCQlcCmRpZmYgLS1naXQgYS90ZXN0cy9oYXNoLWNvbGxpc2lvbi1wZXJm IGIvdGVzdHMvaGFzaC1jb2xsaXNpb24tcGVyZgpuZXcgZmlsZSBtb2RlIDEwMDc1NQppbmRleCAw MDAwMDAwLi4zOGM4NzZkCi0tLSAvZGV2L251bGwKKysrIGIvdGVzdHMvaGFzaC1jb2xsaXNpb24t cGVyZgpAQCAtMCwwICsxLDUzIEBACisjIS9iaW4vc2gKKyMgVGVzdCBmb3IgdGhpcyBwZXJmb3Jt YW5jZSByZWdyZXNzaW9uOgorIyBncmVwLTMuNSBhbmQgMy42IHdvdWxkIHRha2UgTyhOXjIpIHRp bWUgZm9yIHNvbWUgc2V0cyBvZiBpbnB1dCByZWdleHBzLgorCisjIENvcHlyaWdodCAyMDIwIEZy ZWUgU29mdHdhcmUgRm91bmRhdGlvbiwgSW5jLgorCisjIFRoaXMgcHJvZ3JhbSBpcyBmcmVlIHNv ZnR3YXJlOiB5b3UgY2FuIHJlZGlzdHJpYnV0ZSBpdCBhbmQvb3IgbW9kaWZ5CisjIGl0IHVuZGVy IHRoZSB0ZXJtcyBvZiB0aGUgR05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2UgYXMgcHVibGlzaGVk IGJ5CisjIHRoZSBGcmVlIFNvZnR3YXJlIEZvdW5kYXRpb24sIGVpdGhlciB2ZXJzaW9uIDMgb2Yg dGhlIExpY2Vuc2UsIG9yCisjIChhdCB5b3VyIG9wdGlvbikgYW55IGxhdGVyIHZlcnNpb24uCisK KyMgVGhpcyBwcm9ncmFtIGlzIGRpc3RyaWJ1dGVkIGluIHRoZSBob3BlIHRoYXQgaXQgd2lsbCBi ZSB1c2VmdWwsCisjIGJ1dCBXSVRIT1VUIEFOWSBXQVJSQU5UWTsgd2l0aG91dCBldmVuIHRoZSBp bXBsaWVkIHdhcnJhbnR5IG9mCisjIE1FUkNIQU5UQUJJTElUWSBvciBGSVRORVNTIEZPUiBBIFBB UlRJQ1VMQVIgUFVSUE9TRS4gIFNlZSB0aGUKKyMgR05VIEdlbmVyYWwgUHVibGljIExpY2Vuc2Ug Zm9yIG1vcmUgZGV0YWlscy4KKworIyBZb3Ugc2hvdWxkIGhhdmUgcmVjZWl2ZWQgYSBjb3B5IG9m IHRoZSBHTlUgR2VuZXJhbCBQdWJsaWMgTGljZW5zZQorIyBhbG9uZyB3aXRoIHRoaXMgcHJvZ3Jh bS4gIElmIG5vdCwgc2VlIDxodHRwczovL3d3dy5nbnUub3JnL2xpY2Vuc2VzLz4uCisKKy4gIiR7 c3JjZGlyPS59L2luaXQuc2giOyBwYXRoX3ByZXBlbmRfIC4uL3NyYworCitmYWlsPTAKKworOiA+ IGVtcHR5IHx8IGZyYW1ld29ya19mYWlsdXJlXworCisjIENvbnN0cnVjdCBhIHRlc3QgY2FzZSB0 aGF0IGNvbnN1bWVzIGVub3VnaCBDUFUgdGltZSB0aGF0IHdlIGRvbid0CisjIGhhdmUgdG8gd29y cnkgYWJvdXQgbWVhc3VyZW1lbnQgbm9pc2UuIFRoaXMgZmlyc3QgY2FzZSBpcyBzZWFyY2hpbmcK KyMgZm9yIGRpZ2l0cywgd2hpY2ggbmV2ZXIgZXhoaWJpdGVkIGEgcHJvYmxlbSB3aXRoIGhhc2gg Y29sbGlzaW9ucy4KK25fcGF0PTQwMDAwCit3aGlsZSA6OyBkbworICBzZXEgJG5fcGF0ID4gaW4g fHwgZnJhbWV3b3JrX2ZhaWx1cmVfCisgIHNtYWxsX21zPSQoTENfQUxMPUMgdXNlcl90aW1lXyAx IGdyZXAgLS1maWxlPWluIGVtcHR5KSB8fCBmYWlsPTEKKyAgdGVzdCAkc21hbGxfbXMgLWdlIDIw MCAmJiBicmVhaworICBuX3BhdD0kKGV4cHIgJG5fcGF0ICcqJyAyKQorZG9uZQorCisjIE5vdywg c2VhcmNoIGZvciB0aG9zZSBzYW1lIGRpZ2l0cyBtYXBwZWQgdG8gQS1KLgorIyBXaXRoIHRoZSBQ SlctYmFzZWQgaGFzaCBmdW5jdGlvbiwgdGhpcyBiZWNhbWUgTyhOXjIpLgorc2VxICRuX3BhdCB8 IHRyIDAtOSBBLUogPiBpbiB8fCBmcmFtZXdvcmtfZmFpbHVyZV8KK2xhcmdlX21zPSQoTENfQUxM PUMgdXNlcl90aW1lXyAxIGdyZXAgLS1maWxlPWluIGVtcHR5KSB8fCBmYWlsPTEKKworIyBEZWxp YmVyYXRlbHkgcmVjb3JkaW5nIGluIGFuIHVudXNlZCB2YXJpYWJsZSBzbyBpdAorIyBzaG93cyB1 cCBpbiBzZXQgLXggb3V0cHV0LCBpbiBjYXNlIHRoaXMgdGVzdCBmYWlscy4KK3JhdGlvPSQoZXhw ciAiJGxhcmdlX21zIiAvICIkc21hbGxfbXMiKQord2Fybl8gcmF0aW89JHJhdGlvCisKKyMgVGhl IGR1cmF0aW9uIG9mIHRoZSBsYXR0ZXIgcnVuIG11c3QgYmUgbm8gbW9yZSB0aGFuIDEwIHRpbWVz CisjIHRoYXQgb2YgdGhlIGZvcm1lci4gIFVzaW5nIHJlY2VudCB2ZXJzaW9ucyBwcmlvciB0byB0 aGlzIGZpeCwKKyMgdGhpcyB0ZXN0IHdvdWxkIGZhaWwgZHVlIHRvIHJhdGlvcyA+IDgwMC4gIFVz aW5nIHRoZSBmaXhlZCB2ZXJzaW9uLAorIyBpdCdzIGNvbW1vbiB0byBzZWUgYSByYXRpbyBsZXNz IHRoYW4gMS4KK3JldHVybnNfIDEgZXhwciAkc21hbGxfbXMgJzwnICRsYXJnZV9tcyAvIDEwIHx8 IGZhaWw9MQorCitFeGl0ICRmYWlsCi0tIAoyLjI5LjIuMTU0Lmc3ZjdlYmUwNTRhCgo= --0000000000008c38e005b5073468-- From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Fri, 27 Nov 2020 07:42:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 44754@debbugs.gnu.org, f.heckenbach@fh-soft.de Cc: 44754-done@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org, f.heckenbach@fh-soft.de Received: via spool by submit@debbugs.gnu.org id=B.16064629053260 (code B ref -1); Fri, 27 Nov 2020 07:42:02 +0000 Received: (at submit) by debbugs.gnu.org; 27 Nov 2020 07:41:45 +0000 Received: from localhost ([127.0.0.1]:43699 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiYO0-0000qV-EY for submit@debbugs.gnu.org; Fri, 27 Nov 2020 02:41:44 -0500 Received: from lists.gnu.org ([209.51.188.17]:33432) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiYNt-0000q5-H4 for submit@debbugs.gnu.org; Fri, 27 Nov 2020 02:41:39 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34300) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kiYNt-0000uf-6f for bug-grep@gnu.org; Fri, 27 Nov 2020 02:41:37 -0500 Received: from mail-wr1-f42.google.com ([209.85.221.42]:41844) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1kiYNq-0007Y0-9t for bug-grep@gnu.org; Fri, 27 Nov 2020 02:41:36 -0500 Received: by mail-wr1-f42.google.com with SMTP id 23so4528177wrc.8 for ; Thu, 26 Nov 2020 23:41:33 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=JEsuy3A62IYuhg+ruGV+Kwouf2OWg8D3VL7ujn5fjA4=; b=q50FR3H5D7ID5V3UJwwSRXeLIbn/bhzGeYGLJH+bhwAg5V3F2AnuheLjfno4fkKPMv dnmK1ZRymLtgu1UlGKLOFAzHXRUZ1qwO+ZvFUK4Yh2kRkdg7aMyf0gHbCbeIlAJZxMNA BuWOboWg7QbPC8NZZ4LIwo3KXjMIkDSxOaIYuJludv0WPiRgqAsYgF5mVzcql83Nffgc QgWcAWYbZZzzTPlVHn0+X3MqhovKvDPUWrvR34hxWUS4QY00ts/u/hK+T3d7iAosf+Hg 11K9nGJSw5VjE5Kz4v+GU996RSHjtZ/CJiv+y2AjQAJ038INgOeOKViEIKG9o2yMn2ja uI2w== X-Gm-Message-State: AOAM5329IAhXd1rvLaH83ZZmuNir2vDA7zlEjJvc1rxRZMPYKUZiaEZ5 7M6IaUYwoyhfwYBz3/LtWZ4ZP6OuAH4p/1LVKACHCXPYr4k= X-Google-Smtp-Source: ABdhPJySj7xc1XPrgQ8YHTR1pX1f/A0sDbaKxFnQguuceGVb7pQh3lPt+4TjYzEKcahrHao74MVMd61TnN5GZ7fj2LU= X-Received: by 2002:a5d:548b:: with SMTP id h11mr8693554wrv.306.1606462892101; Thu, 26 Nov 2020 23:41:32 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Jim Meyering Date: Thu, 26 Nov 2020 21:41:20 -1000 Message-ID: Content-Type: text/plain; charset="UTF-8" Received-SPF: pass client-ip=209.85.221.42; envelope-from=meyering@gmail.com; helo=mail-wr1-f42.google.com X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -0.8 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.8 (-) On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering wrote: > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > > Thank you for the fine bug report. > > The grep-3.6 bug you've exposed is due to the fact that your input > > triggers excessive hash collisions when using the code modeled after > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > > take O(N^2) time for N patterns. In the attached, I've switched grep > > to use the djb2 hash function, and that resolves the problem. I'll > > also add a NEWS entry and a test before pushing this. > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3). > Here's an example that would take 2-3 days with grep-3.6 and only > seconds with this fix: > > : | grep -Ff <(seq 6400000 | tr 0-9 A-J) > > Here's a complete patch. > I'll push it later today. Pushed along with two gnulib-related changes. From unknown Sat Aug 16 22:48:05 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: bug-grep@gnu.org, f.heckenbach@fh-soft.de Subject: bug#44754: closed (Re: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6) Message-ID: References: X-Gnu-PR-Message: they-closed 44754 X-Gnu-PR-Package: grep Reply-To: 44754@debbugs.gnu.org Date: Fri, 27 Nov 2020 07:42:03 +0000 Content-Type: multipart/mixed; boundary="----------=_1606462923-3300-1" This is a multi-part message in MIME format... ------------=_1606462923-3300-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #44754: Extreme performance degradation in GNU grep 3.4 / 3.6 which was filed against the grep package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 44754@debbugs.gnu.org. --=20 44754: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D44754 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1606462923-3300-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 44754-done) by debbugs.gnu.org; 27 Nov 2020 07:41:40 +0000 Received: from localhost ([127.0.0.1]:43697 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiYNw-0000qI-8V for submit@debbugs.gnu.org; Fri, 27 Nov 2020 02:41:40 -0500 Received: from mail-wr1-f52.google.com ([209.85.221.52]:37909) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kiYNt-0000pz-Pp for 44754-done@debbugs.gnu.org; Fri, 27 Nov 2020 02:41:38 -0500 Received: by mail-wr1-f52.google.com with SMTP id p8so4525821wrx.5 for <44754-done@debbugs.gnu.org>; Thu, 26 Nov 2020 23:41:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=JEsuy3A62IYuhg+ruGV+Kwouf2OWg8D3VL7ujn5fjA4=; b=XdRg5V9ytRO8BhtW3tFQ+gAxEzXpdpQJEJwpk34dOQw/CBN617OrJRjlUmGZTmgVQ7 xidzU4EfKM4SX33bf8jeosNofjdi0/QAoD+QnAaKgFfPEgEjLmHrdLwS3hy/BbyhUatu fOamT759aSZBNsmGX/dy9rHBKQaVzkg8QHkARamE9XXdmMs4KRLK+922TFbA7GZcGVCx +PGe41T5Z1jKi2pjxla6r/dbdE1FWVdc074FHlmxDVfxSKqcOFMxVpfdSf8Wd0+97uMf T+HvbkAZRQFRmdXyuBPl2pxL8yusABEPIER89zKAbkWC52DRXYIhSsL51eXMEHThRORM iUNA== X-Gm-Message-State: AOAM532bpaFnyQVDjdSv1HiXx4rj7UYyYgeXA46XVlf+8X76ADSbl4pN o7DfyL1bEkirlPaER0B1uByGB2K0iCIrUhmC6mw= X-Google-Smtp-Source: ABdhPJySj7xc1XPrgQ8YHTR1pX1f/A0sDbaKxFnQguuceGVb7pQh3lPt+4TjYzEKcahrHao74MVMd61TnN5GZ7fj2LU= X-Received: by 2002:a5d:548b:: with SMTP id h11mr8693554wrv.306.1606462892101; Thu, 26 Nov 2020 23:41:32 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Jim Meyering Date: Thu, 26 Nov 2020 21:41:20 -1000 Message-ID: Subject: Re: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 To: bug-grep@gnu.org, f.heckenbach@fh-soft.de Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 44754-done Cc: 44754-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering wrote: > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > > Thank you for the fine bug report. > > The grep-3.6 bug you've exposed is due to the fact that your input > > triggers excessive hash collisions when using the code modeled after > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > > take O(N^2) time for N patterns. In the attached, I've switched grep > > to use the djb2 hash function, and that resolves the problem. I'll > > also add a NEWS entry and a test before pushing this. > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3). > Here's an example that would take 2-3 days with grep-3.6 and only > seconds with this fix: > > : | grep -Ff <(seq 6400000 | tr 0-9 A-J) > > Here's a complete patch. > I'll push it later today. Pushed along with two gnulib-related changes. ------------=_1606462923-3300-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 20 Nov 2020 05:32:01 +0000 Received: from localhost ([127.0.0.1]:41699 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kfz1c-00062T-Kn for submit@debbugs.gnu.org; Fri, 20 Nov 2020 00:32:00 -0500 Received: from lists.gnu.org ([209.51.188.17]:33488) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kfyrA-0005kP-Dk for submit@debbugs.gnu.org; Fri, 20 Nov 2020 00:21:12 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:38516) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kfyr9-0006mw-RP for bug-grep@gnu.org; Fri, 20 Nov 2020 00:21:12 -0500 Received: from mx3.gerwinski.de ([2a01:4f8:221:3802::170:59]:56591) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kfyr1-00044S-Uh for bug-grep@gnu.org; Fri, 20 Nov 2020 00:21:11 -0500 Received: from pd9f48616.dip0.t-ipconnect.de ([217.244.134.22] helo=mars) by m31.gerwinski.de with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kfyqG-0000M1-Db for bug-grep@gnu.org; Fri, 20 Nov 2020 06:20:18 +0100 Received: from frank by mars with local-rmail (Exim 4.92) (envelope-from ) id 1kfyqt-0007kr-PJ for bug-grep@gnu.org; Fri, 20 Nov 2020 06:20:55 +0100 To: bug-grep@gnu.org Subject: Extreme performance degradation in GNU grep 3.4 / 3.6 From: Frank Heckenbach MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=ISO-8859-1 Message-Id: Date: Fri, 20 Nov 2020 06:20:55 +0100 Received-SPF: none client-ip=2a01:4f8:221:3802::170:59; envelope-from=f.heckenbach@fh-soft.de; helo=mx3.gerwinski.de X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 20 Nov 2020 00:32:00 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: bug-grep@gnu.org, f.heckenbach@fh-soft.de Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, I have a use case where I run grep with a large number of search patterns on a large text file. It works well with grep-3.3, but with grep-3.4 it quickly burned through GBs of memory and almost locked up my system due to swapping. To avoid attaching those large files, I could mostly reproduce the effects like this: ulimit -d 5000000 # avoid system lockup due to excessive swapping export LC_ALL=C # make sure no Unicode case conversions are needed % time ./grep-3.3 -Fwif <(seq 300000 | tr 0-9 A-J) << Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 03 Dec 2020 08:27:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 44754@debbugs.gnu.org Cc: jim@meyering.net, f.heckenbach@fh-soft.de Received: via spool by 44754-submit@debbugs.gnu.org id=B44754.160698401922336 (code B ref 44754); Thu, 03 Dec 2020 08:27:02 +0000 Received: (at 44754) by debbugs.gnu.org; 3 Dec 2020 08:26:59 +0000 Received: from localhost ([127.0.0.1]:38186 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kkjx4-0005oC-Pm for submit@debbugs.gnu.org; Thu, 03 Dec 2020 03:26:59 -0500 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:48212) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kkjx2-0005nv-MC for 44754@debbugs.gnu.org; Thu, 03 Dec 2020 03:26:57 -0500 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 3BAD94A02B3 for <44754@debbugs.gnu.org>; Thu, 3 Dec 2020 17:26:49 +0900 (JST) X-matriXscan-loop-detect: bd78718d9d1724ff55c82a10c8a1c721536a7195 Received: from mail13.kcn.ne.jp ([61.86.6.131]) by mxs02-s with ESMTP; Thu, 03 Dec 2020 17:26:47 +0900 (JST) Received: from [10.120.1.62] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail13.kcn.ne.jp (Postfix) with ESMTPA id 0B5BF4122431; Thu, 3 Dec 2020 17:26:46 +0900 (JST) Date: Thu, 03 Dec 2020 17:26:46 +0900 From: Norihiro Tanaka In-Reply-To: References: Message-Id: <20201203172633.14B9.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5FC89EE20000000014B0_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5FC89EE20000000014B0_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Thu, 26 Nov 2020 21:41:20 -1000 Jim Meyering wrote: > On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering wrote: > > > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > > > Thank you for the fine bug report. > > > The grep-3.6 bug you've exposed is due to the fact that your input > > > triggers excessive hash collisions when using the code modeled after > > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > > > take O(N^2) time for N patterns. In the attached, I've switched grep > > > to use the djb2 hash function, and that resolves the problem. I'll > > > also add a NEWS entry and a test before pushing this. > > > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3). > > Here's an example that would take 2-3 days with grep-3.6 and only > > seconds with this fix: > > > > : | grep -Ff <(seq 6400000 | tr 0-9 A-J) > > > > Here's a complete patch. > > I'll push it later today. > > Pushed along with two gnulib-related changes. The fix has improved some performance. However, it's still quite slow compared to version 3.3, and that can be remedied. It converts to grep only if the potential match does not match the word frequently. --------_5FC89EE20000000014B0_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-improvement-of-grep-convertion-for-fgrep.patch" Content-Disposition: attachment; filename="0001-grep-improvement-of-grep-convertion-for-fgrep.patch" Content-Transfer-Encoding: base64 RnJvbSAxYmZjZGNhNjU4YmQ5MWRkNmI4ZTZlM2E5NmM5ZTc3Njc4YmI0ZDJlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDMgRGVjIDIwMjAgMTc6MjI6NTAgKzA5MDAKU3ViamVjdDogW1BBVENIXSBncmVw OiBpbXByb3ZlbWVudCBvZiBncmVwIGNvbnZlcnRpb24gZm9yIGZncmVwCgoqIHNyYy9rd3NlYXJj aC5jIChzdHJ1Y3Qga3dzZWFyY2gpOiBBZGQgbmV3IG1lbWJlcnMuCihGY29tcGlsZSk6IEluaXRp YWxpemUgdGhlbS4KKEZleGVjdXRlKTogQ29udmVydCB0byBncmVwIG9ubHkgaWYgdGhlIHBvdGVu dGlhbCBtYXRjaCBkb2VzIG5vdCBtYXRjaCB0aGUKd29yZCBmcmVxdWVudGx5LgotLS0KIHNyYy9r d3NlYXJjaC5jIHwgICAxNyArKysrKysrKysrKysrKystLQogMSBmaWxlcyBjaGFuZ2VkLCAxNSBp bnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9rd3NlYXJjaC5j IGIvc3JjL2t3c2VhcmNoLmMKaW5kZXggNjg1NTAyZC4uYTc1NDJmZCAxMDA2NDQKLS0tIGEvc3Jj L2t3c2VhcmNoLmMKKysrIGIvc3JjL2t3c2VhcmNoLmMKQEAgLTQxLDYgKzQxLDExIEBAIHN0cnVj dCBrd3NlYXJjaAogICAvKiBUaGUgdXNlcidzIHBhdHRlcm4gY29tcGlsZWQgYXMgYSByZWd1bGFy IGV4cHJlc3Npb24sCiAgICAgIG9yIG51bGwgaWYgaXQgaGFzIG5vdCBiZWVuIGNvbXBpbGVkLiAg Ki8KICAgdm9pZCAqcmU7CisKKyAgLyogVGhlIG51bWJlciBvZiBtYXRjaGVkIGFuZCBub3QgbWF0 Y2hlZCB3aXRoIHdvcmQgc2VwYXJhdG9ycy4gIFRoZXNlCisgICAgIGFyZSB1c2VkIHRvIGRlY2lk ZSB3aGV0aGVyIGNvbnZlcnQgcGF0dGVybnMgdG8gZ3JlcCBvciBub3QuICAqLworICBzaXplX3Qg d29yZF9zdWNjZXNzOworICBzaXplX3Qgd29yZF9mYWlsdXJlOwogfTsKIAogLyogQ29tcGlsZSB0 aGUgLUYgc3R5bGUgUEFUVEVSTiwgY29udGFpbmluZyBTSVpFIGJ5dGVzIHRoYXQgYXJlCkBAIC05 Nyw2ICsxMDIsOCBAQCBGY29tcGlsZSAoY2hhciAqcGF0dGVybiwgc2l6ZV90IHNpemUsIHJlZ19z eW50YXhfdCBpZ25vcmVkLCBib29sIGV4YWN0KQogICBrd3NlYXJjaC0+cGF0dGVybiA9IHBhdHRl cm47CiAgIGt3c2VhcmNoLT5zaXplID0gc2l6ZTsKICAga3dzZWFyY2gtPnJlID0gTlVMTDsKKyAg a3dzZWFyY2gtPndvcmRfc3VjY2VzcyA9IDA7CisgIGt3c2VhcmNoLT53b3JkX2ZhaWx1cmUgPSAw OwogICByZXR1cm4ga3dzZWFyY2g7CiB9CiAKQEAgLTE4MSw3ICsxODgsOSBAQCBGZXhlY3V0ZSAo dm9pZCAqdmNwLCBjaGFyIGNvbnN0ICpidWYsIHNpemVfdCBzaXplLCBzaXplX3QgKm1hdGNoX3Np emUsCiAgICAgICAgICAgICAgICAgZWxzZQogICAgICAgICAgICAgICAgICAgZ290byBzdWNjZXNz OwogICAgICAgICAgICAgICB9Ci0gICAgICAgICAgICBpZiAoIXN0YXJ0X3B0ciAmJiAhbG9jYWxl aW5mby5tdWx0aWJ5dGUpCisgICAgICAgICAgICBpZiAoIXN0YXJ0X3B0ciAmJiAhbG9jYWxlaW5m by5tdWx0aWJ5dGUKKyAgICAgICAgICAgICAgICAmJiAoa3dzZWFyY2gtPndvcmRfZmFpbHVyZSAm IH4weGZmZmYpCisgICAgICAgICAgICAgICAgJiYga3dzZWFyY2gtPndvcmRfc3VjY2VzcyA8IGt3 c2VhcmNoLT53b3JkX2ZhaWx1cmUpCiAgICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgICBp ZiAoISBrd3NlYXJjaC0+cmUpCiAgICAgICAgICAgICAgICAgICB7CkBAIC0yMDksNyArMjE4LDEx IEBAIEZleGVjdXRlICh2b2lkICp2Y3AsIGNoYXIgY29uc3QgKmJ1Ziwgc2l6ZV90IHNpemUsIHNp emVfdCAqbWF0Y2hfc2l6ZSwKIAogICAgICAgICAgICAgc3RydWN0IGt3c21hdGNoIHNob3J0ZXJf bWF0Y2g7CiAgICAgICAgICAgICBpZiAoa3dzZXhlYyAoa3dzZXQsIGJlZywgLS1sZW4sICZzaG9y dGVyX21hdGNoLCB0cnVlKSAhPSAwKQotICAgICAgICAgICAgICBicmVhazsKKyAgICAgICAgICAg ICAgeworICAgICAgICAgICAgICAgIGt3c2VhcmNoLT53b3JkX3N1Y2Nlc3MrKzsKKyAgICAgICAg ICAgICAgICBicmVhazsKKyAgICAgICAgICAgICAgfQorICAgICAgICAgICAga3dzZWFyY2gtPndv cmRfZmFpbHVyZSsrOwogICAgICAgICAgICAgbGVuID0gc2hvcnRlcl9tYXRjaC5zaXplOwogICAg ICAgICAgIH0KIAotLSAKMS43LjEKCg== --------_5FC89EE20000000014B0_MULTIPART_MIXED_-- From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sat, 05 Dec 2020 18:07:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Norihiro Tanaka Cc: 44754@debbugs.gnu.org, f.heckenbach@fh-soft.de Received: via spool by 44754-submit@debbugs.gnu.org id=B44754.1607191607621 (code B ref 44754); Sat, 05 Dec 2020 18:07:02 +0000 Received: (at 44754) by debbugs.gnu.org; 5 Dec 2020 18:06:47 +0000 Received: from localhost ([127.0.0.1]:48174 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1klbxH-00009w-Ar for submit@debbugs.gnu.org; Sat, 05 Dec 2020 13:06:47 -0500 Received: from mail-wm1-f48.google.com ([209.85.128.48]:53925) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1klbxF-00009j-D1 for 44754@debbugs.gnu.org; Sat, 05 Dec 2020 13:06:45 -0500 Received: by mail-wm1-f48.google.com with SMTP id k10so8041674wmi.3 for <44754@debbugs.gnu.org>; Sat, 05 Dec 2020 10:06:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BtaJeX4wTZLRUtcs9WyubDpS7dfICshkq8uYtDqgicg=; b=IUz0O7zct/rF+GEhRzSZ/u9M4vMxp0yPuQkcEjt6GOi/HRBByxxrnnZSkwYxk/qLF6 62B/u138YEeFy2vQ3YNq5bIV6npbkIOrp+cYxVtqKGqiXmoiLgSQ5zn1ubcgidLDyA06 eWnjByn9T6osmdtmdmjpvuzF+GPZZOYtsxKGKyCNeIo5kpwPJo8B+z7TwRCajKcaVeHe I4z7fsNbaGkj6qrpV0SHMHow1dVh3VutKzsKOLIt59uWpsYeRfhwIXi9q1AiDNO2+g3D QbSeFmNq375WwNembBcg5l+25wQPLng9Pw6DxJGN38WeBZK+TeHjcg6AWlgyMfjYnLQm T12A== X-Gm-Message-State: AOAM5315Gh2fWsuTAaeR41f7fLKArQIwnZuUXc86Xv9KWKK5T7DyLbMq nNwgQtvfy5OgU5QTgxcwK3YoNCag0lr9mckrKpw= X-Google-Smtp-Source: ABdhPJzMsL5MeG9acDtQ6PSELDFV5wWkn6Gz19hN9p253pKY/KiLIkPWd6CwxoZ3THRxAozkRoVQwmK9Hd8HFCy3tlQ= X-Received: by 2002:a1c:c902:: with SMTP id f2mr10445175wmb.130.1607191599655; Sat, 05 Dec 2020 10:06:39 -0800 (PST) MIME-Version: 1.0 References: <20201203172633.14B9.27F6AC2D@kcn.ne.jp> In-Reply-To: <20201203172633.14B9.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Sat, 5 Dec 2020 10:06:27 -0800 Message-ID: Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/) On Thu, Dec 3, 2020 at 12:26 AM Norihiro Tanaka wrote: > On Thu, 26 Nov 2020 21:41:20 -1000 > Jim Meyering wrote: > > > On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering wrote: > > > > > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > > > > Thank you for the fine bug report. > > > > The grep-3.6 bug you've exposed is due to the fact that your input > > > > triggers excessive hash collisions when using the code modeled after > > > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > > > > take O(N^2) time for N patterns. In the attached, I've switched grep > > > > to use the djb2 hash function, and that resolves the problem. I'll > > > > also add a NEWS entry and a test before pushing this. > > > > > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3). > > > Here's an example that would take 2-3 days with grep-3.6 and only > > > seconds with this fix: > > > > > > : | grep -Ff <(seq 6400000 | tr 0-9 A-J) > > > > > > Here's a complete patch. > > > I'll push it later today. > > > > Pushed along with two gnulib-related changes. > > The fix has improved some performance. However, it's still quite slow > compared to version 3.3, and that can be remedied. > > It converts to grep only if the potential match does not match the word > frequently. Thank you for that patch. Can you say a little more about the domain of the problem? I.e., is it specific to invocations with "-w"? Can you provide an example that exhibits the performance improvement, with timings? From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Frank Heckenbach Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 06 Dec 2020 06:07:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: noritnk@kcn.ne.jp, jim@meyering.net Cc: 44754@debbugs.gnu.org Received: via spool by 44754-submit@debbugs.gnu.org id=B44754.160723479810569 (code B ref 44754); Sun, 06 Dec 2020 06:07:02 +0000 Received: (at 44754) by debbugs.gnu.org; 6 Dec 2020 06:06:38 +0000 Received: from localhost ([127.0.0.1]:48745 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1klnBt-0002kP-GM for submit@debbugs.gnu.org; Sun, 06 Dec 2020 01:06:38 -0500 Received: from mx3.gerwinski.de ([88.198.170.59]:39001) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1klmgv-0001xT-Tv for 44754@debbugs.gnu.org; Sun, 06 Dec 2020 00:34:38 -0500 Received: from pd9f48616.dip0.t-ipconnect.de ([217.244.134.22] helo=mars) by m31.gerwinski.de with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1klmgc-0006BC-9f; Sun, 06 Dec 2020 06:34:29 +0100 Received: from frank by mars with local-rmail (Exim 4.92) (envelope-from ) id 1klmgV-0006a4-K0; Sun, 06 Dec 2020 06:34:11 +0100 References: <20201203172633.14B9.27F6AC2D@kcn.ne.jp> In-Reply-To: From: Frank Heckenbach MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=ISO-8859-1 Message-Id: Date: Sun, 06 Dec 2020 06:34:11 +0100 X-Spam-Score: 0.0 (/) X-Mailman-Approved-At: Sun, 06 Dec 2020 01:06:36 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Jim Meyering wrote: > On Thu, Dec 3, 2020 at 12:26 AM Norihiro Tanaka wrote: > > On Thu, 26 Nov 2020 21:41:20 -1000 > > Jim Meyering wrote: > > > > > On Thu, Nov 26, 2020 at 9:03 AM Jim Meyering wrote: > > > > > > > > On Wed, Nov 25, 2020 at 3:12 PM Jim Meyering wrote: > > > > > Thank you for the fine bug report. > > > > > The grep-3.6 bug you've exposed is due to the fact that your input > > > > > triggers excessive hash collisions when using the code modeled after > > > > > gnulib/lib/hash-pjw.c. That made the new pattern-preprocessing phase > > > > > take O(N^2) time for N patterns. In the attached, I've switched grep > > > > > to use the djb2 hash function, and that resolves the problem. I'll > > > > > also add a NEWS entry and a test before pushing this. > > > > > > > > Timings suggest that grep-3.6's preprocessing came closer to O(N^3). > > > > Here's an example that would take 2-3 days with grep-3.6 and only > > > > seconds with this fix: > > > > > > > > : | grep -Ff <(seq 6400000 | tr 0-9 A-J) > > > > > > > > Here's a complete patch. > > > > I'll push it later today. > > > > > > Pushed along with two gnulib-related changes. > > > > The fix has improved some performance. However, it's still quite slow > > compared to version 3.3, and that can be remedied. > > > > It converts to grep only if the potential match does not match the word > > frequently. > > Thank you for that patch. Can you say a little more about the domain > of the problem? > I.e., is it specific to invocations with "-w"? > Can you provide an example that exhibits the performance improvement, > with timings? I can confirm that it seems to solve my problem (my actual use case): grep-3.3: real 0m0.861s user 0m0.784s sys 0m0.076s grep-3.6 with this patch: real 0m1.052s user 0m0.943s sys 0m0.108s grep-3.6 with both your patch and this one: real 0m1.097s user 0m1.037s sys 0m0.060s Apparently there were at least two issues here. While your patch fixes the simplified case I reported and had no effect on my actual use case, Norihiro Tanaka's patch fixes the latter (and has no big effect on the former). So it looks like both patches are indeed needed. Thanks, Frank From unknown Sat Aug 16 22:48:05 2025 X-Loop: help-debbugs@gnu.org Subject: bug#44754: Extreme performance degradation in GNU grep 3.4 / 3.6 Resent-From: Norihiro Tanaka Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 07 Dec 2020 07:39:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 44754 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Jim Meyering Cc: 44754@debbugs.gnu.org, f.heckenbach@fh-soft.de Received: via spool by 44754-submit@debbugs.gnu.org id=B44754.16073266857343 (code B ref 44754); Mon, 07 Dec 2020 07:39:01 +0000 Received: (at 44754) by debbugs.gnu.org; 7 Dec 2020 07:38:05 +0000 Received: from localhost ([127.0.0.1]:52209 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kmB5x-0001uM-4e for submit@debbugs.gnu.org; Mon, 07 Dec 2020 02:38:05 -0500 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:35672) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kmB5u-0001tp-D1 for 44754@debbugs.gnu.org; Mon, 07 Dec 2020 02:38:03 -0500 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 0C6564A0269 for <44754@debbugs.gnu.org>; Mon, 7 Dec 2020 16:37:55 +0900 (JST) X-matriXscan-loop-detect: bd78718d9d1724ff55c82a10c8a1c721536a7195 Received: from mail13.kcn.ne.jp ([61.86.6.131]) by mxs01-s with ESMTP; Mon, 07 Dec 2020 16:37:52 +0900 (JST) Received: from [10.120.1.62] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail13.kcn.ne.jp (Postfix) with ESMTPA id AA05A415C4B6; Mon, 7 Dec 2020 16:37:52 +0900 (JST) Date: Mon, 07 Dec 2020 16:37:51 +0900 From: Norihiro Tanaka In-Reply-To: References: <20201203172633.14B9.27F6AC2D@kcn.ne.jp> Message-Id: <20201207163750.C9C2.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Sat, 5 Dec 2020 10:06:27 -0800 Jim Meyering wrote: > Thank you for that patch. Can you say a little more about the domain > of the problem? > I.e., is it specific to invocations with "-w"? > Can you provide an example that exhibits the performance improvement, > with timings? The test case shown by Frank speeds-up about 10x. That is a little slower than grep 3.3. Before: $ env LC_ALL=C time -p src/grep -Fwif <(seq 300000 | tr 0-9 A-J) <<