From unknown Sun Jun 22 00:42:57 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#22103: [PATCH] grep: improve performance for grep -P in UTF-8
Resent-From: Norihiro Tanaka
Resent-CC: bug-grep@gnu.org
Resent-Date: Sun, 06 Dec 2015 23:02:01 +0000
X-GNU-PR-Message: report 22103
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: patch
To: 22103@debbugs.gnu.org
Date: Mon, 07 Dec 2015 08:01:23 +0900
From: Norihiro Tanaka
Message-Id: <20151207080123.8BBA.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------_5664B90F000000008BAB_MULTIPART_MIXED_"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]

--------_5664B90F000000008BAB_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

After grep -P finds its first match, it stops using the TEXTBIN_UNKNOWN
optimization (multi-line search over the whole buffer) and falls back to
line-by-line search. As a result, grep -P is very slow in UTF-8 locales
when it finds a match early in the input:

  $ time -p grep -P ^1$ <(seq 999999)
  1
  real 14.55
  user 13.77
  sys 1.12

Likewise, grep -Pa never uses the TEXTBIN_UNKNOWN optimization, so it is
also very slow in UTF-8.

  $ time -p grep -Pa a <(seq 999999)
  real 14.53
  user 13.65
  sys 1.35

This change defers leaving the TEXTBIN_UNKNOWN optimization until grep -P
finds a binary character, which yields a speedup of more than 10x:

  $ time -p src/grep -P ^1$ <(seq 999999)
  1
  real 0.97
  user 0.79
  sys 0.24
  $ time -p src/grep -Pa a <(seq 999999)
  real 0.98
  user 0.23
  sys 0.99

BTW, this change conflicts with the proposal in bug#22028.

--------_5664B90F000000008BAB_MULTIPART_MIXED_
Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-improve-performance-for-grep-P-in-UTF-8.patch"
Content-Disposition: attachment; filename="0001-grep-improve-performance-for-grep-P-in-UTF-8.patch"
Content-Transfer-Encoding: 7bit

From 2cf98594e1b7ce7490d0b6d7551f52d65ccd44a4 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka
Date: Thu, 26 Nov 2015 15:34:13 +0900
Subject: [PATCH] grep: improve performance for grep -P in UTF-8

grep -P uses line by line search after found first match or specified -a
option, but it is very slow.  This change also tries to use multi-line
search after them until found not text character.

* src/grep.c (grep): Do it.
* NEWS: Mention it.
---
 NEWS       |  6 ++++++
 src/grep.c | 28 ++++++++++++++--------------
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index ac632d7..a9a7042 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,12 @@ GNU grep NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Improvements
+
+  Performance has improved for grep -P in UTF-8.  Before, commands
+  like the following would speed up more than 10x:
+    grep -P ^1$ <(seq 999999)
+    grep -aP a <(seq 999999)
 
 * Noteworthy changes in release 2.22 (2015-11-01) [stable]
 
diff --git a/src/grep.c b/src/grep.c
index 2c5e09a..a1ee183 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1345,7 +1345,7 @@ grep (int fd, struct stat const *st)
       return 0;
     }
 
-  if (binary_files == TEXT_BINARY_FILES)
+  if (binary_files == TEXT_BINARY_FILES && execute != Pexecute)
     textbin = TEXTBIN_TEXT;
   else
     {
@@ -1415,13 +1415,8 @@ grep (int fd, struct stat const *st)
         }
 
       /* Detect whether leading context is adjacent to previous output.  */
-      if (lastout)
-        {
-          if (textbin == TEXTBIN_UNKNOWN)
-            textbin = TEXTBIN_TEXT;
-          if (beg != lastout)
-            lastout = 0;
-        }
+      if (beg != lastout)
+        lastout = NULL;
 
       /* Handle some details and read more data to scan.  */
       save = residue + lim - beg;
@@ -1442,12 +1437,17 @@ grep (int fd, struct stat const *st)
           enum textbin tb = buffer_textbin (bufbeg, buflim - bufbeg);
           if (textbin_is_binary (tb))
             {
-              if (binary_files == WITHOUT_MATCH_BINARY_FILES)
-                return 0;
-              textbin = tb;
-              done_on_match = out_quiet = true;
-              nul_zapper = eol;
-              skip_nuls = skip_empty_lines;
+              if (nlines || binary_files == TEXT_BINARY_FILES)
+                textbin = TEXTBIN_TEXT;
+              else
+                {
+                  if (binary_files == WITHOUT_MATCH_BINARY_FILES)
+                    return 0;
+                  textbin = tb;
+                  done_on_match = out_quiet = true;
+                  nul_zapper = eol;
+                  skip_nuls = skip_empty_lines;
+                }
             }
         }
     }
-- 
2.4.6

--------_5664B90F000000008BAB_MULTIPART_MIXED_--

From unknown Sun Jun 22 00:42:57 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka
Subject: bug#22103: closed (Re: bug#20526: grep BUG: text file is detected as binary)
References: <20160108224632.A9BA.27F6AC2D@kcn.ne.jp> <20151207080123.8BBA.27F6AC2D@kcn.ne.jp>
X-Gnu-PR-Message: they-closed 22103
X-Gnu-PR-Package: grep
X-Gnu-PR-Keywords: patch
Reply-To: 22103@debbugs.gnu.org
Date: Fri, 08 Jan 2016 13:47:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1452260822-24510-1"

This is a multi-part message in MIME format...

------------=_1452260822-24510-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"

Your bug report

#22103: [PATCH] grep: improve performance for grep -P in UTF-8

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 22103@debbugs.gnu.org.

-- 
22103: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22103
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1452260822-24510-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Date: Fri, 08 Jan 2016 22:46:33 +0900
From: Norihiro Tanaka
To: 22103-done@debbugs.gnu.org
Subject: Re: bug#20526: grep BUG: text file is detected as binary
In-Reply-To: <568D559A.6050000@cs.ucla.edu>
References: <568CD111.5010801@cs.ucla.edu> <568D559A.6050000@cs.ucla.edu>
Message-Id: <20160108224632.A9BA.27F6AC2D@kcn.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Mailer: Becky! ver. 2.65.07 [ja]

On Wed, 6 Jan 2016 09:57:46 -0800
Paul Eggert wrote:

> On 01/06/2016 12:32 AM, Paul Eggert wrote:
> > I installed the attached patch, which fixed this performance bug for me.
> Whoops! I forgot to 'git add src/search.h' before committing. We also
> need the attached followup patch, which I installed.

Great! Thanks, many issues including for output of invalid sequence
are fixed by your patches. bug#22103 is also fixed in them, so I am
closing it.

------------=_1452260822-24510-1--

From unknown Sun Jun 22 00:42:57 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#22103: bug#20526: grep BUG: text file is detected as binary
Resent-From: Jim Meyering
Resent-CC: bug-grep@gnu.org
Resent-Date: Fri, 08 Jan 2016 21:36:02 +0000
X-GNU-PR-Message: followup 22103
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: patch
To: 22103@debbugs.gnu.org, Norihiro Tanaka
Cc: 22103-done@debbugs.gnu.org
MIME-Version: 1.0
In-Reply-To: <20160108224632.A9BA.27F6AC2D@kcn.ne.jp>
References: <568CD111.5010801@cs.ucla.edu> <568D559A.6050000@cs.ucla.edu> <20160108224632.A9BA.27F6AC2D@kcn.ne.jp>
From: Jim Meyering
Date: Fri, 8 Jan 2016 13:35:12 -0800
Content-Type: text/plain; charset=UTF-8

On Fri, Jan 8, 2016 at 5:46 AM, Norihiro Tanaka wrote:
>
> On Wed, 6 Jan 2016 09:57:46 -0800
> Paul Eggert wrote:
>
>> On 01/06/2016 12:32 AM, Paul Eggert wrote:
>> > I installed the attached patch, which fixed this performance bug for me.
>> Whoops! I forgot to 'git add src/search.h' before committing. We also
>> need the attached followup patch, which I installed.
>
> Great! Thanks, many issues including for output of invalid sequence
> are fixed by your patches. bug#22103 is also fixed in them, so I am
> closing it.

Thank you for helping with bug triage.
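
The proposed patch in this thread turns on one rule stated in its commit message: grep -P searches a whole buffer at once only while the file's text/binary status is still unknown, and otherwise falls back to slower line-by-line search. The C sketch below is a hypothetical, simplified model of how the patch's three src/grep.c hunks change when that status gets settled. The enum constant names mirror identifiers in the patch, but struct buffer_info, pcre_searches_multiline and count_line_by_line are invented for illustration, are not grep code, and return buffer counts, not timings. Note also that the bug was closed because Paul Eggert's patches for bug#20526 fixed it, so this models the proposal, not what grep shipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the text/binary state in grep's
   main loop.  The enum names mirror identifiers in src/grep.c, but the
   types and functions below are illustrations, not grep's code.  */
enum textbin { TEXTBIN_UNKNOWN, TEXTBIN_TEXT, TEXTBIN_BINARY };
enum binary_files_mode
{
  BINARY_BINARY_FILES,        /* default */
  TEXT_BINARY_FILES,          /* -a */
  WITHOUT_MATCH_BINARY_FILES  /* -I */
};

/* One input buffer, reduced to the two facts this model needs.  */
struct buffer_info
{
  bool has_match;
  bool has_binary_char;
};

/* Assumption taken from the patch's commit message: the PCRE matcher
   searches a whole buffer at once only while the status is unknown,
   and falls back to line-by-line search otherwise.  */
static bool
pcre_searches_multiline (enum textbin tb)
{
  return tb == TEXTBIN_UNKNOWN;
}

/* Return how many of the N buffers a -P run would search line by line.
   PATCHED selects the rules after the proposed change.  */
static size_t
count_line_by_line (struct buffer_info const *bufs, size_t n,
                    enum binary_files_mode binary_files, bool patched)
{
  assert (bufs != NULL || n == 0);

  /* First hunk: with the patch, -a no longer settles the status up
     front when the matcher is PCRE.  */
  enum textbin tb = (binary_files == TEXT_BINARY_FILES && !patched
                     ? TEXTBIN_TEXT : TEXTBIN_UNKNOWN);
  size_t slow = 0;
  size_t nlines = 0;   /* matching lines output so far */

  for (size_t i = 0; i < n; i++)
    {
      /* Third hunk: a binary character found after output has started,
         or under -a, now makes the file "text" instead of "binary".  */
      if (tb == TEXTBIN_UNKNOWN && bufs[i].has_binary_char)
        {
          if (patched && (nlines || binary_files == TEXT_BINARY_FILES))
            tb = TEXTBIN_TEXT;
          else if (binary_files == WITHOUT_MATCH_BINARY_FILES)
            return slow;            /* -I: the file is skipped */
          else
            tb = TEXTBIN_BINARY;
        }

      if (!pcre_searches_multiline (tb))
        slow++;

      if (bufs[i].has_match)
        {
          nlines++;
          /* Second hunk: before the patch, the first output ended the
             unknown phase, so every later buffer took the slow path.  */
          if (!patched && tb == TEXTBIN_UNKNOWN)
            tb = TEXTBIN_TEXT;
        }
    }
  return slow;
}
```

With four buffers where only the first contains a match and none contains a binary character, count_line_by_line returns 3 under the old rules and 0 under the patched ones, which mirrors the report's ^1$ example: the match on the very first line used to push everything after it onto the slow path.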