GNU bug report logs - #43862
[PATCH] grep: set RE_NO_SUB for calling regex only to check syntax

Previous Next

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Thu, 8 Oct 2020 09:41:01 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#43862: closed (Re: bug#43862: [PATCH] grep: set RE_NO_SUB for
 calling regex only to check syntax)
Date: Sun, 01 Nov 2020 19:41:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 43862 <at> debbugs.gnu.org.

-- 
43862: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43862
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 43862-done <at> debbugs.gnu.org
Subject: Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to
 check syntax
Date: Sun, 1 Nov 2020 11:39:55 -0800
On Mon, Oct 12, 2020 at 4:08 PM Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> >
> > We can set RE_NO_SUB for calling regex only to check syntax.  It brings
> > performance gains in cases to have a lot of enormous epsilon nodes.
> >
> >
> > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
> >
> > (before)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 6.15
> > user 4.62
> > sys 1.52
> >
> > (after)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 0.66
> > user 0.19
> > sys 0.46
>
> Thank you.
>
> FYI, when running similar commands with and without your patch (with
> an eye to adding a test), I ran this one (with your patch). It shows
> that using 80,000 terms caused grep to consume 32GB of memory before
> being OOM-killed:
>
> $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
> Command terminated by signal 9
> 6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata
> 32024460maxresident)k
> 6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
> [Exit 137 (KILL)]
>
> I will come back to this later this week.

We must accept the fact that extreme regular expressions will cause
resource exhaustion like that when processed by classical regex_*
functions. This is yet another good reason to prefer PCRE and to use
grep's -P option. In that case, it fails like this:

$ printf '(%080000d)\n' | sed 's/0/|/g' |grep -Pf- /dev/null
grep: regular expression is too large

I have just pushed your patch, but without adding a test.

[Message part 3 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: <bug-grep <at> gnu.org>
Subject: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Date: Thu, 08 Oct 2020 18:40:36 +0900
[Message part 4 (text/plain, inline)]
We can set RE_NO_SUB for calling regex only to check syntax.  It brings
performance gains in cases to have a lot of enormous epsilon nodes.


$ printf '(%020000d)\n' | sed 's/0/|/g' >pat

(before)
$ time -p env LC_ALL=C src/grep -Ef pat /dev/null
real 6.15
user 4.62
sys 1.52

(after)
$ time -p env LC_ALL=C src/grep -Ef pat /dev/null
real 0.66
user 0.19
sys 0.46
[0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch (text/plain, attachment)]

This bug report was last modified 4 years and 205 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.