GNU bug report logs -
#43862
[PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Thu, 8 Oct 2020 09:41:01 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Your message dated Sun, 1 Nov 2020 11:39:55 -0800
with message-id <CA+8g5KGYui0wYB3outbOJtZ4h1wQs-st=OrTdhPvaWgjFLL93w <at> mail.gmail.com>
and subject line Re: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
has caused the debbugs.gnu.org bug report #43862,
regarding [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
43862: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=43862
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
We can set RE_NO_SUB when regex is called only to check syntax. This brings
performance gains for patterns that produce an enormous number of epsilon nodes.
$ printf '(%020000d)\n' | sed 's/0/|/g' >pat
(before)
$ time -p env LC_ALL=C src/grep -Ef pat /dev/null
real 6.15
user 4.62
sys 1.52
(after)
$ time -p env LC_ALL=C src/grep -Ef pat /dev/null
real 0.66
user 0.19
sys 0.46
[0001-grep-set-RE_NO_SUB-for-calling-regex-only-to-check-s.patch (text/plain, attachment)]
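The printf/sed idiom in the report can be hard to read at first glance. The following is a scaled-down sketch of how the pattern file is built; `make_pat` is a hypothetical helper name used only for illustration and is not part of the patch.

```shell
# make_pat N: print "(" followed by N "|" characters and ")".
# printf treats the missing %d argument as 0 and zero-pads it to width N;
# sed then turns every zero into an alternation bar.
make_pat() {
  printf "(%0${1}d)\n" | sed 's/0/|/g'
}

make_pat 5                 # prints (|||||)
make_pat 20000 | wc -c     # byte count of the pattern used in the report
```

Every branch between the bars is empty, so the 20000-bar pattern is a single group with 20001 empty alternatives, which is the kind of input the report describes as producing many epsilon nodes.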
On Mon, Oct 12, 2020 at 4:08 PM Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> >
> > We can set RE_NO_SUB when regex is called only to check syntax. This brings
> > performance gains for patterns that produce an enormous number of epsilon nodes.
> >
> >
> > $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
> >
> > (before)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 6.15
> > user 4.62
> > sys 1.52
> >
> > (after)
> > $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> > real 0.66
> > user 0.19
> > sys 0.46
>
> Thank you.
>
> FYI, when running similar commands with and without your patch (with
> an eye to adding a test), I ran this one (with your patch). It shows
> that using 80,000 terms caused grep to consume 32GB of memory before
> being OOM-killed:
>
> $ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
> Command terminated by signal 9
> 6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k
> 6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
> [Exit 137 (KILL)]
>
> I will come back to this later this week.
We must accept the fact that extreme regular expressions will cause
resource exhaustion like that when processed by classical regex_*
functions. This is yet another good reason to prefer PCRE and to use
grep's -P option. In that case, it fails like this:
$ printf '(%080000d)\n' | sed 's/0/|/g' | grep -Pf- /dev/null
grep: regular expression is too large
I have just pushed your patch, but without adding a test.
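Anyone repeating the 80,000-term experiment quoted above may prefer not to let grep grow until the OOM killer steps in. One possible approach, sketched here under assumptions, is to cap the address space in a subshell first. `run_capped` is a hypothetical helper, the 1 GiB limit is an arbitrary illustrative value, `ulimit -v` is assumed to be supported by the shell and platform, and the installed `grep` is used rather than `src/grep`. The thread does not show how grep reports hitting such a limit.

```shell
# run_capped N LIMIT_KB: build an N-bar pattern and run grep -E on it
# with the address space capped at LIMIT_KB kibibytes.
# The subshell keeps the limit from leaking into the calling shell.
run_capped() {
  (
    ulimit -v "$2"
    printf "(%0${1}d)\n" | sed 's/0/|/g' | grep -Ef- /dev/null
  )
}

# Small smoke run; /dev/null contains no lines, so grep selects nothing
# and its nonzero exit status is reported by the echo.
run_capped 5 1048576 || echo "grep exit status: $?"
```

Calling `run_capped 80000 1048576` would repeat the experiment from the thread under the cap, so the process should fail once it reaches the limit instead of consuming all available memory.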
This bug report was last modified 4 years and 205 days ago.