GNU bug report logs - #43862
[PATCH] grep: set RE_NO_SUB for calling regex only to check syntax

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Thu, 8 Oct 2020 09:41:01 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 43862 <at> debbugs.gnu.org
Subject: bug#43862: [PATCH] grep: set RE_NO_SUB for calling regex only to check syntax
Date: Mon, 12 Oct 2020 16:08:28 -0700

On Thu, Oct 8, 2020 at 2:41 AM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
> We can set RE_NO_SUB when calling regex only to check syntax.  This brings
> performance gains for patterns that produce a very large number of epsilon nodes.
>
>
> $ printf '(%020000d)\n' | sed 's/0/|/g' >pat
>
> (before)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 6.15
> user 4.62
> sys 1.52
>
> (after)
> $ time -p env LC_ALL=C src/grep -Ef pat /dev/null
> real 0.66
> user 0.19
> sys 0.46

Thank you.

FYI, while running similar commands with and without your patch (with
an eye to adding a test), I tried the one below with your patch applied.
With 80,000 terms, grep consumed about 32GB of memory before being
OOM-killed:

$ printf '(%080000d)\n' | sed 's/0/|/g' | env time src/grep -Ef- /dev/null
Command terminated by signal 9
6.42user 19.98system 0:57.91elapsed 45%CPU (0avgtext+0avgdata 32024460maxresident)k
6504inputs+0outputs (92major+12003644minor)pagefaults 0swaps
[Exit 137 (KILL)]
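
For readers reproducing this, here is a minimal sketch of what the pattern generator in the commands above produces. With no argument, printf formats %d as 0, so a zero-padded width of N yields N zeros inside the parentheses, and sed then turns each zero into an alternation bar. The helper name make_pat is hypothetical and is not part of grep or its test suite; it only makes the width a parameter so the pattern can be inspected at a small size first.

```shell
# Hypothetical helper (assumption, not from grep's sources or tests):
# build the same kind of pattern as in the report, with a chosen width.
make_pat() {
  # With no argument, %d formats 0, so %0Nd prints N zeros.
  # sed then replaces every 0 with '|', giving N bars (N+1 empty alternatives).
  printf "(%0${1}d)\n" | sed 's/0/|/g'
}

make_pat 8               # prints (||||||||)
make_pat 20000 | wc -c   # size in bytes: 20000 bars + 2 parentheses + newline
```

Starting with a small width and scaling up gradually is probably the safer way to explore the memory growth, given that the 80,000-wide pattern was OOM-killed above.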

I will come back to this later this week.
