GNU bug report logs - #62483
echo a | grep -E -w '((()|a)|())*' # does not terminate

Previous Next

Package: grep;

Reported by: Koen Claessen <koen <at> chalmers.se>

Date: Mon, 27 Mar 2023 13:15:05 UTC

Severity: normal

Full log


View this message in rfc822 format

From: arnold <at> skeeve.com
To: koen <at> chalmers.se, dimitry <at> andric.com
Cc: 62483 <at> debbugs.gnu.org
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Tue, 28 Mar 2023 00:46:53 -0600
This does not reproduce with gawk, even when I force use of the regex
matcher.

What happens if grep is built with the included regex files instead of
relying on the ones in the local glibc?

Arnold

Dimitry Andric <dimitry <at> andric.com> wrote:

> On 27 Mar 2023, at 11:14, Koen Claessen <koen <at> chalmers.se> wrote:
> > 
> > Running the command:
> > 
> >  echo a | grep -E -w '((()|a)|())*'
> > 
> > does not terminate, and uses a LOT of processor time, for all versions of
> > grep I have tried.
> > 
> > This is the smallest case that could be found; simplifying anything in the
> > input and/or expression leads to correct behavior.
>
> Reproducible with GNU grep 3.7 on Ubuntu 22:
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
>
> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
>
> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
>
>   1378  static reg_errcode_t
>   1379  __attribute_warn_unused_result__
>   1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
>   1381            regmatch_t *pmatch, bool fl_backtrack)
>   1382  {
> ...
>   1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>   1416      {
>   1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>   1418
>   1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>   1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> ...
>   1442        /* Proceed to next node.  */
>   1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
>   1444                                      &idx, cur_node,
>   1445                                      &eps_via_nodes, fs);
>   1446
>   1447        if (__glibc_unlikely (cur_node < 0))
> ...
>   1465          }
>   1466      }
>
> -Dimitry
>
> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
>




This bug report was last modified 2 years and 75 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.