GNU bug report logs - #62483
echo a | grep -E -w '((()|a)|())*' # does not terminate

Previous Next

Package: grep;

Reported by: Koen Claessen <koen <at> chalmers.se>

Date: Mon, 27 Mar 2023 13:15:05 UTC

Severity: normal

Full log


Message #8 received at 62483 <at> debbugs.gnu.org (full text, mbox):

From: Dimitry Andric <dimitry <at> andric.com>
To: Koen Claessen <koen <at> chalmers.se>
Cc: 62483 <at> debbugs.gnu.org
Subject: Re: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Mon, 27 Mar 2023 19:54:27 +0200
[Message part 1 (text/plain, inline)]
On 27 Mar 2023, at 11:14, Koen Claessen <koen <at> chalmers.se> wrote:
> 
> Running the command:
> 
>  echo a | grep -E -w '((()|a)|())*'
> 
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
> 
> This is the smallest case that could be found; simplifying anything in the
> input and/or expression leads to correct behavior.

Reproducible with GNU grep 3.7 on Ubuntu 22:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*

It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.

It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:

  1378  static reg_errcode_t
  1379  __attribute_warn_unused_result__
  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
  1381            regmatch_t *pmatch, bool fl_backtrack)
  1382  {
...
  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
  1416      {
  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
  1418
  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
...
  1442        /* Proceed to next node.  */
  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
  1444                                      &idx, cur_node,
  1445                                      &eps_via_nodes, fs);
  1446
  1447        if (__glibc_unlikely (cur_node < 0))
...
  1465          }
  1466      }

-Dimitry

P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".

[signature.asc (application/pgp-signature, attachment)]

This bug report was last modified 2 years and 74 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.