GNU bug report logs - #62483
echo a | grep -E -w '((()|a)|())*' # does not terminate

Previous Next

Package: grep;

Reported by: Koen Claessen <koen <at> chalmers.se>

Date: Mon, 27 Mar 2023 13:15:05 UTC

Severity: normal

Full log


View this message in rfc822 format

From: Dimitry Andric <dimitry <at> andric.com>
To: arnold <at> skeeve.com
Cc: 62483 <at> debbugs.gnu.org, koen <at> chalmers.se
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Sun, 2 Apr 2023 11:33:42 +0200
[Message part 1 (text/plain, inline)]
Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
debugging purposes added CFLAGS="-O0 -g".

In any case, the problematic code is both in glibc and grep, as I
believe these are originating from the same source.

-Dimitry

> On 2 Apr 2023, at 08:52, arnold <at> skeeve.com wrote:
> 
> Hi.
> 
> Dimitry Andric <dimitry <at> andric.com> wrote:
> 
>> Yes, it still reproduces when I configure the latest grep using
>> --without-included-regex:
> 
> I assume you meant --with-included-regex?
> 
> If you really used --without-included-regex, that doesn't prove anything... :-)
> 
> It's interesting, as gawk uses the same regex, but with different flags.
> It might be worth trying to understand which of the syntax bits
> is causing this.
> 
> Thanks,
> 
> Arnold
> 
>> 
>> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>> 1: idx = 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>> 1: idx = 0
>> (gdb)
>> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>> 1: idx = 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>> 1: idx = 0
>> (gdb)
>> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
>> 1: idx = 0
>> (gdb)
>> 1432          if (__glibc_unlikely (cur_node < 0))
>> 1: idx = 0
>> (gdb)
>> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>> 1: idx = 0
>> (gdb)
>> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>> 1: idx = 0
>> (gdb)
>> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>> 1: idx = 0
>> (gdb)
>> 1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>> 1: idx = 0
>> 
>> The endless loop looks the same. grep's regexec.c is mostly the same as
>> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
>> directives added.
>> 
>> -Dimitry
>> 
>>> On 28 Mar 2023, at 08:46, arnold <at> skeeve.com wrote:
>>> 
>>> This does not reproduce with gawk, even when I force use of the regex
>>> matcher.
>>> 
>>> What happens if grep is built with the included regex files instead of
>>> relying on the ones in the local glibc?
>>> 
>>> Arnold
>>> 
>>> Dimitry Andric <dimitry <at> andric.com> wrote:
>>> 
>>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen <at> chalmers.se> wrote:
>>>>> 
>>>>> Running the command:
>>>>> 
>>>>> echo a | grep -E -w '((()|a)|())*'
>>>>> 
>>>>> does not terminate, and uses a LOT of processor time, for all versions of
>>>>> grep I have tried.
>>>>> 
>>>>> This is the smallest case that could be found; simplifying anything in the
>>>>> input and/or expression leads to correct behavior.
>>>> 
>>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>>>> 
>>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E -w ((()|a)|())*
>>>> 
>>>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
>>>> 
>>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
>>>> 
>>>> 1378  static reg_errcode_t
>>>> 1379  __attribute_warn_unused_result__
>>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
>>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
>>>> 1382  {
>>>> ...
>>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>>>> 1416      {
>>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
>>>> 1418
>>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>>>> ...
>>>> 1442        /* Proceed to next node.  */
>>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
>>>> 1444                                      &idx, cur_node,
>>>> 1445                                      &eps_via_nodes, fs);
>>>> 1446
>>>> 1447        if (__glibc_unlikely (cur_node < 0))
>>>> ...
>>>> 1465          }
>>>> 1466      }
>>>> 
>>>> -Dimitry
>>>> 
>>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
>>>> 
>> 

[signature.asc (application/pgp-signature, attachment)]

This bug report was last modified 2 years and 74 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.