GNU bug report logs -
#62483
echo a | grep -E -w '((()|a)|())*' # does not terminate
Previous Next
Full log
View this message in rfc822 format
Hi.
Dimitry Andric <dimitry <at> andric.com> wrote:
> Yes, it still reproduces when I configure the latest grep using
> --without-included-regex:
I assume you meant --with-included-regex?
If you really used --without-included-regex, that doesn't prove anything... :-)
It's interesting, as gawk uses the same regex, but with different flags.
It might be worth trying to understand which of the syntax bits
is causing this.
Thanks,
Arnold
>
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
> (gdb)
> 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> 1: idx = 0
> (gdb)
> 1432 if (__glibc_unlikely (cur_node < 0))
> 1: idx = 0
> (gdb)
> 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> 1: idx = 0
> (gdb)
> 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> 1: idx = 0
> (gdb)
> 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> 1: idx = 0
> (gdb)
> 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> 1: idx = 0
>
> The endless loop looks the same. grep's regexec.c is mostly the same as
> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> directives added.
>
> -Dimitry
>
> > On 28 Mar 2023, at 08:46, arnold <at> skeeve.com wrote:
> >
> > This does not reproduce with gawk, even when I force use of the regex
> > matcher.
> >
> > What happens if grep is built with the included regex files instead of
> > relying on the ones in the local glibc?
> >
> > Arnold
> >
> > Dimitry Andric <dimitry <at> andric.com> wrote:
> >
> >> On 27 Mar 2023, at 11:14, Koen Claessen <koen <at> chalmers.se> wrote:
> >>>
> >>> Running the command:
> >>>
> >>> echo a | grep -E -w '((()|a)|())*'
> >>>
> >>> does not terminate, and uses a LOT of processor time, for all versions of
> >>> grep I have tried.
> >>>
> >>> This is the smallest case that could be found; simplifying anything in the
> >>> input and/or expression leads to correct behavior.
> >>
> >> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>
> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())*
> >>
> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop.
> >>
> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0:
> >>
> >> 1378 static reg_errcode_t
> >> 1379 __attribute_warn_unused_result__
> >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch,
> >> 1381 regmatch_t *pmatch, bool fl_backtrack)
> >> 1382 {
> >> ...
> >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1416 {
> >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
> >> 1418
> >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
> >> ...
> >> 1442 /* Proceed to next node. */
> >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match,
> >> 1444 &idx, cur_node,
> >> 1445 &eps_via_nodes, fs);
> >> 1446
> >> 1447 if (__glibc_unlikely (cur_node < 0))
> >> ...
> >> 1465 }
> >> 1466 }
> >>
> >> -Dimitry
> >>
> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a".
> >>
>
This bug report was last modified 2 years and 74 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.