Yes, it still reproduces when I configure the latest grep using --without-included-regex: 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) 1: idx = 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); 1: idx = 0 (gdb) 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) 1: idx = 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) 1: idx = 0 (gdb) 1428 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, 1: idx = 0 (gdb) 1432 if (__glibc_unlikely (cur_node < 0)) 1: idx = 0 (gdb) 1400 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) 1: idx = 0 (gdb) 1402 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); 1: idx = 0 (gdb) 1404 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) 1: idx = 0 (gdb) 1405 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) 1: idx = 0 The endless loop looks the same. grep's regexec.c is mostly the same as glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N directives added. -Dimitry > On 28 Mar 2023, at 08:46, arnold@skeeve.com wrote: > > This does not reproduce with gawk, even when I force use of the regex > matcher. > > What happens if grep is built with the included regex files instead of > relying on the ones in the local glibc? > > Arnold > > Dimitry Andric wrote: > >> On 27 Mar 2023, at 11:14, Koen Claessen wrote: >>> >>> Running the command: >>> >>> echo a | grep -E -w '((()|a)|())*' >>> >>> does not terminate, and uses a LOT of processor time, for all versions of >>> grep I have tried. >>> >>> This is the smallest case that could be found; simplifying anything in the >>> input and/or expression leads to correct behavior. >> >> Reproducible with GNU grep 3.7 on Ubuntu 22: >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> 93938 dim 20 0 9.0m 2.1m 2.0m R 100.0 0.0 0:08.32 grep -E -w ((()|a)|())* >> >> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), and it is that implementation which ends up in an endless loop. >> >> It loops between lines 1415, 1417 and 1443, but idx and cur_node never change from 0: >> >> 1378 static reg_errcode_t >> 1379 __attribute_warn_unused_result__ >> 1380 set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t nmatch, >> 1381 regmatch_t *pmatch, bool fl_backtrack) >> 1382 { >> ... >> 1415 for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;) >> 1416 { >> 1417 update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch); >> 1418 >> 1419 if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node) >> 1420 || (fs && re_node_set_contains (&eps_via_nodes, cur_node))) >> ... >> 1442 /* Proceed to next node. */ >> 1443 cur_node = proceed_next_node (mctx, nmatch, pmatch, prev_idx_match, >> 1444 &idx, cur_node, >> 1445 &eps_via_nodes, fs); >> 1446 >> 1447 if (__glibc_unlikely (cur_node < 0)) >> ... >> 1465 } >> 1466 } >> >> -Dimitry >> >> P.S.: Interestingly this does not reproduce with BSD grep, which returns immediately with "a". >>