I found it when I was testing various new regular expression algorithms against grep (which I used as the golden standard for this).

I used a random generator for regular expressions (using the QuickCheck framework) and then shrinking/delta debugging to automatically find the smallest failing test case.

BTW, if you are interested, I could do a larger more targeted effort stress testing grep like this and possibly find more test cases with unexpected behavior. I would need some guidance on where to put most effort in order to be as useful as this can be. I could find a MSc student to help out with that. Let me know if this sounds like an interesting thing to do!

kind regards,
/Koen

On Sun, Apr 2, 2023 at 6:15 AM Jim Meyering <jim@meyering.net> wrote:
On Mon, Mar 27, 2023 at 6:15 AM Koen Claessen <koen@chalmers.se> wrote:
> Running the command:
>
>   echo a | grep -E -w '((()|a)|())*'
>
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
>
> This is the smallest case that could be found; simplifying anything in the
> input and/or expression leads to correct behavior.

Thank you! How did you find that?

FYI, this strikes grep-3.10 (on Fedora 37/glibc-2.36-9.fc37.x86_64)
when using LC_ALL=en_US.UTF-8, but not with LC_ALL=C.
I.e., this infloops:
   echo a | LC_ALL=en_US.UTF-8 grep -E -w '((()|a)|())*'

but this works as expected and promptly prints its line of input:
     echo a | LC_ALL=C grep -E -w '((()|a)|())*'

For now, I've added an expected-failing test case for this bug: