Thanks to Jim and for the new release. Let's just start with, for next release I want to add further improvement to it. In dfaexec, DFA state is always 0 until have found potential match. So we can improve matching there by continuing to use the transition table without replacing it. I tested the patch, and got about 3x speed-up. $ yes j | head -10000000 >k $ env LC_ALL=C time -p src/grep '\(a\|b\)' k Before: real 1.12 user 1.06 sys 0.04 After : real 0.39 user 0.34 sys 0.04 I also tested for non-utf8 multibyte locale. $ env LC_ALL=ja_JP.eucJP time -p src/grep '\(a\|b\)' k Before: real 1.41 user 1.35 sys 0.05 After : real 0.38 user 0.32 sys 0.06 By the way, below on grep 2.18 (non-patch). (^_^) $ env LANG=ja_JP.eucJP time -p src/grep '\(a\|b\)' k real 12.00 user 11.86 sys 0.13