Thanks for the test cases and patch. In my tests, switching to macros does not help performance, and the SWITCH macro's implementation actually slows things down a bit, which is what I'd expect. If there is a reason to use macros, I'd like to see a patch that simply changes functions to macros without changing the algorithm, so that we can measure the macro effect separately from the algorithm change.

I attempted to suss out the performance improvements in that patch without using macros, and installed the attached changes. With these changes grep performs about as well as it does with that patch, at least on those benchmarks you mentioned that I tried (as before, I'm using the default optimization with GCC 4.9.0 x86-64 on an AMD Phenom II X4 910e). Quite possibly I've missed something, of course.

The two "advance_*" constants used in the heuristics are guesses: I haven't measured rigorously to try to come up with good values.
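
To illustrate the kind of function-versus-macro comparison suggested above, here is a rough sketch. Nothing in it is taken from grep or from the patch: is_eol_fn, IS_EOL_MACRO, count_lines_fn, count_lines_macro, check_same_result and time_variant are all hypothetical names made up for the example. The idea is only that the two variants share one algorithm and differ in the call mechanism, so any timing difference can be attributed to that alone.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not from grep: nonzero if C is a newline.  */
static inline int
is_eol_fn (unsigned char c)
{
  return c == '\n';
}

/* Macro with the same logic as is_eol_fn (also hypothetical).  */
#define IS_EOL_MACRO(c) ((unsigned char) (c) == '\n')

/* Count newlines in BUF[0..N-1] using the inline function.  */
static size_t
count_lines_fn (char const *buf, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    count += is_eol_fn (buf[i]);
  return count;
}

/* Same loop, but using the macro instead of the function.  */
static size_t
count_lines_macro (char const *buf, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    count += IS_EOL_MACRO (buf[i]);
  return count;
}

/* The variants must agree, since only the call mechanism differs.  */
static void
check_same_result (char const *buf, size_t n)
{
  assert (count_lines_fn (buf, n) == count_lines_macro (buf, n));
}

/* Return the CPU seconds used by REPS calls of COUNT on BUF[0..N-1],
   storing the last call's result in *RESULT.  */
static double
time_variant (size_t (*count) (char const *, size_t),
              char const *buf, size_t n, int reps, size_t *result)
{
  size_t r = 0;
  clock_t start = clock ();
  for (int i = 0; i < reps; i++)
    r = count (buf, n);
  clock_t end = clock ();
  *result = r;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

One would call check_same_result on the input first, then time_variant once with count_lines_fn and once with count_lines_macro on the same large buffer, and compare the two times. With optimization enabled I would expect the compiler to inline is_eol_fn, so the two should come out about the same, but that is the sort of thing worth measuring rather than assuming.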