I believe that memchr() isn't slower on HP-UX and Solaris but more faster on x86 processors. Now I wrote the additional patch. It replaces `tp += d' to the other code in bmexec() and cwexec() for x86 processors. I see that the code is more complex and slower theoretically, but it's faster actually when `d' is small. (We can test it by `grep -i jk k' after patch#17230 and this patch.) Therefore, I think that memchr() is faster than `tp += d' because the increment instruction is more faster than the add instruction on x86 processors. KWSet may be slower than DFA on them in the worst case, even if doesn't use delta2. Try to run and compare below and compare without this patches. The former uses only KWSet and the later uses only DFA. Later is faster on Linux x86, because DFA uses increment instrument for shift of text pointer. It may generate a wrong usage among users. I think that it should be avoided. $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 >k $ env LANG=C time -p src/grep jk k $ env LANG=C time -p src/grep -i jk k > sometimes the former is more important than the latter, and this may be > one of those times. I see that grep attaches a high value to performance. It uses not only regex but also DFA. Further more, DFA has the specific codes for utf-8 locale, which is used most frequently. I think that grep may also have specific codes for x86 processors, which are also used most frequently. Norihiro