On binary files, it seems that testing the UTF-8 sequences in
pcresearch.c is faster than asking pcre_exec to do it (because of
the retry, I assume); see the attached patch. The patch actually checks
UTF-8 only if pcre_exec has already found an invalid sequence, on the
assumption that pcre_exec can check the validity of a valid text file
faster.

On a file similar to a PDF (test 1):

  Before: 1.77s
  After:  1.38s

But now the main problem is the large number of pcre_exec calls.
Indeed, if I replace the non-ASCII bytes by \n with:

  LC_ALL=C tr \\200-\\377 \\n

(so that one has a valid file, but with many short lines), the grep -P
time is 1.52s (test 2). And if I replace the non-ASCII bytes by null
bytes with:

  LC_ALL=C tr \\200-\\377 \\000

the grep -P time is 0.30s (test 3), which is much faster.

Note also that libpcre is much slower than normal grep on simple
words, but on "a[0-9]b" it can be faster (times in seconds):

            grep    PCRE    PCRE+patch
  test 1    4.31    1.90    1.53
  test 2    0.18    1.61    1.63
  test 3    3.28    0.39    0.39

With grep, I wonder why test 2 is so much faster.

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
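For readers without the attachment, here is a minimal sketch of the kind of UTF-8 check discussed in the first paragraph. It is not the code from the attached patch: the function names valid_utf8_prefix and next_valid_run are hypothetical, and the validity rules are the RFC 3629 ones (no overlong forms, no surrogates, nothing above U+10FFFF). The first function returns how many leading bytes of a buffer are valid UTF-8; the second finds the next run of valid bytes after some invalid ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): return the number of
   leading bytes of buf[0..len) that form valid UTF-8.  */
size_t
valid_utf8_prefix (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  assert (buf != NULL || len == 0);
  while (i < len)
    {
      unsigned char c = buf[i];
      unsigned char lo = 0x80, hi = 0xBF; /* range of 1st continuation byte */
      size_t n, k;                        /* n = continuation bytes expected */

      if (c < 0x80)
        {
          i++;                            /* plain ASCII */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        n = 1;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          n = 2;
          if (c == 0xE0)
            lo = 0xA0;                    /* reject overlong forms */
          else if (c == 0xED)
            hi = 0x9F;                    /* reject surrogates */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          n = 3;
          if (c == 0xF0)
            lo = 0x90;                    /* reject overlong forms */
          else if (c == 0xF4)
            hi = 0x8F;                    /* reject > U+10FFFF */
        }
      else
        break;                 /* 0x80..0xC1 and 0xF5..0xFF never lead */

      if (len - i <= n)
        break;                            /* truncated sequence */
      if (buf[i + 1] < lo || buf[i + 1] > hi)
        break;
      for (k = 2; k <= n; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          break;
      if (k <= n)
        break;                            /* bad continuation byte */
      i += n + 1;
    }
  return i;
}

/* Hypothetical helper: find the next non-empty run of valid UTF-8 at
   or after *pos.  Store its start in *pos and return its length, or
   return 0 (with *pos == len) if no valid byte remains.  */
size_t
next_valid_run (const unsigned char *buf, size_t len, size_t *pos)
{
  size_t p = *pos;

  while (p < len)
    {
      size_t n = valid_utf8_prefix (buf + p, len - p);
      if (n > 0)
        {
          *pos = p;
          return n;
        }
      p++;                                /* skip one invalid byte */
    }
  *pos = len;
  return 0;
}
```

Under the approach described above, such a scan would presumably run only after pcre_exec has reported PCRE_ERROR_BADUTF8 for a line, so that valid text files never pay for it; how the patch then feeds the valid runs back to pcre_exec is not shown here.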