Norihiro Tanaka wrote:
> I'm worried that to re-run for invalid UTF-8 makes slowness for searching
> of the large number of binary files.

Yes, that could be a problem, but even so it's better for grep to report matches than to give up and fail. Perhaps someone could optimize this later, but to be honest, given how flaky libpcre is, we're probably better off spending our scarce development resources elsewhere.

Santiago's latest patch still had some troubles, unfortunately. It could mishandle '^' by letting it match just past an encoding error. It was less efficient than it could be, as it checked all valid bytes for UTF-8 validity twice. If I understand PCRE correctly (which quite possibly I don't), it also appeared to mishandle matches that contain nested subexpressions. But the worst part was that the code was too complicated (and this was true even before Santiago's patch was applied). So I rewrote it and installed the attached patch instead. Please give it a try.
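For readers following along, here is a minimal sketch of one way to search a line that contains encoding errors. It is not the code in the installed patch. It assumes the line is split into maximal runs of valid UTF-8 and that each run is handed to the matcher separately. Only the first run starts the line, so later runs are searched "not at beginning of line", which is what keeps '^' from matching just past an encoding error. The names `valid_utf8_prefix`, `toy_match` and `search_line` are hypothetical. `toy_match` is a stand-in for `pcre_exec` that only understands a literal optionally prefixed by '^', and its `notbol` flag plays the role of PCRE's `PCRE_NOTBOL` option.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of the longest prefix of P[0..N) that is valid UTF-8
   (RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF).  */
static size_t
valid_utf8_prefix (const unsigned char *p, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = p[i];
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */
      size_t len;
      if (c < 0x80)
        len = 1;
      else if (0xC2 <= c && c <= 0xDF)
        len = 2;
      else if (0xE0 <= c && c <= 0xEF)
        {
          len = 3;
          if (c == 0xE0) lo = 0xA0;        /* reject overlong forms */
          else if (c == 0xED) hi = 0x9F;   /* reject surrogates */
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          len = 4;
          if (c == 0xF0) lo = 0x90;        /* reject overlong forms */
          else if (c == 0xF4) hi = 0x8F;   /* reject > U+10FFFF */
        }
      else
        break;                             /* invalid lead byte */
      if (n - i < len)
        break;                             /* truncated sequence */
      if (len > 1)
        {
          if (p[i + 1] < lo || hi < p[i + 1])
            break;
          size_t k;
          for (k = 2; k < len; k++)
            if ((p[i + k] & 0xC0) != 0x80)
              break;
          if (k < len)
            break;                         /* bad continuation byte */
        }
      i += len;
    }
  return i;
}

/* Hypothetical stand-in for pcre_exec.  PAT is a literal, optionally
   prefixed by '^'.  NOTBOL means SEG does not start a line, so '^'
   cannot match at its start.  Returns the match offset or -1.  */
static ptrdiff_t
toy_match (const char *pat, const unsigned char *seg, size_t n, bool notbol)
{
  bool anchored = pat[0] == '^';
  const char *lit = pat + anchored;
  size_t m = strlen (lit);
  if (anchored)
    return !notbol && m <= n && memcmp (seg, lit, m) == 0 ? 0 : -1;
  for (size_t i = 0; i + m <= n; i++)
    if (memcmp (seg + i, lit, m) == 0)
      return (ptrdiff_t) i;
  return -1;
}

/* Search LINE[0..N) one valid-UTF-8 run at a time.  Every run after
   the first follows an encoding error, so it is searched with NOTBOL.
   Returns the match offset within LINE, or -1 if there is no match.  */
static ptrdiff_t
search_line (const char *pat, const unsigned char *line, size_t n)
{
  size_t off = 0;
  for (;;)
    {
      size_t valid = valid_utf8_prefix (line + off, n - off);
      ptrdiff_t r = toy_match (pat, line + off, valid, off != 0);
      if (r >= 0)
        return (ptrdiff_t) off + r;
      if (off + valid == n)
        return -1;
      off += valid + 1;         /* skip one byte of the encoding error */
    }
}
```

With this scheme `search_line ("abc", ...)` finds "abc" in the bytes `\xff abc` at offset 1, while `search_line ("^abc", ...)` on the same bytes reports no match. Each byte is validated once by `valid_utf8_prefix`; with real PCRE one would presumably pass `PCRE_NO_UTF8_CHECK` for a run already known to be valid, so the library does not check it a second time.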