Norihiro Tanaka wrote:
> I'm worried that to re-run for invalid UTF-8 makes slowness for searching
> of the large number of binary files.

Yes, that could be a problem, but even so it's better for grep to report 
matches than to give up and fail.  Perhaps someone could optimize this 
better later, but to be honest given how flaky libpcre is we're probably 
better off spending our scarce development resources elsewhere.

Santiago's latest patch still had some problems, unfortunately.  It
could mishandle '^' by letting it match just past an encoding error.  It
was less efficient than it could be, as it checked every valid byte for
UTF-8 validity twice.  If I understand PCRE correctly (which quite
possibly I don't), it also appeared to mishandle matches that contain 
nested subexpressions.  But the worst part was that the code was too 
complicated (and this was true even before Santiago's patch was 
applied).  So I rewrote it and installed the attached patch instead. 
Please give it a try.
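
To illustrate the two points about '^' and double validation, here is a
minimal sketch of the general idea.  It is written in Python rather than
C and is not the code in the attached patch.  The names valid_utf8_runs
and search_line are hypothetical.  It assumes one input line per call,
validates each byte once, and searches only within the valid runs, so
that '^' can match at the true start of the line but not just past an
encoding error.

```python
import re


def valid_utf8_runs(line: bytes):
    """Yield (start, end) offsets of each maximal valid UTF-8 run in line.

    Hypothetical helper: each valid byte is validated once, because the
    next decode attempt resumes just past the previous encoding error.
    """
    pos = 0
    while pos < len(line):
        try:
            line[pos:].decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice just decoded.
            if err.start > 0:
                yield pos, pos + err.start
            pos += err.end  # skip the invalid byte sequence
        else:
            yield pos, len(line)
            return


def search_line(pattern: re.Pattern, line: bytes):
    """Return the first match found inside a valid run, or None.

    Passing start/end to Pattern.search keeps '^' tied to the real start
    of the line: it does not match at a run that begins after an error.
    """
    for start, end in valid_utf8_runs(line):
        match = pattern.search(line, start, end)
        if match:
            return match
    return None
```

With this sketch, search_line(re.compile(rb"^abc"), b"\xffabc") finds
nothing, while the unanchored pattern rb"abc" still matches the same
line.  Note that '$' is not given the same care here: Python treats the
end offset as the end of the string, so a real implementation would
need an equivalent guard at the end of each run.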