GNU bug report logs - #18455
grep 2.20 perl-regexp: invalid UTF-8 byte sequence in input


Package: grep

Reported by: Mario Grgic <veritas.divina <at> gmail.com>

Date: Fri, 12 Sep 2014 02:21:01 UTC

Severity: wishlist

Merged with 18266

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #5 received at submit <at> debbugs.gnu.org:

From: Mario Grgic <veritas.divina <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep 2.20 perl-regexp: invalid UTF-8 byte sequence in input
Date: Thu, 11 Sep 2014 21:27:22 -0400
This happens with GNU grep version 2.20 and PCRE 8.35 on Mac OS X. The following command reproduces the problem:

$ printf 'j\x82\nj\n' | grep -P j
invalid UTF-8 byte sequence in input

But I usually encounter this when searching recursively through files and grep hits a binary file that contains an invalid UTF-8 sequence. If a binary file with an invalid UTF-8 sequence is encountered first (without any other matches), grep aborts the entire recursive search and does not even mention which file caused the error. This is somewhat confusing when you first encounter it.

By the way, this works without any errors in GNU grep 2.18 with either PCRE 8.33 or 8.35 (you get messages like "Binary file x matches"). I have not tried any other combinations.
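
One possible workaround, not mentioned in the report and offered here only as a sketch: as far as I know, grep turns on PCRE's UTF-8 mode only when the current locale uses UTF-8, so running the same command in the C locale should skip the UTF-8 validity check. The trade-off is that the pattern is then matched byte by byte, so non-ASCII characters and character classes in the pattern may behave differently. The invalid byte is written in octal (\202 is 0x82) so that printf behaves the same in any POSIX shell, and -c is used so the output does not depend on how grep reports binary input.

```shell
# Same input as in the report: "j" followed by the stray byte 0x82, then "j".
# \202 is the octal form of 0x82; octal escapes are portable across POSIX printf.
# LC_ALL=C is an assumed workaround, not something proposed in the report.
count=$(printf 'j\202\nj\n' | LC_ALL=C grep -c -P j)

# Number of input lines that matched the pattern.
echo "$count"
```

With a grep that was built with PCRE support, this should print 2, since both input lines contain j.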

This bug report was last modified 10 years and 249 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.