GNU bug report logs - #18455
grep 2.20 perl-regexp: invalid UTF-8 byte sequence in input


Package: grep

Reported by: Mario Grgic <veritas.divina <at> gmail.com>

Date: Fri, 12 Sep 2014 02:21:01 UTC

Severity: wishlist

Merged with 18266

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18455 in the body.
You can then email your comments to 18455 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-grep <at> gnu.org:
bug#18455; Package grep. (Fri, 12 Sep 2014 02:21:01 GMT)

Acknowledgement sent to Mario Grgic <veritas.divina <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 12 Sep 2014 02:21:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mario Grgic <veritas.divina <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep 2.20 perl-regexp: invalid UTF-8 byte sequence in input
Date: Thu, 11 Sep 2014 21:27:22 -0400

This happens with GNU grep version 2.20 and PCRE 8.35 on Mac OS X. The following command reproduces the problem:

$ printf 'j\x82\nj\n' | grep -P j
invalid UTF-8 byte sequence in input

But I usually encounter this when recursively searching through files and hitting a binary file that contains an invalid UTF-8 sequence. If a binary file with an invalid UTF-8 sequence is encountered first (before any other matches), grep aborts the entire recursive search and does not even mention which file caused the error. This is somewhat confusing when you first encounter it.
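
Since grep does not name the offending file, one way to locate it is to validate each file's encoding separately. The following is only a rough sketch under stated assumptions: `find_invalid_utf8` is a hypothetical helper name, not part of grep, and it assumes an `iconv` that exits with a non-zero status on an invalid input sequence (as the glibc and libiconv versions do).

```shell
# Hypothetical helper (not part of grep): print every regular file under
# a directory whose contents are not valid UTF-8.
# Assumption: iconv exits non-zero when it meets an invalid input sequence.
find_invalid_utf8() {
    dir=${1:-.}
    find "$dir" -type f | while IFS= read -r f; do
        # Convert UTF-8 to UTF-8 purely as a validity check; discard output.
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `find_invalid_utf8 .` from the top of the tree being searched should list the candidate files. File names containing newlines are not handled by this sketch.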

By the way, GNU grep 2.18 handles this without any errors (you get messages like "Binary file x matches"), with either PCRE 8.33 or 8.35. I have not tried any other combinations.









Information forwarded to bug-grep <at> gnu.org:
bug#18455; Package grep. (Fri, 12 Sep 2014 03:41:01 GMT)

Message #8 received at 18455 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mario Grgic <veritas.divina <at> gmail.com>
Cc: 18455 <at> debbugs.gnu.org
Subject: grep 2.20 perl-regexp: invalid UTF-8 byte sequence in input
Date: Thu, 11 Sep 2014 20:40:35 -0700

This appears to be the same as Bug#18266:

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18266

which means it's fixed in the master version and the fix should appear 
in the next release.




Forcibly Merged 18266 18455. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 12 Sep 2014 03:43:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 15 Oct 2014 11:24:03 GMT)

This bug report was last modified 10 years and 249 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.