GNU bug report logs - #22059
grep -E: unexpected behaviour

Package: grep

Reported by: Charles <c <at> charlesmatkinson.org>

Date: Mon, 30 Nov 2015 07:23:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 22059 AT debbugs.gnu.org.


Report forwarded to bug-grep <at> gnu.org:
bug#22059; Package grep. (Mon, 30 Nov 2015 07:23:02 GMT)

Acknowledgement sent to Charles <c <at> charlesmatkinson.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 30 Nov 2015 07:23:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Charles <c <at> charlesmatkinson.org>
To: bug-grep <at> gnu.org
Subject: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 10:27:55 +0530

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But add the i to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
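
The `od -x` dump above prints 16-bit words in host byte order, so on a little-endian machine the two bytes of each word appear swapped. The following sketch is not part of the original report, and the function name `words_to_hex_bytes` is hypothetical. It rewrites the words in file order, assuming a little-endian host, which the visible `SHQ` prefix (53 48 51) supports.

```shell
# Words copied from the od -x output in the report.
words='4853 8251 f265 88d0 b120 b8d3 4dbe e655 45ed e8b3 e342 4cc4 0a27'

# Convert od -x words to bytes in file order.
# Assumption: the dump came from a little-endian host, so the low byte
# of each 16-bit word is the one that appears first in the file.
words_to_hex_bytes() {
    out=''
    for w in $1; do
        hi=${w%??}   # first two hex digits: high byte of the word
        lo=${w#??}   # last two hex digits: low byte of the word
        out="$out${out:+ }$lo $hi"
    done
    printf '%s\n' "$out"
}

words_to_hex_bytes "$words"
# prints: 53 48 51 82 65 f2 d0 88 20 b1 d3 b8 be 4d 55 e6 ed 45 b3 e8 42 e3 c4 4c 27 0a
```

The fourth byte, 0x82, has the bit pattern of a UTF-8 continuation byte but follows the ASCII `Q` with no lead byte before it, so it is the first encoding error in the field. Running `od -t x1` instead of `od -x` would print the bytes in file order directly.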

When the input has invalid characters so grep cannot process it, a message could be expected, perhaps configurable by the -s/--no-messages option, because the input is (sort of) unreadable.

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles





Information forwarded to bug-grep <at> gnu.org:
bug#22059; Package grep. (Mon, 30 Nov 2015 17:28:02 GMT)

Message #8 received at 22059 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Charles <c <at> charlesmatkinson.org>, 22059 <at> debbugs.gnu.org
Subject: Re: bug#22059: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 09:27:25 -0800

On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:

The regular expression "." matches a single character, and ".*" matches 
a string of characters. In your example, there is an encoding error, and 
encoding errors are not characters so "." and ".*" do not match them. I 
don't see any bug here.
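
The explanation above can be illustrated with a small sketch. It is not taken from the thread: the sample line, the helper name `count_in_locale`, and the locale name `en_US.UTF-8` are assumptions for illustration. In the C locale every byte is a valid character, so ".*" can span the stray byte. In a UTF-8 locale the same byte is an encoding error, and per the explanation above ".*" is expected to stop in front of it.

```shell
# Count matching lines under a given locale.
# usage: count_in_locale LOCALE PATTERN FILE
count_in_locale() {
    LC_ALL=$1 grep -c -E "$2" "$3"
}

# One-line sample with a lone 0x82 byte (octal 202), which is an
# encoding error in UTF-8. The text is a shortened, made-up stand-in
# for the syslog line in the report.
sample=$(mktemp)
printf 'The string SHQ\202e is not valid\n' > "$sample"

# C locale: every byte is a character, so ".*" crosses 0x82.
count_in_locale C 'The string .* i' "$sample"
# prints: 1

# UTF-8 locale (assumed to be installed): 0x82 is an encoding error,
# so the pattern is expected not to match. grep exits nonzero when
# nothing matches, hence the "|| true".
count_in_locale en_US.UTF-8 'The string .* i' "$sample" || true

rm -f "$sample"
```

If the named UTF-8 locale is not installed, the second call silently behaves like the C locale, so its result is only meaningful on a system where that locale exists.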

> When the input has invalid characters so grep cannot process it, a message could be expected

That's a good suggestion, yes.




Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 31 Dec 2015 08:56:02 GMT)

This bug report was last modified 9 years and 173 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.