GNU bug report logs - #22059
grep -E: unexpected behaviour


Package: grep

Reported by: Charles <c <at> charlesmatkinson.org>

Date: Mon, 30 Nov 2015 07:23:02 UTC

Severity: wishlist


From: Charles <c <at> charlesmatkinson.org>
To: 22059 <at> debbugs.gnu.org
Subject: bug#22059: grep -E: unexpected behaviour
Date: Mon, 30 Nov 2015 10:27:55 +0530

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But append " i" to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW
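A possible explanation, not confirmed anywhere in this report: in a UTF-8 locale GNU grep does not let "." match a byte that is an encoding error, so ".*" ends at the first invalid byte rather than grep abandoning the input. The sketch below rebuilds a one-line sample (the file and its contents are hypothetical, not the original syslog) and runs the failing pattern in the C locale, where every byte is treated as a single character.

```shell
# Hypothetical sample line; \362 (0xF2) followed by a space is not valid UTF-8.
sample=$(mktemp)
printf 'udisksd[2650]: The string `SHQe\362 x'"'"' is not valid UTF-8\n' > "$sample"

# In the C locale each byte is one character, so ".*" can span the 0xF2 byte.
# -c prints the number of matching lines instead of the lines themselves.
count=$(LC_ALL=C grep -E -c 'udisksd\[[[:digit:]]+\]: The string .* i' "$sample")
echo "$count"

rm -f "$sample"
```

Running grep under LC_ALL=C changes how character classes and case folding treat non-ASCII text, so this is a workaround to try rather than a general fix.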

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
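Note that od -x prints 16-bit words, so on a little-endian machine each pair of bytes appears swapped: 4853 is the byte 0x53 ("S") followed by 0x48 ("H"). A byte-by-byte dump is easier to read. The sketch below feeds only the first four bytes of the dump above, typed in by hand as an assumption about the file order, through od with single-byte hex output.

```shell
# First four bytes of the dump in file order: S, H, Q, then 0x82 (octal 202).
# -An suppresses the offset column; -tx1 prints one hex byte at a time.
bytes=$(printf 'SHQ\202' | od -An -tx1)
echo "$bytes"
```

Substituting "od -An -tx1" for "od -x" in the pipeline above should give the same bytes in file order, independent of the machine's endianness.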

When the input contains invalid characters that grep cannot process, a message could be expected, perhaps one that the -s/--no-messages option can suppress, because the input is (sort of) unreadable.
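Until grep reports this itself, the input can be checked separately before searching. The sketch below is one way to do that, assuming an iconv command is available; it uses a hypothetical sample file and relies on iconv exiting with a non-zero status when it meets a byte sequence that is not valid UTF-8.

```shell
# Hypothetical sample: the second line contains the invalid byte 0xF2.
sample=$(mktemp)
printf 'valid line\nbroken \362 line\n' > "$sample"

# Converting UTF-8 to UTF-8 changes nothing, but fails on invalid input.
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
    status="valid"
else
    status="invalid"
fi
echo "$status"

rm -f "$sample"
```

This only says whether the file as a whole is valid UTF-8; it does not identify which lines grep would fail to match.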

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles







