GNU bug report logs - #22059
grep -E: unexpected behaviour
To reply to this bug, email your comments to 22059 AT debbugs.gnu.org.
Report forwarded to bug-grep <at> gnu.org: bug#22059; Package grep. (Mon, 30 Nov 2015 07:23:02 GMT)
Acknowledgement sent to Charles <c <at> charlesmatkinson.org>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 30 Nov 2015 07:23:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
As expected:
# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
But add the i to the pattern and the behaviour is unexpected:
# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]
Apparently grep silently stops processing when it encounters the invalid UTF-8:
# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW
In case the specific unusual characters are relevant, here they are in hex:
# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
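The `od -x` listing above groups the bytes into 16-bit words. On a little-endian machine (an assumption here; the report does not state the architecture) each word shows its two bytes in swapped order. A minimal sketch that recovers the file order of the bytes from the words quoted in the report; the variable names are illustrative only:

```shell
# The 16-bit words printed by `od -x` in the report (little-endian host assumed).
words='4853 8251 f265 88d0 b120 b8d3 4dbe e655 45ed e8b3 e342 4cc4 0a27'

bytes=''
for w in $words; do
  first=${w#??}    # last two hex digits of the word: the byte that comes first in the file
  second=${w%??}   # first two hex digits of the word: the byte that comes second
  bytes="$bytes${bytes:+ }$first $second"
done

# Print the bytes in the order they occur in the log line.
printf '%s\n' "$bytes"
```

The first three bytes, 53 48 51, spell `SHQ`, which agrees with the log line and supports the little-endian assumption. Running `od -An -tx1` instead of `od -x` prints one byte at a time and avoids the swap altogether.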
When the input has invalid characters so grep cannot process it, a message could be expected, perhaps configurable by the -s/--no-messages option, because the input is (sort of) unreadable.
Version: 2.20 from the Debian Jessie package 2.20-4.1
Charles
Information forwarded to bug-grep <at> gnu.org: bug#22059; Package grep. (Mon, 30 Nov 2015 17:28:02 GMT)
Message #8 received at 22059 <at> debbugs.gnu.org:
On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:
The regular expression "." matches a single character, and ".*" matches
a string of characters. In your example, there is an encoding error, and
encoding errors are not characters so "." and ".*" do not match them. I
don't see any bug here.
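One way to see the point about encoding errors is to run the same kind of pattern in the C locale, where there is no multibyte encoding to violate. The sketch below is an illustration, not part of the original exchange: the sample line is made up, and it assumes a grep that treats every byte as a character in the C locale (which, as far as I know, holds for recent GNU grep releases).

```shell
# A made-up sample line containing one byte (octal 377) that never occurs in valid UTF-8.
sample() { printf 'udisksd[2650]: The string \377 is not valid UTF-8.\n'; }

# In the C locale each byte is taken as a character, so ".*" can span the bad byte.
# -a stops grep from treating the input as binary; -c prints the count of matching lines.
c_locale_matches=$(sample | LC_ALL=C grep -a -c -E 'The string .* i')
printf 'matching lines in the C locale: %s\n' "$c_locale_matches"
```

In a UTF-8 locale the same pattern would be expected to fail on this line, for the reason given above: the byte is an encoding error, so "." does not match it.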
> When the input has invalid characters so grep cannot process it, a message could be expected
That's a good suggestion, yes.
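Until grep itself prints such a message, a check can be run before the search. The following is a rough sketch of a hypothetical helper, `warn_if_not_utf8`, which is not a grep feature; it assumes an `iconv` command is installed and will also warn if `iconv` is missing or the file cannot be read.

```shell
# Hypothetical helper, not part of grep: report on stderr when the file named
# by $1 contains bytes that do not decode as UTF-8. Assumes iconv is installed.
warn_if_not_utf8() {
  if ! iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
    printf 'warning: %s is not entirely valid UTF-8\n' "$1" >&2
    return 1
  fi
}
```

Calling `warn_if_not_utf8 /var/log/syslog.1` before the grep commands in the report would flag the file; redirecting stderr to `/dev/null` silences the warning, loosely in the spirit of -s/--no-messages.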
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 31 Dec 2015 08:56:02 GMT)
This bug report was last modified 9 years and 173 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.