GNU bug report logs - #25749
grep 3.0 skips "binary" lines in ssconvert output
Report forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 05:01:01 GMT)
Acknowledgement sent to Alexey Shipunov <dactylorhiza <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 16 Feb 2017 05:01:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear Madam or Sir,
This problem almost ruined my work today.
I made the following note to myself, but you might also be interested:
===
Current grep (2.25) is much faster than 2.5.4 from Lucid but SKIPS
"binary" lines in ssconvert output; freshly compiled grep 3.0 skips
fewer but still does it. Workaround: look for the "binary match" phrase
at the end of the output file and apply grep -a. Report to
https://www.gnu.org/software/grep/manual/html_node/Reporting-Bugs.html
?
===
The file in question (gzipped) is attached.
My system:
===
$ uname -a
Linux ... 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
===
Commands which reproduce the problem:
===
grep . usa-format.txt > 1
grep -a . usa-format.txt > 2
diff 1 2
===
Again, the problem exists with both the Ubuntu Xenial default grep 2.25
and the new grep 3.0.
With best wishes,
Alexey Shipunov
[usa-format.txt.gz (application/x-gzip, attachment)]
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Thu, 16 Feb 2017 07:12:02 GMT)
Notification sent to Alexey Shipunov <dactylorhiza <at> gmail.com>: bug acknowledged by developer. (Thu, 16 Feb 2017 07:12:02 GMT)
Message #10 received at 25749-done <at> debbugs.gnu.org (full text, mbox):
When I tried to read that attachment, gedit complained "There was a problem
opening" it, and then "The file you opened has some invalid characters. If you
continue editing this file you could corrupt this document. You can also choose
another character encoding and try again." So it is not only "grep" that is
having problems with the file.
Looking into it further, the file contains a non-text byte in line 13676, in the
string "1@8MI W OF RALEIGH", where the "@" denotes a byte with octal value 233.
This is invalid UTF-8 text. You can work around the issue by replacing the
non-text byte with a valid character, or by using "grep -a" as you noted, or by
setting the LC_ALL environment variable to "C", or by using a grep pattern that
does not match the non-text line.
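As a rough illustration of the diagnosis and two of the workarounds above, the following sketch builds a three-line stand-in for the attachment (the real file is not reproduced here), uses iconv to find which lines are not valid UTF-8, and then counts matching lines with "grep -a" and with LC_ALL=C. The temporary file, its contents other than the RALEIGH string, and the variable names are made up for the example. It assumes a POSIX shell, an iconv that rejects invalid UTF-8 input, and a GNU grep that behaves as described in this report.

```shell
#!/bin/sh
# Stand-in for the attached file (assumption: three lines are enough to
# show the effect). Line 2 contains the lone byte \233 (octal).
tmp=$(mktemp)
printf 'first line\n1\2338MI W OF RALEIGH\nlast line\n' > "$tmp"

# Find the lines that fail UTF-8 validation, one line at a time.
# iconv exits with a non-zero status when it meets an invalid sequence.
bad_lines=$(
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 || echo "$n"
  done < "$tmp"
)
echo "invalid UTF-8 on line(s): $bad_lines"

# Workaround: tell grep to treat the file as text.
count_a=$(grep -a -c . "$tmp")
# Workaround: search in the C locale instead of a UTF-8 locale.
count_c=$(LC_ALL=C grep -c . "$tmp")
echo "grep -a counted $count_a lines; LC_ALL=C grep counted $count_c lines"

rm -f "$tmp"
```

On the real file, the same loop run against usa-format.txt should point at the line named above, although calling iconv once per line will be slow for large inputs.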
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 07:16:01 GMT)
Message #13 received at submit <at> debbugs.gnu.org (full text, mbox):
P.S.
That problem does not exist in 2.5.4.
AS
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 07:16:02 GMT)
Message #16 received at submit <at> debbugs.gnu.org (full text, mbox):
P.P.S.
Attached are three diff files for grep 2.5.4, grep 2.25 and grep 3.0
AS
[diffs.tar.gz (application/x-gzip, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 07:40:01 GMT)
Message #19 received at 25749-done <at> debbugs.gnu.org (full text, mbox):
Hi,
Thanks for the explanation. However, it does not explain why grep 2.5.4
has no problem with this file.
With best wishes,
Alexey
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 07:54:01 GMT)
Message #22 received at 25749-done <at> debbugs.gnu.org (full text, mbox):
Alexey Shipunov wrote:
> it does not explain why grep 2.5.4
> has no problem with this file
Your test case relies on undefined behavior. In such cases, the behavior might
be what you want, and it might not. Although 2.5.4 happened to work the way you
wanted, its behavior was not guaranteed and had some other downsides, which is
why it was changed.
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 22:08:01 GMT)
Message #25 received at 25749 <at> debbugs.gnu.org (full text, mbox):
On 02/16/2017 10:10 AM, Alexey Shipunov wrote:
> I wonder how much work it would take to invent a new option, say
> --binary-text-strict, which would cause grep to stop with an error and
> not output anything...
Something like that shouldn't be hard, although "do not output anything"
is too simple, as we can't expect grep to make two passes through the
whole input, which means that it could already have output something
before it discovers the encoding error.
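To make the two-pass point concrete, here is a minimal sketch of what "stop with an error and do not output anything" costs when done outside grep. The function name grep_strict is hypothetical and is not a grep option. It assumes the file is meant to be UTF-8, that iconv is available and rejects invalid input, and that the input is a regular file that can be read twice (so it cannot work on a pipe).

```shell
#!/bin/sh
# grep_strict PATTERN FILE  (hypothetical helper, not part of grep)
# Prints matching lines only if FILE is entirely valid UTF-8.
grep_strict() {
  pattern=$1
  file=$2
  # Pass 1: validate the whole file before printing anything.
  if ! iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
    echo "grep_strict: $file: invalid UTF-8, nothing searched" >&2
    return 2
  fi
  # Pass 2: only now run the actual search.
  grep -e "$pattern" -- "$file"
}
```

A call such as "grep_strict . usa-format.txt > 1" would then leave the output file empty and return a non-zero status instead of writing a partial result, at the price of reading the input twice.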
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 22:15:01 GMT)
Message #28 received at 25749 <at> debbugs.gnu.org (full text, mbox):
Yes, I understand. So maybe that proposed option should make grep go
through the data twice? That could be hard, though.
But reporting the error not (only) on stdout but (also) on stderr would
be really helpful!
AS
Information forwarded to bug-grep <at> gnu.org: bug#25749; Package grep. (Thu, 16 Feb 2017 22:26:02 GMT)
Message #31 received at 25749 <at> debbugs.gnu.org (full text, mbox):
On 02/16/2017 02:14 PM, Alexey Shipunov wrote:
> But reporting the error not (only) on stdout but (also) on stderr would
> be really helpful!
That sounds like a better idea.
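One way to approximate this from the outside is sketched below. The helper name grep_warn is hypothetical and is not a grep feature. It prints every matching line by using "grep -a" and, if the file is not valid UTF-8, writes a warning to standard error so that the notice is not buried in redirected output. It assumes the input should be UTF-8 and that iconv rejects invalid sequences.

```shell
#!/bin/sh
# grep_warn PATTERN FILE  (hypothetical helper, not part of grep)
# Matching lines go to stdout; an encoding warning, if any, goes to stderr.
grep_warn() {
  pattern=$1
  file=$2
  # Warn on stderr if the file contains bytes that are not valid UTF-8.
  if ! iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
    echo "grep_warn: $file: contains bytes that are not valid UTF-8" >&2
  fi
  # Search the file as text so that no matching line is suppressed.
  grep -a -e "$pattern" -- "$file"
}
```

With "grep_warn . usa-format.txt > 2", the file 2 receives all matching lines while the warning still appears on the terminal.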
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Mar 2017 11:24:04 GMT)
This bug report was last modified 8 years and 93 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.