GNU bug report logs - #25749
grep 3.0 skips "binary" lines in ssconvert output

Package: grep

Reported by: Alexey Shipunov <dactylorhiza <at> gmail.com>

Date: Thu, 16 Feb 2017 05:01:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Alexey Shipunov <dactylorhiza <at> gmail.com>
Subject: bug#25749: closed (Re: bug#25749: grep 3.0 skips "binary" lines
 in ssconvert output)
Date: Thu, 16 Feb 2017 07:12:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#25749: grep 3.0 skips "binary" lines in ssconvert output

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 25749 <at> debbugs.gnu.org.

-- 
25749: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=25749
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Alexey Shipunov <dactylorhiza <at> gmail.com>, 25749-done <at> debbugs.gnu.org
Subject: Re: bug#25749: grep 3.0 skips "binary" lines in ssconvert output
Date: Wed, 15 Feb 2017 23:11:04 -0800
When I tried to read that attachment, gedit complained "There was a problem 
opening" it, and then "The file you opened has some invalid characters. If you 
continue editing this file you could corrupt this document. You can also choose 
another character encoding and try again." So it is not only "grep" that is 
having problems with the file.

Looking into it further, the file contains a non-text byte in line 13676, in the 
string "1@8MI W OF RALEIGH", where the "@" denotes a byte with octal value 233. 
This is invalid UTF-8 text. You can work around the issue by replacing the 
non-text byte with a valid character, or by using "grep -a" as you noted, or by 
setting the LC_ALL environment variable to "C", or by using a grep pattern that 
does not match the non-text line.
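
The workarounds above can be sketched on a small stand-in file. This is only an
illustration under stated assumptions: sample.txt and the variable names are
hypothetical, and the three-line sample merely imitates the reported line by
embedding a single byte with octal value 233; it is not the attached file.

```shell
# Build a hypothetical three-line sample; line 2 contains the byte with
# octal value 233, which is not valid UTF-8 on its own.
printf 'first line\n1\2338MI W OF RALEIGH\nlast line\n' > sample.txt

# Workaround: -a makes grep treat the input as text, so no line is skipped.
count_text=$(grep -a -c . sample.txt)

# Workaround: in the C locale every byte counts as a character.
count_c=$(LC_ALL=C grep -c . sample.txt)

# Locate the offending line by searching for the raw byte itself.
bad_line=$(LC_ALL=C grep -a -n "$(printf '\233')" sample.txt | LC_ALL=C cut -d: -f1)

echo "lines with -a: $count_text, lines in C locale: $count_c, bad line: $bad_line"
```

Once the offending line is known, the byte can be replaced with a valid
character, which is the first workaround mentioned above.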

[Message part 3 (message/rfc822, inline)]
From: Alexey Shipunov <dactylorhiza <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep 3.0 skips "binary" lines in ssconvert output
Date: Wed, 15 Feb 2017 22:36:36 -0600
[Message part 4 (text/plain, inline)]
Dear Madam or Sir,

That problem almost ruined my work today.

I made the following note to myself, but you might also be interested:

===
current grep (2.25) is much faster than 2.5.4 from Lucid but SKIPS
"binary" lines in ssconvert output; freshly compiled grep 3.0 skips
fewer but still does it. Workaround: look for the "binary match" phrase at
the end of the file and apply grep -a. Report to
https://www.gnu.org/software/grep/manual/html_node/Reporting-Bugs.html
?

===

The file in question (gzipped) is attached.

My system:

===
$ uname -a
Linux ... 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
===

Commands which reproduce the problem:

===
grep . usa-format.txt > 1
grep -a . usa-format.txt > 2
diff 1 2
===

Again, the problem exists with both the Ubuntu Xenial default grep 2.25
and the new grep 3.0.

With best wishes,

Alexey Shipunov
[usa-format.txt.gz (application/x-gzip, attachment)]

This bug report was last modified 8 years and 93 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.