GNU bug report logs -
#20526
BUG: text file is detected as binary
[Message part 1 (text/plain, inline)]
Your message dated Wed, 30 Dec 2015 19:25:04 -0800
with message-id <5684A010.4000302 <at> cs.ucla.edu>
and subject line Re: grep BUG: text file is detected as binary
has caused the debbugs.gnu.org bug report #20526,
regarding checking for a binary file is not deterministic
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
20526: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20526
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
Hi,
When piping a certain diff into grep-2.21, it sometimes thinks
it is a binary file, and sometimes treats it as text. The latter
behaviour is expected and desired. I think grep should never
consider standard input to be binary.
For lack of a simple recipe, here is the actual use case:
wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
tar -xf gtkorphan_0.4.4.orig.tar.gz
cd gtkorphan-0.4.4/
mkdir fresh
# rsync does not work for this location, so fetch each file with wget:
for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi; do \
wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
done
diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"
That last command sometimes outputs:
Only in fresh: fi.po
Only in po: Makefile.in.in
and sometimes:
Binary file (standard input) matches
(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again. If you
still can't get both outputs, try using the en_US.utf8 locale.)
What seems to be happening is that sometimes grep will look
far enough to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO-8859-1 bytes), and sometimes
not. After deleting fresh/bg.po and fresh/de.po, grep will
always see those bytes and will always consider the input
to be binary.
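The effect described above can be illustrated with a toy model. This is only a sketch, not grep's actual algorithm: it assumes the text/binary decision is made from however many bytes happen to be available in the first read from the pipe. The helper high_bytes_in_prefix and the sample input are hypothetical and exist only for this illustration.

```shell
# Toy model (assumption): the classification looks only at the first N bytes.
# high_bytes_in_prefix is a hypothetical helper, not part of grep.
high_bytes_in_prefix() {
    # Keep the first $1 bytes of stdin, delete all 7-bit ASCII bytes,
    # and count what is left (bytes >= 0x80).
    head -c "$1" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Two lines: a pure-ASCII line, then a line with the ISO-8859-1 byte 0xE9,
# which is not valid UTF-8 on its own.
sample() { printf 'Only in fresh: fi.po\ncaf\351\n'; }

short=$(sample | high_bytes_in_prefix 21)    # prefix ends after line 1
long=$(sample | high_bytes_in_prefix 4096)   # prefix covers the whole input

echo "high bytes seen in a 21-byte prefix:   $short"
echo "high bytes seen in a 4096-byte prefix: $long"
```

Under this model the same input looks like text when only the short prefix is inspected and like binary when the whole input is, which would match the symptom if the amount of data delivered by the pipe varies from run to run.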
I can of course use -a to force grep to treat standard input
as text, but still... I think determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
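For reference, a minimal sketch of the -a workaround on a small stand-in input (the sample text is made up for illustration and is not the real diff output). With -a, grep processes the input as text regardless of any bytes that are invalid in the current locale:

```shell
# Stand-in input (assumption): one ASCII line that matches, followed by a
# line containing the ISO-8859-1 byte 0xE9.
sample() { printf 'Only in fresh: fi.po\ncaf\351\n'; }

# -a (--text) makes grep treat the input as text, so the matching line is
# printed instead of a "Binary file ... matches" message.
out=$(sample | grep -a "Only in")
echo "$out"
```

This sidesteps the detection entirely, so it does not depend on how much of the input grep has read when it decides.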
$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21
$ grep --version | head -1
grep (GNU grep) 2.21
$ diff --version | head -1
diff (GNU diffutils) 2.8.1
$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=
Benno
--
http://www.fastmail.com - Accessible with your email software
or over the web
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
I installed into Savannah a patch (attached) that should fix this problem in
typical cases, and am boldly marking the bug as done. Please give the fix a try
if you have the time. Thanks.
[0001-grep-be-less-picky-about-encoding-errors.patch (text/x-diff, attachment)]
This bug report was last modified 9 years and 138 days ago.