GNU bug report logs - #20526
BUG: text file is detected as binary

Package: grep

Reported by: Sebastian Poehn <sebastian.poehn <at> gmail.com>

Date: Thu, 7 May 2015 15:41:03 UTC

Severity: normal

Merged with 19230, 19985, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#21558: closed (checking for a binary file is not deterministic)
Date: Thu, 31 Dec 2015 03:26:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Wed, 30 Dec 2015 19:25:04 -0800
with message-id <5684A010.4000302 <at> cs.ucla.edu>
and subject line Re: grep BUG: text file is detected as binary
has caused the debbugs.gnu.org bug report #20526,
regarding checking for a binary file is not deterministic
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
20526: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20526
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Benno Schulenberg <bensberg <at> justemail.net>
To: Grep <bug-grep <at> gnu.org>
Subject: checking for a binary file is not deterministic
Date: Fri, 25 Sep 2015 11:11:06 +0200
Hi,

When piping a certain diff into grep-2.21, it sometimes thinks
it is a binary file, and sometimes treats it as text.  The latter
behaviour is expected and desired.  I think grep should never
consider standard input to be binary.

For lack of a simple recipe, here is the actual use case:

  wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
  tar -xf gtkorphan_0.4.4.orig.tar.gz
  cd gtkorphan-0.4.4/
  mkdir fresh
  # rsync does not work for this location, so fetch each file with wget:
  for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi;  do \
    wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
  done

  diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"

That last command sometimes outputs:

  Only in fresh: fi.po
  Only in po: Makefile.in.in

and sometimes:

  Binary file (standard input) matches

(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again.  If you
still can't get both outputs, try using the en_US.utf8 locale.)


What seems to be happening is that sometimes grep will look
far enough to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO8859-1 codes), and sometimes
not.  When deleting fresh/bg.po and fresh/de.po, grep will
always see those codes and will always consider the input
to be binary.

I can of course use -a to force grep to see standard input
as text, but still... I think that determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
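
The -a workaround mentioned above can be sketched as follows. This is
only an illustration, not part of the original recipe: the helper name
only_in and the sample input are assumptions made up for the example.
The -a option (equivalent to --binary-files=text) tells grep to treat
its input as text, so a stray ISO8859-1 byte in a UTF-8 locale should
no longer produce "Binary file (standard input) matches".

```shell
# Hypothetical helper (not from the report): keep the "Only in" lines of
# diff output read from standard input, always treating the input as text.
only_in() {
  # -a (--binary-files=text) makes grep process the input as text even
  # when it contains bytes that are invalid in the current locale.
  # $1 is a fixed string used to narrow the result further, e.g. "fi".
  grep -a "Only in" | grep -a -F -- "$1"
}

# Sample input (an assumption for illustration): one matching line plus a
# line holding a raw 0xE9 byte (ISO8859-1 "é"), which is not valid UTF-8.
printf 'Only in fresh: fi.po\n+caf\351\n' | only_in "fi"
```

With the real data, the equivalent call would be
"diff -ur po fresh | only_in fi". This sidesteps the symptom only; it
does not make grep's own text-versus-binary decision deterministic.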


$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21

$ grep --version | head -1
grep (GNU grep) 2.21

$ diff --version | head -1
diff (GNU diffutils) 2.8.1

$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=

Benno


[Message part 3 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 20526-done <at> debbugs.gnu.org
Cc: Kamil Dudka <kdudka <at> redhat.com>, Benno Schulenberg <bensberg <at> justemail.net>,
 Mike Frysinger <vapier <at> gentoo.org>, Johannes Meixner <jsmeix <at> suse.de>,
 Hans Pelleboer <hanspelleboer <at> online.nl>,
 Sebastian Poehn <sebastian.poehn <at> gmail.com>,
 Ángel González <angel <at> re.16bits.net>,
 Eric Blake <eblake <at> redhat.com>
Subject: Re: grep BUG: text file is detected as binary
Date: Wed, 30 Dec 2015 19:25:04 -0800
[Message part 4 (text/plain, inline)]
I installed into Savannah a patch (attached) that should fix this problem in 
typical cases, and am boldly marking the bug as done. Please give the fix a try 
if you have the time. Thanks.
[0001-grep-be-less-picky-about-encoding-errors.patch (text/x-diff, attachment)]

This bug report was last modified 9 years and 138 days ago.