GNU bug report logs - #20526
BUG: text file is detected as binary

Package: grep;

Reported by: Sebastian Poehn <sebastian.poehn <at> gmail.com>

Date: Thu, 7 May 2015 15:41:03 UTC

Severity: normal

Merged with 19230, 19985, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Benno Schulenberg <bensberg <at> justemail.net>
Subject: bug#21558: closed (Re: grep BUG: text file is detected as binary)
Date: Thu, 31 Dec 2015 03:26:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#20526: checking for a binary file is not deterministic

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 21558 <at> debbugs.gnu.org.

-- 
20526: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20526
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 20526-done <at> debbugs.gnu.org
Cc: Kamil Dudka <kdudka <at> redhat.com>, Benno Schulenberg <bensberg <at> justemail.net>,
 Mike Frysinger <vapier <at> gentoo.org>, Johannes Meixner <jsmeix <at> suse.de>,
 Hans Pelleboer <hanspelleboer <at> online.nl>,
 Sebastian Poehn <sebastian.poehn <at> gmail.com>,
 Ángel González <angel <at> re.16bits.net>,
 Eric Blake <eblake <at> redhat.com>
Subject: Re: grep BUG: text file is detected as binary
Date: Wed, 30 Dec 2015 19:25:04 -0800
[Message part 3 (text/plain, inline)]
I installed into Savannah a patch (attached) that should fix this problem in 
typical cases, and am boldly marking the bug as done. Please give the fix a try 
if you have the time. Thanks.
[0001-grep-be-less-picky-about-encoding-errors.patch (text/x-diff, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Benno Schulenberg <bensberg <at> justemail.net>
To: Grep <bug-grep <at> gnu.org>
Subject: checking for a binary file is not deterministic
Date: Fri, 25 Sep 2015 11:11:06 +0200
Hi,

When piping a certain diff into grep-2.21, it sometimes thinks
it is a binary file, and sometimes treats it as text.  The latter
behaviour is expected and desired.  I think grep should never
consider standard input to be binary.

For lack of a simple recipe, here is the actual use case:

  wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
  tar -xf gtkorphan_0.4.4.orig.tar.gz
  cd gtkorphan-0.4.4/
  mkdir fresh
  # rsync does not work for this location, so fetch each PO file with wget:
  for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi;  do \
    wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
  done

  diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"

That last command sometimes outputs:

  Only in fresh: fi.po
  Only in po: Makefile.in.in

and sometimes:

  Binary file (standard input) matches

(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again.  If you
still can't get both outputs, try using the en_US.utf8 locale.)
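Instead of re-running the pipeline by hand, a small loop can count how often each distinct output occurs. This is a sketch that assumes a POSIX shell with `sort` and `uniq`. `tally_runs` is a hypothetical helper and is not part of grep.

```shell
# Hypothetical helper: run a command N times, collapse each run's output
# onto one line, and count how many times each distinct output occurred.
tally_runs() {
  n=$1; shift
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" | tr '\n' ' '   # join this run's output lines into one line
    echo
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, `tally_runs 50 sh -c 'diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"'` should show a single line if the result is deterministic. If it shows two lines, both outputs described above occurred.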


What seems to be happening is that sometimes grep looks
far enough to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO 8859-1 codes), and sometimes
not.  When I delete fresh/bg.po and fresh/de.po, grep
always sees those codes and always considers the input
to be binary.
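You can check whether a stream contains byte sequences that are invalid UTF-8, such as the ISO 8859-1 codes in fr.po, by passing it through iconv, which rejects malformed input. This is a sketch that assumes an iconv (glibc or GNU libiconv) that exits with a non-zero status on an illegal input sequence. `is_valid_utf8` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if standard input is valid UTF-8.
# Assumes iconv exits non-zero when it meets an illegal input sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 'caf\351' is "café" encoded in ISO 8859-1; the lone \351 byte is not valid UTF-8.
if printf 'caf\351\n' | is_valid_utf8; then
  echo "valid UTF-8"
else
  echo "invalid UTF-8"
fi
```

Applied to the report's input, `diff -ur po fresh | is_valid_utf8 || echo "non-UTF-8 bytes present"` shows whether the diff contains such bytes at all.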

I can of course use -a to force grep to treat standard input
as text, but still... I think that determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
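The -a workaround (long form `--binary-files=text`) is shown below on a small self-contained input. `sample_diff` is a hypothetical stand-in for the real diff output. It includes one line with the ISO 8859-1 byte \351, which is an encoding error in a UTF-8 locale.

```shell
# Hypothetical stand-in for the diff output: one line carries the
# ISO 8859-1 byte \351 ("é"), invalid in a UTF-8 locale.
sample_diff() {
  printf 'Only in fresh: fi.po\n'
  printf '+msgstr "caf\351"\n'
  printf 'Only in po: Makefile.in.in\n'
}

# -a makes grep treat its input as text, so matching lines are printed
# even though the input contains bytes invalid in the current locale.
sample_diff | grep -a "Only in"   # prints the two "Only in" lines

# The same option applied to the report's pipeline:
#   diff -ur po fresh | /usr/local/bin/grep -a "Only in" | grep "fi"
```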


$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21

$ grep --version | head -1
grep (GNU grep) 2.21

$ diff --version | head -1
diff (GNU diffutils) 2.8.1

$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=

Benno
