GNU bug report logs - #21558
checking for a binary file is not deterministic

Package: grep

Reported by: Benno Schulenberg <bensberg <at> justemail.net>

Date: Fri, 25 Sep 2015 09:12:01 UTC

Severity: normal

Merged with 19230, 19985, 20526

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Benno Schulenberg <bensberg <at> justemail.net>
To: Grep <bug-grep <at> gnu.org>
Subject: checking for a binary file is not deterministic
Date: Fri, 25 Sep 2015 11:11:06 +0200

Hi,

When piping a certain diff into grep-2.21, it sometimes thinks
it is a binary file, and sometimes treats it as text.  The latter
behaviour is expected and desired.  I think grep should never
consider standard input to be binary.

For lack of a simple recipe, here is the actual use case:

  wget http://http.debian.net/debian/pool/main/g/gtkorphan/gtkorphan_0.4.4.orig.tar.gz
  tar -xf gtkorphan_0.4.4.orig.tar.gz
  cd gtkorphan-0.4.4/
  mkdir fresh
  # rsync does not work for this location, so fetch each file with wget:
  for lang in pt_BR bg zh_CN hr cs da nl eo fi fr de hu id it lv pl ru sr sv vi;  do \
    wget http://translationproject.org/PO-files/$lang/gtkorphan-0.4.3.$lang.po -O fresh/$lang.po; \
  done

  diff -ur po fresh | /usr/local/bin/grep "Only in" | grep "fi"

That last command sometimes outputs:

  Only in fresh: fi.po
  Only in po: Makefile.in.in

and sometimes:

  Binary file (standard input) matches

(If you can't get the second output, try hitting Enter a few times
and then running the command again, and again, and again.  If you
still can't get both outputs, try using the en_US.utf8 locale.)


What seems to be happening is that sometimes grep will look
far enough to see the diff between po/fr.po and fresh/fr.po
(which contains some ISO8859-1 codes), and sometimes
not.  When deleting fresh/bg.po and fresh/de.po, grep will
always see those codes and will always consider the input
to be binary.
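
The dependence on how far grep looks can be illustrated with a
simplified model.  This is only a sketch under stated assumptions, not
grep's actual code: the names classify_prefix and sample, the sample
data, and the rule "any non-ASCII byte means binary" are hypothetical
stand-ins.  The prefix length plays the role of however many bytes
happened to be available when the input was examined.

```shell
# Simplified model (assumption, not grep's real heuristic): call the
# input "binary" when the first $1 bytes of stdin contain any byte
# outside 7-bit ASCII.
classify_prefix() {
    # Delete all ASCII bytes from the prefix and count what is left.
    count=$(( $(head -c "$1" | LC_ALL=C tr -d '\000-\177' | wc -c) ))
    if [ "$count" -gt 0 ]; then echo binary; else echo text; fi
}

# Hypothetical sample: an ASCII line, then a line with one
# ISO 8859-1 byte (octal 351, e-acute).
sample() {
    printf 'Only in fresh: fi.po\n'
    printf 'caf\351\n'
}

short=$(sample | classify_prefix 10)    # sees only ASCII bytes
long=$(sample | classify_prefix 100)    # sees the whole input
echo "short prefix: $short"
echo "long prefix:  $long"
```

The same input yields "text" for the short prefix and "binary" for the
long one, which is the kind of variation seen above when the amount of
data available to grep differs from run to run.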

I can of course use -a to force grep to see standard input
as text, but still... I think determining whether a file
is text or binary should be deterministic: it should always
yield the same result when the input is the same.
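
For reference, the -a workaround looks like this on a small,
hypothetical input (the sample bytes are an assumption made up for
illustration, not taken from the real diff):

```shell
# Hypothetical input: one ASCII line and one line containing an
# ISO 8859-1 byte (octal 351).  With -a (--text), grep treats the
# input as text regardless of such bytes.
out=$(printf 'Only in fresh: fi.po\ncaf\351\n' | grep -a "Only in")
echo "$out"
# prints: Only in fresh: fi.po
```

With -a the matching line itself is printed instead of a "Binary file
... matches" notice, no matter how much of the input grep has read.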


$ /usr/local/bin/grep --version | head -1
/usr/local/bin/grep (GNU grep) 2.21

$ grep --version | head -1
grep (GNU grep) 2.21

$ diff --version | head -1
diff (GNU diffutils) 2.8.1

$ locale
LANG=eo.utf8
LANGUAGE=en
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=

Benno






This bug report was last modified 9 years and 139 days ago.

