GNU bug report logs -
#29668
grep: Fatal problem with (big) file
Reported by: pg <pasi.vitsa <at> yahoo.com>
Date: Mon, 11 Dec 2017 22:03:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #20 received at 29668 <at> debbugs.gnu.org (full text, mbox):
On Wed, 13 Dec 2017 16:03:57 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 12/13/2017 03:25 PM, Norihiro Tanaka wrote:
> > I don't think that's the problem. The user passes the output of grep to wc -l,
> > so the `Binary file ... matches' line is also counted by `wc' as one line.
>
> The intent of 'grep PATTERN | wc -l' is to count the number of matches, like 'grep -c PATTERN' would. But it doesn't work that way here. E.g., on Fedora 27 with LANG=en_US.UTF-8:
>
> $ grep -c Volvo Tieliikenne5.0.csv
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> 241264
> $ grep Volvo Tieliikenne5.0.csv | tail -n 1
> Binary file Tieliikenne5.0.csv matches
>
> If the "Binary file ... matches" line were sent to stderr instead of to stdout, the problem would be more obvious to the user:
>
> $ grep -c Volvo Tieliikenne5.0.csv
> 266175
> $ grep Volvo Tieliikenne5.0.csv | wc -l
> Binary file Tieliikenne5.0.csv matches
> 241264
> $ grep Volvo Tieliikenne5.0.csv | tail -n 1
> Binary file Tieliikenne5.0.csv matches
> T;2017-09-29;75;01;;;19550000;;;;;1;1570;;3000;2595;1670;;01;2200;20.6;4;false;false;Volvo;;;;;01;;01;977;;;841;;5092946
>
> I believe that in the past I've thought that the "Binary file" message should be sent to stdout, but these examples are a reasonably compelling reason to send them to stderr instead.
In addition, the following problem can also occur.
$ printf 'Binary file a.txt matches\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches
$ printf '\xFFB\n' >a.txt
$ env LC_ALL=en_US.utf8 grep B a.txt
Binary file a.txt matches
Both produce the same output. However, the former displays the contents of the
matched line, whereas the latter does not. If "Binary file" is sent to stdout,
a user cannot distinguish whether a.txt is a text file or a binary file
without opening the file.
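For anyone who wants to reproduce the count mismatch discussed above without the original CSV, here is a minimal sketch. The file name sample.csv and its contents are made up for illustration, and a NUL byte is used (rather than an encoding error) so that grep treats the file as binary in any locale. It compares 'grep -c' with 'grep -a ... | wc -l'; the -a (--text) option tells grep to process the file as text.

```shell
# Sketch only: sample.csv and its contents are hypothetical test data.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Three matching lines; the NUL byte on the second line makes grep
# classify the file as binary regardless of locale.
printf 'Volvo;1\nVolvo;\000;2\nVolvo;3\n' >sample.csv

# -c counts matching lines whether or not the file looks binary.
count_c=$(grep -c Volvo sample.csv)

# -a (--text) treats the file as text, so every matching line is printed
# and no "Binary file ... matches" line is mixed into the output.
# tr strips the padding that some wc implementations add.
count_a=$(grep -a Volvo sample.csv | wc -l | tr -d ' ')

echo "grep -c:         $count_c"
echo "grep -a | wc -l: $count_a"
```

With -a the two counts should agree, because nothing other than matched lines reaches the pipe. Without -a, the count from 'wc -l' depends on where the binary-file message is written, which is the point under discussion.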
This bug report was last modified 4 years and 238 days ago.