GNU bug report logs - #29668
grep: Fatal problem with (big) file

Package: grep

Reported by: pg <pasi.vitsa <at> yahoo.com>

Date: Mon, 11 Dec 2017 22:03:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #14 received at 29668 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 29668 <at> debbugs.gnu.org, toimitus <at> masinistit.com, webmaster <at> ubuntu.com,
 pg <pasi.vitsa <at> yahoo.com>
Subject: Re: bug#29668: grep: Fatal problem with (big) file
Date: Thu, 14 Dec 2017 08:25:26 +0900

On Tue, 12 Dec 2017 16:28:09 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> On 12/11/2017 03:36 PM, Norihiro Tanaka wrote:
> > Perhaps Tieliikenne 5.0.csv and volvot.csv contain characters that
> > cannot be recognized in your locale.
> 
> Yes, that's the problem. The original 'grep' output ended in "Binary file Tieliikenne5.0.csv matches" but the user didn't see that. Perhaps we should send that diagnostic to stderr as well.

I don't think that's the problem.  The user passes the output of grep to
wc -l, so the `Binary file ... matches' line is also counted by `wc' as
one line.

$ env LC_ALL=C grep 'Volvo' Tieliikenne\ 5.0.csv | wc -l
266175
$ env LC_ALL=en_US.utf8 grep 'Volvo' Tieliikenne\ 5.0.csv | wc -l
241264
$ env LC_ALL=en_US.utf8 grep 'Volvo' Tieliikenne\ 5.0.csv | tail -1
Binary file Tieliikenne 5.0.csv matches

$ env LC_ALL=C grep N3 volvot.csv | wc -l
17822
$ env LC_ALL=en_US.utf8 grep N3 volvot.csv | wc -l
11741
$ env LC_ALL=en_US.utf8 grep N3 volvot.csv | tail -1
Binary file volvot.csv matches
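
The lower counts above come from grep treating the input as binary in the
UTF-8 locale. As a minimal sketch of one way to get every matching line
counted, GNU grep's -a (--text) option can be used to process the file as
text. The sample file and the variable names below are hypothetical and
only stand in for the reporter's CSV files; the byte 0xE4 is an assumed
example of data that is not valid UTF-8.

```shell
# Hypothetical sample: three lines that contain "Volvo"; the second one also
# holds the byte 0xE4 (Latin-1 "a" with diaeresis), which is not valid UTF-8.
sample=$(mktemp)
printf 'Volvo 1\nVolvo \344\nVolvo 3\n' > "$sample"

# -a (--text) asks grep to treat the file as text, so each matching line is
# printed and no "Binary file ... matches" line takes its place.
# If the en_US.utf8 locale is not installed, grep falls back to the C locale.
count_text=$(env LC_ALL=en_US.utf8 grep -a 'Volvo' "$sample" | wc -l | tr -d ' ')
echo "matching lines with -a: $count_text"

rm -f "$sample"
```

Note that with -a the raw bytes of matching lines are written to the output,
which is harmless when piping to wc -l but may be unwanted on a terminal.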





This bug report was last modified 4 years and 239 days ago.

