GNU bug report logs - #20526
BUG: text file is detected as binary

Package: grep

Reported by: Sebastian Poehn <sebastian.poehn <at> gmail.com>

Date: Thu, 7 May 2015 15:41:03 UTC

Severity: normal

Merged with 19230, 19985, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, 20526 <at> debbugs.gnu.org, sebastian.poehn <at> gmail.com
Cc: Johannes Meixner <jsmeix <at> suse.de>, Kamil Dudka <kdudka <at> redhat.com>, Benno Schulenberg <bensberg <at> justemail.net>
Subject: bug#20526: grep BUG: text file is detected as binary
Date: Thu, 31 Dec 2015 01:29:35 -0800

Jim Meyering wrote:
> The combination of this and the grep -oP infloop fix make this look
> like a good time for a bug-fix release. If there are any other pending
> bug fixes or small+safe changes people would like to see included,
> please let us know.

I have one major qualm about this: since 'grep' no longer checks whether the 
input is correctly encoded, I expect this may hurt -P performance significantly 
(though it may help non -P performance). This is because PCRE is slow at 
checking whether input data are valid UTF-8. I just now did a brief check and 
found one major performance issue:

grep -rP 'fed.*cba' .

On my machine the above command is 125x slower with the new grep than the old 
one, which suggests some tuning is in order before releasing. (It's bogged down 
inside libpcre somewhere.)
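
One way to check whether the slowdown really comes from UTF-8 handling is to time the same command under a UTF-8 locale and under the C locale, where -P matching should not need UTF-8 validation. The sketch below is only an illustration: the helper name time_grep is hypothetical, the locale comparison is an assumption about how to isolate the cost rather than something measured in this report, and en_US.UTF-8 is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): time one recursive -P search.
# Usage: time_grep LOCALE DIR
time_grep() {
    locale_name=$1
    dir=$2
    # date +%s gives whole seconds; widely available, though not strictly POSIX.
    start=$(date +%s)
    # -r: recurse into DIR; -P: Perl-compatible regex (needs a PCRE-enabled grep).
    # Output is discarded because only the elapsed time is of interest here.
    LC_ALL=$locale_name grep -rP 'fed.*cba' "$dir" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$locale_name: $((end - start))s (grep exit status $status)"
}
```

Running "time_grep en_US.UTF-8 ." and then "time_grep C ." in the same tree, once with each grep build, would show whether the gap between old and new grep appears only in the UTF-8 locale. An exit status of 2 means grep reported an error (for example, no -P support), so that timing should be ignored.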

Since you wrote your email I have triaged the outstanding bugs, except for 
those where patches are available. Those are mostly performance-related, and 
I expect some of them will be relevant to the -P slowdown.




This bug report was last modified 9 years and 138 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.