GNU bug report logs - #19985
active locale impacts binary data detection?

Package: grep

Reported by: Mike Frysinger <vapier <at> gentoo.org>

Date: Tue, 3 Mar 2015 02:01:01 UTC

Severity: normal

Merged with 19230, 20526, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19985 in the body.
You can then email your comments to 19985 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#19985; Package grep. (Tue, 03 Mar 2015 02:01:01 GMT)

Acknowledgement sent to Mike Frysinger <vapier <at> gentoo.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 03 Mar 2015 02:01:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mike Frysinger <vapier <at> gentoo.org>
To: bug-grep <at> gnu.org
Cc: proteuss <at> sdf.lonestar.org
Subject: active locale impacts binary data detection?
Date: Mon, 2 Mar 2015 20:59:51 -0500
i've got some users reporting different behavior between grep 2.20 and 2.21.  the example
file is attached and has mixed encoding.  i think the new behavior is correct, 
but want to make sure it's expected, or that the maintainers don't have 
different ideas here.

with 2.20:
$ LC_ALL=en_US.UTF8 grep 8 test-mixed
  852  cd  ΘΕΜΑΤΑ\ ΠΑΝΕΛΛΗΝΙΩΝ/

with 2.21:
$ LC_ALL=en_US.UTF8 grep 8 test-mixed
Binary file test-mixed matches
$ LC_ALL=en_US.UTF8 grep -a 8 test-mixed
  852  cd  ΘΕΜΑΤΑ\ ΠΑΝΕΛΛΗΝΙΩΝ/
$ LC_ALL=C grep 8 test-mixed
  852  cd  ΘΕΜΑΤΑ\ ΠΑΝΕΛΛΗΝΙΩΝ/
-mike
[test-mixed.gz (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#19985; Package grep. (Tue, 03 Mar 2015 02:32:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-grep <at> gnu.org, proteuss <at> sdf.lonestar.org
Subject: Re: bug#19985: active locale impacts binary data detection?
Date: Mon, 02 Mar 2015 18:31:27 -0800
The new behavior is expected, and this is mentioned in the NEWS file:

  If a file contains data improperly encoded for the current locale,
  and this is discovered before any of the file's contents are output,
  grep now treats the file as binary.

In some cases one can get the old behavior with 'grep -a'.
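
To illustrate what "improperly encoded for the current locale" means in practice, here is a minimal Python sketch that reports the offset of the first byte sequence that is not valid UTF-8. It is not grep's implementation (grep is written in C and uses the C library's multibyte functions); the function name and the sample data are made up for illustration, and the ISO-8859-7 tail only stands in for the "mixed encoding" of the attached file.

```python
from typing import Optional


def first_encoding_error(data: bytes, encoding: str = "utf-8") -> Optional[int]:
    """Return the byte offset of the first undecodable sequence, or None.

    Hypothetical helper: it approximates the notion of an encoding error
    with Python's codec machinery, not with grep's own checks.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start  # offset where decoding failed
    return None


# A line that is valid UTF-8 ...
sample_ok = "852  cd  ΘΕΜΑΤΑ\n".encode("utf-8")
# ... followed by Greek text in a legacy single-byte encoding (assumed here).
sample_mixed = sample_ok + "ΘΕΜΑΤΑ\n".encode("iso-8859-7")

print(first_encoding_error(sample_ok))
print(first_encoding_error(sample_mixed))
```

In a UTF-8 locale the second sample would count as improperly encoded, whereas in the C locale every byte is a valid character, which is consistent with the LC_ALL=C run in the original report printing the line.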

This is not the first time the problem has been reported.  Please see:

http://bugs.gnu.org/19230

If the problem occurs often enough, perhaps we should change grep's behavior. 
For example, perhaps grep should fall back on the C locale if the first part of 
a file contains an encoding error but no NUL bytes.
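
The fallback floated above can be sketched as follows. This is only a rough Python model of the idea, not code from grep: the function name, the return labels, and the choice of UTF-8 as "the current locale" are all assumptions made for the example.

```python
def classify_prefix(prefix: bytes, encoding: str = "utf-8") -> str:
    """Classify the first part of a file under the proposed heuristic.

    Returns one of three illustrative labels:
      "binary"          - a NUL byte is present
      "text"            - the prefix decodes cleanly in the given encoding
      "text-c-locale"   - encoding error but no NUL: treat as text,
                          falling back to byte-wise (C locale) matching
    """
    if b"\0" in prefix:
        return "binary"
    try:
        prefix.decode(encoding)
    except UnicodeDecodeError:
        return "text-c-locale"
    return "text"


print(classify_prefix(b"plain ascii\n"))
print(classify_prefix(b"caf\xe9\n"))      # Latin-1 byte, invalid as UTF-8
print(classify_prefix(b"ELF\0\x01\x02"))  # contains a NUL byte
```

One caveat of this sketch: a prefix that ends in the middle of a multibyte character is reported as an encoding error, so a real implementation would need to handle a truncated trailing sequence separately.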




Merged 19230 19985 20526. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sat, 30 May 2015 20:05:06 GMT)

Merged 19230 19985 20526 21558. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Fri, 25 Sep 2015 18:05:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 06 Feb 2016 12:24:04 GMT)
