GNU bug report logs - #19230
Help! grepV2.21 treats ISO-8859 text files as if they are binary

Package: grep

Reported by: Hans Pelleboer <hanspelleboer <at> online.nl>

Date: Sun, 30 Nov 2014 19:10:02 UTC

Severity: normal

Merged with 19985, 20526, 21558

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #11 received at 19230 <at> debbugs.gnu.org (full text, mbox):

From: Hans Pelleboer <hanspelleboer <at> online.nl>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 19230 <at> debbugs.gnu.org
Subject: Re: bug#19230: Help! grepV2.21 treats ISO-8859 text files as if they
 are binary
Date: Mon, 01 Dec 2014 08:57:52 +0100

I think you nailed it, Paul:

OS: Arch Linux / kernel 3.17.4 / x86_64, locale is set to UTF-8
grep came straight from the Arch repository.

Since grep 2.20 still showed the `old', more forgiving behaviour,
I was wondering what can be done to compile grep in such a way
that it processes all text files, no matter how they are encoded.
After all sed, vi, emacs, the works, do just that.

Yours,

hansp

On 11/30/2014 11:02 PM, Paul Eggert wrote:
> Hans Pelleboer wrote:
>
>> Binary file <NAME_FILE> matches
>>
>> Further tests showed, that grep only behaved this way with text
>> files that were encoded according to ISO-8859 (There may be more!).
>
> What operating system are you running on, and how did you build or 
> import grep?
>
> Also, what's your locale?  What is the output of the shell command 
> 'locale'?
>
> I can see this happening if you are using a UTF-8 locale, as in 
> general ISO-8859 is not valid UTF-8 text.  Older versions of 'grep' 
> were less picky in this area, and that might explain the symptoms you 
> observed.  With newer versions it's more important for the locale to 
> be compatible with the text file's encoding.
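
To illustrate Paul's point that ISO-8859 text is in general not valid UTF-8, here is a minimal Python sketch. It is not how grep is implemented; the helper names `is_valid_utf8` and `grep_bytes` and the sample text are hypothetical, chosen only for illustration. The first helper checks whether a byte string decodes as UTF-8, and the second searches raw bytes without caring about the encoding, which roughly mimics the older, more forgiving behaviour.

```python
from typing import List

# Illustrative only: these helpers are hypothetical and are not part of grep.


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def grep_bytes(pattern: bytes, data: bytes) -> List[bytes]:
    """Return the lines containing pattern, comparing raw bytes only."""
    return [line for line in data.splitlines() if pattern in line]


# Assumed sample text; the same characters encoded two different ways.
text = "café au lait\nplain ascii line\n"
latin1_bytes = text.encode("iso-8859-1")  # 'é' becomes the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # 'é' becomes the two bytes 0xC3 0xA9

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False: 0xE9 must be followed by continuation bytes
print(grep_bytes(b"lait", latin1_bytes))
```

In a UTF-8 locale, the lone 0xE9 byte is the kind of encoding error that can make a file look binary to a stricter grep, while a byte-wise search like `grep_bytes` still finds the line.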





This bug report was last modified 9 years and 194 days ago.

