GNU bug report logs - #19242
latest grep considers text files as binary

Package: grep

Reported by: Thomas Wolff <towo <at> computer.org>

Date: Mon, 1 Dec 2014 18:02:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #16 received at 19242 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Thomas Wolff <towo <at> computer.org>
Cc: Jim Meyering <meyering <at> fb.com>, Paul Eggert <eggert <at> cs.ucla.edu>,
 19242 <at> debbugs.gnu.org
Subject: Re: bug#19242: latest grep considers text files as binary
Date: Fri, 5 Dec 2014 07:00:21 -0800

On Fri, Dec 5, 2014 at 1:58 AM, Thomas Wolff <towo <at> computer.org> wrote:
> Paul Eggert wrote:
>>>
>>> the mentioned patches are apparently intended to fix issues in non-UTF-8
>>> locales.
>>
>> No, they're also needed for UTF-8 locales I'm afraid.  There are some
>> security issues, not only having to do with grep's internals, but also for
>> the behavior of downstream programs that may be expecting UTF-8 text.
>>
>> You can work around the problem with 'grep -a'.
>
> I was aware of this workaround but I claim it should not be needed because
> the files affected are in fact not binary files but text files. The manual
> clearly says about -a: "Process a binary file as if it were text" but
> partial content in a different text encoding does not make a file binary.
>
> Jim Meyering wrote:
>>
>>   this is due to documented and desirable behavior.
>
> I deny this is desirable behavior and I doubt there is a security issue as
> described. If any other, independent software has a security issue with
> non-UTF-8 input, it should decide itself to filter it and use accordingly
> stable decoding functions. It cannot be the task of any tool (grep in this
> case) to filter output to work around possible security issues in other
> programs in a pipe. This would be completely against the concept of pipes in
> the Unix tradition.

This is another side effect of using a multibyte locale.
As long as there are no NUL bytes in your input, you can work
around the issue by running grep in the C locale:

  LC_ALL=C grep ...
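
The two workarounds mentioned in this thread ('grep -a' and running grep in the C locale) can be tried side by side with a small sketch like the one below. The sample file, its contents, and the variable names are hypothetical and chosen only for illustration. What the plain grep call prints depends on the grep version and on the active locale, so no particular output is claimed for it.

```shell
#!/bin/sh
# Sketch only: the file name and contents are made up for illustration.
tmp=$(mktemp -d)
sample="$tmp/sample.txt"

# One plain ASCII line, plus one line containing the single byte 0xE9
# (Latin-1 e-acute), which is not a valid UTF-8 sequence on its own.
# The file contains no NUL bytes.
printf 'plain ascii line\ncaf\351 line\n' > "$sample"

# Plain invocation. In a UTF-8 locale, an affected grep may report a
# binary-file match here instead of printing every matching line.
grep 'line' "$sample"

# Workaround 1: -a processes the file as if it were text.
grep -a 'line' "$sample"
a_count=$(grep -a -c 'line' "$sample")

# Workaround 2: run grep in the C locale, where the stray byte is not
# an encoding error. This relies on the input having no NUL bytes.
LC_ALL=C grep 'line' "$sample"
c_count=$(LC_ALL=C grep -c 'line' "$sample")

echo "matching lines with -a:       $a_count"
echo "matching lines with LC_ALL=C: $c_count"

rm -rf "$tmp"
```

Note that the C locale also changes how patterns are matched: multibyte characters are then treated as individual bytes, so a pattern that depends on UTF-8 character classes may behave differently.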



