GNU bug report logs - #23234
unexpected results with charset handling in GNU grep 2.23


Package: grep

Reported by: Björn JACKE <bjoern <at> j3e.de>

Date: Wed, 6 Apr 2016 20:45:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, Björn JACKE <bjoern <at> j3e.de>, 23234 <at> debbugs.gnu.org
Subject: bug#23234: unexpected results with charset handling in GNU grep 2.23
Date: Wed, 6 Apr 2016 18:28:33 -0700

On 04/06/2016 02:04 PM, Eric Blake wrote:
> POSIX ... says that LC_ALL=C is _required_ to treat all 256 byte 
> values as valid characters

Although that was the intent of POSIX, it's not what the current 
standard says, and it's not what many popular platforms do. Problematic 
platforms include Fedora 23, where mbrtowc reports an encoding error in 
the C locale when given a byte outside the range 0-127. This affects 
many programs other than 'grep'.
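
A minimal sketch of how one might observe this on a given platform: the
program below feeds every nonzero byte value to mbrtowc, one at a time,
and counts how many are not accepted as a complete character. The helper
name count_rejected_bytes and the DEMO_MAIN guard are illustrative only
and are not part of grep or gnulib.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the nonzero single-byte values that mbrtowc
   does not accept as a complete character in the current locale. */
static int count_rejected_bytes(void)
{
    int rejected = 0;

    for (int b = 1; b <= UCHAR_MAX; b++) {
        char c = (char) b;
        wchar_t wc;
        mbstate_t state;

        /* Start every conversion from the initial shift state. */
        memset(&state, 0, sizeof state);

        size_t n = mbrtowc(&wc, &c, 1, &state);

        /* (size_t) -1 is an encoding error; (size_t) -2 is an
           incomplete multibyte sequence. */
        if (n == (size_t) -1 || n == (size_t) -2)
            rejected++;
    }
    return rejected;
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Select the C locale explicitly, as LC_ALL=C would. */
    setlocale(LC_ALL, "C");
    printf("%d byte values rejected in the C locale\n",
           count_rejected_bytes());
    return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, a platform that behaves as described for
Fedora 23 would be expected to report every byte above 127 as rejected,
while a platform that treats all byte values as characters in the C
locale would report none.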

This bug in the standard is intended to be fixed in a future version of 
POSIX (see <http://austingroupbugs.net/view.php?id=663#c2738>). I 
suppose glibc and eventually Fedora will be fixed to conform to the new 
standard in due course.

Perhaps grep should work around this problem on systems like Fedora 23 
where the underlying C library does not conform to the next version of 
POSIX. It sounds like a new gnulib module or two might do the trick. 
This should fix the problems that Björn mentions.
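
One shape such a workaround could take, purely as a sketch: wrap mbrtowc
so that, in a single-byte locale, an encoding error is turned into a
one-byte character instead. The name mbrtowc_bytewise and the choice of
wide-character value for the offending byte are assumptions made for
illustration; this is not an existing gnulib module, and a real fix
might map the bytes differently.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper around mbrtowc.  In a locale where every
   character is one byte (MB_CUR_MAX == 1, as in the C locale), treat a
   byte that mbrtowc rejects as a character of length 1. */
static size_t mbrtowc_bytewise(wchar_t *pwc, const char *s, size_t n,
                               mbstate_t *ps)
{
    size_t ret = mbrtowc(pwc, s, n, ps);

    if (ret == (size_t) -1 && s != NULL && n > 0 && MB_CUR_MAX == 1) {
        /* The conversion state is unspecified after an encoding error,
           so put it back into the initial state. */
        if (ps != NULL)
            memset(ps, 0, sizeof *ps);

        /* Assumption: use the byte's unsigned value as the wide
           character.  Other mappings are possible. */
        if (pwc != NULL)
            *pwc = (wchar_t) (unsigned char) *s;
        return 1;
    }
    return ret;
}

#ifdef DEMO_MAIN
int main(void)
{
    setlocale(LC_ALL, "C");

    char byte = (char) 0xE9;   /* a byte outside the range 0-127 */
    wchar_t wc = 0;
    mbstate_t state;
    memset(&state, 0, sizeof state);

    size_t len = mbrtowc_bytewise(&wc, &byte, 1, &state);
    assert(len == 1);          /* consumed as a single character */
    printf("consumed %zu byte(s)\n", len);
    return 0;
}
#endif
```

With a wrapper along these lines, a caller scanning input in the C
locale would always advance one byte at a time and never see an encoding
error, whether or not the underlying C library accepts bytes above 127.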

In the meantime, 'grep -a' is the way to go. Yes, it's not portable to
non-GNU grep, but there is no portable solution given the above-mentioned
POSIX problems, so a GNU-grep-only workaround is all one can reasonably
ask for.






