GNU bug report logs - #23234
unexpected results with charset handling in GNU grep 2.23
Reported by: Björn JACKE <bjoern <at> j3e.de>
Date: Wed, 6 Apr 2016 20:45:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
On 04/06/2016 02:04 PM, Eric Blake wrote:
> POSIX ... says that LC_ALL=C is _required_ to treat all 256 byte
> values as valid characters
Although that was the intent of POSIX, it's not what the current
standard says, and it's not what many popular platforms do. Problematic
platforms include Fedora 23, where mbrtowc reports an encoding error in
the C locale when given a byte outside the range 0-127. This affects
many programs other than 'grep'.
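
To check how a given platform behaves, one can feed each byte value to mbrtowc on its own and count the ones it rejects. The following is only a minimal sketch; the helper name count_rejected_bytes is made up for illustration and appears in neither grep nor gnulib. It counts rejections in whatever locale is currently set, so the caller is assumed to have selected the C locale first.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not part of grep or gnulib): count the byte values
   in [lo, hi] that mbrtowc refuses to convert in the current locale,
   i.e. for which it reports an encoding error or an incomplete sequence. */
static int count_rejected_bytes(int lo, int hi)
{
    int rejected = 0;

    assert(0 <= lo && lo <= hi && hi <= 255);
    for (int b = lo; b <= hi; b++) {
        char c = (char) b;
        wchar_t wc;
        mbstate_t st;

        /* Start every conversion from the initial shift state. */
        memset(&st, 0, sizeof st);
        size_t r = mbrtowc(&wc, &c, 1, &st);
        if (r == (size_t) -1 || r == (size_t) -2)
            rejected++;
    }
    return rejected;
}
```

After setlocale(LC_ALL, "C"), a nonzero result from count_rejected_bytes(128, 255) would indicate the behavior described above; a C library that treats all 256 byte values as characters should give zero.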
This bug in the standard is intended to be fixed in a future version of
POSIX (see <http://austingroupbugs.net/view.php?id=663#c2738>). I
suppose glibc and eventually Fedora will be fixed to conform to the new
standard in due course.
Perhaps grep should work around this problem on systems like Fedora 23
where the underlying C library does not conform to the next version of
POSIX. It sounds like a new gnulib module or two might do the trick.
This should fix the problems that Björn mentions.
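
One possible shape for such a workaround is a thin wrapper around mbrtowc that, in the C locale only, converts an encoding error into a single-byte character. This is a hedged sketch and not the code of any actual gnulib module: the names in_c_locale and c_safe_mbrtowc are hypothetical, and mapping the offending byte to the wide character with the same numeric value is an assumption made for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if the current LC_CTYPE locale is named
   "C" or "POSIX". */
static int in_c_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);
    return name != NULL
           && (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical wrapper: behaves like mbrtowc, except that in the C locale
   an encoding error is turned into a one-byte character. The caller must
   supply a non-null conversion state. */
static size_t c_safe_mbrtowc(wchar_t *pwc, const char *s, size_t n,
                             mbstate_t *ps)
{
    assert(ps != NULL);

    size_t r = mbrtowc(pwc, s, n, ps);
    if (r == (size_t) -1 && s != NULL && n > 0 && in_c_locale()) {
        /* Assumption: represent the byte by the wide character with the
           same numeric value. */
        if (pwc != NULL)
            *pwc = (unsigned char) *s;
        /* The state is unspecified after an error, so reset it. */
        memset(ps, 0, sizeof *ps);
        return 1;
    }
    return r;
}
```

On a C library that already accepts every byte in the C locale, the fallback branch is never taken and the wrapper behaves exactly like mbrtowc.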
In the meantime, 'grep -a' is the way to go. Yes, it's not portable to
non-GNU grep, but there is no portable solution given the POSIX problems
mentioned above, so a GNU-grep-only workaround is all one can reasonably
ask for.
This bug report was last modified 9 years and 46 days ago.