#20638 - BUG: standard & extended RE's don't find NUL's :-(

GNU bug report logs - #20638
BUG: standard & extended RE's don't find NUL's :-(

Package: grep;

Reported by: "L. A. Walsh" <gnu <at> tlinx.org>

Date: Sun, 24 May 2015 00:06:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #17 received at 20638 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <gnu <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 20638 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#20638: BUG: standard & extended RE's don't find NUL's :-(
Date: Mon, 25 May 2015 12:46:23 -0700

Paul Eggert wrote:
> Linda Walsh wrote:
>
>> I had one file that it bailed on
>> saying it has an invalid UTF-8 encoding -- but the line was
>> recursive starting from '.' -- and it didn't name the file
----
I didn't report that as 'a bug', because when I went back to reproduce
it -- low level physics took over -- i.e. the closer I looked, the
more uncertain the problem became!  I did change the grep * into
a for i in *;do echo file;grep file;...but couldn't find the
file that gave the message...Grrr.  I will bet it was with the '-P'
option, since the standard Regex in perl complains about such things
and since I was only interested in status (was using -q _because_ I was
searching for a binary pattern -- the '\000\000') I got the warning but
nothing else.

If I run into it again, maybe I can find it w/o looking too closely
then that uncertainty principle won't kick in... ;-)

>
> That's pretty vague.  Can you reproduce that problem?  I don't observe
> it:
>
> $ mkdir d
> $ printf 'a\200\n' >d/f
> $ printf 'b\200\n' >d/g
> $ grep -r a d
> Binary file d/f matches
>
>> "-a" doesn't work, BTW:
>>
>> Ishtar:/tmp> grep -a '\000\000' zeros
>> Ishtar:/tmp> echo $?
>> 1
>
> That's the way 'grep' has always behaved.  The regular expression '\0'
> matches the string "0", not the NUL byte.
>
>> Ishtar:/tmp> grep -P '\000\000' zeros
>> Binary file zeros matches
>
> I don't follow this example; perhaps some text was omitted?  Anyway,
> -P has always treated files containing zeros as binary files too, ever
> since -P has been introduced.  It's the same as without -P.
>
>> But there it is -- if grep wasn't meant to handle binary files,
>> it wouldn't know to call 'zeroes' a binary file.
>
> Obviously, grep *is* meant to handle binary files; it's documented to
> handle them in a particular way.
---
Nevertheless, it is documented, that '\ddd' or '\xHH' can be used
to match a single character of the value specified.
'\000\000' is found in 'zeroes' (as mentioned in the original report --
a file filled with 4k of nulls), with the -P switch, but not the -a
switch.  That behavior violates the documentation.

>
>> how can 'shuf' claim to work on input lines yet have this allowed:
>>
>> -z, --zero-terminated
>>        line delimiter is NUL, not newline.
>
> I don't follow this point.  -z is a nice feature; we don't want to get
> rid of it.
----
Nice of you to not read the previous notes.  The argument was
that a NUL in a file made it non-text -- therefore it wouldn't be
a "line".

>
>> People argue to dumb down POSIX
>> utils, because some corp wants to get a posix label but
>> has a few shortcomings -- so they donate enough money and
>> posix changes it's rules.
>
> I'm afraid you've gone off the deep end here.

I didn't bring up POSIX, Eric did.  Again, nice of you to jump
in the middle of a conversation and not read the earlier notes... :-)

*Cheers* Paul...(et al).
-linda
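
The disagreement above comes down to how the pattern '\000\000' is read
with and without -P. The following is a minimal sketch of that
difference, not a definitive test: it assumes GNU grep built with PCRE
support (-P), and the temporary directory, the 'digits' file and the
variable names are illustrative only. The 'zeros' file mirrors the 4k
file of NULs from the report.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep with -P (PCRE) support.
# File and variable names below are hypothetical, chosen for illustration.
tmp=$(mktemp -d)

# 4 KiB of NUL bytes, like the 'zeros' file described in the report.
dd if=/dev/zero of="$tmp/zeros" bs=4096 count=1 2>/dev/null
# A text file holding six literal '0' characters.
printf '000000\n' > "$tmp/digits"

# Without -P, '\0' is taken as the character "0" (per Paul's reply), so
# the pattern means "000000". Newer grep releases may print a
# stray-backslash warning on stderr, hence the redirect.
plain_on_zeros=$(LC_ALL=C grep -a -c '\000\000' "$tmp/zeros" 2>/dev/null)
plain_on_digits=$(LC_ALL=C grep -c '\000\000' "$tmp/digits" 2>/dev/null)

# With -P, '\000' is a PCRE octal escape, so the pattern means NUL NUL.
pcre_on_zeros=$(LC_ALL=C grep -a -c -P '\000\000' "$tmp/zeros" 2>/dev/null)

echo "plain grep, NUL file:   $plain_on_zeros matching line(s)"
echo "plain grep, digit file: $plain_on_digits matching line(s)"
echo "grep -P,    NUL file:   $pcre_on_zeros matching line(s)"

rm -rf "$tmp"
```

The -a option only changes how a binary file's matches are reported; it
does not change what the pattern means, which is why -a alone does not
find the NULs here.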

This bug report was last modified 9 years and 363 days ago.
