GNU bug report logs - #75806
Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty
On Fri, Jan 24, 2025 at 07:26:00PM +0000, Peter White wrote:
> On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> > Hi,
> >
> > The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> > The 2nd one incorrectly reports all lines.
> >
> > grep -sHn -i " [[:cntrl:]]*$" *.vhd
> > grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> As someone who just today made a similar mistake I would like to point
> out that the pattern does as intended because '*' matches *zero* or more
> occurrences of the preceding atom. So the second pattern matches
> any line that contains a *literal* 's' followed by zero or more control
> chars, which is any line because of the newline at the end which is a
> control char. Since you did not ask for perl regex (-P) grep uses basic
> POSIX regex instead; at least I *think* you want perl syntax given that
> '\s' is only valid in PCRE, IIRC.
Turns out that last part is not true, sorry. I was going by the grep(1)
man page instead of `info grep`, which does say that '\s' is shorthand
for '[[:space:]]'. Still, the 2nd pattern is incorrect. IIUC this is
what it should look like:
# '-i' is unnecessary since whitespace has no upper/lower case
grep --color=never -sHn '[[:blank:]][[:cntrl:]]*$'
[:blank:] is the more correct char class because it matches only space
and tab, whereas '\s' (i.e. '[[:space:]]') also matches <LF>, <VT>, <FF>
and <CR>. DOS files have the <CR> in front of <LF> (a.k.a. '$'), so '\s'
matches that <CR> on every line, whereas the literal space in the
original pattern did match *correctly*. Contrary to the
claim in the OP I could only reproduce the "false" behavior with DOS and
not UNIX files. And now I understand why '[[:cntrl:]]' is in the pattern
(sorry for my initial misunderstanding). DOS, the gift that keeps on
giving. :P
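
The difference between the three patterns can be reproduced with a small
sketch along these lines. It assumes GNU grep (for '\s') and uses made-up
scratch file names (unix.txt, dos.txt) that are not part of the original
report; each file has one clean line and one line with a trailing space.

```shell
# Hypothetical scratch files, one with LF and one with CRLF line endings.
tmp=$(mktemp -d)
printf 'clean\ntrailing \n'     > "$tmp/unix.txt"
printf 'clean\r\ntrailing \r\n' > "$tmp/dos.txt"

# -c prints the number of matching lines instead of the lines themselves.
unix_space=$(grep -c ' [[:cntrl:]]*$' "$tmp/unix.txt")
dos_space=$(grep -c ' [[:cntrl:]]*$' "$tmp/dos.txt")

# '\s' is a GNU extension equivalent to '[[:space:]]', which includes <CR>.
unix_s=$(grep -c '\s[[:cntrl:]]*$' "$tmp/unix.txt")
dos_s=$(grep -c '\s[[:cntrl:]]*$' "$tmp/dos.txt")

# '[[:blank:]]' matches only space and tab.
unix_blank=$(grep -c '[[:blank:]][[:cntrl:]]*$' "$tmp/unix.txt")
dos_blank=$(grep -c '[[:blank:]][[:cntrl:]]*$' "$tmp/dos.txt")

echo "literal space: unix=$unix_space dos=$dos_space"   # unix=1 dos=1
echo "\\s:            unix=$unix_s dos=$dos_s"           # unix=1 dos=2
echo "[[:blank:]]:   unix=$unix_blank dos=$dos_blank"   # unix=1 dos=1

rm -rf "$tmp"
```

Only the '\s' variant reports the clean line of the DOS file, because the
<CR> itself satisfies '\s' and '[[:cntrl:]]*' is then free to match
nothing.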
Also note the '--color=never'. I don't know how relevant this is on
Windows but on my terminal emulator (with --color=auto) the <CR> at the
end of a line in DOS files would be printed as a match and the terminal
obeyed with all the ensuing consequences, leaving empty lines without
match text. Another "gift", I guess.
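
To see what the terminal actually receives, the output can be piped
through `cat -v`, which shows <CR> as ^M and the escape character as ^[
instead of letting the terminal act on them. This is only a sketch; the
scratch file name is hypothetical, and the exact colour escape sequences
depend on the grep version and on GREP_COLORS.

```shell
# Hypothetical one-line DOS file with a trailing space before the CRLF.
tmp=$(mktemp -d)
printf 'trailing \r\n' > "$tmp/dos.txt"
esc=$(printf '\033')   # the ESC character that starts colour sequences

plain=$(grep --color=never ' [[:cntrl:]]*$' "$tmp/dos.txt")
colored=$(grep --color=always ' [[:cntrl:]]*$' "$tmp/dos.txt")

# Without colour the line is passed through unchanged: "trailing ^M".
printf '%s\n' "$plain" | cat -v

# With colour the matched text, <CR> included, is wrapped in escape
# sequences, which cat -v shows as ^[[... groups around "^M".
printf '%s\n' "$colored" | cat -v

rm -rf "$tmp"
```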
PW
This bug report was last modified 119 days ago.