GNU bug report logs - #75806
Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty

Package: grep;

Reported by: Andreas BROCKMANN <andreas.brockmann <at> diehl.com>

Date: Fri, 24 Jan 2025 14:50:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #11 received at submit <at> debbugs.gnu.org:

From: Peter White <peter.white <at> posteo.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty
Date: Fri, 24 Jan 2025 22:59:27 +0000
On Fri, Jan 24, 2025 at 07:26:00PM +0000, Peter White wrote:
> On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> > Hi,
> > 
> > The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> > The 2nd one incorrectly reports all lines.
> > 
> >   grep -sHn -i " [[:cntrl:]]*$" *.vhd
> >   grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> As someone who just today made a similar mistake I would like to point
> out that the pattern does as intended because '*' matches *zero* or more
> occurrences of the preceding atom. So the second pattern matches
> any line that contains a *literal* 's' followed by zero or more control
> chars, which is any line because of the newline at the end which is a
> control char. Since you did not ask for perl regex (-P) grep uses basic
> POSIX regex instead; at least I *think* you want perl syntax given that
> '\s' is only valid in PCRE, IIRC.

Turns out that last part is not true, sorry. I was going by the grep(1)
man page instead of `info grep`, which does say that '\s' is shorthand
for '[[:space:]]'. Still, the 2nd pattern is incorrect. IIUC this is
what it should look like:

	# '-i' is bogus since there is no upper/lower case whitespace
	grep --color=never -sHn '[[:blank:]][[:cntrl:]]*$'

[:blank:] is the more correct char class because it matches only space
and tab, whereas '\s' (i.e. '[[:space:]]') also matches <CR>, <LF>, <VT>
and <FF>. DOS files have the <CR> in front of <LF> (a.k.a. '$'), so in
such a file '\s' matches the <CR> on every line, which is why the
original pattern did match *correctly*. Contrary to the claim in the OP
I could only reproduce the "false" behavior with DOS and not UNIX files.
And now I understand why '[[:cntrl:]]' is in the pattern: it matches the
range 0-31 (plus <DEL>[127]) and so skips over the <CR> (sorry for my
initial misunderstanding). DOS, the gift that keeps on giving. :P
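
The difference between the two classes can be checked on a pair of
scratch files. The following is a minimal sketch, assuming GNU grep
(where '\s' is an extension) and the C locale; the directory and file
names are made up for illustration:

```shell
# Scratch files with hypothetical names, one per line-ending convention.
dir=$(mktemp -d)
printf 'clean\ntrailing \n'     > "$dir/unix.txt"
printf 'clean\r\ntrailing \r\n' > "$dir/dos.txt"

# Count matching lines; LC_ALL=C keeps the character classes predictable.
unix_s=$(LC_ALL=C grep -c '\s[[:cntrl:]]*$' "$dir/unix.txt")
dos_s=$(LC_ALL=C grep -c '\s[[:cntrl:]]*$' "$dir/dos.txt")
dos_blank=$(LC_ALL=C grep -c '[[:blank:]][[:cntrl:]]*$' "$dir/dos.txt")

# With GNU grep: 1 for the UNIX file, 2 (every line) for the DOS file
# with '\s', and 1 for the DOS file with '[[:blank:]]'.
printf '%-24s %s\n' "unix.txt, \\s:"        "$unix_s"
printf '%-24s %s\n' "dos.txt,  \\s:"        "$dos_s"
printf '%-24s %s\n' "dos.txt,  [[:blank:]]:" "$dos_blank"

rm -rf "$dir"
```

In the DOS file the <CR> alone satisfies '\s' and '[[:cntrl:]]*' then
matches zero characters, so the "clean" line is reported too.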

Also note the '--color=never'. I don't know how relevant this is on
Windows but on my terminal emulator (with --color=auto) the <CR> at the
end of a line in DOS files would be printed as a match and the terminal
obeyed with all the ensuing consequences, leaving empty lines without
match text. Another "gift", I guess.
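
To look at what grep actually writes without letting the terminal act on
it, the output can be piped through a tool that renders control
characters visibly. A minimal sketch, assuming GNU grep and a 'cat' that
supports '-v'; the scratch file is hypothetical:

```shell
# One DOS-style line with a trailing space (hypothetical scratch file).
file=$(mktemp)
printf 'trailing \r\n' > "$file"

# Make sure a user-defined color configuration does not change the result.
unset GREP_COLORS GREP_COLOR

colored=$(LC_ALL=C grep --color=always '[[:blank:]][[:cntrl:]]*$' "$file")
plain=$(LC_ALL=C grep --color=never '[[:blank:]][[:cntrl:]]*$' "$file")

# 'cat -v' prints <CR> as ^M and <ESC> as ^[, so nothing reaches the
# terminal as a control character.
printf '%s\n' "$colored" | cat -v
printf '%s\n' "$plain"   | cat -v

rm -f "$file"
```

The first line of output shows the escape sequences surrounding the
matched text, <CR> included; the second shows only the ^M.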


PW




This bug report was last modified 119 days ago.

