GNU bug report logs - #75806
Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty

Package: grep;

Reported by: Andreas BROCKMANN <andreas.brockmann <at> diehl.com>

Date: Fri, 24 Jan 2025 14:50:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 75806 in the body.
You can then email your comments to 75806 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#75806; Package grep. (Fri, 24 Jan 2025 14:50:02 GMT)

Acknowledgement sent to Andreas BROCKMANN <andreas.brockmann <at> diehl.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 24 Jan 2025 14:50:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas BROCKMANN <andreas.brockmann <at> diehl.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Cc: "cygwin <at> cygwin.com" <cygwin <at> cygwin.com>
Subject: Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty
Date: Fri, 24 Jan 2025 13:27:13 +0000
[Message part 1 (text/plain, inline)]
Hi,

The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
The 2nd one incorrectly reports all lines.

  grep -sHn -i " [[:cntrl:]]*$" *.vhd
  grep -sHn -i "\s[[:cntrl:]]*$" *.vhd

grep -V
grep (GNU grep) 3.0
Packaged by Cygwin (3.0-2)
Copyright (C) 2017 Free Software Foundation, Inc.

cmd --help
Microsoft Windows [Version 10.0.19045.5371]
(c) Microsoft Corporation. All rights reserved.

Kind Regards
Brockmann

Information forwarded to bug-grep <at> gnu.org:
bug#75806; Package grep. (Fri, 24 Jan 2025 19:27:01 GMT)

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Peter White <peter.white <at> posteo.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]"
 faulty
Date: Fri, 24 Jan 2025 19:26:00 +0000
On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> Hi,
> 
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
> 
>   grep -sHn -i " [[:cntrl:]]*$" *.vhd
>   grep -sHn -i "\s[[:cntrl:]]*$" *.vhd

As someone who just today made a similar mistake I would like to point
out that the pattern does as intended because '*' matches *zero* or more
occurrences of the preceding atom. So the second pattern matches
any line that contains a *literal* 's' followed by zero or more control
chars, which is any line because of the newline at the end which is a
control char. Since you did not ask for perl regex (-P) grep uses basic
POSIX regex instead; at least I *think* you want perl syntax given that
'\s' is only valid in PCRE, IIRC.

Also [:cntrl:] is not the correct char class for white space, why not
[:space:] or [:blank:]? Your first pattern just happens to match the
literal space in it *and* any following string of zero or more control
chars.


PW




Information forwarded to bug-grep <at> gnu.org:
bug#75806; Package grep. (Fri, 24 Jan 2025 23:00:02 GMT)

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Peter White <peter.white <at> posteo.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]"
 faulty
Date: Fri, 24 Jan 2025 22:59:27 +0000
On Fri, Jan 24, 2025 at 07:26:00PM +0000, Peter White wrote:
> On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> > Hi,
> > 
> > The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> > The 2nd one incorrectly reports all lines.
> > 
> >   grep -sHn -i " [[:cntrl:]]*$" *.vhd
> >   grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> As someone who just today made a similar mistake I would like to point
> out that the pattern does as intended because '*' matches *zero* or more
> occurrences of the preceding atom. So the second pattern matches
> any line that contains a *literal* 's' followed by zero or more control
> chars, which is any line because of the newline at the end which is a
> control char. Since you did not ask for perl regex (-P) grep uses basic
> POSIX regex instead; at least I *think* you want perl syntax given that
> '\s' is only valid in PCRE, IIRC.

Turns out that last part is not true, sorry. I was going by the grep(1)
man page instead of `info grep`, which does say that '\s' is shorthand
for '[[:space:]]'. Still, the 2nd pattern is incorrect. IIUC this is
what it should look like:

	# '-i' is bogus since there is no upper/lower case whitespace
	grep --color=never -sHn '[[:blank:]][[:cntrl:]]*$'

[:blank:] is the more correct char class because '\s' is shorthand for
'[[:space:]]', which matches not only space and <TAB> but also <LF>,
<VT>, <FF> and <CR>. DOS files have the <CR> in front of <LF> (a.k.a.
'$'), so '\s' matches the <CR> at the end of every line. '[[:cntrl:]]'
covers the ASCII range 0-31 (plus <DEL>[127]), which includes <CR>, and
that is why the original pattern did match *correctly*. Contrary to the
claim in the OP I could only reproduce the "false" behavior with DOS and
not UNIX files. And now I understand why '[[:cntrl:]]' is in the pattern
(sorry for my initial misunderstanding). DOS, the gift that keeps on
giving. :P

Also note the '--color=never'. I don't know how relevant this is on
Windows but on my terminal emulator (with --color=auto) the <CR> at the
end of a line in DOS files would be printed as a match and the terminal
obeyed with all the ensuing consequences, leaving empty lines without
match text. Another "gift", I guess.
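
[Editorial sketch] The suggested '[[:blank:]][[:cntrl:]]*$' pattern can be
tried on two small sample files. The file names and contents below are
made up for illustration, and the C locale is forced so that the
character classes are limited to ASCII. On GNU grep, each file should
report exactly one line with a trailing blank.

```shell
# Hypothetical sample data: one LF-terminated file, one CRLF-terminated file.
tmpdir=$(mktemp -d)
printf 'clean line\ntrailing space \n'     > "$tmpdir/unix.txt"
printf 'clean line\r\ntrailing space \r\n' > "$tmpdir/dos.txt"

# Count lines that end in a blank, optionally followed by control
# characters such as the CR of a DOS line ending.
blank_unix=$(LC_ALL=C grep -c '[[:blank:]][[:cntrl:]]*$' "$tmpdir/unix.txt")
blank_dos=$(LC_ALL=C grep -c '[[:blank:]][[:cntrl:]]*$' "$tmpdir/dos.txt")

echo "unix.txt: $blank_unix line(s) with trailing blanks"
echo "dos.txt:  $blank_dos line(s) with trailing blanks"

rm -rf "$tmpdir"
```

'[[:blank:]]' covers only space and <TAB>, so the <CR> cannot satisfy it;
the <CR> is consumed by '[[:cntrl:]]*' instead.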


PW




Information forwarded to bug-grep <at> gnu.org:
bug#75806; Package grep. (Sat, 25 Jan 2025 17:09:02 GMT)

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Marcus Blumhagen <marcus.blumhagen <at> posteo.de>
To: bug-grep <at> gnu.org
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]"
 faulty
Date: Fri, 24 Jan 2025 19:00:02 +0000
On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> Hi,
> 
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
> 
>   grep -sHn -i " [[:cntrl:]]*$" *.vhd
>   grep -sHn -i "\s[[:cntrl:]]*$" *.vhd

As someone who just today made a similar mistake I would like to point
out that the pattern does as intended because '*' matches *zero* or more
occurrences of the preceding atom. So the second pattern matches
any line that contains a *literal* 's' followed by zero or more control
chars - you did not ask for perl regex and thus got basic POSIX regex
instead; at least I *think* you want perl syntax given that '\s' is only
valid in PCRE, IIRC.

Also [:cntrl:] is not the correct char class for white space, why not
[:space:] or [:blank:]?




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sat, 25 Jan 2025 19:32:02 GMT)

Notification sent to Andreas BROCKMANN <andreas.brockmann <at> diehl.com>:
bug acknowledged by developer. (Sat, 25 Jan 2025 19:32:02 GMT)

Message #19 received at 75806-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Andreas BROCKMANN <andreas.brockmann <at> diehl.com>
Cc: "cygwin <at> cygwin.com" <cygwin <at> cygwin.com>, 75806-done <at> debbugs.gnu.org
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]"
 faulty
Date: Sat, 25 Jan 2025 11:31:06 -0800
On 2025-01-24 05:27, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
> 
>    grep -sHn -i " [[:cntrl:]]*$" *.vhd
>    grep -sHn -i "\s[[:cntrl:]]*$" *.vhd

I don't see a bug. The latter command is equivalent to:

   grep -Hins '[[:space:]][[:cntrl:]]*$' *.vhd

and if the input files use Microsoft CRLF format then [[:space:]] 
matches the CR at the end of every line and [[:cntrl:]]* matches the 
empty string after the CR.

Possibly you need to use Unix LF format, or use the --text option, or 
something like that.
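
[Editorial sketch] The explanation above can be checked with a small
reproduction. The sample files are hypothetical, the C locale is forced,
and '\s' is assumed to be available as the GNU grep extension described
in `info grep`. On a CRLF file both spellings should count every line;
on an LF file only the line with a real trailing space.

```shell
# Hypothetical sample data with identical text and different line endings.
tmpdir=$(mktemp -d)
printf 'no trailing space\r\ntrailing space \r\n' > "$tmpdir/crlf.txt"
printf 'no trailing space\ntrailing space \n'     > "$tmpdir/lf.txt"

# On the CRLF file the CR itself satisfies [[:space:]] (and \s), and
# [[:cntrl:]]* then matches the empty string before the end of line.
space_crlf=$(LC_ALL=C grep -c '[[:space:]][[:cntrl:]]*$' "$tmpdir/crlf.txt")
backslash_s_crlf=$(LC_ALL=C grep -c '\s[[:cntrl:]]*$' "$tmpdir/crlf.txt")

# On the LF file there is no CR, so only the real trailing space matches.
space_lf=$(LC_ALL=C grep -c '[[:space:]][[:cntrl:]]*$' "$tmpdir/lf.txt")

echo "crlf.txt with [[:space:]]: $space_crlf"
echo "crlf.txt with \\s:          $backslash_s_crlf"
echo "lf.txt   with [[:space:]]: $space_lf"

rm -rf "$tmpdir"
```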

Marking the bug as done.




Information forwarded to bug-grep <at> gnu.org:
bug#75806; Package grep. (Sat, 25 Jan 2025 20:43:01 GMT)

Message #22 received at 75806-done <at> debbugs.gnu.org (full text, mbox):

From: Brian Inglis <Brian.Inglis <at> SystematicSW.ab.ca>
To: 75806-done <at> debbugs.gnu.org
Cc: Andreas BROCKMANN <andreas.brockmann <at> diehl.com>, cygwin <at> cygwin.com,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#75806: Trailing spaces; pattern "\s" before "[[:cntrl:]]"
 faulty
Date: Sat, 25 Jan 2025 13:42:02 -0700
On 2025-01-25 12:31, Paul Eggert via Cygwin wrote:
> On 2025-01-24 05:27, Andreas BROCKMANN via Bug reports for GNU grep wrote:
>> The 1st command below correctly reports trailing spaces, for Unix and Windows 
>> format files.
>> The 2nd one incorrectly reports all lines.
>>
>>    grep -sHn -i " [[:cntrl:]]*$" *.vhd
>>    grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> 
> I don't see a bug. The latter command is equivalent to:
> 
>     grep -Hins '[[:space:]][[:cntrl:]]*$' *.vhd
> 
> and if the input files use Microsoft CRLF format then [[:space:]] matches the CR 
> at the end of every line and [[:cntrl:]]* matches the empty string after the CR.
> 
> Possibly you need to use Unix LF format, or use the --text option, or something 
> like that.
> 
> Marking the bug as done.

IIRC even Cygwin dropped Windows text handling in coreutils, findutils, grep, 
sed, etc. about 2018 to be consistent with other POSIX platforms.

Use d2u/dos2unix or u2d/unix2dos in pipes to convert, or equivalent, such as
tr -d '\r', sed -e 's/\r//g', awk '{gsub(/\r/,""); print}'.
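
[Editorial sketch] As an illustration of the pipe approach, the lines
below strip the CR characters first and then look for trailing
whitespace. The sample file is hypothetical and the C locale is forced.
Once the CRs are gone, a plain '[[:space:]]$' should flag only the line
that really ends in a space.

```shell
# Hypothetical CRLF sample: only the second line has a trailing space.
tmpdir=$(mktemp -d)
printf 'no trailing space\r\ntrailing space \r\n' > "$tmpdir/crlf.txt"

# Variant 1: delete every CR with tr, then count lines ending in whitespace.
tr_count=$(tr -d '\r' < "$tmpdir/crlf.txt" | LC_ALL=C grep -c '[[:space:]]$')

# Variant 2: the same conversion done with awk.
awk_count=$(awk '{ gsub(/\r/, ""); print }' "$tmpdir/crlf.txt" \
            | LC_ALL=C grep -c '[[:space:]]$')

echo "after tr:  $tr_count line(s) with trailing whitespace"
echo "after awk: $awk_count line(s) with trailing whitespace"

rm -rf "$tmpdir"
```

Because grep reads from a pipe here, -H no longer reports the original
file name; GNU grep's --label option can supply one if it is needed.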

Cygwin users may be able to compensate by remounting the filesystem with a 
"text" mount option -o text or the equivalent in an /etc/fstab entry, but I am 
unsure if anyone has tested using that option nowadays.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher  but when there is no more to cut
                                -- Antoine de Saint-Exupéry




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 23 Feb 2025 12:24:07 GMT)

This bug report was last modified 119 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.