GNU bug report logs -
#75806
Trailing spaces; pattern "\s" before "[[:cntrl:]]" faulty
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 75806 in the body.
You can then email your comments to 75806 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#75806; Package grep. (Fri, 24 Jan 2025 14:50:02 GMT)
Acknowledgement sent to Andreas BROCKMANN <andreas.brockmann <at> diehl.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 24 Jan 2025 14:50:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
The 2nd one incorrectly reports all lines.
grep -sHn -i " [[:cntrl:]]*$" *.vhd
grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
grep -V
grep (GNU grep) 3.0
Packaged by Cygwin (3.0-2)
Copyright (C) 2017 Free Software Foundation, Inc.
cmd --help
Microsoft Windows [Version 10.0.19045.5371]
(c) Microsoft Corporation. All rights reserved.
Kind Regards
Brockmann
Andreas Brockmann, Dipl.-Ing.
Hardware Engineer
Strategic Business Segment Aircraft Systems
HW & Mech. Engineering & Config. Management
phone +49 7551 891 4104
andreas.brockmann <at> diehl.com | www.diehl.com/aviation
Diehl Aerospace GmbH
Alte Nussdorfer Strasse 23 | 88662 Ueberlingen | Germany
Diehl Aerospace is a Joint Diehl Thales Company
Information forwarded to bug-grep <at> gnu.org: bug#75806; Package grep. (Fri, 24 Jan 2025 19:27:01 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> Hi,
>
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
>
> grep -sHn -i " [[:cntrl:]]*$" *.vhd
> grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
As someone who just today made a similar mistake I would like to point
out that the pattern does as intended because '*' matches *zero* or more
occurrences of the preceding atom. So the second pattern matches
any line that contains a *literal* 's' followed by zero or more control
chars, which is any line because of the newline at the end which is a
control char. Since you did not ask for perl regex (-P) grep uses basic
POSIX regex instead; at least I *think* you want perl syntax given that
'\s' is only valid in PCRE, IIRC.
Also [:cntrl:] is not the correct char class for white space, why not
[:space:] or [:blank:]? Your first pattern just happens to match the
literal space in it *and* any following string of zero or more control
chars.
PW
Information forwarded to bug-grep <at> gnu.org: bug#75806; Package grep. (Fri, 24 Jan 2025 23:00:02 GMT)
Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):
On Fri, Jan 24, 2025 at 07:26:00PM +0000, Peter White wrote:
> On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> > Hi,
> >
> > The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> > The 2nd one incorrectly reports all lines.
> >
> > grep -sHn -i " [[:cntrl:]]*$" *.vhd
> > grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
> As someone who just today made a similar mistake I would like to point
> out that the pattern does as intended because '*' matches *zero* or more
> occurrences of the preceding atom. So the second pattern matches
> any line that contains a *literal* 's' followed by zero or more control
> chars, which is any line because of the newline at the end which is a
> control char. Since you did not ask for perl regex (-P) grep uses basic
> POSIX regex instead; at least I *think* you want perl syntax given that
> '\s' is only valid in PCRE, IIRC.
Turns out that last part is not true, sorry. I was going by the grep(1)
man page instead of `info grep`, which does say that '\s' is shorthand
for '[[:space:]]'. Still, the 2nd pattern is incorrect. IIUC this is
what it should look like:
# '-i' is bogus since there is no upper/lower case whitespace
grep --color=never -sHn '[[:blank:]][[:cntrl:]]*$'
[:blank:] is the more correct char class because it matches only space
and tab, whereas '\s' also matches <LF>, <VT>, <FF> and, as it so happens,
<CR>. DOS files have the <CR> in front of <LF> (a.k.a. '$'),
which is why the original pattern did match *correctly*. Contrary to the
claim in the OP I could only reproduce the "false" behavior with DOS and
not UNIX files. And now I understand why '[[:cntrl:]]' is in the pattern
(sorry for my initial misunderstanding). DOS, the gift that keeps on
giving. :P
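A minimal sketch for checking which characters each class accepts, assuming GNU grep and the C locale; the `matches` helper is hypothetical and only exists for this illustration. For reference, [[:space:]] covers space, tab, LF, VT, FF and CR, while the range 0-31 plus DEL is what [[:cntrl:]] covers.

```shell
# Which of the three POSIX classes match a lone SPACE, TAB, or CR?
# LC_ALL=C keeps the class definitions to plain ASCII.
export LC_ALL=C

# Hypothetical helper: print "yes" if the single character in $1
# matches the bracket class named in $2, otherwise "no".
matches() {
    if printf '%s\n' "$1" | grep -q "^[[:$2:]]\$"; then
        echo yes
    else
        echo no
    fi
}

cr=$(printf '\r')
tab=$(printf '\t')

for class in blank space cntrl; do
    printf '%-6s space=%s tab=%s cr=%s\n' "$class" \
        "$(matches ' ' "$class")" \
        "$(matches "$tab" "$class")" \
        "$(matches "$cr" "$class")"
done
```

With GNU grep this should report that [[:blank:]] accepts space and tab but not CR, that [[:space:]] accepts all three, and that [[:cntrl:]] accepts tab and CR but not space.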
Also note the '--color=never'. I don't know how relevant this is on
Windows but on my terminal emulator (with --color=auto) the <CR> at the
end of a line in DOS files would be printed as a match and the terminal
obeyed with all the ensuing consequences, leaving empty lines without
match text. Another "gift", I guess.
PW
Information forwarded to bug-grep <at> gnu.org: bug#75806; Package grep. (Sat, 25 Jan 2025 17:09:02 GMT)
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
On Fri, Jan 24, 2025 at 01:27:13PM +0000, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> Hi,
>
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
>
> grep -sHn -i " [[:cntrl:]]*$" *.vhd
> grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
As someone who just today made a similar mistake I would like to point
out that the pattern does as intended because '*' matches *zero* or more
occurrences of the preceding atom. So the second pattern matches
any line that contains a *literal* 's' followed by zero or more control
chars - you did not ask for perl regex and thus got basic POSIX regex
instead; at least I *think* you want perl syntax given that '\s' is only
valid in PCRE, IIRC.
Also [:cntrl:] is not the correct char class for white space, why not
[:space:] or [:blank:]?
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Sat, 25 Jan 2025 19:32:02 GMT)
Notification sent to Andreas BROCKMANN <andreas.brockmann <at> diehl.com>: bug acknowledged by developer. (Sat, 25 Jan 2025 19:32:02 GMT)
Message #19 received at 75806-done <at> debbugs.gnu.org (full text, mbox):
On 2025-01-24 05:27, Andreas BROCKMANN via Bug reports for GNU grep wrote:
> The 1st command below correctly reports trailing spaces, for Unix and Windows format files.
> The 2nd one incorrectly reports all lines.
>
> grep -sHn -i " [[:cntrl:]]*$" *.vhd
> grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
I don't see a bug. The latter command is equivalent to:
grep -Hins '[[:space:]][[:cntrl:]]*$' *.vhd
and if the input files use Microsoft CRLF format then [[:space:]]
matches the CR at the end of every line and [[:cntrl:]]* matches the
empty string after the CR.
Possibly you need to use Unix LF format, or use the --text option, or
something like that.
Marking the bug as done.
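The explanation above can be reproduced with a small sketch, assuming GNU grep and the C locale. The file names and their contents are hypothetical sample data: in each file only the first line ends in a space.

```shell
export LC_ALL=C
tmp=$(mktemp -d)

# Hypothetical sample data: line 1 ends in a space, line 2 does not.
printf 'signal a; \nsignal b;\n'     > "$tmp/unix.vhd"
printf 'signal a; \r\nsignal b;\r\n' > "$tmp/dos.vhd"

# Count matching lines per file for the two patterns from the report.
unix_literal=$(grep -c ' [[:cntrl:]]*$' "$tmp/unix.vhd")
dos_literal=$(grep -c ' [[:cntrl:]]*$' "$tmp/dos.vhd")
unix_space=$(grep -c '[[:space:]][[:cntrl:]]*$' "$tmp/unix.vhd")
dos_space=$(grep -c '[[:space:]][[:cntrl:]]*$' "$tmp/dos.vhd")

echo "literal space:  unix=$unix_literal dos=$dos_literal"
echo "[[:space:]]:    unix=$unix_space dos=$dos_space"

rm -r "$tmp"
```

The literal-space pattern should count one line in both files, while the [[:space:]] pattern should count one line in the LF file and both lines in the CRLF file, because the CR before each LF is itself a [[:space:]] character.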
Information forwarded to bug-grep <at> gnu.org: bug#75806; Package grep. (Sat, 25 Jan 2025 20:43:01 GMT)
Message #22 received at 75806-done <at> debbugs.gnu.org (full text, mbox):
On 2025-01-25 12:31, Paul Eggert via Cygwin wrote:
> On 2025-01-24 05:27, Andreas BROCKMANN via Bug reports for GNU grep wrote:
>> The 1st command below correctly reports trailing spaces, for Unix and Windows
>> format files.
>> The 2nd one incorrectly reports all lines.
>>
>> grep -sHn -i " [[:cntrl:]]*$" *.vhd
>> grep -sHn -i "\s[[:cntrl:]]*$" *.vhd
>
> I don't see a bug. The latter command is equivalent to:
>
> grep -Hins '[[:space:]][[:cntrl:]]*$' *.vhd
>
> and if the input files use Microsoft CRLF format then [[:space:]] matches the CR
> at the end of every line and [[:cntrl:]]* matches the empty string after the CR.
>
> Possibly you need to use Unix LF format, or use the --text option, or something
> like that.
>
> Marking the bug as done.
IIRC even Cygwin dropped Windows text handling in coreutils, findutils, grep,
sed, etc. about 2018 to be consistent with other POSIX platforms.
Use d2u/dos2unix or u2d/unix2dos in pipes to convert, or equivalent, such as
tr -d '\r', sed -e 's/\r//g', awk '{gsub(/\r/,""); print}'.
Cygwin users may be able to compensate by remounting the filesystem with a
"text" mount option -o text or the equivalent in an /etc/fstab entry, but I am
unsure if anyone has tested using that option nowadays.
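As a rough sketch of the conversion-in-a-pipe approach, assuming GNU grep and the C locale: the `sample` function is a hypothetical stand-in for a CRLF file, and `tr -d '\r'` removes every CR, not only those at the end of a line.

```shell
export LC_ALL=C

# Hypothetical CRLF input: only the first line has a trailing space.
sample() { printf 'signal a; \r\nsignal b;\r\n'; }

# Drop every CR, then look for whitespace at the end of a line.
hits=$(sample | tr -d '\r' | grep -n '[[:space:]]$')
echo "$hits"
```

Once the CRs are gone, only line 1 should be reported, so the plain whitespace pattern no longer needs the [[:cntrl:]]* part.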
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retrancher but when there is no more to cut
-- Antoine de Saint-Exupéry
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 23 Feb 2025 12:24:07 GMT)
This bug report was last modified 119 days ago.