GNU bug report logs - #51458
grep PCRE - mean

Package: grep

Reported by: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>

Date: Thu, 28 Oct 2021 09:11:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>
Subject: bug#51458: closed (Re: bug#51458: grep PCRE - '^' and '$' are not
 recognized as begin and end of line for multiline strings)
Date: Tue, 09 Nov 2021 18:06:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#51458: grep PCRE - mean

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 51458 <at> debbugs.gnu.org.

-- 
51458: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=51458
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)" <slawomir.skrzyniarz <at> nokia.com>,
 "51458 <at> debbugs.gnu.org" <51458-done <at> debbugs.gnu.org>
Subject: Re: bug#51458: grep PCRE - '^' and '$' are not recognized as begin
 and end of line for multiline strings
Date: Tue, 9 Nov 2021 10:05:31 -0800
On 11/8/21 22:48, Skrzyniarz, Slawomir (Nokia - PL/Krakow) wrote:
> Solve my issue.

Thanks for letting us know; closing the bug report.

[Message part 3 (message/rfc822, inline)]
From: "Skrzyniarz, Slawomir (Nokia - PL/Krakow)"
 <slawomir.skrzyniarz <at> nokia.com>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: grep PCRE - mean
Date: Thu, 28 Oct 2021 08:23:08 +0000
[Message part 4 (text/plain, inline)]
Hello Grep Team,
I updated grep from version 2.20 to 3.1 and noticed that grep with the -P option
no longer matches the regular expression below:

cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' |
grep -ozPLq '\A(?:\s*^(?:#\w+.*\s*|extern\s+.+)$)*+(?<namespace>\s*namespace(?:\s+ utTestNamespace \s*(?>(?<block>{(?:[^{}]*(?&block)*)*}))|(\s*[\w:]*\s*{)(?&namespace)\s*}))\s*\z'; echo "retcode $?"

Content of file SomeTestFile.cpp:
#include <memory>
#include <vector>
#include <gtest/gtest.h>

namespace utTestNamespace
{
using ::testing::NiceMock;
# some code here
}
//end of file


I checked the regular expression on regex101.com and found that it works for both PCRE and PCRE2 there, but it stopped working in grep 3.1 and later versions (versions between 2.20 and 3.1 were not checked).
See link:
https://regex101.com/r/9NwluI/1/

Investigation shows that grep 3.1, as well as the later versions 3.6 and 3.7, handles "^" and "$" differently with the "-P" option.
It looks like "^" does not detect every beginning of a line and "$" does not recognize every end of a line.

It seems that "^" is treated as the beginning of the whole test string, not of each line.
"$" appears to recognize only the end of the whole test string, not the end of each line.

I would like to ask whether this is intended behavior or an issue in grep.
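
The anchor behavior described above can be reproduced outside grep. The following is a minimal sketch, assuming Python's built-in re module as a stand-in for PCRE (it is a different engine, but it treats "^" and "$" the same way in this case). The sample text and pattern are hypothetical and much simpler than the regex in this report. Without multiline mode, "^" matches only at the start of the whole string; with multiline mode, enabled by a flag or by the inline (?m) option that PCRE also accepts, it matches after each newline. Whether (?m) is the fix that was suggested in this thread is not shown in this log.

```python
import re

# Hypothetical two-line subject, standing in for the single record
# that the regex engine sees when the whole file is read at once.
text = "#include <memory>\nnamespace utTestNamespace\n"

# Hypothetical pattern: a namespace declaration occupying a whole line.
pattern = r"^namespace\s+\w+$"

# Default mode: "^" anchors only at the very start of the string,
# so the second line is never considered a line start.
default_match = re.search(pattern, text)

# Multiline mode: "^" and "$" also anchor at every line boundary.
multiline_match = re.search(pattern, text, re.MULTILINE)

# The same mode requested inline, at the start of the pattern.
inline_match = re.search(r"(?m)" + pattern, text)

print("default:  ", default_match)
print("multiline:", multiline_match.group(0))
print("inline:   ", inline_match.group(0))
```

In this sketch the default-mode search finds nothing, while both multiline variants find the namespace line.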

Useful commands for testing:
cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' | grep -zP '(?:\s*^(?:\#\w+.*\s*|extern\s+.+)$)*+'
cat SomeTestFile.cpp | sed -r -e 's:\/(\*([^*]|\*[^\/])*[*]\/|\/.*)::g' -e 's:\"[^"]*\"::g' | grep -zP '(?:\s*^(?:\#\w+.*\s*|extern\s+.+)\s*)*+'


Best Regards,
Sławek

This bug report was last modified 3 years and 255 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.