GNU bug report logs - #16865
grep -wP and backreferences


Package: grep

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: Mon, 24 Feb 2014 16:31:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Subject: bug#16865: closed (Re: bug#16865: grep -wP and backreferences)
Date: Tue, 25 Feb 2014 18:04:04 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#16865: grep -wP and backreferences

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16865 <at> debbugs.gnu.org.

-- 
16865: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16865
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 16865-done <at> debbugs.gnu.org
Subject: Re: bug#16865: grep -wP and backreferences
Date: Tue, 25 Feb 2014 10:03:28 -0800
[Message part 3 (text/plain, inline)]
On Tue, Feb 25, 2014 at 8:08 AM, Stephane Chazelas
<stephane.chazelas <at> gmail.com> wrote:
> 2014-02-24 20:55:42 -0800, Jim Meyering:
>> On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
>> <stephane.chazelas <at> gmail.com> wrote:
>> > A last note: with -w, pcregrep wraps the regexp in \b...\b
>> > instead of \b(?:...)\b, so it could be that those brackets are
>> > not necessary in the first place.
>
> The brackets are actually needed in cases like:
>
> grep -Pw 'foo|bar'
>
> (pcregrep has a bug there).
>
>
>> > Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w)
>> >
>> > $ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
>> > $ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
>> > %aa%
>>
>> I like both suggestions. Making -wP work like grep's -w makes perfect sense.
>> Care to prepare a patch to make it do that, with a separate test case?
>> "git format-patch ..." output preferred, if you're game.
>>
>> I pushed the above patch, but would welcome another one.
>
> Please find the patch attached.

Thank you very much.  Nearly perfect.
I've uncapitalized the one-line summary, changed a "That" to "This"
in the log, added examples to NEWS, and added an empty
line to restore the two-empty-line section delimiter.

> (note that tests/word-delim-multibyte fails for me, but it's not
> my doing, it was failing before).

That's an XFAIL test (as noted in tests/Makefile.am), so it is expected
to fail; as long as it fails as expected, "make check" can still succeed.

I've closed this ticket, and will push once you ack these changes.
[0001-align-grep-Pw-with-grep-w.patch (application/octet-stream, attachment)]
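
Editorial sketch of the two word-matching wrappers discussed in the message above. It uses Python's re module as a stand-in for PCRE, on the assumption that both engines treat \b, \w, (?:...), (?<!...) and (?!...) the same way for these ASCII inputs. The helper names wrap_boundary, wrap_lookaround and matches are hypothetical, and the exact strings grep builds after the attached patch are not shown in this thread.

```python
import re

def wrap_boundary(pattern: str) -> str:
    # Word matching via \b on both sides of a non-capturing group.
    return r"\b(?:" + pattern + r")\b"

def wrap_lookaround(pattern: str) -> str:
    # Word matching via "no word character immediately before or after".
    return r"(?<!\w)(?:" + pattern + r")(?!\w)"

def matches(regex: str, line: str) -> bool:
    return re.search(regex, line) is not None

# Without the group, the alternation splits the anchors: \bfoo | bar\b.
ungrouped = matches(r"\bfoo|bar\b", "foobarx")
grouped = matches(wrap_boundary("foo|bar"), "foobarx")

# Patterns made of non-word characters are where the two wrappers differ.
boundary_inside = matches(wrap_boundary("%%"), "a%%b")
lookaround_inside = matches(wrap_lookaround("%%"), "a%%b")
boundary_alone = matches(wrap_boundary("%%"), "%%")
lookaround_alone = matches(wrap_lookaround("%%"), "%%")
lookaround_word = matches(wrap_lookaround("aa"), "%aa%")

print(ungrouped, grouped)
print(boundary_inside, lookaround_inside)
print(boundary_alone, lookaround_alone, lookaround_word)
```

The ungrouped alternation accepts "foobarx" because \bfoo matches on its own, which is why the brackets are needed. The \b wrapper accepts %% between two letters and rejects %% on a line by itself; the lookaround wrapper does the opposite, in line with the two transcripts quoted above.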
[Message part 5 (message/rfc822, inline)]
From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep -wP and backreferences
Date: Mon, 24 Feb 2014 10:01:54 +0000
Hello,

Backreferences don't work with -w or -x in combination with -P:

$ echo aa | grep -Pw '(.)\1'
$

Or they work in an unexpected way:

$ echo aa | grep -Pw '(.)\2'
aa

The fix is simple:


--- src/pcresearch.c~	2014-02-24 09:59:56.864374362 +0000
+++ src/pcresearch.c	2014-02-24 07:33:04.666398105 +0000
@@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t si
 
   *n = '\0';
   if (match_lines)
-    strcpy (n, "^(");
+    strcpy (n, "^(?:");
   if (match_words)
-    strcpy (n, "\\b(");
+    strcpy (n, "\\b(?:");
   n += strlen (n);
 
   /* The PCRE interface doesn't allow NUL bytes in the pattern, so



This bug report was last modified 11 years and 90 days ago.
