GNU bug report logs - #16865
grep -wP and backreferences

Package: grep

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: Mon, 24 Feb 2014 16:31:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jim Meyering <jim <at> meyering.net>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#16865: closed (grep -wP and backreferences)
Date: Tue, 25 Feb 2014 18:04:03 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 25 Feb 2014 10:03:28 -0800
with message-id <CA+8g5KGNGgzSM3LQzB8CE+a=wqOa=U5h=hSb7OM8XjB87BqVYQ <at> mail.gmail.com>
and subject line Re: bug#16865: grep -wP and backreferences
has caused the debbugs.gnu.org bug report #16865,
regarding grep -wP and backreferences
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
16865: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16865
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep -wP and backreferences
Date: Mon, 24 Feb 2014 10:01:54 +0000
Hello,

Backreferences don't work with -w or -x in combination with -P:

$ echo aa | grep -Pw '(.)\1'
$

Or they work in an unexpected way:

$ echo aa | grep -Pw '(.)\2'
aa

The fix is simple:


--- src/pcresearch.c~	2014-02-24 09:59:56.864374362 +0000
+++ src/pcresearch.c	2014-02-24 07:33:04.666398105 +0000
@@ -75,9 +75,9 @@ Pcompile (char const *pattern, size_t si
 
   *n = '\0';
   if (match_lines)
-    strcpy (n, "^(");
+    strcpy (n, "^(?:");
   if (match_words)
-    strcpy (n, "\\b(");
+    strcpy (n, "\\b(?:");
   n += strlen (n);
 
   /* The PCRE interface doesn't allow NUL bytes in the pattern, so
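
The group-numbering shift behind this report can be reproduced outside grep. The following is a minimal sketch using Python's re module rather than PCRE, so it only illustrates the numbering, not grep's exact behaviour. The helper names wrap_capturing and wrap_non_capturing are hypothetical; they mimic the "\b(" and "\b(?:" prefixes in the diff above and assume the wrapper is closed with ")\b".

```python
import re

def wrap_capturing(pattern):
    # Pre-patch style: the wrapper itself becomes capture group 1.
    return r"\b(" + pattern + r")\b"

def wrap_non_capturing(pattern):
    # Post-patch style: (?:...) groups without consuming a group number.
    return r"\b(?:" + pattern + r")\b"

# With the capturing wrapper, the user's "(.)" is pushed to group 2,
# so only the "wrong" backreference \2 finds the doubled letter.
old = re.search(wrap_capturing(r"(.)\2"), "aa")

# With the non-capturing wrapper, "(.)" stays group 1 and \1 works.
new = re.search(wrap_non_capturing(r"(.)\1"), "aa")

print(old.group(1), old.group(2))  # wrapper group, then the user's group
print(new.group(1))                # the user's group only
```

Because the non-capturing form leaves the user's groups numbered from 1, patterns behave the same with and without -w or -x.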


[Message part 3 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 16865-done <at> debbugs.gnu.org
Subject: Re: bug#16865: grep -wP and backreferences
Date: Tue, 25 Feb 2014 10:03:28 -0800
[Message part 4 (text/plain, inline)]
On Tue, Feb 25, 2014 at 8:08 AM, Stephane Chazelas
<stephane.chazelas <at> gmail.com> wrote:
> 2014-02-24 20:55:42 -0800, Jim Meyering:
>> On Mon, Feb 24, 2014 at 1:20 PM, Stephane Chazelas
>> <stephane.chazelas <at> gmail.com> wrote:
>> > A last note: with -w, pcregrep wraps the regexp in \b...\b
>> > instead of \b(?:...)\b, so it could be that those brackets are
>> > not necessary in the first place.
>
> The brackets are actually needed in cases like:
>
> grep -Pw 'foo|bar'
>
> (pcregrep has a bug there).
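
Why the grouping matters for alternation can be seen in a short sketch. It uses Python's re module as a stand-in for PCRE, which is an assumption made for illustration only. Alternation binds more loosely than \b, so without the group each boundary applies to just one branch.

```python
import re

# Without grouping, this parses as (\bfoo) | (bar\b).
unwrapped = r"\bfoo|bar\b"
# With grouping, both boundaries apply to whichever branch matches.
wrapped = r"\b(?:foo|bar)\b"

# "foo" appears here only as a prefix, never as a whole word.
text = "foobaz"

unwrapped_hit = re.search(unwrapped, text) is not None
wrapped_hit = re.search(wrapped, text) is not None

print(unwrapped_hit, wrapped_hit)
```

The ungrouped pattern reports a match on a line that contains neither word on its own, while the grouped one does not.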
>
>
>> > Maybe instead of \b(?:...)\b, we could use (?<!\w)...(?!\w)
>> >
>> > $ echo a%%b | grep -P '(?<!\w)%%(?!\w)'
>> > $ echo %aa% | grep -P '(?<!\w)aa(?!\w)'
>> > %aa%
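
The difference between the two wrappers shows up when the pattern starts or ends with a non-word character. Below is a minimal sketch, again using Python's re module as a stand-in for PCRE; the helper names word_boundary and word_lookaround are hypothetical and do not come from grep's sources.

```python
import re

def word_boundary(pattern):
    # Wrapper in the style of the first patch.
    return r"\b(?:" + pattern + r")\b"

def word_lookaround(pattern):
    # Wrapper in the style proposed in the message above.
    return r"(?<!\w)(?:" + pattern + r")(?!\w)"

cases = [("%%", "a%%b"), ("aa", "%aa%")]

results = []
for pattern, text in cases:
    by_boundary = re.search(word_boundary(pattern), text) is not None
    by_lookaround = re.search(word_lookaround(pattern), text) is not None
    results.append((pattern, text, by_boundary, by_lookaround))
    print(pattern, text, by_boundary, by_lookaround)
```

For "%%" in "a%%b", \b is satisfied on both sides because each "%" sits next to a letter, whereas the lookaround form rejects the match because a word character precedes it. For "aa" in "%aa%", both forms agree.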
>>
>> I like both suggestions. Making -wP work like grep's -w makes perfect sense.
>> Care to prepare a patch to make it do that, with a separate test case?
>> "git format-patch ..." output preferred, if you're game.
>>
>> I pushed the above patch, but would welcome another one.
>
> Please find the patch attached.

Thank you very much.  Nearly perfect.
I've uncapitalized the 1-line summary, changed a "That" to "This"
in the log, added examples to NEWS, and added an empty
line to restore the 2-empty-line section delimiter.

> (note that tests/word-delim-multibyte fails for me, but it's not
> my doing, it was failing before).

That's an XFAIL test (as noted in tests/Makefile.am), hence, expected
to fail, and as long as it fails as expected, "make check" can still succeed.

I've closed this ticket, and will push once you ack these changes.
[0001-align-grep-Pw-with-grep-w.patch (application/octet-stream, attachment)]

This bug report was last modified 11 years and 90 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.