GNU bug report logs - #22655
grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)


Package: grep

Reported by: Sergei Trofimovich <slyfox <at> gentoo.org>

Date: Sat, 13 Feb 2016 23:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Jim Meyering <jim <at> meyering.net>
To: Ulya Fokanova <skvadrik <at> gmail.com>
Cc: Ulrich Mueller <ulm <at> gentoo.org>, 22655 <at> debbugs.gnu.org, Sergei Trofimovich <slyfox <at> gentoo.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: bug#22655: grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)
Date: Sun, 21 Feb 2016 08:34:34 -0800
On Sat, Feb 20, 2016 at 8:19 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sun, Feb 14, 2016 at 12:02 PM, Ulya Fokanova <skvadrik <at> gmail.com> wrote:
>> I've explored the following case:
>>
>>    $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z '^[1-4]*$' | wc -c
>>    6
...
>> The bug is also present with the PCRE engine:
>>
>>    $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1234]*$' | wc -c
>>    6
>>    $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1-4]*$' | wc -c
>>    6
>
> Thank you for the analysis and the report.
> I have fixed the regex-oriented problem with the attached
> patch, but not yet the case using -P -z (PCRE + --null-data):

The -Pz/PCRE problem is more fundamental and strikes
even with LC_ALL=C. The following shows that with -Pz,
anchors still wrongly match at newlines rather than at \0 bytes:

  $ printf '\0a\nb\0' | LC_ALL=C src/grep -Plz '^a'
  [Exit 1]
  $ printf '\0a\nb\0' | LC_ALL=C src/grep -Plz '^b'
  (standard input)

Fixing this is on PCRE's maint/README wish list with this item:

. Line endings:
  * Option to use NUL as a line terminator in subject strings. This could now
    be done relatively easily since the extension to support LF, CR, and CRLF.
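
The difference between the two anchoring behaviors can be sketched in Python. This is only an illustration: it uses Python's `re` module rather than PCRE, and it assumes the misbehavior is equivalent to a multiline search over the whole NUL-separated buffer, which is a simplification and not grep's actual code. The helper names `newline_anchored` and `nul_anchored` are hypothetical.

```python
import re

# Same input as in the example above: records terminated by NUL bytes.
DATA = b"\0a\nb\0"


def newline_anchored(pattern: bytes, data: bytes) -> bool:
    """Model of the reported behavior (an assumption): '^' is allowed to
    match at the start of the buffer and after every newline."""
    return re.search(pattern, data, re.MULTILINE) is not None


def nul_anchored(pattern: bytes, data: bytes) -> bool:
    """Model of the behavior one would expect from -z: split the input
    into NUL-terminated records and anchor '^' only at each record start."""
    records = data.split(b"\0")
    # A trailing NUL terminates the last record; it does not start a new one.
    if records and records[-1] == b"":
        records.pop()
    return any(re.search(pattern, record) for record in records)


if __name__ == "__main__":
    for pat in (b"^a", b"^b"):
        print(pat, newline_anchored(pat, DATA), nul_anchored(pat, DATA))
```

Under this model, `^a` fails and `^b` succeeds with newline anchoring, which mirrors the two grep invocations shown above, while NUL anchoring gives the opposite result for both patterns.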




This bug report was last modified 8 years and 190 days ago.
