GNU bug report logs - #22655
grep-2.21 (and git master): --null-data and ranges work in an odd way (-P works fine)

Package: grep;

Reported by: Sergei Trofimovich <slyfox <at> gentoo.org>

Date: Sat, 13 Feb 2016 23:24:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ulya Fokanova <skvadrik <at> gmail.com>
To: Ulrich Mueller <ulm <at> gentoo.org>, Sergei Trofimovich <slyfox <at> gentoo.org>
Cc: Jim Meyering <meyering <at> fb.com>, bug-grep <at> gnu.org,
 Norihiro Tanaka <noritnk <at> kcn.ne.jp>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: grep-2.21 (and git master): --null-data and ranges work in an odd
 way (-P works fine)
Date: Sun, 14 Feb 2016 20:02:13 +0000
I've explored the following case:

   $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z '^[1-4]*$' | wc -c
   6

It's a bug (there should be no match).

This is what grep does:

 * tries to build a DFA (as in dfa.c)
 * fails to expand the character range [1-4] because of the multibyte
   locale en_US.utf-8 and gives up building the DFA (marks [1-4] as
   BACKREF, which suppresses all dfa.c-related code); note the difference
   with the [1234] case, in which there's no need to expand a multibyte
   range
 * falls back to Regex (gnulib extension of regex.h)
 * Regex doesn't support '-z' semantics (the closest configuration to
   '-z' is RE_NEWLINE_ALT, which is already included in the
   RE_SYNTAX_GREP set), so '\n' is treated as a newline and the match
   erroneously succeeds

I think this should be worked around in grep: before calling 're_search' 
it should split the input string by 'eolbyte'.
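
One possible reading of this workaround, as a rough sketch only: cut the
buffer at each 'eolbyte' and hand every record to the matcher separately,
with anchors that apply only at record boundaries. The function name
count_matching_records is hypothetical, and POSIX regexec is used here in
place of 're_search', so this is an assumption about the shape of the fix
and not grep's actual code.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of grep.
   Counts the records in buf[0..len) that match `pattern`.
   Records are separated by `eolbyte` ('\0' for -z, '\n' otherwise).
   The pattern is compiled without REG_NEWLINE, so '^' and '$' anchor
   only at the start and end of each record.
   Returns -1 on a compile or allocation failure.
   Limitation: regexec needs NUL-terminated strings, so a record that
   contains an embedded NUL (possible when eolbyte is '\n') is cut short. */
long count_matching_records(const char *buf, size_t len, char eolbyte,
                            const char *pattern)
{
    assert(buf != NULL && pattern != NULL);

    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    long count = 0;
    size_t start = 0;
    while (start < len) {
        /* Find the end of the current record. */
        const char *end = memchr(buf + start, eolbyte, len - start);
        size_t rec_len = end ? (size_t)(end - (buf + start)) : len - start;

        /* Copy the record so that it is NUL-terminated for regexec. */
        char *rec = malloc(rec_len + 1);
        if (rec == NULL) {
            regfree(&re);
            return -1;
        }
        memcpy(rec, buf + start, rec_len);
        rec[rec_len] = '\0';

        if (regexec(&re, rec, 0, NULL, 0) == 0)
            count++;
        free(rec);

        start += rec_len + 1;   /* skip the record and its terminator */
    }

    regfree(&re);
    return count;
}
```

For the six bytes "12\n34\0" with eolbyte '\0' and the pattern '^[1-4]*$',
this counts zero matching records, since the single record contains a '\n'
that the bracket expression cannot consume.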

The bug is also present with the PCRE engine:

   $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1234]*$' | wc -c
   6
   $ printf '12\n34\0' | LC_ALL=en_US.utf-8 grep -z -P '^[1-4]*$' | wc -c
   6

Ulya

This bug report was last modified 8 years and 190 days ago.
