From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 02 03:55:49 2015
Message-ID: <5594ee74.a5413c0a.aad5a.ffffb80e@mx.google.com>
Date: Thu, 02 Jul 2015 02:55:28 -0500
From: vampyrebat@gmail.com
To: bug-grep@gnu.org
Subject: 2.21 bug in handling at least one -P regular expression

grep 2.21 incorrectly handles a -P regular expression that 2.20 handled correctly. Thanks for looking into this.

$ cat file
Here's a line.

This line has one blank line above it.


This line has two blank lines above it.



This line has three blank lines above it.




This line has four blank lines above it.
$ grep-2.20/src/grep -Pzo '(?<=\n\n\n).*' file
This line has two blank lines above it.
This line has three blank lines above it.
This line has four blank lines above it.
$ grep-2.21/src/grep -Pzo '(?<=\n\n\n).*' file
This line has two blank lines above it.

From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 03 11:23:55 2015
Message-ID: <5596A8FF.6050707@cs.ucla.edu>
Date: Fri, 03 Jul 2015 08:23:43 -0700
From: Paul Eggert
Organization: UCLA Computer Science Department
MIME-Version: 1.0
To: vampyrebat@gmail.com, 20957-done@debbugs.gnu.org
Subject: Re: bug#20957: 2.21 bug in handling at least one -P regular expression
References:
 <5594ee74.a5413c0a.aad5a.ffffb80e@mx.google.com>
In-Reply-To: <5594ee74.a5413c0a.aad5a.ffffb80e@mx.google.com>
Content-Type: multipart/mixed; boundary="------------040800020104090606090609"

This is a multi-part message in MIME format.
--------------040800020104090606090609
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

vampyrebat@gmail.com wrote:
> grep 2.21 incorrectly handles a -P regular expression that 2.20 handled correctly.

Thanks for reporting that. I installed the attached patches. The first fixes the bug; the second is a minor cleanup.

--------------040800020104090606090609
Content-Type: text/x-diff; name="0001-grep-don-t-mishandle-left-context-in-P.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0001-grep-don-t-mishandle-left-context-in-P.patch"

>From bffb51cfda75eeb1d99c34973d5a45fc1b784d89 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 3 Jul 2015 08:10:54 -0700
Subject: [PATCH 1/2] grep: don't mishandle left context in -P

http://bugs.gnu.org/20957
* src/pcresearch.c (jit_exec): New arg SEARCH_OFFSET. Caller changed.
(Pexecute): Pass the left context to pcre_exec, so that PCRE
regular-expression matching can see it.
* tests/pcre-context: New file, to test for this bug.
* tests/Makefile.am (TESTS): Add it.
---
 src/pcresearch.c   | 55 +++++++++++++++++++++++++++++++++---------------------
 tests/Makefile.am  |  1 +
 tests/pcre-context | 38 +++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+), 21 deletions(-)
 create mode 100755 tests/pcre-context

diff --git a/src/pcresearch.c b/src/pcresearch.c
index aa05e20..b1f8310 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -43,16 +43,18 @@ static pcre_extra *extra;
 static int jit_stack_size;
 # endif
 
-/* Match the already-compiled PCRE pattern against the data in P, of
-   size SEARCH_BYTES, with options OPTIONS, and storing resulting
-   matches into SUB. Return the (nonnegative) match location or a
-   (negative) error number. */
+/* Match the already-compiled PCRE pattern against the data in SUBJECT,
+   of size SEARCH_BYTES and starting with offset SEARCH_OFFSET, with
+   options OPTIONS, and storing resulting matches into SUB. Return
+   the (nonnegative) match location or a (negative) error number. */
 static int
-jit_exec (char const *p, int search_bytes, int options, int *sub)
+jit_exec (char const *subject, int search_bytes, int search_offset,
+          int options, int *sub)
 {
   while (true)
     {
-      int e = pcre_exec (cre, extra, p, search_bytes, 0, options, sub, NSUB);
+      int e = pcre_exec (cre, extra, subject, search_bytes, search_offset,
+                         options, sub, NSUB);
 
 # if PCRE_STUDY_JIT_COMPILE
       if (e == PCRE_ERROR_JIT_STACKLIMIT
@@ -187,6 +189,11 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
   int e = PCRE_ERROR_NOMATCH;
   char const *line_end;
 
+  /* The search address to pass to pcre_exec. This is the start of
+     the buffer, or just past the most-recently discovered encoding
+     error. */
+  char const *subject = buf;
+
   /* If the input type is unknown, the caller is still testing the
      input, which means the current buffer cannot contain encoding
      errors and a multiline search is typically more efficient.
@@ -226,12 +233,13 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
               bol = false;
             }
 
+          int search_offset = p - subject;
+
           /* Check for an empty match; this is faster than letting
              pcre_exec do it. */
-          int search_bytes = line_end - p;
-          if (search_bytes == 0)
+          if (p == line_end)
             {
-              sub[0] = sub[1] = 0;
+              sub[0] = sub[1] = search_offset;
               e = empty_match[bol];
               break;
             }
@@ -242,17 +250,18 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
           if (multiline)
             options |= PCRE_NO_UTF8_CHECK;
 
-          e = jit_exec (p, search_bytes, options, sub);
+          e = jit_exec (subject, line_end - subject, search_offset,
+                        options, sub);
           if (e != PCRE_ERROR_BADUTF8)
             {
               if (0 < e && multiline && sub[1] - sub[0] != 0)
                 {
-                  char const *nl = memchr (p + sub[0], eolbyte,
+                  char const *nl = memchr (subject + sub[0], eolbyte,
                                            sub[1] - sub[0]);
                   if (nl)
                     {
                       /* This match crosses a line boundary; reject it. */
-                      p += sub[0];
+                      p = subject + sub[0];
                       line_end = nl;
                       continue;
                     }
@@ -261,22 +270,26 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
             }
 
           int valid_bytes = sub[0];
 
-          /* Try to match the string before the encoding error.
-             Again, handle the empty-match case specially, for speed. */
-          if (valid_bytes == 0)
+          /* Try to match the string before the encoding error. */
+          if (valid_bytes < search_offset)
+            e = PCRE_ERROR_NOMATCH;
+          else if (valid_bytes == 0)
             {
+              /* Handle the empty-match case specially, for speed.
+                 This optimization is valid if VALID_BYTES is zero,
+                 which means SEARCH_OFFSET is also zero. */
               sub[1] = 0;
               e = empty_match[bol];
             }
           else
-            e = pcre_exec (cre, extra, p, valid_bytes, 0,
-                           options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL,
-                           sub, NSUB);
+            e = jit_exec (subject, valid_bytes, search_offset,
+                          options | PCRE_NO_UTF8_CHECK | PCRE_NOTEOL, sub);
+
           if (e != PCRE_ERROR_NOMATCH)
             break;
 
           /* Treat the encoding error as data that cannot match. */
-          p += valid_bytes + 1;
+          p = subject += valid_bytes + 1;
           bol = false;
         }
@@ -315,8 +328,8 @@ Pexecute (char const *buf, size_t size, size_t *match_size,
     }
   else
     {
-      char const *matchbeg = p + sub[0];
-      char const *matchend = p + sub[1];
+      char const *matchbeg = subject + sub[0];
+      char const *matchend = subject + sub[1];
       char const *beg;
       char const *end;
       if (start_ptr)
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 2d7ebf6..7bceac7 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -92,6 +92,7 @@ TESTS = \
   options \
   pcre \
   pcre-abort \
+  pcre-context \
   pcre-infloop \
   pcre-invalid-utf8-input \
   pcre-jitstack \
diff --git a/tests/pcre-context b/tests/pcre-context
new file mode 100755
index 0000000..f0c96e0
--- /dev/null
+++ b/tests/pcre-context
@@ -0,0 +1,38 @@
+#!/bin/sh
+# Test Perl regex with context
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_pcre_
+
+cat >in <<'EOF'
+Preceded by 0 empty lines.
+
+Preceded by 1 empty line.
+
+
+Preceded by 2 empty lines.
+
+
+
+Preceded by 3 empty lines.
+
+
+
+
+Preceded by 4 empty lines.
+
+EOF
+test $? -eq 0 || framework_failure_
+
+cat >exp <<'EOF'
+Preceded by 2 empty lines.
+Preceded by 3 empty lines.
+Preceded by 4 empty lines.
+EOF
+test $? -eq 0 || framework_failure_
+
+fail=0
+
+grep -Pzo '(?<=\n\n\n).*' in >out || fail_ 'grep -Pzo failed'
+compare exp out || fail=1
+
+Exit $fail
-- 
2.1.0

--------------040800020104090606090609
Content-Type: text/x-diff; name="0002-grep-simplify-print_line_middle-slightly.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0002-grep-simplify-print_line_middle-slightly.patch"

>From 36f8a291f87368072fd382cdcd9255b4163d6e1b Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 3 Jul 2015 08:11:53 -0700
Subject: [PATCH 2/2] grep: simplify print_line_middle slightly

* src/grep.c (print_line_middle): Simplify.
---
 src/grep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index d1581e3..778dbcb 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1022,8 +1022,8 @@ print_line_middle (const char *beg, const char *lim,
   const char *mid = NULL;
 
   while (cur < lim
-         && ((match_offset = execute (beg, lim - beg, &match_size,
-                                      beg + (cur - beg))) != (size_t) -1))
+         && ((match_offset = execute (beg, lim - beg, &match_size, cur))
+             != (size_t) -1))
     {
       char const *b = beg + match_offset;
 
-- 
2.1.0

--------------040800020104090606090609--

From unknown Tue Jun 24 03:24:12 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 01 Aug 2015 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
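
Illustration only, not part of the thread: the first patch passes the whole buffer plus a start offset to the matcher instead of a pointer to the middle of the buffer, so that a lookbehind such as (?<=\n\n\n) can still see the bytes to the left of the starting point. The sketch below shows that idea with a toy hand-written matcher. It does not use PCRE, and every name in it (find_after_three_newlines, search_sliced, search_with_offset) is hypothetical and does not appear in grep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a pattern with a lookbehind, roughly '(?<=\n\n\n).':
   return the first offset at or after START_OFFSET in SUBJECT (LENGTH
   bytes) that holds a non-newline byte preceded by three newline bytes.
   Bytes before START_OFFSET are never reported as a match position,
   but they stay visible to the lookbehind check.  Return -1 if there
   is no such offset.  This is an assumption-laden sketch, not PCRE.  */
long
find_after_three_newlines (char const *subject, size_t length,
                           size_t start_offset)
{
  assert (start_offset <= length);
  for (size_t i = start_offset; i < length; i++)
    if (subject[i] != '\n' && 3 <= i
        && memcmp (subject + i - 3, "\n\n\n", 3) == 0)
      return (long) i;
  return -1;
}

/* Search from START by slicing: the matcher only receives the bytes
   from START onward, so the left context is lost.  */
long
search_sliced (char const *buf, size_t size, size_t start)
{
  long m = find_after_three_newlines (buf + start, size - start, 0);
  return m < 0 ? -1 : m + (long) start;
}

/* Search from START by offset: the matcher receives the whole buffer
   and is told where to begin, so the left context is preserved.  */
long
search_with_offset (char const *buf, size_t size, size_t start)
{
  return find_after_three_newlines (buf, size, start);
}
```

With the six-byte buffer "a\n\n\nb\n" and a start position of 4 (the byte 'b'), search_with_offset finds the match at offset 4, while search_sliced reports no match because the three newlines are outside the slice it was given. This is the same kind of discrepancy as in the report above.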