From unknown Tue Jun 17 22:29:41 2025
Subject: bug#17305: [PATCH] dfa: fix bug that caused NUL to be mishandled in patterns
X-GNU-PR-Message: report 17305
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: patch
To: bonzini@gnu.org, 17305@debbugs.gnu.org
Message-Id: <201404210622.s3L6MkUe028332@kiwi.cs.ucla.edu>
From: Paul Eggert
Date: Sun, 20 Apr 2014 23:18:56 -0700

This bug was introduced in the early-2012 patches that fixed some
context-handling bugs.  Bisecting found commit
d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
but it appears the underlying problem was introduced in commit
8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
* NEWS: Mention bug fix.
* src/dfa.c (char_context): Consider NUL to be a newline only if -z.
* tests/Makefile.am (TESTS): Add null-byte.
* tests/null-byte: New file.
---
 NEWS              |  3 +++
 src/dfa.c         |  2 +-
 tests/Makefile.am |  1 +
 tests/null-byte   | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100755 tests/null-byte

diff --git a/NEWS b/NEWS
index 92ce95e..fbb782b 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,9 @@ GNU grep NEWS -*- outline -*-
   grep no longer mishandles an empty pattern at the end of a pattern list.
   [bug introduced in grep-2.5]
 
+  grep -f no longer mishandles patterns containing NUL bytes.
+  [bug introduced in grep-2.11]
+
   grep -P now works with -w and -x and backreferences.  Before,
   echo aa|grep -Pw '(.)\1' would fail to match, yet
   echo aa|grep -Pw '(.)\2' would match.
diff --git a/src/dfa.c b/src/dfa.c
index 90cf4a9..c93f451 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -694,7 +694,7 @@ static charclass newline;
 static int
 char_context (unsigned char c)
 {
-  if (c == eolbyte || c == 0)
+  if (c == eolbyte)
     return CTX_NEWLINE;
   if (IS_WORD_CONSTITUENT (c))
     return CTX_LETTER;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index cc79903..91775bd 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -76,6 +76,7 @@ TESTS = \
   max-count-vs-context \
   mb-non-UTF8-performance \
   multibyte-white-space \
+  null-byte \
   empty-line-mb \
   unibyte-bracket-expr \
   unibyte-negated-circumflex \
diff --git a/tests/null-byte b/tests/null-byte
new file mode 100755
index 0000000..c967dbc
--- /dev/null
+++ b/tests/null-byte
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Test NUL bytes in patterns and data.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+locales=C
+for locale in en_US.iso885915 en_US.UTF-8; do
+  get-mb-cur-max en_US.UTF-8 >/dev/null 2>&1 && locales="$locales $locale"
+done
+
+fail=0
+
+for left in '' a '#' '\0'; do
+  for right in '' b '#' '\0'; do
+    data="$left\\0$right"
+    printf "$data\\n" >in || framework_failure_
+    for hat in '' '^'; do
+      for dollar in '' '$'; do
+        for force_regex in '' '\\(\\)\\1'; do
+          pat="$hat$force_regex$data$dollar"
+          printf "$pat\\n" >pat || framework_failure_
+          for locale in $locales; do
+            LC_ALL=$locale grep -f pat in ||
+              fail_ "'$pat' does not match '$data'"
+            LC_ALL=$locale grep -a -f pat in | cmp -s - in ||
+              fail_ "-a '$pat' does not match '$data'"
+          done
+        done
+      done
+    done
+  done
+done
+
+Exit $fail
-- 
1.9.0

From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 21 02:25:26 2014
Message-ID: <5354B9C1.3010907@cs.ucla.edu>
Date: Sun, 20 Apr 2014 23:25:05 -0700
From: Paul Eggert
Organization: UCLA Computer Science Department
To: control@debbugs.gnu.org
Subject: 17305 is installed

close 17305
thanks

From unknown Tue Jun 17 22:29:41 2025
Subject: bug#17305: [PATCH] dfa: fix bug that caused NUL to be mishandled in patterns
X-GNU-PR-Message: followup 17305
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: patch
To: eggert@CS.UCLA.EDU, 17305@debbugs.gnu.org
Message-ID: <53551814.1090201@gnu.org>
Date: Mon, 21 Apr 2014 09:07:32 -0400
From: Paolo Bonzini
References: <201404210622.s3L6MkUe028332@kiwi.cs.ucla.edu>
In-Reply-To: <201404210622.s3L6MkUe028332@kiwi.cs.ucla.edu>

On 21/04/2014 02:18, Paul Eggert wrote:
> This bug was introduced in the early-2012 patches that fixed some
> context-handling bugs.  Bisecting found commit
> d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
> but it appears the underlying problem was introduced in commit
> 8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
> * NEWS: Mention bug fix.
> * src/dfa.c (char_context): Consider NUL to be a newline only if -z.
> * tests/Makefile.am (TESTS): Add null-byte.
> * tests/null-byte: New file.

Looks good, thanks!

Paolo
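
For readers following the thread, the effect of the one-line change to char_context can be illustrated with a standalone sketch. It is not the code in src/dfa.c: eolbyte is passed as a parameter here instead of being a file-scope variable, is_word_constituent is a simplified stand-in for IS_WORD_CONSTITUENT, and the CTX_NONE fallback and the numeric values of the CTX_* flags are assumptions. The function names char_context_old, char_context_new and char_context_demo are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* Context classes named in the patch; CTX_NONE and the numeric values
   are assumptions made for this sketch.  */
enum { CTX_NONE = 1, CTX_LETTER = 2, CTX_NEWLINE = 4 };

/* Simplified stand-in for IS_WORD_CONSTITUENT (assumption).  */
static int
is_word_constituent (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* Pre-patch rule: NUL is always classified as a line terminator.  */
int
char_context_old (unsigned char c, unsigned char eolbyte)
{
  if (c == eolbyte || c == 0)
    return CTX_NEWLINE;
  if (is_word_constituent (c))
    return CTX_LETTER;
  return CTX_NONE;
}

/* Post-patch rule: only the end-of-line byte is a line terminator,
   so NUL counts as one only when eolbyte itself is NUL (as with -z).  */
int
char_context_new (unsigned char c, unsigned char eolbyte)
{
  if (c == eolbyte)
    return CTX_NEWLINE;
  if (is_word_constituent (c))
    return CTX_LETTER;
  return CTX_NONE;
}

/* Hypothetical self-check showing where the two rules differ.  */
void
char_context_demo (void)
{
  /* With '\n' as the line terminator, only the old rule treats NUL
     as a newline.  */
  assert (char_context_old ('\0', '\n') == CTX_NEWLINE);
  assert (char_context_new ('\0', '\n') == CTX_NONE);

  /* With NUL as the line terminator, NUL is still a newline.  */
  assert (char_context_new ('\0', '\0') == CTX_NEWLINE);

  /* Other bytes are classified the same way by both rules.  */
  assert (char_context_new ('a', '\n') == CTX_LETTER);
  assert (char_context_new ('\n', '\n') == CTX_NEWLINE);
}
```

Calling char_context_demo () from a test driver exercises the cases above. The only input on which the two rules disagree is a NUL byte when eolbyte is not NUL, which is the situation the new tests/null-byte script covers with grep -f.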