GNU bug report logs - #17305
[PATCH] dfa: fix bug that caused NUL to be mishandled in patterns


Package: grep

Reported by: Paul Eggert <eggert <at> CS.UCLA.EDU>

Date: Mon, 21 Apr 2014 06:24:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17305 in the body.
You can then email your comments to 17305 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#17305; Package grep. (Mon, 21 Apr 2014 06:24:02 GMT)

Acknowledgement sent to Paul Eggert <eggert <at> CS.UCLA.EDU>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 21 Apr 2014 06:24:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: bonzini <at> gnu.org
Subject: [PATCH] dfa: fix bug that caused NUL to be mishandled in patterns
Date: Sun, 20 Apr 2014 23:18:56 -0700

This bug was introduced in the early-2012 patches that fixed some
context-handling bugs.  Bisecting found commit
d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
but it appears the underlying problem was introduced in commit
8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
* NEWS: Mention bug fix.
* src/dfa.c (char_context): Consider NUL to be a newline only if -z.
* tests/Makefile.am (TESTS): Add null-byte.
* tests/null-byte: New file.
---
 NEWS              |  3 +++
 src/dfa.c         |  2 +-
 tests/Makefile.am |  1 +
 tests/null-byte   | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100755 tests/null-byte

diff --git a/NEWS b/NEWS
index 92ce95e..fbb782b 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,9 @@ GNU grep NEWS                                    -*- outline -*-
   grep no longer mishandles an empty pattern at the end of a pattern list.
   [bug introduced in grep-2.5]
 
+  grep -f no longer mishandles patterns containing NUL bytes.
+  [bug introduced in grep-2.11]
+
   grep -P now works with -w and -x and backreferences. Before,
   echo aa|grep -Pw '(.)\1' would fail to match, yet
   echo aa|grep -Pw '(.)\2' would match.
diff --git a/src/dfa.c b/src/dfa.c
index 90cf4a9..c93f451 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -694,7 +694,7 @@ static charclass newline;
 static int
 char_context (unsigned char c)
 {
-  if (c == eolbyte || c == 0)
+  if (c == eolbyte)
     return CTX_NEWLINE;
   if (IS_WORD_CONSTITUENT (c))
     return CTX_LETTER;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index cc79903..91775bd 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -76,6 +76,7 @@ TESTS =						\
   max-count-vs-context				\
   mb-non-UTF8-performance			\
   multibyte-white-space				\
+  null-byte					\
   empty-line-mb					\
   unibyte-bracket-expr				\
   unibyte-negated-circumflex			\
diff --git a/tests/null-byte b/tests/null-byte
new file mode 100755
index 0000000..c967dbc
--- /dev/null
+++ b/tests/null-byte
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Test NUL bytes in patterns and data.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+locales=C
+for locale in en_US.iso885915 en_US.UTF-8; do
+  get-mb-cur-max $locale >/dev/null 2>&1 && locales="$locales $locale"
+done
+
+fail=0
+
+for left in '' a '#' '\0'; do
+  for right in '' b '#' '\0'; do
+    data="$left\\0$right"
+    printf "$data\\n" >in || framework_failure_
+    for hat in '' '^'; do
+      for dollar in '' '$'; do
+        for force_regex in '' '\\(\\)\\1'; do
+          pat="$hat$force_regex$data$dollar"
+          printf "$pat\\n" >pat || framework_failure_
+          for locale in $locales; do
+            LC_ALL=$locale grep -f pat in ||
+              fail_ "'$pat' does not match '$data'"
+            LC_ALL=$locale grep -a -f pat in | cmp -s - in ||
+              fail_ "-a '$pat' does not match '$data'"
+          done
+        done
+      done
+    done
+  done
+done
+
+Exit $fail
-- 
1.9.0
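
For readers skimming the dfa.c hunk, the effect of the one-line change can be sketched in isolation. This is only an illustration under stated assumptions, not the code in dfa.c: eolbyte is passed as a parameter here rather than read from dfa.c's own state, is_word_constituent is a stand-in for the IS_WORD_CONSTITUENT macro, the numeric CTX_* values are placeholders, and the final CTX_NONE fallback is assumed because the hunk does not show the end of the function.

```c
#include <ctype.h>

/* Context codes named as in the patch.  The numeric values are
   placeholders for this sketch, not necessarily those used in dfa.c.  */
enum { CTX_NONE = 1, CTX_LETTER = 2, CTX_NEWLINE = 4 };

/* Stand-in for dfa.c's IS_WORD_CONSTITUENT macro (assumed definition).  */
static int
is_word_constituent (unsigned char c)
{
  return isalnum (c) || c == '_';
}

/* Behavior before the patch: NUL is always classified as a newline,
   whatever the end-of-line byte is.  */
static int
char_context_old (unsigned char c, unsigned char eolbyte)
{
  if (c == eolbyte || c == 0)
    return CTX_NEWLINE;
  if (is_word_constituent (c))
    return CTX_LETTER;
  return CTX_NONE;   /* assumed fallback; not visible in the hunk */
}

/* Behavior after the patch: only the end-of-line byte is a newline,
   so NUL counts as one only when eolbyte is itself NUL (grep -z).  */
static int
char_context_new (unsigned char c, unsigned char eolbyte)
{
  if (c == eolbyte)
    return CTX_NEWLINE;
  if (is_word_constituent (c))
    return CTX_LETTER;
  return CTX_NONE;   /* assumed fallback; not visible in the hunk */
}
```

With eolbyte set to '\n', char_context_old (0, '\n') yields CTX_NEWLINE while char_context_new (0, '\n') yields CTX_NONE, so a NUL inside a pattern is no longer treated as a line boundary. With eolbyte set to 0, as under -z, both versions classify NUL as CTX_NEWLINE.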





Bug closed; send any further explanations to 17305 <at> debbugs.gnu.org and Paul Eggert <eggert <at> CS.UCLA.EDU>. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 21 Apr 2014 06:26:03 GMT)

Information forwarded to bug-grep <at> gnu.org:
bug#17305; Package grep. (Mon, 21 Apr 2014 13:08:02 GMT)

Message #10 received at submit <at> debbugs.gnu.org:

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Paul Eggert <eggert <at> CS.UCLA.EDU>, bug-grep <at> gnu.org
Subject: Re: [PATCH] dfa: fix bug that caused NUL to be mishandled in patterns
Date: Mon, 21 Apr 2014 09:07:32 -0400

On 21/04/2014 02:18, Paul Eggert wrote:
> This bug was introduced in the early-2012 patches that fixed some
> context-handling bugs.  Bisecting found commit
> d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
> but it appears the underlying problem was introduced in commit
> 8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
> * NEWS: Mention bug fix.
> * src/dfa.c (char_context): Consider NUL to be a newline only if -z.
> * tests/Makefile.am (TESTS): Add null-byte.
> * tests/null-byte: New file.

Looks good, thanks!

Paolo






Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 May 2014 11:24:04 GMT)

This bug report was last modified 11 years and 29 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.