GNU bug report logs - #17305
[PATCH] dfa: fix bug that caused NUL to be mishandled in patterns

Package: grep

Reported by: Paul Eggert <eggert <at> CS.UCLA.EDU>

Date: Mon, 21 Apr 2014 06:24:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: bonzini <at> gnu.org, 17305 <at> debbugs.gnu.org
Subject: bug#17305: [PATCH] dfa: fix bug that caused NUL to be mishandled in patterns
Date: Sun, 20 Apr 2014 23:18:56 -0700

This bug was introduced in the early-2012 patches that fixed some
context-handling bugs.  Bisecting found commit
d8951d3f4e1bbd564809aa8e713d8333bda2f802 (2012-02-05 18:00:43 +0100),
but it appears the underlying problem was introduced in commit
8b47c4cf6556933f59226c234b0fe984f6c77dc7 (2012-01-03 11:22:09 +0100).
* NEWS: Mention bug fix.
* src/dfa.c (char_context): Consider NUL to be a newline only if -z.
* tests/Makefile.am (TESTS): Add null-byte.
* tests/null-byte: New file.
---
 NEWS              |  3 +++
 src/dfa.c         |  2 +-
 tests/Makefile.am |  1 +
 tests/null-byte   | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100755 tests/null-byte

diff --git a/NEWS b/NEWS
index 92ce95e..fbb782b 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,9 @@ GNU grep NEWS                                    -*- outline -*-
   grep no longer mishandles an empty pattern at the end of a pattern list.
   [bug introduced in grep-2.5]
 
+  grep -f no longer mishandles patterns containing NUL bytes.
+  [bug introduced in grep-2.11]
+
   grep -P now works with -w and -x and backreferences. Before,
   echo aa|grep -Pw '(.)\1' would fail to match, yet
   echo aa|grep -Pw '(.)\2' would match.
diff --git a/src/dfa.c b/src/dfa.c
index 90cf4a9..c93f451 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -694,7 +694,7 @@ static charclass newline;
 static int
 char_context (unsigned char c)
 {
-  if (c == eolbyte || c == 0)
+  if (c == eolbyte)
     return CTX_NEWLINE;
   if (IS_WORD_CONSTITUENT (c))
     return CTX_LETTER;
diff --git a/tests/Makefile.am b/tests/Makefile.am
index cc79903..91775bd 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -76,6 +76,7 @@ TESTS =						\
   max-count-vs-context				\
   mb-non-UTF8-performance			\
   multibyte-white-space				\
+  null-byte					\
   empty-line-mb					\
   unibyte-bracket-expr				\
   unibyte-negated-circumflex			\
diff --git a/tests/null-byte b/tests/null-byte
new file mode 100755
index 0000000..c967dbc
--- /dev/null
+++ b/tests/null-byte
@@ -0,0 +1,52 @@
+#!/bin/sh
+# Test NUL bytes in patterns and data.
+
+# Copyright 2014 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+locales=C
+for locale in en_US.iso885915 en_US.UTF-8; do
+  get-mb-cur-max $locale >/dev/null 2>&1 && locales="$locales $locale"
+done
+
+fail=0
+
+for left in '' a '#' '\0'; do
+  for right in '' b '#' '\0'; do
+    data="$left\\0$right"
+    printf "$data\\n" >in || framework_failure_
+    for hat in '' '^'; do
+      for dollar in '' '$'; do
+        for force_regex in '' '\\(\\)\\1'; do
+          pat="$hat$force_regex$data$dollar"
+          printf "$pat\\n" >pat || framework_failure_
+          for locale in $locales; do
+            LC_ALL=$locale grep -f pat in ||
+              fail_ "'$pat' does not match '$data'"
+            LC_ALL=$locale grep -a -f pat in | cmp -s - in ||
+              fail_ "-a '$pat' does not match '$data'"
+          done
+        done
+      done
+    done
+  done
+done
+
+Exit $fail
-- 
1.9.0
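
For readers who want to see the effect of the one-line change to char_context in isolation, here is a minimal standalone sketch, not the actual src/dfa.c code. The function names char_context_old and char_context_new, the helper is_word_constituent, the numeric CTX_* values, and the trailing "return CTX_NONE" are assumptions made for illustration; only the two "if" conditions are taken from the diff above.

```c
#include <ctype.h>
#include <stdbool.h>

/* Context names follow src/dfa.c; the numeric values are assumed here. */
enum { CTX_NONE = 1, CTX_LETTER = 2, CTX_NEWLINE = 4 };

/* Stand-in for dfa.c's eolbyte: '\n' by default, '\0' when grep runs with -z. */
static unsigned char eolbyte = '\n';

/* Hypothetical stand-in for the IS_WORD_CONSTITUENT macro. */
bool is_word_constituent(unsigned char c)
{
  return isalnum(c) || c == '_';
}

/* Behavior before the patch: NUL is always treated as a line terminator. */
int char_context_old(unsigned char c)
{
  if (c == eolbyte || c == 0)
    return CTX_NEWLINE;
  if (is_word_constituent(c))
    return CTX_LETTER;
  return CTX_NONE; /* assumed fall-through, not shown in the diff */
}

/* Behavior after the patch: NUL is a line terminator only if it is eolbyte. */
int char_context_new(unsigned char c)
{
  if (c == eolbyte)
    return CTX_NEWLINE;
  if (is_word_constituent(c))
    return CTX_LETTER;
  return CTX_NONE; /* assumed fall-through, not shown in the diff */
}
```

With eolbyte left at '\n', the old function classifies a NUL byte as CTX_NEWLINE while the new one classifies it as CTX_NONE. Setting eolbyte to '\0', as -z would, makes the new function return CTX_NEWLINE for NUL again, which matches the commit message's "only if -z".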