GNU bug report logs - #18888
new snapshot available: grep-2.20.72-d512

Package: grep

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Wed, 29 Oct 2014 18:31:01 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org
Cc: TP coordinator <coordinator <at> translationproject.org>,
 platform-testers <at> gnu.org
Subject: new snapshot available: grep-2.20.72-d512
Date: Wed, 29 Oct 2014 11:29:59 -0700

Thanks to many fixes and improvements by Paul Eggert and Norihiro Tanaka,
here is a pre-release snapshot:

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz      1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.20.72-d512.tar.xz

Here is the NEWS so far:

** Improvements

  Performance has been greatly improved for searching files containing
  holes, on platforms where lseek's SEEK_DATA flag works efficiently.
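
  To try the hole-skipping improvement locally, the following is a rough
  sketch rather than part of the release. It assumes a filesystem that
  supports sparse files, and the 1 MiB size and the word "needle" are
  arbitrary choices for illustration. It builds a file whose first 1 MiB
  is a hole, appends one line of real data, and counts the matching lines.

```shell
#!/bin/sh
# Sketch: search a file that starts with a large hole.
# Assumption: the filesystem stores the unwritten 1 MiB range as a hole.
tmp=$(mktemp)

# Extend the empty file to 1 MiB without writing any data blocks.
dd if=/dev/null of="$tmp" bs=1 seek=1048576 2>/dev/null

# Append the only real data in the file.
printf 'needle\n' >> "$tmp"

# -c prints the number of matching lines.
hole_count=$(grep -c needle "$tmp")
echo "$hole_count"

rm -f "$tmp"
```

  Whether the hole is actually skipped depends on the platform's SEEK_DATA
  support, so timing a much larger hole with "time" is the way to see the
  difference between grep versions.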

  Performance has improved for rejecting data that cannot match even
  the first part of a nontrivial pattern.

  Performance has improved for very long strings in patterns.

  If a file contains data improperly encoded for the current locale,
  and this is discovered before any of the file's contents are output,
  grep now treats the file as binary.

  grep -P no longer reports an error and exits when given invalid UTF-8 data.
  Instead, it considers the data to be non-matching.

** Bug fixes

  grep no longer mishandles patterns that contain \w or \W in multibyte
  locales.

  grep would fail to count newlines internally when operating in non-UTF8
  multibyte locales, leading it to print potentially many lines that did
  not match.  E.g., the command, "seq 10 | env LC_ALL=zh_CN src/grep -n .."
  would print this:
  1:1
  2
  3
  4
  5
  6
  7
  8
  9
  10
  implying that the match, "10", was on line 1.
  [bug introduced in grep-2.19]
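
  For comparison, the correct output can be checked with a short sketch
  like the one below. It is an assumption-laden variant of the command
  above: it uses whichever grep is on PATH instead of src/grep, and the C
  locale instead of zh_CN, since zh_CN may not be installed. To exercise
  the bug itself, substitute an installed non-UTF8 multibyte locale.

```shell
#!/bin/sh
# The pattern '..' needs two characters, so among 1..10 only "10" matches.
# A correct grep reports that match on line 10.
line_report=$(seq 10 | env LC_ALL=C grep -n '..')
echo "$line_report"
```

  A fixed grep prints the single line "10:10".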

  grep in a non-UTF8 multibyte locale could mistakenly match in the middle
  of a multibyte character when using a '^'-anchored alternate in a pattern,
  leading it to print non-matching lines.  [bug present since "the beginning"]

  grep -E rejected unmatched ')', instead of treating it like '\)'.
  [bug present since "the beginning"]

** Changes in behavior

  The GREP_OPTIONS environment variable is now obsolescent, and grep
  now warns if it is used.  Please use an alias or script instead.
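
  As an illustration of the suggested migration, here is a minimal sketch
  that replaces a hypothetical GREP_OPTIONS='-i' setting with a shell
  function. A function is used here because, unlike an alias, it also
  takes effect in non-interactive scripts. The option -i is only an
  example; substitute whatever options were in GREP_OPTIONS.

```shell
#!/bin/sh
# Hypothetical replacement for: export GREP_OPTIONS='-i'
# "command grep" bypasses this function and runs the real grep,
# which avoids infinite recursion.
grep() {
    command grep -i "$@"
}

# The wrapper applies -i, so "foo" matches the line "Foo".
wrapped_count=$(printf 'Foo\nbar\n' | grep -c foo)
echo "$wrapped_count"
```

  Placing the function in a shell startup file gives interactive sessions
  the same defaults, while scripts that call grep by full path are left
  unaffected.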

  In locales with multibyte character encodings other than UTF-8,
  grep -P now reports an error and exits instead of misbehaving.

  When searching binary data, grep now may treat non-text bytes as
  line terminators.  This can boost performance significantly.

  grep -z no longer automatically treats the byte '\200' as binary data.
====================================================

Changes in grep since v2.20:

Jim Meyering (13):
      maint: post-release administrivia
      build: don't redirect directly to $@
      build: improve rule to generate egrep+fgrep scripts
      maint: generate distributed THANKS from VC'd THANKS.in
      doc: update HACKING
      maint: split long lines, and enforce the 80-column limit
      maint: avoid distcheck failure
      tests: add expect-to-fail test for a glibc regexp bug
      doc: move NEWS note about GREP_OPTIONS into proper section
      maint: suppress a false-positive -Wcast-align warning
      grep: avoid stack buffer read-underrun and overrun
      tests: make new test script executable
      gnulib: update to latest; bootstrap, too

Norihiro Tanaka (13):
      dfa: speed-up at initial state
      dfa: separate dfaexec function to help optimization by compiler
      grep: fix subscript error when testing whether empty lines match
      dfa: check end of input buffer after transition in non-UTF8 multibyte locale
      dfa: factor out a new nontrivial block of duplicated code
      dfa: test for just-fixed bug
      dfa: fix a theoretical bug
      grep: initialize validation_boundary properly before use
      dfa: process all MBCSET constructs via glibc's matcher
      dfa: remove two erroneous clauses from a now-unused function
      tests: add test for grep -P fix
      dfa: avoid false match in a non-UTF8 multibyte locale
      dfa: make \w and \W work in multibyte locales

Paul Eggert (46):
      build: update gnulib submodule to latest
      grep: use system strstr if available and fast
      grep: undo part of previous change
      doc: use gnulib fdl module
      maint: remove grep.spec
      build: don't make output files read-only
      build: avoid -Wstack-protector
      grep: with -E, unmatched ')' matches itself
      doc: Document -r vs --exclude more carefully.
      doc: prefer @env to @code
      doc: document LANGUAGE
      grep: fix integer-width bugs in undossify_input etc.
      grep: -P now treats invalid UTF-8 input as non-matching
      grep: port recent fix to older pcre version
      grep: fix false matches with -P '...$' and invalid UTF-8
      grep: fix false matches with -P '...$' and invalid UTF-8
      doc: bug tracker has moved to debbugs.gnu.org
      grep: make GREP_OPTIONS obsolescent
      grep: diagnose -P in non-UTF-8 multibyte locale
      grep: remove/refactor unnecessary code about line splitting
      grep: speed up -P on files containing many multibyte errors
      grep: use bool for boolean in grep.c
      grep: treat a file as binary if its prefix contains encoding errors
      grep: improve performance for older glibc
      grep: use mbclen cache more effectively
      grep: avoid false alarms for mb_clen and to_uchar
      grep: use mbclen cache in one more place
      grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE
      grep: fix -P speedup bug with empty match
      grep: refactor binary-vs-unknown-vs-text flags for clarity
      grep: -z no longer considers '\200' to be binary data
      grep: non-text bytes in binary data may be treated as line ends
      grep: minor -P speedup with jit_stack
      grep: improve -P performance in typical cases
      grep: skip past holes efficiently
      grep: port to platforms lacking SEEK_DATA
      grep: speed up processing of holes before EOF on Solaris
      grep: scan for valid multibyte strings more quickly
      grep: don't check extensively for invalid prefix bytes unless -P
      maint: generalize the -Wcast-align fix
      dfa: minor tweaks, mostly to remove __attribute__ ((noinline))
      doc: clarify exit status
      doc: modernize and simplify man page
      grep: fix off-by-one bug in -P optimization
      grep: fix grep -P crash
      tests: work around older libpcre bugs when testing -P and UTF-8


Changes in gnulib since v2.20:

* gnulib 98ca2c0...8415b67 (95):
  > socketlib, sockets, sys_socket: Use AC_REQUIRE to pacify autoconf.
  > iconv: avoid false detection of non-working iconv
  > bootstrap: print more diagnostics for missing programs
  > bootstrap: only update the gnulib submodule
  > symlinkat: port to AIX 7.1
  > readlinkat: port to AIX 7.1
  > remove spurious {
  > modules/fcntl: fix error reporting by dupfd
  > basename, dirname: Improve documentation.
  > exclude: declare exclude_patopts static
  > autoupdate
  > dirname: support compilation with C++
  > qsort_r: include <config.h>
  > avltree-list: avoid compiler warnings
  > qsort_r: new module, for GNU-style qsort_r
  > strerror_r-posix: support compilation with C++
  > fcntl-h: fix compilation with Intel C++ compiler
  > autoupdate
  > mountlist: use /proc/self/mountinfo when available
  > users.txt: add cmogstored
  > gnulib-tool: Sync with build-aux/bootstrap options
  > gnulib-tool: Fallback to wget when rsync fails
  > maintainer-makefile: add syntax check for useless ';;'
  > pthread, pthread_sigmask, threadlib: port to Ubuntu 14.04
  > error: drop spurious semicolon
  > gnulib-common.m4: port to GCC 4.2.1 and Sun Studio 12 C++
  > manywarnings: add GCC 4.9 warnings
  > vasnprintf: fix bugs in width computation
  > vasnprintf: Avoid signed/unsigned comparison warning.
  > parse-datetime: Avoid signed/unsigned comparison warning
  > qsort_r: new module, for GNU-style qsort_r
  > vla: new module
  > localename: make gl_locale_name_thread really thread-safe on Windows
  > getpass: don't assume struct termios
  > getdtablesize: fall back on sysconf (_SC_OPEN_MAX)
  > vararrays: modernize AC_C_VARARRAYS for C11
  > relocatable-prog-wrapper: port gettext to OS X 10.8 + GCC 4.8.1
  > sys_select: fix FD_ZERO problem on Solaris 10
  > accept: document Solaris 10 type glitch
  > extern-inline: port to FreeBSD, DragonFly
  > autoupdate
  > Use consistent style to check DEBUG macro in regex_internal.c
  > openat-die: use _Noreturn markup
  > test-open: port to cygwin, which lacks Fortify
  > localename: Enforce declarations before statements.
  > test-userspec: don't look up numeric user names
  > localcharset, localename: MS-Windows support for non-default locales
  > announce-gen: avoid failure when Digest::SHA is installed
  > gettext: revert "update macros to version 0.19"
  > regex: don't deref NULL upon heap allocation failure
  > maint.mk: give projects more flexibilty in set_prog_name arguments
  > regex: fix memory leak in compiler
  > announce-gen: avoid perl warnings
  > localename: avoid -Wsuggest-attribute={const,pure} warnings
  > nl_langinfo: Fix last change.
  > Define macros for glibc
  > Sync up error.c with glibc
  > nl_langinfo: fix build under mingw
  > mountlist: do not classify a bind-mounted dir entry as "dummy"
  > maint.mk: less syntax-check noise when SIGPIPE is ignored
  > nl_langinfo: CODESET on MS-Windows and more items from localeconv
  > Bruno Haible has stepped down as maintainer.
  > mktime: merge #if/#ifdef usage from glibc
  > git-version-gen: improve option descriptions
  > regex: fix memory leak in compiler
  > regex: merge patch from libc
  > acl: port to gcc -Wredundant-decls
  > parse-duration: eliminate 68-year duration limit
  > pthread: don't assume AC_CANONICAL_HOST, port better to Solaris, etc.
  > pthread: define thread-safe macros on some platforms
  > regex: don't be multithreaded if USE_UNLOCKED_IO.
  > gettext: update macros to version 0.19
  > select,poll: fix console handle check on windows 8
  > select: fix waiting on anonymous pipes on MS-Windows
  > times: fix to return non constant value on MS-Windows
  > isatty: fix to work on windows 8
  > maint: fix typo in fdl.texi
  > mountlist: avoid hasmntopt const type warning on solaris
  > maintainer-makefile: delete obsolete code
  > maintainer-makefile: avoid spurious error messages
  > rename: avoid unused-but-set-variable compiler warning
  > maint: add ChangeLog entry missing in previous commit
  > rename: mark a label as potentially unused
  > gnulib-common.m4: Fix typo in _GL_UNUSED_LABEL.
  > acl: apply pure attribute to two functions
  > gnulib-common.m4: add _GL_UNUSED_LABEL
  > dup2, fcntl, fcntl-h: port to AIX 7.1
  > printf, config.rpath: Port to FreeBSD 10.
  > ftoastr: work around compiler bug in IBM xlc 12.1
  > valgrind-tests: fixed misleading help message
  > isfinite, isinf, isnan tests: fix for little-endian PowerPC
  > exclude-tests: port to AIX 7.1
  > pthread_sigmask, timer-time: use gl_THREADLIB only if needed
  > gnulib-tool: wget translations using --no-verbose rather than --quiet
  > gnulib-tool: adjust translation wget to avoid a https redirection



