Package: grep;
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Wed, 29 Oct 2014 18:31:01 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
From: Jim Meyering <jim <at> meyering.net>
To: 18888 <at> debbugs.gnu.org
Cc: TP coordinator <coordinator <at> translationproject.org>, platform-testers <at> gnu.org
Subject: bug#18888: new snapshot available: grep-2.20.72-d512
Date: Wed, 29 Oct 2014 11:29:59 -0700

Thanks to many fixes and improvements by Paul Eggert and Norihiro Tanaka,
here is a pre-release snapshot:

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz 1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.20.72-d512.tar.xz

Here is the NEWS so far:

** Improvements

  Performance has been greatly improved for searching files containing
  holes, on platforms where lseek's SEEK_DATA flag works efficiently.

  Performance has improved for rejecting data that cannot match even
  the first part of a nontrivial pattern.

  Performance has improved for very long strings in patterns.

  If a file contains data improperly encoded for the current locale,
  and this is discovered before any of the file's contents are output,
  grep now treats the file as binary.

  grep -P no longer reports an error and exits when given invalid UTF-8
  data.  Instead, it considers the data to be non-matching.

** Bug fixes

  grep no longer mishandles patterns that contain \w or \W in multibyte
  locales.

  grep would fail to count newlines internally when operating in non-UTF8
  multibyte locales, leading it to print potentially many lines that did
  not match.  E.g., the command, "seq 10 | env LC_ALL=zh_CN src/grep -n .."
  would print this:
    1:1
    2
    3
    4
    5
    6
    7
    8
    9
    10
  implying that the match, "10" was on line 1.
  [bug introduced in grep-2.19]

  grep in a non-UTF8 multibyte locale could mistakenly match in the middle
  of a multibyte character when using a '^'-anchored alternate in a pattern,
  leading it to print non-matching lines.
  [bug present since "the beginning"]

  grep -E rejected unmatched ')', instead of treating it like '\)'.
  [bug present since "the beginning"]

** Changes in behavior

  The GREP_OPTIONS environment variable is now obsolescent, and grep
  now warns if it is used.  Please use an alias or script instead.

  In locales with multibyte character encodings other than UTF-8,
  grep -P now reports an error and exits instead of misbehaving.

  When searching binary data, grep now may treat non-text bytes as
  line terminators.  This can boost performance significantly.

  grep -z no longer automatically treats the byte '\200' as binary data.

====================================================

Changes in grep since v2.20:

Jim Meyering (13):
      maint: post-release administrivia
      build: don't redirect directly to $@
      build: improve rule to generate egrep+fgrep scripts
      maint: generate distributed THANKS from VC'd THANKS.in
      doc: update HACKING
      maint: split long lines, and enforce the 80-column limit
      maint: avoid distcheck failure
      tests: add expect-to-fail test for a glibc regexp bug
      doc: move NEWS note about GREP_OPTIONS into proper section
      maint: suppress a false-positive -Wcast-align warning
      grep: avoid stack buffer read-underrun and overrun
      tests: make new test script executable
      gnulib: update to latest; bootstrap, too

Norihiro Tanaka (13):
      dfa: speed-up at initial state
      dfa: separate dfaexec function to help optimization by compiler
      grep: fix subscript error when testing whether empty lines match
      dfa: check end of input buffer after transition in non-UTF8 multibyte locale
      dfa: factor out a new nontrivial block of duplicated code
      dfa: test for just-fixed bug
      dfa: fix a theoretical bug
      grep: initialize validation_boundary properly before use
      dfa: process all MBCSET constructs via glibc's matcher
      dfa: remove two erroneous clauses from a now-unused function
      tests: add test for grep -P fix
      dfa: avoid false match in a non-UTF8 multibyte locale
      dfa: make \w and \W work in multibyte locales

Paul Eggert (46):
      build: update gnulib submodule to latest
      grep: use system strstr if available and fast
      grep: undo part of previous change
      doc: use gnulib fdl module
      maint: remove grep.spec
      build: don't make output files read-only
      build: avoid -Wstack-protector
      grep: with -E, unmatched ')' matches itself
      doc: Document -r vs --exclude more carefully.
      doc: prefer @env to @code
      doc: document LANGUAGE
      grep: fix integer-width bugs in undossify_input etc.
      grep: -P now treats invalid UTF-8 input as non-matching
      grep: port recent fix to older pcre version
      grep: fix false matches with -P '...$' and invalid UTF-8
      grep: fix false matches with -P '...$' and invalid UTF-8
      doc: bug tracker has moved to debbugs.gnu.org
      grep: make GREP_OPTIONS obsolescent
      grep: diagnose -P in non-UTF-8 multibyte locale
      grep: remove/refactor unnecessary code about line splitting
      grep: speed up -P on files containing many multibyte errors
      grep: use bool for boolean in grep.c
      grep: treat a file as binary if its prefix contains encoding errors
      grep: improve performance for older glibc
      grep: use mbclen cache more effectively
      grep: avoid false alarms for mb_clen and to_uchar
      grep: use mbclen cache in one more place
      grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE
      grep: fix -P speedup bug with empty match
      grep: refactor binary-vs-unknown-vs-text flags for clarity
      grep: -z no longer considers '\200' to be binary data
      grep: non-text bytes in binary data may be treated as line ends
      grep: minor -P speedup with jit_stack
      grep: improve -P performance in typical cases
      grep: skip past holes efficiently
      grep: port to platforms lacking SEEK_DATA
      grep: speed up processing of holes before EOF on Solaris
      grep: scan for valid multibyte strings more quickly
      grep: don't check extensively for invalid prefix bytes unless -P
      maint: generalize the -Wcast-align fix
      dfa: minor tweaks, mostly to remove __attribute__ ((noinline))
      doc: clarify exit status
      doc: modernize and simplify man page
      grep: fix off-by-one bug in -P optimization
      grep: fix grep -P crash
      tests: work around older libpcre bugs when testing -P and UTF-8

Changes in gnulib since v2.20:

* gnulib 98ca2c0...8415b67 (95):
  > socketlib, sockets, sys_socket: Use AC_REQUIRE to pacify autoconf.
  > iconv: avoid false detection of non-working iconv
  > bootstrap: print more diagnostics for missing programs
  > bootstrap: only update the gnulib submodule
  > symlinkat: port to AIX 7.1
  > readlinkat: port to AIX 7.1
  > remove spurious {
  > modules/fcntl: fix error reporting by dupfd
  > basename, dirname: Improve documentation.
  > exclude: declare exclude_patopts static
  > autoupdate
  > dirname: support compilation with C++
  > qsort_r: include <config.h>
  > avltree-list: avoid compiler warnings
  > qsort_r: new module, for GNU-style qsort_r
  > strerror_r-posix: support compilation with C++
  > fcntl-h: fix compilation with Intel C++ compiler
  > autoupdate
  > mountlist: use /proc/self/mountinfo when available
  > users.txt: add cmogstored
  > gnulib-tool: Sync with build-aux/bootstrap options
  > gnulib-tool: Fallback to wget when rsync fails
  > maintainer-makefile: add syntax check for useless ';;'
  > pthread, pthread_sigmask, threadlib: port to Ubuntu 14.04
  > error: drop spurious semicolon
  > gnulib-common.m4: port to GCC 4.2.1 and Sun Studio 12 C++
  > manywarnings: add GCC 4.9 warnings
  > vasnprintf: fix bugs in width computation
  > vasnprintf: Avoid signed/unsigned comparison warning.
  > parse-datetime: Avoid signed/unsigned comparison warning
  > qsort_r: new module, for GNU-style qsort_r
  > vla: new module
  > localename: make gl_locale_name_thread really thread-safe on Windows
  > getpass: don't assume struct termios
  > getdtablesize: fall back on sysconf (_SC_OPEN_MAX)
  > vararrays: modernize AC_C_VARARRAYS for C11
  > relocatable-prog-wrapper: port gettext to OS X 10.8 + GCC 4.8.1
  > sys_select: fix FD_ZERO problem on Solaris 10
  > accept: document Solaris 10 type glitch
  > extern-inline: port to FreeBSD, DragonFly
  > autoupdate
  > Use consistent style to check DEBUG macro in regex_internal.c
  > openat-die: use _Noreturn markup
  > test-open: port to cygwin, which lacks Fortify
  > localename: Enforce declarations before statements.
  > test-userspec: don't look up numeric user names
  > localcharset, localename: MS-Windows support for non-default locales
  > announce-gen: avoid failure when Digest::SHA is installed
  > gettext: revert "update macros to version 0.19"
  > regex: don't deref NULL upon heap allocation failure
  > maint.mk: give projects more flexibilty in set_prog_name arguments
  > regex: fix memory leak in compiler
  > announce-gen: avoid perl warnings
  > localename: avoid -Wsuggest-attribute={const,pure} warnings
  > nl_langinfo: Fix last change.
  > Define macros for glibc
  > Sync up error.c with glibc
  > nl_langinfo: fix build under mingw
  > mountlist: do not classify a bind-mounted dir entry as "dummy"
  > maint.mk: less syntax-check noise when SIGPIPE is ignored
  > nl_langinfo: CODESET on MS-Windows and more items from localeconv
  > Bruno Haible has stepped down as maintainer.
  > mktime: merge #if/#ifdef usage from glibc
  > git-version-gen: improve option descriptions
  > regex: fix memory leak in compiler
  > regex: merge patch from libc
  > acl: port to gcc -Wredundant-decls
  > parse-duration: eliminate 68-year duration limit
  > pthread: don't assume AC_CANONICAL_HOST, port better to Solaris, etc.
  > pthread: define thread-safe macros on some platforms
  > regex: don't be multithreaded if USE_UNLOCKED_IO.
  > gettext: update macros to version 0.19
  > select,poll: fix console handle check on windows 8
  > select: fix waiting on anonymous pipes on MS-Windows
  > times: fix to return non constant value on MS-Windows
  > isatty: fix to work on windows 8
  > maint: fix typo in fdl.texi
  > mountlist: avoid hasmntopt const type warning on solaris
  > maintainer-makefile: delete obsolete code
  > maintainer-makefile: avoid spurious error messages
  > rename: avoid unused-but-set-variable compiler warning
  > maint: add ChangeLog entry missing in previous commit
  > rename: mark a label as potentially unused
  > gnulib-common.m4: Fix typo in _GL_UNUSED_LABEL.
  > acl: apply pure attribute to two functions
  > gnulib-common.m4: add _GL_UNUSED_LABEL
  > dup2, fcntl, fcntl-h: port to AIX 7.1
  > printf, config.rpath: Port to FreeBSD 10.
  > ftoastr: work around compiler bug in IBM xlc 12.1
  > valgrind-tests: fixed misleading help message
  > isfinite, isinf, isnan tests: fix for little-endian PowerPC
  > exclude-tests: port to AIX 7.1
  > pthread_sigmask, timer-time: use gl_THREADLIB only if needed
  > gnulib-tool: wget translations using --no-verbose rather than --quiet
  > gnulib-tool: adjust translation wget to avoid a https redirection
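
Illustrative note on the NEWS item about improperly encoded data: the
idea of treating a file as binary when its prefix contains encoding
errors can be sketched roughly as below.  This is only a Python sketch
under stated assumptions, not grep's implementation (grep is written in
C and uses the current locale's encoding).  The function names
prefix_has_encoding_error and classify are hypothetical, UTF-8 is
assumed as the default encoding, and the "binary"/"text" labels are
chosen just for illustration.

```python
import codecs


def prefix_has_encoding_error(prefix: bytes, encoding: str = "utf-8") -> bool:
    """Return True if `prefix` contains bytes that are invalid in `encoding`.

    A multibyte sequence that is merely cut off at the end of the prefix
    is not counted as an error, because the rest of it may lie beyond the
    bytes that were read.  (Hypothetical helper, not part of grep.)
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    try:
        # final=False lets the decoder hold back an incomplete trailing sequence.
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return True
    return False


def classify(prefix: bytes, encoding: str = "utf-8") -> str:
    """Label a file prefix as "binary" or "text" (illustrative labels only)."""
    return "binary" if prefix_has_encoding_error(prefix, encoding) else "text"
```

For example, classify(b"abc\x80def\n") reports "binary" because the
lone byte 0x80 is not valid UTF-8, while a prefix that simply ends in
the middle of a valid multibyte character is still reported as "text".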