Package: grep;
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Wed, 29 Oct 2014 18:31:01 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #8 received at 18888 <at> debbugs.gnu.org:
From: Jim Meyering <jim <at> meyering.net>
To: 18888 <at> debbugs.gnu.org
Subject: Re: bug#18888: new snapshot available: grep-2.20.72-d512
Date: Wed, 29 Oct 2014 11:35:48 -0700
FYI, just prior to making that snapshot, I pushed a change that updated
the gnulib submodule to the latest as of some time yesterday, and also
pulled in some small improvements to the bootstrap script:

  http://git.sv.gnu.org/cgit/grep.git/commit/?id=d512007830d2c

On Wed, Oct 29, 2014 at 11:29 AM, Jim Meyering <jim <at> meyering.net> wrote:
> Thanks to many fixes and improvements by Paul Eggert and Norihiro Tanaka,
> here is a pre-release snapshot:
>
> grep snapshot:
>   http://meyering.net/grep/grep-ss.tar.xz      1.2 MB
>   http://meyering.net/grep/grep-ss.tar.xz.sig
>   http://meyering.net/grep/grep-2.20.72-d512.tar.xz
>
> Here is the NEWS so far:
>
> ** Improvements
>
>   Performance has been greatly improved for searching files containing
>   holes, on platforms where lseek's SEEK_DATA flag works efficiently.
>
>   Performance has improved for rejecting data that cannot match even
>   the first part of a nontrivial pattern.
>
>   Performance has improved for very long strings in patterns.
>
>   If a file contains data improperly encoded for the current locale,
>   and this is discovered before any of the file's contents are output,
>   grep now treats the file as binary.
>
>   grep -P no longer reports an error and exits when given invalid UTF-8 data.
>   Instead, it considers the data to be non-matching.
>
> ** Bug fixes
>
>   grep no longer mishandles patterns that contain \w or \W in multibyte
>   locales.
>
>   grep would fail to count newlines internally when operating in non-UTF8
>   multibyte locales, leading it to print potentially many lines that did
>   not match.  E.g., the command, "seq 10 | env LC_ALL=zh_CN src/grep -n .."
>   would print this:
>   1:1
>   2
>   3
>   4
>   5
>   6
>   7
>   8
>   9
>   10
>   implying that the match, "10" was on line 1.
>   [bug introduced in grep-2.19]
>
>   grep in a non-UTF8 multibyte locale could mistakenly match in the middle
>   of a multibyte character when using a '^'-anchored alternate in a pattern,
>   leading it to print non-matching lines.
>   [bug present since "the beginning"]
>
>   grep -E rejected unmatched ')', instead of treating it like '\)'.
>   [bug present since "the beginning"]
>
> ** Changes in behavior
>
>   The GREP_OPTIONS environment variable is now obsolescent, and grep
>   now warns if it is used.  Please use an alias or script instead.
>
>   In locales with multibyte character encodings other than UTF-8,
>   grep -P now reports an error and exits instead of misbehaving.
>
>   When searching binary data, grep now may treat non-text bytes as
>   line terminators.  This can boost performance significantly.
>
>   grep -z no longer automatically treats the byte '\200' as binary data.
>
> ====================================================
>
> Changes in grep since v2.20:
>
> Jim Meyering (13):
>       maint: post-release administrivia
>       build: don't redirect directly to $@
>       build: improve rule to generate egrep+fgrep scripts
>       maint: generate distributed THANKS from VC'd THANKS.in
>       doc: update HACKING
>       maint: split long lines, and enforce the 80-column limit
>       maint: avoid distcheck failure
>       tests: add expect-to-fail test for a glibc regexp bug
>       doc: move NEWS note about GREP_OPTIONS into proper section
>       maint: suppress a false-positive -Wcast-align warning
>       grep: avoid stack buffer read-underrun and overrun
>       tests: make new test script executable
>       gnulib: update to latest; bootstrap, too
>
> Norihiro Tanaka (13):
>       dfa: speed-up at initial state
>       dfa: separate dfaexec function to help optimization by compiler
>       grep: fix subscript error when testing whether empty lines match
>       dfa: check end of input buffer after transition in non-UTF8
>         multibyte locale
>       dfa: factor out a new nontrivial block of duplicated code
>       dfa: test for just-fixed bug
>       dfa: fix a theoretical bug
>       grep: initialize validation_boundary properly before use
>       dfa: process all MBCSET constructs via glibc's matcher
>       dfa: remove two erroneous clauses from a now-unused function
>       tests: add test for grep -P fix
>       dfa: avoid false match in a non-UTF8 multibyte locale
>       dfa: make \w and \W work in multibyte locales
>
> Paul Eggert (46):
>       build: update gnulib submodule to latest
>       grep: use system strstr if available and fast
>       grep: undo part of previous change
>       doc: use gnulib fdl module
>       maint: remove grep.spec
>       build: don't make output files read-only
>       build: avoid -Wstack-protector
>       grep: with -E, unmatched ')' matches itself
>       doc: Document -r vs --exclude more carefully.
>       doc: prefer @env to @code
>       doc: document LANGUAGE
>       grep: fix integer-width bugs in undossify_input etc.
>       grep: -P now treats invalid UTF-8 input as non-matching
>       grep: port recent fix to older pcre version
>       grep: fix false matches with -P '...$' and invalid UTF-8
>       grep: fix false matches with -P '...$' and invalid UTF-8
>       doc: bug tracker has moved to debbugs.gnu.org
>       grep: make GREP_OPTIONS obsolescent
>       grep: diagnose -P in non-UTF-8 multibyte locale
>       grep: remove/refactor unnecessary code about line splitting
>       grep: speed up -P on files containing many multibyte errors
>       grep: use bool for boolean in grep.c
>       grep: treat a file as binary if its prefix contains encoding errors
>       grep: improve performance for older glibc
>       grep: use mbclen cache more effectively
>       grep: avoid false alarms for mb_clen and to_uchar
>       grep: use mbclen cache in one more place
>       grep: port -P speedup to hosts lacking PCRE_STUDY_JIT_COMPILE
>       grep: fix -P speedup bug with empty match
>       grep: refactor binary-vs-unknown-vs-text flags for clarity
>       grep: -z no longer considers '\200' to be binary data
>       grep: non-text bytes in binary data may be treated as line ends
>       grep: minor -P speedup with jit_stack
>       grep: improve -P performance in typical cases
>       grep: skip past holes efficiently
>       grep: port to platforms lacking SEEK_DATA
>       grep: speed up processing of holes before EOF on Solaris
>       grep: scan for valid multibyte strings more quickly
>       grep: don't check extensively for invalid prefix bytes unless -P
>       maint: generalize the -Wcast-align fix
>       dfa: minor tweaks, mostly to remove __attribute__ ((noinline))
>       doc: clarify exit status
>       doc: modernize and simplify man page
>       grep: fix off-by-one bug in -P optimization
>       grep: fix grep -P crash
>       tests: work around older libpcre bugs when testing -P and UTF-8
>
>
> Changes in gnulib since v2.20:
>
> * gnulib 98ca2c0...8415b67 (95):
>   > socketlib, sockets, sys_socket: Use AC_REQUIRE to pacify autoconf.
>   > iconv: avoid false detection of non-working iconv
>   > bootstrap: print more diagnostics for missing programs
>   > bootstrap: only update the gnulib submodule
>   > symlinkat: port to AIX 7.1
>   > readlinkat: port to AIX 7.1
>   > remove spurious {
>   > modules/fcntl: fix error reporting by dupfd
>   > basename, dirname: Improve documentation.
>   > exclude: declare exclude_patopts static
>   > autoupdate
>   > dirname: support compilation with C++
>   > qsort_r: include <config.h>
>   > avltree-list: avoid compiler warnings
>   > qsort_r: new module, for GNU-style qsort_r
>   > strerror_r-posix: support compilation with C++
>   > fcntl-h: fix compilation with Intel C++ compiler
>   > autoupdate
>   > mountlist: use /proc/self/mountinfo when available
>   > users.txt: add cmogstored
>   > gnulib-tool: Sync with build-aux/bootstrap options
>   > gnulib-tool: Fallback to wget when rsync fails
>   > maintainer-makefile: add syntax check for useless ';;'
>   > pthread, pthread_sigmask, threadlib: port to Ubuntu 14.04
>   > error: drop spurious semicolon
>   > gnulib-common.m4: port to GCC 4.2.1 and Sun Studio 12 C++
>   > manywarnings: add GCC 4.9 warnings
>   > vasnprintf: fix bugs in width computation
>   > vasnprintf: Avoid signed/unsigned comparison warning.
>   > parse-datetime: Avoid signed/unsigned comparison warning
>   > qsort_r: new module, for GNU-style qsort_r
>   > vla: new module
>   > localename: make gl_locale_name_thread really thread-safe on Windows
>   > getpass: don't assume struct termios
>   > getdtablesize: fall back on sysconf (_SC_OPEN_MAX)
>   > vararrays: modernize AC_C_VARARRAYS for C11
>   > relocatable-prog-wrapper: port gettext to OS X 10.8 + GCC 4.8.1
>   > sys_select: fix FD_ZERO problem on Solaris 10
>   > accept: document Solaris 10 type glitch
>   > extern-inline: port to FreeBSD, DragonFly
>   > autoupdate
>   > Use consistent style to check DEBUG macro in regex_internal.c
>   > openat-die: use _Noreturn markup
>   > test-open: port to cygwin, which lacks Fortify
>   > localename: Enforce declarations before statements.
>   > test-userspec: don't look up numeric user names
>   > localcharset, localename: MS-Windows support for non-default locales
>   > announce-gen: avoid failure when Digest::SHA is installed
>   > gettext: revert "update macros to version 0.19"
>   > regex: don't deref NULL upon heap allocation failure
>   > maint.mk: give projects more flexibilty in set_prog_name arguments
>   > regex: fix memory leak in compiler
>   > announce-gen: avoid perl warnings
>   > localename: avoid -Wsuggest-attribute={const,pure} warnings
>   > nl_langinfo: Fix last change.
>   > Define macros for glibc
>   > Sync up error.c with glibc
>   > nl_langinfo: fix build under mingw
>   > mountlist: do not classify a bind-mounted dir entry as "dummy"
>   > maint.mk: less syntax-check noise when SIGPIPE is ignored
>   > nl_langinfo: CODESET on MS-Windows and more items from localeconv
>   > Bruno Haible has stepped down as maintainer.
>   > mktime: merge #if/#ifdef usage from glibc
>   > git-version-gen: improve option descriptions
>   > regex: fix memory leak in compiler
>   > regex: merge patch from libc
>   > acl: port to gcc -Wredundant-decls
>   > parse-duration: eliminate 68-year duration limit
>   > pthread: don't assume AC_CANONICAL_HOST, port better to Solaris, etc.
>   > pthread: define thread-safe macros on some platforms
>   > regex: don't be multithreaded if USE_UNLOCKED_IO.
>   > gettext: update macros to version 0.19
>   > select,poll: fix console handle check on windows 8
>   > select: fix waiting on anonymous pipes on MS-Windows
>   > times: fix to return non constant value on MS-Windows
>   > isatty: fix to work on windows 8
>   > maint: fix typo in fdl.texi
>   > mountlist: avoid hasmntopt const type warning on solaris
>   > maintainer-makefile: delete obsolete code
>   > maintainer-makefile: avoid spurious error messages
>   > rename: avoid unused-but-set-variable compiler warning
>   > maint: add ChangeLog entry missing in previous commit
>   > rename: mark a label as potentially unused
>   > gnulib-common.m4: Fix typo in _GL_UNUSED_LABEL.
>   > acl: apply pure attribute to two functions
>   > gnulib-common.m4: add _GL_UNUSED_LABEL
>   > dup2, fcntl, fcntl-h: port to AIX 7.1
>   > printf, config.rpath: Port to FreeBSD 10.
>   > ftoastr: work around compiler bug in IBM xlc 12.1
>   > valgrind-tests: fixed misleading help message
>   > isfinite, isinf, isnan tests: fix for little-endian PowerPC
>   > exclude-tests: port to AIX 7.1
>   > pthread_sigmask, timer-time: use gl_THREADLIB only if needed
>   > gnulib-tool: wget translations using --no-verbose rather than --quiet
>   > gnulib-tool: adjust translation wget to avoid a https redirection
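The NEWS entry about searching files with holes, together with the shortlog items "grep: skip past holes efficiently" and "grep: port to platforms lacking SEEK_DATA", relies on lseek's SEEK_DATA and SEEK_HOLE whence values. The C sketch below is not grep's code. It assumes a POSIX system, and the helper names `count_data_bytes` and `count_data_bytes_path` are hypothetical. It shows the basic idea: jump from one data region to the next with SEEK_DATA and SEEK_HOLE so that holes are never read, and treat the whole file as data when those whence values are unavailable.

```c
#define _GNU_SOURCE /* glibc declares SEEK_DATA/SEEK_HOLE only with this */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from grep: return how many bytes of FD
   lie in data regions (outside holes), or -1 on error.  Data regions are
   reported at the file system's granularity, so a sparse file may count
   somewhat more than the bytes actually written.  Moves FD's offset. */
static off_t count_data_bytes(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;
#if defined SEEK_DATA && defined SEEK_HOLE
    off_t total = 0;
    off_t pos = 0;
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);   /* next data at or after POS */
        if (data < 0) {
            if (errno == ENXIO)    /* no data after POS: the rest is a hole */
                break;
            if (errno == EINVAL)   /* whence unsupported: treat rest as data */
                return total + (end - pos);
            return -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of this data region */
        if (hole < 0)
            return -1;
        assert(hole > data);       /* EOF always counts as a hole */
        total += hole - data;
        pos = hole;                /* skip ahead without reading the hole */
    }
    return total;
#else
    return end;                    /* no hole support: every byte is data */
#endif
}

/* Hypothetical convenience wrapper: open PATH read-only and count. */
static off_t count_data_bytes_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t n = count_data_bytes(fd);
    close(fd);
    return n;
}
```

A search loop built on this idea would read only the byte ranges between each SEEK_DATA result and the following SEEK_HOLE result. The EINVAL fallback is one simple way to degrade gracefully on file systems that reject these whence values.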
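The bug report above says grep "would fail to count newlines internally", so `seq 10 | grep -n ..` reported the match "10" on line 1. The pattern `..` matches only the line "10", so a correct run would print just `10:10`. As an illustration only, the C sketch below derives a line number by counting the newline bytes that precede a match offset. The helper `line_number_at` is hypothetical and is not grep's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not grep's code: return the 1-based number of the
   line containing byte OFFSET of BUF (LEN bytes long), found by counting
   the newline bytes that precede OFFSET. */
static size_t line_number_at(const char *buf, size_t len, size_t offset)
{
    assert(offset <= len);
    size_t lineno = 1;
    const char *p = buf;
    const char *end = buf + offset;
    while (p < end && (p = memchr(p, '\n', (size_t) (end - p))) != NULL) {
        lineno++;   /* one more newline before OFFSET */
        p++;        /* resume scanning just past it */
    }
    return lineno;
}
```

For the `seq 10` output "1\n2\n...\n10\n", the match "10" starts at byte offset 18. Nine newlines precede it, so the function returns 10.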
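The NEWS entry saying grep now treats a file as binary when improperly encoded data is found before any output, along with the shortlog item "grep: treat a file as binary if its prefix contains encoding errors", depends on testing a buffer prefix for byte sequences that are invalid in the current locale. The C sketch below uses the standard `mbrlen` function to perform that test. It is an assumption-laden illustration: `prefix_has_encoding_error` is a hypothetical name, and grep's real code (which uses an mbclen cache, according to the shortlog) is different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not grep's code: report whether the first LEN
   bytes of BUF contain a byte sequence that is invalid in the current
   LC_CTYPE locale.  An incomplete character at the very end is not
   treated as an error, because the prefix may simply have cut it off. */
static bool prefix_has_encoding_error(const char *buf, size_t len)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);   /* start in the initial shift state */
    size_t i = 0;
    while (i < len) {
        size_t n = mbrlen(buf + i, len - i, &st);
        if (n == (size_t) -1)
            return true;         /* invalid sequence (EILSEQ) */
        if (n == (size_t) -2)
            return false;        /* incomplete character at end of prefix */
        if (n == 0)
            n = 1;               /* a null byte: one byte in common encodings */
        assert(n <= len - i);
        i += n;
    }
    return false;
}
```

A real program would call `setlocale (LC_ALL, "")` at startup so that `mbrlen` follows the user's locale. Without that call, the program runs in the "C" locale.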