Package: grep
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Sun, 11 May 2014 05:44:02 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
From: Jim Meyering <jim <at> meyering.net>
To: 17460 <at> debbugs.gnu.org
Cc: TP coordinator <coordinator <at> translationproject.org>, platform-testers <at> gnu.org
Subject: bug#17460: new snapshot available: grep-2.18.143-b298
Date: Sat, 10 May 2014 22:43:09 -0700

Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.

This release includes an unusually large number of bug fixes and
impressive performance improvements, thanks to a lot of work by
Norihiro Tanaka and Paul Eggert.

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz 1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.18.143-b298.tar.xz

Here are the new parts of the NEWS file, followed by git shortlog entries:

=================================================

** Improvements

  Performance has improved, typically by 10% and in some cases by a
  factor of 200. However, performance of grep -P in UTF-8 locales has
  gotten worse as part of the fix for the abovementioned crashes.

** Bug fixes

  grep no longer mishandles patterns like [a-[.z.]], and no longer
  mishandles patterns like [^a] in locales that have multicharacter
  collating sequences so that [^a] can match a string of two characters.

  grep no longer mishandles an empty pattern at the end of a pattern list.
  [bug introduced in grep-2.5]

  grep -C NUM now outputs separators consistently even when NUM is zero,
  and similarly for grep -A NUM and grep -B NUM.
  [bug present since "the beginning"]

  grep -f no longer mishandles patterns containing NUL bytes.
  [bug introduced in grep-2.11]

  Plain grep, grep -E, and grep -F now treat encoding errors in patterns
  the same way the GNU regular expression matcher treats them, with respect
  to whether the errors can match parts of multibyte characters in data.
  [bug present since "the beginning"]

  grep -w no longer mishandles a potential match adjacent to a letter that
  takes up two or more bytes in a multibyte encoding.
  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
  mishandle word-boundary matches in multibyte locales.
  [bug present since "the beginning"]

  grep -P now reports an error and exits when given invalid UTF-8 data.
  Previously it was unreliable, and sometimes crashed or looped.
  [bug introduced in grep-2.16]

  grep -P now works with -w and -x and backreferences. Before,
  echo aa|grep -Pw '(.)\1' would fail to match, yet
  echo aa|grep -Pw '(.)\2' would match.

  grep -Pw now works like grep -w in that the matched string has to be
  preceded and followed by non-word components or the beginning and end
  of the line (as opposed to word boundaries before). Before, this
  echo a@@a| grep -Pw @@ would match, yet this
  echo a@@a| grep -w @@ would not. Now, they both fail to match,
  per the documentation on how grep's -w works.

  grep -i no longer mishandles patterns containing titlecase characters.
  For example, in a locale containing the titlecase character
  'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
  'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
  and 'lj' (U+01C9 LATIN SMALL LETTER LJ).

=================================================

Changes in grep since v2.18:

Jim Meyering (18):
      maint: post-release administrivia
      maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
      tests: make a performance-measuring test less system-sensitive
      tests: avoid false-positive failure on some AMD CPUs
      maint: fix "make dist"
      tests: placate "make syntax-check" re compare arg ordering
      build: avoid OS X 10.8.5 build failure due to lack of static_assert
      maint: avoid sc_po_check syntax-check failure (kwset.c)
      tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
      dfa: avoid new NULL dereference
      maint: Revert "dfa: avoid new NULL dereference"
      build: reenable some compiler warning options
      tests: use consistent spelling for locale name, en_US.UTF-8
      grep: fix new heap write buffer overrun
      gnulib: update to latest
      maint: make ChangeLog generation more robust
      maint: mark some breakless cases with /* fallthrough */ comment
      gnulib: update submodule to latest, and bootstrap

Norihiro Tanaka (33):
      grep: don't match line-by-line for case-insensitive with grep and awk
      grep: remove trivial_case_ignore
      grep: optimization of bracket expression for non-UTF8 locales
      grep: revert removal of trivial_case_ignore
      grep: avoid to add same character to a bracket expression
      grep: optimization for fgrep with changing the macher to grep macher.
      grep: perform the kwset-helping DFA match in narrower range
      grep: take mbrtowc_cache into new member of struct dfa
      dfa: avoid re-building a state built previously
      grep: reuse multibyte DFA buffers in non-UTF8 locales
      grep: fix performance bug with regex in line-by-line mode
      grep: optimization with the superset of DFA
      grep: use the Galil rule for Boyer-Moore algorithm in KWSet
      grep: prefer regex to DFA for ANYCHAR in multibyte locales
      grep: no match for the empty string included in multiple patterns
      grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
      dfa: speed up by checking multibyte characters on demand
      grep: speed-up for exact matching with begline and endline constraints.
      grep: may also use Boyer-Moore algorithm for case-insensitive matching
      grep: speed-up by using memchr() in Boyer-Moore searching
      grep: avoid wasting memory for large patterns in dfamust
      grep: skip checking of multibyte character boundary, reaching at eolbyte
      grep: speed up for a case to repeat failure in DFA after success in kwset
      kwset: improve performance by inlining tr
      dfa: optimize memory allocation
      grep: simplify superset
      grep: adjust timing back to kwset when dfaisfast is true
      grep: fix the bug in previous patch.
      grep: make KWset and DFA agree about invalid sequences in patterns
      dfa: speed up 'dfaisfast'
      grep: improve performance of -v when combined with -L, -l or -q
      dfa: fix inconsistency in multibyte locales
      grep: retry DFA superset after matching multiple lines

Paul Eggert (90):
      grep: fix multiple bugs with bracket expressions
      * src/dfa.c (parse_bracket_exp): Parenthesize.
      * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
      grep: remove lint
      grep: fix bugs with -i and titlecase
      grep: avoid 'inline' when it doesn't matter
      grep: minor tuning for mb_case_map_apply
      doc: describe titlecase fix better
      grep: fix some unlikely bugs in trivial_case_ignore
      grep: fix comment
      maint: remove differences from gnulib regex code
      doc: do not overpromise --ignore-case's behavior
      build: update gnulib submodule to latest
      grep: fix case-fold mismatches between DFA and regex
      fgrep: fix case-fold incompatibility with plain 'grep'
      maint: pacify 'make dist'
      dfa: port to freestanding DJGPP (Bug#17056)
      egrep, fgrep: go back to shell scripts
      grep: fix and simplify grep -iF optimization
      dfa: avoid undefined behavior
      egrep, fgrep: improve diagnostics from shell scripts
      dfa: improve port to freestanding DJGPP
      dfa: cache results of mbrtowc for speed
      dfa: avoid an indirection and port wint_t usage
      dfa: improve port to freestanding DJGPP
      grep: simplify dfa.c by having it not include mbsupport.h directly
      grep: minor improvements to previous patch
      grep: cleanup DFA superset optimization
      grep: minor cleanups for Galil speedups
      grep: simplify memory allocation in kwset
      grep: remove trival_case_ignore
      grep: prefer bool in DFA internals
      grep: port better to hosts with nonstandard nl_langinfo
      grep: remove bool_bf
      grep: cleanup for empty-string fix
      grep: cleanup for HAS_DOS_FILE_CONTENTS issue
      grep: improvements for the open-CSET patch
      build: update gnulib submodule to latest
      dfa: clarify memory allocation and port to IRIX
      dfa: avoid unnecessary work and other initialization
      dfa: better size-overflow check
      dfa: simplify transition table allocation
      dfa: simplify range char allocation
      dfa: simplify multibyte_prop allocation
      dfa: simplify position set and element count allocation
      dfa: simplify memory allocation
      dfa: avoid duplicate strlen when allocating memory
      dfa: simplify freelist
      dfa: simplify dfmust initialization
      dfa: trans reallocation microoptimization
      dfa: minor cleanup
      dfa: fix pointer type conversion bug
      dfa: fix bug that caused NUL to be mishandled in patterns
      dfa: minor improvements to previous patch
      grep: -P now rejects invalid input sequences in UTF-8 locales
      kwset: simplify Boyer-Moore with unibyte -i
      kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
      dfa: omit static variables that limited dfaexec to one struct dfa
      dfa: fix memory leak reintroduced by previous patch
      build: suppress unsafe-loop-optimizations warnings
      dfa: minor tuneup of dfamust memory savings patch
      dfa: fix incorrect comment that led to heap overrun
      dfa: simplify and be more consistent about MB_CUR_MAX
      dfa: minor simplification of dfaexec
      misc: fix doc and test bugs re grep -z
      dfa: fix recently-introduced memory leak
      dfa: fix index bug in previous patch, and simplify
      kwset: improve performance when large Boyer-Moore key doesn't match
      kwset: speed up by using memchr2
      kwset: improve performance by inlining more
      grep: simplify EGexecute further
      grep: clarify EGexecute slightly
      tests: improve coverage for prefix-of-multibyte
      grep: simplify and fix problems with KWset-DFA agreement patch
      dfa: minor simplification
      grep: fix encoding-error incompatibilities among regex, DFA, KWset
      grep: improve internal API for multibyte boundary
      grep: fix -w match next to a multibyte letter
      dfa: minor performance improvement for previous change
      dfa: clarify use of "if"
      doc: mention performance changes
      grep: simplify and clarify invert-related code
      maint: fix indenting to pacify 'prohibit_tab_based_indentation'
      dfa: don't assume unsigned int is exactly 32 bits wide
      dfa: assume C89 for CHAR_BIT
      grep: minor improvements to retry-DFA-superset patch
      grep: -A 0, -B 0, -C 0 now output a separator
      tests: add test case for -C 0 change
      dfa: fix bug with \< etc in multibyte locales
      dfa: omit double includes

Stephane Chazelas (2):
      grep -P: fix it so backreferences now work with -w and -x
      align grep -Pw with grep -w

Changes in gnulib since v2.18:

* gnulib 497f4cd...c2e80b7 (49):
  > update from texinfo
  > autoupdate
  > autoupdate
  > autoupdate
  > gitlog-to-changelog: revert inclusion of git-log-fix file
  > maint.mk: Relax the copyright check to cater for non FSF projects
  > physmem: use sysinfo if _SC_PHYS_PAGES unavailable
  > exclude: port to strict C99
  > regex: do not depend on malloc-gnu
  > autoupdate
  > expl: avoid incorrect expl(small_value) on OpenBSD 5.4
  > xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > autoupdate
  > autoupdate
  > obstack: Remove ancient NeXTSTEP gcc support conditional
  > obstack: merge with glibc changes
  > strftime: wrap macros in "do {...} while(0)"
  > modechange: avoid memory leaks for invalid octal modes
  > autoupdate
  > gitlog-to-changelog: include a dummy git-log-fix file
  > autoupdate
  > update from texinfo
  > gitlog-to-changelog: also include the file, git-log-fix
  > autoupdate
  > regex: port to OS X 10.8.5 en_US.UTF-8 locale
  > maint: fix ChangeLog to match commit record
  > stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
  > parse-datetime: fix crash or infloop in TZ="" parsing
  > * NEWS: Recent changes are not that important.
  > savedir: new symbol for fast-read version
  > unistd: port readlink to Mac OS X 10.3.9
  > * NEWS: Document recent change to diffseq.
  > diffseq: remove TOO_EXPENSIVE heuristic
  > savedir: simplify by using stpcpy
  > spawn: fix link error on uclibc
  > m4: fix gl_TIMER_TIME() detection of threads on uClibc
  > maintainer-makefiles: provide AC_PROG_SED for older autoconf
  > exclude: add support for posix regexps
  > maintainer-makefiles: use $(SED) for syntax check
  > update from texinfo
  > savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
  > autoupdate
  > update from texinfo
  > update from texinfo
  > file-type: add support for doors and other less-common file types
  > update from texinfo
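
For readers puzzling over the grep -Pw entry in the NEWS text above, here is a rough model of the two rules it contrasts: "word boundaries" (the old -Pw behavior) versus "preceded and followed by non-word components or the beginning and end of the line" (the -w rule). This is only a sketch in Python's re module, which is neither PCRE nor grep's own matcher. The function names are made up for illustration, and real grep -w may retry other candidate matches in ways this model does not.

```python
import re

def word_boundary_match(pattern, line):
    """Hypothetical model of the old -Pw rule: \\b on both sides."""
    return re.search(r"\b(?:" + pattern + r")\b", line) is not None

def grep_w_match(pattern, line):
    """Hypothetical model of the -w rule: the match may not be
    adjacent to a word constituent (letter, digit, underscore)."""
    return re.search(r"(?<!\w)(?:" + pattern + r")(?!\w)", line) is not None

if __name__ == "__main__":
    # Between 'a' and '@' there IS a word boundary, so the \b model matches.
    print(word_boundary_match("@@", "a@@a"))   # True
    # Under the -w rule the neighbouring 'a' disqualifies the match.
    print(grep_w_match("@@", "a@@a"))          # False
    # Surrounded by spaces, the -w rule accepts it.
    print(grep_w_match("@@", "a @@ a"))        # True
```

The first two calls mirror the echo a@@a examples from the NEWS entry: the boundary-based model accepts the line, while the neighbor-based model rejects it.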