Package: grep
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Sun, 11 May 2014 05:44:02 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> debbugs.gnu.org:
From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org
Cc: TP coordinator <coordinator <at> translationproject.org>, platform-testers <at> gnu.org
Subject: new snapshot available: grep-2.18.143-b298
Date: Sat, 10 May 2014 22:43:09 -0700
Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.

This release includes an unusually large number of bug fixes
and impressive performance improvements, thanks to a lot of
work by Norihiro Tanaka and Paul Eggert.

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz   1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.18.143-b298.tar.xz

Here are the new parts of the NEWS file, followed by git shortlog entries:

=================================================

** Improvements

  Performance has improved, typically by 10% and in some cases by a
  factor of 200.  However, performance of grep -P in UTF-8 locales has
  gotten worse as part of the fix for the abovementioned crashes.

** Bug fixes

  grep no longer mishandles patterns like [a-[.z.]], and no longer
  mishandles patterns like [^a] in locales that have multicharacter
  collating sequences so that [^a] can match a string of two characters.

  grep no longer mishandles an empty pattern at the end of a pattern list.
  [bug introduced in grep-2.5]

  grep -C NUM now outputs separators consistently even when NUM is zero,
  and similarly for grep -A NUM and grep -B NUM.
  [bug present since "the beginning"]

  grep -f no longer mishandles patterns containing NUL bytes.
  [bug introduced in grep-2.11]

  Plain grep, grep -E, and grep -F now treat encoding errors in patterns
  the same way the GNU regular expression matcher treats them, with respect
  to whether the errors can match parts of multibyte characters in data.
  [bug present since "the beginning"]

  grep -w no longer mishandles a potential match adjacent to a letter that
  takes up two or more bytes in a multibyte encoding.
  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
  mishandle word-boundary matches in multibyte locales.
  [bug present since "the beginning"]

  grep -P now reports an error and exits when given invalid UTF-8 data.
  Previously it was unreliable, and sometimes crashed or looped.
  [bug introduced in grep-2.16]

  grep -P now works with -w and -x and backreferences. Before,
    echo aa|grep -Pw '(.)\1'
  would fail to match, yet
    echo aa|grep -Pw '(.)\2'
  would match.

  grep -Pw now works like grep -w in that the matched string has to be
  preceded and followed by non-word components or the beginning and end
  of the line (as opposed to word boundaries before).  Before, this
    echo a@@a| grep -Pw @@
  would match, yet this
    echo a@@a| grep -w @@
  would not.  Now, they both fail to match, per the documentation on how
  grep's -w works.

  grep -i no longer mishandles patterns containing titlecase characters.
  For example, in a locale containing the titlecase character
  'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
  'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
  and 'lj' (U+01C9 LATIN SMALL LETTER LJ).

=================================================

Changes in grep since v2.18:

Jim Meyering (18):
      maint: post-release administrivia
      maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
      tests: make a performance-measuring test less system-sensitive
      tests: avoid false-positive failure on some AMD CPUs
      maint: fix "make dist"
      tests: placate "make syntax-check" re compare arg ordering
      build: avoid OS X 10.8.5 build failure due to lack of static_assert
      maint: avoid sc_po_check syntax-check failure (kwset.c)
      tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
      dfa: avoid new NULL dereference
      maint: Revert "dfa: avoid new NULL dereference"
      build: reenable some compiler warning options
      tests: use consistent spelling for locale name, en_US.UTF-8
      grep: fix new heap write buffer overrun
      gnulib: update to latest
      maint: make ChangeLog generation more robust
      maint: mark some breakless cases with /* fallthrough */ comment
      gnulib: update submodule to latest, and bootstrap

Norihiro Tanaka (33):
      grep: don't match line-by-line for case-insensitive with grep and awk
      grep: remove trivial_case_ignore
      grep: optimization of bracket expression for non-UTF8 locales
      grep: revert removal of trivial_case_ignore
      grep: avoid to add same character to a bracket expression
      grep: optimization for fgrep with changing the macher to grep macher.
      grep: perform the kwset-helping DFA match in narrower range
      grep: take mbrtowc_cache into new member of struct dfa
      dfa: avoid re-building a state built previously
      grep: reuse multibyte DFA buffers in non-UTF8 locales
      grep: fix performance bug with regex in line-by-line mode
      grep: optimization with the superset of DFA
      grep: use the Galil rule for Boyer-Moore algorithm in KWSet
      grep: prefer regex to DFA for ANYCHAR in multibyte locales
      grep: no match for the empty string included in multiple patterns
      grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
      dfa: speed up by checking multibyte characters on demand
      grep: speed-up for exact matching with begline and endline constraints.
      grep: may also use Boyer-Moore algorithm for case-insensitive matching
      grep: speed-up by using memchr() in Boyer-Moore searching
      grep: avoid wasting memory for large patterns in dfamust
      grep: skip checking of multibyte character boundary, reaching at eolbyte
      grep: speed up for a case to repeat failure in DFA after success in kwset
      kwset: improve performance by inlining tr
      dfa: optimize memory allocation
      grep: simplify superset
      grep: adjust timing back to kwset when dfaisfast is true
      grep: fix the bug in previous patch.
      grep: make KWset and DFA agree about invalid sequences in patterns
      dfa: speed up 'dfaisfast'
      grep: improve performance of -v when combined with -L, -l or -q
      dfa: fix inconsistency in multibyte locales
      grep: retry DFA superset after matching multiple lines

Paul Eggert (90):
      grep: fix multiple bugs with bracket expressions
      * src/dfa.c (parse_bracket_exp): Parenthesize.
      * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
      grep: remove lint
      grep: fix bugs with -i and titlecase
      grep: avoid 'inline' when it doesn't matter
      grep: minor tuning for mb_case_map_apply
      doc: describe titlecase fix better
      grep: fix some unlikely bugs in trivial_case_ignore
      grep: fix comment
      maint: remove differences from gnulib regex code
      doc: do not overpromise --ignore-case's behavior
      build: update gnulib submodule to latest
      grep: fix case-fold mismatches between DFA and regex
      fgrep: fix case-fold incompatibility with plain 'grep'
      maint: pacify 'make dist'
      dfa: port to freestanding DJGPP (Bug#17056)
      egrep, fgrep: go back to shell scripts
      grep: fix and simplify grep -iF optimization
      dfa: avoid undefined behavior
      egrep, fgrep: improve diagnostics from shell scripts
      dfa: improve port to freestanding DJGPP
      dfa: cache results of mbrtowc for speed
      dfa: avoid an indirection and port wint_t usage
      dfa: improve port to freestanding DJGPP
      grep: simplify dfa.c by having it not include mbsupport.h directly
      grep: minor improvements to previous patch
      grep: cleanup DFA superset optimization
      grep: minor cleanups for Galil speedups
      grep: simplify memory allocation in kwset
      grep: remove trival_case_ignore
      grep: prefer bool in DFA internals
      grep: port better to hosts with nonstandard nl_langinfo
      grep: remove bool_bf
      grep: cleanup for empty-string fix
      grep: cleanup for HAS_DOS_FILE_CONTENTS issue
      grep: improvements for the open-CSET patch
      build: update gnulib submodule to latest
      dfa: clarify memory allocation and port to IRIX
      dfa: avoid unnecessary work and other initialization
      dfa: better size-overflow check
      dfa: simplify transition table allocation
      dfa: simplify range char allocation
      dfa: simplify multibyte_prop allocation
      dfa: simplify position set and element count allocation
      dfa: simplify memory allocation
      dfa: avoid duplicate strlen when allocating memory
      dfa: simplify freelist
      dfa: simplify dfmust initialization
      dfa: trans reallocation microoptimization
      dfa: minor cleanup
      dfa: fix pointer type conversion bug
      dfa: fix bug that caused NUL to be mishandled in patterns
      dfa: minor improvements to previous patch
      grep: -P now rejects invalid input sequences in UTF-8 locales
      kwset: simplify Boyer-Moore with unibyte -i
      kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
      dfa: omit static variables that limited dfaexec to one struct dfa
      dfa: fix memory leak reintroduced by previous patch
      build: suppress unsafe-loop-optimizations warnings
      dfa: minor tuneup of dfamust memory savings patch
      dfa: fix incorrect comment that led to heap overrun
      dfa: simplify and be more consistent about MB_CUR_MAX
      dfa: minor simplification of dfaexec
      misc: fix doc and test bugs re grep -z
      dfa: fix recently-introduced memory leak
      dfa: fix index bug in previous patch, and simplify
      kwset: improve performance when large Boyer-Moore key doesn't match
      kwset: speed up by using memchr2
      kwset: improve performance by inlining more
      grep: simplify EGexecute further
      grep: clarify EGexecute slightly
      tests: improve coverage for prefix-of-multibyte
      grep: simplify and fix problems with KWset-DFA agreement patch
      dfa: minor simplification
      grep: fix encoding-error incompatibilities among regex, DFA, KWset
      grep: improve internal API for multibyte boundary
      grep: fix -w match next to a multibyte letter
      dfa: minor performance improvement for previous change
      dfa: clarify use of "if"
      doc: mention performance changes
      grep: simplify and clarify invert-related code
      maint: fix indenting to pacify 'prohibit_tab_based_indentation'
      dfa: don't assume unsigned int is exactly 32 bits wide
      dfa: assume C89 for CHAR_BIT
      grep: minor improvements to retry-DFA-superset patch
      grep: -A 0, -B 0, -C 0 now output a separator
      tests: add test case for -C 0 change
      dfa: fix bug with \< etc in multibyte locales
      dfa: omit double includes

Stephane Chazelas (2):
      grep -P: fix it so backreferences now work with -w and -x
      align grep -Pw with grep -w

Changes in gnulib since v2.18:

* gnulib 497f4cd...c2e80b7 (49):
  > update from texinfo
  > autoupdate
  > autoupdate
  > autoupdate
  > gitlog-to-changelog: revert inclusion of git-log-fix file
  > maint.mk: Relax the copyright check to cater for non FSF projects
  > physmem: use sysinfo if _SC_PHYS_PAGES unavailable
  > exclude: port to strict C99
  > regex: do not depend on malloc-gnu
  > autoupdate
  > expl: avoid incorrect expl(small_value) on OpenBSD 5.4
  > xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > autoupdate
  > autoupdate
  > obstack: Remove ancient NeXTSTEP gcc support conditional
  > obstack: merge with glibc changes
  > strftime: wrap macros in "do {...} while(0)"
  > modechange: avoid memory leaks for invalid octal modes
  > autoupdate
  > gitlog-to-changelog: include a dummy git-log-fix file
  > autoupdate
  > update from texinfo
  > gitlog-to-changelog: also include the file, git-log-fix
  > autoupdate
  > regex: port to OS X 10.8.5 en_US.UTF-8 locale
  > maint: fix ChangeLog to match commit record
  > stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
  > parse-datetime: fix crash or infloop in TZ="" parsing
  > * NEWS: Recent changes are not that important.
  > savedir: new symbol for fast-read version
  > unistd: port readlink to Mac OS X 10.3.9
  > * NEWS: Document recent change to diffseq.
  > diffseq: remove TOO_EXPENSIVE heuristic
  > savedir: simplify by using stpcpy
  > spawn: fix link error on uclibc
  > m4: fix gl_TIMER_TIME() detection of threads on uClibc
  > maintainer-makefiles: provide AC_PROG_SED for older autoconf
  > exclude: add support for posix regexps
  > maintainer-makefiles: use $(SED) for syntax check
  > update from texinfo
  > savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
  > autoupdate
  > update from texinfo
  > update from texinfo
  > file-type: add support for doors and other less-common file types
  > update from texinfo
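
The grep -Pw entry in the NEWS excerpt contrasts two rules: matching at "word boundaries" versus requiring that the match be preceded and followed by non-word characters or the line edges. The sketch below models that difference with Python's re module. It is only an illustration, not grep's code: the function names are hypothetical, Python's \w and \b stand in for grep's notion of word constituents, and wrapping the pattern in \b...\b is an assumed reading of the old behaviour based on the NEWS wording.

```python
import re


def matches_with_word_boundaries(pattern: str, line: str) -> bool:
    """Assumed model of the old rule: the match must start and end at a
    word boundary (\\b), i.e. a transition between word and non-word."""
    return re.search(r"\b(?:" + pattern + r")\b", line) is not None


def matches_as_whole_word(pattern: str, line: str) -> bool:
    """Model of the -w rule quoted in the NEWS entry: the match must not
    be immediately preceded or followed by a word character, so it sits
    next to non-word characters or the start/end of the line."""
    return re.search(r"(?<!\w)(?:" + pattern + r")(?!\w)", line) is not None


if __name__ == "__main__":
    line = "a@@a"
    # The 'a' on each side creates a \b transition next to '@@',
    # but it also counts as an adjacent word character.
    print("word-boundary rule:", matches_with_word_boundaries("@@", line))
    print("whole-word rule:   ", matches_as_whole_word("@@", line))
```

On the line a@@a the word-boundary model reports a match and the whole-word model does not, which mirrors the before and after behaviour described for echo a@@a| grep -Pw @@. Using a non-capturing group for the wrapper keeps backreference numbering such as \1 unchanged in this model.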
Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org (Thu, 15 May 2014 16:51:01 GMT)
Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org (Thu, 15 May 2014 16:51:02 GMT)
Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org (Fri, 13 Jun 2014 11:24:03 GMT)