GNU bug report logs - #17460
new snapshot available: grep-2.18.143-b298

Package: grep

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Sun, 11 May 2014 05:44:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org
Cc: TP coordinator <coordinator <at> translationproject.org>,
 platform-testers <at> gnu.org
Subject: new snapshot available: grep-2.18.143-b298
Date: Sat, 10 May 2014 22:43:09 -0700

Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.

This release includes an unusually large number of bug fixes and
impressive performance improvements, thanks to a lot of work
by Norihiro Tanaka and Paul Eggert.

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz      1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.18.143-b298.tar.xz

Here are the new parts of the NEWS file, followed by git shortlog entries:
=================================================

** Improvements

  Performance has improved, typically by 10% and in some cases by a
  factor of 200.  However, performance of grep -P in UTF-8 locales has
  gotten worse as part of the fix for the crashes described below.

** Bug fixes

  grep no longer mishandles patterns like [a-[.z.]], and no longer
  mishandles patterns like [^a] in locales that have multicharacter
  collating sequences so that [^a] can match a string of two characters.

  grep no longer mishandles an empty pattern at the end of a pattern list.
  [bug introduced in grep-2.5]

  grep -C NUM now outputs separators consistently even when NUM is zero,
  and similarly for grep -A NUM and grep -B NUM.
  [bug present since "the beginning"]
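
The separator rule above can be illustrated with a small model. This is
only a sketch in Python, not grep's implementation: the function name
grep_context and its substring matching are assumptions made for the
example.

```python
def grep_context(lines, needle, before=0, after=0):
    """Toy model of grep -B/-A output grouping (not grep's actual code)."""
    out = []
    last = -1  # index of the last input line already emitted
    for i, line in enumerate(lines):
        if needle not in line:
            continue
        # Never re-emit lines that were already printed as context.
        start = max(i - before, last + 1)
        end = min(i + after, len(lines) - 1)
        # A gap between this group and the previous one gets a separator,
        # and that holds even when both context counts are zero.
        if out and start > last + 1:
            out.append("--")
        out.extend(lines[start:end + 1])
        last = max(last, end)
    return out


print(grep_context(["a", "b", "a"], "a"))  # ['a', '--', 'a']
print(grep_context(["a", "a"], "a"))       # ['a', 'a']
```

With both counts at zero, the two matches in the first call are not
adjacent in the input, so the model emits "--" between them, which is the
consistent behavior the entry describes.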

  grep -f no longer mishandles patterns containing NUL bytes.
  [bug introduced in grep-2.11]

  Plain grep, grep -E, and grep -F now treat encoding errors in patterns
  the same way the GNU regular expression matcher treats them, with respect
  to whether the errors can match parts of multibyte characters in data.
  [bug present since "the beginning"]

  grep -w no longer mishandles a potential match adjacent to a letter that
  takes up two or more bytes in a multibyte encoding.
  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
  mishandle word-boundary matches in multibyte locales.
  [bug present since "the beginning"]

  grep -P now reports an error and exits when given invalid UTF-8 data.
  Previously it was unreliable, and sometimes crashed or looped.
  [bug introduced in grep-2.16]

  grep -P now works with -w and -x and backreferences. Before,
  echo aa|grep -Pw '(.)\1' would fail to match, yet
  echo aa|grep -Pw '(.)\2' would match.

  grep -Pw now works like grep -w in that the matched string has to be
  preceded and followed by non-word components or the beginning and end
  of the line (as opposed to word boundaries, as before).  Previously,
  echo a@@a| grep -Pw @@ would match, yet
  echo a@@a| grep -w @@ would not.  Now, both fail to match,
  per the documentation on how grep's -w works.
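
The difference between the two rules can be sketched with Python's re
module. This is an illustration of the two rules, not of grep or PCRE
internals, and both helper names are hypothetical.

```python
import re


def matches_boundary_style(line, needle):
    """Word-boundary rule: require \\b on both sides of the needle."""
    return re.search(r"\b" + re.escape(needle) + r"\b", line) is not None


def matches_w_style(line, needle):
    """grep -w rule as described above: the neighbors must be non-word
    characters or the start/end of the line."""
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return re.search(pattern, line) is not None


# "a@@a": there is a word boundary between 'a' and '@', so the boundary
# rule matches, but '@@' is flanked by word characters, so -w does not.
print(matches_boundary_style("a@@a", "@@"))    # True
print(matches_w_style("a@@a", "@@"))           # False

# "x @@ y": '@@' is flanked by spaces, so the -w rule matches, while
# there is no word boundary between a space and '@'.
print(matches_boundary_style("x @@ y", "@@"))  # False
print(matches_w_style("x @@ y", "@@"))         # True
```

For needles made only of word characters the two rules agree. They
diverge when the needle starts or ends with a non-word character, as
"@@" does.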

  grep -i no longer mishandles patterns containing titlecase characters.
  For example, in a locale containing the titlecase character
  'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
  'grep -i Lj' now matches both 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ)
  and 'lj' (U+01C9 LATIN SMALL LETTER LJ).
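
The three code points named above form one case family, which can be
checked against Python's Unicode tables. This is a sketch of the
expected case-insensitive relationship, not of grep's code; the helper
same_ignoring_case is a name made up for the example.

```python
import unicodedata

LJ_UPPER = "\u01c7"  # LATIN CAPITAL LETTER LJ
LJ_TITLE = "\u01c8"  # LATIN CAPITAL LETTER L WITH SMALL LETTER J
LJ_LOWER = "\u01c9"  # LATIN SMALL LETTER LJ


def same_ignoring_case(a, b):
    """Compare two strings after Unicode case folding."""
    return a.casefold() == b.casefold()


# The titlecase form is in general category Lt, which is neither
# uppercase (Lu) nor lowercase (Ll).
print(unicodedata.category(LJ_TITLE))          # Lt

# Its uppercase and lowercase mappings are the other two code points.
print(LJ_TITLE.upper() == LJ_UPPER)            # True
print(LJ_TITLE.lower() == LJ_LOWER)            # True

# Under case folding all three compare equal, which is the outcome
# a case-insensitive search for the titlecase form should produce.
print(same_ignoring_case(LJ_TITLE, LJ_UPPER))  # True
print(same_ignoring_case(LJ_TITLE, LJ_LOWER))  # True
```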

=================================================
Changes in grep since v2.18:

Jim Meyering (18):
      maint: post-release administrivia
      maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
      tests: make a performance-measuring test less system-sensitive
      tests: avoid false-positive failure on some AMD CPUs
      maint: fix "make dist"
      tests: placate "make syntax-check" re compare arg ordering
      build: avoid OS X 10.8.5 build failure due to lack of static_assert
      maint: avoid sc_po_check syntax-check failure (kwset.c)
      tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
      dfa: avoid new NULL dereference
      maint: Revert "dfa: avoid new NULL dereference"
      build: reenable some compiler warning options
      tests: use consistent spelling for locale name, en_US.UTF-8
      grep: fix new heap write buffer overrun
      gnulib: update to latest
      maint: make ChangeLog generation more robust
      maint: mark some breakless cases with /* fallthrough */ comment
      gnulib: update submodule to latest, and bootstrap

Norihiro Tanaka (33):
      grep: don't match line-by-line for case-insensitive with grep and awk
      grep: remove trivial_case_ignore
      grep: optimization of bracket expression for non-UTF8 locales
      grep: revert removal of trivial_case_ignore
      grep: avoid to add same character to a bracket expression
      grep: optimization for fgrep with changing the macher to grep macher.
      grep: perform the kwset-helping DFA match in narrower range
      grep: take mbrtowc_cache into new member of struct dfa
      dfa: avoid re-building a state built previously
      grep: reuse multibyte DFA buffers in non-UTF8 locales
      grep: fix performance bug with regex in line-by-line mode
      grep: optimization with the superset of DFA
      grep: use the Galil rule for Boyer-Moore algorithm in KWSet
      grep: prefer regex to DFA for ANYCHAR in multibyte locales
      grep: no match for the empty string included in multiple patterns
      grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
      dfa: speed up by checking multibyte characters on demand
      grep: speed-up for exact matching with begline and endline constraints.
      grep: may also use Boyer-Moore algorithm for case-insensitive matching
      grep: speed-up by using memchr() in Boyer-Moore searching
      grep: avoid wasting memory for large patterns in dfamust
      grep: skip checking of multibyte character boundary, reaching at eolbyte
      grep: speed up for a case to repeat failure in DFA after success in kwset
      kwset: improve performance by inlining tr
      dfa: optimize memory allocation
      grep: simplify superset
      grep: adjust timing back to kwset when dfaisfast is true
      grep: fix the bug in previous patch.
      grep: make KWset and DFA agree about invalid sequences in patterns
      dfa: speed up 'dfaisfast'
      grep: improve performance of -v when combined with -L, -l or -q
      dfa: fix inconsistency in multibyte locales
      grep: retry DFA superset after matching multiple lines

Paul Eggert (90):
      grep: fix multiple bugs with bracket expressions
      * src/dfa.c (parse_bracket_exp): Parenthesize.
      * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
      grep: remove lint
      grep: fix bugs with -i and titlecase
      grep: avoid 'inline' when it doesn't matter
      grep: minor tuning for mb_case_map_apply
      doc: describe titlecase fix better
      grep: fix some unlikely bugs in trivial_case_ignore
      grep: fix comment
      maint: remove differences from gnulib regex code
      doc: do not overpromise --ignore-case's behavior
      build: update gnulib submodule to latest
      grep: fix case-fold mismatches between DFA and regex
      fgrep: fix case-fold incompatibility with plain 'grep'
      maint: pacify 'make dist'
      dfa: port to freestanding DJGPP (Bug#17056)
      egrep, fgrep: go back to shell scripts
      grep: fix and simplify grep -iF optimization
      dfa: avoid undefined behavior
      egrep, fgrep: improve diagnostics from shell scripts
      dfa: improve port to freestanding DJGPP
      dfa: cache results of mbrtowc for speed
      dfa: avoid an indirection and port wint_t usage
      dfa: improve port to freestanding DJGPP
      grep: simplify dfa.c by having it not include mbsupport.h directly
      grep: minor improvements to previous patch
      grep: cleanup DFA superset optimization
      grep: minor cleanups for Galil speedups
      grep: simplify memory allocation in kwset
      grep: remove trival_case_ignore
      grep: prefer bool in DFA internals
      grep: port better to hosts with nonstandard nl_langinfo
      grep: remove bool_bf
      grep: cleanup for empty-string fix
      grep: cleanup for HAS_DOS_FILE_CONTENTS issue
      grep: improvements for the open-CSET patch
      build: update gnulib submodule to latest
      dfa: clarify memory allocation and port to IRIX
      dfa: avoid unnecessary work and other initialization
      dfa: better size-overflow check
      dfa: simplify transition table allocation
      dfa: simplify range char allocation
      dfa: simplify multibyte_prop allocation
      dfa: simplify position set and element count allocation
      dfa: simplify memory allocation
      dfa: avoid duplicate strlen when allocating memory
      dfa: simplify freelist
      dfa: simplify dfmust initialization
      dfa: trans reallocation microoptimization
      dfa: minor cleanup
      dfa: fix pointer type conversion bug
      dfa: fix bug that caused NUL to be mishandled in patterns
      dfa: minor improvements to previous patch
      grep: -P now rejects invalid input sequences in UTF-8 locales
      kwset: simplify Boyer-Moore with unibyte -i
      kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
      dfa: omit static variables that limited dfaexec to one struct dfa
      dfa: fix memory leak reintroduced by previous patch
      build: suppress unsafe-loop-optimizations warnings
      dfa: minor tuneup of dfamust memory savings patch
      dfa: fix incorrect comment that led to heap overrun
      dfa: simplify and be more consistent about MB_CUR_MAX
      dfa: minor simplification of dfaexec
      misc: fix doc and test bugs re grep -z
      dfa: fix recently-introduced memory leak
      dfa: fix index bug in previous patch, and simplify
      kwset: improve performance when large Boyer-Moore key doesn't match
      kwset: speed up by using memchr2
      kwset: improve performance by inlining more
      grep: simplify EGexecute further
      grep: clarify EGexecute slightly
      tests: improve coverage for prefix-of-multibyte
      grep: simplify and fix problems with KWset-DFA agreement patch
      dfa: minor simplification
      grep: fix encoding-error incompatibilities among regex, DFA, KWset
      grep: improve internal API for multibyte boundary
      grep: fix -w match next to a multibyte letter
      dfa: minor performance improvement for previous change
      dfa: clarify use of "if"
      doc: mention performance changes
      grep: simplify and clarify invert-related code
      maint: fix indenting to pacify 'prohibit_tab_based_indentation'
      dfa: don't assume unsigned int is exactly 32 bits wide
      dfa: assume C89 for CHAR_BIT
      grep: minor improvements to retry-DFA-superset patch
      grep: -A 0, -B 0, -C 0 now output a separator
      tests: add test case for -C 0 change
      dfa: fix bug with \< etc in multibyte locales
      dfa: omit double includes

Stephane Chazelas (2):
      grep -P: fix it so backreferences now work with -w and -x
      align grep -Pw with grep -w


Changes in gnulib since v2.18:

* gnulib 497f4cd...c2e80b7 (49):
  > update from texinfo
  > autoupdate
  > autoupdate
  > autoupdate
  > gitlog-to-changelog: revert inclusion of git-log-fix file
  > maint.mk: Relax the copyright check to cater for non FSF projects
  > physmem: use sysinfo if _SC_PHYS_PAGES unavailable
  > exclude: port to strict C99
  > regex: do not depend on malloc-gnu
  > autoupdate
  > expl: avoid incorrect expl(small_value) on OpenBSD 5.4
  > xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > autoupdate
  > autoupdate
  > obstack: Remove ancient NeXTSTEP gcc support conditional
  > obstack: merge with glibc changes
  > strftime: wrap macros in "do {...} while(0)"
  > modechange: avoid memory leaks for invalid octal modes
  > autoupdate
  > gitlog-to-changelog: include a dummy git-log-fix file
  > autoupdate
  > update from texinfo
  > gitlog-to-changelog: also include the file, git-log-fix
  > autoupdate
  > regex: port to OS X 10.8.5 en_US.UTF-8 locale
  > maint: fix ChangeLog to match commit record
  > stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
  > parse-datetime: fix crash or infloop in TZ="" parsing
  > * NEWS: Recent changes are not that important.
  > savedir: new symbol for fast-read version
  > unistd: port readlink to Mac OS X 10.3.9
  > * NEWS: Document recent change to diffseq.
  > diffseq: remove TOO_EXPENSIVE heuristic
  > savedir: simplify by using stpcpy
  > spawn: fix link error on uclibc
  > m4: fix gl_TIMER_TIME() detection of threads on uClibc
  > maintainer-makefiles: provide AC_PROG_SED for older autoconf
  > exclude: add support for posix regexps
  > maintainer-makefiles: use $(SED) for syntax check
  > update from texinfo
  > savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
  > autoupdate
  > update from texinfo
  > update from texinfo
  > file-type: add support for doors and other less-common file types
  > update from texinfo




This bug report was last modified 11 years and 65 days ago.
