From: Jim Meyering
Date: Sat, 10 May 2014 22:43:09 -0700
Subject: bug#17460: new snapshot available: grep-2.18.143-b298
To: 17460@debbugs.gnu.org
Cc: TP coordinator, platform-testers@gnu.org

Here's the latest, in preparation for a grep-2.19 release.
Please give it a good work-out and let us know of any problems.
This release includes an unusually large number of bug fixes
and impressive performance improvements, thanks to a lot of
work by Norihiro Tanaka and Paul Eggert.

grep snapshot:
  http://meyering.net/grep/grep-ss.tar.xz      1.2 MB
  http://meyering.net/grep/grep-ss.tar.xz.sig
  http://meyering.net/grep/grep-2.18.143-b298.tar.xz

Here are the new parts of the NEWS file, followed by git shortlog entries:

=================================================

** Improvements

  Performance has improved, typically by 10% and in some cases by a
  factor of 200. However, performance of grep -P in UTF-8 locales has
  gotten worse as part of the fix for the crashes mentioned below.

** Bug fixes

  grep no longer mishandles patterns like [a-[.z.]], and no longer
  mishandles patterns like [^a] in locales that have multicharacter
  collating sequences so that [^a] can match a string of two characters.

  grep no longer mishandles an empty pattern at the end of a pattern list.
  [bug introduced in grep-2.5]

  grep -C NUM now outputs separators consistently even when NUM is zero,
  and similarly for grep -A NUM and grep -B NUM.
  [bug present since "the beginning"]

  grep -f no longer mishandles patterns containing NUL bytes.
  [bug introduced in grep-2.11]

  Plain grep, grep -E, and grep -F now treat encoding errors in patterns
  the same way the GNU regular expression matcher treats them, with respect
  to whether the errors can match parts of multibyte characters in data.
  [bug present since "the beginning"]

  grep -w no longer mishandles a potential match adjacent to a letter that
  takes up two or more bytes in a multibyte encoding.
  Similarly, the patterns '\<', '\>', '\b', and '\B' no longer
  mishandle word-boundary matches in multibyte locales.
  [bug present since "the beginning"]

  grep -P now reports an error and exits when given invalid UTF-8 data.
  Previously it was unreliable, and sometimes crashed or looped.
  [bug introduced in grep-2.16]

  grep -P now works with -w and -x and backreferences. Before,
  echo aa|grep -Pw '(.)\1' would fail to match, yet
  echo aa|grep -Pw '(.)\2' would match.

  grep -Pw now works like grep -w in that the matched string has to be
  preceded and followed by non-word components or the beginning and end
  of the line (as opposed to word boundaries before). Before, this
  echo a@@a| grep -Pw @@ would match, yet this echo a@@a| grep -w @@
  would not. Now, they both fail to match, per the documentation on
  how grep's -w works.

  grep -i no longer mishandles patterns containing titlecase characters.
  For example, in a locale containing the titlecase character
  'ǈ' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
  'grep -i ǈ' now matches both 'Ǉ' (U+01C7 LATIN CAPITAL LETTER LJ)
  and 'ǉ' (U+01C9 LATIN SMALL LETTER LJ).

=================================================

Changes in grep since v2.18:

Jim Meyering (18):
      maint: post-release administrivia
      maint: dfa: pass NULL, not 0, as 2nd arg to setlocale
      tests: make a performance-measuring test less system-sensitive
      tests: avoid false-positive failure on some AMD CPUs
      maint: fix "make dist"
      tests: placate "make syntax-check" re compare arg ordering
      build: avoid OS X 10.8.5 build failure due to lack of static_assert
      maint: avoid sc_po_check syntax-check failure (kwset.c)
      tests: detect an infloop-inducing bug in grep -P (pcre-8.35)
      dfa: avoid new NULL dereference
      maint: Revert "dfa: avoid new NULL dereference"
      build: reenable some compiler warning options
      tests: use consistent spelling for locale name, en_US.UTF-8
      grep: fix new heap write buffer overrun
      gnulib: update to latest
      maint: make ChangeLog generation more robust
      maint: mark some breakless cases with /* fallthrough */ comment
      gnulib: update submodule to latest, and bootstrap

Norihiro Tanaka (33):
      grep: don't match line-by-line for case-insensitive with grep and awk
      grep: remove trivial_case_ignore
      grep: optimization of bracket expression for non-UTF8 locales
      grep: revert removal of trivial_case_ignore
      grep: avoid to add same character to a bracket expression
      grep: optimization for fgrep with changing the macher to grep macher.
      grep: perform the kwset-helping DFA match in narrower range
      grep: take mbrtowc_cache into new member of struct dfa
      dfa: avoid re-building a state built previously
      grep: reuse multibyte DFA buffers in non-UTF8 locales
      grep: fix performance bug with regex in line-by-line mode
      grep: optimization with the superset of DFA
      grep: use the Galil rule for Boyer-Moore algorithm in KWSet
      grep: prefer regex to DFA for ANYCHAR in multibyte locales
      grep: no match for the empty string included in multiple patterns
      grep: open CSET and transform into uppercase when MB_CUR_MAX == 1
      dfa: speed up by checking multibyte characters on demand
      grep: speed-up for exact matching with begline and endline constraints.
      grep: may also use Boyer-Moore algorithm for case-insensitive matching
      grep: speed-up by using memchr() in Boyer-Moore searching
      grep: avoid wasting memory for large patterns in dfamust
      grep: skip checking of multibyte character boundary, reaching at eolbyte
      grep: speed up for a case to repeat failure in DFA after success in kwset
      kwset: improve performance by inlining tr
      dfa: optimize memory allocation
      grep: simplify superset
      grep: adjust timing back to kwset when dfaisfast is true
      grep: fix the bug in previous patch.
      grep: make KWset and DFA agree about invalid sequences in patterns
      dfa: speed up 'dfaisfast'
      grep: improve performance of -v when combined with -L, -l or -q
      dfa: fix inconsistency in multibyte locales
      grep: retry DFA superset after matching multiple lines

Paul Eggert (90):
      grep: fix multiple bugs with bracket expressions
      * src/dfa.c (parse_bracket_exp): Parenthesize.
      * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to match multibyte chars.
      grep: remove lint
      grep: fix bugs with -i and titlecase
      grep: avoid 'inline' when it doesn't matter
      grep: minor tuning for mb_case_map_apply
      doc: describe titlecase fix better
      grep: fix some unlikely bugs in trivial_case_ignore
      grep: fix comment
      maint: remove differences from gnulib regex code
      doc: do not overpromise --ignore-case's behavior
      build: update gnulib submodule to latest
      grep: fix case-fold mismatches between DFA and regex
      fgrep: fix case-fold incompatibility with plain 'grep'
      maint: pacify 'make dist'
      dfa: port to freestanding DJGPP (Bug#17056)
      egrep, fgrep: go back to shell scripts
      grep: fix and simplify grep -iF optimization
      dfa: avoid undefined behavior
      egrep, fgrep: improve diagnostics from shell scripts
      dfa: improve port to freestanding DJGPP
      dfa: cache results of mbrtowc for speed
      dfa: avoid an indirection and port wint_t usage
      dfa: improve port to freestanding DJGPP
      grep: simplify dfa.c by having it not include mbsupport.h directly
      grep: minor improvements to previous patch
      grep: cleanup DFA superset optimization
      grep: minor cleanups for Galil speedups
      grep: simplify memory allocation in kwset
      grep: remove trival_case_ignore
      grep: prefer bool in DFA internals
      grep: port better to hosts with nonstandard nl_langinfo
      grep: remove bool_bf
      grep: cleanup for empty-string fix
      grep: cleanup for HAS_DOS_FILE_CONTENTS issue
      grep: improvements for the open-CSET patch
      build: update gnulib submodule to latest
      dfa: clarify memory allocation and port to IRIX
      dfa: avoid unnecessary work and other initialization
      dfa: better size-overflow check
      dfa: simplify transition table allocation
      dfa: simplify range char allocation
      dfa: simplify multibyte_prop allocation
      dfa: simplify position set and element count allocation
      dfa: simplify memory allocation
      dfa: avoid duplicate strlen when allocating memory
      dfa: simplify freelist
      dfa: simplify dfmust initialization
      dfa: trans reallocation microoptimization
      dfa: minor cleanup
      dfa: fix pointer type conversion bug
      dfa: fix bug that caused NUL to be mishandled in patterns
      dfa: minor improvements to previous patch
      grep: -P now rejects invalid input sequences in UTF-8 locales
      kwset: simplify Boyer-Moore with unibyte -i
      kwset: simplify and speed up Boyer-Moore unibyte -i in some cases
      dfa: omit static variables that limited dfaexec to one struct dfa
      dfa: fix memory leak reintroduced by previous patch
      build: suppress unsafe-loop-optimizations warnings
      dfa: minor tuneup of dfamust memory savings patch
      dfa: fix incorrect comment that led to heap overrun
      dfa: simplify and be more consistent about MB_CUR_MAX
      dfa: minor simplification of dfaexec
      misc: fix doc and test bugs re grep -z
      dfa: fix recently-introduced memory leak
      dfa: fix index bug in previous patch, and simplify
      kwset: improve performance when large Boyer-Moore key doesn't match
      kwset: speed up by using memchr2
      kwset: improve performance by inlining more
      grep: simplify EGexecute further
      grep: clarify EGexecute slightly
      tests: improve coverage for prefix-of-multibyte
      grep: simplify and fix problems with KWset-DFA agreement patch
      dfa: minor simplification
      grep: fix encoding-error incompatibilities among regex, DFA, KWset
      grep: improve internal API for multibyte boundary
      grep: fix -w match next to a multibyte letter
      dfa: minor performance improvement for previous change
      dfa: clarify use of "if"
      doc: mention performance changes
      grep: simplify and clarify invert-related code
      maint: fix indenting to pacify 'prohibit_tab_based_indentation'
      dfa: don't assume unsigned int is exactly 32 bits wide
      dfa: assume C89 for CHAR_BIT
      grep: minor improvements to retry-DFA-superset patch
      grep: -A 0, -B 0, -C 0 now output a separator
      tests: add test case for -C 0 change
      dfa: fix bug with \< etc in multibyte locales
      dfa: omit double includes

Stephane Chazelas (2):
      grep -P: fix it so backreferences now work with -w and -x
      align grep -Pw with grep -w

Changes in gnulib since v2.18:

* gnulib 497f4cd...c2e80b7 (49):
  > update from texinfo
  > autoupdate
  > autoupdate
  > autoupdate
  > gitlog-to-changelog: revert inclusion of git-log-fix file
  > maint.mk: Relax the copyright check to cater for non FSF projects
  > physmem: use sysinfo if _SC_PHYS_PAGES unavailable
  > exclude: port to strict C99
  > regex: do not depend on malloc-gnu
  > autoupdate
  > expl: avoid incorrect expl(small_value) on OpenBSD 5.4
  > xalloc: allow x2nrealloc (P, PN, S) where P && !*PN
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > fts: avoid unnecessary strlen calls
  > autoupdate
  > autoupdate
  > obstack: Remove ancient NeXTSTEP gcc support conditional
  > obstack: merge with glibc changes
  > strftime: wrap macros in "do {...} while(0)"
  > modechange: avoid memory leaks for invalid octal modes
  > autoupdate
  > gitlog-to-changelog: include a dummy git-log-fix file
  > autoupdate
  > update from texinfo
  > gitlog-to-changelog: also include the file, git-log-fix
  > autoupdate
  > regex: port to OS X 10.8.5 en_US.UTF-8 locale
  > maint: fix ChangeLog to match commit record
  > stdint, read-file: fix missing SIZE_MAX on Android (tiny change)
  > parse-datetime: fix crash or infloop in TZ="" parsing
  > * NEWS: Recent changes are not that important.
  > savedir: new symbol for fast-read version
  > unistd: port readlink to Mac OS X 10.3.9
  > * NEWS: Document recent change to diffseq.
  > diffseq: remove TOO_EXPENSIVE heuristic
  > savedir: simplify by using stpcpy
  > spawn: fix link error on uclibc
  > m4: fix gl_TIMER_TIME() detection of threads on uClibc
  > maintainer-makefiles: provide AC_PROG_SED for older autoconf
  > exclude: add support for posix regexps
  > maintainer-makefiles: use $(SED) for syntax check
  > update from texinfo
  > savedir: add sorting arg to savedir, streamsavedir; remove fdsavedir
  > autoupdate
  > update from texinfo
  > update from texinfo
  > file-type: add support for doors and other less-common file types
  > update from texinfo

From: Paul Eggert
Organization: UCLA Computer Science Department
Date: Thu, 15 May 2014 09:49:47 -0700
To: control@debbugs.gnu.org
Subject: these bugs are fixed or obsolete

tag 17460 + notabug
close 17460
close 17500