From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 11 11:50:34 2015 Received: (at submit) by debbugs.gnu.org; 11 Feb 2015 16:50:34 +0000 Received: from localhost ([127.0.0.1]:40298 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YLaUi-00025B-Pl for submit@debbugs.gnu.org; Wed, 11 Feb 2015 11:50:34 -0500 Received: from eggs.gnu.org ([208.118.235.92]:51410) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YLXh7-00066O-JN for submit@debbugs.gnu.org; Wed, 11 Feb 2015 08:51:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YLXgz-0006iU-3q for submit@debbugs.gnu.org; Wed, 11 Feb 2015 08:51:04 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:50039) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YLXgz-0006iQ-1n for submit@debbugs.gnu.org; Wed, 11 Feb 2015 08:51:01 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:53097) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YLXgr-0001B9-Tr for bug-grep@gnu.org; Wed, 11 Feb 2015 08:51:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YLXgo-0006gJ-D7 for bug-grep@gnu.org; Wed, 11 Feb 2015 08:50:53 -0500 Received: from ipmail04.adl6.internode.on.net ([150.101.137.141]:56415) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YLXgn-0006fQ-93 for bug-grep@gnu.org; Wed, 11 Feb 2015 08:50:50 -0500 Received: from ppp121-45-48-63.lns20.adl2.internode.on.net (HELO [192.168.178.20]) ([121.45.48.63]) by ipmail04.adl6.internode.on.net with ESMTP; 12 Feb 2015 00:20:43 +1030 Message-ID: <54DB5E31.1050503@grouse.com.au> Date: Thu, 12 Feb 2015 00:20:41 +1030 From: sur-behoffski User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: undisclosed-recipients:; Subject: Announcing initial release of hstbm, Version 0.13. Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Wed, 11 Feb 2015 11:50:30 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) G'day, I've been working on an "exploratory fork" of GNU Grep, in order to provide a framework where I can set up a pared-down program that provides support for some simple search cases, but does not have the daunting complexity and (to some extent) extensive legacy support baggage associated with GNU Grep. This has resulted in a new project, "hstbm", which is an acronym for Hybrid Self-Tuning Boyer-Moore. The Savannah folks have kindly agreed to host hstbm as a non-GNU project at: https://savannah.nongnu.org/projects/hstbm The release can be downloaded as a separate tarball from: http://git.savannah.gnu.org/cgit/hstbm.git/snapshot/hstbm-0.13.tar.gz Git users probably want to clone the repository instead; the latest sources (which may include changes after the 0.13 release) can be retrieved via "git clone": git clone git://git.savannah.nongnu.org/hstbm.git ------------- hstbm is a partner to the "untangle" script that I've been posting to the GNU Grep over time; whereas "untangle" tries to lay the foundations for a "bottom-up" refactoring of GNU Grep code, hstbm is a "top-down" approach, reducing the layers of code between the caller, pattern interpretation (lexing and/or parsing), and the underlying search algorithm -- which is also called hstbm. Only a small fraction of regular expression syntax -- classes -- is recognised, and the class support is competent but not bullet-proof. It's best to think of the current level of pattern support as "fgrep (or grep -F) plus some class support". This level of support is enough to fulfil the needs of the string search algorithm included in this release; more support, such as anchors, optional elements, repetition, backreferences etc. are not included for now (although there's a nod in one or two places that these could included in the future). ------------- The hstbm algorithm itself is competitive with the equivalent algorithm in GNU Grep (culminating in bmexec () in src/kwset.c). In my limited testing, choosing data and patterns that are somewhat biased towards showing hstbm's strong points, it can outperform GNU Grep. I'm hopeful that these improvements can hold up over a very wide range of data and patterns, with very few or no performance regressions, such that it might be worthwhile incorporating hstbm into GNU Grep: ["repl" is a Lua script to wrap (roughly) the following shell script, run it multiple times (throwing the first iteration away), and log the results, as well as outputting them: Cmd=$* time (let j=1 \ while [ $((j++)) -lt 1000 ] ; do \ $Cmd >/dev/null \ done \ ) 2>&1 ] -------------- $ grep --version | head 1 grep (GNU grep) 2.21 $ ./h --version | head -1 hstbm (Hybrid Self-Tuning Boyer-Moore) 0.13 $ cat /usr/src/linux-3.14.32-gentoo/*/*.[ch] >csrc $ wc csrc 420778 1348027 11521841 csrc $ ./repl grep -c "123[abc]456" csrc Cmd: "grep" "-c" "123[abc]456" "csrc" 0 real 0m7.268s user 0m3.140s sys 0m3.767s real 0m7.299s user 0m3.317s sys 0m3.590s real 0m7.298s user 0m3.253s sys 0m3.657s real 0m7.245s user 0m3.233s sys 0m3.673s $ ./repl ./h -c "123[abc]456" csrc Cmd: "./h" "-c" "123[abc]456" "csrc" 0 real 0m6.570s user 0m2.237s sys 0m1.327s real 0m6.586s user 0m2.200s sys 0m1.383s real 0m6.647s user 0m2.317s sys 0m1.367s real 0m6.598s user 0m2.247s sys 0m1.367s $ ./repl grep -ci "123[abc]456" csrc Cmd: "grep" "-ci" "123[abc]456" "csrc" 0 real 0m7.307s user 0m3.163s sys 0m3.743s real 0m7.269s user 0m3.373s sys 0m3.533s real 0m7.266s user 0m3.273s sys 0m3.633s real 0m7.266s user 0m3.357s sys 0m3.553s $ ./repl ./h -ci "123[abc]456" csrc Cmd: "./h" "-ci" "123[abc]456" "csrc" 0 real 0m6.582s user 0m2.250s sys 0m1.313s real 0m6.589s user 0m2.287s sys 0m1.293s real 0m6.629s user 0m2.197s sys 0m1.370s real 0m6.637s user 0m2.217s sys 0m1.357s $ ./repl grep -c "123456[abc]" csrc Cmd: "grep" "-c" "123456[abc]" "csrc" 0 real 0m6.652s user 0m2.170s sys 0m1.407s real 0m6.659s user 0m2.257s sys 0m1.330s real 0m6.662s user 0m2.130s sys 0m1.450s real 0m6.658s user 0m2.157s sys 0m1.430s $ ./repl ./h -c "123456[abc]" csrc Cmd: "./h" "-c" "123456[abc]" "csrc" 0 real 0m6.584s user 0m2.227s sys 0m1.337s real 0m6.546s user 0m2.210s sys 0m1.360s real 0m6.546s user 0m2.180s sys 0m1.393s real 0m6.544s user 0m2.210s sys 0m1.363s $ ./repl grep -ci "123456[abc]" csrc Cmd: "grep" "-ci" "123456[abc]" "csrc" 0 real 0m6.676s user 0m2.177s sys 0m1.407s real 0m6.681s user 0m2.060s sys 0m1.537s real 0m6.671s user 0m2.250s sys 0m1.337s real 0m6.683s user 0m2.090s sys 0m1.497s $ ./repl ./h -ci "123456[abc]" csrc Cmd: "./h" "-ci" "123456[abc]" "csrc" 0 real 0m6.644s user 0m2.333s sys 0m1.233s real 0m6.689s user 0m2.113s sys 0m1.457s real 0m6.602s user 0m2.187s sys 0m1.383s real 0m6.604s user 0m2.220s sys 0m1.353s $ ./repl grep -c "123456a" csrc Cmd: "grep" "-c" "123456a" "csrc" 0 real 0m12.035s user 0m7.743s sys 0m2.490s real 0m12.024s user 0m7.763s sys 0m2.473s real 0m12.019s user 0m7.820s sys 0m2.413s real 0m12.017s user 0m7.717s sys 0m2.517s $ ./repl ./h -c "123456a" csrc Cmd: "./h" "-c" "123456a" "csrc" 0 real 0m6.266s user 0m1.923s sys 0m1.640s real 0m6.273s user 0m1.980s sys 0m1.590s real 0m6.252s user 0m2.217s sys 0m1.357s real 0m6.254s user 0m2.033s sys 0m1.540s $ ./repl grep -ci "123456a" csrc Cmd: "grep" "-ci" "123456a" "csrc" 0 real 0m15.220s user 0m11.060s sys 0m2.507s real 0m15.327s user 0m10.980s sys 0m2.583s real 0m15.215s user 0m11.090s sys 0m2.473s real 0m15.218s user 0m10.900s sys 0m2.663s $ ./repl ./h -ci "123456a" csrc Cmd: "./h" "-ci" "123456a" "csrc" 0 real 0m6.651s user 0m2.307s sys 0m1.260s real 0m6.665s user 0m2.253s sys 0m1.327s real 0m6.602s user 0m2.320s sys 0m1.243s real 0m6.611s user 0m2.243s sys 0m1.327s $ wc candide-8859-1.txt 4352 35591 221126 candide-8859-1.txt $ LC_ALL=fr_FR.iso88591 ./repl grep -c "tr[[=e=]]s" candide-8859-1.txt Cmd: "grep" "-c" "tr[[=e=]]s" "candide-8859-1.txt" 154 real 0m4.330s user 0m3.240s sys 0m0.290s real 0m4.340s user 0m3.273s sys 0m0.263s real 0m4.345s user 0m3.213s sys 0m0.323s real 0m4.346s user 0m3.213s sys 0m0.327s $ LC_ALL=fr_FR.iso88591 ./repl ./h -c "tr[[=e=]]s" candide-8859-1.txt Cmd: "./h" "-c" "tr[[=e=]]s" "candide-8859-1.txt" 154 real 0m1.376s user 0m0.043s sys 0m0.133s real 0m1.382s user 0m0.073s sys 0m0.103s real 0m1.383s user 0m0.043s sys 0m0.133s real 0m1.386s user 0m0.050s sys 0m0.127s $ LC_ALL=fr_FR.iso88591 ./repl grep -ci "tr[[=e=]]s" candide-8859-1.txt Cmd: "grep" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" 155 real 0m5.704s user 0m3.357s sys 0m0.177s real 0m5.683s user 0m3.383s sys 0m0.150s real 0m5.686s user 0m3.343s sys 0m0.190s real 0m5.690s user 0m3.363s sys 0m0.173s $ LC_ALL=fr_FR.iso88591 ./repl ./h -ci "tr[[=e=]]s" candide-8859-1.txt Cmd: "./h" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" 155 real 0m1.598s user 0m0.043s sys 0m0.137s real 0m1.600s user 0m0.060s sys 0m0.117s real 0m1.608s user 0m0.040s sys 0m0.140s real 0m1.609s user 0m0.070s sys 0m0.110s -------------------- While many, many features of GNU Grep are not supported, two switches, -J and -K=SKIP_LOOP_EFFICIENCY_THRESHOLD, are included. Here's an example of the debug output created when -J is specified: $ ./h -J -c "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_DIRECT('t') -- 0x74 OP_CHAR_DIRECT('r') -- 0x72 OP_CHAR_DIRECT('e') -- 0x65 OP_CHAR_DIRECT('s') -- 0x73 * debug: hstbm context at 0x20ee000: - fold_table (not used): ................................ ................................ ................................ ................................ ................................ ................................ ................................ ................................ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 - Pattern: t r e s * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 0 0 0 0 - first_alias: . . . . - memchr_offset_list [negated]: 3 2 1 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 93 $ ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_CLASS(ref=1) -- members:2 -- 54 74 Tt OP_CHAR_CLASS(ref=2) -- members:2 -- 52 72 Rr OP_CHAR_CLASS(ref=3) -- members:2 -- 45 65 Ee OP_CHAR_CLASS(ref=4) -- members:2 -- 53 73 Ss * debug: hstbm context at 0x1a4e000: - fold_table: ................................ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. ................................ ................................ ................................ ................................ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 - Pattern: T R E S * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 1 1 1 1 - first_alias: t r e s - memchr_offset_list [negated]: 3 2 1 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 93 $ LC_ALL=fr_FR.iso88591 ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_CLASS(ref=1) -- members:2 -- 54 74 Tt OP_CHAR_CLASS(ref=2) -- members:2 -- 52 72 Rr OP_CHAR_CLASS(ref=3) -- members:10 -- 45 65 c8 c9 ca cb e8 e9 ea eb EeÈÉÊËèé êë OP_CHAR_CLASS(ref=4) -- members:2 -- 53 73 Ss * debug: hstbm context at 0x1ae7000: - fold_table: ................................ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. ................................ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ ÀÁÂÃÄÅÆÇEEEEÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞß àáâãäåæçEEEEìíîïðñòóôõö÷øùúûüýþÿ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444111144444444444444444444 44444444111144444444444444444444 - Pattern: T R E S * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 1 1 9 1 - first_alias: t r e s - memchr_offset_list [negated]: 3 2 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 155 $ -------------------- WARNING: Because hstbm supports far fewer features than GNU Grep, it takes less time to initialise. This will give hstbm an advantage when the total input is small (?? perhaps less than 4k bytes?), and should be factored out when comparing performance. (Note that both csrc and candide-8859-1.txt above are greater than 100k bytes, so this effect should not be a large factor in the above results.) $ ./repl grep -c "123[abc]456" /dev/null Cmd: "grep" "-c" "123[abc]456" "/dev/null" 0 real 0m0.990s user 0m0.073s sys 0m0.097s real 0m1.001s user 0m0.050s sys 0m0.123s real 0m0.993s user 0m0.020s sys 0m0.147s real 0m0.989s user 0m0.040s sys 0m0.127s $ ./repl ./h -c "123[abc]456" /dev/null Cmd: "./h" "-c" "123[abc]456" "/dev/null" 0 real 0m0.748s user 0m0.047s sys 0m0.103s real 0m0.758s user 0m0.057s sys 0m0.093s real 0m0.762s user 0m0.037s sys 0m0.113s real 0m0.754s user 0m0.050s sys 0m0.100s $ ./repl grep -ci "123[abc]456" /dev/null Cmd: "grep" "-ci" "123[abc]456" "/dev/null" 0 real 0m0.993s user 0m0.043s sys 0m0.127s real 0m0.998s user 0m0.057s sys 0m0.110s real 0m0.995s user 0m0.037s sys 0m0.130s real 0m0.993s user 0m0.053s sys 0m0.110s $ ./repl ./h -ci "123[abc]456" /dev/null Cmd: "./h" "-ci" "123[abc]456" "/dev/null" 0 real 0m0.768s user 0m0.037s sys 0m0.113s real 0m0.787s user 0m0.033s sys 0m0.123s real 0m0.770s user 0m0.037s sys 0m0.110s real 0m0.779s user 0m0.027s sys 0m0.127s -------------------- Have fun hacking, and reading the code... all feedback, comments, patches, etc. welcome. cheers, sur-behoffski (Brenton Hoff) Programmer, Grouse Software From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 11 11:59:40 2015 Received: (at 19837) by debbugs.gnu.org; 11 Feb 2015 16:59:40 +0000 Received: from localhost ([127.0.0.1]:40307 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YLadX-0002JI-Gs for submit@debbugs.gnu.org; Wed, 11 Feb 2015 11:59:40 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:57205 ident=Debian-exim) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YLadV-0002J9-3v for 19837@debbugs.gnu.org; Wed, 11 Feb 2015 11:59:38 -0500 Received: from rgm by fencepost.gnu.org with local (Exim 4.71) (envelope-from ) id 1YLadR-0006mC-TU; Wed, 11 Feb 2015 11:59:34 -0500 From: Glenn Morris Subject: Re: bug#19837: Announcing initial release of hstbm, Version 0.13. References: <54DB5E31.1050503@grouse.com.au> to: 19837@debbugs.gnu.org Mail-Followup-To: 19837@debbugs.gnu.org Date: Wed, 11 Feb 2015 11:59:24 -0500 In-Reply-To: <54DB5E31.1050503@grouse.com.au> (sur-behoffski's message of "Thu, 12 Feb 2015 00:20:41 +1030") Message-ID: User-Agent: Gnus (www.gnus.org), GNU Emacs (www.gnu.org/software/emacs/) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 19837 Cc: sur-behoffski X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: 19837@debbugs.gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) The following mail arrived at debbugs.gnu.org via bcc, so could not be associated with a package. I guess it was meant for grep, so I have reassigned it and am sending this mail which will appear on the grep mailing list. sur-behoffski wrote: > G'day, > > I've been working on an "exploratory fork" of GNU Grep, in order > to provide a framework where I can set up a pared-down program > that provides support for some simple search cases, but does not > have the daunting complexity and (to some extent) extensive > legacy support baggage associated with GNU Grep. > > This has resulted in a new project, "hstbm", which is an acronym > for Hybrid Self-Tuning Boyer-Moore. > > The Savannah folks have kindly agreed to host hstbm as a non-GNU > project at: > > https://savannah.nongnu.org/projects/hstbm > > The release can be downloaded as a separate tarball from: > > http://git.savannah.gnu.org/cgit/hstbm.git/snapshot/hstbm-0.13.tar.gz > > Git users probably want to clone the repository instead; the > latest sources (which may include changes after the 0.13 release) > can be retrieved via "git clone": > > git clone git://git.savannah.nongnu.org/hstbm.git > > ------------- > > hstbm is a partner to the "untangle" script that I've been posting > to the GNU Grep over time; whereas "untangle" tries to lay the > foundations for a "bottom-up" refactoring of GNU Grep code, hstbm > is a "top-down" approach, reducing the layers of code between the > caller, pattern interpretation (lexing and/or parsing), and the > underlying search algorithm -- which is also called hstbm. > > Only a small fraction of regular expression syntax -- classes -- > is recognised, and the class support is competent but not > bullet-proof. It's best to think of the current level of pattern > support as "fgrep (or grep -F) plus some class support". This > level of support is enough to fulfil the needs of the string > search algorithm included in this release; more support, such as > anchors, optional elements, repetition, backreferences etc. are > not included for now (although there's a nod in one or two places > that these could included in the future). > > ------------- > > The hstbm algorithm itself is competitive with the equivalent > algorithm in GNU Grep (culminating in bmexec () in src/kwset.c). > In my limited testing, choosing data and patterns that are > somewhat biased towards showing hstbm's strong points, it can > outperform GNU Grep. I'm hopeful that these improvements can > hold up over a very wide range of data and patterns, with very > few or no performance regressions, such that it might be > worthwhile incorporating hstbm into GNU Grep: > > ["repl" is a Lua script to wrap (roughly) the following shell > script, run it multiple times (throwing the first iteration > away), and log the results, as well as outputting them: > > Cmd=$* > time (let j=1 \ > while [ $((j++)) -lt 1000 ] ; do \ > $Cmd >/dev/null \ > done \ > ) 2>&1 > ] > > -------------- > > $ grep --version | head 1 > grep (GNU grep) 2.21 > $ ./h --version | head -1 > hstbm (Hybrid Self-Tuning Boyer-Moore) 0.13 > > > $ cat /usr/src/linux-3.14.32-gentoo/*/*.[ch] >csrc > $ wc csrc > 420778 1348027 11521841 csrc > > > $ ./repl grep -c "123[abc]456" csrc > Cmd: "grep" "-c" "123[abc]456" "csrc" > 0 > real 0m7.268s user 0m3.140s sys 0m3.767s > real 0m7.299s user 0m3.317s sys 0m3.590s > real 0m7.298s user 0m3.253s sys 0m3.657s > real 0m7.245s user 0m3.233s sys 0m3.673s > > $ ./repl ./h -c "123[abc]456" csrc > Cmd: "./h" "-c" "123[abc]456" "csrc" > 0 > real 0m6.570s user 0m2.237s sys 0m1.327s > real 0m6.586s user 0m2.200s sys 0m1.383s > real 0m6.647s user 0m2.317s sys 0m1.367s > real 0m6.598s user 0m2.247s sys 0m1.367s > > $ ./repl grep -ci "123[abc]456" csrc > Cmd: "grep" "-ci" "123[abc]456" "csrc" > 0 > real 0m7.307s user 0m3.163s sys 0m3.743s > real 0m7.269s user 0m3.373s sys 0m3.533s > real 0m7.266s user 0m3.273s sys 0m3.633s > real 0m7.266s user 0m3.357s sys 0m3.553s > > $ ./repl ./h -ci "123[abc]456" csrc > Cmd: "./h" "-ci" "123[abc]456" "csrc" > 0 > real 0m6.582s user 0m2.250s sys 0m1.313s > real 0m6.589s user 0m2.287s sys 0m1.293s > real 0m6.629s user 0m2.197s sys 0m1.370s > real 0m6.637s user 0m2.217s sys 0m1.357s > > $ ./repl grep -c "123456[abc]" csrc > Cmd: "grep" "-c" "123456[abc]" "csrc" > 0 > real 0m6.652s user 0m2.170s sys 0m1.407s > real 0m6.659s user 0m2.257s sys 0m1.330s > real 0m6.662s user 0m2.130s sys 0m1.450s > real 0m6.658s user 0m2.157s sys 0m1.430s > > $ ./repl ./h -c "123456[abc]" csrc > Cmd: "./h" "-c" "123456[abc]" "csrc" > 0 > real 0m6.584s user 0m2.227s sys 0m1.337s > real 0m6.546s user 0m2.210s sys 0m1.360s > real 0m6.546s user 0m2.180s sys 0m1.393s > real 0m6.544s user 0m2.210s sys 0m1.363s > > $ ./repl grep -ci "123456[abc]" csrc > Cmd: "grep" "-ci" "123456[abc]" "csrc" > 0 > real 0m6.676s user 0m2.177s sys 0m1.407s > real 0m6.681s user 0m2.060s sys 0m1.537s > real 0m6.671s user 0m2.250s sys 0m1.337s > real 0m6.683s user 0m2.090s sys 0m1.497s > > $ ./repl ./h -ci "123456[abc]" csrc > Cmd: "./h" "-ci" "123456[abc]" "csrc" > 0 > real 0m6.644s user 0m2.333s sys 0m1.233s > real 0m6.689s user 0m2.113s sys 0m1.457s > real 0m6.602s user 0m2.187s sys 0m1.383s > real 0m6.604s user 0m2.220s sys 0m1.353s > > $ ./repl grep -c "123456a" csrc > Cmd: "grep" "-c" "123456a" "csrc" > 0 > real 0m12.035s user 0m7.743s sys 0m2.490s > real 0m12.024s user 0m7.763s sys 0m2.473s > real 0m12.019s user 0m7.820s sys 0m2.413s > real 0m12.017s user 0m7.717s sys 0m2.517s > > $ ./repl ./h -c "123456a" csrc > Cmd: "./h" "-c" "123456a" "csrc" > 0 > real 0m6.266s user 0m1.923s sys 0m1.640s > real 0m6.273s user 0m1.980s sys 0m1.590s > real 0m6.252s user 0m2.217s sys 0m1.357s > real 0m6.254s user 0m2.033s sys 0m1.540s > > $ ./repl grep -ci "123456a" csrc > Cmd: "grep" "-ci" "123456a" "csrc" > 0 > real 0m15.220s user 0m11.060s sys 0m2.507s > real 0m15.327s user 0m10.980s sys 0m2.583s > real 0m15.215s user 0m11.090s sys 0m2.473s > real 0m15.218s user 0m10.900s sys 0m2.663s > > $ ./repl ./h -ci "123456a" csrc > Cmd: "./h" "-ci" "123456a" "csrc" > 0 > real 0m6.651s user 0m2.307s sys 0m1.260s > real 0m6.665s user 0m2.253s sys 0m1.327s > real 0m6.602s user 0m2.320s sys 0m1.243s > real 0m6.611s user 0m2.243s sys 0m1.327s > > > > $ wc candide-8859-1.txt > 4352 35591 221126 candide-8859-1.txt > > > $ LC_ALL=fr_FR.iso88591 ./repl grep -c "tr[[=e=]]s" candide-8859-1.txt > Cmd: "grep" "-c" "tr[[=e=]]s" "candide-8859-1.txt" > 154 > real 0m4.330s user 0m3.240s sys 0m0.290s > real 0m4.340s user 0m3.273s sys 0m0.263s > real 0m4.345s user 0m3.213s sys 0m0.323s > real 0m4.346s user 0m3.213s sys 0m0.327s > > $ LC_ALL=fr_FR.iso88591 ./repl ./h -c "tr[[=e=]]s" candide-8859-1.txt > Cmd: "./h" "-c" "tr[[=e=]]s" "candide-8859-1.txt" > 154 > real 0m1.376s user 0m0.043s sys 0m0.133s > real 0m1.382s user 0m0.073s sys 0m0.103s > real 0m1.383s user 0m0.043s sys 0m0.133s > real 0m1.386s user 0m0.050s sys 0m0.127s > > $ LC_ALL=fr_FR.iso88591 ./repl grep -ci "tr[[=e=]]s" candide-8859-1.txt > Cmd: "grep" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" > 155 > real 0m5.704s user 0m3.357s sys 0m0.177s > real 0m5.683s user 0m3.383s sys 0m0.150s > real 0m5.686s user 0m3.343s sys 0m0.190s > real 0m5.690s user 0m3.363s sys 0m0.173s > > $ LC_ALL=fr_FR.iso88591 ./repl ./h -ci "tr[[=e=]]s" candide-8859-1.txt > Cmd: "./h" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" > 155 > real 0m1.598s user 0m0.043s sys 0m0.137s > real 0m1.600s user 0m0.060s sys 0m0.117s > real 0m1.608s user 0m0.040s sys 0m0.140s > real 0m1.609s user 0m0.070s sys 0m0.110s > > > -------------------- > > > While many, many features of GNU Grep are not supported, two switches, > -J and -K=SKIP_LOOP_EFFICIENCY_THRESHOLD, are included. Here's an > example of the debug output created when -J is specified: > > > > > $ ./h -J -c "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_DIRECT('t') -- 0x74 > OP_CHAR_DIRECT('r') -- 0x72 > OP_CHAR_DIRECT('e') -- 0x65 > OP_CHAR_DIRECT('s') -- 0x73 > > * debug: hstbm context at 0x20ee000: > - fold_table (not used): > ................................ ................................ > ................................ ................................ > ................................ ................................ > ................................ ................................ > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > - Pattern: t r e s > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 0 0 0 0 > - first_alias: . . . . > - memchr_offset_list [negated]: 3 2 1 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 93 > $ ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_CLASS(ref=1) -- members:2 > -- 54 74 Tt > OP_CHAR_CLASS(ref=2) -- members:2 > -- 52 72 Rr > OP_CHAR_CLASS(ref=3) -- members:2 > -- 45 65 Ee > OP_CHAR_CLASS(ref=4) -- members:2 > -- 53 73 Ss > > * debug: hstbm context at 0x1a4e000: > - fold_table: > ................................ !"#$%&'()*+,-./0123456789:;<=>? > @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. > ................................ ................................ > ................................ ................................ > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444144444444444420344444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > - Pattern: T R E S > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 1 1 1 1 > - first_alias: t r e s > - memchr_offset_list [negated]: 3 2 1 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 93 > $ LC_ALL=fr_FR.iso88591 ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_CLASS(ref=1) -- members:2 > -- 54 74 Tt > OP_CHAR_CLASS(ref=2) -- members:2 > -- 52 72 Rr > OP_CHAR_CLASS(ref=3) -- members:10 > -- 45 65 c8 c9 ca cb e8 e9 ea eb Ee èé êë > OP_CHAR_CLASS(ref=4) -- members:2 > -- 53 73 Ss > > * debug: hstbm context at 0x1ae7000: > - fold_table: > ................................ !"#$%&'()*+,-./0123456789:;<=>? > @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. > ................................ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ > Á EEEE Í ÏÐ''"" Ý àáâãäåæçEEEEìíîïðñòóôõö÷øùúûüýþÿ > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444144444444444420344444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444111144444444444444444444 44444444111144444444444444444444 > - Pattern: T R E S > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 1 1 9 1 > - first_alias: t r e s > - memchr_offset_list [negated]: 3 2 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 155 > $ > > > > -------------------- > > > > WARNING: Because hstbm supports far fewer features than GNU Grep, > it takes less time to initialise. This will give hstbm an advantage > when the total input is small (?? perhaps less than 4k bytes?), and > should be factored out when comparing performance. (Note that both > csrc and candide-8859-1.txt above are greater than 100k bytes, so > this effect should not be a large factor in the above results.) > > > $ ./repl grep -c "123[abc]456" /dev/null > Cmd: "grep" "-c" "123[abc]456" "/dev/null" > 0 > real 0m0.990s user 0m0.073s sys 0m0.097s > real 0m1.001s user 0m0.050s sys 0m0.123s > real 0m0.993s user 0m0.020s sys 0m0.147s > real 0m0.989s user 0m0.040s sys 0m0.127s > > $ ./repl ./h -c "123[abc]456" /dev/null > Cmd: "./h" "-c" "123[abc]456" "/dev/null" > 0 > real 0m0.748s user 0m0.047s sys 0m0.103s > real 0m0.758s user 0m0.057s sys 0m0.093s > real 0m0.762s user 0m0.037s sys 0m0.113s > real 0m0.754s user 0m0.050s sys 0m0.100s > > $ ./repl grep -ci "123[abc]456" /dev/null > Cmd: "grep" "-ci" "123[abc]456" "/dev/null" > 0 > real 0m0.993s user 0m0.043s sys 0m0.127s > real 0m0.998s user 0m0.057s sys 0m0.110s > real 0m0.995s user 0m0.037s sys 0m0.130s > real 0m0.993s user 0m0.053s sys 0m0.110s > > $ ./repl ./h -ci "123[abc]456" /dev/null > Cmd: "./h" "-ci" "123[abc]456" "/dev/null" > 0 > real 0m0.768s user 0m0.037s sys 0m0.113s > real 0m0.787s user 0m0.033s sys 0m0.123s > real 0m0.770s user 0m0.037s sys 0m0.110s > real 0m0.779s user 0m0.027s sys 0m0.127s > > > > -------------------- > > > > Have fun hacking, and reading the code... all feedback, comments, > patches, etc. welcome. > > cheers, > > sur-behoffski (Brenton Hoff) > Programmer, Grouse Software From debbugs-submit-bounces@debbugs.gnu.org Sat May 30 16:04:40 2015 Received: (at control) by debbugs.gnu.org; 30 May 2015 20:04:41 +0000 Received: from localhost ([127.0.0.1]:33783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yymzo-0007X7-Fr for submit@debbugs.gnu.org; Sat, 30 May 2015 16:04:40 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:53645) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yymzm-0007Wi-MZ for control@debbugs.gnu.org; Sat, 30 May 2015 16:04:39 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3C93E39E801B for ; Sat, 30 May 2015 13:04:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9SwSbBss7Jia for ; Sat, 30 May 2015 13:04:32 -0700 (PDT) Received: from [192.168.1.9] (pool-100-32-155-148.lsanca.fios.verizon.net [100.32.155.148]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 4CC9C39E8016 for ; Sat, 30 May 2015 13:04:32 -0700 (PDT) Message-ID: <556A17D0.4000303@cs.ucla.edu> Date: Sat, 30 May 2015 13:04:32 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: grep bug maintainance Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) tag 20605 notabug close 20605 severity 20657 wishlist tag 20638 notabug close 20638 merge 20526 19985 19230 tag 19837 notabug close 19837 merge 16444 19777 close 19563 close 19486 tag 19330 notabug close 19330 tag 19193 notabug close 19193 tag 19071 notabug close 19071 tag 19005 notabug close 19005 close 19000 tag 18888 notabug close 18888 From unknown Thu Jun 19 14:13:37 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 28 Jun 2015 11:24:06 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator