Package: grep;
Reported by: sur-behoffski <sur_behoffski <at> grouse.com.au>
Date: Wed, 11 Feb 2015 16:51:03 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19837 in the body.
You can then email your comments to 19837 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
help-debbugs <at> gnu.org
:bug#19837
; Package debbugs.gnu.org
.
(Wed, 11 Feb 2015 16:51:03 GMT) Full text and rfc822 format available.sur-behoffski <sur_behoffski <at> grouse.com.au>
:help-debbugs <at> gnu.org
.
(Wed, 11 Feb 2015 16:51:04 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: sur-behoffski <sur_behoffski <at> grouse.com.au> To: undisclosed-recipients:; Subject: Announcing initial release of hstbm, Version 0.13. Date: Thu, 12 Feb 2015 00:20:41 +1030
G'day, I've been working on an "exploratory fork" of GNU Grep, in order to provide a framework where I can set up a pared-down program that provides support for some simple search cases, but does not have the daunting complexity and (to some extent) extensive legacy support baggage associated with GNU Grep. This has resulted in a new project, "hstbm", which is an acronym for Hybrid Self-Tuning Boyer-Moore. The Savannah folks have kindly agreed to host hstbm as a non-GNU project at: https://savannah.nongnu.org/projects/hstbm The release can be downloaded as a separate tarball from: http://git.savannah.gnu.org/cgit/hstbm.git/snapshot/hstbm-0.13.tar.gz Git users probably want to clone the repository instead; the latest sources (which may include changes after the 0.13 release) can be retrieved via "git clone": git clone git://git.savannah.nongnu.org/hstbm.git ------------- hstbm is a partner to the "untangle" script that I've been posting to the GNU Grep over time; whereas "untangle" tries to lay the foundations for a "bottom-up" refactoring of GNU Grep code, hstbm is a "top-down" approach, reducing the layers of code between the caller, pattern interpretation (lexing and/or parsing), and the underlying search algorithm -- which is also called hstbm. Only a small fraction of regular expression syntax -- classes -- is recognised, and the class support is competent but not bullet-proof. It's best to think of the current level of pattern support as "fgrep (or grep -F) plus some class support". This level of support is enough to fulfil the needs of the string search algorithm included in this release; more support, such as anchors, optional elements, repetition, backreferences etc. are not included for now (although there's a nod in one or two places that these could included in the future). ------------- The hstbm algorithm itself is competitive with the equivalent algorithm in GNU Grep (culminating in bmexec () in src/kwset.c). In my limited testing, choosing data and patterns that are somewhat biased towards showing hstbm's strong points, it can outperform GNU Grep. I'm hopeful that these improvements can hold up over a very wide range of data and patterns, with very few or no performance regressions, such that it might be worthwhile incorporating hstbm into GNU Grep: ["repl" is a Lua script to wrap (roughly) the following shell script, run it multiple times (throwing the first iteration away), and log the results, as well as outputting them: Cmd=$* time (let j=1 \ while [ $((j++)) -lt 1000 ] ; do \ $Cmd >/dev/null \ done \ ) 2>&1 ] -------------- $ grep --version | head 1 grep (GNU grep) 2.21 $ ./h --version | head -1 hstbm (Hybrid Self-Tuning Boyer-Moore) 0.13 $ cat /usr/src/linux-3.14.32-gentoo/*/*.[ch] >csrc $ wc csrc 420778 1348027 11521841 csrc $ ./repl grep -c "123[abc]456" csrc Cmd: "grep" "-c" "123[abc]456" "csrc" 0 real 0m7.268s user 0m3.140s sys 0m3.767s real 0m7.299s user 0m3.317s sys 0m3.590s real 0m7.298s user 0m3.253s sys 0m3.657s real 0m7.245s user 0m3.233s sys 0m3.673s $ ./repl ./h -c "123[abc]456" csrc Cmd: "./h" "-c" "123[abc]456" "csrc" 0 real 0m6.570s user 0m2.237s sys 0m1.327s real 0m6.586s user 0m2.200s sys 0m1.383s real 0m6.647s user 0m2.317s sys 0m1.367s real 0m6.598s user 0m2.247s sys 0m1.367s $ ./repl grep -ci "123[abc]456" csrc Cmd: "grep" "-ci" "123[abc]456" "csrc" 0 real 0m7.307s user 0m3.163s sys 0m3.743s real 0m7.269s user 0m3.373s sys 0m3.533s real 0m7.266s user 0m3.273s sys 0m3.633s real 0m7.266s user 0m3.357s sys 0m3.553s $ ./repl ./h -ci "123[abc]456" csrc Cmd: "./h" "-ci" "123[abc]456" "csrc" 0 real 0m6.582s user 0m2.250s sys 0m1.313s real 0m6.589s user 0m2.287s sys 0m1.293s real 0m6.629s user 0m2.197s sys 0m1.370s real 0m6.637s user 0m2.217s sys 0m1.357s $ ./repl grep -c "123456[abc]" csrc Cmd: "grep" "-c" "123456[abc]" "csrc" 0 real 0m6.652s user 0m2.170s sys 0m1.407s real 0m6.659s user 0m2.257s sys 0m1.330s real 0m6.662s user 0m2.130s sys 0m1.450s real 0m6.658s user 0m2.157s sys 0m1.430s $ ./repl ./h -c "123456[abc]" csrc Cmd: "./h" "-c" "123456[abc]" "csrc" 0 real 0m6.584s user 0m2.227s sys 0m1.337s real 0m6.546s user 0m2.210s sys 0m1.360s real 0m6.546s user 0m2.180s sys 0m1.393s real 0m6.544s user 0m2.210s sys 0m1.363s $ ./repl grep -ci "123456[abc]" csrc Cmd: "grep" "-ci" "123456[abc]" "csrc" 0 real 0m6.676s user 0m2.177s sys 0m1.407s real 0m6.681s user 0m2.060s sys 0m1.537s real 0m6.671s user 0m2.250s sys 0m1.337s real 0m6.683s user 0m2.090s sys 0m1.497s $ ./repl ./h -ci "123456[abc]" csrc Cmd: "./h" "-ci" "123456[abc]" "csrc" 0 real 0m6.644s user 0m2.333s sys 0m1.233s real 0m6.689s user 0m2.113s sys 0m1.457s real 0m6.602s user 0m2.187s sys 0m1.383s real 0m6.604s user 0m2.220s sys 0m1.353s $ ./repl grep -c "123456a" csrc Cmd: "grep" "-c" "123456a" "csrc" 0 real 0m12.035s user 0m7.743s sys 0m2.490s real 0m12.024s user 0m7.763s sys 0m2.473s real 0m12.019s user 0m7.820s sys 0m2.413s real 0m12.017s user 0m7.717s sys 0m2.517s $ ./repl ./h -c "123456a" csrc Cmd: "./h" "-c" "123456a" "csrc" 0 real 0m6.266s user 0m1.923s sys 0m1.640s real 0m6.273s user 0m1.980s sys 0m1.590s real 0m6.252s user 0m2.217s sys 0m1.357s real 0m6.254s user 0m2.033s sys 0m1.540s $ ./repl grep -ci "123456a" csrc Cmd: "grep" "-ci" "123456a" "csrc" 0 real 0m15.220s user 0m11.060s sys 0m2.507s real 0m15.327s user 0m10.980s sys 0m2.583s real 0m15.215s user 0m11.090s sys 0m2.473s real 0m15.218s user 0m10.900s sys 0m2.663s $ ./repl ./h -ci "123456a" csrc Cmd: "./h" "-ci" "123456a" "csrc" 0 real 0m6.651s user 0m2.307s sys 0m1.260s real 0m6.665s user 0m2.253s sys 0m1.327s real 0m6.602s user 0m2.320s sys 0m1.243s real 0m6.611s user 0m2.243s sys 0m1.327s $ wc candide-8859-1.txt 4352 35591 221126 candide-8859-1.txt $ LC_ALL=fr_FR.iso88591 ./repl grep -c "tr[[=e=]]s" candide-8859-1.txt Cmd: "grep" "-c" "tr[[=e=]]s" "candide-8859-1.txt" 154 real 0m4.330s user 0m3.240s sys 0m0.290s real 0m4.340s user 0m3.273s sys 0m0.263s real 0m4.345s user 0m3.213s sys 0m0.323s real 0m4.346s user 0m3.213s sys 0m0.327s $ LC_ALL=fr_FR.iso88591 ./repl ./h -c "tr[[=e=]]s" candide-8859-1.txt Cmd: "./h" "-c" "tr[[=e=]]s" "candide-8859-1.txt" 154 real 0m1.376s user 0m0.043s sys 0m0.133s real 0m1.382s user 0m0.073s sys 0m0.103s real 0m1.383s user 0m0.043s sys 0m0.133s real 0m1.386s user 0m0.050s sys 0m0.127s $ LC_ALL=fr_FR.iso88591 ./repl grep -ci "tr[[=e=]]s" candide-8859-1.txt Cmd: "grep" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" 155 real 0m5.704s user 0m3.357s sys 0m0.177s real 0m5.683s user 0m3.383s sys 0m0.150s real 0m5.686s user 0m3.343s sys 0m0.190s real 0m5.690s user 0m3.363s sys 0m0.173s $ LC_ALL=fr_FR.iso88591 ./repl ./h -ci "tr[[=e=]]s" candide-8859-1.txt Cmd: "./h" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" 155 real 0m1.598s user 0m0.043s sys 0m0.137s real 0m1.600s user 0m0.060s sys 0m0.117s real 0m1.608s user 0m0.040s sys 0m0.140s real 0m1.609s user 0m0.070s sys 0m0.110s -------------------- While many, many features of GNU Grep are not supported, two switches, -J and -K=SKIP_LOOP_EFFICIENCY_THRESHOLD, are included. Here's an example of the debug output created when -J is specified: $ ./h -J -c "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_DIRECT('t') -- 0x74 OP_CHAR_DIRECT('r') -- 0x72 OP_CHAR_DIRECT('e') -- 0x65 OP_CHAR_DIRECT('s') -- 0x73 * debug: hstbm context at 0x20ee000: - fold_table (not used): ................................ ................................ ................................ ................................ ................................ ................................ ................................ ................................ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 - Pattern: t r e s * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 0 0 0 0 - first_alias: . . . . - memchr_offset_list [negated]: 3 2 1 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 93 $ ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_CLASS(ref=1) -- members:2 -- 54 74 Tt OP_CHAR_CLASS(ref=2) -- members:2 -- 52 72 Rr OP_CHAR_CLASS(ref=3) -- members:2 -- 45 65 Ee OP_CHAR_CLASS(ref=4) -- members:2 -- 53 73 Ss * debug: hstbm context at 0x1a4e000: - fold_table: ................................ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. ................................ ................................ ................................ ................................ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444444444444444444444444444 - Pattern: T R E S * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 1 1 1 1 - first_alias: t r e s - memchr_offset_list [negated]: 3 2 1 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 93 $ LC_ALL=fr_FR.iso88591 ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt * show_internals: Compiled pattern: Number of tokens: 4 OP_CHAR_CLASS(ref=1) -- members:2 -- 54 74 Tt OP_CHAR_CLASS(ref=2) -- members:2 -- 52 72 Rr OP_CHAR_CLASS(ref=3) -- members:10 -- 45 65 c8 c9 ca cb e8 e9 ea eb EeÈÉÊËèé êë OP_CHAR_CLASS(ref=4) -- members:2 -- 53 73 Ss * debug: hstbm context at 0x1ae7000: - fold_table: ................................ !"#$%&'()*+,-./0123456789:;<=>? @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. ................................ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ ÀÁÂÃÄÅÆÇEEEEÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞß àáâãäåæçEEEEìíîïðñòóôõö÷øùúûüýþÿ - delta1: 44444444444444444444444444444444 44444444444444444444444444444444 44444144444444444420344444444444 44444144444444444420344444444444 44444444444444444444444444444444 44444444444444444444444444444444 44444444111144444444444444444444 44444444111144444444444444444444 - Pattern: T R E S * Hybrid (memchr/memchr2) efficiency variables: - skip_loop_efficiency_threshold: 80 - nr_aliases: 1 1 9 1 - first_alias: t r e s - memchr_offset_list [negated]: 3 2 0 - current_memchr_offset_index: 0 * Guard (delta2) and match (delta 3/4) skip variables: - delta2 candidate(s): 4 - delta2: 4 - delta3: 4 4 4 - delta4: 4 4 * context check... [OK] 155 $ -------------------- WARNING: Because hstbm supports far fewer features than GNU Grep, it takes less time to initialise. This will give hstbm an advantage when the total input is small (?? perhaps less than 4k bytes?), and should be factored out when comparing performance. (Note that both csrc and candide-8859-1.txt above are greater than 100k bytes, so this effect should not be a large factor in the above results.) $ ./repl grep -c "123[abc]456" /dev/null Cmd: "grep" "-c" "123[abc]456" "/dev/null" 0 real 0m0.990s user 0m0.073s sys 0m0.097s real 0m1.001s user 0m0.050s sys 0m0.123s real 0m0.993s user 0m0.020s sys 0m0.147s real 0m0.989s user 0m0.040s sys 0m0.127s $ ./repl ./h -c "123[abc]456" /dev/null Cmd: "./h" "-c" "123[abc]456" "/dev/null" 0 real 0m0.748s user 0m0.047s sys 0m0.103s real 0m0.758s user 0m0.057s sys 0m0.093s real 0m0.762s user 0m0.037s sys 0m0.113s real 0m0.754s user 0m0.050s sys 0m0.100s $ ./repl grep -ci "123[abc]456" /dev/null Cmd: "grep" "-ci" "123[abc]456" "/dev/null" 0 real 0m0.993s user 0m0.043s sys 0m0.127s real 0m0.998s user 0m0.057s sys 0m0.110s real 0m0.995s user 0m0.037s sys 0m0.130s real 0m0.993s user 0m0.053s sys 0m0.110s $ ./repl ./h -ci "123[abc]456" /dev/null Cmd: "./h" "-ci" "123[abc]456" "/dev/null" 0 real 0m0.768s user 0m0.037s sys 0m0.113s real 0m0.787s user 0m0.033s sys 0m0.123s real 0m0.770s user 0m0.037s sys 0m0.110s real 0m0.779s user 0m0.027s sys 0m0.127s -------------------- Have fun hacking, and reading the code... all feedback, comments, patches, etc. welcome. cheers, sur-behoffski (Brenton Hoff) Programmer, Grouse Software
bug-grep <at> gnu.org
:bug#19837
; Package grep
.
(Wed, 11 Feb 2015 17:00:02 GMT) Full text and rfc822 format available.Message #8 received at 19837 <at> debbugs.gnu.org (full text, mbox):
From: Glenn Morris <rgm <at> gnu.org> To: 19837 <at> debbugs.gnu.org Cc: sur-behoffski <sur_behoffski <at> grouse.com.au> Subject: Re: bug#19837: Announcing initial release of hstbm, Version 0.13. Date: Wed, 11 Feb 2015 11:59:24 -0500
The following mail arrived at debbugs.gnu.org via bcc, so could not be associated with a package. I guess it was meant for grep, so I have reassigned it and am sending this mail which will appear on the grep mailing list. sur-behoffski wrote: > G'day, > > I've been working on an "exploratory fork" of GNU Grep, in order > to provide a framework where I can set up a pared-down program > that provides support for some simple search cases, but does not > have the daunting complexity and (to some extent) extensive > legacy support baggage associated with GNU Grep. > > This has resulted in a new project, "hstbm", which is an acronym > for Hybrid Self-Tuning Boyer-Moore. > > The Savannah folks have kindly agreed to host hstbm as a non-GNU > project at: > > https://savannah.nongnu.org/projects/hstbm > > The release can be downloaded as a separate tarball from: > > http://git.savannah.gnu.org/cgit/hstbm.git/snapshot/hstbm-0.13.tar.gz > > Git users probably want to clone the repository instead; the > latest sources (which may include changes after the 0.13 release) > can be retrieved via "git clone": > > git clone git://git.savannah.nongnu.org/hstbm.git > > ------------- > > hstbm is a partner to the "untangle" script that I've been posting > to the GNU Grep over time; whereas "untangle" tries to lay the > foundations for a "bottom-up" refactoring of GNU Grep code, hstbm > is a "top-down" approach, reducing the layers of code between the > caller, pattern interpretation (lexing and/or parsing), and the > underlying search algorithm -- which is also called hstbm. > > Only a small fraction of regular expression syntax -- classes -- > is recognised, and the class support is competent but not > bullet-proof. It's best to think of the current level of pattern > support as "fgrep (or grep -F) plus some class support". This > level of support is enough to fulfil the needs of the string > search algorithm included in this release; more support, such as > anchors, optional elements, repetition, backreferences etc. are > not included for now (although there's a nod in one or two places > that these could included in the future). > > ------------- > > The hstbm algorithm itself is competitive with the equivalent > algorithm in GNU Grep (culminating in bmexec () in src/kwset.c). > In my limited testing, choosing data and patterns that are > somewhat biased towards showing hstbm's strong points, it can > outperform GNU Grep. I'm hopeful that these improvements can > hold up over a very wide range of data and patterns, with very > few or no performance regressions, such that it might be > worthwhile incorporating hstbm into GNU Grep: > > ["repl" is a Lua script to wrap (roughly) the following shell > script, run it multiple times (throwing the first iteration > away), and log the results, as well as outputting them: > > Cmd=$* > time (let j=1 \ > while [ $((j++)) -lt 1000 ] ; do \ > $Cmd >/dev/null \ > done \ > ) 2>&1 > ] > > -------------- > > $ grep --version | head 1 > grep (GNU grep) 2.21 > $ ./h --version | head -1 > hstbm (Hybrid Self-Tuning Boyer-Moore) 0.13 > > > $ cat /usr/src/linux-3.14.32-gentoo/*/*.[ch] >csrc > $ wc csrc > 420778 1348027 11521841 csrc > > > $ ./repl grep -c "123[abc]456" csrc > Cmd: "grep" "-c" "123[abc]456" "csrc" > 0 > real 0m7.268s user 0m3.140s sys 0m3.767s > real 0m7.299s user 0m3.317s sys 0m3.590s > real 0m7.298s user 0m3.253s sys 0m3.657s > real 0m7.245s user 0m3.233s sys 0m3.673s > > $ ./repl ./h -c "123[abc]456" csrc > Cmd: "./h" "-c" "123[abc]456" "csrc" > 0 > real 0m6.570s user 0m2.237s sys 0m1.327s > real 0m6.586s user 0m2.200s sys 0m1.383s > real 0m6.647s user 0m2.317s sys 0m1.367s > real 0m6.598s user 0m2.247s sys 0m1.367s > > $ ./repl grep -ci "123[abc]456" csrc > Cmd: "grep" "-ci" "123[abc]456" "csrc" > 0 > real 0m7.307s user 0m3.163s sys 0m3.743s > real 0m7.269s user 0m3.373s sys 0m3.533s > real 0m7.266s user 0m3.273s sys 0m3.633s > real 0m7.266s user 0m3.357s sys 0m3.553s > > $ ./repl ./h -ci "123[abc]456" csrc > Cmd: "./h" "-ci" "123[abc]456" "csrc" > 0 > real 0m6.582s user 0m2.250s sys 0m1.313s > real 0m6.589s user 0m2.287s sys 0m1.293s > real 0m6.629s user 0m2.197s sys 0m1.370s > real 0m6.637s user 0m2.217s sys 0m1.357s > > $ ./repl grep -c "123456[abc]" csrc > Cmd: "grep" "-c" "123456[abc]" "csrc" > 0 > real 0m6.652s user 0m2.170s sys 0m1.407s > real 0m6.659s user 0m2.257s sys 0m1.330s > real 0m6.662s user 0m2.130s sys 0m1.450s > real 0m6.658s user 0m2.157s sys 0m1.430s > > $ ./repl ./h -c "123456[abc]" csrc > Cmd: "./h" "-c" "123456[abc]" "csrc" > 0 > real 0m6.584s user 0m2.227s sys 0m1.337s > real 0m6.546s user 0m2.210s sys 0m1.360s > real 0m6.546s user 0m2.180s sys 0m1.393s > real 0m6.544s user 0m2.210s sys 0m1.363s > > $ ./repl grep -ci "123456[abc]" csrc > Cmd: "grep" "-ci" "123456[abc]" "csrc" > 0 > real 0m6.676s user 0m2.177s sys 0m1.407s > real 0m6.681s user 0m2.060s sys 0m1.537s > real 0m6.671s user 0m2.250s sys 0m1.337s > real 0m6.683s user 0m2.090s sys 0m1.497s > > $ ./repl ./h -ci "123456[abc]" csrc > Cmd: "./h" "-ci" "123456[abc]" "csrc" > 0 > real 0m6.644s user 0m2.333s sys 0m1.233s > real 0m6.689s user 0m2.113s sys 0m1.457s > real 0m6.602s user 0m2.187s sys 0m1.383s > real 0m6.604s user 0m2.220s sys 0m1.353s > > $ ./repl grep -c "123456a" csrc > Cmd: "grep" "-c" "123456a" "csrc" > 0 > real 0m12.035s user 0m7.743s sys 0m2.490s > real 0m12.024s user 0m7.763s sys 0m2.473s > real 0m12.019s user 0m7.820s sys 0m2.413s > real 0m12.017s user 0m7.717s sys 0m2.517s > > $ ./repl ./h -c "123456a" csrc > Cmd: "./h" "-c" "123456a" "csrc" > 0 > real 0m6.266s user 0m1.923s sys 0m1.640s > real 0m6.273s user 0m1.980s sys 0m1.590s > real 0m6.252s user 0m2.217s sys 0m1.357s > real 0m6.254s user 0m2.033s sys 0m1.540s > > $ ./repl grep -ci "123456a" csrc > Cmd: "grep" "-ci" "123456a" "csrc" > 0 > real 0m15.220s user 0m11.060s sys 0m2.507s > real 0m15.327s user 0m10.980s sys 0m2.583s > real 0m15.215s user 0m11.090s sys 0m2.473s > real 0m15.218s user 0m10.900s sys 0m2.663s > > $ ./repl ./h -ci "123456a" csrc > Cmd: "./h" "-ci" "123456a" "csrc" > 0 > real 0m6.651s user 0m2.307s sys 0m1.260s > real 0m6.665s user 0m2.253s sys 0m1.327s > real 0m6.602s user 0m2.320s sys 0m1.243s > real 0m6.611s user 0m2.243s sys 0m1.327s > > > > $ wc candide-8859-1.txt > 4352 35591 221126 candide-8859-1.txt > > > $ LC_ALL=fr_FR.iso88591 ./repl grep -c "tr[[=e=]]s" candide-8859-1.txt > Cmd: "grep" "-c" "tr[[=e=]]s" "candide-8859-1.txt" > 154 > real 0m4.330s user 0m3.240s sys 0m0.290s > real 0m4.340s user 0m3.273s sys 0m0.263s > real 0m4.345s user 0m3.213s sys 0m0.323s > real 0m4.346s user 0m3.213s sys 0m0.327s > > $ LC_ALL=fr_FR.iso88591 ./repl ./h -c "tr[[=e=]]s" candide-8859-1.txt > Cmd: "./h" "-c" "tr[[=e=]]s" "candide-8859-1.txt" > 154 > real 0m1.376s user 0m0.043s sys 0m0.133s > real 0m1.382s user 0m0.073s sys 0m0.103s > real 0m1.383s user 0m0.043s sys 0m0.133s > real 0m1.386s user 0m0.050s sys 0m0.127s > > $ LC_ALL=fr_FR.iso88591 ./repl grep -ci "tr[[=e=]]s" candide-8859-1.txt > Cmd: "grep" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" > 155 > real 0m5.704s user 0m3.357s sys 0m0.177s > real 0m5.683s user 0m3.383s sys 0m0.150s > real 0m5.686s user 0m3.343s sys 0m0.190s > real 0m5.690s user 0m3.363s sys 0m0.173s > > $ LC_ALL=fr_FR.iso88591 ./repl ./h -ci "tr[[=e=]]s" candide-8859-1.txt > Cmd: "./h" "-ci" "tr[[=e=]]s" "candide-8859-1.txt" > 155 > real 0m1.598s user 0m0.043s sys 0m0.137s > real 0m1.600s user 0m0.060s sys 0m0.117s > real 0m1.608s user 0m0.040s sys 0m0.140s > real 0m1.609s user 0m0.070s sys 0m0.110s > > > -------------------- > > > While many, many features of GNU Grep are not supported, two switches, > -J and -K=SKIP_LOOP_EFFICIENCY_THRESHOLD, are included. Here's an > example of the debug output created when -J is specified: > > > > > $ ./h -J -c "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_DIRECT('t') -- 0x74 > OP_CHAR_DIRECT('r') -- 0x72 > OP_CHAR_DIRECT('e') -- 0x65 > OP_CHAR_DIRECT('s') -- 0x73 > > * debug: hstbm context at 0x20ee000: > - fold_table (not used): > ................................ ................................ > ................................ ................................ > ................................ ................................ > ................................ ................................ > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > - Pattern: t r e s > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 0 0 0 0 > - first_alias: . . . . > - memchr_offset_list [negated]: 3 2 1 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 93 > $ ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_CLASS(ref=1) -- members:2 > -- 54 74 Tt > OP_CHAR_CLASS(ref=2) -- members:2 > -- 52 72 Rr > OP_CHAR_CLASS(ref=3) -- members:2 > -- 45 65 Ee > OP_CHAR_CLASS(ref=4) -- members:2 > -- 53 73 Ss > > * debug: hstbm context at 0x1a4e000: > - fold_table: > ................................ !"#$%&'()*+,-./0123456789:;<=>? > @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. > ................................ ................................ > ................................ ................................ > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444144444444444420344444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > - Pattern: T R E S > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 1 1 1 1 > - first_alias: t r e s > - memchr_offset_list [negated]: 3 2 1 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 93 > $ LC_ALL=fr_FR.iso88591 ./h -J -ci "tr[[=e=]]s" candide-8859-1.txt > * show_internals: Compiled pattern: Number of tokens: 4 > OP_CHAR_CLASS(ref=1) -- members:2 > -- 54 74 Tt > OP_CHAR_CLASS(ref=2) -- members:2 > -- 52 72 Rr > OP_CHAR_CLASS(ref=3) -- members:10 > -- 45 65 c8 c9 ca cb e8 e9 ea eb Ee� � � � ���� ���� > OP_CHAR_CLASS(ref=4) -- members:2 > -- 53 73 Ss > > * debug: hstbm context at 0x1ae7000: > - fold_table: > ................................ !"#$%&'()*+,-./0123456789:;<=>? > @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ `abcdEfghijklmnopqRsTuvwxyz{|}~. > ................................ �������������������������������������������������������������� > � ��� � � � � � EEEE� ��� �����'�'�"�"� � � � � � � � ��� � ����������������EEEE���������������������������������������� > - delta1: > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444144444444444420344444444444 44444144444444444420344444444444 > 44444444444444444444444444444444 44444444444444444444444444444444 > 44444444111144444444444444444444 44444444111144444444444444444444 > - Pattern: T R E S > > * Hybrid (memchr/memchr2) efficiency variables: > - skip_loop_efficiency_threshold: 80 > - nr_aliases: 1 1 9 1 > - first_alias: t r e s > - memchr_offset_list [negated]: 3 2 0 > - current_memchr_offset_index: 0 > > * Guard (delta2) and match (delta 3/4) skip variables: > - delta2 candidate(s): 4 > - delta2: 4 > - delta3: 4 4 4 > - delta4: 4 4 > > * context check... [OK] > 155 > $ > > > > -------------------- > > > > WARNING: Because hstbm supports far fewer features than GNU Grep, > it takes less time to initialise. This will give hstbm an advantage > when the total input is small (?? perhaps less than 4k bytes?), and > should be factored out when comparing performance. (Note that both > csrc and candide-8859-1.txt above are greater than 100k bytes, so > this effect should not be a large factor in the above results.) > > > $ ./repl grep -c "123[abc]456" /dev/null > Cmd: "grep" "-c" "123[abc]456" "/dev/null" > 0 > real 0m0.990s user 0m0.073s sys 0m0.097s > real 0m1.001s user 0m0.050s sys 0m0.123s > real 0m0.993s user 0m0.020s sys 0m0.147s > real 0m0.989s user 0m0.040s sys 0m0.127s > > $ ./repl ./h -c "123[abc]456" /dev/null > Cmd: "./h" "-c" "123[abc]456" "/dev/null" > 0 > real 0m0.748s user 0m0.047s sys 0m0.103s > real 0m0.758s user 0m0.057s sys 0m0.093s > real 0m0.762s user 0m0.037s sys 0m0.113s > real 0m0.754s user 0m0.050s sys 0m0.100s > > $ ./repl grep -ci "123[abc]456" /dev/null > Cmd: "grep" "-ci" "123[abc]456" "/dev/null" > 0 > real 0m0.993s user 0m0.043s sys 0m0.127s > real 0m0.998s user 0m0.057s sys 0m0.110s > real 0m0.995s user 0m0.037s sys 0m0.130s > real 0m0.993s user 0m0.053s sys 0m0.110s > > $ ./repl ./h -ci "123[abc]456" /dev/null > Cmd: "./h" "-ci" "123[abc]456" "/dev/null" > 0 > real 0m0.768s user 0m0.037s sys 0m0.113s > real 0m0.787s user 0m0.033s sys 0m0.123s > real 0m0.770s user 0m0.037s sys 0m0.110s > real 0m0.779s user 0m0.027s sys 0m0.127s > > > > -------------------- > > > > Have fun hacking, and reading the code... all feedback, comments, > patches, etc. welcome. > > cheers, > > sur-behoffski (Brenton Hoff) > Programmer, Grouse Software
Paul Eggert <eggert <at> cs.ucla.edu>
to control <at> debbugs.gnu.org
.
(Sat, 30 May 2015 20:05:06 GMT) Full text and rfc822 format available.Paul Eggert <eggert <at> cs.ucla.edu>
to control <at> debbugs.gnu.org
.
(Sat, 30 May 2015 20:05:07 GMT) Full text and rfc822 format available.Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
.
(Sun, 28 Jun 2015 11:24:06 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.