Norihiro Tanaka wrote: > s0: The position set is none. > s1: The position set is 1:a > s2: The position set is 1:a 2:CSET > s3: The position set is 2:CSET 3:b (accepted) > s4: The position set is 2:CSET Sorry, it was wrong yet. It should be as follows. s0: The position set is none. s1: The position set is 1:a 2:CSET 3:b s2: The position set is 1:a 2:CSET 3:b (accepted) First of all, I noticed that the example is wrong, because if fixed strings are included in the pattern, kwset is used for the pattern, and dfahint() is called for each single line. -- By the way, I compared two cases, which are CSET* and CSET{1,mb_cur_max}. I used `\(a\|z\)[x-y]\{10\}\(b\|z\)' as the pattern. `[x-z]' is MBCSET in UTF-8 locale. If replace MBCSET to CSET*, the pattern in the superset is `\(a\|z\)\(CSET*\)\{10\}\(b\|z\)'. If replace MBCSET to CSET{1,mb_cur_max}, the pattern in the superset is `\(a\|z\)\(CSET\{1,6\}\)\{10\}\(b\|z\)'. In order to simplify, use `\(a\|z\)\(.\{1,6\}\)\{10\}\(b\|z\)' in POSIX locale for later. Before testing, apply the patch attached on this mail, Set `DDEBUG' to CPPFLAGS, and re-build. It outputs more debugging information. $ yes `printf '%0100d\n'` | head -320 | \ sed -e '1s/^0/a/; $s/0$/b/' | \ env LANG=en_US.UTF-8 src/grep '\(a\|z\)[x-y]\{10\}\(b\|z\)' 2>debug1.log $ yes `printf '%0100d\n'` | head -320 | \ sed -e '1s/^0/a/; $s/0$/b/' | \ env LANG=C src/grep '\(a\|z\)\(.\{1,6\}\)\{10\}\(b\|z\)' 2>debug2.log (`320' originates in the default size of a buffer.) The formar is only built 3 states. OTOH, the later is built 224 dfastates. i.e. even if matched multi-lines, many states aren't necessarily built. Next, I checked performance for each case in worst text. It's best in 10 times. yes `printf '%0100d\n'` | head -2 | sed -e '1s/^0/a/; $s/0$/b/' >t yes t | head -1000000 | xargs cat >u time -p env LANG=C src/grep '\(a\|z\)\(.\{1,6\}\)\{10\}\(b\|z\)' u real 1.26 user 0.95 sys 0.28 time -p env LANG=en_US.UTF-8 src/grep '\(a\|z\)[x-y]\{10\}\(b\|z\)' u real 0.74 user 0.45 sys 0.27 Though, the later is fast, I like the former, because it's very simple. the later is complicated, also needs to update nleaves and depth. Norihiro