GNU bug report logs -
#26193
[0-9] versus [[:digit:]]
Message #8 received at 26193 <at> debbugs.gnu.org (full text, mbox):
tags 26193 moreinfo
done
On Mon, Mar 20, 2017 at 8:34 AM, John P. Linderman <jpl.jpl <at> gmail.com> wrote:
> In what follows, file "conjectures" is a 6 billion bytes file in which each
> line contains at most one letter P, and few (see output) have a digit
> following a P. "rusage" is just a home-brew resource usage summary command.
>
> rusage egrep 'P[0-9]' conjectures > xxx
> 695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx
> 2488 mx 0 ix 0 id 0 is
>
> cat xxx
> A[21]=11{11}:22<LP3
>
> rusage egrep 'P[[:digit:]]' conjectures > xxx
> 14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx
> 2500 mx 0 ix 0 id 0 is
>
> cat xxx
> A[21]=11{11}:22<LP3
>
> Using what is to me the more obvious [0-9] pattern takes almost 50 times as
> long as using the [[:digit:]] pattern. Seems very strange.
...
Thank you for the report. However, there have been numerous
improvements since grep-2.25, which was released nearly a year ago.
The latest is grep-3.0, and using it, I am unable to reproduce the
problem on an input of 333M lines, each 19 bytes long (counting the
newline) and ending in "P":
$ yes 'A[21]=11{11}:22<LP'| head -333000000 > /dev/shm/k
$ env time grep 'P[0-9]' /dev/shm/k
7.84user 2.06system 0:09.90elapsed 100%CPU (0avgtext+0avgdata 2008maxresident)k
0inputs+0outputs (0major+97minor)pagefaults 0swaps
[Exit 1]
$ env time grep 'P[[:digit:]]' /dev/shm/k
7.86user 1.96system 0:09.83elapsed 99%CPU (0avgtext+0avgdata 2004maxresident)k
0inputs+0outputs (0major+97minor)pagefaults 0swaps
[Exit 1]
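For anyone who wants to repeat the comparison with their own grep, a scaled-down variant of the test above might look like the following. It is only a sketch: the LINES_N variable, the temporary file, the use of awk in place of yes and head to generate the input, and the use of the shell's `times` builtin in place of `env time` are all choices made here for illustration, not part of the original session.

```shell
#!/bin/sh
# Scaled-down reproduction: 1,000,000 lines rather than 333,000,000.
# LINES_N and the temporary file are illustrative, not from the report.
LINES_N=${LINES_N:-1000000}
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT

# Every line ends in "P", so neither pattern can match.
awk -v n="$LINES_N" 'BEGIN { for (i = 0; i < n; i++) print "A[21]=11{11}:22<LP" }' > "$f"

# Record which grep is being measured.
grep --version | head -n 1

for pat in 'P[0-9]' 'P[[:digit:]]'; do
  echo "pattern: $pat"
  # Run grep in a subshell so that `times` covers only this run; the
  # second line it prints is the user and system time of child processes.
  # grep exits with status 1 when nothing matches, hence the `|| true`.
  ( grep -c "$pat" "$f" || true; times )
done

# Both counts should be 0 for this input.
n_range=$(grep -c 'P[0-9]' "$f" || true)
n_class=$(grep -c 'P[[:digit:]]' "$f" || true)
echo "matches: [0-9]=$n_range [[:digit:]]=$n_class"
```

If the two child-time lines differ by a large factor, as in the original report, including the first line of `grep --version` output alongside them would make the result easier to interpret.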
This bug report was last modified 8 years and 66 days ago.