GNU bug report logs - #26193
[0-9] versus [[:digit:]]

Package: grep;

Reported by: "John P. Linderman" <jpl.jpl <at> gmail.com>

Date: Mon, 20 Mar 2017 17:00:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: "John P. Linderman" <jpl.jpl <at> gmail.com>
Cc: 26193 <at> debbugs.gnu.org
Subject: bug#26193: [0-9] versus [[:digit:]]
Date: Mon, 20 Mar 2017 12:49:52 -0700

tags 26193 moreinfo
done

On Mon, Mar 20, 2017 at 8:34 AM, John P. Linderman <jpl.jpl <at> gmail.com> wrote:
> In what follows, file "conjectures" is a 6 billion bytes file in which each
> line contains at most one letter P, and few (see output) have a digit
> following a P. "rusage" is just a home-brew resource usage summary command.
>
>   rusage egrep 'P[0-9]' conjectures > xxx
> 695.55 real 688.33 user 2.40 sys 0 pf 186 pr 0 sw 0 rb 8 wb 1 vcx 19206 icx 2488 mx 0 ix 0 id 0 is
>
>   cat xxx
> A[21]=11{11}:22<LP3
>
>   rusage egrep 'P[[:digit:]]' conjectures > xxx
> 14.88 real 13.36 user 1.43 sys 0 pf 186 pr 0 sw 0 rb 8 wb 0 vcx 516 icx 2500 mx 0 ix 0 id 0 is
>
>   cat xxx
> A[21]=11{11}:22<LP3
>
> Using what is to me the more obvious [0-9] pattern takes almost 50 times as
> long as using the [[:digit:]] pattern. Seems very strange.
...

Thank you for the report. However, there have been numerous
improvements since grep-2.25, which was released nearly a year ago.
The latest release is grep-3.0, and with it I am unable to reproduce
the problem on an input of 333M lines, each 19 bytes long (including
the newline) and ending in "P":

  $ yes 'A[21]=11{11}:22<LP'| head -333000000 > /dev/shm/k
  $ env time grep 'P[0-9]' /dev/shm/k
  7.84user 2.06system 0:09.90elapsed 100%CPU (0avgtext+0avgdata 2008maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
  $ env time grep 'P[[:digit:]]' /dev/shm/k
  7.86user 1.96system 0:09.83elapsed 99%CPU (0avgtext+0avgdata 2004maxresident)k
  0inputs+0outputs (0major+97minor)pagefaults 0swaps
  [Exit 1]
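
For anyone who wants to repeat the comparison on another grep version, here is
a scaled-down sketch of the same test. It assumes a POSIX shell with GNU grep,
yes, head and mktemp available; the variable names and the scratch file are
illustrative only. It also runs the [0-9] search under LC_ALL=C, because POSIX
leaves bracket ranges locale-dependent outside the C locale. Whether the locale
explains the grep-2.25 timings quoted above is an assumption, not something
established in this report.

```shell
# Scratch input: 1000 non-matching lines plus one line with a digit after P.
# (Scale the head count up, and prefix each grep with "time", to measure speed.)
f=$(mktemp)
yes 'A[21]=11{11}:22<LP' | head -n 1000 > "$f"
printf '%s\n' 'A[21]=11{11}:22<LP3' >> "$f"

# Record which grep is under test; the original report was against grep-2.25.
grep --version | head -n 1

# The two spellings should select exactly the same lines.
range_out=$(grep 'P[0-9]' "$f")
class_out=$(grep 'P[[:digit:]]' "$f")

# Same range pattern in the C locale, to rule the locale in or out.
c_out=$(LC_ALL=C grep 'P[0-9]' "$f")

printf '%s\n' "$range_out"
rm -f "$f"
```

If the three results agree but the timings differ widely between the default
locale and LC_ALL=C, the locale is a likely factor on that grep version.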

