GNU bug report logs - #26193
[0-9] versus [[:digit:]]

Package: grep

Reported by: "John P. Linderman" <jpl.jpl <at> gmail.com>

Date: Mon, 20 Mar 2017 17:00:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: "John P. Linderman" <jpl.jpl <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 26193 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: bug#26193: [0-9] versus [[:digit:]]
Date: Wed, 22 Mar 2017 08:44:23 -0400
Thanks, all. That puts the runtimes on equal footing:

+ wc conjectures
 125441818  125441818 6249180939 conjectures
+ rusage /home/jpl/src/grep-3.0/src/grep P[[:digit:]] conjectures
A[21]=11{11}:22<LP3
5.85 real 5.14 user 0.70 sys 0 pf 118 pr 0 sw 0 rb 0 wb 1 vcx 11 icx 2420 mx 0 ix 0 id 0 is
+ rusage /home/jpl/src/grep-3.0/src/grep P[[:digit:]] conjectures
A[21]=11{11}:22<LP3
5.77 real 5.10 user 0.67 sys 0 pf 121 pr 0 sw 0 rb 0 wb 1 vcx 7 icx 2492 mx 0 ix 0 id 0 is
+ rusage /home/jpl/src/grep-3.0/src/grep P[0-9] conjectures
A[21]=11{11}:22<LP3
5.80 real 5.15 user 0.62 sys 0 pf 119 pr 0 sw 0 rb 0 wb 1 vcx 1001 icx 2424 mx 0 ix 0 id 0 is


On Wed, Mar 22, 2017 at 12:28 AM, Jim Meyering <jim <at> meyering.net> wrote:

> On Tue, Mar 21, 2017 at 7:09 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> > John P. Linderman wrote:
> >>
> >> Using what is to me the more obvious [0-9] pattern takes almost 50 times
> >> as long as using the [[:digit:]] pattern. Seems very strange.
> >
> > Thanks for reporting that. In general, patterns like [a-z] can be much
> > slower than [[:lower:]] due to poorly-thought-out POSIX interfaces.
> > However, [0-9] is a special case: we can optimize such patterns safely
> > if both ends are ASCII digits. I installed the attached patch to Gnulib
> > to do that; it fixes the performance glitch you noticed, at least for me.
>
> Thank you, Paul. I confirmed that that solves it for me, too, with a
> multibyte locale. I didn't reproduce it initially because I was using
> LC_ALL=C.
>
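
The special case Paul Eggert describes above (a bracket range whose two ends are both ASCII digits) can be illustrated with a small sketch. This is not the Gnulib patch, which is not shown in this thread. The function names is_ascii_digit_range and expand_range are hypothetical, and returning None to mean "fall back to the slower locale-aware path" is an assumption made purely for illustration.

```python
import string


def is_ascii_digit_range(lo: str, hi: str) -> bool:
    """Return True when the bracket range lo-hi has two ASCII digits as ends.

    Hypothetical helper: it only mirrors the condition stated in the thread
    ("both ends are ASCII digits"), not any real Gnulib function.
    """
    return (
        len(lo) == 1
        and len(hi) == 1
        and lo in string.digits   # string.digits is "0123456789"
        and hi in string.digits
        and lo <= hi              # reject reversed ranges such as 9-0
    )


def expand_range(lo: str, hi: str):
    """Expand an ASCII-digit range into its members without locale lookups.

    Returns None for any other range, standing in for the slower,
    locale-dependent handling (an assumption of this sketch).
    """
    if not is_ascii_digit_range(lo, hi):
        return None
    return [chr(c) for c in range(ord(lo), ord(hi) + 1)]


if __name__ == "__main__":
    print(expand_range("0", "9"))
    print(expand_range("a", "z"))
```

A range such as a-z deliberately takes the fallback branch here, matching the remark in the thread that patterns like [a-z] cannot be shortcut the same way.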