GNU bug report logs - #10953
Potential logical bug in readtokens.c


Package: coreutils;

Reported by: Xu Zhongxing <xu_zhong_xing <at> 163.com>

Date: Tue, 6 Mar 2012 08:18:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #14 received at 10953 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>
Cc: 10953 <at> debbugs.gnu.org, Xu Zhongxing <xu_zhong_xing <at> 163.com>,
	bug-gnulib <bug-gnulib <at> gnu.org>
Subject: Re: bug#10953: Potential logical bug in readtokens.c
Date: Tue, 06 Mar 2012 21:33:02 -0800

On 03/06/2012 03:32 PM, Eric Blake wrote:
> Why not just strchr instead of building up an isdelim bitmap?

strchr would not be right, since '\0' is valid in data and
as a delimiter.
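
A minimal sketch of the difference, assuming two hypothetical helpers
(is_delim_strchr and is_delim_memchr are illustrative names, not part of
gnulib).  strchr treats the delimiter set as a NUL-terminated string, so
it always "finds" '\0' and never looks past an embedded '\0'; memchr
takes an explicit length and has neither problem:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: test C against a NUL-terminated delimiter string.
   strchr considers the terminating '\0' part of the string, so this
   reports '\0' as a delimiter whether or not the caller wanted that,
   and it stops scanning at the first '\0' byte in DELIM.  */
static int
is_delim_strchr (const char *delim, int c)
{
  return strchr (delim, c) != NULL;
}

/* Hypothetical helper: test C against exactly N_DELIM bytes of DELIM.
   '\0' is treated like any other byte.  */
static int
is_delim_memchr (const char *delim, size_t n_delim, int c)
{
  return memchr (delim, c, n_delim) != NULL;
}
```

With the delimiter set " \t\n" of length 3, is_delim_strchr wrongly
accepts '\0' while is_delim_memchr rejects it.  With the two-byte set
{ '\0', ':' }, is_delim_strchr misses ':' because the scan ends at the
leading '\0'.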

No doubt you meant 'memchr'; but using 'memchr' would slow
down readtoken by about a factor of two.  I got this result by
timing the following benchmark on gcc-4.6.1.tar (uncompressed)
on Fedora 15 x86-64 with GCC 4.6.2:

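#include <stdio.h>
#include <readtokens.h>

/* File-scope, hence zero-initialized, token buffer.  */
struct tokenbuffer t;

int main (void)
{
  /* Read whitespace-delimited tokens from stdin until end of input.  */
  for (;;)
    {
      size_t s = readtoken (stdin, " \t\n", 3, &t);
      if (s == (size_t) -1)
        return 0;
    }
}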

On this benchmark, the relative speeds (user+sys CPU time ratios,
bigger numbers are better) are:

 0.54  readtoken with memchr
 1.00  current readtoken (with non-thread-safe byte array)
 1.13  proposed readtoken (with thread-safe bitset)

So the proposed patch is a performance win even in non-thread-safe use.
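
The patch itself is not reproduced in this message.  As a rough
illustration of the idea only, a per-call delimiter bitset might look
like the sketch below.  The names word, delim_set, delim_set_init and
delim_set_contains are hypothetical and are not taken from
readtokens.c.  Because the set lives in storage owned by the caller
(for example on the stack), no state is shared between threads:

```c
#include <limits.h>
#include <stddef.h>

typedef size_t word;
enum { BITS_PER_WORD = sizeof (word) * CHAR_BIT };
/* Enough words to hold one bit per possible unsigned char value.  */
enum { N_WORDS = (UCHAR_MAX + BITS_PER_WORD) / BITS_PER_WORD };

struct delim_set
{
  word w[N_WORDS];
};

/* Build the set from the N_DELIM bytes of DELIM; '\0' is allowed.  */
static void
delim_set_init (struct delim_set *s, const char *delim, size_t n_delim)
{
  size_t i;
  for (i = 0; i < N_WORDS; i++)
    s->w[i] = 0;
  for (i = 0; i < n_delim; i++)
    {
      unsigned char ch = delim[i];
      s->w[ch / BITS_PER_WORD] |= (word) 1 << (ch % BITS_PER_WORD);
    }
}

/* Return 1 if byte C is in the set, 0 otherwise.  */
static int
delim_set_contains (const struct delim_set *s, int c)
{
  unsigned char ch = c;
  return (s->w[ch / BITS_PER_WORD] >> (ch % BITS_PER_WORD)) & 1;
}
```

A caller would initialize the set once per call and then test each
byte read from the stream with delim_set_contains.  On a typical
64-bit host the whole set is four words, so clearing and filling it is
cheap compared with clearing a 256-byte array.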

> And why
> are we calling getc() one character at a time, instead of using tricks
> like freadahead() to operate on a larger buffer?
> 
> Also, is readtoken() intended to be a more powerful interface than
> strtok, in which case we _do_ want to be non-threadsafe, and to have a
> readtoken_r interface that is the underlying threadsafe variant that can
> benefit from caching?

I haven't thought about these issues, but surely they are
independent of the proposed patch.

This bug report was last modified 13 years and 157 days ago.
