From unknown Sat Sep 06 05:55:32 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#10953 <10953@debbugs.gnu.org> To: bug#10953 <10953@debbugs.gnu.org> Subject: Status: Potential logical bug in readtokens.c Reply-To: bug#10953 <10953@debbugs.gnu.org> Date: Sat, 06 Sep 2025 12:55:32 +0000 retitle 10953 Potential logical bug in readtokens.c reassign 10953 coreutils submitter 10953 Xu Zhongxing severity 10953 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 06 03:17:02 2012 Received: (at submit) by debbugs.gnu.org; 6 Mar 2012 08:17:03 +0000 Received: from localhost ([127.0.0.1]:35163 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4pZn-000577-7i for submit@debbugs.gnu.org; Tue, 06 Mar 2012 03:17:02 -0500 Received: from eggs.gnu.org ([208.118.235.92]:60652) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4nRV-0001xE-7U for submit@debbugs.gnu.org; Tue, 06 Mar 2012 01:00:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQe-00047V-6p for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:21 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:56785) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQe-00047Q-1h for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:20 -0500 Received: from eggs.gnu.org ([208.118.235.92]:50723) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQc-0002CP-Cz for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:19 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQZ-00046z-MQ for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:17 -0500 Received: from m50-132.163.com ([123.125.50.132]:44726) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQY-00046M-GB for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=Received:Message-ID:Date:From:User-Agent: MIME-Version:To:Subject:Content-Type:Content-Transfer-Encoding; bh=2JIIW8kf7Gome530rDT17BVTCcXShjf/EfvxjNagkHo=; b=YPELrLsjpkaRF uajxOcBaEX5N9y6h9dOXPrhU+BdY/bO9bg5mt4azmr5Yr0S/aTHuF699LN2YUMls 1znzDqO3WEvoRPnlBbiMcS1CQlJX+AfQl/gupyf+LbnwifLfnvz006FI8VS2hvRp 9ra5XZJvUfST8CVlos1ECnk2Drz0vs= Received: from [192.168.9.60] (unknown [124.16.137.1]) by smtp2 (Coremail) with SMTP id DNGowEAZxHCjp1VPc+_7CA--.2490S2; Tue, 06 Mar 2012 13:59:01 +0800 (CST) Message-ID: <4F55A7A3.40509@163.com> Date: Tue, 06 Mar 2012 13:58:59 +0800 From: Xu Zhongxing User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: bug-coreutils@gnu.org Subject: Potential logical bug in readtokens.c Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-CM-TRANSID: DNGowEAZxHCjp1VPc+_7CA--.2490S2 X-Coremail-Antispam: 1Uf129KBjDUn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73 VFW2AGmfu7bjvjm3AaLaJ3UbIYCTnIWIevJa73UjIFyTuYvjxUc5rcUUUUU X-CM-SenderInfo: h0xb6xprqjs5xlqjqiywtou0bp/1tbiTxRpYEkAZ8hDmAAAsv X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Tue, 06 Mar 2012 03:16:43 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) Hi, I think I found a potential bug. But I am not sure about it. It looks real from the code. For coreutils 8.15, in file readtokens.c, function readtoken() If delim is NULL, and saved_delim is not NULL, then the condition on line 75 is not satisfied, but the condition on line 79 is satisfied. In this case, there could be NULL pointer deference for delim[i] in line 84. - Zhongxing Xu From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 06 18:22:51 2012 Received: (at 10953) by debbugs.gnu.org; 6 Mar 2012 23:22:51 +0000 Received: from localhost ([127.0.0.1]:36673 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53iU-0004Xo-RW for submit@debbugs.gnu.org; Tue, 06 Mar 2012 18:22:51 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36401) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53i9-0004XG-7X for 10953@debbugs.gnu.org; Tue, 06 Mar 2012 18:22:40 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3418739E8008; Tue, 6 Mar 2012 15:21:32 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id GEaV4XV5eK2q; Tue, 6 Mar 2012 15:21:30 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id EBF7339E8007; Tue, 6 Mar 2012 15:21:30 -0800 (PST) Message-ID: <4F569BFA.7090109@cs.ucla.edu> Date: Tue, 06 Mar 2012 15:21:30 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1 MIME-Version: 1.0 To: Xu Zhongxing Subject: Re: bug#10953: Potential logical bug in readtokens.c References: <4F55A7A3.40509@163.com> In-Reply-To: <4F55A7A3.40509@163.com> Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953 Cc: 10953@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Thanks, I agree that code is potentially buggy. I don't see any way to trigger the bug in coreutils, but it's just asking for trouble. Here's a proposed patch. >From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 6 Mar 2012 15:19:24 -0800 Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns Reported by Xu Zhongxing in . * lib/readtokens.c: Include limits.h. (word, bits_per_word, get_nth_bit, set_nth_bit): New. (readtoken): Don't cache the delimiters; the cache code was buggy if !delim && saved_delim, or if the new n_delim differs from the old. Also, it wasn't thread-safe. --- ChangeLog | 10 +++++++++ lib/readtokens.c | 56 +++++++++++++++++++++++------------------------------ 2 files changed, 34 insertions(+), 32 deletions(-) diff --git a/ChangeLog b/ChangeLog index 163e154..eb99d25 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,13 @@ +2012-03-06 Paul Eggert + + readtokens: avoid core dumps with unusual calling patterns + Reported by Xu Zhongxing in . + * lib/readtokens.c: Include limits.h. + (word, bits_per_word, get_nth_bit, set_nth_bit): New. + (readtoken): Don't cache the delimiters; the cache code was buggy + if !delim && saved_delim, or if the new n_delim differs from the old. + Also, it wasn't thread-safe. + 2012-03-06 Bruno Haible math: Ensure declarations of math functions. diff --git a/lib/readtokens.c b/lib/readtokens.c index 705a62b..f11d3f6 100644 --- a/lib/readtokens.c +++ b/lib/readtokens.c @@ -26,6 +26,7 @@ #include "readtokens.h" +#include #include #include #include @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer) tokenbuffer->buffer = NULL; } +typedef size_t word; +enum { bits_per_word = sizeof (word) * CHAR_BIT }; + +static bool +get_nth_bit (size_t n, word const *bitset) +{ + return bitset[n / bits_per_word] >> n % bits_per_word & 1; +} + +static void +set_nth_bit (size_t n, word *bitset) +{ + size_t one = 1; + bitset[n / bits_per_word] |= one << n % bits_per_word; +} + /* Read a token from STREAM into TOKENBUFFER. A token is delimited by any of the N_DELIM bytes in DELIM. Upon return, the token is in tokenbuffer->buffer and @@ -68,42 +85,17 @@ readtoken (FILE *stream, char *p; int c; size_t i, n; - static const char *saved_delim = NULL; - static char isdelim[256]; - bool same_delimiters; - - if (delim == NULL && saved_delim == NULL) - abort (); + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word]; - same_delimiters = false; - if (delim != saved_delim && saved_delim != NULL) + memset (isdelim, 0, sizeof isdelim); + for (i = 0; i < n_delim; i++) { - same_delimiters = true; - for (i = 0; i < n_delim; i++) - { - if (delim[i] != saved_delim[i]) - { - same_delimiters = false; - break; - } - } - } - - if (!same_delimiters) - { - size_t j; - saved_delim = delim; - memset (isdelim, 0, sizeof isdelim); - for (j = 0; j < n_delim; j++) - { - unsigned char ch = delim[j]; - isdelim[ch] = 1; - } + unsigned char ch = delim[i]; + set_nth_bit (ch, isdelim); } - /* FIXME: don't fool with this caching. Use strchr instead. */ /* skip over any leading delimiters */ - for (c = getc (stream); c >= 0 && isdelim[c]; c = getc (stream)) + for (c = getc (stream); c >= 0 && get_nth_bit (c, isdelim); c = getc (stream)) { /* empty */ } @@ -124,7 +116,7 @@ readtoken (FILE *stream, p[i] = 0; break; } - if (isdelim[c]) + if (get_nth_bit (c, isdelim)) { p[i] = 0; break; -- 1.7.6.5 From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 06 18:33:59 2012 Received: (at 10953) by debbugs.gnu.org; 6 Mar 2012 23:33:59 +0000 Received: from localhost ([127.0.0.1]:36685 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53tG-0004oD-9A for submit@debbugs.gnu.org; Tue, 06 Mar 2012 18:33:58 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50251) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53t2-0004nk-0M for 10953@debbugs.gnu.org; Tue, 06 Mar 2012 18:33:46 -0500 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q26NW49r015120 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 6 Mar 2012 18:32:04 -0500 Received: from [10.3.113.91] (ovpn-113-91.phx2.redhat.com [10.3.113.91]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q26NW3pb025248; Tue, 6 Mar 2012 18:32:03 -0500 Message-ID: <4F569E72.1000508@redhat.com> Date: Tue, 06 Mar 2012 16:32:02 -0700 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 MIME-Version: 1.0 To: Paul Eggert Subject: Re: bug#10953: Potential logical bug in readtokens.c References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> In-Reply-To: <4F569BFA.7090109@cs.ucla.edu> X-Enigmail-Version: 1.3.5 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enigF6F6FEB8E7E93E645BE7D1FD" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 10953 Cc: 10953@debbugs.gnu.org, Xu Zhongxing , bug-gnulib X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigF6F6FEB8E7E93E645BE7D1FD Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: quoted-printable [adding gnulib, since the code in question lives there] On 03/06/2012 04:21 PM, Paul Eggert wrote: > Thanks, I agree that code is potentially buggy. I don't see > any way to trigger the bug in coreutils, but it's just asking > for trouble. Here's a proposed patch. >=20 >>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 > From: Paul Eggert > Date: Tue, 6 Mar 2012 15:19:24 -0800 > Subject: [PATCH] readtokens: avoid core dumps with unusual calling patt= erns >=20 > Reported by Xu Zhongxing in . > * lib/readtokens.c: Include limits.h. > (word, bits_per_word, get_nth_bit, set_nth_bit): New. > (readtoken): Don't cache the delimiters; the cache code was buggy > if !delim && saved_delim, or if the new n_delim differs from the old. > Also, it wasn't thread-safe. The fix looks right to me from a correctness perspective, but I'm afraid we're still sub-optimal in performance. > --- > ChangeLog | 10 +++++++++ > lib/readtokens.c | 56 +++++++++++++++++++++++-----------------------= ------- > 2 files changed, 34 insertions(+), 32 deletions(-) >=20 > diff --git a/ChangeLog b/ChangeLog > index 163e154..eb99d25 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,3 +1,13 @@ > +2012-03-06 Paul Eggert > + > + readtokens: avoid core dumps with unusual calling patterns > + Reported by Xu Zhongxing in . > + * lib/readtokens.c: Include limits.h. > + (word, bits_per_word, get_nth_bit, set_nth_bit): New. > + (readtoken): Don't cache the delimiters; the cache code was buggy > + if !delim && saved_delim, or if the new n_delim differs from the old.= > + Also, it wasn't thread-safe. > + > 2012-03-06 Bruno Haible > =20 > math: Ensure declarations of math functions. > diff --git a/lib/readtokens.c b/lib/readtokens.c > index 705a62b..f11d3f6 100644 > --- a/lib/readtokens.c > +++ b/lib/readtokens.c > @@ -26,6 +26,7 @@ > =20 > #include "readtokens.h" > =20 > +#include > #include > #include > #include > @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer) > tokenbuffer->buffer =3D NULL; > } > =20 > +typedef size_t word; > +enum { bits_per_word =3D sizeof (word) * CHAR_BIT }; > + > +static bool > +get_nth_bit (size_t n, word const *bitset) > +{ > + return bitset[n / bits_per_word] >> n % bits_per_word & 1; > +} > + > +static void > +set_nth_bit (size_t n, word *bitset) > +{ > + size_t one =3D 1; > + bitset[n / bits_per_word] |=3D one << n % bits_per_word; > +} > + > /* Read a token from STREAM into TOKENBUFFER. > A token is delimited by any of the N_DELIM bytes in DELIM. > Upon return, the token is in tokenbuffer->buffer and > @@ -68,42 +85,17 @@ readtoken (FILE *stream, > char *p; > int c; > size_t i, n; > - static const char *saved_delim =3D NULL; > - static char isdelim[256]; > - bool same_delimiters; > - > - if (delim =3D=3D NULL && saved_delim =3D=3D NULL) > - abort (); > + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word]; > =20 > - same_delimiters =3D false; > - if (delim !=3D saved_delim && saved_delim !=3D NULL) > + memset (isdelim, 0, sizeof isdelim); > + for (i =3D 0; i < n_delim; i++) > { > - same_delimiters =3D true; > - for (i =3D 0; i < n_delim; i++) > - { > - if (delim[i] !=3D saved_delim[i]) > - { > - same_delimiters =3D false; > - break; > - } > - } > - } > - > - if (!same_delimiters) > - { > - size_t j; > - saved_delim =3D delim; > - memset (isdelim, 0, sizeof isdelim); > - for (j =3D 0; j < n_delim; j++) > - { > - unsigned char ch =3D delim[j]; > - isdelim[ch] =3D 1; > - } > + unsigned char ch =3D delim[i]; > + set_nth_bit (ch, isdelim); > } > =20 > - /* FIXME: don't fool with this caching. Use strchr instead. */ > /* skip over any leading delimiters */ > - for (c =3D getc (stream); c >=3D 0 && isdelim[c]; c =3D getc (stream= )) > + for (c =3D getc (stream); c >=3D 0 && get_nth_bit (c, isdelim); c =3D= getc (stream)) Why not just strchr instead of building up an isdelim bitmap? And why are we calling getc() one character at a time, instead of using tricks like freadahead() to operate on a larger buffer? Also, is readtoken() intended to be a more powerful interface than strtok, in which case we _do_ want to be non-threadsafe, and to have a readtoken_r interface that is the underlying threadsafe variant that can benefit from caching? > { > /* empty */ > } > @@ -124,7 +116,7 @@ readtoken (FILE *stream, > p[i] =3D 0; > break; > } > - if (isdelim[c]) > + if (get_nth_bit (c, isdelim)) > { > p[i] =3D 0; > break; --=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enigF6F6FEB8E7E93E645BE7D1FD Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPVp5yAAoJEKeha0olJ0Nqyd0H/0XL8GvGffdsxqE8uHl8eFcE 0cl1X3fByzv6TcXpahNbmqbxH6E+0E9b478P4RrV4fhbqrXBO45/e4IBiRRtpwu+ lXQPc1cExK+hIkLzx++/86YYgghWgc2iAhMnigLXKo005PjlKp91lOd4z4Tef/IX 5jCaBneTR66+RgHmN6VSJjThUVZRfAm4110G83nxM81xosK1Oz2CmWua9KorpuEL flhH21OwUaLSV8sBsKOb1mnz60gfF6K5mRec9DX3+VIpOh8134I9Vs1VCwaQUlkk CUbGNGkGkFNsvEIDo2WJZ52do5klqDHeBhrOhwwTLxS3Bh5NLCU00PAumoFDeOQ= =2W2s -----END PGP SIGNATURE----- --------------enigF6F6FEB8E7E93E645BE7D1FD-- From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 07 00:34:22 2012 Received: (at 10953) by debbugs.gnu.org; 7 Mar 2012 05:34:22 +0000 Received: from localhost ([127.0.0.1]:36863 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S59W1-0005Lk-4X for submit@debbugs.gnu.org; Wed, 07 Mar 2012 00:34:21 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:44627) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S59Vf-0005Ky-W0 for 10953@debbugs.gnu.org; Wed, 07 Mar 2012 00:34:10 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BB20639E800D; Tue, 6 Mar 2012 21:33:01 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id t-QVxvRRaxIz; Tue, 6 Mar 2012 21:33:01 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 2826939E800C; Tue, 6 Mar 2012 21:33:01 -0800 (PST) Message-ID: <4F56F30E.40205@cs.ucla.edu> Date: Tue, 06 Mar 2012 21:33:02 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Eric Blake Subject: Re: bug#10953: Potential logical bug in readtokens.c References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> <4F569E72.1000508@redhat.com> In-Reply-To: <4F569E72.1000508@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953 Cc: 10953@debbugs.gnu.org, Xu Zhongxing , bug-gnulib X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 03/06/2012 03:32 PM, Eric Blake wrote: > Why not just strchr instead of building up an isdelim bitmap? strchr would not be right, since '\0' is valid in data and as a delimiter. No doubt you meant 'memchr'; but using 'memchr' would slow down readtoken by about a factor of two. I got this result by timing the following benchmark on gcc-4.6.1.tar (uncompressed) on Fedora 15 x86-64 with GCC 4.6.2: #include #include struct tokenbuffer t; int main (void) { for (;;) { size_t s = readtoken (stdin, " \t\n", 3, &t); if (s == (size_t) -1) return 0; } } On this benchmark, the relative speeds (user+sys CPU time ratios, bigger numbers are better) are: 0.54 readtoken with memchr 1.00 current readtoken (with non-thread-safe byte array) 1.13 proposed readtoken (with thread-safe bitset) So the proposed patch is a performance win even in non-thread-safe use. > And why > are we calling getc() one character at a time, instead of using tricks > like freadahead() to operate on a larger buffer? > > Also, is readtoken() intended to be a more powerful interface than > strtok, in which case we _do_ want to be non-threadsafe, and to have a > readtoken_r interface that is the underlying threadsafe variant that can > benefit from caching? I haven't thought about these issues, but surely they are independent of the proposed patch. From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 07 14:53:28 2012 Received: (at 10953) by debbugs.gnu.org; 7 Mar 2012 19:53:28 +0000 Received: from localhost ([127.0.0.1]:37910 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5MvP-0001Mb-LR for submit@debbugs.gnu.org; Wed, 07 Mar 2012 14:53:28 -0500 Received: from mx.meyering.net ([88.168.87.75]:43832) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mv2-0001Ln-Rq for 10953@debbugs.gnu.org; Wed, 07 Mar 2012 14:53:16 -0500 Received: from rho.meyering.net (localhost.localdomain [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id BDE1860132; Wed, 7 Mar 2012 20:52:02 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#10953: Potential logical bug in readtokens.c In-Reply-To: <4F569BFA.7090109@cs.ucla.edu> (Paul Eggert's message of "Tue, 06 Mar 2012 15:21:30 -0800") References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> Date: Wed, 07 Mar 2012 20:52:02 +0100 Message-ID: <87hay0rxy5.fsf@rho.meyering.net> Lines: 216 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953 Cc: 10953@debbugs.gnu.org, Xu Zhongxing X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Paul Eggert wrote: > Thanks, I agree that code is potentially buggy. I don't see > any way to trigger the bug in coreutils, but it's just asking > for trouble. Here's a proposed patch. Hi Paul, Thanks for working on that. I had applied, tested and reviewed it earlier today, as I was writing the changes below. I've just noticed that I pushed it along with the quotearg/quote.h fix that I pushed earlier today. Must remember always to put under-review change-sets on separate topic branch, not on master. Sorry for the lapse. >>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 > From: Paul Eggert > Date: Tue, 6 Mar 2012 15:19:24 -0800 > Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns > > Reported by Xu Zhongxing in . > * lib/readtokens.c: Include limits.h. > (word, bits_per_word, get_nth_bit, set_nth_bit): New. > (readtoken): Don't cache the delimiters; the cache code was buggy > if !delim && saved_delim, or if the new n_delim differs from the old. > Also, it wasn't thread-safe. Here's a proposed readtokens test module: >From 52605d539a37d4d303e7863d946e85bef21bbcc3 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Wed, 7 Mar 2012 12:16:30 +0100 Subject: [PATCH] readtokens: add tests * modules/readtokens-tests: New file. * tests/test-readtokens.c: New file. * tests/test-readtokens.sh: New file. --- ChangeLog | 6 +++ modules/readtokens-tests | 13 ++++++ tests/test-readtokens.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++ tests/test-readtokens.sh | 23 +++++++++++ 4 files changed, 140 insertions(+) create mode 100644 modules/readtokens-tests create mode 100644 tests/test-readtokens.c create mode 100755 tests/test-readtokens.sh diff --git a/ChangeLog b/ChangeLog index 8ac1ba4..83a09a4 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,11 @@ 2012-03-07 Jim Meyering + readtokens: add tests + * modules/readtokens-tests: New file. + * tests/test-readtokens.c: New file. + +2012-03-07 Jim Meyering + quotearg: the module must now include quote.h With commit v0.0-7133-g6417476, quotearg.c includes "quote.h". So must the module. diff --git a/modules/readtokens-tests b/modules/readtokens-tests new file mode 100644 index 0000000..a1982dd --- /dev/null +++ b/modules/readtokens-tests @@ -0,0 +1,13 @@ +Files: +tests/macros.h +tests/test-readtokens.c +tests/test-readtokens.sh + +Depends-on: +closeout + +configure.ac: + +Makefile.am: +TESTS += test-readtokens.sh +check_PROGRAMS += test-readtokens diff --git a/tests/test-readtokens.c b/tests/test-readtokens.c new file mode 100644 index 0000000..b0af5c8 --- /dev/null +++ b/tests/test-readtokens.c @@ -0,0 +1,98 @@ +/* Test the readtokens module. + Copyright (C) 2012 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#include +#include +#include +#include +#include +#include +#include + +#include "readtokens.h" +#include "closeout.h" +#include "macros.h" + +static void +basic (void) +{ + char *filename = "in.827"; + int fd = open (filename, O_CREAT | O_WRONLY, 0600); + ASSERT (fd >= 0); + ASSERT (write (fd, "a|b;c+d", 7) == 7); + ASSERT (close (fd) == 0); + + { + token_buffer tb; + FILE *fp = fopen (filename, "r"); + ASSERT (fp); + + init_tokenbuffer (&tb); + ASSERT (readtoken (fp, "|;", 2, &tb) == 1 && tb.buffer[0] == 'a'); + ASSERT (readtoken (fp, "|;", 2, &tb) == 1 && tb.buffer[0] == 'b'); + ASSERT (readtoken (fp, "+", 1, &tb) == 1 && tb.buffer[0] == 'c'); + ASSERT (readtoken (fp, "-", 1, &tb) == 1 && tb.buffer[0] == 'd'); + ASSERT (readtoken (fp, "%", 0, &tb) == (size_t) -1); + ASSERT ( ! ferror (fp)); + ASSERT (fclose (fp) == 0); + } +} + +int +main (int argc, char **argv) +{ + token_buffer tb; + char *delim; + size_t delim_len; + bool ok = true; + + atexit (close_stdout); + + if (argc == 1) + { + basic (); + return 0; + } + + init_tokenbuffer (&tb); + + if (argc != 2) + return 99; + + delim = argv[1]; + delim_len = strlen (delim); + + if (STREQ (delim, "\\0")) + { + delim = ""; + delim_len = 1; + } + + while (1) + { + size_t token_length = readtoken (stdin, delim, delim_len, &tb); + if (token_length == (size_t) -1) + break; + fwrite (tb.buffer, 1, token_length, stdout); + putchar (':'); + } + putchar ('\n'); + free (tb.buffer); + + ASSERT ( ! ferror (stdin)); + + return 0; +} diff --git a/tests/test-readtokens.sh b/tests/test-readtokens.sh new file mode 100755 index 0000000..51fd41e --- /dev/null +++ b/tests/test-readtokens.sh @@ -0,0 +1,23 @@ +#!/bin/sh +. "${srcdir=.}/init.sh"; path_prepend_ . + +fail=0 + +test-readtokens || fail=1 + +# Simplest case. +echo a:b:c: > exp || fail=1 +printf a:b:c | test-readtokens : > out 2>&1 || fail=1 +compare exp out || fail=1 + +# Use NUL as the delimiter. +echo a:b:c: > exp || fail=1 +printf 'a\0b\0c' | test-readtokens '\0' > out 2>&1 || fail=1 +compare exp out || fail=1 + +# Two delimiter bytes, and adjacent delimiters in the input. +echo a:b:c: > exp || fail=1 +printf a:-:b-:c:: | test-readtokens :- > out 2>&1 || fail=1 +compare exp out || fail=1 + +Exit $fail -- 1.7.9.3.363.g121f0 From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 07 14:58:06 2012 Received: (at 10953-done) by debbugs.gnu.org; 7 Mar 2012 19:58:06 +0000 Received: from localhost ([127.0.0.1]:37914 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mzt-0001T4-Pa for submit@debbugs.gnu.org; Wed, 07 Mar 2012 14:58:06 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:42253) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mzg-0001Sg-01 for 10953-done@debbugs.gnu.org; Wed, 07 Mar 2012 14:57:53 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 68038A60002; Wed, 7 Mar 2012 11:56:50 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fhKBPgl-+9XQ; Wed, 7 Mar 2012 11:56:49 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 5FEECA60001; Wed, 7 Mar 2012 11:56:49 -0800 (PST) Message-ID: <4F57BD80.5080608@cs.ucla.edu> Date: Wed, 07 Mar 2012 11:56:48 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#10953: Potential logical bug in readtokens.c References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> <87hay0rxy5.fsf@rho.meyering.net> In-Reply-To: <87hay0rxy5.fsf@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953-done Cc: Xu Zhongxing , 10953-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 03/07/2012 11:52 AM, Jim Meyering wrote: > Here's a proposed readtokens test module: I took a quick look, and it looks fine, thanks. The bug's fixed now, so I'm marking it as 'done'. From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 07 15:08:55 2012 Received: (at 10953-done) by debbugs.gnu.org; 7 Mar 2012 20:08:56 +0000 Received: from localhost ([127.0.0.1]:37921 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5NAN-0001iv-A3 for submit@debbugs.gnu.org; Wed, 07 Mar 2012 15:08:55 -0500 Received: from mx.meyering.net ([88.168.87.75]:43876) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5NA1-0001iH-GA for 10953-done@debbugs.gnu.org; Wed, 07 Mar 2012 15:08:43 -0500 Received: from rho.meyering.net (localhost.localdomain [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id E52F460052; Wed, 7 Mar 2012 21:07:31 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#10953: Potential logical bug in readtokens.c In-Reply-To: <4F57BD80.5080608@cs.ucla.edu> (Paul Eggert's message of "Wed, 07 Mar 2012 11:56:48 -0800") References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> <87hay0rxy5.fsf@rho.meyering.net> <4F57BD80.5080608@cs.ucla.edu> Date: Wed, 07 Mar 2012 21:07:31 +0100 Message-ID: <87boo8rx8c.fsf@rho.meyering.net> Lines: 19 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953-done Cc: Xu Zhongxing , 10953-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Paul Eggert wrote: > On 03/07/2012 11:52 AM, Jim Meyering wrote: >> Here's a proposed readtokens test module: > > I took a quick look, and it looks fine, thanks. Thanks for the quick review. While I'd tested like this, ./gnulib-tool --create-testdir --with-tests --test readtokens I nearly forgot to compile with warnings enabled, so nearly missed an unused variable and the lack of for declarations of write and close. Just pushed. > The bug's fixed now, so I'm marking it as 'done'. Thanks! From unknown Sat Sep 06 05:55:32 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 05 Apr 2012 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator