From unknown Sat Sep 06 05:55:18 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10953: Potential logical bug in readtokens.c Resent-From: Xu Zhongxing Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Tue, 06 Mar 2012 08:18:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 10953 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 10953@debbugs.gnu.org X-Debbugs-Original-To: bug-coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.133102182319715 (code B ref -1); Tue, 06 Mar 2012 08:18:02 +0000 Received: (at submit) by debbugs.gnu.org; 6 Mar 2012 08:17:03 +0000 Received: from localhost ([127.0.0.1]:35163 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4pZn-000577-7i for submit@debbugs.gnu.org; Tue, 06 Mar 2012 03:17:02 -0500 Received: from eggs.gnu.org ([208.118.235.92]:60652) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4nRV-0001xE-7U for submit@debbugs.gnu.org; Tue, 06 Mar 2012 01:00:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQe-00047V-6p for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:21 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:56785) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQe-00047Q-1h for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:20 -0500 Received: from eggs.gnu.org ([208.118.235.92]:50723) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQc-0002CP-Cz for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:19 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQZ-00046z-MQ for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:17 -0500 Received: from m50-132.163.com ([123.125.50.132]:44726) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQY-00046M-GB for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=Received:Message-ID:Date:From:User-Agent: MIME-Version:To:Subject:Content-Type:Content-Transfer-Encoding; bh=2JIIW8kf7Gome530rDT17BVTCcXShjf/EfvxjNagkHo=; b=YPELrLsjpkaRF uajxOcBaEX5N9y6h9dOXPrhU+BdY/bO9bg5mt4azmr5Yr0S/aTHuF699LN2YUMls 1znzDqO3WEvoRPnlBbiMcS1CQlJX+AfQl/gupyf+LbnwifLfnvz006FI8VS2hvRp 9ra5XZJvUfST8CVlos1ECnk2Drz0vs= Received: from [192.168.9.60] (unknown [124.16.137.1]) by smtp2 (Coremail) with SMTP id DNGowEAZxHCjp1VPc+_7CA--.2490S2; Tue, 06 Mar 2012 13:59:01 +0800 (CST) Message-ID: <4F55A7A3.40509@163.com> Date: Tue, 06 Mar 2012 13:58:59 +0800 From: Xu Zhongxing User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-CM-TRANSID: DNGowEAZxHCjp1VPc+_7CA--.2490S2 X-Coremail-Antispam: 1Uf129KBjDUn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73 VFW2AGmfu7bjvjm3AaLaJ3UbIYCTnIWIevJa73UjIFyTuYvjxUc5rcUUUUU X-CM-SenderInfo: h0xb6xprqjs5xlqjqiywtou0bp/1tbiTxRpYEkAZ8hDmAAAsv X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-Mailman-Approved-At: Tue, 06 Mar 2012 03:16:43 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) Hi, I think I found a potential bug. But I am not sure about it. It looks real from the code. For coreutils 8.15, in file readtokens.c, function readtoken() If delim is NULL, and saved_delim is not NULL, then the condition on line 75 is not satisfied, but the condition on line 79 is satisfied. In this case, there could be NULL pointer deference for delim[i] in line 84. - Zhongxing Xu From unknown Sat Sep 06 05:55:18 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10953: Potential logical bug in readtokens.c Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Tue, 06 Mar 2012 23:23:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10953 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Xu Zhongxing Cc: 10953@debbugs.gnu.org Received: via spool by 10953-submit@debbugs.gnu.org id=B10953.133107617117490 (code B ref 10953); Tue, 06 Mar 2012 23:23:01 +0000 Received: (at 10953) by debbugs.gnu.org; 6 Mar 2012 23:22:51 +0000 Received: from localhost ([127.0.0.1]:36673 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53iU-0004Xo-RW for submit@debbugs.gnu.org; Tue, 06 Mar 2012 18:22:51 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36401) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53i9-0004XG-7X for 10953@debbugs.gnu.org; Tue, 06 Mar 2012 18:22:40 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3418739E8008; Tue, 6 Mar 2012 15:21:32 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id GEaV4XV5eK2q; Tue, 6 Mar 2012 15:21:30 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id EBF7339E8007; Tue, 6 Mar 2012 15:21:30 -0800 (PST) Message-ID: <4F569BFA.7090109@cs.ucla.edu> Date: Tue, 06 Mar 2012 15:21:30 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1 MIME-Version: 1.0 References: <4F55A7A3.40509@163.com> In-Reply-To: <4F55A7A3.40509@163.com> Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Thanks, I agree that code is potentially buggy. I don't see any way to trigger the bug in coreutils, but it's just asking for trouble. Here's a proposed patch. >From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 6 Mar 2012 15:19:24 -0800 Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns Reported by Xu Zhongxing in . * lib/readtokens.c: Include limits.h. (word, bits_per_word, get_nth_bit, set_nth_bit): New. (readtoken): Don't cache the delimiters; the cache code was buggy if !delim && saved_delim, or if the new n_delim differs from the old. Also, it wasn't thread-safe. --- ChangeLog | 10 +++++++++ lib/readtokens.c | 56 +++++++++++++++++++++++------------------------------ 2 files changed, 34 insertions(+), 32 deletions(-) diff --git a/ChangeLog b/ChangeLog index 163e154..eb99d25 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,13 @@ +2012-03-06 Paul Eggert + + readtokens: avoid core dumps with unusual calling patterns + Reported by Xu Zhongxing in . + * lib/readtokens.c: Include limits.h. + (word, bits_per_word, get_nth_bit, set_nth_bit): New. + (readtoken): Don't cache the delimiters; the cache code was buggy + if !delim && saved_delim, or if the new n_delim differs from the old. + Also, it wasn't thread-safe. + 2012-03-06 Bruno Haible math: Ensure declarations of math functions. diff --git a/lib/readtokens.c b/lib/readtokens.c index 705a62b..f11d3f6 100644 --- a/lib/readtokens.c +++ b/lib/readtokens.c @@ -26,6 +26,7 @@ #include "readtokens.h" +#include #include #include #include @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer) tokenbuffer->buffer = NULL; } +typedef size_t word; +enum { bits_per_word = sizeof (word) * CHAR_BIT }; + +static bool +get_nth_bit (size_t n, word const *bitset) +{ + return bitset[n / bits_per_word] >> n % bits_per_word & 1; +} + +static void +set_nth_bit (size_t n, word *bitset) +{ + size_t one = 1; + bitset[n / bits_per_word] |= one << n % bits_per_word; +} + /* Read a token from STREAM into TOKENBUFFER. A token is delimited by any of the N_DELIM bytes in DELIM. Upon return, the token is in tokenbuffer->buffer and @@ -68,42 +85,17 @@ readtoken (FILE *stream, char *p; int c; size_t i, n; - static const char *saved_delim = NULL; - static char isdelim[256]; - bool same_delimiters; - - if (delim == NULL && saved_delim == NULL) - abort (); + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word]; - same_delimiters = false; - if (delim != saved_delim && saved_delim != NULL) + memset (isdelim, 0, sizeof isdelim); + for (i = 0; i < n_delim; i++) { - same_delimiters = true; - for (i = 0; i < n_delim; i++) - { - if (delim[i] != saved_delim[i]) - { - same_delimiters = false; - break; - } - } - } - - if (!same_delimiters) - { - size_t j; - saved_delim = delim; - memset (isdelim, 0, sizeof isdelim); - for (j = 0; j < n_delim; j++) - { - unsigned char ch = delim[j]; - isdelim[ch] = 1; - } + unsigned char ch = delim[i]; + set_nth_bit (ch, isdelim); } - /* FIXME: don't fool with this caching. Use strchr instead. */ /* skip over any leading delimiters */ - for (c = getc (stream); c >= 0 && isdelim[c]; c = getc (stream)) + for (c = getc (stream); c >= 0 && get_nth_bit (c, isdelim); c = getc (stream)) { /* empty */ } @@ -124,7 +116,7 @@ readtoken (FILE *stream, p[i] = 0; break; } - if (isdelim[c]) + if (get_nth_bit (c, isdelim)) { p[i] = 0; break; -- 1.7.6.5 From unknown Sat Sep 06 05:55:18 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10953: Potential logical bug in readtokens.c Resent-From: Eric Blake Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Tue, 06 Mar 2012 23:34:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10953 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: 10953@debbugs.gnu.org, Xu Zhongxing , bug-gnulib Received: via spool by 10953-submit@debbugs.gnu.org id=B10953.133107683918505 (code B ref 10953); Tue, 06 Mar 2012 23:34:02 +0000 Received: (at 10953) by debbugs.gnu.org; 6 Mar 2012 23:33:59 +0000 Received: from localhost ([127.0.0.1]:36685 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53tG-0004oD-9A for submit@debbugs.gnu.org; Tue, 06 Mar 2012 18:33:58 -0500 Received: from mx1.redhat.com ([209.132.183.28]:50251) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S53t2-0004nk-0M for 10953@debbugs.gnu.org; Tue, 06 Mar 2012 18:33:46 -0500 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q26NW49r015120 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 6 Mar 2012 18:32:04 -0500 Received: from [10.3.113.91] (ovpn-113-91.phx2.redhat.com [10.3.113.91]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q26NW3pb025248; Tue, 6 Mar 2012 18:32:03 -0500 Message-ID: <4F569E72.1000508@redhat.com> Date: Tue, 06 Mar 2012 16:32:02 -0700 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 MIME-Version: 1.0 References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> In-Reply-To: <4F569BFA.7090109@cs.ucla.edu> X-Enigmail-Version: 1.3.5 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enigF6F6FEB8E7E93E645BE7D1FD" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Spam-Score: -6.9 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigF6F6FEB8E7E93E645BE7D1FD Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: quoted-printable [adding gnulib, since the code in question lives there] On 03/06/2012 04:21 PM, Paul Eggert wrote: > Thanks, I agree that code is potentially buggy. I don't see > any way to trigger the bug in coreutils, but it's just asking > for trouble. Here's a proposed patch. >=20 >>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 > From: Paul Eggert > Date: Tue, 6 Mar 2012 15:19:24 -0800 > Subject: [PATCH] readtokens: avoid core dumps with unusual calling patt= erns >=20 > Reported by Xu Zhongxing in . > * lib/readtokens.c: Include limits.h. > (word, bits_per_word, get_nth_bit, set_nth_bit): New. > (readtoken): Don't cache the delimiters; the cache code was buggy > if !delim && saved_delim, or if the new n_delim differs from the old. > Also, it wasn't thread-safe. The fix looks right to me from a correctness perspective, but I'm afraid we're still sub-optimal in performance. > --- > ChangeLog | 10 +++++++++ > lib/readtokens.c | 56 +++++++++++++++++++++++-----------------------= ------- > 2 files changed, 34 insertions(+), 32 deletions(-) >=20 > diff --git a/ChangeLog b/ChangeLog > index 163e154..eb99d25 100644 > --- a/ChangeLog > +++ b/ChangeLog > @@ -1,3 +1,13 @@ > +2012-03-06 Paul Eggert > + > + readtokens: avoid core dumps with unusual calling patterns > + Reported by Xu Zhongxing in . > + * lib/readtokens.c: Include limits.h. > + (word, bits_per_word, get_nth_bit, set_nth_bit): New. > + (readtoken): Don't cache the delimiters; the cache code was buggy > + if !delim && saved_delim, or if the new n_delim differs from the old.= > + Also, it wasn't thread-safe. > + > 2012-03-06 Bruno Haible > =20 > math: Ensure declarations of math functions. > diff --git a/lib/readtokens.c b/lib/readtokens.c > index 705a62b..f11d3f6 100644 > --- a/lib/readtokens.c > +++ b/lib/readtokens.c > @@ -26,6 +26,7 @@ > =20 > #include "readtokens.h" > =20 > +#include > #include > #include > #include > @@ -46,6 +47,22 @@ init_tokenbuffer (token_buffer *tokenbuffer) > tokenbuffer->buffer =3D NULL; > } > =20 > +typedef size_t word; > +enum { bits_per_word =3D sizeof (word) * CHAR_BIT }; > + > +static bool > +get_nth_bit (size_t n, word const *bitset) > +{ > + return bitset[n / bits_per_word] >> n % bits_per_word & 1; > +} > + > +static void > +set_nth_bit (size_t n, word *bitset) > +{ > + size_t one =3D 1; > + bitset[n / bits_per_word] |=3D one << n % bits_per_word; > +} > + > /* Read a token from STREAM into TOKENBUFFER. > A token is delimited by any of the N_DELIM bytes in DELIM. > Upon return, the token is in tokenbuffer->buffer and > @@ -68,42 +85,17 @@ readtoken (FILE *stream, > char *p; > int c; > size_t i, n; > - static const char *saved_delim =3D NULL; > - static char isdelim[256]; > - bool same_delimiters; > - > - if (delim =3D=3D NULL && saved_delim =3D=3D NULL) > - abort (); > + word isdelim[(UCHAR_MAX + bits_per_word) / bits_per_word]; > =20 > - same_delimiters =3D false; > - if (delim !=3D saved_delim && saved_delim !=3D NULL) > + memset (isdelim, 0, sizeof isdelim); > + for (i =3D 0; i < n_delim; i++) > { > - same_delimiters =3D true; > - for (i =3D 0; i < n_delim; i++) > - { > - if (delim[i] !=3D saved_delim[i]) > - { > - same_delimiters =3D false; > - break; > - } > - } > - } > - > - if (!same_delimiters) > - { > - size_t j; > - saved_delim =3D delim; > - memset (isdelim, 0, sizeof isdelim); > - for (j =3D 0; j < n_delim; j++) > - { > - unsigned char ch =3D delim[j]; > - isdelim[ch] =3D 1; > - } > + unsigned char ch =3D delim[i]; > + set_nth_bit (ch, isdelim); > } > =20 > - /* FIXME: don't fool with this caching. Use strchr instead. */ > /* skip over any leading delimiters */ > - for (c =3D getc (stream); c >=3D 0 && isdelim[c]; c =3D getc (stream= )) > + for (c =3D getc (stream); c >=3D 0 && get_nth_bit (c, isdelim); c =3D= getc (stream)) Why not just strchr instead of building up an isdelim bitmap? And why are we calling getc() one character at a time, instead of using tricks like freadahead() to operate on a larger buffer? Also, is readtoken() intended to be a more powerful interface than strtok, in which case we _do_ want to be non-threadsafe, and to have a readtoken_r interface that is the underlying threadsafe variant that can benefit from caching? > { > /* empty */ > } > @@ -124,7 +116,7 @@ readtoken (FILE *stream, > p[i] =3D 0; > break; > } > - if (isdelim[c]) > + if (get_nth_bit (c, isdelim)) > { > p[i] =3D 0; > break; --=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enigF6F6FEB8E7E93E645BE7D1FD Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPVp5yAAoJEKeha0olJ0Nqyd0H/0XL8GvGffdsxqE8uHl8eFcE 0cl1X3fByzv6TcXpahNbmqbxH6E+0E9b478P4RrV4fhbqrXBO45/e4IBiRRtpwu+ lXQPc1cExK+hIkLzx++/86YYgghWgc2iAhMnigLXKo005PjlKp91lOd4z4Tef/IX 5jCaBneTR66+RgHmN6VSJjThUVZRfAm4110G83nxM81xosK1Oz2CmWua9KorpuEL flhH21OwUaLSV8sBsKOb1mnz60gfF6K5mRec9DX3+VIpOh8134I9Vs1VCwaQUlkk CUbGNGkGkFNsvEIDo2WJZ52do5klqDHeBhrOhwwTLxS3Bh5NLCU00PAumoFDeOQ= =2W2s -----END PGP SIGNATURE----- --------------enigF6F6FEB8E7E93E645BE7D1FD-- From unknown Sat Sep 06 05:55:18 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10953: Potential logical bug in readtokens.c Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 07 Mar 2012 05:35:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10953 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Eric Blake Cc: 10953@debbugs.gnu.org, Xu Zhongxing , bug-gnulib Received: via spool by 10953-submit@debbugs.gnu.org id=B10953.133109846220582 (code B ref 10953); Wed, 07 Mar 2012 05:35:02 +0000 Received: (at 10953) by debbugs.gnu.org; 7 Mar 2012 05:34:22 +0000 Received: from localhost ([127.0.0.1]:36863 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S59W1-0005Lk-4X for submit@debbugs.gnu.org; Wed, 07 Mar 2012 00:34:21 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:44627) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S59Vf-0005Ky-W0 for 10953@debbugs.gnu.org; Wed, 07 Mar 2012 00:34:10 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BB20639E800D; Tue, 6 Mar 2012 21:33:01 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id t-QVxvRRaxIz; Tue, 6 Mar 2012 21:33:01 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 2826939E800C; Tue, 6 Mar 2012 21:33:01 -0800 (PST) Message-ID: <4F56F30E.40205@cs.ucla.edu> Date: Tue, 06 Mar 2012 21:33:02 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> <4F569E72.1000508@redhat.com> In-Reply-To: <4F569E72.1000508@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 03/06/2012 03:32 PM, Eric Blake wrote: > Why not just strchr instead of building up an isdelim bitmap? strchr would not be right, since '\0' is valid in data and as a delimiter. No doubt you meant 'memchr'; but using 'memchr' would slow down readtoken by about a factor of two. I got this result by timing the following benchmark on gcc-4.6.1.tar (uncompressed) on Fedora 15 x86-64 with GCC 4.6.2: #include #include struct tokenbuffer t; int main (void) { for (;;) { size_t s = readtoken (stdin, " \t\n", 3, &t); if (s == (size_t) -1) return 0; } } On this benchmark, the relative speeds (user+sys CPU time ratios, bigger numbers are better) are: 0.54 readtoken with memchr 1.00 current readtoken (with non-thread-safe byte array) 1.13 proposed readtoken (with thread-safe bitset) So the proposed patch is a performance win even in non-thread-safe use. > And why > are we calling getc() one character at a time, instead of using tricks > like freadahead() to operate on a larger buffer? > > Also, is readtoken() intended to be a more powerful interface than > strtok, in which case we _do_ want to be non-threadsafe, and to have a > readtoken_r interface that is the underlying threadsafe variant that can > benefit from caching? I haven't thought about these issues, but surely they are independent of the proposed patch. From unknown Sat Sep 06 05:55:18 2025 X-Loop: help-debbugs@gnu.org Subject: bug#10953: Potential logical bug in readtokens.c Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 07 Mar 2012 19:54:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 10953 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: 10953@debbugs.gnu.org, Xu Zhongxing Received: via spool by 10953-submit@debbugs.gnu.org id=B10953.13311500085261 (code B ref 10953); Wed, 07 Mar 2012 19:54:02 +0000 Received: (at 10953) by debbugs.gnu.org; 7 Mar 2012 19:53:28 +0000 Received: from localhost ([127.0.0.1]:37910 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5MvP-0001Mb-LR for submit@debbugs.gnu.org; Wed, 07 Mar 2012 14:53:28 -0500 Received: from mx.meyering.net ([88.168.87.75]:43832) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mv2-0001Ln-Rq for 10953@debbugs.gnu.org; Wed, 07 Mar 2012 14:53:16 -0500 Received: from rho.meyering.net (localhost.localdomain [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id BDE1860132; Wed, 7 Mar 2012 20:52:02 +0100 (CET) From: Jim Meyering In-Reply-To: <4F569BFA.7090109@cs.ucla.edu> (Paul Eggert's message of "Tue, 06 Mar 2012 15:21:30 -0800") References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> Date: Wed, 07 Mar 2012 20:52:02 +0100 Message-ID: <87hay0rxy5.fsf@rho.meyering.net> Lines: 216 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Paul Eggert wrote: > Thanks, I agree that code is potentially buggy. I don't see > any way to trigger the bug in coreutils, but it's just asking > for trouble. Here's a proposed patch. Hi Paul, Thanks for working on that. I had applied, tested and reviewed it earlier today, as I was writing the changes below. I've just noticed that I pushed it along with the quotearg/quote.h fix that I pushed earlier today. Must remember always to put under-review change-sets on separate topic branch, not on master. Sorry for the lapse. >>>From 4954a3517397dadd217d6244e961dd855fbadbef Mon Sep 17 00:00:00 2001 > From: Paul Eggert > Date: Tue, 6 Mar 2012 15:19:24 -0800 > Subject: [PATCH] readtokens: avoid core dumps with unusual calling patterns > > Reported by Xu Zhongxing in . > * lib/readtokens.c: Include limits.h. > (word, bits_per_word, get_nth_bit, set_nth_bit): New. > (readtoken): Don't cache the delimiters; the cache code was buggy > if !delim && saved_delim, or if the new n_delim differs from the old. > Also, it wasn't thread-safe. Here's a proposed readtokens test module: >From 52605d539a37d4d303e7863d946e85bef21bbcc3 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Wed, 7 Mar 2012 12:16:30 +0100 Subject: [PATCH] readtokens: add tests * modules/readtokens-tests: New file. * tests/test-readtokens.c: New file. * tests/test-readtokens.sh: New file. --- ChangeLog | 6 +++ modules/readtokens-tests | 13 ++++++ tests/test-readtokens.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++ tests/test-readtokens.sh | 23 +++++++++++ 4 files changed, 140 insertions(+) create mode 100644 modules/readtokens-tests create mode 100644 tests/test-readtokens.c create mode 100755 tests/test-readtokens.sh diff --git a/ChangeLog b/ChangeLog index 8ac1ba4..83a09a4 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,5 +1,11 @@ 2012-03-07 Jim Meyering + readtokens: add tests + * modules/readtokens-tests: New file. + * tests/test-readtokens.c: New file. + +2012-03-07 Jim Meyering + quotearg: the module must now include quote.h With commit v0.0-7133-g6417476, quotearg.c includes "quote.h". So must the module. diff --git a/modules/readtokens-tests b/modules/readtokens-tests new file mode 100644 index 0000000..a1982dd --- /dev/null +++ b/modules/readtokens-tests @@ -0,0 +1,13 @@ +Files: +tests/macros.h +tests/test-readtokens.c +tests/test-readtokens.sh + +Depends-on: +closeout + +configure.ac: + +Makefile.am: +TESTS += test-readtokens.sh +check_PROGRAMS += test-readtokens diff --git a/tests/test-readtokens.c b/tests/test-readtokens.c new file mode 100644 index 0000000..b0af5c8 --- /dev/null +++ b/tests/test-readtokens.c @@ -0,0 +1,98 @@ +/* Test the readtokens module. + Copyright (C) 2012 Free Software Foundation, Inc. + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#include +#include +#include +#include +#include +#include +#include + +#include "readtokens.h" +#include "closeout.h" +#include "macros.h" + +static void +basic (void) +{ + char *filename = "in.827"; + int fd = open (filename, O_CREAT | O_WRONLY, 0600); + ASSERT (fd >= 0); + ASSERT (write (fd, "a|b;c+d", 7) == 7); + ASSERT (close (fd) == 0); + + { + token_buffer tb; + FILE *fp = fopen (filename, "r"); + ASSERT (fp); + + init_tokenbuffer (&tb); + ASSERT (readtoken (fp, "|;", 2, &tb) == 1 && tb.buffer[0] == 'a'); + ASSERT (readtoken (fp, "|;", 2, &tb) == 1 && tb.buffer[0] == 'b'); + ASSERT (readtoken (fp, "+", 1, &tb) == 1 && tb.buffer[0] == 'c'); + ASSERT (readtoken (fp, "-", 1, &tb) == 1 && tb.buffer[0] == 'd'); + ASSERT (readtoken (fp, "%", 0, &tb) == (size_t) -1); + ASSERT ( ! ferror (fp)); + ASSERT (fclose (fp) == 0); + } +} + +int +main (int argc, char **argv) +{ + token_buffer tb; + char *delim; + size_t delim_len; + bool ok = true; + + atexit (close_stdout); + + if (argc == 1) + { + basic (); + return 0; + } + + init_tokenbuffer (&tb); + + if (argc != 2) + return 99; + + delim = argv[1]; + delim_len = strlen (delim); + + if (STREQ (delim, "\\0")) + { + delim = ""; + delim_len = 1; + } + + while (1) + { + size_t token_length = readtoken (stdin, delim, delim_len, &tb); + if (token_length == (size_t) -1) + break; + fwrite (tb.buffer, 1, token_length, stdout); + putchar (':'); + } + putchar ('\n'); + free (tb.buffer); + + ASSERT ( ! ferror (stdin)); + + return 0; +} diff --git a/tests/test-readtokens.sh b/tests/test-readtokens.sh new file mode 100755 index 0000000..51fd41e --- /dev/null +++ b/tests/test-readtokens.sh @@ -0,0 +1,23 @@ +#!/bin/sh +. "${srcdir=.}/init.sh"; path_prepend_ . + +fail=0 + +test-readtokens || fail=1 + +# Simplest case. +echo a:b:c: > exp || fail=1 +printf a:b:c | test-readtokens : > out 2>&1 || fail=1 +compare exp out || fail=1 + +# Use NUL as the delimiter. +echo a:b:c: > exp || fail=1 +printf 'a\0b\0c' | test-readtokens '\0' > out 2>&1 || fail=1 +compare exp out || fail=1 + +# Two delimiter bytes, and adjacent delimiters in the input. +echo a:b:c: > exp || fail=1 +printf a:-:b-:c:: | test-readtokens :- > out 2>&1 || fail=1 +compare exp out || fail=1 + +Exit $fail -- 1.7.9.3.363.g121f0 From unknown Sat Sep 06 05:55:18 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.428 (Entity 5.428) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Xu Zhongxing Subject: bug#10953: closed (Re: bug#10953: Potential logical bug in readtokens.c) Message-ID: References: <4F57BD80.5080608@cs.ucla.edu> <4F55A7A3.40509@163.com> X-Gnu-PR-Message: they-closed 10953 X-Gnu-PR-Package: coreutils Reply-To: 10953@debbugs.gnu.org Date: Wed, 07 Mar 2012 19:59:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1331150342-5747-1" This is a multi-part message in MIME format... ------------=_1331150342-5747-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #10953: Potential logical bug in readtokens.c which was filed against the coreutils package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 10953@debbugs.gnu.org. --=20 10953: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D10953 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1331150342-5747-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 10953-done) by debbugs.gnu.org; 7 Mar 2012 19:58:06 +0000 Received: from localhost ([127.0.0.1]:37914 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mzt-0001T4-Pa for submit@debbugs.gnu.org; Wed, 07 Mar 2012 14:58:06 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:42253) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S5Mzg-0001Sg-01 for 10953-done@debbugs.gnu.org; Wed, 07 Mar 2012 14:57:53 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 68038A60002; Wed, 7 Mar 2012 11:56:50 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fhKBPgl-+9XQ; Wed, 7 Mar 2012 11:56:49 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 5FEECA60001; Wed, 7 Mar 2012 11:56:49 -0800 (PST) Message-ID: <4F57BD80.5080608@cs.ucla.edu> Date: Wed, 07 Mar 2012 11:56:48 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#10953: Potential logical bug in readtokens.c References: <4F55A7A3.40509@163.com> <4F569BFA.7090109@cs.ucla.edu> <87hay0rxy5.fsf@rho.meyering.net> In-Reply-To: <87hay0rxy5.fsf@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 10953-done Cc: Xu Zhongxing , 10953-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 03/07/2012 11:52 AM, Jim Meyering wrote: > Here's a proposed readtokens test module: I took a quick look, and it looks fine, thanks. The bug's fixed now, so I'm marking it as 'done'. ------------=_1331150342-5747-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 6 Mar 2012 08:17:03 +0000 Received: from localhost ([127.0.0.1]:35163 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4pZn-000577-7i for submit@debbugs.gnu.org; Tue, 06 Mar 2012 03:17:02 -0500 Received: from eggs.gnu.org ([208.118.235.92]:60652) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1S4nRV-0001xE-7U for submit@debbugs.gnu.org; Tue, 06 Mar 2012 01:00:33 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQe-00047V-6p for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:21 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI, T_DKIM_INVALID autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:56785) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQe-00047Q-1h for submit@debbugs.gnu.org; Tue, 06 Mar 2012 00:59:20 -0500 Received: from eggs.gnu.org ([208.118.235.92]:50723) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQc-0002CP-Cz for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:19 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1S4nQZ-00046z-MQ for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:17 -0500 Received: from m50-132.163.com ([123.125.50.132]:44726) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1S4nQY-00046M-GB for bug-coreutils@gnu.org; Tue, 06 Mar 2012 00:59:15 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=Received:Message-ID:Date:From:User-Agent: MIME-Version:To:Subject:Content-Type:Content-Transfer-Encoding; bh=2JIIW8kf7Gome530rDT17BVTCcXShjf/EfvxjNagkHo=; b=YPELrLsjpkaRF uajxOcBaEX5N9y6h9dOXPrhU+BdY/bO9bg5mt4azmr5Yr0S/aTHuF699LN2YUMls 1znzDqO3WEvoRPnlBbiMcS1CQlJX+AfQl/gupyf+LbnwifLfnvz006FI8VS2hvRp 9ra5XZJvUfST8CVlos1ECnk2Drz0vs= Received: from [192.168.9.60] (unknown [124.16.137.1]) by smtp2 (Coremail) with SMTP id DNGowEAZxHCjp1VPc+_7CA--.2490S2; Tue, 06 Mar 2012 13:59:01 +0800 (CST) Message-ID: <4F55A7A3.40509@163.com> Date: Tue, 06 Mar 2012 13:58:59 +0800 From: Xu Zhongxing User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: bug-coreutils@gnu.org Subject: Potential logical bug in readtokens.c Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-CM-TRANSID: DNGowEAZxHCjp1VPc+_7CA--.2490S2 X-Coremail-Antispam: 1Uf129KBjDUn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73 VFW2AGmfu7bjvjm3AaLaJ3UbIYCTnIWIevJa73UjIFyTuYvjxUc5rcUUUUU X-CM-SenderInfo: h0xb6xprqjs5xlqjqiywtou0bp/1tbiTxRpYEkAZ8hDmAAAsv X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Tue, 06 Mar 2012 03:16:43 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) Hi, I think I found a potential bug. But I am not sure about it. It looks real from the code. For coreutils 8.15, in file readtokens.c, function readtoken() If delim is NULL, and saved_delim is not NULL, then the condition on line 75 is not satisfied, but the condition on line 79 is satisfied. In this case, there could be NULL pointer deference for delim[i] in line 84. - Zhongxing Xu ------------=_1331150342-5747-1--