From unknown Sat Aug 09 14:04:49 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11761: Slight bug in split :-) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 21 Jun 2012 22:16:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 11761 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: =?UTF-8?Q?Fran=C3=A7ois?= Pinard Cc: 11761@debbugs.gnu.org X-Debbugs-Original-Cc: bug-coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.13403169474142 (code B ref -1); Thu, 21 Jun 2012 22:16:02 +0000 Received: (at submit) by debbugs.gnu.org; 21 Jun 2012 22:15:47 +0000 Received: from localhost ([127.0.0.1]:52086 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1ShpfG-00014k-OY for submit@debbugs.gnu.org; Thu, 21 Jun 2012 18:15:47 -0400 Received: from eggs.gnu.org ([208.118.235.92]:51599) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1ShpfE-00014d-A5 for submit@debbugs.gnu.org; Thu, 21 Jun 2012 18:15:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Shpbh-0003xL-Ja for submit@debbugs.gnu.org; Thu, 21 Jun 2012 18:12:09 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:50133) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Shpbh-0003xF-GS for submit@debbugs.gnu.org; Thu, 21 Jun 2012 18:12:05 -0400 Received: from eggs.gnu.org ([208.118.235.92]:46638) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Shpbf-00079c-NC for bug-coreutils@gnu.org; Thu, 21 Jun 2012 18:12:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Shpbd-0003w0-NL for bug-coreutils@gnu.org; Thu, 21 Jun 2012 
18:12:03 -0400 Received: from mx.meyering.net ([88.168.87.75]:58593) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Shpbd-0003vh-Gg for bug-coreutils@gnu.org; Thu, 21 Jun 2012 18:12:01 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 290A860212; Fri, 22 Jun 2012 00:12:00 +0200 (CEST) From: Jim Meyering In-Reply-To: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> ("=?UTF-8?Q?Fran=C3=A7ois?= Pinard"'s message of "Thu, 21 Jun 2012 17:04:18 -0400") References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> Date: Fri, 22 Jun 2012 00:12:00 +0200 Message-ID: <8762aktivj.fsf@rho.meyering.net> Lines: 38 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.9 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) Fran=E7ois Pinard wrote: > Hi, Jim. > > I was looking for a problematic spot from a big file, and to isolate it, > used "split" repeatedly as a way to zoom into the proper place. Just to > try, I used "split -C 100000 xad" at one place (after saving "xad" > first, of course). "split" interrupted itself, producing less output > than input. > > My suggestion would be that split moans in some way before it destroys > its own input. :-) > > Fran=E7ois Hi Fran=E7ois! Thank you for reporting that. That's definitely a bug. For the record, here's a quick reproducer: $ seq 10 > xaa $ split -C 6 xaa $ wc -c x?? 6 xaa 1 xab 7 total $ head x?? 
=3D=3D> xaa <=3D=3D 1 2 3 =3D=3D> xab <=3D=3D 3$ I've Cc'd the bug list, in case someone would like to write the patch (fix, NEWS and test) before I get to it. I may not have time tomorrow. From unknown Sat Aug 09 14:04:49 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11761: Slight bug in split :-) Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 21 Jun 2012 23:48:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 11761 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: 11761@debbugs.gnu.org, =?UTF-8?Q?Fran=C3=A7ois?= Pinard Received: via spool by 11761-submit@debbugs.gnu.org id=B11761.134032246012130 (code B ref 11761); Thu, 21 Jun 2012 23:48:02 +0000 Received: (at 11761) by debbugs.gnu.org; 21 Jun 2012 23:47:40 +0000 Received: from localhost ([127.0.0.1]:52168 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shr6B-00039b-TY for submit@debbugs.gnu.org; Thu, 21 Jun 2012 19:47:40 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44765) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shr69-00039T-GR for 11761@debbugs.gnu.org; Thu, 21 Jun 2012 19:47:38 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q5LNi01A009217 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 21 Jun 2012 19:44:00 -0400 Received: from [10.36.116.21] (ovpn-116-21.ams2.redhat.com [10.36.116.21]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q5LNhvXV006547 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 21 Jun 2012 19:43:59 -0400 Message-ID: <4FE3B1BD.5050808@draigBrady.com> Date: Fri, 22 Jun 2012 00:43:57 +0100 From: =?UTF-8?Q?P=C3=A1draig?= Brady User-Agent: Mozilla/5.0 (X11; Linux x86_64; 
rv:6.0) Gecko/20110816 Thunderbird/6.0 MIME-Version: 1.0 References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> <8762aktivj.fsf@rho.meyering.net> In-Reply-To: <8762aktivj.fsf@rho.meyering.net> X-Enigmail-Version: 1.3.2 Content-Type: text/plain; charset=ISO-8859-1 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q5LNi01A009217 X-Spam-Score: -6.9 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) On 06/21/2012 11:12 PM, Jim Meyering wrote: > Fran=E7ois Pinard wrote: >> Hi, Jim. >> >> I was looking for a problematic spot from a big file, and to isolate i= t, >> used "split" repeatedly as a way to zoom into the proper place. Just = to >> try, I used "split -C 100000 xad" at one place (after saving "xad" >> first, of course). "split" interrupted itself, producing less output >> than input. >> >> My suggestion would be that split moans in some way before it destroys >> its own input. :-) >> >> Fran=E7ois >=20 > Hi Fran=E7ois! > Thank you for reporting that. > That's definitely a bug. >=20 > For the record, here's a quick reproducer: >=20 > $ seq 10 > xaa > $ split -C 6 xaa > $ wc -c x?? > 6 xaa > 1 xab > 7 total > $ head x?? > =3D=3D> xaa <=3D=3D > 1 > 2 > 3 >=20 > =3D=3D> xab <=3D=3D > 3$ >=20 > I've Cc'd the bug list, in case someone would like to write > the patch (fix, NEWS and test) before I get to it. > I may not have time tomorrow. Nice catch :) I'll fix it up with something like the following. cheers, P=E1draig. diff --git a/src/split.c b/src/split.c index 53ee271..3e3313a 100644 --- a/src/split.c +++ b/src/split.c @@ -92,6 +92,9 @@ static char const *additional_suffix; /* Name of input file. May be "-". 
*/ static char *infile; +/* stat buf for input file. */ +static struct stat in_stat_buf; + /* Descriptor on which output file is open. */ static int output_desc =3D -1; @@ -362,6 +365,17 @@ create (const char *name) { if (verbose) fprintf (stdout, _("creating file %s\n"), quote (name)); + + struct stat out_stat_buf; + if (stat (name, &out_stat_buf) =3D=3D 0) + { + if (SAME_INODE (in_stat_buf, out_stat_buf)) + error (EXIT_FAILURE, 0, _("%s would overwrite input. Abortin= g."), + quote (name)); + } + else if (errno !=3D ENOENT) + error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name)); + return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_= IWOTH)) } @@ -1058,7 +1072,6 @@ parse_chunk (uintmax_t *k_units, uintmax_t *n_units= , char int main (int argc, char **argv) { - struct stat stat_buf; enum Split_type split_type =3D type_undef; size_t in_blk_size =3D 0; /* optimal block size of input file dev= ice */ char *buf; /* file i/o buffer */ @@ -1335,16 +1348,16 @@ main (int argc, char **argv) /* Get the optimal block size of input device and make a buffer. 
*/ - if (fstat (STDIN_FILENO, &stat_buf) !=3D 0) + if (fstat (STDIN_FILENO, &in_stat_buf) !=3D 0) error (EXIT_FAILURE, errno, "%s", infile); if (in_blk_size =3D=3D 0) - in_blk_size =3D io_blksize (stat_buf); + in_blk_size =3D io_blksize (in_stat_buf); if (split_type =3D=3D type_chunk_bytes || split_type =3D=3D type_chunk= _lines) { off_t input_offset =3D lseek (STDIN_FILENO, 0, SEEK_CUR); - if (usable_st_size (&stat_buf)) - file_size =3D stat_buf.st_size; + if (usable_st_size (&in_stat_buf)) + file_size =3D in_stat_buf.st_size; else if (0 <=3D input_offset) { file_size =3D lseek (STDIN_FILENO, 0, SEEK_END); From unknown Sat Aug 09 14:04:49 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11761: Slight bug in split :-) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 22 Jun 2012 08:01:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 11761 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: =?UTF-8?Q?P=C3=A1draig?= Brady Cc: 11761@debbugs.gnu.org, =?UTF-8?Q?Fran=C3=A7ois?= Pinard Received: via spool by 11761-submit@debbugs.gnu.org id=B11761.134035200725734 (code B ref 11761); Fri, 22 Jun 2012 08:01:01 +0000 Received: (at 11761) by debbugs.gnu.org; 22 Jun 2012 08:00:07 +0000 Received: from localhost ([127.0.0.1]:52487 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shymj-0006fq-QN for submit@debbugs.gnu.org; Fri, 22 Jun 2012 04:00:06 -0400 Received: from mx.meyering.net ([88.168.87.75]:55200) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shymg-0006Zx-UD for 11761@debbugs.gnu.org; Fri, 22 Jun 2012 04:00:04 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 64FB9600AE; Fri, 22 Jun 2012 09:56:24 +0200 (CEST) From: Jim Meyering In-Reply-To: <4FE3B1BD.5050808@draigBrady.com> ("=?UTF-8?Q?P=C3=A1draig?= Brady"'s message of "Fri, 22 
Jun 2012 00:43:57 +0100") References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> <8762aktivj.fsf@rho.meyering.net> <4FE3B1BD.5050808@draigBrady.com> Date: Fri, 22 Jun 2012 09:56:24 +0200 Message-ID: <87bokb93vb.fsf@rho.meyering.net> Lines: 57 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) P=E1draig Brady wrote: ... > diff --git a/src/split.c b/src/split.c > index 53ee271..3e3313a 100644 > --- a/src/split.c > +++ b/src/split.c > @@ -92,6 +92,9 @@ static char const *additional_suffix; > /* Name of input file. May be "-". */ > static char *infile; > > +/* stat buf for input file. */ > +static struct stat in_stat_buf; > + > /* Descriptor on which output file is open. */ > static int output_desc =3D -1; > > @@ -362,6 +365,17 @@ create (const char *name) > { > if (verbose) > fprintf (stdout, _("creating file %s\n"), quote (name)); > + > + struct stat out_stat_buf; > + if (stat (name, &out_stat_buf) =3D=3D 0) > + { > + if (SAME_INODE (in_stat_buf, out_stat_buf)) > + error (EXIT_FAILURE, 0, _("%s would overwrite input. Abortin= g."), > + quote (name)); > + } > + else if (errno !=3D ENOENT) > + error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name)); > + > return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, > (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_= IWOTH)) > } Hi P=E1draig, Thanks for taking this on. That introduces a minor TOCTOU race. It would probably never matter in practice, but who knows... if we can avoid it, why not? What do you think about something like this? int fd =3D open (name, (... as above, but without O_TRUNC...)... if (fd < 0) return fd; if ( ! 
fstat (fd, &out_stat_buf)) error (EXIT_FAILURE, errno, _("failed to fstat %s"), quote (name)); if (SAME_INODE (in_stat_buf, out_stat_buf)) error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."), quote (name)); if ( ! ftruncate (fd, 0)) error ... return fd; The above might even be a tiny bit faster for long names, since it resolves each name only once. From unknown Sat Aug 09 14:04:49 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.428 (Entity 5.428) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Jim Meyering Subject: bug#11761: closed (Re: bug#11761: Slight bug in split :-)) Message-ID: References: <4FE4313F.3040401@draigBrady.com> <8762aktivj.fsf@rho.meyering.net> X-Gnu-PR-Message: they-closed 11761 X-Gnu-PR-Package: coreutils Reply-To: 11761@debbugs.gnu.org Date: Fri, 22 Jun 2012 08:52:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1340355122-497-1" This is a multi-part message in MIME format... ------------=_1340355122-497-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #11761: Slight bug in split :-) which was filed against the coreutils package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 11761@debbugs.gnu.org. 
--=20 11761: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D11761 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1340355122-497-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 11761-done) by debbugs.gnu.org; 22 Jun 2012 08:51:46 +0000 Received: from localhost ([127.0.0.1]:52506 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shzaj-00007W-EH for submit@debbugs.gnu.org; Fri, 22 Jun 2012 04:51:46 -0400 Received: from mx1.redhat.com ([209.132.183.28]:15825) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Shzae-00007J-Q6 for 11761-done@debbugs.gnu.org; Fri, 22 Jun 2012 04:51:42 -0400 Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q5M8m2Qx014418 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 22 Jun 2012 04:48:02 -0400 Received: from [10.36.116.30] (ovpn-116-30.ams2.redhat.com [10.36.116.30]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id q5M8lxZE027643 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 22 Jun 2012 04:48:01 -0400 Message-ID: <4FE4313F.3040401@draigBrady.com> Date: Fri, 22 Jun 2012 09:47:59 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#11761: Slight bug in split :-) References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> <8762aktivj.fsf@rho.meyering.net> <4FE3B1BD.5050808@draigBrady.com> <87bokb93vb.fsf@rho.meyering.net> In-Reply-To: <87bokb93vb.fsf@rho.meyering.net> X-Enigmail-Version: 1.3.2 Content-Type: multipart/mixed; boundary="------------060201020701080106050902" X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 11761-done Cc: 
=?ISO-8859-1?Q?Fran=E7ois_Pinard?= , 11761-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is a multi-part message in MIME format. --------------060201020701080106050902 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q5M8m2Qx014418 On 06/22/2012 08:56 AM, Jim Meyering wrote: > P=E1draig Brady wrote: > ... >> diff --git a/src/split.c b/src/split.c >> index 53ee271..3e3313a 100644 >> --- a/src/split.c >> +++ b/src/split.c >> @@ -92,6 +92,9 @@ static char const *additional_suffix; >> /* Name of input file. May be "-". */ >> static char *infile; >> >> +/* stat buf for input file. */ >> +static struct stat in_stat_buf; >> + >> /* Descriptor on which output file is open. */ >> static int output_desc =3D -1; >> >> @@ -362,6 +365,17 @@ create (const char *name) >> { >> if (verbose) >> fprintf (stdout, _("creating file %s\n"), quote (name)); >> + >> + struct stat out_stat_buf; >> + if (stat (name, &out_stat_buf) =3D=3D 0) >> + { >> + if (SAME_INODE (in_stat_buf, out_stat_buf)) >> + error (EXIT_FAILURE, 0, _("%s would overwrite input. Abor= ting."), >> + quote (name)); >> + } >> + else if (errno !=3D ENOENT) >> + error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name)= ); >> + >> return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, >> (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH |= S_IWOTH)) >> } >=20 > Hi P=E1draig, >=20 > Thanks for taking this on. > That introduces a minor TOCTOU race. > It would probably never matter in practice, > but who knows... if we can avoid it, why not? > What do you think about something like this? >=20 > int fd =3D open (name, (... 
as above, but without O_TRUNC...)... > if (fd < 0) > return fd; > if ( ! fstat (fd, &out_stat_buf)) > error (EXIT_FAILURE, errno, _("failed to fstat %s"), quote (name)= ); > if (SAME_INODE (in_stat_buf, out_stat_buf)) > error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."), > quote (name)); > if ( ! ftruncate (fd, 0)) > error ... > return fd; >=20 > The above might even be a tiny bit faster for long names, > since it resolves each name only once. Well probably slower due to the extra truncate syscall, but point taken on the unlikely TOCTOU race. I'll push the attached in a while. cheers, P=E1draig. >=20 --------------060201020701080106050902 Content-Type: text/plain; charset=UTF-8; name="split-input-guard.diff" Content-Disposition: attachment; filename="split-input-guard.diff" Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q5M8m2Qx014418 >From cf8505f3db555974c364a2fc5a875292b127719a Mon Sep 17 00:00:00 2001 From: =3D?UTF-8?q?P=3DC3=3DA1draig=3D20Brady?=3D Date: Fri, 22 Jun 2012 09:32:34 +0100 Subject: [PATCH] split: ensure output doesn't overwrite input MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit * src/split.c (create): Check if output file is the same inode as the input file. * tests/split/guard-input: New test case. * tests/Makefile.am: Reference new test case. * NEWS: Mention the fix. 
Improved-by: Jim Meyering Reported-by: Fran=C3=A7ois Pinard --- NEWS | 3 +++ src/split.c | 31 ++++++++++++++++++++++++------- tests/Makefile.am | 1 + tests/split/guard-input | 33 +++++++++++++++++++++++++++++++++ 4 files changed, 61 insertions(+), 7 deletions(-) create mode 100755 tests/split/guard-input diff --git a/NEWS b/NEWS index 54d24c3..8c75a32 100644 --- a/NEWS +++ b/NEWS @@ -18,6 +18,9 @@ GNU coreutils NEWS -= *- outline -*- ls --color would mis-color relative-named symlinks in / [bug introduced in coreutils-8.17] =20 + split now ensures it doesn't overwrite the input file with generated o= utput. + [the bug dates back to the initial implementation] + stat and df now report the correct file system usage, in all situations on GNU/Linux, by correctly determining the block siz= e. [df bug since coreutils-5.0.91, stat bug since the initial implementat= ion] diff --git a/src/split.c b/src/split.c index 53ee271..48b7414 100644 --- a/src/split.c +++ b/src/split.c @@ -92,6 +92,9 @@ static char const *additional_suffix; /* Name of input file. May be "-". */ static char *infile; =20 +/* stat buf for input file. */ +static struct stat in_stat_buf; + /* Descriptor on which output file is open. 
*/ static int output_desc =3D -1; =20 @@ -360,10 +363,25 @@ create (const char *name) { if (!filter_command) { + int fd; + struct stat out_stat_buf; + if (verbose) fprintf (stdout, _("creating file %s\n"), quote (name)); - return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, - (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_= IWOTH)); + + fd =3D open (name, O_WRONLY | O_CREAT | O_BINARY, + (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IW= OTH)); + if (fd < 0) + return fd; + if (fstat (fd, &out_stat_buf) !=3D 0) + error (EXIT_FAILURE, errno, _("failed to stat %s"), quote (name)= ); + if (SAME_INODE (in_stat_buf, out_stat_buf)) + error (EXIT_FAILURE, 0, _("%s would overwrite input; aborting"), + quote (name)); + if (ftruncate (fd, 0) !=3D 0) + error (EXIT_FAILURE, errno, _("%s: error truncating"), quote (na= me)); + + return fd; } else { @@ -1058,7 +1076,6 @@ parse_chunk (uintmax_t *k_units, uintmax_t *n_units= , char *slash) int main (int argc, char **argv) { - struct stat stat_buf; enum Split_type split_type =3D type_undef; size_t in_blk_size =3D 0; /* optimal block size of input file device *= / char *buf; /* file i/o buffer */ @@ -1335,16 +1352,16 @@ main (int argc, char **argv) =20 /* Get the optimal block size of input device and make a buffer. 
*/ =20 - if (fstat (STDIN_FILENO, &stat_buf) !=3D 0) + if (fstat (STDIN_FILENO, &in_stat_buf) !=3D 0) error (EXIT_FAILURE, errno, "%s", infile); if (in_blk_size =3D=3D 0) - in_blk_size =3D io_blksize (stat_buf); + in_blk_size =3D io_blksize (in_stat_buf); =20 if (split_type =3D=3D type_chunk_bytes || split_type =3D=3D type_chunk= _lines) { off_t input_offset =3D lseek (STDIN_FILENO, 0, SEEK_CUR); - if (usable_st_size (&stat_buf)) - file_size =3D stat_buf.st_size; + if (usable_st_size (&in_stat_buf)) + file_size =3D in_stat_buf.st_size; else if (0 <=3D input_offset) { file_size =3D lseek (STDIN_FILENO, 0, SEEK_END); diff --git a/tests/Makefile.am b/tests/Makefile.am index d8bc930..2155cee 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -268,6 +268,7 @@ TESTS =3D \ split/l-chunk \ split/r-chunk \ split/numeric \ + split/guard-input \ misc/stat-birthtime \ misc/stat-fmt \ misc/stat-hyphen \ diff --git a/tests/split/guard-input b/tests/split/guard-input new file mode 100755 index 0000000..7a6fba3 --- /dev/null +++ b/tests/split/guard-input @@ -0,0 +1,33 @@ +#!/bin/sh +# ensure split doesn't overwrite input with output. + +# Copyright (C) 2012 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. 
"${srcdir=3D.}/init.sh"; path_prepend_ ../src +print_ver_ split + +seq 10 | tee exp-1 > xaa +ln -s xaa in2 +ln xaa in3 + +split -C 6 xaa && fail=3D1 +split -C 6 in2 && fail=3D1 +split -C 6 in3 && fail=3D1 +split -C 6 - < xaa && fail=3D1 + +compare exp-1 xaa || fail=3D1 + +Exit $fail --=20 1.7.6.4 --------------060201020701080106050902-- 
------------=_1340355122-497-1-- 
From unknown Sat Aug 09 14:04:49 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11761: Slight bug in split :-) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 22 Jun 2012 19:13:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 11761 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 11761@debbugs.gnu.org, P@draigBrady.com Received: via spool by 11761-submit@debbugs.gnu.org id=B11761.134039237128296 (code B ref 11761); Fri, 22 Jun 2012 19:13:01 +0000 Received: (at 11761) by debbugs.gnu.org; 22 Jun 2012 19:12:51 +0000 Received: from localhost ([127.0.0.1]:53191 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Si9Hm-0007ML-La for submit@debbugs.gnu.org; Fri, 22 Jun 2012 15:12:50 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:59275) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Si9Hl-0007MF-DI for 11761@debbugs.gnu.org; Fri, 22 Jun 2012 15:12:50 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 42AC0A6007A; Fri, 22 Jun 2012 12:09:09 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BtmEpX+oG2JD; Fri, 22 Jun 2012 12:09:08 -0700 (PDT) Received: from [192.168.1.10] (pool-108-23-119-2.lsanca.fios.verizon.net [108.23.119.2]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id D5205A60070; Fri, 22 Jun 2012 12:09:08 -0700 (PDT) Message-ID: <4FE4C2DC.9080901@cs.ucla.edu> Date: Fri, 22 Jun 2012 12:09:16 -0700 From: Paul Eggert Organization: UCLA Computer Science Department 
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:13.0) Gecko/20120615 Thunderbird/13.0.1
MIME-Version: 1.0
References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> <8762aktivj.fsf@rho.meyering.net> <4FE3B1BD.5050808@draigBrady.com> <87bokb93vb.fsf@rho.meyering.net> <4FE4313F.3040401@draigBrady.com>
In-Reply-To: <4FE4313F.3040401@draigBrady.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.9 (-)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -1.9 (-)

From a filesystem point of view, would it be
more efficient to invoke ftruncate at the end of
writing, rather than at the beginning?  That way,
if the file already exists and is of the right size,
it won't need to be reallocated.  We're not trying
to write any holes, so this optimization should be
valid.

Please don't let this comment slow you down, as your
patch is fine as-is.  I'm mainly asking because I was
wondering about the issue in general.

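To make the question above concrete, here is a minimal C sketch of the
truncate-after-writing variant.  It is not taken from the coreutils patch:
the helper name write_then_truncate and the 0666 creation mode are
assumptions chosen for illustration, and it presumes a regular file that
is written sequentially from offset 0.  Whether this really avoids
reallocation depends on the file system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of split.c: write LEN bytes from BUF to
   NAME, reusing an existing file in place, then trim any leftover tail.
   Return 0 on success, -1 on failure.  */
static int
write_then_truncate (const char *name, const char *buf, size_t len)
{
  assert (name != NULL && (buf != NULL || len == 0));

  /* No O_TRUNC here: an existing file keeps its data while we overwrite.  */
  int fd = open (name, O_WRONLY | O_CREAT, 0666);
  if (fd < 0)
    return -1;

  size_t done = 0;
  while (done < len)
    {
      ssize_t n = write (fd, buf + done, len - done);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted before writing; retry */
          close (fd);
          return -1;
        }
      done += (size_t) n;
    }

  /* Cut off whatever a previous, longer version of the file left behind.  */
  if (ftruncate (fd, (off_t) len) != 0)
    {
      close (fd);
      return -1;
    }
  return close (fd);
}
```

With this ordering, an existing output file of the same or larger size is
overwritten in place, and only the surplus tail (if any) is released at
the end.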
From unknown Sat Aug 09 14:04:49 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#11761: Slight bug in split :-)
Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Fri, 22 Jun 2012 19:31:02 +0000
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 11761
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords:
To: Paul Eggert
Cc: 11761@debbugs.gnu.org
Received: via spool by 11761-submit@debbugs.gnu.org id=B11761.1340393450620 (code B ref 11761); Fri, 22 Jun 2012 19:31:02 +0000
Received: (at 11761) by debbugs.gnu.org; 22 Jun 2012 19:30:50 +0000
Received: from localhost ([127.0.0.1]:53202 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Si9ZB-00009w-MS for submit@debbugs.gnu.org; Fri, 22 Jun 2012 15:30:50 -0400
Received: from mx1.redhat.com ([209.132.183.28]:6192) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Si9Z9-00009o-06 for 11761@debbugs.gnu.org; Fri, 22 Jun 2012 15:30:48 -0400
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q5MJQvcj002952 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 22 Jun 2012 15:26:58 -0400
Received: from [10.36.116.30] (ovpn-116-30.ams2.redhat.com [10.36.116.30]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q5MJQtPP019226 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 22 Jun 2012 15:26:57 -0400
Message-ID: <4FE4C6FE.9030205@draigBrady.com>
Date: Fri, 22 Jun 2012 20:26:54 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0
MIME-Version: 1.0
References: <86y5ngjs19.fsf@mercure.progiciels-bpi.ca> <8762aktivj.fsf@rho.meyering.net> <4FE3B1BD.5050808@draigBrady.com> <87bokb93vb.fsf@rho.meyering.net> <4FE4313F.3040401@draigBrady.com> <4FE4C2DC.9080901@cs.ucla.edu>
In-Reply-To: <4FE4C2DC.9080901@cs.ucla.edu>
X-Enigmail-Version: 1.3.2
Content-Type: text/plain; charset=UTF-8
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q5MJQvcj002952
X-Spam-Score: -6.9 (------)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -6.9 (------)

On 06/22/2012 08:09 PM, Paul Eggert wrote:
> From a filesystem point of view, would it be
> more efficient to invoke ftruncate at the end of
> writing, rather than at the beginning?  That way,
> if the file already exists and is of the right size,
> it won't need to be reallocated.  We're not trying
> to write any holes, so this optimization should be
> valid.
>
> Please don't let this comment slow you down, as your
> patch is fine as-is.  I'm mainly asking because I was
> wondering about the issue in general.

Hmm, I suppose at the writing stage, truncating after writing
could be more efficient.  Though if we're updating a split set,
and the new set had some new files, then the new split set
could be more likely to be on separate parts of the disk,
hence slowing future processing of the split set?
You could also argue that you should free up as much space as
possible and let the file system decide where best to allocate
stuff, which can change over time.

I'd err on the side of simplicity here.

cheers,
Pádraig.
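
For anyone who wants to try the guard settled on in this thread outside of
split.c, the following is a minimal, self-contained C sketch.  The names
same_file, fail_with and create_guarded, the return codes, and the 0666
mode are illustrative assumptions; the committed patch uses gnulib's
SAME_INODE macro and reports failures through error ().  Note that fstat
and ftruncate return 0 on success, so the failure checks test for a
nonzero result, as the committed patch does.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative return codes; the real patch exits via error () instead.  */
enum { GUARD_ERROR = -1, GUARD_SAME_AS_INPUT = -2 };

/* True when both stat results describe the same file, i.e. the same
   device and inode number (the comparison SAME_INODE is assumed to make).  */
static bool
same_file (const struct stat *a, const struct stat *b)
{
  return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* Close FD without clobbering errno, then return CODE.  */
static int
fail_with (int fd, int code)
{
  int saved_errno = errno;
  close (fd);
  errno = saved_errno;
  return code;
}

/* Open NAME for writing, refusing to touch it when it is the same file
   as the input described by IN_ST.  Return a descriptor on success.  */
static int
create_guarded (const char *name, const struct stat *in_st)
{
  struct stat out_st;
  assert (name != NULL && in_st != NULL);

  /* Deliberately no O_TRUNC: nothing is destroyed before the check.  */
  int fd = open (name, O_WRONLY | O_CREAT, 0666);
  if (fd < 0)
    return GUARD_ERROR;

  /* Checking the open descriptor avoids the stat-then-open race.  */
  if (fstat (fd, &out_st) != 0)
    return fail_with (fd, GUARD_ERROR);
  if (same_file (in_st, &out_st))
    return fail_with (fd, GUARD_SAME_AS_INPUT);

  /* Only now is it safe to discard any old contents.  */
  if (ftruncate (fd, 0) != 0)
    return fail_with (fd, GUARD_ERROR);
  return fd;
}
```

Because the comparison is made on the descriptor that was actually opened,
it should also catch the symlink and hard-link cases that
tests/split/guard-input exercises.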