GNU bug report logs -
#11761
Slight bug in split :-)
Previous Next
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Thu, 21 Jun 2012 22:16:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 11761 in the body.
You can then email your comments to 11761 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#11761
; Package
coreutils
.
(Thu, 21 Jun 2012 22:16:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Jim Meyering <jim <at> meyering.net>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Thu, 21 Jun 2012 22:16:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
François Pinard wrote:
> Hi, Jim.
>
> I was looking for a problematic spot from a big file, and to isolate it,
> used "split" repeatedly as a way to zoom into the proper place. Just to
> try, I used "split -C 100000 xad" at one place (after saving "xad"
> first, of course). "split" interrupted itself, producing less output
> than input.
>
> My suggestion would be that split moans in some way before it destroys
> its own input. :-)
>
> François
Hi François!
Thank you for reporting that.
That's definitely a bug.
For the record, here's a quick reproducer:
$ seq 10 > xaa
$ split -C 6 xaa
$ wc -c x??
6 xaa
1 xab
7 total
$ head x??
==> xaa <==
1
2
3
==> xab <==
3$
I've Cc'd the bug list, in case someone would like to write
the patch (fix, NEWS and test) before I get to it.
I may not have time tomorrow.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11761
; Package
coreutils
.
(Thu, 21 Jun 2012 23:48:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 11761 <at> debbugs.gnu.org (full text, mbox):
On 06/21/2012 11:12 PM, Jim Meyering wrote:
> François Pinard wrote:
>> Hi, Jim.
>>
>> I was looking for a problematic spot from a big file, and to isolate it,
>> used "split" repeatedly as a way to zoom into the proper place. Just to
>> try, I used "split -C 100000 xad" at one place (after saving "xad"
>> first, of course). "split" interrupted itself, producing less output
>> than input.
>>
>> My suggestion would be that split moans in some way before it destroys
>> its own input. :-)
>>
>> François
>
> Hi François!
> Thank you for reporting that.
> That's definitely a bug.
>
> For the record, here's a quick reproducer:
>
> $ seq 10 > xaa
> $ split -C 6 xaa
> $ wc -c x??
> 6 xaa
> 1 xab
> 7 total
> $ head x??
> ==> xaa <==
> 1
> 2
> 3
>
> ==> xab <==
> 3$
>
> I've Cc'd the bug list, in case someone would like to write
> the patch (fix, NEWS and test) before I get to it.
> I may not have time tomorrow.
Nice catch :)
I'll fix it up with something like the following.
cheers,
Pádraig.
diff --git a/src/split.c b/src/split.c
index 53ee271..3e3313a 100644
--- a/src/split.c
+++ b/src/split.c
@@ -92,6 +92,9 @@ static char const *additional_suffix;
/* Name of input file. May be "-". */
static char *infile;
+/* stat buf for input file. */
+static struct stat in_stat_buf;
+
/* Descriptor on which output file is open. */
static int output_desc = -1;
@@ -362,6 +365,17 @@ create (const char *name)
{
if (verbose)
fprintf (stdout, _("creating file %s\n"), quote (name));
+
+ struct stat out_stat_buf;
+ if (stat (name, &out_stat_buf) == 0)
+ {
+ if (SAME_INODE (in_stat_buf, out_stat_buf))
+ error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."),
+ quote (name));
+ }
+ else if (errno != ENOENT)
+ error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name));
+
return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
(S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH))
}
@@ -1058,7 +1072,6 @@ parse_chunk (uintmax_t *k_units, uintmax_t *n_units, char
int
main (int argc, char **argv)
{
- struct stat stat_buf;
enum Split_type split_type = type_undef;
size_t in_blk_size = 0; /* optimal block size of input file device */
char *buf; /* file i/o buffer */
@@ -1335,16 +1348,16 @@ main (int argc, char **argv)
/* Get the optimal block size of input device and make a buffer. */
- if (fstat (STDIN_FILENO, &stat_buf) != 0)
+ if (fstat (STDIN_FILENO, &in_stat_buf) != 0)
error (EXIT_FAILURE, errno, "%s", infile);
if (in_blk_size == 0)
- in_blk_size = io_blksize (stat_buf);
+ in_blk_size = io_blksize (in_stat_buf);
if (split_type == type_chunk_bytes || split_type == type_chunk_lines)
{
off_t input_offset = lseek (STDIN_FILENO, 0, SEEK_CUR);
- if (usable_st_size (&stat_buf))
- file_size = stat_buf.st_size;
+ if (usable_st_size (&in_stat_buf))
+ file_size = in_stat_buf.st_size;
else if (0 <= input_offset)
{
file_size = lseek (STDIN_FILENO, 0, SEEK_END);
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11761
; Package
coreutils
.
(Fri, 22 Jun 2012 08:01:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 11761 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
...
> diff --git a/src/split.c b/src/split.c
> index 53ee271..3e3313a 100644
> --- a/src/split.c
> +++ b/src/split.c
> @@ -92,6 +92,9 @@ static char const *additional_suffix;
> /* Name of input file. May be "-". */
> static char *infile;
>
> +/* stat buf for input file. */
> +static struct stat in_stat_buf;
> +
> /* Descriptor on which output file is open. */
> static int output_desc = -1;
>
> @@ -362,6 +365,17 @@ create (const char *name)
> {
> if (verbose)
> fprintf (stdout, _("creating file %s\n"), quote (name));
> +
> + struct stat out_stat_buf;
> + if (stat (name, &out_stat_buf) == 0)
> + {
> + if (SAME_INODE (in_stat_buf, out_stat_buf))
> + error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."),
> + quote (name));
> + }
> + else if (errno != ENOENT)
> + error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name));
> +
> return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH))
> }
Hi Pádraig,
Thanks for taking this on.
That introduces a minor TOCTOU race.
It would probably never matter in practice,
but who knows... if we can avoid it, why not?
What do you think about something like this?
int fd = open (name, (... as above, but without O_TRUNC...)...
if (fd < 0)
return fd;
if ( ! fstat (fd, &out_stat_buf))
error (EXIT_FAILURE, errno, _("failed to fstat %s"), quote (name));
if (SAME_INODE (in_stat_buf, out_stat_buf))
error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."),
quote (name));
if ( ! ftruncate (fd, 0))
error ...
return fd;
The above might even be a tiny bit faster for long names,
since it resolves each name only once.
Reply sent
to
Pádraig Brady <P <at> draigBrady.com>
:
You have taken responsibility.
(Fri, 22 Jun 2012 08:52:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Jim Meyering <jim <at> meyering.net>
:
bug acknowledged by developer.
(Fri, 22 Jun 2012 08:52:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 11761-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 06/22/2012 08:56 AM, Jim Meyering wrote:
> Pádraig Brady wrote:
> ...
>> diff --git a/src/split.c b/src/split.c
>> index 53ee271..3e3313a 100644
>> --- a/src/split.c
>> +++ b/src/split.c
>> @@ -92,6 +92,9 @@ static char const *additional_suffix;
>> /* Name of input file. May be "-". */
>> static char *infile;
>>
>> +/* stat buf for input file. */
>> +static struct stat in_stat_buf;
>> +
>> /* Descriptor on which output file is open. */
>> static int output_desc = -1;
>>
>> @@ -362,6 +365,17 @@ create (const char *name)
>> {
>> if (verbose)
>> fprintf (stdout, _("creating file %s\n"), quote (name));
>> +
>> + struct stat out_stat_buf;
>> + if (stat (name, &out_stat_buf) == 0)
>> + {
>> + if (SAME_INODE (in_stat_buf, out_stat_buf))
>> + error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."),
>> + quote (name));
>> + }
>> + else if (errno != ENOENT)
>> + error (EXIT_FAILURE, errno, _("cannot stat %s"), quote (name));
>> +
>> return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
>> (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH))
>> }
>
> Hi Pádraig,
>
> Thanks for taking this on.
> That introduces a minor TOCTOU race.
> It would probably never matter in practice,
> but who knows... if we can avoid it, why not?
> What do you think about something like this?
>
> int fd = open (name, (... as above, but without O_TRUNC...)...
> if (fd < 0)
> return fd;
> if ( ! fstat (fd, &out_stat_buf))
> error (EXIT_FAILURE, errno, _("failed to fstat %s"), quote (name));
> if (SAME_INODE (in_stat_buf, out_stat_buf))
> error (EXIT_FAILURE, 0, _("%s would overwrite input. Aborting."),
> quote (name));
> if ( ! ftruncate (fd, 0))
> error ...
> return fd;
>
> The above might even be a tiny bit faster for long names,
> since it resolves each name only once.
Well probably slower due to the extra truncate syscall,
but point taken on the unlikely TOCTOU race.
I'll push the attached in a while.
cheers,
Pádraig.
>
[split-input-guard.diff (text/plain, attachment)]
Message #17 received at 11761-done <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
> I'll push the attached in a while.
...
> @@ -360,10 +363,25 @@ create (const char *name)
> {
> if (!filter_command)
> {
> + int fd;
> + struct stat out_stat_buf;
> +
> if (verbose)
> fprintf (stdout, _("creating file %s\n"), quote (name));
> - return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> - (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH));
> +
> + fd = open (name, O_WRONLY | O_CREAT | O_BINARY,
> + (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH));
> + if (fd < 0)
> + return fd;
> + if (fstat (fd, &out_stat_buf) != 0)
> + error (EXIT_FAILURE, errno, _("failed to stat %s"), quote (name));
> + if (SAME_INODE (in_stat_buf, out_stat_buf))
> + error (EXIT_FAILURE, 0, _("%s would overwrite input; aborting"),
> + quote (name));
> + if (ftruncate (fd, 0) != 0)
> + error (EXIT_FAILURE, errno, _("%s: error truncating"), quote (name));
> +
> + return fd;
Thanks.
That looks fine and passes "make check syntax-check" (no surprise).
Please move the declaration of "fd" down to the point of initialization.
That would induce a line-split, but perhaps this is a good excuse to
factor out these: (maybe name it something like MODE_RW_UGO)
$ git grep -l 'S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH'
src/dd.c
src/mkfifo.c
src/mknod.c
src/split.c
src/touch.c
src/truncate.c
Message #18 received at 11761-done <at> debbugs.gnu.org (full text, mbox):
On 06/22/2012 10:26 AM, Jim Meyering wrote:
> Pádraig Brady wrote:
>> I'll push the attached in a while.
> ...
>> @@ -360,10 +363,25 @@ create (const char *name)
>> {
>> if (!filter_command)
>> {
>> + int fd;
>> + struct stat out_stat_buf;
>> +
>> if (verbose)
>> fprintf (stdout, _("creating file %s\n"), quote (name));
>> - return open (name, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
>> - (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH));
>> +
>> + fd = open (name, O_WRONLY | O_CREAT | O_BINARY,
>> + (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH));
>> + if (fd < 0)
>> + return fd;
>> + if (fstat (fd, &out_stat_buf) != 0)
>> + error (EXIT_FAILURE, errno, _("failed to stat %s"), quote (name));
>> + if (SAME_INODE (in_stat_buf, out_stat_buf))
>> + error (EXIT_FAILURE, 0, _("%s would overwrite input; aborting"),
>> + quote (name));
>> + if (ftruncate (fd, 0) != 0)
>> + error (EXIT_FAILURE, errno, _("%s: error truncating"), quote (name));
>> +
>> + return fd;
>
> Thanks.
> That looks fine and passes "make check syntax-check" (no surprise).
> Please move the declaration of "fd" down to the point of initialization.
OK I'll move the stat declaration down too.
> That would induce a line-split, but perhaps this is a good excuse to
> factor out these: (maybe name it something like MODE_RW_UGO)
>
> $ git grep -l 'S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH'
> src/dd.c
> src/mkfifo.c
> src/mknod.c
> src/split.c
> src/touch.c
> src/truncate.c
OK I'll do that first.
cheers,
Pádraig.
Message #19 received at 11761-done <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
...
>> Thanks.
>> That looks fine and passes "make check syntax-check" (no surprise).
>> Please move the declaration of "fd" down to the point of initialization.
>
> OK I'll move the stat declaration down too.
>
>> That would induce a line-split, but perhaps this is a good excuse to
>> factor out these: (maybe name it something like MODE_RW_UGO)
>>
>> $ git grep -l 'S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH'
>> src/dd.c
>> src/mkfifo.c
>> src/mknod.c
>> src/split.c
>> src/touch.c
>> src/truncate.c
>
> OK I'll do that first.
Thanks!
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11761
; Package
coreutils
.
(Fri, 22 Jun 2012 19:13:01 GMT)
Full text and
rfc822 format available.
Message #22 received at 11761 <at> debbugs.gnu.org (full text, mbox):
From a filesystem point of view, would it be
more efficient to invoke ftruncate at the end of
writing, rather than at the beginning? That way,
if the file already exists and is of the right size,
it won't need to be reallocated. We're not trying
to write any holes, so this optimization should be
valid.
Please don't let this comment slow you down, as your
patch is fine as-is. I'm mainly asking because I was
wondering about the issue in general.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11761
; Package
coreutils
.
(Fri, 22 Jun 2012 19:31:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 11761 <at> debbugs.gnu.org (full text, mbox):
On 06/22/2012 08:09 PM, Paul Eggert wrote:
>>From a filesystem point of view, would it be
> more efficient to invoke ftruncate at the end of
> writing, rather than at the beginning? That way,
> if the file already exists and is of the right size,
> it won't need to be reallocated. We're not trying
> to write any holes, so this optimization should be
> valid.
>
> Please don't let this comment slow you down, as your
> patch is fine as-is. I'm mainly asking because I was
> wondering about the issue in general.
Hmm, I suppose at the writing stage, truncating
after writing could be more efficient.
Though if we're updating a split set,
and the new set had some new files, then the new split set
could be more likely be on separate parts of the disk,
hence slowing future processing of the split set?
You could also argue that you should free up as
much place as possible and let the file system
decide where best to allocate stuff, which can
change over time.
I'd err on the side of simplicity here.
cheers,
Pádraig.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 21 Jul 2012 11:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 13 years and 21 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.