GNU bug report logs - #46048
split -n K/N loses data, sum of output files is smaller than input file.


Package: coreutils;

Reported by: Paul Hirst <contact <at> phirst.org>

Date: Sat, 23 Jan 2021 08:26:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Pádraig Brady <P <at> draigBrady.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#46048: closed (split -n K/N loses data, sum of output files
 is smaller than input file.)
Date: Mon, 25 Jan 2021 14:22:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Mon, 25 Jan 2021 14:21:35 +0000
with message-id <4f858cd0-19e4-d159-c2e7-51b3aad0b3b0 <at> draigBrady.com>
and subject line Re: bug#46048: split -n K/N loses data, sum of output files is smaller than input file.
has caused the debbugs.gnu.org bug report #46048,
regarding split -n K/N loses data, sum of output files is smaller than input file.
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
46048: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=46048
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Hirst <contact <at> phirst.org>
To: bug-coreutils <at> gnu.org
Subject: split -n K/N loses data,
 sum of output files is smaller than input file.
Date: Fri, 22 Jan 2021 18:58:03 -1000
[Message part 3 (text/plain, inline)]
split --number K/N appears to lose data: the sizes of the output files
sum to 131072 bytes less than the size of the original input file.

$ split --version
split (GNU coreutils) 8.30
...

$ head -c 1000000 < /dev/urandom > test.dat
$ split --number=1/4 test.dat > t1
$ split --number=2/4 test.dat > t2
$ split --number=3/4 test.dat > t3
$ split --number=4/4 test.dat > t4

$ ls -l
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t1
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t2
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t3
-rw-r--r-- 1 user user  118928 Jan 22 18:36 t4
-rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat

Surely this should not be the case?

Paul
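
The shortfall quoted in the report can be checked directly from the `ls -l` listing. The sketch below only redoes that arithmetic; the variable names are illustrative and the sizes are copied from the listing above.

```shell
# Sizes copied from the `ls -l` listing in the report.
input_size=1000000
t1=250000
t2=250000
t3=250000
t4=118928

# Sum of the four extracted chunks, and how far it falls short of the input.
output_total=$((t1 + t2 + t3 + t4))
missing=$((input_size - output_total))

echo "output total: $output_total"
echo "missing:      $missing"
```

The missing amount is exactly 128 KiB (128 * 1024 = 131072). A size check alone cannot confirm that the chunks hold the right data; on a live system, `cat t1 t2 t3 t4 | cmp - test.dat` compares the contents as well.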
[Message part 4 (text/html, inline)]
[Message part 5 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Paul Hirst <contact <at> phirst.org>, 46048-done <at> debbugs.gnu.org
Subject: Re: bug#46048: split -n K/N loses data, sum of output files is
 smaller than input file.
Date: Mon, 25 Jan 2021 14:21:35 +0000
[Message part 6 (text/plain, inline)]
On 24/01/2021 19:55, Paul Eggert wrote:
> On 1/24/21 8:52 AM, Pádraig Brady wrote:
>> -      if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
>> +      if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
> 
> Dumb question: will this handle the case where you're splitting from
> stdin and stdin is a seekable file and its initial file offset is nonzero?

Right. Following on the logic from input_file_size(),
I'm going with the attached, which I'll push later.
Marking this as done.

thanks,
Pádraig
[split-k_of_n.patch (text/x-patch, attachment)]
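
The difference between the two `lseek` calls quoted above can be worked through with the numbers from the report. This is a sketch of the offset arithmetic only, not of the attached patch. It assumes split had already consumed 131072 bytes of the input before seeking; that figure is inferred from the reported shortfall and is not stated in the thread. The variable names, and the `initial_offset` value used for Paul Eggert's scenario, are hypothetical.

```shell
# Values for chunk 4/4 of the 1000000-byte file in the report.
file_size=1000000
start=750000          # chunk 4/4 begins at 3 * (1000000 / 4)
already_read=131072   # ASSUMPTION: bytes consumed from stdin before the seek

# lseek (fd, start, SEEK_CUR) moves relative to the current offset,
# so the bytes already read are skipped a second time.
pos_seek_cur=$((already_read + start))
left_seek_cur=$((file_size - pos_seek_cur))

# lseek (fd, start, SEEK_SET) is absolute: right when stdin began at offset 0.
pos_seek_set=$start
left_seek_set=$((file_size - pos_seek_set))

echo "SEEK_CUR lands at $pos_seek_cur, leaving $left_seek_cur bytes"
echo "SEEK_SET lands at $pos_seek_set, leaving $left_seek_set bytes"

# Paul Eggert's case: stdin is seekable but starts at a nonzero offset
# (4096 is a hypothetical value). If chunk offsets are measured from where
# stdin started, the target is initial_offset + start, which SEEK_SET misses.
initial_offset=4096
wanted=$((initial_offset + start))
# A relative seek by (start - already_read) from the current offset reaches it.
pos_relative=$((initial_offset + already_read + (start - already_read)))

echo "wanted $wanted, SEEK_SET gives $pos_seek_set, relative seek gives $pos_relative"
```

Under that assumption the `SEEK_CUR` result leaves 118928 bytes, which matches the size of `t4` in the report. It would also explain why `t1` to `t3` had the expected size: a misplaced seek still leaves a full 250000 bytes to read for those chunks.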

This bug report was last modified 4 years and 97 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.