GNU bug report logs - #46048
split -n K/N loses data, sum of output files is smaller than input file.

Package: coreutils

Reported by: Paul Hirst <contact <at> phirst.org>

Date: Sat, 23 Jan 2021 08:26:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Hirst <contact <at> phirst.org>
Subject: bug#46048: closed (Re: bug#46048: split -n K/N loses data, sum of
 output files is smaller than input file.)
Date: Mon, 25 Jan 2021 14:22:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#46048: split -n K/N loses data, sum of output files is smaller than input file.

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 46048 <at> debbugs.gnu.org.

-- 
46048: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=46048
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Paul Hirst <contact <at> phirst.org>, 46048-done <at> debbugs.gnu.org
Subject: Re: bug#46048: split -n K/N loses data, sum of output files is
 smaller than input file.
Date: Mon, 25 Jan 2021 14:21:35 +0000
[Message part 3 (text/plain, inline)]
On 24/01/2021 19:55, Paul Eggert wrote:
> On 1/24/21 8:52 AM, Pádraig Brady wrote:
>> -      if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
>> +      if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
> 
> Dumb question: will this handle the case where you're splitting from
> stdin and stdin is a seekable file and its initial file offset is nonzero?

Right. Following on the logic from input_file_size(),
I'm going with the attached, which I'll push later.
Marking this as done.

thanks,
Pádraig
[split-k_of_n.patch (text/x-patch, attachment)]
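
The one-line change quoted above swaps the `whence` argument passed to lseek. The following is a minimal Python sketch of why that matters, not the code in split.c. It assumes (the thread does not state this) that some bytes have already been read from the input before the seek happens. The 131072-byte figure is borrowed from the shortfall in the original report, the chunk start is assumed to be 3/4 of the file size, and the helper name is hypothetical.

```python
import os
import tempfile

FILE_SIZE = 1_000_000
INITIAL_READ = 131_072   # assumption: bytes consumed before the seek
START = 750_000          # assumption: start of chunk 4/4, 3 * (FILE_SIZE // 4)


def bytes_left_after_seek(whence):
    """Read INITIAL_READ bytes, seek by START using `whence`, count the rest."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * FILE_SIZE)
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)

        # Consume the initial buffer; the file offset ends up at INITIAL_READ.
        consumed = 0
        while consumed < INITIAL_READ:
            consumed += len(os.read(fd, INITIAL_READ - consumed))

        os.lseek(fd, START, whence)

        # Count how many bytes remain between the new offset and end of file.
        remaining = 0
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            remaining += len(chunk)
        return remaining


relative = bytes_left_after_seek(os.SEEK_CUR)   # lands at INITIAL_READ + START
absolute = bytes_left_after_seek(os.SEEK_SET)   # lands at START
print(relative, absolute)                       # 118928 250000
```

Under these assumptions the relative seek leaves 118928 bytes to read, which is the size reported for t4, while the absolute seek leaves the expected 250000. As the question about a nonzero initial offset on stdin suggests, a plain SEEK_SET is not the whole story either, so this sketch only covers the case where the input starts at offset zero.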
[Message part 5 (message/rfc822, inline)]
From: Paul Hirst <contact <at> phirst.org>
To: bug-coreutils <at> gnu.org
Subject: split -n K/N loses data,
 sum of output files is smaller than input file.
Date: Fri, 22 Jan 2021 18:58:03 -1000
[Message part 6 (text/plain, inline)]
split --number K/N appears to lose data, with the sum of the sizes of
the output files being smaller than the original input file by 131072 bytes.

$ split --version
split (GNU coreutils) 8.30
...

$ head -c 1000000 < /dev/urandom > test.dat
$ split --number=1/4 test.dat > t1
$ split --number=2/4 test.dat > t2
$ split --number=3/4 test.dat > t3
$ split --number=4/4 test.dat > t4

$ ls -l
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t1
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t2
-rw-r--r-- 1 user user  250000 Jan 22 18:36 t3
-rw-r--r-- 1 user user  118928 Jan 22 18:36 t4
-rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat

Surely this should not be the case?

Paul
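
The sizes in the listing above can be checked with a few lines of arithmetic. This is a short Python sketch that uses only the figures shown by `ls -l` in the report; the variable names are illustrative. That the shortfall equals exactly 128 KiB is plain arithmetic, and any connection to an internal buffer size is a guess rather than something the report states.

```python
# Sizes copied from the `ls -l` output in the report.
reported = {"t1": 250_000, "t2": 250_000, "t3": 250_000, "t4": 118_928}
input_size = 1_000_000

total = sum(reported.values())
missing = input_size - total
expected_chunk = input_size // 4   # what each of the four chunks should hold

print(total)                       # 868928
print(missing)                     # 131072
print(missing == 128 * 1024)       # True
print(expected_chunk - reported["t4"] == missing)   # True: all of it is missing from t4
```

Matching sizes for t1 to t3 do not by themselves show that those files hold the right bytes; comparing the concatenated chunks against test.dat would be needed for that.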

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.