GNU bug report logs - #46048
split -n K/N loses data, sum of output files is smaller than input file.


Package: coreutils;

Reported by: Paul Hirst <contact <at> phirst.org>

Date: Sat, 23 Jan 2021 08:26:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #8 received at 46048 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Hirst <contact <at> phirst.org>, 46048 <at> debbugs.gnu.org
Subject: Re: bug#46048: split -n K/N loses data, sum of output files is
 smaller than input file.
Date: Sun, 24 Jan 2021 16:52:57 +0000
On 23/01/2021 04:58, Paul Hirst wrote:
> split --number K/N appears to lose data, with the sum of the sizes of
> the output files being smaller than the original input file by 131072 bytes.
> 
> $ split --version
> split (GNU coreutils) 8.30
> ...
> 
> $ head -c 1000000 < /dev/urandom > test.dat
> $ split --number=1/4 test.dat > t1
> $ split --number=2/4 test.dat > t2
> $ split --number=3/4 test.dat > t3
> $ split --number=4/4 test.dat > t4
> 
> $ ls -l
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t1
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t2
> -rw-r--r-- 1 user user  250000 Jan 22 18:36 t3
> -rw-r--r-- 1 user user  118928 Jan 22 18:36 t4
> -rw-r--r-- 1 user user 1000000 Jan 22 18:33 test.dat
> 
> Surely this should not be the case?

Ugh. The K/N functionality was broken for all files > 128KiB
due to adjustments for handling /dev/zero.

$ truncate -s 1000000 test.dat
$ split --number=4/4 test.dat | wc -c
118928

The following patch fixes it here.
I need to do some more testing before committing.

thanks!

diff --git a/src/split.c b/src/split.c
index 0660da13f..6aa8d50e9 100644
--- a/src/split.c
+++ b/src/split.c
@@ -1001,7 +1001,7 @@ bytes_chunk_extract (uintmax_t k, uintmax_t n, char *buf, size_t bufsize,
     }
   else
     {
-      if (lseek (STDIN_FILENO, start, SEEK_CUR) < 0)
+      if (lseek (STDIN_FILENO, start, SEEK_SET) < 0)
         die (EXIT_FAILURE, errno, "%s", quotef (infile));
       initial_read = SIZE_MAX;
     }
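
The effect of the one-word change can be reproduced outside split with a small sketch. This is not code from split.c: `bytes_after_seek` and the temporary file name are hypothetical, and the 131072-byte pre-read is an assumption inferred from the size of the shortfall reported above. The helper consumes some bytes from a file, seeks to `start` with the given `whence`, and reports how many bytes are left between the resulting offset and end of file.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of split.c).
   Creates a sparse temporary file of SIZE bytes, reads up to PRE_READ
   bytes from it (standing in for an earlier read of the input), then
   calls lseek (fd, START, WHENCE).  Returns the number of bytes between
   the resulting offset and end of file, or -1 on error.  */
long long
bytes_after_seek (off_t size, size_t pre_read, off_t start, int whence)
{
  char path[] = "/tmp/seekdemoXXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  unlink (path);                /* storage is released when fd is closed */

  long long remaining = -1;
  char *buf = malloc (pre_read ? pre_read : 1);
  if (buf && ftruncate (fd, size) == 0)
    {
      /* Advance the file offset by reading, as an earlier read would.  */
      size_t got = 0;
      while (got < pre_read)
        {
          ssize_t n = read (fd, buf + got, pre_read - got);
          if (n <= 0)
            break;              /* EOF or error: stop advancing */
          got += n;
        }

      /* SEEK_CUR is relative to the offset left by the reads above;
         SEEK_SET is relative to the start of the file.  */
      off_t pos = lseek (fd, start, whence);
      if (pos >= 0)
        remaining = size > pos ? size - pos : 0;
    }

  free (buf);
  close (fd);
  return remaining;
}
```

With the numbers from the report (a 1000000-byte file, chunk 4/4 starting at 750000), `bytes_after_seek (1000000, 131072, 750000, SEEK_CUR)` lands at offset 881072 and leaves 118928 bytes, matching the size of t4 above, while the same call with `SEEK_SET` lands at 750000 and leaves the expected 250000 bytes.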

This bug report was last modified 4 years and 97 days ago.
