GNU bug report logs - #36130
split bug


Package: coreutils

Reported by: Heather Wick <heather.c.wick <at> gmail.com>

Date: Fri, 7 Jun 2019 18:47:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.



Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Heather Wick <heather.c.wick <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: split bug
Date: Fri, 7 Jun 2019 14:23:15 -0400
[Message part 1 (text/plain, inline)]
Hello,
I am using split to split up some large, paired fastq files (nearly 4
billion lines each). I am using the -l flag to split into files of 10
million reads (40 million lines) each and though the fastq files have
matched and sorted reads, split is creating different numbers of split
files for the two paired fastq files, and the pairing becomes off at some
point. The jobs finished without exceeding memory and with an exit status
0, and I noticed the help file said to email this address if there were
bugs, so I thought I would mention it.
This is the line I am using to call split on my zipped fastq files:
zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
This creates 96 chunks for R1 and 95 chunks for R2, even though the
original fastq files have the same number of reads.
Do you have any suggestions for how to proceed? Perhaps zcatting and piping
the files is not the best way to call split?
Thanks,
~ Heather

-- 
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1 <at> jhmi.edu
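
Since split -l starts a new output file every N lines, a different number
of chunks suggests the two decompressed streams do not contain the same
number of lines. One way to check this before splitting is sketched below.
This is only a minimal sketch: the helper names count_lines,
expected_chunks and compare_pair are hypothetical, and gzip -dc is used
here in place of zcat on the assumption that the two behave the same for
.gz input.

```shell
# count_lines FILE
# Print the number of lines in a gzip-compressed file.
count_lines() {
    gzip -dc "$1" | wc -l | tr -d ' '
}

# expected_chunks LINES PER_CHUNK
# Print how many output files "split -l PER_CHUNK" should produce
# for an input of LINES lines (ceiling division).
expected_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# compare_pair R1 R2 PER_CHUNK
# Report the line count and expected chunk count of both files.
# Returns success only if the two line counts are equal.
compare_pair() {
    n1=$(count_lines "$1")
    n2=$(count_lines "$2")
    echo "$1: $n1 lines, $(expected_chunks "$n1" "$3") chunks expected"
    echo "$2: $n2 lines, $(expected_chunks "$n2" "$3") chunks expected"
    [ "$n1" -eq "$n2" ]
}
```

For the files in this report, the call would be
compare_pair MH1_R1.fastq.gz MH1_R2.fastq.gz 40000000. If the two line
counts differ, the mismatch is already present in the inputs rather than
introduced by split.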



