Hello,
I am using split to divide two large, paired fastq files (nearly 4 billion lines each). I use the -l flag to produce files of 10 million reads (40 million lines) each. Although the two fastq files contain matched, sorted reads, split creates a different number of output files for each of them, and the pairing goes out of sync at some point. The jobs finished without exceeding memory and with exit status 0. I noticed the help text says to email this address about bugs, so I thought I would mention it.
These are the commands I use to call split on my gzipped fastq files:
zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
This creates 96 chunks for R1 and 95 chunks for R2, even though the original fastq files have the same number of reads.
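One way to double-check the inputs independently of split is to count the lines in each decompressed stream and compare the number of chunks that count implies. The following is only a minimal sketch: `expected_chunks` is a hypothetical helper name introduced here for illustration, it assumes the two .gz files are in the current directory, and it uses `gzip -dc` as a stand-in for zcat.

```shell
#!/bin/sh
# Hypothetical helper (not part of split): number of output files that
# "split -l PER_CHUNK" should produce for LINES input lines,
# i.e. LINES / PER_CHUNK rounded up.
expected_chunks() {
    lines=$1
    per_chunk=$2
    echo $(( (lines + per_chunk - 1) / per_chunk ))
}

# Count the lines in each decompressed file and report the implied chunk count.
# Files that are not present are skipped with a note on stderr.
for f in MH1_R1.fastq.gz MH1_R2.fastq.gz; do
    if [ ! -f "$f" ]; then
        echo "skipping $f (not found)" >&2
        continue
    fi
    # tr strips the padding that some wc implementations add.
    n=$(gzip -dc "$f" | wc -l | tr -d ' ')
    echo "$f: $n lines, $(expected_chunks "$n" 40000000) expected chunks"
    # A fastq record is 4 lines, so a remainder suggests a truncated or damaged file.
    if [ $(( n % 4 )) -ne 0 ]; then
        echo "$f: line count is not a multiple of 4" >&2
    fi
done
```

If the two line counts differ, the mismatch is already present in the decompressed input rather than being introduced by split.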
Do you have any suggestions for how to proceed? Perhaps piping the output of zcat into split is not the best way to call it?
Thanks,
~ Heather
--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine