GNU bug report logs - #36130
split bug
Reported by: Heather Wick <heather.c.wick <at> gmail.com>
Date: Fri, 7 Jun 2019 18:47:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Message #11 received at 36130 <at> debbugs.gnu.org (full text, mbox):
Hi,
Yes, sorry, I should have specified that I already checked that the
original fastq files are indeed paired and sorted with the same number of
lines and same starting/ending IDs, narrowing down the issue to a problem
with split.
~ Heather
(base) [hwick <at> zappalogin ~]$ zcat MH2_R2.fastq.gz | wc -l
3778103832
(base) [hwick <at> zappalogin ~]$ zcat MH2_R1.fastq.gz | wc -l
3778103832
(base) [hwick <at> zappalogin test_2019]$ zcat MH2_R1.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 1:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 1:N:0:GATCAG+TCTTTCCC
(base) [hwick <at> zappalogin test_2019]$ zcat MH2_R2.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 2:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 2:N:0:GATCAG+TCTTTCCC
(base) [hwick <at> zappalogin test_2019]$ zcat MH2_R1.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 1:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 1:N:0:GATCAG
(base) [hwick <at> zappalogin test_2019]$ zcat MH2_R2.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 2:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 2:N:0:GATCAG
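As a sanity check on the numbers above, the expected chunk count follows from the line count alone, since split -l N writes one file per N lines plus one for any remainder. A minimal sketch of that arithmetic, assuming the 3778103832 lines reported for the MH2 files and the 40000000-line chunk size from the original split commands (those commands named the MH1 files, so the real inputs may differ):

```shell
#!/bin/sh
# Expected number of files written by `split -l`, i.e. ceil(total / per_chunk).
total_lines=3778103832    # from the `wc -l` output above (MH2_R1 and MH2_R2)
lines_per_chunk=40000000  # from the original `split -l 40000000` invocation

# Integer ceiling division.
chunks=$(( (total_lines + lines_per_chunk - 1) / lines_per_chunk ))
# Lines that end up in the final, shorter chunk.
remainder=$(( total_lines % lines_per_chunk ))

echo "expected chunks: $chunks"
echo "lines in last chunk: $remainder"
```

With these inputs the arithmetic gives 95 chunks for each file, with 18103832 lines in the last one.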
On Fri, Jun 7, 2019 at 9:29 PM Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> > I am using split to split up some large, paired fastq files [...]:
> >
> > zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
> > zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
> >
> > This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> > original fastq files have the same number of reads.
> >
> > Do you have any suggestions for how to proceed? Perhaps zcatting and piping
> > the files is not the best way to call split?
>
> To help diagnose the issue better, please run the following commands
> and tell us what the results are:
>
> 1. number of lines in each file:
>
> zcat MH1_R1.fastq.gz | wc -l
> zcat MH1_R2.fastq.gz | wc -l
>
> 2. The first two sequence IDs:
>
> zcat MH1_R1.fastq.gz | head -n8 | grep ^@
> zcat MH1_R2.fastq.gz | head -n8 | grep ^@
>
> 3. Last two sequence IDs:
>
> zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
> zcat MH1_R2.fastq.gz | tail -n8 | grep ^@
>
> These will just verify the FASTQ files are indeed paired with no
> surprises. The files should have the same number of lines,
> and matching sequence IDs in the first and last lines.
>
> regards,
> - assaf
>
>
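The head and tail checks quoted above only look at the first and last two records. A rough sketch of a check over every record follows; check_pairs is a hypothetical helper (not part of coreutils), and it assumes well-formed four-line FASTQ records whose read ID is the first whitespace-delimited field of each header line. It uses gzip -dc, which is what zcat does, and named pipes so that nothing large is written to disk:

```shell
#!/bin/sh
# check_pairs FILE1.gz FILE2.gz  (hypothetical helper for illustration)
# Extracts the read ID from every 4th line of each file and compares the two
# ID streams with cmp. Returns 0 if all IDs match, non-zero otherwise.
check_pairs() {
  dir=$(mktemp -d) || return 2
  mkfifo "$dir/r1" "$dir/r2"

  # Header lines are lines 1, 5, 9, ...; $1 drops the trailing "1:N:0:..." part.
  gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }' > "$dir/r1" &
  gzip -dc "$2" | awk 'NR % 4 == 1 { print $1 }' > "$dir/r2" &

  # cmp reports the first differing line, which is the record number here.
  cmp "$dir/r1" "$dir/r2"
  status=$?

  wait
  rm -rf "$dir"
  return $status
}
```

It could be called as check_pairs MH2_R1.fastq.gz MH2_R2.fastq.gz; on files of this size it would take about as long as two zcat passes.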
--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1 <at> jhmi.edu
This bug report was last modified 5 years and 332 days ago.