GNU bug report logs - #36130
split bug
Reported by: Heather Wick <heather.c.wick <at> gmail.com>
Date: Fri, 7 Jun 2019 18:47:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Message #14 received at 36130 <at> debbugs.gnu.org (full text, mbox):
Hello,
On Fri, Jun 07, 2019 at 09:48:44PM -0400, Heather Wick wrote:
> Yes, sorry, I should have specified that I already checked that the
> original fastq files are indeed paired and sorted with the same number of
> lines and same starting/ending IDs, narrowing down the issue to a problem
> with split.
It could be a problem with "split", but we'll need to dig a bit deeper
to be able to pinpoint the exact issue.
Could you please try the following commands and post the results?
zcat MH1_R1.fastq.gz \
| split --verbose -l 40000000 - DHT_R1_ > DHT_R1.log ; echo DHT_R1 exit code: $?
zcat MH1_R2.fastq.gz \
| split --verbose -l 40000000 - DHT_R2_ > DHT_R2.log ; echo DHT_R2 exit code: $?
wc -l DHT_R1.log DHT_R2.log
Two more questions:
1. Can you post the output of "split --version"?
2. You mentioned "jobs" - if you are running these as submitted jobs on
a cluster (e.g. with "qsub"), can you double-check the STDERR log files
to ensure no errors were encountered?
If we still can't pinpoint the issue, the next steps would be to check
the DHT_R{1,2}.log files, and then try to compare the content of the
split files.
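A minimal sketch of such a comparison, assuming the chunks carry the
default two-letter suffixes that "split" generates (DHT_R1_aa, DHT_R1_ab, ...).
The function name "compare_chunks" is a hypothetical helper for illustration,
not part of coreutils. It only checks that each R1 chunk has an R2 counterpart
with the same number of lines:

```shell
# Hypothetical helper: compare line counts of corresponding split chunks.
# Assumption: chunk names are <prefix> followed by a two-character suffix.
compare_chunks() {
    prefix1=$1
    prefix2=$2
    status=0
    for f1 in "$prefix1"??; do
        # Skip the unexpanded pattern when no chunk matches.
        [ -e "$f1" ] || continue
        suffix=${f1#"$prefix1"}
        f2=$prefix2$suffix
        if [ ! -e "$f2" ]; then
            echo "Missing counterpart for $f1: $f2"
            status=1
            continue
        fi
        # Arithmetic expansion strips any padding that wc may emit.
        n1=$(( $(wc -l < "$f1") ))
        n2=$(( $(wc -l < "$f2") ))
        if [ "$n1" -ne "$n2" ]; then
            echo "Line count mismatch: $f1 ($n1) vs $f2 ($n2)"
            status=1
        fi
    done
    return $status
}

# Example usage (prints nothing and returns 0 when all chunks match):
#   compare_chunks DHT_R1_ DHT_R2_
```

Note that this only walks the R1 chunks, so an extra R2 chunk without an R1
counterpart would go unnoticed; comparing the two "wc -l" counts of the .log
files above covers that case.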
I assume the input files are indeed correctly paired, but just to check,
if you could try the following command, it should not print anything
to the screen (indicating all sequence IDs are paired):
paste <(zcat MH1_R1.fastq.gz) <(zcat MH1_R2.fastq.gz) \
| awk 'NR%4!=1 { next } $1!=$3 { print "Error in line " NR ":" $1 " vs " $3 }'
regards,
- assaf
This bug report was last modified 5 years and 332 days ago.