From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 07 14:46:33 2019
From: Heather Wick
Date: Fri, 7 Jun 2019 14:23:15 -0400
Subject: split bug
To: bug-coreutils@gnu.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000387183058abfedb4"

--000000000000387183058abfedb4
Content-Type: text/plain; charset="UTF-8"

Hello,

I am using split to split up some large, paired fastq files (nearly 4 billion lines each). I am using the -l flag to split into files of 10 million reads (40 million lines) each, and though the fastq files have matched and sorted reads, split is creating different numbers of split files for the two paired fastq files, and the pairing becomes off at some point. The jobs finished without exceeding memory and with an exit status 0, and I noticed the help file said to email this address if there were bugs, so I thought I would mention it.

This is the line I am using to call split on my zipped fastq files:

zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_

This creates 96 chunks for the R1 and 95 chunks for R2, even though the original fastq files have the same number of reads.

Do you have any suggestions for how to proceed? Perhaps zcatting and piping the files is not the best way to call split?

Thanks,
~ Heather

--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1@jhmi.edu

--000000000000387183058abfedb4
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
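A mismatch such as 96 versus 95 chunks can be narrowed down by comparing the two chunk sets file by file. The following is a minimal sketch and not part of the original report: `compare_chunks` is a hypothetical helper, and it assumes the `DHT_R1_` / `DHT_R2_` prefixes from the commands above, with both chunk sets in the current directory. It only walks the first prefix, so extra chunks under the second prefix would need a second call with the arguments swapped.

```shell
#!/bin/sh
# compare_chunks PREFIX1 PREFIX2
# Walk every chunk named PREFIX1<suffix> and report the first suffix whose
# counterpart PREFIX2<suffix> is missing or has a different line count.
compare_chunks() {
    p1=$1
    p2=$2
    seen=0
    for f in "$p1"*; do
        [ -e "$f" ] || continue          # glob matched nothing
        seen=$((seen + 1))
        suffix=${f#"$p1"}
        g=$p2$suffix
        if [ ! -e "$g" ]; then
            echo "missing counterpart: $g"
            return 1
        fi
        # Arithmetic expansion strips any padding some wc versions add.
        n1=$(( $(wc -l < "$f") ))
        n2=$(( $(wc -l < "$g") ))
        if [ "$n1" -ne "$n2" ]; then
            echo "line count differs at suffix $suffix: $n1 vs $n2"
            return 1
        fi
    done
    if [ "$seen" -eq 0 ]; then
        echo "no chunks found for prefix $p1"
        return 1
    fi
    echo "all $seen chunks match"
}

# Usage with the prefixes from the report:
#   compare_chunks DHT_R1_ DHT_R2_
```

The first suffix reported is the earliest point at which the two outputs disagree, which is usually a better starting point than the total chunk count.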
--000000000000387183058abfedb4--

From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 07 21:29:55 2019
Date: Fri, 7 Jun 2019 19:29:42 -0600
From: Assaf Gordon
To: Heather Wick
Cc: 36130@debbugs.gnu.org
Subject: Re: bug#36130: split bug
Message-ID: <20190608012942.GE18519@tomato.moose.housegordon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.11.4 (2019-03-13)

Hello,

On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> I am using split to split up some large, paired fastq files [...]:
>
>   zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
>   zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
>
> This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> original fastq files have the same number of reads.
>
> Do you have any suggestions for how to proceed? Perhaps zcatting and piping
> the files is not the best way to call split?

To help diagnose the issue better, please run the following commands
and tell us what the results are:

1. Number of lines in each file:

   zcat MH1_R1.fastq.gz | wc -l
   zcat MH1_R2.fastq.gz | wc -l

2. The first two sequence IDs:

   zcat MH1_R1.fastq.gz | head -n8 | grep ^@
   zcat MH1_R2.fastq.gz | head -n8 | grep ^@

3. The last two sequence IDs:

   zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
   zcat MH1_R2.fastq.gz | tail -n8 | grep ^@

These will just verify the FASTQ files are indeed paired with no
surprises. The files should have the same number of lines,
and matching sequence IDs in the first and last lines.

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 07 21:59:56 2019
MIME-Version: 1.0
References: <20190608012942.GE18519@tomato.moose.housegordon.com>
In-Reply-To: <20190608012942.GE18519@tomato.moose.housegordon.com>
From: Heather Wick
Date: Fri, 7 Jun 2019 21:48:44 -0400
Subject: Re: bug#36130: split bug
To: Assaf Gordon
Cc: 36130@debbugs.gnu.org
Content-Type: multipart/alternative; boundary="00000000000065f85b058ac62637"

--00000000000065f85b058ac62637
Content-Type: text/plain; charset="UTF-8"

Hi,

Yes, sorry, I should have specified that I already checked that the original fastq files are indeed paired and sorted with the same number of lines and same starting/ending IDs, narrowing down the issue to a problem with split.
~ Heather

(base) [hwick@zappalogin ~]$ zcat MH2_R2.fastq.gz | wc -l
3778103832
(base) [hwick@zappalogin ~]$ zcat MH2_R1.fastq.gz | wc -l
3778103832

(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 1:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 1:N:0:GATCAG+TCTTTCCC
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 2:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 2:N:0:GATCAG+TCTTTCCC

(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 1:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 1:N:0:GATCAG
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 2:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 2:N:0:GATCAG

On Fri, Jun 7, 2019 at 9:29 PM Assaf Gordon wrote:

> Hello,
>
> On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> > I am using split to split up some large, paired fastq files [...]:
> >
> >   zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
> >   zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
> >
> > This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> > original fastq files have the same number of reads.
> >
> > Do you have any suggestions for how to proceed? Perhaps zcatting and
> > piping the files is not the best way to call split?
>
> To help diagnose the issue better, please run the following commands
> and tell us what the results are:
>
> 1. Number of lines in each file:
>
>    zcat MH1_R1.fastq.gz | wc -l
>    zcat MH1_R2.fastq.gz | wc -l
>
> 2. The first two sequence IDs:
>
>    zcat MH1_R1.fastq.gz | head -n8 | grep ^@
>    zcat MH1_R2.fastq.gz | head -n8 | grep ^@
>
> 3. The last two sequence IDs:
>
>    zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
>    zcat MH1_R2.fastq.gz | tail -n8 | grep ^@
>
> These will just verify the FASTQ files are indeed paired with no
> surprises. The files should have the same number of lines,
> and matching sequence IDs in the first and last lines.
>
> regards,
>  - assaf
>

--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1@jhmi.edu

--00000000000065f85b058ac62637
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
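The first/last ID comparison requested in the quoted message can also be done in a single pass per file, without reading the IDs by eye. This is a minimal sketch under stated assumptions: `first_last_ids` and `ids_match` are hypothetical helper names, `gzip -dc` stands in for `zcat`, and each record is assumed to be exactly four lines with the ID as the first whitespace-separated field of the header line.

```shell
#!/bin/sh
# first_last_ids FILE.gz
# Print the ID of the first and of the last read. Only the first field of
# each header line is kept, so the "1:N:0:..." / "2:N:0:..." part that
# differs between mates is ignored.
first_last_ids() {
    gzip -dc "$1" | awk '
        NR % 4 == 1 { id = $1; if (NR == 1) print id }   # header lines only
        END         { print id }'
}

# ids_match R1.gz R2.gz
# Succeeds when both files start and end with the same read IDs.
ids_match() {
    [ "$(first_last_ids "$1")" = "$(first_last_ids "$2")" ]
}

# Usage with the file names from the thread:
#   ids_match MH1_R1.fastq.gz MH1_R2.fastq.gz && echo "first/last IDs agree"
```

Selecting header lines by position (every fourth line) rather than with `grep ^@` avoids counting quality lines that happen to begin with `@`.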
--00000000000065f85b058ac62637--

From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 07 23:39:31 2019
Date: Fri, 7 Jun 2019 21:39:18 -0600
From: Assaf Gordon
To: Heather Wick
Cc: 36130@debbugs.gnu.org
Subject: Re: bug#36130: split bug
Message-ID: <20190608033918.GA22150@tomato.moose.housegordon.com>
References: <20190608012942.GE18519@tomato.moose.housegordon.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.11.4 (2019-03-13)

Hello,

On Fri, Jun 07, 2019 at 09:48:44PM -0400, Heather Wick wrote:
> Yes, sorry, I should have specified that I already checked that the
> original fastq files are indeed paired and sorted with the same number of
> lines and same starting/ending IDs, narrowing down the issue to a problem
> with split.

It could be a problem with "split", but we'll need to dig a bit deeper
to be able to pinpoint the exact issue.

Could you please try the following commands and post the results?

    zcat MH1_R1.fastq.gz \
       | split --verbose -l 40000000 - DHT_R1_ > DHT_R1.log ; echo DHT_R1 exit code: $?
    zcat MH1_R2.fastq.gz \
       | split --verbose -l 40000000 - DHT_R2_ > DHT_R2.log ; echo DHT_R2 exit code: $?
    wc -l DHT_R1.log DHT_R2.log

Two more questions:
1. Can you post the result of "split --version"?
2. You mentioned "jobs" - if you are running these as submitted jobs on
a cluster (e.g. with "qsub"), can you double-check the STDERR log files
to ensure no errors were encountered?

If we still can't pinpoint the issue, the next steps would be to check
the DHT_R{1,2}.log files, and then try to compare the content of the
split files.

I assume the input files are indeed correctly paired, but just to check,
if you could try the following command, it should not print anything
to the screen (indicating all sequence IDs are paired):

    paste <(zcat MH1_R1.fastq.gz) <(zcat MH1_R2.fastq.gz) \
       | awk 'NR%4!=1 { next } $1!=$3 { print "Error in line " NR ":" $1 " vs " $3 }'

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Mon Jun 10 14:29:09 2019
MIME-Version: 1.0
References: <20190608012942.GE18519@tomato.moose.housegordon.com> <20190608033918.GA22150@tomato.moose.housegordon.com>
In-Reply-To: <20190608033918.GA22150@tomato.moose.housegordon.com>
From: Heather Wick
Date: Mon, 10 Jun 2019 14:28:48 -0400
Subject: Re: bug#36130: split bug
To: Assaf Gordon
Cc: 36130@debbugs.gnu.org
Content-Type: multipart/alternative; boundary="0000000000009c1d02058afc5a33"

--0000000000009c1d02058afc5a33
Content-Type: text/plain; charset="UTF-8"

Thank you so much for your response.
Here are the results of the tests you sent:

Verbose: This seems to have made the same number of files this time; not sure why the other 3-4 times I ran it it did not. They appear to be the same size, with paired last reads.

(base) [hwick@zappalogin interactive_with_verbose]$ cat make_chunks_1_1mill_verbose
DHT_R1 exit code: 0
DHT_R2 exit code: 0
  96 DHT_R1.log
  96 DHT_R2.log
 192 total

Version:

(base) [hwick@zappalogin test_2019]$ split --version
split (GNU coreutils) 8.4
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjörn Granlund and Richard M. Stallman.

STDERR: The only thing in the stderr file is an odd duck of:

-sh: module: line 1: syntax error: unexpected end of file
-sh: error importing function definition for `BASH_FUNC_module'
Python 3.6.8 :: Anaconda, Inc.
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `BASH_FUNC_module'

but this prints for every job I run with this particular flavor of conda/bash and doesn't seem to affect anything else (as far as I know).

All jobs finished well below allotted memory and with exit status 0, even when split didn't make the right number of output files.

Do you know any reason why the behavior would be inconsistent?

Pairing check: unfortunately my server's version of bash doesn't support paste in this way. I've run into this issue before but I forget what the workaround is. I can't run this command interactively because my server times out (these files are > 3 billion lines each, so it takes a long time to zcat them).

/cm/local/apps/sge/var/spool/zappa-06/job_scripts/358558: line 10: syntax error near unexpected token `('
/cm/local/apps/sge/var/spool/zappa-06/job_scripts/358558: line 10: `paste <(zcat MH1_R2.fastq) <(zcat MH1_R2.fastq.gz) \'

On Fri, Jun 7, 2019 at 11:39 PM Assaf Gordon wrote:

> Hello,
>
> On Fri, Jun 07, 2019 at 09:48:44PM -0400, Heather Wick wrote:
> > Yes, sorry, I should have specified that I already checked that the
> > original fastq files are indeed paired and sorted with the same number of
> > lines and same starting/ending IDs, narrowing down the issue to a problem
> > with split.
>
> It could be a problem with "split", but we'll need to dig a bit deeper
> to be able to pinpoint the exact issue.
>
> Could you please try the following commands and post the results?
>
>     zcat MH1_R1.fastq.gz \
>        | split --verbose -l 40000000 - DHT_R1_ > DHT_R1.log ; echo DHT_R1 exit code: $?
>     zcat MH1_R2.fastq.gz \
>        | split --verbose -l 40000000 - DHT_R2_ > DHT_R2.log ; echo DHT_R2 exit code: $?
>     wc -l DHT_R1.log DHT_R2.log
>
> Two more questions:
> 1. Can you post the result of "split --version"?
> 2. You mentioned "jobs" - if you are running these as submitted jobs on
> a cluster (e.g. with "qsub"), can you double-check the STDERR log files
> to ensure no errors were encountered?
>
> If we still can't pinpoint the issue, the next steps would be to check
> the DHT_R{1,2}.log files, and then try to compare the content of the
> split files.
>
> I assume the input files are indeed correctly paired, but just to check,
> if you could try the following command, it should not print anything
> to the screen (indicating all sequence IDs are paired):
>
>     paste <(zcat MH1_R1.fastq.gz) <(zcat MH1_R2.fastq.gz) \
>        | awk 'NR%4!=1 { next } $1!=$3 { print "Error in line " NR ":" $1 " vs " $3 }'
>
> regards,
>  - assaf
>

-- 
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1@jhmi.edu

--0000000000009c1d02058afc5a33
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000009c1d02058afc5a33--

From debbugs-submit-bounces@debbugs.gnu.org Mon Jun 10 17:38:48 2019
Subject: Re: bug#36130: split bug
To: Heather Wick, Assaf Gordon
Cc: 36130@debbugs.gnu.org
From: Pádraig Brady
Message-ID: <3e3fb065-f873-9b66-496d-9fa58efabe43@draigBrady.com>
Date: Mon, 10 Jun 2019 22:38:44 +0100

On 10/06/19 19:28, Heather Wick wrote:
> Thank you so much for your response. Here are the results of the tests you
> sent:
> Verbose: This seems to have made the same number of files this time; not
> sure why the other 3-4 times I ran it it did not. They appear to be the
> same size, with paired last reads
>
> (base) [hwick@zappalogin interactive_with_verbose]$ cat
> make_chunks_1_1mill_verbose
>
> DHT_R1 exit code: 0
>
> DHT_R2 exit code: 0
>
>   96 DHT_R1.log
>
>   96 DHT_R2.log
>
>  192 total

> Version:
>
> (base) [hwick@zappalogin test_2019]$ split --version
>
> split (GNU coreutils) 8.4

That is nearly 10 years old now, though having said that,
I'm not sure whether any bugs fixed since then
would explain what you're seeing.
One possibility is:
https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=758916b
which would manifest as silently ignoring errors when reading input.

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Mon Jun 10 18:50:33 2019
Subject: Re: bug#36130: split bug
To: Heather Wick
Cc: 36130@debbugs.gnu.org
From: Assaf Gordon
Message-ID: <4b8acb93-fc1f-8ab9-bbba-331b2e10ba5e@gmail.com>
Date: Mon, 10 Jun 2019 16:50:20 -0600

Hello,

On 2019-06-10 12:28 p.m., Heather Wick wrote:
> Thank you so much for your response. Here are the results of the tests
> you sent:
> Verbose: This seems to have made the same number of files this time; not
> sure why the other 3-4 times I ran it it did not. They appear to be the
> same size, with paired last reads
[...]

Glad to hear it worked.

Could it be that in previous times the queued job ran out of disk space?

That would be my first guess, as such things are common in shared
grid/cluster environments, particularly if your job runs in a temporary
and limited storage location (e.g. "/tmp/job-NNNN").

I would suspect that the exit code you are seeing is the exit code
of the entire job (that is - of the shell script that is being qsub'd),
and not necessarily that of 'split' (then again, this might not be correct
if you explicitly checked the exit code of 'split').

Given that your grid environment already has configuration issues
(the bash and "module" related errors), I would not be surprised
if the exit code is not reliable.

I would strongly encourage you to always look into the STDERR file of the
job to verify that no other errors occurred.
Or, perhaps write shell scripts more defensively, like so:

   [...]
   zcat MH1_R1.fastq.gz | split -l 40000000 - DHT_R1_ \
      && echo split MH1_R1 OK \
      || echo split MH1_R1 FAILED
   [...]

Then check STDOUT for positive confirmation that each program succeeded.

Or perhaps:

   # define a shell function "die" to print an error and terminate
   die()
   {
     base=$(basename "$0")
     echo "$base: error: $*" >&2
     exit 1
   }

   zcat MH1_R1.fastq.gz | split -l 40000000 - DHT_R1_ \
       || die "split MH1_R1 failed"

And then run at least one job that will fail on purpose, and ensure you
see the error message in the STDERR log, and you get a non-zero exit code
(and then ensure you use 'die' on every command).
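One caveat with the "|| die" pattern: in a pipeline the shell only looks at
the exit code of the last command, so a failing zcat would go unnoticed as
long as split itself succeeds. Below is a minimal sketch of one way to check
both, assuming the job script is run by bash (PIPESTATUS is a bash feature).
The function name "split_gz" is made up for illustration and is not from
this thread.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): split a gzipped file into
# chunks and report failure if either zcat or split exits non-zero.

die() {
    echo "$(basename "$0"): error: $*" >&2
    exit 1
}

split_gz() {
    local input=$1 lines=$2 prefix=$3
    local -a status

    zcat "$input" | split -l "$lines" - "$prefix"
    # PIPESTATUS holds the exit code of every command in the last pipeline;
    # copy it right away, before any other command overwrites it.
    status=("${PIPESTATUS[@]}")

    if [ "${status[0]}" -ne 0 ]; then
        echo "zcat failed on $input (exit code ${status[0]})" >&2
        return 1
    fi
    if [ "${status[1]}" -ne 0 ]; then
        echo "split failed on $input (exit code ${status[1]})" >&2
        return 1
    fi
}

# Example call, using the file names from this thread:
#   split_gz MH1_R1.fastq.gz 40000000 DHT_R1_ || die "split MH1_R1 failed"
```

With this, a truncated or unreadable input shows up as a non-zero exit code
even though split itself has nothing to complain about.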
It is sometimes recommended to use "set -e" for "easy" error handling
in shell scripts, but I would recommend against it. Many reasons are
detailed here: https://mywiki.wooledge.org/BashFAQ/105

It might be more frustrating to add such extra checks on every program,
but in my humble experience, grid environments bring on so many more
intermittent and transient problems that it is definitely worth it.

> STDERR:
> The only thing in the stderr file is an odd duck of:
>
> -sh: module: line 1: syntax error: unexpected end of file
>
> -sh: error importing function definition for `BASH_FUNC_module'
>
> Python 3.6.8 :: Anaconda, Inc.
>
> /bin/sh: module: line 1: syntax error: unexpected end of file
>
> /bin/sh: error importing function definition for `BASH_FUNC_module'
>
> but this prints for every job I run with this particular flavor of
> conda/bash and doesn't seem to affect anything else (as far as I know)

These errors are specific to your grid/cluster environment, and the best
place to ask is the IT or bioinformatics department in your institute
(whoever is in charge of the cluster).

Broadly speaking, "module" is a mechanism that eases the use of various
software packages. It is usually set up by your IT administrators.
A typical use-case is to have different versions of programs in
non-standard locations, e.g.
samtools version 1.6 in /opt/it/programs/samtools-1.6 and
samtools version 1.9 in /opt/bioinfo/tools/new/samtools/
and then cluster users (e.g. you) just need to add:
"module load samtools-1.9" and have the command "samtools" just work
without knowing the gritty details of where the program is.

It seems that in your case, something relating to the "module" setup
is broken.

More information here:
https://en.wikipedia.org/wiki/Environment_Modules_(software)

> All jobs finished well below allotted memory and with exit status 0,
> even when split didn't make the right number of output files.
>
> Do you know any reason why the behavior would be inconsistent?
The "allotted memory" is a non-issue for this "split" command; it will
always use very little memory regardless of how big the input files are.

As for "exit status 0" - I can't be sure, but I suspect the exit status
you see is that of the entire job (i.e. the shell script), and perhaps
it does not represent the exit code of the "split" program.

If you have the STDERR files of the jobs which failed,
it's worth checking them for any additional error messages.

> Pairing check: unfortunately my server's version of bash doesn't support
> paste in this way, I've run into this issue before but I forget what the
> workaround is. I can't run this command interactively because my server
> times out (these files are > 3 billion lines each, so it takes a long
> time to zcat them)

Ah yes, the construct:
    program <(other program)
is a "bash" feature that is not available in simple shell scripts
(interactive vs non-interactive use, among other things).

One work-around is to run (from inside your script):

   bash -c "paste <(zcat MH1_R2.fastq) <(zcat MH1_R2.fastq.gz)" \
      | awk 'NR%4!=1 { next } $1!=$3 { print "Error in line " NR ":" $1

----

To conclude:
If I understand correctly, the latest attempt worked correctly and there
are no problems in "split". If this is the case, we can mark this thread
as "done".
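For a job script that runs under a plain /bin/sh, the same pairing check
can be done without process substitution by using named pipes. The
following is only a sketch under a few assumptions: the function name
"check_pairs" is made up for illustration, the two inputs are taken to be
MH1_R1.fastq.gz and MH1_R2.fastq.gz (the names used in the split commands
earlier in this thread), and each FASTQ header is assumed to look like
"@ID comment", with the same ID in both files.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): compare the read IDs of two
# gzipped FASTQ files line by line, using named pipes instead of <(...).

check_pairs() {
    cp_r1=$1
    cp_r2=$2
    cp_tmp=$(mktemp -d) || return 1
    mkfifo "$cp_tmp/r1" "$cp_tmp/r2" || return 1

    # Each zcat writes into its own named pipe in the background.
    zcat "$cp_r1" > "$cp_tmp/r1" &
    zcat "$cp_r2" > "$cp_tmp/r2" &

    # paste joins the two streams with a tab; awk looks only at the header
    # line of each 4-line FASTQ record and compares the first word (the ID).
    paste "$cp_tmp/r1" "$cp_tmp/r2" | awk -F '\t' '
        NR % 4 != 1 { next }
        {
            split($1, a, " ")
            split($2, b, " ")
            if (a[1] != b[1]) {
                print "Error in line " NR ": " a[1] " vs " b[1]
                bad = 1
            }
        }
        END { exit bad + 0 }'
    cp_status=$?

    wait
    rm -rf "$cp_tmp"
    return "$cp_status"
}

# Example call, using the file names from this thread:
#   check_pairs MH1_R1.fastq.gz MH1_R2.fastq.gz || echo "pairing check FAILED"
```

As with the original command, no output means all IDs matched. The function
also returns a non-zero exit code on any mismatch, so the result can be
checked in the job script itself rather than only by reading the log.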
regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 26 10:56:59 2019
Date: Wed, 26 Jun 2019 08:56:49 -0600
From: Assaf Gordon
To: Heather Wick
Cc: 36130@debbugs.gnu.org
Subject: Re: bug#36130: split bug
Message-ID: <20190626145649.GE22150@tomato.moose.housegordon.com>
In-Reply-To: <4b8acb93-fc1f-8ab9-bbba-331b2e10ba5e@gmail.com>

tag 36130 notabug
close 36130
stop

Hello,

On Mon, Jun 10, 2019 at 04:50:20PM -0600, Assaf Gordon wrote:
> On 2019-06-10 12:28 p.m., Heather Wick wrote:
> > Verbose: This seems to have made the same number of files this time; not
> > sure why the other 3-4 times I ran it it did not. They appear to be the
> > same size, with paired last reads
> [...]
>
> Glad to hear it worked.
>
> Could it be that in previous times the queued job ran out of disk space?
>
> That would be my first guess, as such things are common in shared
> grid/cluster environments, particularly if your job runs in a temporary
> and limited storage location (e.g. "/tmp/job-NNNN").

With no further comments, I'm closing this ticket.
If more issues arise (or this was not an adequate solution) we can always
re-open this ticket.

regards,
 -assaf

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 26 12:08:36 2019
From: Heather Wick
Date: Wed, 26 Jun 2019 12:08:14 -0400
Subject: Re: bug#36130: split bug
To: Assaf Gordon
Cc: 36130@debbugs.gnu.org
In-Reply-To: <20190626145649.GE22150@tomato.moose.housegordon.com>
Content-Type: multipart/alternative; boundary="0000000000004fa809058c3c4110"

--0000000000004fa809058c3c4110
Content-Type: text/plain; charset="UTF-8"

Thank you for all your help. I will let you know if I run into any more
issues. For whatever reason, putting the program on verbose has let it run
with no issues that I can determine related to my initial problems.
Thanks,
~ Heather

On Wed, Jun 26, 2019 at 10:56 AM Assaf Gordon wrote:

> tag 36130 notabug
> close 36130
> stop
>
> Hello,
>
> On Mon, Jun 10, 2019 at 04:50:20PM -0600, Assaf Gordon wrote:
> > On 2019-06-10 12:28 p.m., Heather Wick wrote:
> > > Verbose: This seems to have made the same number of files this time;
> > > not sure why the other 3-4 times I ran it it did not. They appear to
> > > be the same size, with paired last reads
> > [...]
> >
> > Glad to hear it worked.
> >
> > Could it be that in previous times the queued job ran out of disk space?
> >
> > That would be my first guess, as such things are common in shared
> > grid/cluster environments, particularly if your job runs in a temporary
> > and limited storage location (e.g. "/tmp/job-NNNN").
>
> With no further comments, I'm closing this ticket.
> If more issues arise (or this was not an adequate solution) we can always
> re-open this ticket.
>
> regards,
>  -assaf

--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
hwick1@jhmi.edu

--0000000000004fa809058c3c4110
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000004fa809058c3c4110--

From unknown Sat Jun 21 05:20:16 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 25 Jul 2019 11:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
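
Earlier in this thread it was suggested, as a next step, to compare the
split output of the two inputs. Since the original symptom was a differing
number of output files, a quick check after each run may be worthwhile.
The following is a minimal sketch only: the function name "compare_chunks"
is made up for illustration, and it assumes the chunks were written with
prefixes such as DHT_R1_ and DHT_R2_ as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): check that two sets of split
# chunks have the same number of files and the same line count per chunk.

compare_chunks() {
    cc_p1=$1
    cc_p2=$2
    cc_n1=$(ls "$cc_p1"* 2>/dev/null | wc -l)
    cc_n2=$(ls "$cc_p2"* 2>/dev/null | wc -l)

    if [ "$cc_n1" -eq 0 ]; then
        echo "no chunks found for prefix $cc_p1" >&2
        return 1
    fi
    if [ "$cc_n1" -ne "$cc_n2" ]; then
        echo "chunk count differs: $cc_n1 vs $cc_n2" >&2
        return 1
    fi

    cc_rc=0
    for cc_f1 in "$cc_p1"*; do
        # Map e.g. DHT_R1_ab to DHT_R2_ab by swapping the prefix.
        cc_suffix=${cc_f1#"$cc_p1"}
        cc_f2=$cc_p2$cc_suffix
        if [ ! -f "$cc_f2" ]; then
            echo "missing chunk: $cc_f2" >&2
            cc_rc=1
            continue
        fi
        cc_l1=$(wc -l < "$cc_f1")
        cc_l2=$(wc -l < "$cc_f2")
        if [ "$cc_l1" -ne "$cc_l2" ]; then
            echo "$cc_f1 has $cc_l1 lines, $cc_f2 has $cc_l2" >&2
            cc_rc=1
        fi
    done
    return "$cc_rc"
}

# Example call, using the prefixes from this thread:
#   compare_chunks DHT_R1_ DHT_R2_ || echo "chunks do not match"
```

This only compares counts, not read IDs, so it complements rather than
replaces the pairing check on the sequence headers.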