From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 16:02:54 2015 Received: (at submit) by debbugs.gnu.org; 23 Nov 2015 21:02:54 +0000 Received: from localhost ([127.0.0.1]:49941 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0yFu-0000zH-Qj for submit@debbugs.gnu.org; Mon, 23 Nov 2015 16:02:54 -0500 Received: from eggs.gnu.org ([208.118.235.92]:42068) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0y4A-0000bm-7j for submit@debbugs.gnu.org; Mon, 23 Nov 2015 15:50:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0y49-0007YP-0b for submit@debbugs.gnu.org; Mon, 23 Nov 2015 15:50:25 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,HTML_MESSAGE autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:38245) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0y48-0007YL-UU for submit@debbugs.gnu.org; Mon, 23 Nov 2015 15:50:24 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52074) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0y47-0004Su-Ut for bug-coreutils@gnu.org; Mon, 23 Nov 2015 15:50:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a0y42-0007XR-Vh for bug-coreutils@gnu.org; Mon, 23 Nov 2015 15:50:23 -0500 Received: from mail2out.cw.bc.ca ([207.23.159.109]:5594 helo=ironport1.cw.bc.ca) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a0y42-0007Vl-Mj for bug-coreutils@gnu.org; Mon, 23 Nov 2015 15:50:18 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2ChBADfelNW/yoeAQpeGQEBAg8BAQEBtHKQbBKFfQKCBRABAQEBAQEBgQqEKxAtXgELAWETJgEEG8diAQEBAQEFAQEBAQEBHYZWiVeDSYEVBY4UiDIKAQ6pbTiEUIV8AQEB X-IPAS-Result: 
A2ChBADfelNW/yoeAQpeGQEBAg8BAQEBtHKQbBKFfQKCBRABAQEBAQEBgQqEKxAtXgELAWETJgEEG8diAQEBAQEFAQEBAQEBHYZWiVeDSYEVBY4UiDIKAQ6pbTiEUIV8AQEB X-IronPort-AV: E=Sophos;i="5.20,338,1444719600"; d="scan'208,217";a="129495542" Received: from srvexht01.phsabc.ehcnet.ca ([10.1.30.42]) by ironport1.cw.bc.ca with ESMTP; 23 Nov 2015 12:50:13 -0800 Received: from VEXCCR01.phsabc.ehcnet.ca ([fe80::1491:c735:4d7d:86e7]) by SRVEXHT01.phsabc.ehcnet.ca ([::1]) with mapi; Mon, 23 Nov 2015 12:50:13 -0800 From: "Macdonald, Kim - BCCDC" To: "'bug-coreutils@gnu.org'" Date: Mon, 23 Nov 2015 12:50:12 -0800 Subject: Is it possible to tab separate concatenated files? Thread-Topic: Is it possible to tab separate concatenated files? Thread-Index: AdEmMIwZeuq9tPlIQ7WeQoC4u7WsAg== Message-ID: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: multipart/alternative; boundary="_000_98F8BBDDC88408489C9C92296763678F02561A62AB28VEXCCR01phs_" MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Mon, 23 Nov 2015 16:02:33 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) --_000_98F8BBDDC88408489C9C92296763678F02561A62AB28VEXCCR01phs_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi! 
I'm just looking at the options for the cat command - I see there's a way to ignore tabs when they exist - but is there a way to tab separate the files you're concatenating with the cat command?

Thanks,
Kim

--_000_98F8BBDDC88408489C9C92296763678F02561A62AB28VEXCCR01phs_-- From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 17:02:20 2015 Received: (at 22001) by debbugs.gnu.org; 23 Nov 2015 22:02:20 +0000 Received: from localhost ([127.0.0.1]:49959 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zBk-0002jj-2E for submit@debbugs.gnu.org; Mon, 23 Nov 2015 17:02:20 -0500 Received: from mail-vk0-f53.google.com ([209.85.213.53]:35563) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zBP-0002is-Qh for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 17:02:18 -0500 Received: by vkha189 with SMTP id a189so48032084vkh.2 for <22001@debbugs.gnu.org>; Mon, 23 Nov 2015 14:01:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-type:content-transfer-encoding; bh=JVAVHMo9TGv9o93uaMUVj3DrcWfGT8jah0CETy7dino=; b=UEcnNDULzIgN+kDO2Me96aF9lFrsyq+7fZrBBU6ZgsG2PSaNCOI78gKeYzWB6O/U1n sOF80UJTpyPIq6gsQNuNDKwkcBFLYvOfmzQ7o5QPv100x5rOXwCgXGbvH6zizUPqiS4U JWoftqZ7dz9r7wMaQ/EyVtahglUDg0NhOS/uSLPlu1Lx/McMNBHVZS28Ik+SQtjsvaQ+ Wr3juUS0zbxBVpZwgbgeK/59Ixk/b5C1tv5+QWQ+0FXm5/zgPE46Tn7PUomLTlHpOkMB 0UsYD6D+RArfAORq5n0dE9qgHGrUwd36ZWOhemE7DkZEMP45bfx2g9mIVt1OGmE1nA2P KcRg== X-Received: by 10.31.128.82 with SMTP id b79mr21506312vkd.47.1448316119311; Mon, 23 Nov 2015 14:01:59 -0800 (PST) Received: from disco.erlich.nygenome.org ([69.74.14.178]) by smtp.googlemail.com with ESMTPSA id x185sm12304010vkd.12.2015.11.23.14.01.58 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 23 Nov 2015 14:01:58 -0800 (PST) Subject: Re: bug#22001: Is it possible to tab separate concatenated files? 
To: "Macdonald, Kim - BCCDC" , 22001@debbugs.gnu.org References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> From: Assaf Gordon Message-ID: <56538D04.50200@gmail.com> Date: Mon, 23 Nov 2015 17:02:44 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22001 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

tag 22001 notabug
close 22001
stop

Hello Kim,

On 11/23/2015 03:50 PM, Macdonald, Kim - BCCDC wrote:
> I’m just looking at the options for the cat command – I see there’s a
> way to ignore tabs when they exist – but is there a way to tab
> separate the files you’re concatenating with the cat command?

It is unclear (to me) what you're trying to achieve - could you provide a bit more detail (perhaps a short example)?

If you have a file (one file) with spaces and you wish to convert them to tabs, consider the 'expand' command (then pipe to 'cat' if needed).

If you have multiple files and you wish to print them side-by-side, separated by tabs (as opposed to one-after-the-other, as with 'cat'), consider using 'paste':

   $ cat 1.txt
   a
   b
   c
   d
   $ cat 2.txt
   1
   2
   3
   4
   $ cat 3.txt
   w
   x
   y
   z
   $ paste 1.txt 2.txt 3.txt
   a	1	w
   b	2	x
   c	3	y
   d	4	z

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 17:45:48 2015 Received: (at 22001) by debbugs.gnu.org; 23 Nov 2015 22:45:48 +0000 Received: from localhost ([127.0.0.1]:50035 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zro-0004op-5U for submit@debbugs.gnu.org; Mon, 23 Nov 2015 17:45:48 -0500 Received: from mail-vk0-f51.google.com ([209.85.213.51]:34488) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zrn-0004nL-4B for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 17:45:47 -0500 Received: by vkbs1 with SMTP id s1so48790075vkb.1 for <22001@debbugs.gnu.org>; Mon, 23 Nov 2015 14:45:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-type:content-transfer-encoding; bh=DU9OFi/CqYXlKAiFOxZUDFC1NespPDFKrPt37ne8290=; b=az4htfmE5qT8bDqEvPSGkDhgJIDqrqKvZTpyAQijHjCr7EWQzFa7I7AY3duIdtfMAt YvpH9CogNQmXp8wz8ei+NWk2Dw+ppZuSZAmd3Y2N2l0hfdYw3PSyQdpyj26CqeXW+iqd dwdEX/oyuZ8tPxuh26L8IpbcsHs6cUldTojEkX430cdgWUdpKLErpXlVLvPileujoMTX 7WMyAdAQhTDaYhtz2GjRt30n/WKV+DiG8Nv5zDhB4VsuAnTJoQXpPuBJIJzpyJMQH9DQ FazBTrZ23ehUhbqic6/9EflejurEe2m1XZ3z4ZolF9v/xPstc6i3lXSr5zlX3wj6hMYc f5ow== X-Received: by 10.31.16.214 with SMTP id 83mr20986843vkq.139.1448318746445; Mon, 23 Nov 2015 14:45:46 -0800 (PST) Received: from disco.erlich.nygenome.org ([69.74.14.178]) by smtp.googlemail.com with ESMTPSA id v7sm12441669vkd.11.2015.11.23.14.45.45 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 23 Nov 2015 14:45:45 -0800 (PST) Subject: Re: bug#22001: Is it possible to tab separate concatenated files? 
To: "Macdonald, Kim - BCCDC" , 22001@debbugs.gnu.org References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> From: Assaf Gordon Message-ID: <56539747.3050902@gmail.com> Date: Mon, 23 Nov 2015 17:46:31 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <56538D04.50200@gmail.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22001 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Correcting myself: On 11/23/2015 05:02 PM, Assaf Gordon wrote: > If you have a file (one file) with spaces and you wish to convert > them to tabs, consider the 'expand' command (then pipe to 'cat' if > needed). > "unexpand" will convert spaces to tabs, "expand" will convert tabs to spaces. 
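
To make the expand/unexpand direction concrete, here is a minimal sketch. The file names are made up for illustration, and 'od -c' is only used so that tabs and spaces can be told apart in the output.

```shell
# Work in a scratch directory; every file name below is hypothetical.
dir=$(mktemp -d)

# One sample line that starts with a single tab.
printf '\tcol\n' > "$dir/tabs.txt"

# expand: tabs -> spaces (tab stops every 8 columns by default).
expand "$dir/tabs.txt" > "$dir/spaces.txt"

# unexpand: leading spaces -> tabs (-a would also convert non-leading runs).
unexpand "$dir/spaces.txt" > "$dir/roundtrip.txt"

# od -c prints \t for a tab and blanks for spaces, byte by byte.
od -c "$dir/spaces.txt"
od -c "$dir/roundtrip.txt"
```

Without '-a', unexpand only touches blanks at the start of a line, which is usually what is wanted for indentation.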
From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 17:53:33 2015 Received: (at 22001) by debbugs.gnu.org; 23 Nov 2015 22:53:33 +0000 Received: from localhost ([127.0.0.1]:50043 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zzJ-0005uq-Ef for submit@debbugs.gnu.org; Mon, 23 Nov 2015 17:53:33 -0500 Received: from mail2out.cw.bc.ca ([207.23.159.109]:14472 helo=ironport1.cw.bc.ca) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a0zyz-0005uE-4J for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 17:53:32 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DKAQAmmFNW/yoeAQpeGQEBAQEPAQEBAYRNvx4BDYFlhg8CggEUAQEBAQEBAYEKhDQBAQEBAzpLBAIBCBEEAQEBHgkHHwIRFAkIAQEEARIIiBEDulUNhG0BAQEBAQEBAQEBAQEBAQEBAQEBAQEYhlaEfIJTggiDSYEVBY4UiDIKAQ6LLJZuh1MfAQGEZ1GFKwEBAQ X-IPAS-Result: A2DKAQAmmFNW/yoeAQpeGQEBAQEPAQEBAYRNvx4BDYFlhg8CggEUAQEBAQEBAYEKhDQBAQEBAzpLBAIBCBEEAQEBHgkHHwIRFAkIAQEEARIIiBEDulUNhG0BAQEBAQEBAQEBAQEBAQEBAQEBAQEYhlaEfIJTggiDSYEVBY4UiDIKAQ6LLJZuh1MfAQGEZ1GFKwEBAQ X-IronPort-AV: E=Sophos;i="5.20,338,1444719600"; d="scan'208";a="129516723" Received: from srvexht01.phsabc.ehcnet.ca ([10.1.30.42]) by ironport1.cw.bc.ca with ESMTP; 23 Nov 2015 14:52:53 -0800 Received: from VEXCCR01.phsabc.ehcnet.ca ([fe80::1491:c735:4d7d:86e7]) by SRVEXHT01.phsabc.ehcnet.ca ([::1]) with mapi; Mon, 23 Nov 2015 14:52:52 -0800 From: "Macdonald, Kim - BCCDC" To: 'Assaf Gordon' , "22001@debbugs.gnu.org" <22001@debbugs.gnu.org> Date: Mon, 23 Nov 2015 14:52:52 -0800 Subject: RE: bug#22001: Is it possible to tab separate concatenated files? Thread-Topic: bug#22001: Is it possible to tab separate concatenated files? 
Thread-Index: AdEmOqbNUc54S0gNT7KeIIUgKui4qgABclEg Message-ID: <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> In-Reply-To: <56538D04.50200@gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 22001 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

Thanks Assaf,

Sorry for the confusion - I wanted to add a tab (or even a new line) after each file that was concatenated. Actually a new line may be better.

For Example:
Concatenate the files like so:
>gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
>gi|452742846|ref|NZ_CAFD010000002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
>gi|452742846|ref|NZ_CAFD010000003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG

Right now - just using cat, they look like:
>gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT>gi|452742846|ref|NZ_CAFD010000002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC>gi|452742846|ref|NZ_CAFD010000003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG

Kim

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 18:09:37 2015 Received: (at 22001) by debbugs.gnu.org; 23 Nov 2015 23:09:37 +0000 Received: from localhost ([127.0.0.1]:50048 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a10Er-0006M3-2B for submit@debbugs.gnu.org; Mon, 23 Nov 2015 18:09:37 -0500 Received: from havoc.proulx.com ([96.88.95.61]:43971) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a10Eo-0006Lu-6a for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 18:09:34 -0500 Received: from joseki.proulx.com (localhost [127.0.0.1]) by havoc.proulx.com (Postfix) with ESMTP id 7637AB9C; Mon, 23 Nov 2015 16:09:33 -0700 (MST) Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 113F82192A; Mon, 23 Nov 2015 16:09:33 -0700 (MST) Received: by hysteria.proulx.com (Postfix, from userid 1000) 
id C86942DC54; Mon, 23 Nov 2015 16:09:32 -0700 (MST) Date: Mon, 23 Nov 2015 16:09:32 -0700 From: Bob Proulx To: "Macdonald, Kim - BCCDC" Subject: Re: bug#22001: Is it possible to tab separate concatenated files? Message-ID: <20151123160056992589691@bob.proulx.com> References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Score: -0.6 (/) X-Debbugs-Envelope-To: 22001 Cc: 22001@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.6 (/) Macdonald, Kim - BCCDC wrote: > Sorry for the confusion - I wanted to add a tab (or even a new line) > after each file that was concatenated. Actually a new line may be > better. 
>
> For Example:
> Concatenate the files like so:
> >gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
> >gi|452742846|ref|NZ_CAFD010000002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
> >gi|452742846|ref|NZ_CAFD010000003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG
>
> Right now - Just using cat, they look , like:
> >gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT>gi|452742846|ref|NZ_CAFD010000002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC>gi|452742846|ref|NZ_CAFD010000003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG

That example shows a completely different problem. It shows that your input plain text files have no terminating newline, making them officially not plain text files but binary files. Because every plain text line in a file must be terminated with a newline. If they are not then it isn't a text line. Must be binary.

Why isn't there a newline at the end of the file? Fix that and all of your problems and many others go away.

Getting ahead of things 1...

If you just can't fix the lack of a newline at the end of those files then you must handle it explicitly.

  for f in *.txt; do
    cat "$f"
    echo
  done

Getting ahead of things 2...

Sometimes people just want a separator between files. Actually 'tail' will already do this rather well.

  tail -n+0 *.txt
  ==> 1.txt <==
  foo

  ==> 2.txt <==
  bar

Bob

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 18:47:08 2015 Received: (at 22001) by debbugs.gnu.org; 23 Nov 2015 23:47:08 +0000 Received: from localhost ([127.0.0.1]:50055 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a10p9-0007O0-OA for submit@debbugs.gnu.org; Mon, 23 Nov 2015 18:47:08 -0500 Received: from mail-vk0-f50.google.com ([209.85.213.50]:36506) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a10p6-0007Nr-Pn for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 18:47:05 -0500 Received: by vkay187 with SMTP id y187so322021vka.3 for <22001@debbugs.gnu.org>; Mon, 23 Nov 2015 15:47:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-type:content-transfer-encoding; bh=KKBrPaaGVil9JvntAi4tJMlBCM0b+Wwd8lsVQThGhBk=; b=DPn8Hfo7IBQtjqtVa6DwEnn6YhaoELzzixaaxjRYn2/CO/4CsB5D9V1AeNy3eeR0bs lKlparOKAWCfBKrunFUWgzKFrYIfKnOCKbICv4FsoJ2sve39U/cCTSvbH4Yr4A9KvUNN hDkSCVFQE6PixJpwxN94oD6ZXs+gJXbCSma60QeiBkJ/04yjfulT+XJ2dt8WR5Fw5Usk 6DVOXeuib1YB5/MDj3BKY6dXJn8Isdg3WYO5Yiy+IuWy0lHHEFoU97OZQfcUCsHiCE0M Ecr6N/d0r96S5+68CjNuPD4gqDTvMT+aX4hP9vFNT6YIA3mp00760V0VlgCD6igfiTG0 hPzw== X-Received: by 10.31.11.204 with SMTP id 195mr22776391vkl.23.1448322423948; Mon, 23 Nov 2015 15:47:03 -0800 (PST) Received: from disco.erlich.nygenome.org ([69.74.14.178]) by smtp.googlemail.com with ESMTPSA id 188sm12653591vki.27.2015.11.23.15.47.03 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 23 Nov 2015 15:47:03 -0800 (PST) Subject: Re: bug#22001: Is it possible to tab separate concatenated files? 
To: Bob Proulx , "Macdonald, Kim - BCCDC" References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> <20151123160056992589691@bob.proulx.com> From: Assaf Gordon Message-ID: <5653A5A5.3070405@gmail.com> Date: Mon, 23 Nov 2015 18:47:49 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <20151123160056992589691@bob.proulx.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22001 Cc: 22001@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

Hello Kim,

On 11/23/2015 06:09 PM, Bob Proulx wrote:
> Macdonald, Kim - BCCDC wrote:
>> For Example:
>> Concatenate the files like so:
>>> gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
>>> gi|452742846|ref|NZ_CAFD010000002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
>>> gi|452742846|ref|NZ_CAFD010000003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG
>>
> That example shows a completely different problem.  It shows that your
> input plain text files have no terminating newline, making them
> officially not plain text files but binary files.

Based on the content of your files, I'm guessing that you are working with a mangled FASTA file. In that case, it is possible that fixing the original files might be more efficient than trying to amend them later on.

The original FASTA files likely looked like so:

    >gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequence
    TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

And I'm also guessing that with some script you've removed the ">" prefix and joined the two lines into one.

First,
I suggest ensuring the original files have unix-style new-lines (LF) and not windows style (CR-LF) or Mac-style (CR). The programs 'dos2unix' and 'mac2unix' would be able to fix it. Simply run the programs on each file; they will fix it in place. I would also recommend ensuring each file does end with a newline.

Second,
The FASTA id (the long text before your nucleotide sequence) contains spaces, and this will make downstream processing a bit of a pain. I would recommend trimming the FASTA identifier and keeping only the first part (since it contains your IDs, you should have no problem recovering the organism name later).

Example:

    $ cat 1.fa
    >gi|452742846|ref|NZ_CAFD010000001.1| Salmonella enterica subsp., whole genome shotgun sequence
    TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

    $ sed '/^>/s/ .*$//' 1.fa > 2.fa

    $ cat 2.fa
    >gi|452742846|ref|NZ_CAFD010000001.1|
    TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

Or do it in place for all your FA files (be sure to have a backup, though):

    for i in *.fa ; do sed -i '/^>/s/ .*$//' "$i" ; done

Third,
To combine and convert the files into a table (i.e. 1st column=ID, 2nd column=sequence), then, assuming all your sequences are short and contained on one line, the following would work:

    $ cat 2.fa
    >gi|452742846|ref|NZ_CAFD010000001.1|
    TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

    $ cat 3.fa
    >gi|452742846|ref|NZ_CAFD010000002.1|
    CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC

    $ cat *.fa | paste - - | sed 's/^>//' > final.txt

    $ cat final.txt
    gi|452742846|ref|NZ_CAFD010000001.1|	TTTCAGCATATATATAGGCCATCATACATAGCCATATAT
    gi|452742846|ref|NZ_CAFD010000002.1|	CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC

The 'final.txt' will be an easy-to-work-with tabular file.

Fourth,
If your FASTA files contain multi-lined long sequences, like so:

    >gi|452742846|ref|NZ_CAFD010000002.1|
    CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTAC
    GTCGACTGACGTCTGTACACCACACGTTGTGACGAGCATCGACTAGCATCAG
    TTGAGCGACATCATCAGCGACGAGATCACGAGCACTAGCACTACGACTACGA

You might consider using a specialized tool to convert them to a table, such as: http://manpages.ubuntu.com/manpages/trusty/man1/fasta_formatter.1.html (*) or http://kirill-kryukov.com/study/tools/fasta-formatter/ .

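
For multi-lined sequences, a plain awk sketch can also do the join when installing another tool is not an option. This is not how either tool linked above works; it is only an illustration, the function name fasta_to_table is made up, and it assumes LF line endings and header lines that start with ">".

```shell
# Hypothetical helper: print "ID<TAB>sequence" for each FASTA record,
# joining sequence lines that are wrapped over several lines.
fasta_to_table() {
  awk '
    /^>/ {
      # A new header: flush the previous record, if there was one.
      if (id != "") print id "\t" seq
      id = substr($1, 2)   # first word of the header, without the ">"
      seq = ""
      next
    }
    { seq = seq $0 }       # append every non-header line to the sequence
    END { if (id != "") print id "\t" seq }
  ' "$@"
}
```

Usage would look like 'fasta_to_table *.fa > final.txt'. Because only the first word of each header is kept, the ID trimming from the second step happens as a side effect.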
Hope this helps, - assaf (* shameless plug: I wrote fasta_formatter long ago) From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 23 19:05:13 2015 Received: (at 22001) by debbugs.gnu.org; 24 Nov 2015 00:05:13 +0000 Received: from localhost ([127.0.0.1]:50059 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a116e-0007sB-Mc for submit@debbugs.gnu.org; Mon, 23 Nov 2015 19:05:13 -0500 Received: from mail2out.cw.bc.ca ([207.23.159.109]:35083 helo=ironport1.cw.bc.ca) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a116K-0007rP-GG for 22001@debbugs.gnu.org; Mon, 23 Nov 2015 19:05:11 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DLAQCTqFNW/yseAQpeGQEBAQEPAQEBAYNeb78eAQ2BZSGFbgKCARQBAQEBAQEBgQqENAEBAQEDOj8MBAIBCBEEAQEBHgkHHwIRFAkIAQEEAQ0FCIgRAx+6JQ2EbQEBAQEBAQEBAQEBAQEBAQEBAQEBARiGVoR8glOBTjoygxeBFQWNHnaIMgoBDoUVhheDUUmSVINhg3IfAQGEZ1EBAYUpAQEB X-IPAS-Result: A2DLAQCTqFNW/yseAQpeGQEBAQEPAQEBAYNeb78eAQ2BZSGFbgKCARQBAQEBAQEBgQqENAEBAQEDOj8MBAIBCBEEAQEBHgkHHwIRFAkIAQEEAQ0FCIgRAx+6JQ2EbQEBAQEBAQEBAQEBAQEBAQEBAQEBARiGVoR8glOBTjoygxeBFQWNHnaIMgoBDoUVhheDUUmSVINhg3IfAQGEZ1EBAYUpAQEB X-IronPort-AV: E=Sophos;i="5.20,338,1444719600"; d="scan'208";a="129529352" Received: from srvexht02.phsabc.ehcnet.ca ([10.1.30.43]) by ironport1.cw.bc.ca with ESMTP; 23 Nov 2015 16:04:51 -0800 Received: from VEXCCR01.phsabc.ehcnet.ca ([fe80::1491:c735:4d7d:86e7]) by SRVEXHT02.phsabc.ehcnet.ca ([::1]) with mapi; Mon, 23 Nov 2015 16:04:50 -0800 From: "Macdonald, Kim - BCCDC" To: 'Assaf Gordon' , Bob Proulx Date: Mon, 23 Nov 2015 16:04:49 -0800 Subject: RE: bug#22001: Is it possible to tab separate concatenated files? Thread-Topic: bug#22001: Is it possible to tab separate concatenated files? 
Thread-Index: AdEmSVi96TJY1g4ZThGWPYqhb3izVgAAhnng Message-ID: <98F8BBDDC88408489C9C92296763678F02561A62AB2E@VEXCCR01.phsabc.ehcnet.ca> References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> <20151123160056992589691@bob.proulx.com> <5653A5A5.3070405@gmail.com> In-Reply-To: <5653A5A5.3070405@gmail.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: 22001 Cc: "22001@debbugs.gnu.org" <22001@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

Thanks so much!!! I'll try these out now

Kim

From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 26 18:53:34 2015 Received: (at submit) by debbugs.gnu.org; 26 Nov 2015 23:53:34 +0000 Received: from localhost ([127.0.0.1]:54756 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a26M0-0001vq-HJ for submit@debbugs.gnu.org; Thu, 26 Nov 2015 18:53:34 -0500 Received: from eggs.gnu.org ([208.118.235.92]:38366) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a26Lx-0001vh-Hh for submit@debbugs.gnu.org; Thu, 26 Nov 2015 18:53:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a26Lv-0005gB-Pr for submit@debbugs.gnu.org; Thu, 26 Nov 2015 18:53:29 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:56950) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a26Lv-0005g7-Mp for submit@debbugs.gnu.org; Thu, 26 Nov 2015 18:53:27 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48361) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a26Lu-0003wK-8K for bug-coreutils@gnu.org; Thu, 26 Nov 2015 18:53:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a26Lp-0005cr-Ko for bug-coreutils@gnu.org; Thu, 26 Nov 2015 18:53:26 -0500 Received: from ishtar.tlinx.org ([173.164.175.65]:50879 helo=Ishtar.hs.tlinx.org) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a26Lp-0005a7-9q for bug-coreutils@gnu.org; Thu, 26 Nov 2015 18:53:21 -0500 Received: from [192.168.4.12] (Athenae [192.168.4.12]) by Ishtar.hs.tlinx.org (8.14.9/8.14.4/SuSE Linux 0.8) with ESMTP id tAQNqkik018275; Thu, 26 Nov 2015 15:52:50 -0800 Message-ID: <56579B4E.6030102@tlinx.org> Date: Thu, 26 Nov 2015 15:52:46 -0800 From: Linda
Walsh User-Agent: Thunderbird MIME-Version: 1.0 To: Bob Proulx Subject: Re: bug#22001: Is it possible to tab separate concatenated files? References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> <20151123160056992589691@bob.proulx.com> In-Reply-To: <20151123160056992589691@bob.proulx.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit Cc: 22001@debbugs.gnu.org, "Macdonald, Kim - BCCDC" , bug-coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) Bob Proulx wrote: > > That example shows a completely different problem. It shows that your > input plain text files have no terminating newline, making them > officially[/sic/] not plain text files but binary files. > Because every plain > text line in a file must be terminated with a newline. ---- That's only a recent POSIX definition. It's not related to real life. When I looked for a text file definition on google, nothing was mentioned about needing a newline on the last line -- except on 1 site -- and that site was clearly not talking about 'text' files, but Unix-text-record files w/each record terminated by a NL char. On a mac, txt files have records separated by 'CR', and on DOS/Win, txt files have txt records separated by CRLF. Wikipedia quotes the Unicode definition of txt files -- which doesn't require the POSIX txt-record definition. 
Also POSIX limits txt format to 'LINE_MAX' bytes -- notice it says 'bytes' and not characters. Yet a unicode line of 256 characters can easily exceed 1024 bytes. Yet never in the history of the English language have lines been restricted to some number of bytes or characters.

But one could note that the POSIX definition ONLY refers to files -- not streams of TEXT (whatever the character set). Specifically, note that 'TEXT COLUMNS' describes text columns measured in column widths -- yet that conflicts with the definition of Text File, in that text files use 'bytes' for a maximum line length, while text columns use 'characters' (which can be 1-4 bytes in Unicode, UTF-8 or UTF-16 encoded).

Of specific note -- "text" composed of characters MUST support 'NUL' (as well as 'the audio bell' (control-g), the backspace (control-h), vertical tabs (U+000B), and form-feed (U+000C)). No standard definition outside POSIX includes any of those characters -- because text characters are supposed to be readable and visible. But POSIX compatibility claims that the Portable Character Set (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_01) must include those characters.

The 'text-files-must-have-NL' group ignores the POSIX 2008 definition of a portable character set -- but gloms onto the implied definition of a text line as part of a 'text file'. But as already noted, POSIX has conflicting definitions about what text is (Unicode, measured in chars/columns, or ASCII, measured in bytes).

But POSIX 2008 (same url as above) clearly states:

    A null character, NUL, which has all bits set to zero, shall be in the set of [supported] characters.

In all plain-text definitions, it is mentioned that 'text' is a set of displayable characters that can be broken into lines with the text-line separator definition. The last line of the file needs no separation character at the end of the line, as it doesn't need to be separated from anything.

The GNU standard should not limit itself to an *arcane* (and not well known outside of POSIX-fans) definition of text, as it makes text files created before 2008 potentially incompatible.

POSIX was supposed to be about portability... it certainly doesn't follow the internet-design-meme of "Accept input liberally, and generate output conservatively."

> If they are
> not then it isn't a text line. Must be binary.
>
---
Whereas I maintain that newlines are required to break plain-text into records -- but not at the end-of-file, since there is no record following.

> Why isn't there a newline at the end of the file? Fix that and all of
> your problems and many others go away.
>
---
Didn't used to be a requirement -- it was added because of a broken interpretation of the POSIX standard. Please remember that a posixified definition of 'X' (for any X) may not be the same as a real-life 'X'.

In this case, we have a file containing *text* by the POSIX def, which you claim doesn't meet the POSIX definition of "text file". It's similar to Orwellian-speak -- redefining common terms to mean something else, so people don't notice the requirement change, then later telling others to clean up their old input code/data that doesn't meet the newly created definition.

Text files have been around a lot longer than 8 years. POSIX disqualifies most text files, for example, those created on the most widely used laptop/desktop/commercial computer OS in the world (Windows).

I think what may be true is that 'POSIX text files' describe a data format that may not be how it is stored on disk.

I find it very interesting how 'NUL' is defined to be part of any POSIX text character set definition where such apps claim to support or process 'text'.

It's sad to see the GNU utils becoming less flexible and more restricted over time -- much like the trend in computers to steer the public away from general purpose processing (and computers that can do such), to a tightly controlled, walled garden where consumers are only allowed to do what the manufacturer tells them to do.

I suppose it's like the trend in US government that became federal law during the Nixon years (use of a product inconsistent with its labeling is a violation of federal law), whereas before, any usage that wasn't prohibited by local law was allowed. It is moving away from a free society with specific restrictions to a controlled society with specific, limited freedoms.

From debbugs-submit-bounces@debbugs.gnu.org Thu Nov 26 22:28:23 2015 Received: (at 22001) by debbugs.gnu.org; 27 Nov 2015 03:28:23 +0000 Received: from localhost ([127.0.0.1]:54945 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a29hv-00078x-7y for submit@debbugs.gnu.org; Thu, 26 Nov 2015 22:28:23 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37253) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a29hs-00078o-NP for 22001@debbugs.gnu.org; Thu, 26 Nov 2015 22:28:22 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (Postfix) with ESMTPS id CC9BBE7093; Fri, 27 Nov 2015 03:28:19 +0000 (UTC) Received: from [10.3.113.12] ([10.3.113.12]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tAR3SIo6017073; Thu, 26 Nov 2015 22:28:19 -0500 Subject: Re: bug#22001: Is it possible to tab separate concatenated files?
To: Linda Walsh , Bob Proulx References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> <20151123160056992589691@bob.proulx.com> <56579B4E.6030102@tlinx.org> From: Eric Blake Openpgp: url=http://people.redhat.com/eblake/eblake.gpg X-Enigmail-Draft-Status: N1110 Organization: Red Hat, Inc. Message-ID: <5657CDCD.70906@redhat.com> Date: Thu, 26 Nov 2015 20:28:13 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <56579B4E.6030102@tlinx.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="2e2brbwGg2m0jLXcSQkH1VSSLJV1Ka60n" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Spam-Score: -6.0 (------) X-Debbugs-Envelope-To: 22001 Cc: 22001@debbugs.gnu.org, kim.macdonald@bccdc.ca X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.0 (------) This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --2e2brbwGg2m0jLXcSQkH1VSSLJV1Ka60n Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

On 11/26/2015 04:52 PM, Linda Walsh wrote:

>> Because every plain
>> text line in a file must be terminated with a newline.
> ----
> That's only a recent POSIX definition. It's not related to
> real life. When I looked for a text file definition on google, nothing
> was mentioned about needing a newline on the last line -- except on
> 1 site -- and that site was clearly not talking about 'text' files, but
> Unix-text-record files w/each record terminated by a NL char.
>

Quit spreading FUD about POSIX.
That definition of text file is NOT a recent invention; even back in POSIX 2001 the definition read:

3.392 Text File

A file that contains characters organized into one or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline>. Although IEEE Std 1003.1-2001 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.

http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

That was POSIX Issue 6; the more recent POSIX Issue 7 corrected the definition to also allow a completely empty file to be considered as a text file. But the point is that POSIX has always required a text file to end in a newline.

> On a mac, txt files have records separated by 'CR', and on DOS/Win,
> txt files have txt records separated by CRLF.

And those systems aren't POSIX. So they aren't relevant to a discussion about POSIX.

>> Why isn't there a newline at the end of the file? Fix that and all of
>> your problems and many others go away.
>>
> ---
> Didn't used to be a requirement -- it was added because of a broken
> interpretation of the posix standard. Please remember that a posixified
> definition of 'X' (for any X), may not be the same as a real-live 'X'.

No, it has ALWAYS been a problem. Even 40 years ago, before POSIX was invented, the only PORTABLE way to use programs like sed was to use them on text files - namely, files where no line exceeded LINE_MAX bytes, where no lines contained NUL bytes, and where ALL lines ended in newline. That is because there were vendor implementations of sed (not GNU coreutils, mind you, but other vendors) that really were hardcoded to some rather small limits, and understandably so in a day when computers did not have as much memory as they do today.
POSIX just standardized existing practice on what formed a text file, when it came to existing Unix systems at that time. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --2e2brbwGg2m0jLXcSQkH1VSSLJV1Ka60n Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJWV83NAAoJEKeha0olJ0NqLNAH/3AzxmBtMsrgQyJy+dJj1rdk bdUX7TJX2JDtIeiYYKYqWnKbj628pW1vq8ocuOFj8798cfTHvKqNcdAE7X4UhAHI 0AwkU8reARXcEqDyPk1vkHFDEVJoqdaZuk5EttykO6BAJrEjcaMHLIXfCeKpLeoP vwAqsJGNSxPyrBylygz3pLhhI0ZEGEXGjcxk/dRIaSd/+2VB9LJzO7UcbuucR114 cPSdDKQyJ4t9DIr9zvZ9md8nAoDKLfcuPFeBCQcDM7VJlTvRI9uCZY2AsDwR7aeH quV3LlavYZt+zyfV3NOJnoyCxEY46WCCdtptKL0TD3wDUgXMOaSYna5JJ/vmcnY= =VbBa -----END PGP SIGNATURE----- --2e2brbwGg2m0jLXcSQkH1VSSLJV1Ka60n-- From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 27 03:22:34 2015 Received: (at 22001) by debbugs.gnu.org; 27 Nov 2015 08:22:34 +0000 Received: from localhost ([127.0.0.1]:55031 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a2EIc-0005vt-5l for submit@debbugs.gnu.org; Fri, 27 Nov 2015 03:22:34 -0500 Received: from mailgw1.uni-kl.de ([131.246.120.220]:48207) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a2EIH-0005vT-MW for 22001@debbugs.gnu.org; Fri, 27 Nov 2015 03:22:32 -0500 Received: from sushi.unix-ag.uni-kl.de (sushi.unix-ag.uni-kl.de [IPv6:2001:638:208:ef34:0:ff:fe00:65]) by mailgw1.uni-kl.de (8.14.4/8.14.4/Debian-7) with ESMTP id tAR8M82o020691 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT); Fri, 27 Nov 2015 09:22:08 +0100 Received: from sushi.unix-ag.uni-kl.de (ip6-localhost [IPv6:::1]) by sushi.unix-ag.uni-kl.de 
(8.14.4/8.14.4/Debian-4) with ESMTP id tAR8M7Ug011627 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 27 Nov 2015 09:22:07 +0100 Received: (from auerswal@localhost) by sushi.unix-ag.uni-kl.de (8.14.4/8.14.4/Submit) id tAR8M58i011625; Fri, 27 Nov 2015 09:22:05 +0100 Date: Fri, 27 Nov 2015 09:22:05 +0100 From: Erik Auerswald To: Eric Blake Subject: Re: bug#22001: Is it possible to tab separate concatenated files? Message-ID: <20151127082205.GA9914@unix-ag.uni-kl.de> References: <98F8BBDDC88408489C9C92296763678F02561A62AB28@VEXCCR01.phsabc.ehcnet.ca> <56538D04.50200@gmail.com> <98F8BBDDC88408489C9C92296763678F02561A62AB2D@VEXCCR01.phsabc.ehcnet.ca> <20151123160056992589691@bob.proulx.com> <56579B4E.6030102@tlinx.org> <5657CDCD.70906@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5657CDCD.70906@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 22001 Cc: 22001@debbugs.gnu.org, kim.macdonald@bccdc.ca, Linda Walsh , Bob Proulx X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, On Thu, Nov 26, 2015 at 08:28:13PM -0700, Eric Blake wrote: > On 11/26/2015 04:52 PM, Linda Walsh wrote: > > >> Because every plain > >> text line in a file must be terminated with a newline. > > ---- > > That's only a recent POSIX definition. It's not related to > > real life. When I looked for a text file definition on google, nothing > > was mentioned about needing a newline on the last line -- except on > > 1 site -- and that site was clearly not talking about 'text' files, but > > Unix-text-record files w/each record terminated by a NL char. > > > > Quit spreading FUD about POSIX. 
That definition of text file is NOT a
> recent invention; even back in POSIX 2001 the definition read:
>
> 3.392 Text File
>
> A file that contains characters organized into one or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline>. Although IEEE Std 1003.1-2001 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections.
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

At least the definition of a "line" is needed as well to understand the above (from the same URL):

3.205 Line

A sequence of zero or more non-<newline>s plus a terminating <newline>.

[...]
>
> No, it has ALWAYS been a problem. Even 40 years ago, before POSIX was
> invented, the only PORTABLE way to use programs like sed was to use it
> on text files [...]

The sed of Solaris 10 ignores trailing text after the last line, that is after the last newline. I am quite sure this behavior has been in older Solaris and SunOS versions as well.
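
Under the two definitions quoted above, bytes that follow the last newline do not form a line, which is why tools may treat them differently. For instance, `printf 'one\ntwo' | wc -l` reports 1, since wc counts newline characters. A minimal sketch for detecting and repairing such files follows; the function names ends_with_newline and add_final_newline are made up for illustration, and the check assumes the file's last byte is not NUL (shells handle NUL bytes in command substitution inconsistently).

```shell
#!/bin/sh
# Sketch: check for, and repair, a missing final newline.
# ends_with_newline and add_final_newline are hypothetical helper names.

# Succeeds if the file is empty or its last byte is a newline.
# Command substitution strips trailing newlines, so a final newline
# leaves an empty string behind.
ends_with_newline() {
    [ ! -s "$1" ] || [ -z "$(tail -c 1 "$1")" ]
}

# Appends a newline only when the file exists and lacks one.
add_final_newline() {
    [ -f "$1" ] || return 1
    ends_with_newline "$1" || printf '\n' >> "$1"
}
```

Running `for f in *.fa; do add_final_newline "$f"; done` before concatenating would modify the files in place, so a backup is advisable.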
Best regards, Erik -- http://www.unix-ag.uni-kl.de/~auerswal/ From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 24 17:21:36 2018 Received: (at control) by debbugs.gnu.org; 24 Oct 2018 21:21:36 +0000 Received: from localhost ([127.0.0.1]:40746 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gFQau-0006bh-CH for submit@debbugs.gnu.org; Wed, 24 Oct 2018 17:21:36 -0400 Received: from mail-pg1-f181.google.com ([209.85.215.181]:37673) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gFQas-0006bV-OW for control@debbugs.gnu.org; Wed, 24 Oct 2018 17:21:34 -0400 Received: by mail-pg1-f181.google.com with SMTP id c10-v6so2935404pgq.4 for ; Wed, 24 Oct 2018 14:21:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:message-id:date:user-agent:mime-version:content-language :content-transfer-encoding; bh=oLB/xLdQbWxMsw86tzjbRlsy48nx+PEvSeyhxpn3ho4=; b=lt2HoLPeAgc0w1jQjQWmXt455vvApZqqXobYD5fgYlZHR0Qxkh5fyhghUbViG6kdtH bBY0akfGZma9RNvquevJ/zBzB0mlQ7BYX/7kFcgEOB3cv9m5ak9gdhlLr/jaID3viPH9 x5qzW6gZxIQY9mOLCbdSD3F8MbFnxWNzpo6kP9yxcr1UDYkzxQNjj37PyBbYiP9pyiPz pQBr9WAc4WhKvtIC01/fN7p/ryEG/NWIuvWjAiws9/6RtNcy938yr6JS5SePyPrHKo9R UZUNGh6bcf6qt4l1MNabLhmm4fIcbYsfDWz7NskhCD6www1UgizE8oobk9rQztvEgGDo w+qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:message-id:date:user-agent:mime-version :content-language:content-transfer-encoding; bh=oLB/xLdQbWxMsw86tzjbRlsy48nx+PEvSeyhxpn3ho4=; b=cMatCHZ7D1qZ16xZpKzIV0hneHs0uux6ikaMlkEdmHOznWWAke5nz1IC+/VkMsG8OR s97IUS3M2HVptmhxuxkUCmXlt4r/j3pzjhT5UmIEc6wmgJWC69bqqhHLCe2j98Svg1rr hK7o6fAFiFKiNxB6K/cIDEnX3MmIsAYT4V4SYbdG6yNYor6/kUmqsI60ca2a79vkhixj Y+UJAx6byBUPtt2i4fWaZblyK7RaVijXjk9QIdBuXGp4dD0/KFhZMp6nOlMLZbmECzQh OtDDHGwnZDtGQ4kkGa5Asom57KcGmEdlMperMMhzVlsrjC3pNMP9GneWLMaeBlxCrq0F pB0A== X-Gm-Message-State: AGRZ1gJA14w1AbkhAuKRepPEdTJbrhgg4Wfz3HCNKKX/trfZ8ylNe+Rb 
ZihT3bniScM8NvKy8Vb6GeEgACPN5Vw= X-Google-Smtp-Source: AJdET5d+xFazGLJwJudSl3gngAPLo6dw1DW0nSjZhD23Rq8nAKeDkTHxVftIZeVpe5rZMQkXpRURfA== X-Received: by 2002:a65:4103:: with SMTP id w3-v6mr4060732pgp.284.1540416088595; Wed, 24 Oct 2018 14:21:28 -0700 (PDT) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id e12-v6sm5969037pgs.92.2018.10.24.14.21.27 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 24 Oct 2018 14:21:27 -0700 (PDT) To: control@debbugs.gnu.org From: Assaf Gordon Message-ID: <5d3df8cb-b982-3195-8bd7-91abf2701ff0@gmail.com> Date: Wed, 24 Oct 2018 15:21:26 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: 2.0 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: tags 22001 notabug close 22001 [...] 
Content analysis details: (2.0 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (assafgordon[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.215.181 listed in wl.mailspike.net] -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.215.181 listed in list.dnswl.org] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 1.8 MISSING_SUBJECT Missing Subject: header 0.2 NO_SUBJECT Extra score for no subject X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) tags 22001 notabug close 22001 From unknown Sun Jun 22 11:47:18 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 22 Nov 2018 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator