From unknown Sun Jun 22 11:33:56 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#51792 <51792@debbugs.gnu.org>
To: bug#51792 <51792@debbugs.gnu.org>
Subject: Status: coreutils - csplit - feature request
Reply-To: bug#51792 <51792@debbugs.gnu.org>
Date: Sun, 22 Jun 2025 18:33:56 +0000

retitle 51792 coreutils - csplit - feature request
reassign 51792 coreutils
submitter 51792 Rodolfo Aramayo
severity 51792 wishlist
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 12 12:07:28 2021
MIME-Version: 1.0
From: Rodolfo Aramayo
Date: Fri, 12 Nov 2021 11:05:09 -0600
Subject: coreutils - csplit - feature request
To: bug-coreutils@gnu.org
Content-Type: multipart/alternative; boundary="0000000000004b29c005d09a78f3"

--0000000000004b29c005d09a78f3
Content-Type: text/plain; charset="UTF-8"

Dear Coreutils Maintainers,

First, thank you for your work. I use coreutils daily both for my research
and teaching. It is a great set of tools.

Second, I recently needed to extract Coding Sequences information from a
GenBank file. GenBank files are used in Computational
Genomics/Bioinformatics extensively. I used csplit, and it works like a
charm.

The command I used is:

csplit -sz -n 5 --prefix=02_ 01_00001 /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/ {*};

I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.

My question is: Is csplit POSIX compatible? and if it is not, can we make
it POSIX compatible?

Many Thanks

Rodolfo

--
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University

--0000000000004b29c005d09a78f3
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--0000000000004b29c005d09a78f3--

From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 12 13:23:48 2021
Date: Fri, 12 Nov 2021 18:23:37 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Thunderbird/95.0
Subject: Re: bug#51792: coreutils - csplit - feature request
Content-Language: en-US
To: Rodolfo Aramayo, 51792@debbugs.gnu.org
From: Pádraig Brady
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 12/11/2021 17:05, Rodolfo Aramayo wrote:
> Dear Coreutils Maintainers,
>
> First, thank you for your work. I use coreutils daily both for my research
> and teaching. It is a great set of tools.
>
> Second, I recently needed to extract Coding Sequences information from a
> GenBank file. GenBank files are used in Computational
> Genomics/Bioinformatics extensively. I used csplit, and it works like a
> charm.
>
> The command I used is:
>
> csplit -sz -n 5 --prefix=02_ 01_00001
> /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/
> {*};
>
> I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.
>
> My question is: Is csplit POSIX compatible? and if it is not, can we make
> it POSIX compatible?

Well, POSIX defines BRE and ERE, with csplit supporting the former.
From the code we have:

   re_syntax_options =
     RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;

Generally one can replace the '+' functionality from ERE with '\{1,\}' in BRE.
So you'd be using something like:

   [[:space:]]\{1,\}CDS[[:space:]]\{1,\}

We might add an option to use ERE, though I don't think there is a big need
for that in csplit use cases.

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 17 15:07:18 2021
MIME-Version: 1.0
From: Rodolfo Aramayo
Date: Wed, 17 Nov 2021 13:33:24 -0600
Subject: Re: bug#51792: coreutils - csplit - feature request
To: Pádraig Brady
Cc: Rodolfo Aramayo, 51792@debbugs.gnu.org
Content-Type: multipart/alternative; boundary="000000000000af3c5d05d1011fab"

--000000000000af3c5d05d1011fab
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Pádraig,

Thank you for your response.

Unfortunately, even the command pattern you are proposing as an alternative:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}

does not work; therefore I have to conclude that csplit is neither BRE nor
ERE compatible.

Thanks for your help

R

On Fri, Nov 12, 2021 at 12:23 PM Pádraig Brady wrote:

> Well POSIX defines BRE and ERE, with csplit supporting the former.
> From the code we have:
>
>    re_syntax_options =
>      RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;
>
> Generally one can replace '+' functionality from ERE, with '\{1,\}' in BRE.
> So you'd be using something like:
>
>    [[:space:]]\{1,\}CDS[[:space:]]\{1,\}
>
> We might add an option to use ERE, though there isn't a big need
> for that I think for csplit use cases.
>
> cheers,
> Pádraig

--
Dr. Rodolfo Aramayo, PhD
Faculty of Biology and Genetics
Department of Biology, Texas A&M University
PeerJ - the Journal of Life & Environmental Sciences
Academic Editor
peerj.com/RodolfoAramayo

--000000000000af3c5d05d1011fab
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000af3c5d05d1011fab--
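
The BRE/ERE equivalence described in the reply above ('+' in ERE versus '\{1,\}' in BRE) can be checked with a short shell sketch. It assumes GNU csplit and a POSIX grep. The file sample.gb, its contents, and the variable names are hypothetical stand-ins for the GenBank data, which the thread does not include. The last step illustrates one possible reason the '\{1,\}' form could appear not to work, namely passing it to the shell unquoted. The thread does not confirm that this is what happened.

```shell
#!/bin/sh
# Sketch only: sample.gb is made-up data, not the file from the thread.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > sample.gb <<'EOF'
FEATURES             Location/Qualifiers
     gene            1..100
     CDS             1..100
                     /product="first"
     gene            200..300
     CDS             200..300
                     /product="second"
EOF

# The BRE interval \{1,\} and the ERE operator + both mean "one or more".
bre='[[:space:]]\{1,\}CDS[[:space:]]\{1,\}'
ere='[[:space:]]+CDS[[:space:]]+'

bre_count=$(grep -c "$bre" sample.gb)      # grep defaults to BRE
ere_count=$(grep -E -c "$ere" sample.gb)   # -E selects ERE

# csplit takes a BRE; quoting keeps the backslashes away from the shell.
csplit -sz -n 5 --prefix=02_ sample.gb "/$bre/" '{*}'
piece_count=$(ls 02_* | wc -l)

# Unquoted, the shell strips the backslashes before the program sees them,
# so the interval arrives as the literal characters {1,}.
unquoted=$(printf '%s' [[:space:]]\{1,\}CDS)

echo "BRE matches: $bre_count, ERE matches: $ere_count"
echo "csplit pieces: $piece_count"
echo "unquoted pattern as received: $unquoted"
```

Keeping the pattern in a single-quoted variable gives grep and csplit the same definition and avoids retyping the backslashes on the command line.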