GNU bug report logs - #51792
coreutils - csplit - feature request

Package: coreutils

Reported by: Rodolfo Aramayo <raramayo <at> tamu.edu>

Date: Fri, 12 Nov 2021 17:08:02 UTC

Severity: wishlist

Message #8 received at 51792 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Rodolfo Aramayo <raramayo <at> tamu.edu>, 51792 <at> debbugs.gnu.org
Subject: Re: bug#51792: coreutils - csplit - feature request
Date: Fri, 12 Nov 2021 18:23:37 +0000

On 12/11/2021 17:05, Rodolfo Aramayo wrote:
> Dear Coreutils Maintainers,
> 
> First, thank you for your work. I use coreutils daily both for my research
> and teaching. It is a great set of tools.
> 
> Second, I recently needed to extract Coding Sequences information from a
> GenBank file. GenBank files are used in Computational
> Genomics/Bioinformatics extensively. I used csplit, and it works like a
> charm.
> 
> The command I used is:
> 
> csplit -sz -n 5 --prefix=02_ 01_00001
> /[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]CDS[[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]][[:space:]]/
> {*};
> 
> I was unable to declare: "[[:space:]]\+" as I expected for POSIX aware code.
> 
> My question is: Is csplit POSIX compatible? and if it is not, can we make
> it POSIX compatible?


Well, POSIX defines both BRE and ERE, and csplit supports the former.
From the code we have:

  re_syntax_options =
    RE_SYNTAX_POSIX_BASIC & ~RE_CONTEXT_INVALID_DUP & ~RE_NO_EMPTY_RANGES;

Generally one can replace the '+' operator from ERE with '\{1,\}' in BRE.
So you'd be using something like:

  [[:space:]]\{1,\}CDS[[:space:]]\{1,\}
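
As a rough sketch of how that might look end to end, assuming GNU csplit
and GNU grep: the input below is a made-up stand-in for a GenBank feature
table (hypothetical contents, not real data), reusing the file name and
prefix from the original command. The pattern is single-quoted so the shell
passes the backslashes through to csplit untouched.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input; only the layout of the CDS lines matters here.
printf '%s\n' \
  'LOCUS       DEMO' \
  '     gene            1..90' \
  '     CDS             1..90' \
  '                     /product="first"' \
  '     gene            100..190' \
  '     CDS             100..190' \
  '                     /product="second"' \
  > 01_00001

# The BRE interval \{1,\} means "one or more", like '+' in ERE.
# Each piece after the first starts at a line matching the pattern.
csplit -sz -n 5 --prefix=02_ 01_00001 \
  '/[[:space:]]\{1,\}CDS[[:space:]]\{1,\}/' '{*}'

# Count the pieces that csplit wrote.
pieces=$(ls 02_* | wc -l)

# Cross-check with grep: the BRE and ERE spellings should select the same lines.
bre=$(grep -c '[[:space:]]\{1,\}CDS[[:space:]]\{1,\}' 01_00001)
ere=$(grep -Ec '[[:space:]]+CDS[[:space:]]+' 01_00001)

echo "pieces=$pieces bre_matches=$bre ere_matches=$ere"
```

Comparing the two grep counts is just a quick way to confirm that the BRE
interval form matches the same lines as the ERE '+' form on your own data
before running csplit on it.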

We might add an option to use ERE, though I don't think there is a big
need for that in csplit use cases.

cheers,
Pádraig




This bug report was last modified 3 years and 293 days ago.


