GNU bug report logs - #6277
cut: Please add CSV parsing

Package: coreutils;

Reported by: sandy bas <basic207 <at> gmail.com>

Date: Thu, 27 May 2010 03:03:02 UTC

Severity: wishlist

Tags: wontfix

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6277 in the body.
You can then email your comments to 6277 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6277; Package coreutils. (Thu, 27 May 2010 03:03:02 GMT)

Acknowledgement sent to sandy bas <basic207 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 27 May 2010 03:03:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: sandy bas <basic207 <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: cut
Date: Wed, 26 May 2010 23:00:08 -0400

Dear People:

Comma delimited files often have fields of the form "big,black,bear" where the
commas within the quotes are not delimiters. A useful option in cut would
 be to ignore the commas (delimiters) within the quotation marks.

  I would be glad to put it in if you would like the option.

 Thank you for all of the work that you do.

   sandy




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6277; Package coreutils. (Thu, 27 May 2010 15:12:01 GMT)

Message #8 received at 6277 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: sandy bas <basic207 <at> gmail.com>
Cc: 6277 <at> debbugs.gnu.org
Subject: Re: bug#6277: cut
Date: Thu, 27 May 2010 16:08:35 +0100

On 27/05/10 04:00, sandy bas wrote:
> Dear People:
> 
> Comma delimited files often have fields of the form "big,black,bear" where the
> commas within the quotes are not delimiters. A useful option in cut would
>  be to ignore the commas (delimiters) within the quotation marks.
> 
>   I would be glad to put it in if you would like the option.

Hmm, the CSV format is a bit more complicated than that,
since it also has to support " and \n within fields.
It would be more general, I think, to have a separate tool
that parses CSV into a format more easily usable on the shell,
which in turn could be passed to cut -d, column -s, ...
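
As a rough illustration of that idea (a sketch only, not the csvutils tool mentioned below), a small Python function could parse CSV with the standard csv module and emit tab-separated lines that line-oriented tools can handle. The function name csv_to_tsv and the backslash escaping scheme are assumptions made for this example.

```python
import csv
import io


def csv_to_tsv(text):
    """Convert CSV text into tab-separated lines, one record per line.

    Tabs, newlines and backslashes inside a field are written as the
    two-character sequences \\t, \\n and \\\\ (an escaping convention
    chosen for this sketch), so every record stays on a single line.
    """
    out = []
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines that appear inside quoted fields.
    for row in csv.reader(io.StringIO(text, newline="")):
        cleaned = [
            field.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
            for field in row
        ]
        out.append("\t".join(cleaned))
    return "\n".join(out) + ("\n" if out else "")
```

Wrapped in a script that reads standard input, output of this kind could then be piped to cut -f2, because a comma or newline inside a quoted field no longer looks like a delimiter.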

Aha, the csvutils command from here seems to do this:
http://freshmeat.net/projects/csvutils

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6277; Package coreutils. (Thu, 27 May 2010 15:28:01 GMT)

Message #11 received at 6277 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: sandy bas <basic207 <at> gmail.com>
Cc: 6277 <at> debbugs.gnu.org
Subject: Re: bug#6277: cut: Please add CSV parsing
Date: Thu, 27 May 2010 09:27:00 -0600

retitle 6277 cut: Please add CSV parsing
tags 6277 + wishlist wontfix
thanks

sandy bas wrote:
> Comma delimited files often have fields of the form "big,black,bear"
> where the commas within the quotes are not delimiters. A useful
> option in cut would be to ignore the commas (delimiters) within the
> quotation marks.
> 
> I would be glad to put it in if you would like the option.

Parsing CSV files is deceptively more complicated than it looks.

Using the Perl Text::CSV module as a guide shows that the result would
add several thousand lines of code.  This would fall under the
category of creeping featurism and code bloat because it would
significantly enlarge the code base of the 'cut' program well beyond
its traditional role as a simple cut by field program.

And if CSV parsing were allowed in, then by the same reasoning wouldn't
parsing of other file formats have to be allowed in as well?  Plus the
coreutils are the core utilities that belong on every machine in the universe.  Does my
toaster need this capability?  Large items like this really should go
into a differently named program.  It isn't just the use of the
program on a fully loaded desktop but also the use of the program
across the entire universe of machines.

I am sorry but full CSV parsing really doesn't belong in cut.

I suggest that you use Perl, Python or Ruby for CSV processing.  They
include full libraries for dealing with the many varied details of CSV
handling.

Something like the following is a simple example perl script to print
only the second field of a CSV file.

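  #!/usr/bin/env perl
  use strict;
  use Text::CSV;
  # binary => 1 lets quoted fields contain newlines and other special bytes
  my $csv = Text::CSV->new({ binary => 1 });
  foreach my $filename (@ARGV) {
      open(my $fh, '<', $filename) or die "Error opening $filename: $!\n";
      # getline() reads one whole CSV record, even if it spans several lines
      while (my $row = $csv->getline($fh)) {
          print($row->[1], "\n");  # print second field
      }
      # getline() returns undef at end of file and also on a parse error
      $csv->eof or die("Error parsing $filename: " . $csv->error_diag() . "\n");
      close($fh);
  }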
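
For comparison, a roughly equivalent sketch in Python, one of the other languages suggested above, could use the standard csv module. The helper name second_fields is made up for this example. Unlike the Perl script, it skips records that have fewer than two fields instead of printing an empty line.

```python
import csv
import sys


def second_fields(lines):
    """Yield the second field of every CSV record that has one."""
    for row in csv.reader(lines):
        if len(row) > 1:
            yield row[1]


if __name__ == "__main__":
    for filename in sys.argv[1:]:
        # newline="" lets the csv module handle newlines inside quoted fields
        with open(filename, newline="") as f:
            for field in second_fields(f):
                print(field)
```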

Bob




Changed bug title to 'cut: Please add CSV parsing' from 'cut' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 27 May 2010 15:28:02 GMT)

Added tag(s) wontfix. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 27 May 2010 15:28:02 GMT)

Severity set to 'wishlist' from 'normal' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 27 May 2010 16:08:02 GMT)

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6277; Package coreutils. (Fri, 28 May 2010 07:03:02 GMT)

Message #20 received at 6277 <at> debbugs.gnu.org:

From: "Voelker, Bernhard" <bernhard.voelker <at> siemens-enterprise.com>
To: Bob Proulx <bob <at> proulx.com>,
    sandy bas <basic207 <at> gmail.com>
Cc: "6277 <at> debbugs.gnu.org" <6277 <at> debbugs.gnu.org>
Subject: RE: bug#6277: cut: Please add CSV parsing
Date: Fri, 28 May 2010 08:46:29 +0200

Bob Proulx wrote:

> sandy bas wrote:
>> Comma delimited files often have fields of the form "big,black,bear"
>> where the commas within the quotes are not delimiters. A useful
>> option in cut would be to ignore the commas (delimiters) within the
>> quotation marks.
>> 
>> I would be glad to put it in if you would like the option.

> I suggest that you use Perl, Python or Ruby for CSV processing.  They
> include full libraries for dealing with the many varied details of CSV
> handling.

just to mention another classic UNIX tool: awk

	awk -F, '$1 ~ /^big$/ { print $2,$3 }' csv.txt
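
Note that -F, splits on every comma, including commas inside quotation marks, so a one-liner like this suits files without quoted commas. The difference can be sketched in Python (the sample line is made up for this example):

```python
import csv

line = 'big,"big,black,bear",brown'

# Plain comma splitting, which is what a fixed delimiter sees
naive = line.split(",")
# Quote-aware parsing with the standard csv module
parsed = next(csv.reader([line]))

print(naive)   # ['big', '"big', 'black', 'bear"', 'brown']
print(parsed)  # ['big', 'big,black,bear', 'brown']
```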

Have a nice day,
Berny






Reply sent to Bob Proulx <bob <at> proulx.com>:
You have taken responsibility. (Mon, 07 Jun 2010 23:09:02 GMT)

Notification sent to sandy bas <basic207 <at> gmail.com>:
bug acknowledged by developer. (Mon, 07 Jun 2010 23:09:02 GMT)

Message #25 received at 6277-close <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: sandy bas <basic207 <at> gmail.com>
Cc: 6277-close <at> debbugs.gnu.org
Subject: Re: bug#6277: cut: Please add CSV parsing
Date: Mon, 7 Jun 2010 17:08:33 -0600

Hi Sandy,

I am happy that you are satisfied with the responses.  I am going to
close the bug ticket in the bug tracking system with this message.
Please feel free to respond and add additional information if
desired.  The group will all see it and the ticket will keep track of
it.

Bob




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 06 Jul 2010 11:24:04 GMT)

This bug report was last modified 15 years and 44 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.