GNU bug report logs - #6780
Add cut multi-character/expression delimiters

Package: coreutils;

Reported by: Bill <bill3 <at> uniserve.com>

Date: Mon, 2 Aug 2010 15:56:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 6780 AT debbugs.gnu.org.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6780; Package coreutils. (Mon, 02 Aug 2010 15:56:02 GMT)

Acknowledgement sent to Bill <bill3 <at> uniserve.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 02 Aug 2010 15:56:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Bill <bill3 <at> uniserve.com>
To: bug-coreutils <at> gnu.org
Subject: Problem with the cut command
Date: Mon, 02 Aug 2010 05:56:31 -0700
Hello,

I'm not sure if this is a bug, a question or a feature request,
but there is a problem with the cut command, specifically with
its delimiter option '-d'.

In older times disk space was scarce and every byte was 
conserved. Fields in data files were delimited with a single
character such as ':'. This practice continues today. But 
sometimes it does not and fields in some files are separated
with multiple characters. Space is no longer precious.

Suppose I wish to import information about a disk partition
into my backup script. I want to assign the type of filesystem
to a variable. Compare the output of these two commands.

cat /etc/fstab |grep home | cut -d ' ' -f3
yields a blank output line

cat /etc/fstab |grep opt | awk -F " " '{print $3}'
yields the desired output - reiserfs.

The problem is that the cut command can't handle multiple 
instances of the same delimiter. It's designed to handle
a single character like ':', but can't cope with repeating
characters like '::' or a series of spaces as in /etc/fstab.
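
A minimal sketch of the behaviour described above, using a made-up
fstab-style line (the device name and mount point are assumptions, not
taken from this report). Every space counts as a delimiter for cut, so
a run of spaces yields empty fields; squeezing the run with tr -s first
is one possible workaround.

```shell
# Hypothetical fstab-style line with runs of spaces between columns.
line='/dev/sda3   /home   reiserfs   defaults   1 2'

# cut treats each space as its own delimiter, so field 3 is empty here.
cut_raw=$(printf '%s\n' "$line" | cut -d ' ' -f3)

# Squeezing repeated spaces into one makes field 3 the filesystem type.
cut_squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f3)

# awk's default field splitting already collapses runs of blanks.
awk_field=$(printf '%s\n' "$line" | awk '{print $3}')

printf 'cut: [%s]\ntr+cut: [%s]\nawk: [%s]\n' \
    "$cut_raw" "$cut_squeezed" "$awk_field"
```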

So my question is shouldn't the cut delimiter handle 
multiple instances of the same character internally or 
failing that, shouldn't there be some way of specifying a 
series of single delimiter characters such as -d':'+  ?

I hope this is useful feedback and look forward to your reply.

	Bill McGrath






Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6780; Package coreutils. (Mon, 02 Aug 2010 19:10:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Davide Brini <dave_br <at> gmx.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#6780: Problem with the cut command
Date: Mon, 2 Aug 2010 19:56:43 +0100
On Mon, 02 Aug 2010 05:56:31 -0700 Bill <bill3 <at> uniserve.com> wrote:

> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
> 
> In older times disk space was scarce and every byte was 
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practice continues today. But 
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.
> 
> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
> 
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line
> 
> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.
> 
> The problem is that the cut command can't handle multiple 
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.
> 
> So my question is shouldn't the cut delimiter handle 
> multiple instances of the same character internally or 
> failing that, shouldn't there be some way of specifying a 
> series of single delimiter characters such as -d':'+  ?

cut is required by POSIX to treat every separator character as delimiting a
field. 

"Output fields shall be separated by a single occurrence of the field
delimiter character."
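
As a rough illustration of that rule, the sketch below runs cut on an
arbitrary sample string. Each ':' separates two fields, so adjacent
colons enclose an empty field, and selected fields are joined with a
single ':' on output.

```shell
# 'a::b:c' has four fields for cut: "a", "", "b" and "c".
second=$(printf 'a::b:c\n' | cut -d: -f2)
third=$(printf 'a::b:c\n' | cut -d: -f3)

# Fields 1-3 are written back with one ':' between each pair.
first_three=$(printf 'a::b:c\n' | cut -d: -f1-3)

printf 'f2=[%s] f3=[%s] f1-3=[%s]\n' "$second" "$third" "$first_three"
```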

However, what you suggest might be implemented as an extension, which the
user would have to enable explicitly (although I wouldn't bet that the
maintainers think this is a good idea; I may be wrong).

On a side note, you mention awk, which happens to work fine in your specific
example with a space as the separator. However, a single space is
special-cased in awk; with any other single-character separator, awk works
exactly like cut:

echo 'a::b:c' | awk -F':' '{print "-"$1"--"$2"--"$3"--"$4"-"}'
-a----b--c-

Note the empty second field. But of course in awk, unlike cut, you can say
-F ':+' and get the behavior you want.
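
A small sketch of the difference, assuming an awk that follows the usual
rule that a field separator longer than one character is treated as a
regular expression:

```shell
# A single-character FS is taken literally, so the empty field is kept.
literal=$(echo 'a::b:c' | awk -F':' '{print NF}')

# ':+' is a regular expression matching a run of one or more colons.
regex=$(echo 'a::b:c' | awk -F':+' '{print NF ": " $1 " " $2 " " $3}')

echo "$literal"
echo "$regex"
```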

-- 
D.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6780; Package coreutils. (Mon, 02 Aug 2010 20:31:02 GMT)

Message #11 received at 6780 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Bill <bill3 <at> uniserve.com>
Cc: 6780 <at> debbugs.gnu.org
Subject: Re: bug#6780: Problem with the cut command
Date: Mon, 2 Aug 2010 14:30:24 -0600
tags 6780 + wishlist
retitle 6780 Add cut multi-character/expression delimiters
thanks

Bill wrote:
> I'm not sure if this is a bug, a question or a feature request,
> but there is a problem with the cut command, specifically with
> its delimiter option '-d'.
> 
> In older times disk space was scarce and every byte was 
> conserved. Fields in data files were delimited with a single
> character such as ':'. This practice continues today. But 
> sometimes it does not and fields in some files are separated
> with multiple characters. Space is no longer precious.

Sure.  But I think none of that is relevant to changing stable program
interfaces and behavior.  It is a good argument for creating a new
program that has no legacy, however.  The world is wide open for adding
new programs.  Feel free to go for it there.

> Suppose I wish to import information about a disk partition
> into my backup script. I want to assign the type of filesystem
> to a variable. Compare the output of these two commands.
> 
> cat /etc/fstab |grep home | cut -d ' ' -f3
> yields a blank output line

It is data dependent.  The output depends upon what you have as input.
For some files it would be one way and for others a different way.
But that just points out that cut is the wrong tool for the task.  As
you are well aware from your note, cut works with single-character
delimiters.  But fstab may separate fields with runs of whitespace.
This makes cut an inappropriate tool for the job.

> cat /etc/fstab |grep opt | awk -F " " '{print $3}'
> yields the desired output - reiserfs.

Awk is a much better tool for the task.  But the inefficiencies
present in that command line are many.  There are much better ways.
Try this instead.

  awk '/opt/{print$3}' /etc/fstab

However that doesn't account for comments that may also match.  To
avoid problems comments should be removed first.

  awk '/#/{gsub("#.*","")}/opt/{print$3}' /etc/fstab

And I am inclined to say that it is better to just match on a
particular field.

  awk '/#/{gsub("#.*","")}$2=="/opt"{print$3}' /etc/fstab
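
As a rough check of the commands above, the sketch below runs the first
and the last of them against a small made-up fstab written to a temporary
file. The device names and the commented-out entry are assumptions for
illustration only.

```shell
# Hypothetical fstab contents; devices and mount points are made up.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
# /dev/sda9  /opt  ext3  defaults  1 2
/dev/sda3    /home    reiserfs    defaults    1 2
/dev/sda5    /opt     reiserfs    defaults    1 2   # current entry
EOF

# Plain pattern match: the commented-out line matches as well.
naive=$(awk '/opt/{print $3}' "$fstab")

# Strip comments first, then compare the mount point field exactly.
fstype=$(awk '/#/{gsub("#.*","")} $2=="/opt"{print $3}' "$fstab")

printf 'naive:\n%s\nfield match: %s\n' "$naive" "$fstype"
rm -f "$fstab"
```

With this input the plain pattern also prints a field from the comment
line, while the field comparison prints only the filesystem type.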

> The problem is that the cut command can't handle multiple 
> instances of the same delimiter. It's designed to handle
> a single character like ':', but can't cope with repeating
> characters like '::' or a series of spaces as in /etc/fstab.

All correct.  The cut command is not the appropriate tool for your
task.

> So my question is shouldn't the cut delimiter handle 
> multiple instances of the same character internally or 
> failing that, shouldn't there be some way of specifying a 
> series of single delimiter characters such as -d':'+  ?

In my opinion no, it should not.  It is feature creep and code bloat.
Cut is not just used on large servers and large desktops but also on
wristwatches and toaster ovens.  Should the increase in size be
multiplied by every system in the known universe?  And even if this
feature were added to cut, the program would still be insufficient for
the task since it has no capability to handle comments or line
selection (although your combination with grep is fine with me, good
in fact though sed would be better since it enables checking return
status).  Furthermore the feature is already implemented and fully
supported by awk.  Using awk is a much better fit than using cut.  The
solution already exists in awk and therefore is not needed in cut.
The awk program is standardized and portable.  To me awk is the best
in class tool for this task.
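
For a backup script that needs both the value and a return status, one
possible shape is sketched below. The function name fstype_of and its
optional file argument are hypothetical, chosen only for this example.

```shell
# fstype_of MOUNTPOINT [FILE]: print the filesystem type of MOUNTPOINT.
# Hypothetical helper; FILE defaults to /etc/fstab.
fstype_of() {
    awk -v mp="$1" '
        { sub(/#.*/, "") }                      # drop comments
        $2 == mp { print $3; found = 1; exit }  # first matching entry
        END { exit !found }                     # non-zero if none matched
    ' "${2:-/etc/fstab}"
}

# Possible use in a script:
#   fs=$(fstype_of /opt) || echo "no fstab entry for /opt" >&2
```

The exit status comes from awk itself, so the caller can test it without
a separate grep.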

Bob




Changed bug title to 'Add cut multi-character/expression delimiters' from 'Problem with the cut command' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Mon, 02 Aug 2010 20:31:02 GMT)

Severity set to 'wishlist' from 'normal' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Mon, 02 Aug 2010 20:42:02 GMT)

This bug report was last modified 15 years and 21 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.