GNU bug report logs - #13301
patch to preserve field order in cut


Package: coreutils

Reported by: Brad Cater <bradcater <at> gmail.com>

Date: Fri, 28 Dec 2012 23:27:02 UTC

Severity: normal

Tags: notabug

Merged with 6394, 9507

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13301 in the body.
You can then email your comments to 13301 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#13301; Package coreutils. (Fri, 28 Dec 2012 23:27:02 GMT)

Acknowledgement sent to Brad Cater <bradcater <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 28 Dec 2012 23:27:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Brad Cater <bradcater <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: patch to preserve field order in cut
Date: Fri, 28 Dec 2012 17:19:32 -0500
Hello

I found that

echo "a,b,c" | cut -d"," -f1,2

gives the same result as

echo "a,b,c" | cut -d"," -f2,1

This means that it's necessary to use another process to re-order columns.
I have written a patch for cut.c included in coreutils-8.20 (
http://ftp.gnu.org/gnu/coreutils/coreutils-8.20.tar.xz) that adds a -p
option to preserve field order. This means that doing

echo "a,b,c" | cut -d"," -f2,1

still gives

a,b

but

echo "a,b,c" | cut -d"," -f2,1 -p

gives

b,a

The current implementation of cut.c uses putchar so that a full line need
not be held in memory, whereas holding a full line is required to re-order
the fields rather than printing them from an input stream. This patch uses
putchar when -p is not used, but it buffers a full line when -p is used.
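
The buffering trade-off described above can be illustrated outside of C. The following is only a shell sketch, not the attached patch: the helper name `swap_first_two` is hypothetical, and it handles just the two-field case. It shows that each line must be read in full before the first reordered field can be written.

```shell
# Hypothetical helper (not part of the patch): swap the first two
# comma-separated fields of every input line.
swap_first_two() {
  while IFS=, read -r first second rest; do
    # The whole line has already been read into variables here, which is
    # the per-line buffering that reordering requires.
    printf '%s,%s\n' "$second" "$first"
  done
}

swapped=$(printf 'a,b,c\nx,y,z\n' | swap_first_two)
printf '%s\n' "$swapped"
```

By contrast, writing fields in input order needs no such buffer, since each character can be emitted or skipped as soon as it is read.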

What should I do next? I would like to have someone more experienced than I
evaluate the changes so that I can improve them and add this functionality
to coreutils.

Thank you
-Brad
[cut.8.20.preserve_field_order.patch (application/octet-stream, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#13301; Package coreutils. (Sat, 29 Dec 2012 00:08:02 GMT)

Message #8 received at 13301 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Brad Cater <bradcater <at> gmail.com>
Cc: 13301 <at> debbugs.gnu.org
Subject: Re: bug#13301: patch to preserve field order in cut
Date: Fri, 28 Dec 2012 17:06:48 -0700
severity 13301 wishlist
thanks

Brad Cater wrote:
> I found that
> echo "a,b,c" | cut -d"," -f1,2
> gives the same result as
> echo "a,b,c" | cut -d"," -f2,1

This is because 'cut' has behaved this way for some forty years, ever
since it was written.  So people like me don't consider it a bug.  It
is just the way it was written to work.

The GNU manual documents it this way:

  The list elements can be repeated, can overlap, and can be specified
  in any order; but the selected input is written in the same order
  that it is read, and is written exactly once.
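
As a quick check of the documented behavior, the sketch below runs three field lists that all select fields 1 and 2. The variable names are only illustrative; the commands use nothing beyond the `-d` and `-f` options already shown in this thread.

```shell
# All three lists select fields 1 and 2; cut writes each selected field
# once, in the order it appears in the input.
in_order=$(printf 'a,b,c\n' | cut -d, -f1,2)
reversed=$(printf 'a,b,c\n' | cut -d, -f2,1)
repeated=$(printf 'a,b,c\n' | cut -d, -f2,1,2)

printf '%s\n' "$in_order" "$reversed" "$repeated"
```

Each of the three commands prints `a,b`.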

> This means that it's necessary to use another process to re-order columns.

The standard solution is to use 'awk'.  It also has a lot of years
behind it.

  $ echo "a,b,c" | awk -F, '{print$2,$1}'
  b a

Using awk also allows duplication of fields.

  $ echo "a,b,c" | awk -F, '{print$2,$1,$2}'
  b a b
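
Note that the awk examples above print the fields separated by a space, because awk's default output field separator is a space. A minimal sketch that keeps the comma, assuming the output should use the same delimiter as the input, sets `OFS` as well:

```shell
# -F, sets the input field separator; OFS sets the output field separator.
reordered=$(printf 'a,b,c\n' | awk -F, -v OFS=, '{ print $2, $1 }')

# The same idea with both separators set in a BEGIN block, duplicating a field.
duplicated=$(printf 'a,b,c\n' | awk 'BEGIN { FS = OFS = "," } { print $2, $1, $2 }')

printf '%s\n' "$reordered" "$duplicated"
```

This prints `b,a` and then `b,a,b`.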

> I have written a patch for cut.c ...

I have not looked at the patch, but the barrier to adding new short
options is pretty high.  I will leave that for others to comment on.
Personally I don't think it is necessary, since awk is already standard
and so the feature is already available everywhere that you need it
without any changes.

This issue is discussed periodically.  Here are a few that I found
with a quick search.

  http://lists.gnu.org/archive/html/bug-coreutils/2005-06/msg00125.html
  http://lists.gnu.org/archive/html/bug-coreutils/2007-09/msg00020.html
  http://lists.gnu.org/archive/html/bug-coreutils/2007-09/msg00169.html

Bob




Information forwarded to bug-coreutils <at> gnu.org:
bug#13301; Package coreutils. (Sat, 29 Dec 2012 16:22:01 GMT)

Message #11 received at 13301 <at> debbugs.gnu.org:

From: Brad Cater <bradcater <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 13301 <at> debbugs.gnu.org
Subject: Re: bug#13301: patch to preserve field order in cut
Date: Sat, 29 Dec 2012 11:20:04 -0500
Hi Bob

Thanks for the quick response.

I'm with you: let's not break something that has been working for a long
time. That's why I think that the new functionality should be a new option
rather than a replacement of existing behavior.

I read the example requests that you sent, as well as a few on
Stack Overflow like this one:

http://stackoverflow.com/questions/1037171/forcing-the-order-of-output-fields-from-cut-command

It seems like people want this, but I couldn't find anyone who had written
it, so I thought that the barrier was the effort to make it work.
Admittedly, the patch that I submitted would require additional effort
since this is my first foray into Coreutils hacking. Even if it's not
adopted into the mainline, I'd be glad to read evaluations of the code so
that I could improve it for myself.

Thanks in general for GNU. It rocks.

-Brad



Information forwarded to bug-coreutils <at> gnu.org:
bug#13301; Package coreutils. (Mon, 14 Jan 2013 02:10:01 GMT)

Message #14 received at 13301 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Brad Cater <bradcater <at> gmail.com>
Cc: 13301 <at> debbugs.gnu.org, Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#13301: patch to preserve field order in cut
Date: Mon, 14 Jan 2013 02:09:08 +0000
unarchive 6394
forcemerge 6394 13301
stop

Note that I documented how to achieve this with `awk` or `join`
in the coreutils cut documentation at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=38cdb01
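
For readers without the commit at hand, a `join`-based reordering can be sketched as follows. This is an illustration under assumptions, not a quotation of the documentation change: it uses `/dev/null` as an empty second file, so that every input line is unpairable and is printed by `-a1` in the field order given to `-o`.

```shell
# -t, sets the field separator; -o 1.2,1.1 asks for field 2 then field 1
# of the first file; -a1 prints lines of the first file that have no match.
joined=$(printf 'a,b,c\n' | join -t, -a1 -o 1.2,1.1 - /dev/null)

printf '%s\n' "$joined"
```

For this single input line the command prints `b,a`. Since `join` expects its input to be sorted on the join field, multi-line input that is not sorted may need extra care.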

thanks,
Pádraig.




Forcibly Merged 6394 9507 13301. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 14 Jan 2013 02:10:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 11 Feb 2013 12:24:03 GMT)

This bug report was last modified 12 years and 128 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.