GNU bug report logs - #13301
patch to preserve field order in cut
Reported by: Brad Cater <bradcater <at> gmail.com>
Date: Fri, 28 Dec 2012 23:27:02 UTC
Severity: normal
Tags: notabug
Merged with 6394, 9507
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#13301; Package coreutils. (Fri, 28 Dec 2012 23:27:02 GMT)
Acknowledgement sent to Brad Cater <bradcater <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 28 Dec 2012 23:27:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello
I found that
echo "a,b,c" | cut -d"," -f1,2
gives the same result as
echo "a,b,c" | cut -d"," -f2,1
This means that it's necessary to use another process to re-order columns.
I have written a patch for cut.c included in coreutils-8.20
(http://ftp.gnu.org/gnu/coreutils/coreutils-8.20.tar.xz) that adds a -p
option to preserve field order. This means that doing
echo "a,b,c" | cut -d"," -f2,1
still gives
a,b
but
echo "a,b,c" | cut -d"," -f2,1 -p
gives
b,a
The current implementation of cut.c uses putchar so that a full line need
not be held in memory, whereas holding a full line is required to re-order
the fields rather than printing them from an input stream. This patch uses
putchar when -p is not used, but it buffers a full line when -p is used.
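
To illustrate why re-ordering needs the whole line in hand before any output is written, here is a minimal POSIX shell sketch. It is not the attached patch: `reorder_fields` is a hypothetical helper invented for this illustration, and it simply reads a complete line, then pulls out each requested field in the order asked for.

```shell
# Hypothetical helper (not part of coreutils, not the patch in this report).
# Usage: reorder_fields DELIM FIELD...
reorder_fields() {
  delim=$1; shift
  # Each line is read in full before anything is printed; that buffering
  # is what allows fields to be emitted in an arbitrary order.
  while IFS= read -r line; do
    out=
    sep=
    for n in "$@"; do
      # Extract one field at a time, in the order requested.
      field=$(printf '%s\n' "$line" | cut -d"$delim" -f"$n")
      out="$out$sep$field"
      sep=$delim
    done
    printf '%s\n' "$out"
  done
}

echo "a,b,c" | reorder_fields , 2 1   # prints b,a
```

This spawns one cut process per field, so it is only meant to show the idea, not to be fast.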
What should I do next? I would like to have someone more experienced than I
evaluate the changes so that I can improve them and add this functionality
to coreutils.
Thank you
-Brad
[cut.8.20.preserve_field_order.patch (application/octet-stream, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#13301; Package coreutils. (Sat, 29 Dec 2012 00:08:02 GMT)
Message #8 received at 13301 <at> debbugs.gnu.org (full text, mbox):
severity 13301 wishlist
thanks
Brad Cater wrote:
> I found that
> echo "a,b,c" | cut -d"," -f1,2
> gives the same result as
> echo "a,b,c" | cut -d"," -f2,1
This is because 'cut' has always behaved that way, going back some
forty years. So people like me don't consider it a bug. It is just
the way it was written to work.
The GNU manual documents it this way:
The list elements can be repeated, can overlap, and can be specified
in any order; but the selected input is written in the same order
that it is read, and is written exactly once.
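
A small demonstration of the documented behavior, as a sketch using only the options already shown in this thread: a repeated field and overlapping ranges still yield each selected field once, in input order.

```shell
# A repeated, out-of-order list selects fields 1 and 2, written once each.
printf 'a,b,c\n' | cut -d, -f2,1,2     # prints a,b

# Overlapping ranges select fields 1 through 3, again written once each.
printf 'a,b,c\n' | cut -d, -f1-2,2-3   # prints a,b,c
```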
> This means that it's necessary to use another process to re-order columns.
The standard solution is to use 'awk'. It also has a lot of years
behind it.
$ echo "a,b,c" | awk -F, '{print$2,$1}'
b a
Using awk also allows duplication of fields.
$ echo "a,b,c" | awk -F, '{print$2,$1,$2}'
b a b
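
Note that the awk examples above print a space between fields, because awk's default output field separator is a space. To keep the comma on output, OFS can be set as well. A minimal sketch, where `swap_first_two` is just a hypothetical wrapper name for this illustration:

```shell
# Hypothetical wrapper: swap the first two comma-separated fields and
# keep the comma as the output separator by setting OFS.
swap_first_two() {
  awk -F, -v OFS=, '{ print $2, $1 }'
}

echo "a,b,c" | swap_first_two   # prints b,a
```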
> I have written a patch for cut.c ...
I have not looked at the patch but the barrier to adding new short
options is pretty high. I will leave that for others to comment.
Personally I don't think it is necessary since awk is already standard
and therefore use of the feature is already available everywhere that
you need it without any changes.
This issue is discussed periodically. Here are a few that I found
with a quick search.
http://lists.gnu.org/archive/html/bug-coreutils/2005-06/msg00125.html
http://lists.gnu.org/archive/html/bug-coreutils/2007-09/msg00020.html
http://lists.gnu.org/archive/html/bug-coreutils/2007-09/msg00169.html
Bob
Information forwarded to bug-coreutils <at> gnu.org: bug#13301; Package coreutils. (Sat, 29 Dec 2012 16:22:01 GMT)
Message #11 received at 13301 <at> debbugs.gnu.org (full text, mbox):
Hi Bob
Thanks for the quick response.
I'm with you: let's not break something that has been working for a long
time. That's why I think that the new functionality should be a new option
rather than a replacement of existing behavior.
I read those example requests that you sent as well as a few on
Stack Overflow like this one:
http://stackoverflow.com/questions/1037171/forcing-the-order-of-output-fields-from-cut-command
It seems like people want this, but I couldn't find anyone who had written
it, so I thought that the barrier was the effort to make it work.
Admittedly, the patch that I submitted would require additional effort
since this is my first foray into Coreutils hacking. Even if it's not
adopted into the mainline, I'd be glad to read evaluations of the code so
that I could improve it for myself.
Thanks in general for GNU. It rocks.
-Brad
On Fri, Dec 28, 2012 at 7:06 PM, Bob Proulx <bob <at> proulx.com> wrote:
> [...]
Information forwarded to bug-coreutils <at> gnu.org: bug#13301; Package coreutils. (Mon, 14 Jan 2013 02:10:01 GMT)
Message #14 received at 13301 <at> debbugs.gnu.org (full text, mbox):
unarchive 6394
forcemerge 6394 13301
stop
Note I documented how to achieve this with `awk` or `join`
in the coreutils cut documentation at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=38cdb01
thanks,
Pádraig.
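
For reference, a sketch of what the `join` approach can look like. This is an illustration and not a quotation of the linked commit; `join_swap` is a hypothetical wrapper name. Joining standard input against an empty file makes every line unpairable, `-a1` prints those lines anyway, and `-o` picks the output fields in the order given.

```shell
# Hypothetical wrapper: print field 2 then field 1 of each line on stdin.
# /dev/null serves as an empty second file, so no line ever pairs;
# -a1 prints the unpairable lines of file 1, formatted according to -o.
join_swap() {
  join -t, -a1 -o 1.2,1.1 - /dev/null
}

echo "a,b,c" | join_swap   # prints b,a
```

join expects its input sorted on the join field, so this sketch is only exercised here on a single line; awk is the more general tool for multi-line input.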
Forcibly Merged 6394 9507 13301. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 14 Jan 2013 02:10:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 11 Feb 2013 12:24:03 GMT)