GNU bug report logs - #6529
--key option problem

Package: coreutils

Reported by: Victor Grishchenko <victor.grishchenko <at> gmail.com>

Date: Mon, 28 Jun 2010 15:57:02 UTC

Severity: normal

Merged with 6442

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6529 in the body.
You can then email your comments to 6529 AT debbugs.gnu.org in the normal way.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6529; Package coreutils. (Mon, 28 Jun 2010 15:57:02 GMT)

Acknowledgement sent to Victor Grishchenko <victor.grishchenko <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 28 Jun 2010 15:57:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Victor Grishchenko <victor.grishchenko <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: --key option problem
Date: Mon, 28 Jun 2010 16:26:51 +0200
Hi!

Today I ran into a problem with sort ignoring the --key parameter.
It sorts data according to the alphanumeric order of the full string instead.
Twiddling here and there (coreutils version, LC_ALL, positions, etc) did not work.
Any ideas?

harvest$ zcat data.log.gz | pcregrep '[TR]data' | head -10 | sort --key=17,30 
0_01_18_139_840 vtt1_100 vtt2_9#8 Tdata (0,8132)
0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
0_01_19_771_887 vtt1_100 vtt1_6#2 Tdata (0,7832)
0_01_19_794_385 vtt1_100 vtt1_2#3 Tdata (0,7832)
0_01_20_456_841 vtt1_100 vtt1_23#4 Tdata (0,8132)
0_01_21_444_514 vtt1_100 vtt2_6#7 Tdata (0,8132)
0_01_21_444_752 vtt1_100 vtt1_26#5 Tdata (0,7832)
0_01_22_944_496 vtt1_100 vtt2_8#9 Tdata (0,7835)
0_01_23_498_353 vtt1_100 vtt2_26#10 Tdata (0,7835)
0_01_23_612_298 vtt1_100 vtt1_4#6 Tdata (0,7832)
harvest$ sort --version
sort (GNU coreutils) 8.5
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.
harvest$ env
TERM=xterm-color
SHELL=/bin/bash
SSH_TTY=/dev/pts/32
LC_ALL=C
PATH=/home/victor/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
LANG=en_US.UTF-8
PS1=\W\$ 
SHLVL=1
_=/home/victor/bin/env
harvest$ 

-- 

		V





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6529; Package coreutils. (Mon, 28 Jun 2010 16:09:02 GMT)

Message #8 received at 6529 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Victor Grishchenko <victor.grishchenko <at> gmail.com>
Cc: 6529 <at> debbugs.gnu.org
Subject: Re: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 10:07:24 -0600
On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Hi!
> 
> Today I ran into a problem with sort ignoring the --key parameter.
> It sorts data according to the alphanumeric order of the full string instead.
> Twiddling here and there (coreutils version, LC_ALL, positions, etc) did not work.
> Any ideas?
> 
> harvest$ zcat data.log.gz | pcregrep '[TR]data' | head -10 | sort --key=17,30 

Thanks for the report.  However, I don't think this is a bug in sort,
but rather a misunderstanding on your part.  Your command says to use as
your primary key the substring consisting of fields 17 through 30, and
as secondary key the entire line.

> 0_01_18_139_840 vtt1_100 vtt2_9#8 Tdata (0,8132)

But your input only has 5 fields, so your primary key is worthless, and
the fallback secondary key explains why you are getting alphanumeric
sorting.

What did you intend to sort by?  If you were typing 17,30 thinking you
were getting bytes instead of fields, thus meaning:

> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
  ................^^^^^^^^^^^^^^..................

then you should use --key=2,3.5 (that is, start with the second field,
and go through the 5th byte of the third field).  You may also want to
use --stable to disable the fallback sort of the entire line.
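
To see the difference concretely, here is a minimal sketch, assuming GNU sort and the C locale. The three sample lines come from the report and are shuffled on purpose; the variable names are only for illustration.

```shell
# Three lines from the report, deliberately out of order.
input='0_01_19_794_385 vtt1_100 vtt1_2#3 Tdata (0,7832)
0_01_18_139_840 vtt1_100 vtt2_9#8 Tdata (0,8132)
0_01_19_771_887 vtt1_100 vtt1_6#2 Tdata (0,7832)'

# Plain sort: compares whole lines.
whole_line=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Fields 17..30 do not exist, so every key is empty and sort falls
# back to the whole-line comparison: same result as the plain sort.
missing_key=$(printf '%s\n' "$input" | LC_ALL=C sort --key=17,30)

# Key runs from field 2 into the start of field 3; --stable keeps
# lines with equal keys in their input order.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort --stable --key=2,3.5)

# The two vtt1 lines stay in input order (794_385 before 771_887),
# and the vtt2 line comes last.
printf '%s\n' "$by_key"
```

One caveat: with the default separator, GNU sort counts the blank in front of a field as part of that field, so this key starts at the blank before field 2 and stops at byte 29 rather than byte 30. Adding the b modifier (for example --key=2b,3.5b) skips those blanks. The outcome for these sample lines is the same either way.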

Also, the next version of coreutils will include 'sort --debug' that
gives you a visual indication of what bytes are actually being compared,
which would have given you a clue that your --key=17,30 was selecting
data outside the range of your input.
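
A hedged sketch of how that check might look once the option is available. It assumes a GNU sort newer than the 8.5 shown in the report, probes for --debug first, and makes no claim about the exact annotation text, which can differ between versions:

```shell
# --debug is GNU-specific and newer than coreutils 8.5, so probe for it.
if sort --debug </dev/null >/dev/null 2>&1; then
  # Diagnostics go to stderr; fold them into the captured output.
  debug_out=$(printf '%s\n' \
    '0_01_18_139_840 vtt1_100 vtt2_9#8 Tdata (0,8132)' |
    LC_ALL=C sort --debug --key=17,30 2>&1)
else
  debug_out='sort --debug not supported here'
fi

# Each input line is echoed, followed by a marker line for every key
# showing which bytes (if any) that key selected.
printf '%s\n' "$debug_out"
```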

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6529; Package coreutils. (Mon, 28 Jun 2010 18:43:02 GMT)

Message #11 received at 6529 <at> debbugs.gnu.org:

From: Victor Grishchenko <victor.grishchenko <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 6529 <at> debbugs.gnu.org
Subject: Re: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 20:42:01 +0200
On 28 June 2010 18:07, Eric Blake <eblake <at> redhat.com> wrote:
> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Thanks for the report.  However, I don't think this is a bug in sort,
> but rather a misunderstanding on your part.  Your command says to use as
> your primary key the substring consisting of fields 17 through 30, and
> as secondary key the entire line.

My fault.
Probably, it makes sense to reference the POS format explanation from
the -k option description.

> What did you intend to sort by?  If you were typing 17,30 thinking you
> were getting bytes instead of fields, thus meaning:
>> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
>  ................^^^^^^^^^^^^^^..................

Well, that would be closer to the intended result.
As I see now, I need --key=2 --stable, i.e. a key running from the 2nd field
to the end of the line, with a stable sort.
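
For reference, a small sketch of that invocation, assuming GNU sort in the C locale and four lines from the report in their original order (the variable names are illustrative):

```shell
# Four lines from the report, in the order they appeared.
input='0_01_18_139_840 vtt1_100 vtt2_9#8 Tdata (0,8132)
0_01_19_771_887 vtt1_100 vtt1_6#2 Tdata (0,7832)
0_01_19_794_385 vtt1_100 vtt1_2#3 Tdata (0,7832)
0_01_20_456_841 vtt1_100 vtt1_23#4 Tdata (0,8132)'

# --key=2 with no end position: the key is everything from field 2 to
# the end of the line. --stable turns off the whole-line tie-breaker.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort --stable --key=2)

# The timestamp in field 1 no longer influences the order.
printf '%s\n' "$sorted"
```

In the C locale the comparison is bytewise, so vtt1_2#3 sorts ahead of vtt1_23#4 because '#' has a lower byte value than '3'.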

By the way, regarding the LC_ALL warning in the man page:
my colleague and I have "independently discovered" that non-C
locales can slow sort down by an order of magnitude.
It probably makes sense to add that to the warning.

$ time ( gzcat vtt2_98.gz | LC_ALL=ru_RU.UTF-8 sort > /dev/null )

real    1m52.153s
user    1m41.614s
sys     0m1.395s
$ time ( gzcat vtt2_98.gz | LC_ALL=C sort > /dev/null )

real    0m10.096s
user    0m4.255s
sys     0m1.186s

> Also, the next version of coreutils will include 'sort --debug' that
> gives you a visual indication of what bytes are actually being compared,
> which would have given you a clue that your --key=17,30 was selecting
> data outside the range of your input.

That is really good, because the absence of any error reports
contributed to the confusion.

--
Victor




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6529; Package coreutils. (Mon, 28 Jun 2010 21:22:01 GMT)

Message #14 received at 6529 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Victor Grishchenko <victor.grishchenko <at> gmail.com>
Cc: 6529 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 22:21:18 +0100
On 28/06/10 19:42, Victor Grishchenko wrote:
> On 28 June 2010 18:07, Eric Blake <eblake <at> redhat.com> wrote:
>> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
>> Thanks for the report.  However, I don't think this is a bug in sort,
>> but rather a misunderstanding on your part.  Your command says to use as
>> your primary key the substring consisting of fields 17 through 30, and
>> as secondary key the entire line.
> 
> My fault.
> Probably, it makes sense to reference the POS format explanation from
> the -k option description.

Yes, that is the crux of the problem.
I already have a tiny patch for that:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=6442
I'll apply that, merge this bug with 6442 and close that.

thanks,
Pádraig.




Merged 6442 6529. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Tue, 29 Jun 2010 00:04:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 Jul 2010 11:24:03 GMT)

This bug report was last modified 14 years and 337 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.