GNU bug report logs - #6529
--key option problem


Package: coreutils;

Reported by: Victor Grishchenko <victor.grishchenko <at> gmail.com>

Date: Mon, 28 Jun 2010 15:57:02 UTC

Severity: normal

Merged with 6442

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #11 received at 6529 <at> debbugs.gnu.org (full text, mbox):

From: Victor Grishchenko <victor.grishchenko <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 6529 <at> debbugs.gnu.org
Subject: Re: bug#6529: --key option problem
Date: Mon, 28 Jun 2010 20:42:01 +0200

On 28 June 2010 18:07, Eric Blake <eblake <at> redhat.com> wrote:
> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Thanks for the report.  However, I don't think this is a bug in sort,
> but rather a misunderstanding on your part.  Your command says to use as
> your primary key the substring consisting of fields 17 through 30, and
> as secondary key the entire line.

My fault.
It would probably make sense for the -k option description to
reference the POS format explanation.
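
To illustrate the quoted explanation: POS in -k counts fields, not bytes, so a key that starts beyond the last field is empty on every line, and only the last-resort whole-line comparison decides the order. A minimal sketch, assuming GNU sort; the two-field sample lines are made up and are not the reported data:

```shell
# Hypothetical input: each line has only two blank-separated fields.
input=$(printf '%s\n' 'b 2' 'a 3' 'c 1')

# Fields 17 through 30 do not exist, so every key is empty and sort
# falls back to comparing entire lines.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort --key=17,30)

# Plain sort with no key compares entire lines directly.
by_line=$(printf '%s\n' "$input" | LC_ALL=C sort)

printf '%s\n' "$by_key"
```

Both invocations should produce the same order, which is why the out-of-range key went unnoticed.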

> What did you intend to sort by?  If you were typing 17,30 thinking you
> were getting bytes instead of fields, thus meaning:
>> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
>  ................^^^^^^^^^^^^^^..................

Well, that would be closer to the intended result.
As I see now, I need --key=2 --stable, i.e. from the 2nd field till
the end, stable.
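
A small sketch of what --key=2 --stable does, assuming GNU sort and made-up input: the key runs from the 2nd field to the end of the line, and lines with equal keys keep their input order instead of being reordered by the last-resort whole-line comparison.

```shell
# Hypothetical input: 'b 1 y' and 'a 1 y' have the same key (" 1 y").
sorted=$(printf '%s\n' 'b 1 y' 'a 2 x' 'a 1 y' | LC_ALL=C sort --key=2 --stable)

# With --stable, 'b 1 y' stays ahead of 'a 1 y' because it came first.
printf '%s\n' "$sorted"
```

Without --stable, the tie between the two " 1 y" lines would be broken by comparing the whole lines, which would put 'a 1 y' first.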

By the way, regarding the LC_ALL warning in the man page:
my colleague and I have "independently discovered" that non-C
locales can slow sort down by an order of magnitude.
It probably makes sense to add that to the warning.

$ time ( gzcat vtt2_98.gz | LC_ALL=ru_RU.UTF-8 sort > /dev/null )

real    1m52.153s
user    1m41.614s
sys     0m1.395s
$ time ( gzcat vtt2_98.gz | LC_ALL=C sort > /dev/null )

real    0m10.096s
user    0m4.255s
sys     0m1.186s
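
Note that the faster run also orders the data differently: in the C locale sort compares raw bytes, so uppercase ASCII letters come before all lowercase ones. A minimal sketch with made-up input; the timings above depend on the machine and the data and are not reproduced here.

```shell
# Hypothetical input mixing upper- and lowercase letters.
c_order=$(printf '%s\n' 'b' 'A' 'a' 'B' | LC_ALL=C sort)

# Byte order: 'A' (0x41) and 'B' (0x42) precede 'a' (0x61) and 'b' (0x62).
printf '%s\n' "$c_order"
```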

> Also, the next version of coreutils will include 'sort --debug' that
> gives you a visual indication of what bytes are actually being compared,
> which would have given you a clue that your --key=17,30 was selecting
> data outside the range of your input.

That is really good, because the absence of any error reports
contributed to the confusion.
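
For reference, a sketch of how the --debug option mentioned above could be invoked, assuming a GNU sort recent enough to provide it. The input is made up, and the exact annotation format may differ between versions, so no output is shown.

```shell
# Hypothetical two-field input; --key=17,30 cannot match any of it.
# Diagnostics go to stderr, so merge them into the captured output.
debug_out=$(printf '%s\n' 'b 2' 'a 3' | LC_ALL=C sort --debug --key=17,30 2>&1)

# Each sorted line is followed by an annotation describing its key.
printf '%s\n' "$debug_out"
```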

--
Victor



