GNU bug report logs - #22236
Not exactly a bug...


Package: coreutils

Reported by: Todd Shandelman <todd.shandelman <at> gmail.com>

Date: Fri, 25 Dec 2015 18:39:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.



Message #8 received at 22236 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Todd Shandelman <todd.shandelman <at> gmail.com>
Cc: 22236 <at> debbugs.gnu.org
Subject: Re: bug#22236: Not exactly a bug...
Date: Fri, 25 Dec 2015 18:36:58 -0500
tag 22236 notabug
close 22236
thanks

Hello Todd,

> On Dec 25, 2015, at 13:37, Todd Shandelman <todd.shandelman <at> gmail.com> wrote:

[...]

> So it looks like that for chars, 'uniq' has options to compare only the first N chars, or *all but* the first N chars.
>
> Whereas for fields, 'uniq' has only the option to skip the first N fields, but has no corresponding option to compare *only* the first N fields.
> 
> Why this lack of symmetry?

This lack of symmetry originates from the POSIX standard:
  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html
which codified the features that existed at the time.

GNU Coreutils' uniq has added a few more features, and there is a plan in the works to add the ability to use specific fields ( http://lists.gnu.org/archive/html/coreutils/2013-02/msg00082.html , http://lists.gnu.org/archive/html/coreutils/2013-09/msg00047.html ), but this has not yet been integrated into the main program; perhaps it will be in a future version.


> And what do I do when I need that missing functionality, to compare only an initial subset of fields in each line?

To print lines that are unique in specific fields, you can use 'sort'.

For example, given the following sample input file:

    $ cat input.txt
    1	A	10	x	100
    5	B	14	z	104
    2	A	11	x	101
    3	B	12	y	102
    4	B	13	z	103

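Print one line for each distinct combination of values in columns 2 and 4
(-u keeps only the first line of each run of equal keys, and -s keeps lines with equal keys in their input order):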

    $ sort -k2,2 -k4,4 -s -u input.txt
    1	A	10	x	100
    3	B	12	y	102
    5	B	14	z	104
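
If some fields may themselves contain spaces, it is probably safer to name the separator explicitly. The following is a minimal sketch of the same command with '-t' set to a tab; it recreates the sample file so that it can be run on its own. The variable name 'tab' is only an illustration.

```shell
# Recreate the tab-separated sample file shown above.
printf '1\tA\t10\tx\t100\n5\tB\t14\tz\t104\n2\tA\t11\tx\t101\n3\tB\t12\ty\t102\n4\tB\t13\tz\t103\n' > input.txt

# Hold a literal tab character to pass to -t.
tab=$(printf '\t')

# Same keys as before (fields 2 and 4), but fields are now split on tabs only,
# so a space inside a field does not start a new field.
sort -t "$tab" -k2,2 -k4,4 -s -u input.txt
```

On the sample file this should print the same three lines as the command above.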

This can be extended to include as many fields as you need.
If the fields are consecutive, you can specify them as a single key range:

    $ cat input2.txt
    A	x	1	97
    B	x	1	96
    A	x	1	99
    A	x	1	98

    $ sort -k1,3 -u input2.txt
    A	x	1	97
    B	x	1	96
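
Note that 'sort -u' removes duplicates across the whole file, whereas 'uniq' only collapses adjacent lines. If the adjacent-only behaviour is what you need, one possible approach is a small awk filter. The sketch below assumes tab-separated input; the function name 'uniq_first_fields' is hypothetical and is not part of coreutils.

```shell
# Recreate the tab-separated sample file shown above.
printf 'A\tx\t1\t97\nB\tx\t1\t96\nA\tx\t1\t99\nA\tx\t1\t98\n' > input2.txt

# uniq_first_fields N [FILE...]
# Hypothetical helper: print a line only when its first N tab-separated
# fields differ from those of the previous line (adjacent lines only,
# in the same way that uniq compares neighbouring lines).
uniq_first_fields() {
    n=$1
    shift
    awk -F '\t' -v n="$n" '
        {
            # Build the comparison key from the first n fields.
            key = ""
            for (i = 1; i <= n && i <= NF; i++)
                key = key $i FS
        }
        # Print the first line, and any line whose key changed.
        NR == 1 || key != prev { print }
        { prev = key }
    ' "$@"
}

uniq_first_fields 3 input2.txt
```

On input2.txt this prints the lines ending in 97, 96 and 99: the third line is kept because its neighbour above has a different key, and only the fourth line is dropped. Sorting the input first would give the same two lines as the 'sort -k1,3 -u' example.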





regards,
 - assaf





This bug report was last modified 6 years and 212 days ago.


