GNU bug report logs - #22236
Not exactly a bug...


Package: coreutils

Reported by: Todd Shandelman <todd.shandelman <at> gmail.com>

Date: Fri, 25 Dec 2015 18:39:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22236 in the body.
You can then email your comments to 22236 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#22236; Package coreutils. (Fri, 25 Dec 2015 18:39:02 GMT)

Acknowledgement sent to Todd Shandelman <todd.shandelman <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 25 Dec 2015 18:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Todd Shandelman <todd.shandelman <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Not exactly a bug...
Date: Fri, 25 Dec 2015 12:37:57 -0600
Hi,

I love your products.

I am using the 'uniq' command line utility on Cygwin, where I do most of my
development work.

$ uniq --version
uniq (GNU coreutils) 8.24
Packaged by Cygwin (8.24-3)
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Richard M. Stallman and David MacKenzie.
$




I feel confused about the usage options, particularly those for restricting
comparison to a limited number of initial or non-initial characters or
fields.

Observe:

$ uniq --h|egrep 'char|field'
  -f, --skip-fields=N   avoid comparing the first N fields
  -s, --skip-chars=N    avoid comparing the first N characters
  -w, --check-chars=N   compare no more than N characters in lines


...
...
...


So it looks like, for *chars*, 'uniq' has options to compare only the
first N chars, or *all but* the first N chars.

Whereas for *fields*, 'uniq' has only the option to skip the first N
fields, but has no corresponding option to compare *only* the first N
fields.
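
As a rough illustration of the asymmetry described above, the following
sketch runs 'uniq' on a tiny sample. The sample data is invented for this
example, and -w is a GNU extension rather than a POSIX option, so this
assumes GNU coreutils.

```shell
# Sample data invented for illustration: two fields per line.
data=$(printf 'a 1\na 2\nb 2\n')

# -w 1: compare only the first character, so "a 1" and "a 2" collapse.
first_char=$(printf '%s\n' "$data" | uniq -w 1)

# -f 1: skip the first field and compare the rest, so "a 2" and "b 2" collapse.
skip_field=$(printf '%s\n' "$data" | uniq -f 1)

printf '%s\n' "-w 1:" "$first_char" "-f 1:" "$skip_field"
```

There is a "first N characters" option (-w) and a "skip N fields" option
(-f), but nothing in the list above that means "first N fields".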

Why this lack of symmetry? And what do I do when I need that missing
functionality, to compare *only* an initial subset of fields in each line?

Or, am I missing something?

Thanks!

Todd Shandelman
Houston, Texas

Information forwarded to bug-coreutils <at> gnu.org:
bug#22236; Package coreutils. (Fri, 25 Dec 2015 23:38:02 GMT)

Message #8 received at 22236 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Todd Shandelman <todd.shandelman <at> gmail.com>
Cc: 22236 <at> debbugs.gnu.org
Subject: Re: bug#22236: Not exactly a bug...
Date: Fri, 25 Dec 2015 18:36:58 -0500
tag 22236 notabug
close 22236
thanks

Hello Todd,

> On Dec 25, 2015, at 13:37, Todd Shandelman <todd.shandelman <at> gmail.com> wrote:

[...]

> So it looks like, for chars, 'uniq' has options to compare only the first N chars, or *all but* the first N chars.
> 
> Whereas for fields, 'uniq' has only the option to skip the first N fields, but has no corresponding option to compare *only* the first N fields.
> 
> Why this lack of symmetry?

This lack of symmetry originates from the POSIX standard, which codified the features that existed at the time:
  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html

GNU Coreutils' uniq program has added a few more features, and there is a working plan to add the ability to use specific fields ( http://lists.gnu.org/archive/html/coreutils/2013-02/msg00082.html , http://lists.gnu.org/archive/html/coreutils/2013-09/msg00047.html ), but this has not yet been integrated into the main program. Perhaps it will be in a future version.


> And what do I do when I need that missing functionality, to compare only an initial subset of fields in each line?

To print lines that are unique in specific fields, you can use 'sort'.

For example, given the following sample input file:

    $ cat input.txt
    1	A	10	x	100
    5	B	14	z	104
    2	A	11	x	101
    3	B	12	y	102
    4	B	13	z	103

Print one line for each distinct combination of values in columns 2 and 4
(-u keeps the first line of each run of equal keys):

    $ sort -k2,2 -k4,4 -s -u input.txt
    1	A	10	x	100
    3	B	12	y	102
    5	B	14	z	104

This can be extended to include as many fields as you need.
If the fields are consecutive, you can specify them as a single range
(here, fields 1 through 3):

    $ cat input2.txt
    A	x	1	97
    B	x	1	96
    A	x	1	99
    A	x	1	98

    $ sort -k1,3 -u input2.txt 
    A	x	1	97
    B	x	1	96
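
Note that 'sort -u' reorders the input and removes duplicates globally,
whereas 'uniq' only collapses adjacent lines. If the adjacent-only behaviour
matters, something like the following awk sketch could stand in for the
missing option. The function name uniq_first_fields is hypothetical (it is
not part of coreutils), and the sketch assumes fields are separated by
whitespace, as awk splits them by default.

```shell
# Hypothetical helper (not part of coreutils): print a line only when its
# first N whitespace-separated fields differ from those of the previous line.
uniq_first_fields() {
    awk -v n="$1" '
    {
        # Build a key from the first n fields, joined with the SUBSEP character.
        key = ""
        for (i = 1; i <= n && i <= NF; i++)
            key = key $i SUBSEP
        # Like uniq, compare only against the immediately preceding line.
        if (NR == 1 || key != prev)
            print
        prev = key
    }'
}

# Sample data modelled on input2.txt above, with the lines in a different order.
result=$(printf 'A\tx\t1\t97\nA\tx\t1\t99\nB\tx\t1\t96\nA\tx\t1\t98\n' | uniq_first_fields 3)
printf '%s\n' "$result"
```

With this input the two leading "A" lines collapse into one, but the final
"A" line is kept because it is not adjacent to them; 'sort -k1,3 -u' would
have dropped it.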





regards,
 - assaf





Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:39:02 GMT)

Bug closed; send any further explanations to 22236 <at> debbugs.gnu.org and Todd Shandelman <todd.shandelman <at> gmail.com>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:39:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Nov 2018 12:24:07 GMT)

This bug report was last modified 6 years and 212 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.