GNU bug report logs - #7068
Feature request: uniq --field-separator="SEP" --consider-fields="a, b, c" --ignore-fields="x, y, z"

Package: coreutils

Reported by: Stefan Nowak <p.org <at> gmx.at>

Date: Sat, 18 Sep 2010 23:03:02 UTC

Severity: wishlist

From: Stefan Nowak <p.org <at> gmx.at>
To: 7068 <at> debbugs.gnu.org
Subject: bug#7068: Feature request: uniq --field-separator="SEP" --consider-fields="a, b, c" --ignore-fields="x, y, z"
Date: Sun, 19 Sep 2010 00:44:17 +0200
Hello developers!


CURRENT SYNTAX:

http://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

--skip-fields=n  Skip n fields on each line before checking for  
uniqueness. Use a null string for comparison if a line has fewer than  
n fields. Fields are sequences of non-space non-tab characters that  
are separated from each other by at least one space or tab.


--- FEATURE REQUEST #1 ---

--field-separator="SEP", -F

EXAMPLE:

Scenario: Imagine a filesystem listing. Because of its hierarchical
nature, every entry is unique. Now I want to ignore the file path
prefix (set the separator with -F and skip the leading field(s) with
-f), consider only the basename, and see how many instances of it
exist and where (print all duplicate instances with -D).

Input:
folder a<TAB>file 1
folder b<TAB>file 1
folder b<TAB>file 2
folder c<TAB>file 3

Commandline:
cat sample.txt | guniq -D -F "\t" -f 1

Output:
folder a<TAB>file 1
folder b<TAB>file 1

BENEFIT: If you can define the separator character (e.g. TAB), your
column data may contain any character other than SEP; for example, a
column could then contain SPACE characters.
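
Until such an option exists, the requested behaviour can be approximated with awk. The following is only a sketch, not coreutils code: the function name uniq_skip_tab_fields is hypothetical, TAB is hard-coded as the separator, and, like uniq, it compares adjacent lines only.

```shell
# Hypothetical helper emulating the proposed: uniq -D -F "\t" -f N
# Usage: uniq_skip_tab_fields N < input
uniq_skip_tab_fields() {
    awk -F '\t' -v skip="$1" '
    {
        # Build the comparison key from the fields after the skipped ones.
        key = ""
        for (i = skip + 1; i <= NF; i++) key = key FS $i

        if (NR > 1 && key == prev_key) {
            # A duplicate run: emit its first line once, then this line.
            if (pending) { print held; pending = 0 }
            print
        } else {
            # Start of a possible run: hold the line until a duplicate shows up.
            held = $0; pending = 1
        }
        prev_key = key
    }'
}

# The sample input from the example above.
printf 'folder a\tfile 1\nfolder b\tfile 1\nfolder b\tfile 2\nfolder c\tfile 3\n' |
    uniq_skip_tab_fields 1
```

With the sample input this should print the same two lines as the Output listed in the example, because only the first two lines share the text after the first TAB.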


--- FEATURE REQUEST #2 ---

--consider-fields=a[,b,c,...]  Build the comparison string of a line
from these field(s).
--ignore-fields=x[,y,z,...]    Build the comparison string of a line
by excluding these field(s).


EXAMPLE:

Input:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB
folder b<TAB>file 2<TAB>suffixA
folder c<TAB>file 3<TAB>suffixA

Commandline:
cat sample.txt | guniq -D -F "\t" --consider-fields="2"
Equivalent to:
cat sample.txt | guniq -D -F "\t" --ignore-fields="1,3"

Output:
folder a<TAB>file 1<TAB>suffixA
folder b<TAB>file 1<TAB>suffixB

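WORKAROUND IN THE MEANTIME: Insert a regex find/replace step into the
pipe before uniq that moves all the data to be ignored to the front of
each line, then skip it with --skip-fields. Because --skip-fields
splits on spaces and tabs, the moved data must not contain blanks.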
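
One possible shape of this workaround for the three-column sample, using awk instead of a regex tool for the rearranging. This is a sketch under assumptions: the function name dups_by_middle_field is hypothetical, the bytes \001 and \002 are assumed never to occur in the data, and -D requires GNU uniq.

```shell
# Workaround sketch: pack the ignored fields (1 and 3) into one blank-free
# token at the front, let stock uniq skip it with -f 1, then unpack again.
dups_by_middle_field() {
    awk -F '\t' '{
        ignored = $1 "\002" $3          # pack the ignored fields together
        gsub(/ /, "\001", ignored)      # hide spaces so uniq sees one field
        print ignored "\t" $2
    }' |
        uniq -D -f 1 |
        awk -F '\t' -v OFS='\t' '{
            gsub(/\001/, " ", $1)       # restore the hidden spaces
            split($1, part, "\002")     # unpack the ignored fields
            print part[1], $2, part[2]  # original column order
        }'
}

printf 'folder a\tfile 1\tsuffixA\nfolder b\tfile 1\tsuffixB\nfolder b\tfile 2\tsuffixA\nfolder c\tfile 3\tsuffixA\n' |
    dups_by_middle_field
```

The extra packing and unpacking steps are exactly the overhead that the requested options would remove.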

BENEFIT: It would of course be much more convenient to work with the
data as-is by using --consider-fields and --ignore-fields.



Regards, Stefan Nowak




This bug report was last modified 14 years and 279 days ago.
